
Prof. Y. Wiaux Heriot-Watt B31XO Course project

DEEP LEARNING IN COMPUTATIONAL IMAGING

1 Introduction

1.1 Project in a nutshell

Computational imaging is an approach to the process of imaging a scene of interest, where the acquired data are not the sought images themselves, but rather contain only incomplete information about these images (e.g. due to time, cost, or physical constraints inherent to the sensing procedure). The data are often not acquired in the image domain itself, but rather in a transform domain (e.g. the Fourier domain). In this context, an ill-posed inverse problem appears for image formation, and advanced algorithms are needed to reconstruct an image from the data. Important imaging modalities in science and technology, ranging from astronomy to medicine, fall into this category.

In the first part of the course, we introduced the framework of convex optimisation, which provides a wealth of algorithms for solving imaging inverse problems. In particular, we studied proximal splitting methods, such as the Forward-Backward algorithm and the Alternating Direction Method of Multipliers (ADMM). The second part of the course takes the form of a group project that will enable us to explore how machine learning algorithms, more specifically deep neural networks, can provide an alternative framework for solving imaging inverse problems, or otherwise integrate and enhance optimisation algorithms. For the sake of illustration, the project focuses on a specific computational imaging modality known as Magnetic Resonance (MR) imaging, a state-of-the-art imaging technique in medicine.

1.2 MR imaging

MR imaging is a non-invasive, non-ionising medical imaging technique whose strength lies in the flexibility of its contrast mechanisms. It comes in various modalities, ranging from structural imaging, which maps detailed tissue structures, and diffusion imaging, which maps structural neuronal connectivity by probing molecular diffusion in each voxel of the brain, to dynamic imaging, which maps, for example, the movements of the heart. MR imaging is a good example of how physical phenomena can be exploited to encode data. Here a phenomenon called “magnetic resonance”, which relates to the alignment of the spins of protons in a magnetic field, enables the sequential measurement of Fourier coefficients of the image of interest [1]. MR data acquisition is intrinsically slow, sometimes prohibitively so, due to its sequential nature. Designing MR acquisition sequences that are fast while simultaneously enabling high-resolution imaging constitutes a major challenge for medical research. In this context, the acquisition of incomplete Fourier data (i.e. of only part of the Fourier coefficients) is a commonly used acceleration strategy. This places the modality in the category of computational imaging techniques, with advanced algorithms required for image reconstruction from the data.

1.3 Deep Learning?

Figure 1: Illustration of a generic U-net CNN architecture. The network is fed with an “input image” and produces an “output image”. The input is progressively processed through the sequential application of simple activation functions (typically thresholding operations such as the ReLU) and convolutions with small kernels (Conv) over a number of layers. On top of that, “Max pooling”, “Transposed Conv”, “Concat”, and “BN” respectively perform down-sampling, up-sampling, concatenation, and normalisation operations, which all help create a complex structure of “feature maps” (yellow planes) over the layers. The parameters of the network (typically the convolution kernels that lead to the feature maps) are learned, i.e. their values are chosen to optimise the output of the network over a training data set of input/output images. The more complex the architecture, the larger the inference power of the network.

Recently, Deep Neural Networks (DNNs) with mixed architectures, including simple feed-forward networks, variational networks, generative adversarial networks, autoencoders, and convolutional neural networks (CNNs) such as the U-net, have been demonstrated to solve imaging inverse problems with outstanding precision for a variety of applications, in particular in medical imaging [2–5]. For illustration, a CNN will process data, in the form of an input “corrupted” image, through the sequential application of simple nonlinear activation functions and small convolutions over a number of layers, to produce an output “corrected” image (see Figure 1). If the network is not too deep, i.e. if the number of layers is not too high, its application can be extremely fast in comparison with an optimisation algorithm, opening the door to significant scalability to the large image and data dimensions characterising modern applications. The computational cost is transferred to training the DNN offline. The training task itself is actually performed using optimisation algorithms! It requires large training databases, as well as powerful computation resources, in particular dedicated GPU systems. The lack of theoretical understanding of DNNs, however, raises the question of the reliability of the estimated image. In particular, the relation between the data and the underlying image is, in most applications, dependent on a variety of acquisition parameters. Training can only be performed over a limited range of conditions, making the generalisation of a DNN to imaging conditions outside those considered during training a constant challenge.
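For intuition, the basic “convolution + ReLU” building block described above can be sketched in a few lines of NumPy (note that CNN “convolutions” are, strictly speaking, cross-correlations); the image and kernel below are purely illustrative, not part of the project:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Plain 2-D 'valid' convolution with a small kernel, as in a CNN layer."""
    kh, kw = kernel.shape
    h = image.shape[0] - kh + 1
    w = image.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def relu(x):
    """ReLU activation: threshold negative responses to zero."""
    return np.maximum(x, 0.0)

# one layer: a 3x3 gradient-detecting kernel followed by ReLU gives a feature map
image = np.tile(np.arange(8.0), (8, 1))   # horizontal ramp image
kernel = np.array([[-1., 0., 1.]] * 3)    # responds to horizontal gradients
feature_map = relu(conv2d_valid(image, kernel))
```

A full CNN such as the U-net simply stacks many such layers, with pooling and up-sampling in between, and learns the kernels from data.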

Emerging “Plug-and-Play” (PnP) methods in optimisation propose to replace the proximal operator associated with the regularisation term in a proximal splitting optimisation method by a more general denoising operator, characterising a more general prior image model [6–8]. They have also been shown to provide outstanding imaging precision, in particular when the image model is learnt and the corresponding denoiser takes the form of a DNN [9–14]. Interestingly, since the DNN is merely a denoiser, it is agnostic to the data acquisition conditions. The PnP approach thus solves the generalisation issue mentioned above.


Figure 2: Left: the groundtruth xo we wish to recover; middle: the Fourier mask diag(M†M); and right: the observed Fourier data correctly positioned in the Fourier plane M†y (in logscale of the modulus).

2 The project

2.1 Problem formulation and objectives

Problem formulation. In MR imaging, one wishes to recover an image xo ∈ RN from a noisy and sub-sampled version of its Fourier coefficients y ∈ CM, with M < N. […] Here, δ > 0 satisfies δ ≤ σ−1max(R(ΦΦ†)), and ρ > 0 is a free parameter. The value of this parameter will usually be fine-tuned to maximize the reconstruction quality over some test set of groundtruth images, i.e. a data set of Te pairs (yi, xoi)i∈Te. A properly tuned value of ρ should lead to x⋆i ≈ xoi for all i ∈ Te. If the required accuracy is not reached, the value of ρ needs to be changed, or, more importantly, the regularisation term and the objective function might have to be adapted!
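For intuition only (the project implementation itself is in MATLAB, see Section 3), the sub-sampled Fourier measurement and the corresponding backprojection M†y can be sketched in Python/NumPy; the mask density, image size, and noise level below are illustrative assumptions, not the project settings:

```python
import numpy as np

def measure(x0, mask, sigma, rng):
    """Simulate noisy, sub-sampled Fourier measurements of a real image x0:
    keep only the Fourier coefficients selected by a binary mask (M < N of them)
    and add complex Gaussian noise of standard deviation sigma."""
    X = np.fft.fft2(x0, norm="ortho")   # full Fourier coefficients
    noise = sigma * (rng.standard_normal(mask.sum())
                     + 1j * rng.standard_normal(mask.sum())) / np.sqrt(2)
    return X[mask] + noise              # retained coefficients only

def backproject(y, mask):
    """Zero-fill the missing coefficients and invert the Fourier transform (M†y)."""
    X = np.zeros(mask.shape, dtype=complex)
    X[mask] = y
    return np.real(np.fft.ifft2(X, norm="ortho"))

rng = np.random.default_rng(0)
x0 = rng.random((32, 32))
mask = rng.random((32, 32)) < 0.3       # ~30% of the coefficients retained
y = measure(x0, mask, sigma=0.01, rng=rng)
x_bp = backproject(y, mask)             # the "backprojected" image
```

The zero-filled inverse transform is the simplest image one can form from the data, and serves as a natural starting point (or network input) for the reconstruction methods.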

Note that, in theory, the optimisation algorithm converges only after an infinite number of iterations. In a practical implementation, the iterative process will typically be stopped when the relative variation of consecutive iterates is smaller than a fixed bound α > 0, i.e. ‖xt+1 − xt‖2/‖xt+1‖2 ≤ α. Note that other criteria can also be applied.
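The stopping criterion can be sketched as follows (an illustrative Python sketch, with a toy fixed-point iteration standing in for the optimisation algorithm):

```python
import numpy as np

def relative_variation(x_new, x_old):
    """Relative variation ||x_{t+1} - x_t||_2 / ||x_{t+1}||_2 of consecutive iterates."""
    return np.linalg.norm(x_new - x_old) / np.linalg.norm(x_new)

def run_until_stable(step, x0, alpha=1e-6, max_iter=1000):
    """Iterate x <- step(x) until the relative variation drops below alpha."""
    x = x0
    for _ in range(max_iter):
        x_new = step(x)
        if relative_variation(x_new, x) <= alpha:
            return x_new
        x = x_new
    return x

# toy iteration with fixed point 2: the criterion triggers well before max_iter
x = run_until_stable(lambda v: 0.5 * v + 1.0, np.zeros(4))
```

The cap `max_iter` guards against the criterion never triggering, which matters later for PnP variants whose convergence is not guaranteed.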

2.3 M2

What is a neural network? Given a specific architecture (a sequence of nonlinear activation functions and convolution operations, see Figure 1), a neural network can be represented as a highly nonconvex function G of a set of parameters θ. The latter are optimised so that G solves a specific problem, which can be formalised as recovering some variable xo from input data y. Ideally, one seeks that the network output G(y, θ⋆) be equal to xo. In order to find the optimal value θ⋆ of the parameters, the network is trained by minimizing a loss (objective) function l over a training data set of Tr pairs (yi, xoi)i∈Tr. A widely used loss function is the mean squared error over the data set between the network output G(y, θ) and the true variable xo:

θ⋆ = argmin_θ (1/Tr) Σ_{i=1}^{Tr} ‖G(yi, θ) − xoi‖₂². (4)

Algorithms such as Stochastic Gradient Descent (SGD) and Adaptive Moment Estimation (ADAM) solve problem (4) efficiently (even though this problem is nonconvex). Just like ADMM, these algorithmic structures contain free parameters that will affect the optimal value θ⋆. Once the network is trained, it is evaluated on a separate testing data set, i.e. a data set of Te pairs (yi, xoi)i∈Te on which the network has not been trained. A properly trained network should satisfy G(yi, θ⋆) ≈ xoi for all i ∈ Te. If the required accuracy is not reached, the free parameters need to be re-tuned and the network re-trained, or, more importantly, the architecture G of the network might have to be adapted!
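For intuition, the minimisation (4) can be illustrated on a toy problem, with a one-parameter linear “network” G(y, θ) = θy and plain gradient descent standing in for SGD/ADAM; all names and values below are illustrative:

```python
import numpy as np

# Toy illustration of training by minimising the mean squared error (4):
# G(y, theta) = theta * y has a single scalar parameter, and the training
# pairs are generated with the "true" value theta = 3.
rng = np.random.default_rng(1)
ys = rng.standard_normal(100)   # training inputs y_i
xs = 3.0 * ys                   # training targets x_i

theta = 0.0                     # initial parameter value
lr = 0.1                        # learning rate (a free parameter needing tuning)
for _ in range(200):
    residual = theta * ys - xs                 # G(y_i, theta) - x_i
    grad = 2.0 * np.mean(residual * ys)        # gradient of the MSE loss
    theta -= lr * grad                         # gradient descent step
```

Real networks differ only in scale: θ gathers millions of kernel weights, the gradient is computed by backpropagation, and SGD/ADAM process the training set in random mini-batches.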

Application to imaging inverse problems. To solve an imaging inverse problem, a neural network will aim to estimate the sought image xo from the incomplete data y. For our problem, the training and testing data sets should be built as Tr + Te input/output pairs (yi, xoi)i∈Tr+Te, where yi is related to the groundtruth image xoi through (1). Note that working on pairs where both input and output are represented as images greatly simplifies the network architecture and training procedure. One can simply […]


Figure 3: Left: the groundtruth xo we wish to recover; middle: the observed Fourier data correctly positioned in the Fourier plane M†y (in logscale of the modulus); and right: the backprojected image x̃.

2.4 M3

[…] Here, δ > 0 satisfies δ ≤ σ−1max(R(ΦΦ†)) and ε = σ√(M + 2√M), and a denoising neural network G is substituted for the prox of the regularisation term in the objective function (2), imposing a learned image model. As the network is to be applied at each iteration of ADMM (rather than only once as in M2), we choose a simpler architecture, namely “DnCNN” [10], which implies a faster application of G and will help contain the overall computation time. The explicit DnCNN architecture proposed is detailed in Appendix B.
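Structurally, the PnP substitution can be sketched as a generic scaled-ADMM skeleton; the operators below are placeholders (a toy quadratic data-fidelity prox and soft-thresholding standing in for the learned DnCNN denoiser), a sketch of the idea rather than the project algorithm:

```python
import numpy as np

def pnp_admm(data_fidelity_prox, denoiser, x0, rho, n_iter=50):
    """Generic Plug-and-Play ADMM skeleton: the proximal operator of the
    regularisation term is replaced by a denoiser (e.g. a trained DnCNN).
    The loop runs for a fixed number of iterations, since PnP convergence
    is not guaranteed for standard architectures."""
    x = x0.copy()
    z = x0.copy()
    u = np.zeros_like(x0)                    # scaled dual variable
    for _ in range(n_iter):
        x = data_fidelity_prox(z - u, rho)   # enforce consistency with the data
        z = denoiser(x + u)                  # apply the (learned) image model
        u = u + x - z                        # dual update
    return z

# placeholders: prox of the toy data term 0.5*(x - 1)^2, and a soft-thresholding
# "denoiser" standing in for the DnCNN
prox = lambda v, rho: (rho * v + 1.0) / (rho + 1.0)
denoise = lambda v: np.sign(v) * np.maximum(np.abs(v) - 0.01, 0.0)
x_hat = pnp_admm(prox, denoise, np.zeros(8), rho=1.0)
```

Only the `denoiser` line changes with respect to standard ADMM, which is precisely why the learned model remains agnostic to the acquisition conditions.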

Recall that a network is trained by minimizing a loss function over a training data set Tr, using optimisation algorithms containing additional free parameters that need tuning. Once the network is trained, it is evaluated on a testing data set Te, and if the required accuracy is not reached, the free parameters need to be re-tuned and the network re-trained, or, more importantly, the architecture of the network might have to be adapted!


The training and testing data sets are built as input/output pairs (x̂i, xoi)i∈Tr+Te, where x̂i are noisy versions of the groundtruth images xoi:

x̂i = xoi + n̄i, (7)

where n̄i ∈ RN is some additive i.i.d. Gaussian noise, whose components follow a normal distribution with zero mean and variance σ̄². Note that this variance σ̄² should be adapted to the variance σ² of the noise on the observed data.

Note that the convergence of the PnP version of ADMM is not proven for standard network architectures such as DnCNN [7, 8, 11]. It is therefore safer to stop the algorithm after a fixed number of iterations, rather than “hoping” for the relative variation of consecutive iterates to fall below a fixed bound.
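The construction of the denoiser training pairs (7) can be sketched as follows; the image count and noise level below are illustrative placeholders (the actual groundtruths come from trainingset/):

```python
import numpy as np

def make_denoiser_pairs(groundtruths, sigma_bar, rng):
    """Build DnCNN training pairs (x_hat_i, x_i) as in (7): each input is the
    groundtruth corrupted with i.i.d. zero-mean Gaussian noise of standard
    deviation sigma_bar."""
    pairs = []
    for x in groundtruths:
        x_hat = x + sigma_bar * rng.standard_normal(x.shape)
        pairs.append((x_hat, x))
    return pairs

rng = np.random.default_rng(0)
images = [rng.random((320, 320)) for _ in range(3)]  # stand-ins for MR groundtruths
pairs = make_denoiser_pairs(images, sigma_bar=0.05, rng=rng)
```

In contrast with M2, the inputs here are noisy images rather than backprojections, which is what makes the trained denoiser independent of the Fourier sampling pattern.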

3 Implementation and Data

The algorithms should be implemented in MATLAB. All data and files needed for the project are provided on Dropbox and organised in the following directories:

• implementation/ contains the MATLAB files M1.m, M2.m, and M3.m, for your implementation and validation of the 3 methods. The 3 files contain useful MATLAB functions; M2.m also contains a partial implementation of the U-net architecture proposed in Appendix A, for you to finalise, and M3.m a partial implementation of the DnCNN architecture proposed in Appendix B, for you to finalise;

• trainingset/ contains a training data set of approximately 7000 MR images of size 320 × 320, borrowed from [16], to serve as groundtruths for training both the U-net and the DnCNN. From these, you will need to build the backprojected images that serve as input to the U-net according to equation (5) in the training phase of M2, and the noisy images that serve as input to the DnCNN according to equation (7) in the training phase of M3;

• testingset/ contains a testing data set of 20 MR images of size 320 × 320, borrowed from [16], to serve as groundtruths for the validation of the 3 methods.

4 DNN online tutorials

For building background knowledge on DNNs (CNNs in particular, including their MATLAB implementation), please refer to the following tutorials:

• MATLAB Deep Learning Onramp

• Stanford course on CNNs

5 Beyond M1, M2, M3

The design and development of M1, M2, and M3 raise a significant number of questions with regard to possible improvements. Can the objective function for M1 be enhanced? Can the network architectures and/or the training data sets (input/output) for M2 and M3 be enhanced? Would another PnP algorithm enable increased performance of M3? Can DNNs and optimisation algorithms be interfaced in a different way? These questions lay the ground for the development of new methods that would possibly outperform M1, M2, and M3…

            SNR (dB)              SSIM                  Reconstruction time (s)
Method      Average   Std dev.    Average   Std dev.    Average   Std dev.
M1
M2
M3
M4

Table 1: SNR, SSIM, and computation time results.

Challenge. You should attempt to develop and implement a new method M4 that outperforms M1, M2, and M3.

6 Report

Your results. You are asked to provide the results of the validation of your developments over the 20 images from the testing data set. More precisely, you need to provide, for each method (M1, M2, M3, and your M4):

1. the average and standard deviation of the Signal-to-Noise Ratio (SNR, in dB) and Structural Similarity Index Measure (SSIM) of the reconstructions (use Table 1). Note that the SNR of a reconstruction x⋆ is defined as SNR = 20 log10(‖xo‖2/‖xo − x⋆‖2). The more involved definition of SSIM can be found, e.g., on Wikipedia, and MATLAB has a built-in function for it. Both metrics are standard estimators of reconstruction quality;

2. the average and standard deviation of the computation time (use Table 1);

3. the reconstructed images x⋆i in comparison with the groundtruth images xoi and the backprojected images x̃i […]
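The SNR metric defined above, and the average/standard-deviation summaries required for Table 1, can be sketched as follows; this is an illustrative Python/NumPy sketch (the project implementation itself should be in MATLAB, where the built-in ssim function covers the SSIM metric):

```python
import numpy as np

def snr_db(x_true, x_rec):
    """SNR = 20 log10(||x_true||_2 / ||x_true - x_rec||_2), in dB."""
    return 20.0 * np.log10(np.linalg.norm(x_true)
                           / np.linalg.norm(x_true - x_rec))

def summarise(values):
    """Average and standard deviation over the test set, as needed for Table 1."""
    return float(np.mean(values)), float(np.std(values))

# a reconstruction with 1% relative error has an SNR of 40 dB
x = np.ones(100)
snr = snr_db(x, x + 0.01)
avg, std = summarise([snr, snr + 2.0])
```

The same `summarise` pattern applies to the per-image computation times.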