
Prof. Y. Wiaux Heriot-Watt B31XO Course project
DEEP LEARNING IN COMPUTATIONAL IMAGING
1 Introduction
1.1 Project in a nutshell
Computational imaging is an approach to the process of imaging a scene of interest, where the acquired
data are not the sought images themselves, but rather contain only incomplete information about these
images (e.g. due to time, cost, or physical constraints inherent to the sensing procedure). The data are often not acquired in the image domain itself, but rather in a transform domain (e.g. the Fourier domain).
In this context, an ill-posed inverse problem appears for image formation, and advanced algorithms are
needed to reconstruct an image from data. Important imaging modalities in science and technology,
ranging from astronomy to medicine, fall into this category.
In the first part of the course, we have introduced the framework of convex optimisation, which provides a wealth of algorithms for solving imaging inverse problems. In particular, we have studied proximal splitting methods, such as the Forward-Backward algorithm and the Alternating Direction Method of Multipliers (ADMM). The second part of the course takes the form of a group project exploring how machine learning algorithms, more specifically deep neural networks, can provide an alternative framework to solve imaging inverse problems, or otherwise integrate and enhance optimisation algorithms. For the sake of illustration, the project will focus on a specific computational imaging modality known as Magnetic Resonance (MR) imaging, a state-of-the-art imaging technique in medicine.
1.2 MR imaging
MR imaging is a non-invasive, non-ionising medical imaging technique whose strength lies in the flexibility of its contrast mechanisms. It comes in various modalities, ranging from structural imaging, which maps detailed tissue structures, and diffusion imaging, which maps structural neuronal connectivity by probing molecular diffusion in each voxel of the brain, to dynamic imaging, which maps for example the movements of the heart. MR imaging is a good example of how physical phenomena can be exploited to encode data: a phenomenon called “magnetic resonance”, which relates to the alignment of the spins of protons in a magnetic field, enables the sequential measurement of Fourier coefficients of the image of interest [1]. MR data acquisition is intrinsically long, sometimes prohibitively so, due to its sequential nature. Designing MR acquisition sequences that are fast and simultaneously enable high-resolution imaging constitutes a major challenge for medical research. In this context, the acquisition of incomplete Fourier data (i.e. of only part of the Fourier coefficients) is a commonly used acceleration strategy. This places the modality in the category of computational imaging techniques, with advanced algorithms required for image reconstruction from the data.
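The project algorithms are to be implemented in MATLAB (see Section 3). Purely to illustrate the measurement process just described (noisy, sub-sampled Fourier coefficients of an image), here is a NumPy sketch on a toy image; the image size, sampling ratio, and noise level are arbitrary assumptions, not the project settings:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy groundtruth image (the real project uses 320x320 MR images).
n_side = 32
x_o = rng.random((n_side, n_side))

# Full set of Fourier coefficients of the image (unitary FFT).
X = np.fft.fft2(x_o, norm="ortho")

# Binary sampling mask keeping roughly 30% of the coefficients.
mask = rng.random((n_side, n_side)) < 0.3
M = int(mask.sum())

# Noisy, sub-sampled Fourier data: keep only the masked coefficients
# and add complex i.i.d. Gaussian noise of standard deviation sigma.
sigma = 0.01
noise = sigma * (rng.standard_normal(M) + 1j * rng.standard_normal(M)) / np.sqrt(2)
y = X[mask] + noise

print(M, n_side ** 2)  # M complex measurements versus N = n_side**2 unknowns
```

The data vector y holds fewer measurements than the image has pixels, which is exactly what makes the image-formation problem ill-posed.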
1.3 Deep Learning?
Recently, Deep Neural Networks (DNNs) with various architectures, including simple feed-forward networks, variational networks, generative adversarial networks, autoencoders, and convolutional neural networks (CNNs) such as the U-net, have been demonstrated to solve imaging inverse problems with outstanding precision in a variety of applications, in particular in medical imaging [2–5].

Figure 1: Illustration of a generic U-net CNN architecture. The network is fed with an “input image” and produces an “output image”. The input is progressively processed through the sequential application of simple activation functions (typically thresholding operations such as ReLU) and convolutions with small kernels (Conv) over a number of layers. On top of that, “Max pooling”, “Transposed Conv”, “Concat”, and “BN” respectively perform down-sampling, up-sampling, concatenation, and normalisation operations, which all help create a complex structure of “feature maps” (yellow planes) over the layers. The parameters of the network (typically the convolution kernels that lead to the feature maps) are learned, i.e. their values are chosen to optimise the output of the network over a training data set of input/output images. The more complex the architecture, the larger the inference power of the network.

For illustration, a
CNN will process data, in the form of an input “corrupted” image, through the sequential application
of simple nonlinear activation functions and small convolutions over a number of layers, to produce an
output “corrected” image (see Figure 1). If the network is not too deep, i.e. if the number of layers is not too high, its application may be extremely fast compared with an optimisation algorithm, opening the door to significant scalability to the large image and data dimensions characterising modern applications. The computational cost is transferred to training the DNN offline. The training task itself is
actually performed using optimisation algorithms! It requires large training databases, as well as power-
ful computation resources, in particular dedicated GPU systems. The lack of theoretical understanding
of DNNs however raises the question of the reliability of the estimated image. In particular, the relation
between the data and the underlying image is, in most applications, dependent on a variety of acquisition
parameters. Training can only be performed over a limited range of conditions, so the generalisation of a DNN to imaging conditions outside those considered during training is a constant challenge.
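The elementary CNN operations named above (small convolutions, thresholding activations such as ReLU, max pooling) are simple to state. For intuition only, here is a NumPy sketch of a single “Conv + ReLU + Max pooling” step, with a fixed hand-picked kernel standing in for a learned one (the project itself uses MATLAB's deep learning layers):

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Plain 2-D 'valid' correlation (no padding, stride 1)."""
    kh, kw = kernel.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Thresholding activation: max(x, 0) elementwise."""
    return np.maximum(x, 0.0)

def maxpool2(x):
    """2x2 max pooling (down-sampling by a factor of 2)."""
    h, w = x.shape[0] - x.shape[0] % 2, x.shape[1] - x.shape[1] % 2
    return x[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

img = np.arange(36, dtype=float).reshape(6, 6)    # toy 6x6 "image" (a ramp)
kernel = np.array([[-1.0, 1.0], [-1.0, 1.0]])     # horizontal-gradient kernel
feat = maxpool2(relu(conv2d_valid(img, kernel)))  # one Conv + ReLU + pool step
print(feat.shape)
```

On this ramp image the gradient kernel responds uniformly, and the pooled feature map is half the spatial size of the convolution output, mirroring the down-sampling path of the U-net in Figure 1.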
Emerging “Plug-and-Play” (PnP) methods in optimisation propose to replace the proximal operator associated with the regularisation term in a proximal splitting optimisation method by a more general denoising operator, characterising a richer prior image model [6–8]. They were also shown to provide outstanding imaging precision, in particular when the image model is learnt and the corresponding denoiser takes the form of a DNN [9–14]. Interestingly, since the DNN is merely a denoiser, it is agnostic to the data acquisition conditions. The PnP approach thus resolves the generalisation issue mentioned above.
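The PnP idea can be sketched generically: in a forward-backward-type iteration, the proximal step is simply replaced by an arbitrary plug-in denoiser. Below is an illustrative NumPy sketch on a toy Fourier-sampling problem, with soft-thresholding standing in for the DNN denoiser (all operator names and settings here are assumptions, not the project's API):

```python
import numpy as np

def pnp_forward_backward(y, mask, denoiser, delta=1.0, n_iter=50):
    """Sketch of a PnP forward-backward iteration for y = mask * FFT2(x).

    The proximal operator of the regulariser is replaced by an arbitrary
    plug-in `denoiser`, which encodes the (possibly learned) image prior.
    """
    x = np.real(np.fft.ifft2(np.where(mask, y, 0), norm="ortho"))  # backprojection
    for _ in range(n_iter):
        # Gradient step on the data-fidelity term 0.5 * ||mask*FFT2(x) - y||^2.
        residual = np.where(mask, np.fft.fft2(x, norm="ortho") - y, 0)
        x = x - delta * np.real(np.fft.ifft2(residual, norm="ortho"))
        x = denoiser(x)  # plug-in denoising step (a DNN in PnP methods)
    return x

# Crude stand-in denoiser: soft-thresholding (PnP methods plug in a DNN here).
soft = lambda z: np.sign(z) * np.maximum(np.abs(z) - 1e-3, 0)

rng = np.random.default_rng(1)
x_o = np.zeros((16, 16))
x_o[4:12, 4:12] = 1.0                                  # toy piecewise-constant image
mask = rng.random((16, 16)) < 0.5                      # 50% Fourier sampling
y = np.where(mask, np.fft.fft2(x_o, norm="ortho"), 0)  # noiseless toy data
x_rec = pnp_forward_backward(y, mask, soft)
print(x_rec.shape)
```

With the unitary FFT the data-fidelity gradient has Lipschitz constant at most 1, so the unit step size keeps the iteration stable; swapping `soft` for a trained DNN denoiser gives the PnP schemes of [9–14].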
Figure 2: Left: the groundtruth xo we wish to recover, center: the Fourier mask diag(M†M), and right:
the observed Fourier data correctly positioned in the Fourier plane M†y (in logscale of the modulus).
2 The project
2.1 Problem formulation and objectives
Problem formulation. In MR imaging, one wishes to recover an image xo ∈ RN from a noisy and sub-sampled version of its Fourier coefficients y ∈ CM, with M < N. The parameter δ > 0 satisfies δ ≤ σ−1max(R(ΦΦ†)), and ρ > 0 is a free parameter. The value of this parameter will usually be fine-tuned to maximize the reconstruction quality over some test set of groundtruth images, i.e. a data set of Te pairs (yi, xoi)i∈Te. A properly tuned value of ρ should lead to x⋆i ≈ xoi for all i ∈ Te. If the required accuracy is not reached, the value of ρ needs to be changed, or, more importantly, the regularisation term and the objective function might have to be adapted!
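In practice, fine-tuning ρ amounts to a grid search: reconstruct the test images for several candidate values and keep the one maximising a quality metric such as the SNR. A toy NumPy sketch of this procedure, with a hypothetical one-parameter denoiser standing in for the full reconstruction algorithm (everything here is an illustrative assumption):

```python
import numpy as np

def snr_db(x_true, x_est):
    """SNR = 20 log10(||x|| / ||x - x_est||), in dB."""
    return 20 * np.log10(np.linalg.norm(x_true) / np.linalg.norm(x_true - x_est))

def reconstruct(z, rho):
    """Hypothetical one-parameter reconstruction: soft-thresholding at level rho."""
    return np.sign(z) * np.maximum(np.abs(z) - rho, 0)

rng = np.random.default_rng(2)

# Toy "test set": sparse groundtruth signals and their noisy observations.
test_set = []
for _ in range(5):
    x = np.zeros(100)
    x[rng.integers(0, 100, 10)] = 1.0
    test_set.append(x)
observations = [x + 0.1 * rng.standard_normal(100) for x in test_set]

# Grid search: keep the rho maximising the average SNR over the test set.
grid = [0.0, 0.05, 0.1, 0.2, 0.4]
scores = [np.mean([snr_db(x, reconstruct(z, rho))
                   for x, z in zip(test_set, observations)]) for rho in grid]
rho_star = grid[int(np.argmax(scores))]
print(rho_star)
```

For these sparse signals some nonzero threshold beats ρ = 0, illustrating why the free parameter genuinely matters.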
Note that, in theory, the optimisation algorithm only converges asymptotically, i.e. after an infinite number of iterations. In a practical implementation, the iterative process will typically be stopped when the relative variation between consecutive iterates falls below a fixed bound α > 0, i.e. ‖xt+1 − xt‖2/‖xt+1‖2 ≤ α. Note that other criteria can also be applied.
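The relative-variation stopping criterion translates into a generic iteration loop; the sketch below uses a placeholder contraction as the update step (the actual algorithm step in the project is different):

```python
import numpy as np

def iterate_until_stable(x0, step, alpha=1e-4, max_iter=1000):
    """Run `step` until ||x_{t+1} - x_t||_2 / ||x_{t+1}||_2 <= alpha (or max_iter)."""
    x = x0
    for t in range(max_iter):
        x_new = step(x)
        if np.linalg.norm(x_new - x) <= alpha * np.linalg.norm(x_new):
            return x_new, t + 1
        x = x_new
    return x, max_iter

# Placeholder update standing in for one algorithm iteration:
# a simple contraction towards the vector of ones.
target = np.ones(4)
step = lambda x: 0.5 * (x + target)

x_star, n_it = iterate_until_stable(np.zeros(4), step)
print(n_it)
```

The loop halts well before `max_iter` because the contraction makes consecutive iterates geometrically closer; a stagnating algorithm would instead run until the iteration cap, which is why a maximum iteration count is a sensible companion criterion.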
2.3 M2
What is a neural network? Given a specific architecture (sequence of nonlinear activation functions
and convolution operations, see Figure 1), a neural network can be represented as a highly nonconvex
function G of a set of parameters θ. The latter are optimised in order to have G solve a specific problem,
which can be formalised as recovering some variable xo from input data y . Ideally, one seeks that
the network output G(y, θ⋆) be equal to xo. In order to find the optimal value θ⋆ of the parameters,
the network is trained by minimizing a loss (objective) function l over a training data set of Tr pairs (yi, xoi)i∈Tr. A widely used loss function is the mean squared error over the data set between the network output G(y, θ) and the true variable xo:
θ⋆ = argminθ (1/Tr) ∑i=1,…,Tr ‖G(yi, θ) − xoi‖²₂. (4)
Algorithms such as Stochastic Gradient Descent (SGD) and Adaptive Moment Estimation (ADAM) solve problem (4) efficiently (even though this problem is nonconvex). As with ADMM, these algorithmic structures contain free parameters that will affect the optimal value θ⋆. Once the network is trained, it is evaluated on a separate testing data set, i.e. a data set of Te pairs (yi, xoi)i∈Te on which the network has not been trained. A properly trained network should satisfy G(yi, θ⋆) ≈ xoi for all i ∈ Te. If the required accuracy is not reached, the free parameters need to be re-tuned and the network re-trained, or, more importantly, the architecture G of the network might have to be adapted!
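The training loop behind (4) can be sketched with plain SGD on a tiny linear “network” (the project trains a U-net with MATLAB's deep learning tools; the linear map, sizes, and learning rate below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy training set: y_i = A x_i^o, and a toy linear "network" G(y, theta) = theta @ y.
A = np.eye(8) + 0.1 * rng.standard_normal((8, 8))
xs = [rng.standard_normal(8) for _ in range(200)]
ys = [A @ x for x in xs]

theta = np.zeros((8, 8))
lr = 0.01                      # free parameter of the optimiser (to be tuned)
for epoch in range(100):
    for x_o, y in zip(xs, ys):
        # Gradient of ||theta @ y - x_o||_2^2 w.r.t. theta is 2 (theta y - x_o) y^T.
        theta -= lr * 2 * np.outer(theta @ y - x_o, y)

# Mean squared error over the training set after training.
mse = np.mean([np.sum((theta @ y - x_o) ** 2) for x_o, y in zip(xs, ys)])
print(mse)
```

Each pass visits one training pair at a time and nudges θ down the per-sample gradient, which is exactly the stochastic structure that lets SGD-type methods scale to the large training databases mentioned above.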
Application to imaging inverse problems. To solve an imaging inverse problem, a neural network
will aim to estimate the sought image xo from the incomplete data y. For our problem, the training and testing data sets should be built as Tr + Te input/output pairs (yi, xoi)i∈Tr+Te, where yi is related to the groundtruth image xoi through (1). Note that working on pairs where both input and output are represented as images greatly simplifies the network architecture and training procedure.
Figure 3: Left: the groundtruth xo we wish to recover, middle: the observed Fourier data correctly positioned in the Fourier plane M†y (in logscale of the modulus), and right: the backprojected image x̃ (equation (5)).

2.4 M3
The parameter δ > 0 satisfies δ ≤ σ−1max(R(ΦΦ†)) and ε = σ√(M + 2√M), and a denoising neural network G is substituted for the prox of the regularisation term in the objective function (2), imposing a learned image model. As the network is to be applied at each iteration of ADMM (rather than only once as in M2), we choose a simpler architecture, namely “DnCNN” [10], which implies a faster application of G and will help contain the overall computation time. The explicit DnCNN architecture proposed is detailed in Appendix B.
is detailed in Appendix B.
Recall that a network is trained by minimizing a loss function over a training data set Tr using
optimisation algorithms containing additional free parameters that need tuning. Once the network is
trained, it is evaluated on a testing data set Te, and if the required accuracy is not reached, the free
parameters need to be re-tuned and the network re-trained, or more importantly the architecture of the
network might have to be adapted!
The training and testing data sets are built as input/output pairs (x̂i, xoi)i∈Tr+Te, where x̂i are noisy versions of the groundtruth images xoi:

x̂i = xoi + n̄i, (7)

where n̄i ∈ RN is additive i.i.d. Gaussian noise, whose components follow a normal distribution with zero mean and variance σ̄². Note that this variance σ̄² should be adapted to the variance σ² of the noise on the observed data.
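Building the DnCNN training pairs of (7) amounts to adding Gaussian noise to each groundtruth. A NumPy sketch (the image sizes and the value of σ̄ below are illustrative assumptions; in the project, σ̄ must be matched to the data noise level σ):

```python
import numpy as np

rng = np.random.default_rng(4)

sigma_bar = 0.05                                         # training noise std (assumed)
groundtruths = [rng.random((32, 32)) for _ in range(3)]  # stand-ins for MR images

# Input/output pairs per (7): (noisy input x_hat_i, clean target x_o_i).
pairs = [(x_o + sigma_bar * rng.standard_normal(x_o.shape), x_o)
         for x_o in groundtruths]

noisy0, clean0 = pairs[0]
print(np.std(noisy0 - clean0))  # empirical noise std, close to sigma_bar
```

The denoiser is then trained to map each noisy input back to its clean target, independently of any Fourier sampling pattern, which is what makes it reusable inside the PnP iterations.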
Note that the convergence of the PnP version of ADMM is not proven for standard network architectures such as DnCNN [7, 8, 11]. It is therefore safer to stop the algorithm after a fixed number of iterations, rather than “hoping” for the relative variation of consecutive iterates to fall below a fixed bound.
3 Implementation and Data
The algorithms should be implemented in MATLAB. All data and files needed for the project are provided on Dropbox and organised in the following directories:
• implementation/ contains the MATLAB files M1.m, M2.m, and M3.m, for your implementation
and validation of the 3 methods; the 3 files contain useful MATLAB functions; M2.m also contains
a partial implementation of the U-net architecture proposed in Appendix A, for you to finalise;
M3.m also contains a partial implementation of the DnCNN architecture proposed in Appendix B,
for you to finalise;
• trainingset/ contains a training data set of approximately 7000 MR images of size 320 × 320
borrowed from [16] to serve as groundtruths for training both the U-net and DnCNN. From these,
you will need to build the backprojected images to serve as an input to the U-net according to
equation (5) in the training phase of M2, and the noisy images to serve as an input to the DnCNN
according to equation (7) in the training phase of M3;
• testingset/ contains a testing data set of 20 MR images of size 320 × 320 borrowed from [16] to
serve as groundtruths for the validation of the 3 methods.
4 DNN online tutorials
For building background knowledge on DNNs (CNNs in particular, including their MATLAB implemen-
tation) please refer to the following tutorials:
• MATLAB Deep Learning Onramp
• Stanford course on CNNs
5 Beyond M1, M2, M3
The design and development of M1, M2, and M3 raise many questions regarding possible improvements. Can the objective function for M1 be enhanced? Can the network architectures and/or the training data sets (input/output) for M2 and M3 be enhanced? Would another PnP algorithm
            SNR (dB)                  SSIM                      Reconstruction time (s)
Method      Average   Standard dev.   Average   Standard dev.   Average   Standard dev.
M1
M2
M3
M4

Table 1: SNR, SSIM, and computation time results
enable increased performance of M3? Can DNNs and optimisation algorithms be interfaced in a dif-
ferent way? These questions lay the ground for the development of new methods that would possibly
outperform M1, M2, and M3…
Challenge. You should attempt to develop and implement a new method M4 to outperform M1, M2,
and M3.
6 Report
Your results. You are asked to provide the results of validation of your developments over the 20
images from the testing data set. More precisely, you need to provide, for each method (M1, M2, M3,
and your M4):
1. the average and standard deviation of the Signal-to-Noise Ratio (SNR (dB)) and Structural Similar-
ity Index Measure (SSIM) of the reconstructions (use Table 1). Note that the SNR of reconstruction
x⋆ is defined as SNR = 20 log10 (‖xo‖2/‖xo − x⋆‖2). The more involved definition of SSIM can
be found, e.g. on Wikipedia, and MATLAB has a built-in function for it. Both metrics are standard
estimators of the reconstruction quality;
2. the average computation time and standard deviation (use Table 1);
3. the reconstructed images x⋆i in comparison with the groundtruth images xoi and the backprojected images x̃i.
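The SNR formula above translates directly into code (a NumPy sketch for illustration; in the project itself, MATLAB's built-in ssim function would supply the second metric):

```python
import numpy as np

def snr_db(x_o, x_star):
    """SNR = 20 log10(||x_o||_2 / ||x_o - x_star||_2), in dB."""
    return 20 * np.log10(np.linalg.norm(x_o) / np.linalg.norm(x_o - x_star))

x_o = np.ones(100)                  # toy groundtruth
x_star = x_o + 0.01 * np.ones(100)  # reconstruction with a uniform 1% error
print(snr_db(x_o, x_star))          # 20 log10(1 / 0.01) = 40 dB
```

Averaging this quantity (and its standard deviation) over the 20 test images gives the entries of Table 1.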