Lecture 5 Flashcards
- ConvNet can classify patches and label the central pixel
o Can slide over all patches and classify each of them -> but very slow
▪ Neighbouring patches overlap, so we repeat a lot of convolutions
Fully Convolutional Network
- Fully connected layers have fixed dimensions and throw away spatial coordinates
o Can be seen as a convolution with kernels that cover the entire input region
- Train with patches of size PxP to predict one value
o Pros:
▪ Smaller patches -> less memory -> faster training
▪ Can combine patches from different sources -> increases diversity and helps convergence
▪ Patch sampling allows us to control class balancing
▪ Sample more from difficult locations to control difficulty of training
o Cons:
▪ Borders between objects are not explicitly defined
▪ Only the label of the central pixel may be available
▪ Output is usually not full resolution
- Low resolution: consequence of pooling layers
o But pooling is good: increases receptive field and reduces size of feature maps
- Full resolution output by:
o Combining multiple low-res results
o Avoiding low resolution (no pooling)
o Upscaling: interpolation or learning de-convolution filters
- Convolutions at original image resolution are very expensive
o Effective receptive field is also very small
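The note that a fully connected layer can be seen as a convolution whose kernel covers the entire input region can be checked numerically. The sketch below is a minimal NumPy illustration, not taken from the lecture: the patch size P = 5, the random weights and the helper name conv2d_valid are assumptions made for the example.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Plain 'valid' 2D cross-correlation (what deep-learning libraries call convolution)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
P = 5                                  # assumed patch size
weights = rng.normal(size=(P, P))      # one fully connected unit, reshaped to PxP
bias = 0.1

# On a single PxP patch, the fully connected unit and the PxP convolution agree.
patch = rng.normal(size=(P, P))
fc_output = patch.ravel() @ weights.ravel() + bias
conv_output = conv2d_valid(patch, weights) + bias   # shape (1, 1)

# On a larger image, the same kernel yields one score per patch position
# in a single pass, instead of classifying every patch separately.
image = rng.normal(size=(8, 8))
score_map = conv2d_valid(image, weights) + bias     # shape (4, 4)
```

Each entry of score_map equals what the fully connected unit would output for the patch starting at that position, which is why the fully convolutional form avoids the repeated work of the sliding-patch approach.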
Receptive Field
- Area of the input image that is seen by a single neuron in any layer of the network
o Each activation in map #1 sees an area of 3x3 in the input
o Each activation in map #2 sees an area of 3x3 in map #1
▪ Sees a 5x5 region of the original input
- Factors that affect receptive field:
o Number of layers
o Filter size
o Presence of pooling
▪ Increases receptive field
▪ Decreases resolution
▪ Loses spatial information
- Effective receptive field: follows a Gaussian distribution
o Occupies a fraction of the theoretical receptive field
o Depends on:
▪ Initialisation strategy
▪ Non-linearity
▪ Type of layers used in the network
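The 3x3 -> 5x5 example and the effect of pooling can be reproduced with the standard recurrence for the theoretical receptive field. This is a small sketch along one spatial axis; the function name and the (kernel_size, stride, dilation) layer description are assumptions chosen for the example.

```python
def receptive_field(layers):
    """Theoretical receptive field (one spatial axis) of a stack of layers.

    Each layer is a tuple (kernel_size, stride, dilation).
    """
    rf = 1      # a single input pixel sees only itself
    jump = 1    # distance in input pixels between neighbouring outputs
    for kernel_size, stride, dilation in layers:
        rf += (kernel_size - 1) * dilation * jump
        jump *= stride
    return rf

conv3 = (3, 1, 1)   # 3x3 convolution, stride 1
pool2 = (2, 2, 1)   # 2x2 pooling, stride 2

rf_one_conv = receptive_field([conv3])                  # 3
rf_two_convs = receptive_field([conv3, conv3])          # 5
rf_with_pool = receptive_field([conv3, pool2, conv3])   # 8
```

Inserting a pooling layer between the two convolutions grows the receptive field from 5 to 8 pixels, at the cost of halving the resolution.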
Dilated Convolutions
- Aims to efficiently increase the receptive field
- Exponentially expanding receptive field
o Based on dilation rate
- Can be plugged into existing architectures
- No pooling, no subsampling
- Allows dense prediction at full resolution
- No new filters made, only apply convolution in a different way
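A minimal 1D sketch of a dilated convolution, assuming NumPy. It illustrates two of the points above: the dilated filter uses the same weights as the ordinary one (it matches a dense filter with zeros inserted between the taps), and doubling the dilation rate per layer grows the receptive field exponentially. The helper names and the example kernel are assumptions.

```python
import numpy as np

def dilated_conv1d(signal, kernel, dilation=1):
    """'Valid' 1D dilated cross-correlation: kernel taps are `dilation` samples apart."""
    span = (len(kernel) - 1) * dilation + 1     # input samples covered by one output
    out = np.empty(len(signal) - span + 1)
    for i in range(len(out)):
        out[i] = np.dot(signal[i:i + span:dilation], kernel)
    return out

def zero_stuffed(kernel, dilation):
    """Same weights, with dilation - 1 zeros inserted between neighbouring taps."""
    stuffed = np.zeros((len(kernel) - 1) * dilation + 1)
    stuffed[::dilation] = kernel
    return stuffed

rng = np.random.default_rng(0)
signal = rng.normal(size=32)
kernel = np.array([0.25, 0.5, 0.25])

dilated = dilated_conv1d(signal, kernel, dilation=2)
dense = np.correlate(signal, zero_stuffed(kernel, 2), mode="valid")  # same result

# Receptive field after stacking 3-tap layers with dilation rates 1, 2, 4, 8.
receptive_fields = []
rf = 1
for d in (1, 2, 4, 8):
    rf += (len(kernel) - 1) * d
    receptive_fields.append(rf)
# receptive_fields == [3, 7, 15, 31]
```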
Deconvolution Network
- Learn filters to do up-sampling
- Unpooling: remember which element was max
o Fill that element with input value, all other elements with 0
- To learn a filter to be used during upscaling:
o Multiply filter coefficients by input value
o Slide filter by stride value
o Sum where outputs overlap
- Can also do up-sampling with nearest neighbour + same convolutions with stride 1
- Learned de-convolution often leaves checkerboard artifacts behind
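The three up-scaling steps listed above (multiply the filter by the input value, slide by the stride, sum where outputs overlap) can be sketched in 1D with NumPy. The function name, the all-ones filter and the stride of 2 are assumptions for illustration only.

```python
import numpy as np

def transposed_conv1d(x, kernel, stride=2):
    """1D transposed convolution: stamp a scaled copy of the kernel per input value."""
    k = len(kernel)
    out = np.zeros((len(x) - 1) * stride + k)
    for i, value in enumerate(x):
        start = i * stride                        # slide the filter by the stride
        out[start:start + k] += value * kernel    # multiply, then sum where outputs overlap
    return out

kernel = np.ones(3)

up = transposed_conv1d(np.array([1.0, 2.0, 3.0]), kernel, stride=2)
# up == [1, 1, 3, 2, 5, 3, 3]

# A constant input does not give a constant output: positions where two
# stamps overlap receive more mass, the 1D analogue of a checkerboard pattern.
flat = np.ones(3)
checker = transposed_conv1d(flat, kernel, stride=2)
# checker == [1, 1, 2, 1, 2, 1, 1]

# Alternative: nearest-neighbour up-sampling followed by a 'same' convolution.
nearest = np.repeat(flat, 2)
smoothed = np.convolve(nearest, kernel / kernel.sum(), mode="same")
```

With the nearest-neighbour variant every interior output receives the same number of contributions, so the constant input stays constant away from the zero-padded borders.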
U-Net
- Implements both down-sampling and up-sampling
- Each 2x2 up-convolution halves the number of feature channels
- Skip-connections: add back details to the up-sampling path
o Sometimes called concatenation
- Can be trained with little data
- Often uses heavy data augmentation
- Uses cross-entropy loss at pixel level and a loss weight to enforce good segmentation at object borders
- Problem: input size and feature map sizes need to be configured to correspond to each other, often difficult to do
o Particularly between skip-connections, especially difficult if cropping is not symmetric
o Can use same convolutions, feature maps now have exactly the same size -> easy to combine
▪ But this introduces artifacts, because we have to do zero-padding
- Dense layers in the bottom part maximise the field of view of the network
- nnU-Net is a framework for U-Nets that does initialisation of parameters and network for you
o Thus no expert knowledge required
- Very fast
- Uses standardised baseline and out-of-the-box segmentation method
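To see why input size and feature map sizes must be configured to match, the sketch below tracks the spatial size along one axis through a U-Net with valid (unpadded) convolutions. The configuration is an assumption based on the classic U-Net layout (4 levels, two 3x3 convolutions per level, 2x2 pooling and 2x2 up-convolution), not something fixed by these notes; the function name is hypothetical.

```python
def unet_output_size(input_size, depth=4, convs_per_level=2, kernel_size=3):
    """Output size (one axis) of a U-Net with valid convolutions.

    Raises ValueError when a 2x2 pooling would hit an odd or empty feature map.
    """
    shrink = convs_per_level * (kernel_size - 1)   # pixels lost per level
    size = input_size
    for level in range(depth):                     # contracting path
        size -= shrink
        if size <= 0 or size % 2 != 0:
            raise ValueError(f"level {level}: cannot pool a feature map of size {size}")
        size //= 2                                 # 2x2 max pooling
    size -= shrink                                 # bottom level
    for _ in range(depth):                         # expanding path
        size = size * 2 - shrink                   # 2x2 up-convolution, then valid convs
    if size <= 0:
        raise ValueError("input too small for this architecture")
    return size

out_size = unet_output_size(572)   # 388
```

An input of 572 pixels gives an output of 388 pixels, whereas an input of 570 already fails at the second pooling step (279 is odd). With same convolutions (shrink of 0) this bookkeeping largely disappears, at the price of zero-padding artifacts.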
Image Registration
- Aims to find a spatial correspondence between two or more images
o E.g. given that one image shows the heart, where is the heart located in the other image?
- Mainly used to find geometrical, anatomical and functional alignment, e.g. for
o Disease monitoring, motion analysis and growth analysis
- Intra-patient image registration: images of the same patient, e.g. to quantify change
- Inter-patient image registration: images of different patients, e.g. to find structures present in both patients
- Image fusion: combine images from multiple sources into one image
o E.g. combining CT and PET scans into one image
- Registration pipeline:
o Center alignment
o Translation
o Affine registration
o Deformable registration
▪ Aligning images into a common coordinate system
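The first stages of the pipeline (center alignment, translation, affine) are all linear maps that can be written as homogeneous matrices and composed. The sketch below is a simplified 2D illustration on point sets rather than images; all function names and the example coordinates are assumptions.

```python
import numpy as np

def translation(tx, ty):
    """Homogeneous 2D translation matrix."""
    return np.array([[1.0, 0.0, tx],
                     [0.0, 1.0, ty],
                     [0.0, 0.0, 1.0]])

def affine(rotation_deg=0.0, scale=(1.0, 1.0), shear=0.0):
    """Homogeneous 2D affine matrix (rotation, shear, scaling) without translation."""
    theta = np.deg2rad(rotation_deg)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    shear_m = np.array([[1.0, shear], [0.0, 1.0]])
    matrix = np.eye(3)
    matrix[:2, :2] = rot @ shear_m @ np.diag(scale)
    return matrix

def apply_transform(transform, points):
    """Apply a 3x3 homogeneous transform to an (N, 2) array of points."""
    homogeneous = np.hstack([points, np.ones((len(points), 1))])
    return (homogeneous @ transform.T)[:, :2]

def centre_alignment(fixed_points, moving_points):
    """Translation that moves the centroid of the moving points onto the fixed centroid."""
    shift = fixed_points.mean(axis=0) - moving_points.mean(axis=0)
    return translation(shift[0], shift[1])

moving = np.array([[0.0, 0.0], [2.0, 0.0], [2.0, 2.0], [0.0, 2.0]])
fixed = moving + np.array([10.0, 10.0])

aligned = apply_transform(centre_alignment(fixed, moving), moving)

# Transforms compose by matrix multiplication: rotate by 90 degrees, then translate.
composed = translation(10.0, 10.0) @ affine(rotation_deg=90.0)
moved_point = apply_transform(composed, np.array([[1.0, 0.0]]))   # approximately [[10, 11]]
```

Deformable registration does not fit this matrix form, since it assigns a separate displacement to every location.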
Medical Image Representations
- Continuous: using functions
o $F, M: \mathbb{R}^d \to \mathbb{R}$
- Discrete: using d-dimensional matrix
- Meta information (in world matrix)
o Pixel/voxel spacing
o Image origin
o Image direction/orientation
- Transformations can also be non-parametric
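The meta information links the discrete matrix to physical space. A common convention (assumed here, as used by toolkits such as ITK) maps a voxel index to world coordinates as origin + direction · (spacing ⊙ index). The function names and the example values below are hypothetical.

```python
import numpy as np

def index_to_world(index, spacing, origin, direction):
    """Map a voxel index to physical (world) coordinates."""
    index = np.asarray(index, dtype=float)
    return np.asarray(origin) + np.asarray(direction) @ (np.asarray(spacing) * index)

def world_to_index(point, spacing, origin, direction):
    """Inverse mapping: physical coordinates back to a (continuous) voxel index."""
    offset = np.asarray(point, dtype=float) - np.asarray(origin)
    return np.linalg.solve(np.asarray(direction), offset) / np.asarray(spacing)

spacing = np.array([0.5, 0.5, 2.0])          # mm per voxel along each axis (assumed)
origin = np.array([-100.0, -100.0, 50.0])    # world position of voxel (0, 0, 0) (assumed)
direction = np.eye(3)                        # axis-aligned orientation (assumed)

world = index_to_world([10, 20, 3], spacing, origin, direction)
# world == [-95, -90, 56]
index = world_to_index(world, spacing, origin, direction)
# index == [10, 20, 3]
```

Two images with different spacing or origin can show the same anatomy at different matrix indices, which is why registration works in world coordinates.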
Objective Function
- To measure whether a deformation is reasonable, we need an objective function
- How do we compute similarity between images?
o Mono-modal image similarity:
▪ Sum of Squared Differences (SSD): how different the intensities are at identical locations in the two images. Assumes identity relationship between intensities
▪ Normalised Cross Correlation (NCC): assumes linear relationship between intensities
o Multi-modal image similarity:
▪ Normalised Gradient Fields (NGF): assumes intensity changes occur at the same locations
▪ Mutual Information (MI): describes how well one image is explained by the other image
▪ Modality Independent Neighbourhood Descriptor (MIND): exploits self-similarity
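Three of the similarity measures above can be sketched in a few lines of NumPy. These are global, unoptimised versions for illustration; the function names, the histogram-based MI estimate and the bin count are assumptions, not the lecture's definitions.

```python
import numpy as np

def ssd(fixed, moving):
    """Sum of squared differences: 0 only if intensities are identical."""
    return float(np.sum((fixed - moving) ** 2))

def ncc(fixed, moving):
    """Normalised cross correlation: +1 or -1 for a perfect linear relationship."""
    f = fixed - fixed.mean()
    m = moving - moving.mean()
    return float(np.sum(f * m) / np.sqrt(np.sum(f ** 2) * np.sum(m ** 2)))

def mutual_information(fixed, moving, bins=16):
    """Histogram estimate of mutual information (in nats)."""
    joint, _, _ = np.histogram2d(fixed.ravel(), moving.ravel(), bins=bins)
    p_joint = joint / joint.sum()
    p_fixed = p_joint.sum(axis=1, keepdims=True)     # marginal of the fixed image
    p_moving = p_joint.sum(axis=0, keepdims=True)    # marginal of the moving image
    independent = p_fixed @ p_moving
    nonzero = p_joint > 0
    return float(np.sum(p_joint[nonzero] * np.log(p_joint[nonzero] / independent[nonzero])))

rng = np.random.default_rng(0)
image = rng.normal(size=(64, 64))
rescaled = 2.0 * image + 5.0          # linear intensity change
inverted = -image                     # contrast inversion, as between some modalities
unrelated = rng.normal(size=(64, 64))

ssd_same = ssd(image, image)              # 0.0
ncc_rescaled = ncc(image, rescaled)       # 1.0 up to rounding, although SSD is large
mi_inverted = mutual_information(image, inverted)
mi_unrelated = mutual_information(image, unrelated)
```

SSD penalises the rescaled image although it is perfectly aligned, NCC does not, and MI stays high even for the inverted image while dropping for the unrelated one.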
Evaluation of Image Registration
- Very difficult task
o Rarely a point-wise correspondence from one image to another available
- Quantitative evaluation: requires an exact definition of the correct transformation
o Synthetic transformation applied to the images
o Hard to define realistic deformation
o Simplified problems
o Likely to get phantoms
o Auxiliary measures:
▪ Segmentation overlap
▪ Landmark error (error in the position of significant structures in an image)
▪ Quality of deformation field (a large matrix of 3D translations for each voxel in source and target image), e.g. number of foldings and smoothness
- Evaluation must be independent of cost function or registration features
- Highly depends on application
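The auxiliary measures can be sketched as follows, assuming NumPy. Segmentation overlap is computed here as the Dice coefficient (one common choice, not named in the notes), and foldings are counted as locations where the Jacobian determinant of the mapping x + u(x) is not positive. The example uses a 2D displacement field for brevity; function names and array layouts are assumptions.

```python
import numpy as np

def dice_overlap(seg_a, seg_b):
    """Dice coefficient between two binary segmentations."""
    a = np.asarray(seg_a, dtype=bool)
    b = np.asarray(seg_b, dtype=bool)
    total = a.sum() + b.sum()
    return 1.0 if total == 0 else float(2.0 * np.logical_and(a, b).sum() / total)

def landmark_error(fixed_landmarks, warped_landmarks):
    """Mean Euclidean distance between corresponding landmarks, shape (N, d)."""
    diff = np.asarray(fixed_landmarks, dtype=float) - np.asarray(warped_landmarks, dtype=float)
    return float(np.mean(np.linalg.norm(diff, axis=1)))

def count_foldings(displacement):
    """Number of pixels with non-positive Jacobian determinant.

    displacement: array of shape (2, H, W); displacement[0] is the shift
    along rows (y), displacement[1] the shift along columns (x).
    """
    duy_dy, duy_dx = np.gradient(displacement[0])
    dux_dy, dux_dx = np.gradient(displacement[1])
    jac_det = (1.0 + duy_dy) * (1.0 + dux_dx) - duy_dx * dux_dy
    return int(np.sum(jac_det <= 0))

dice = dice_overlap([[1, 1], [0, 0]], [[1, 0], [0, 0]])          # 2 * 1 / (2 + 1)
error = landmark_error([[0.0, 0.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]])   # 2.5

identity_field = np.zeros((2, 8, 8))
foldings_identity = count_foldings(identity_field)               # 0

# A field that mirrors the x axis (x -> -x) folds everywhere.
mirror_field = np.zeros((2, 8, 8))
mirror_field[1] = -2.0 * np.arange(8, dtype=float)[None, :]
foldings_mirror = count_foldings(mirror_field)                   # 64
```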
Learning-Based Image Registration
- Conventional: iterative optimisation for each image pair -> time consuming
- Learning based: train a neural network once, so that its learned parameters replace the per-pair optimisation
- But often no point-wise correspondence available between registration network output and ground truth
o Medical experts cannot annotate a reference deformation field
- How to supervise a registration network:
o Supervised methods: use ground-truth deformation field for training
o Self-supervised/unsupervised methods: use the cost function of conventional image registration (similarity measure + regulariser) as loss function
o Weakly-supervised methods: are supervised with prior information (e.g. segmentation masks)
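For the unsupervised case, the loss has the same shape as the conventional cost function: similarity measure plus a weighted regulariser. The NumPy sketch below only evaluates the loss value; a real training setup would compute it in an automatic-differentiation framework and obtain the warped image from the network's predicted field. The choice of mean squared error as similarity, the gradient-based smoothness term, the weight alpha and all names are assumptions.

```python
import numpy as np

def smoothness_regulariser(displacement):
    """Mean squared spatial gradient of a displacement field of shape (2, H, W)."""
    penalty = 0.0
    for component in displacement:
        for grad in np.gradient(component):
            penalty += np.mean(grad ** 2)
    return float(penalty)

def unsupervised_loss(fixed, warped_moving, displacement, alpha=0.1):
    """Similarity (mean squared error) + alpha * smoothness of the deformation."""
    similarity = float(np.mean((fixed - warped_moving) ** 2))
    return similarity + alpha * smoothness_regulariser(displacement)

rng = np.random.default_rng(0)
fixed = rng.normal(size=(16, 16))
zero_field = np.zeros((2, 16, 16))
rough_field = rng.normal(size=(2, 16, 16))

loss_perfect = unsupervised_loss(fixed, fixed, zero_field)      # 0.0
loss_rough = unsupervised_loss(fixed, fixed, rough_field)       # > 0: penalised although images match
loss_mismatch = unsupervised_loss(fixed, fixed + 1.0, zero_field)   # 1.0: pure similarity term
```

No ground-truth deformation field appears anywhere in the loss, which is what makes this form of supervision usable when experts cannot annotate one.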
Multi-Level Registration Networks
- Avoid local minima
- Speed up computations
- Avoid foldings
- Loss function:
o Add prior knowledge into training by designing specific loss functions
o Time consuming annotations are only required on training data
o Can also generate labels automatically
- HyperMorph strategy: way to improve the learning of hyper-parameters
o Trains a single model instead of iteratively retraining to improve on previous optimal values