Lecture 6 Flashcards

1
Q

GoogLeNet

A
  • New layer architecture: the Inception module
  • Outputs of the parallel branches are concatenated
  • Problems
    o Concatenation makes the resulting feature matrix very big
    o Max-pooling does not change the number of feature maps, only their spatial size
  • Thus, we should reduce the number of feature maps at some locations in the architecture
  • Solution
    o Use additional 1x1 convolutions -> does not change the feature map size (see the sketch after this list)
    ▪ A 1x1xM convolutional filter applied to an NxNxM feature map yields an NxN feature map
    ▪ K such 1x1xM filters applied to an NxNxM feature map yield an NxNxK feature map
    o Can tune the number of output feature maps -> dimension reduction
    o Combines all activations across feature maps into a smaller set
  • Also called the Inception module -> often repeated multiple times in a bigger network
  • Fully connected layers are replaced by average pooling
  • Uses two additional (auxiliary) classifiers during training
    o To counteract that a very deep network may not propagate gradients back through all layers effectively
    o Encourages discrimination in the lower stages
    o Increases the gradient signal that gets propagated back
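
A minimal PyTorch sketch of 1x1 convolutions for dimension reduction (the sizes N=28, M=256, K=64 are arbitrary assumptions for illustration):

```python
import torch
import torch.nn as nn

# Feature map of size NxNxM (PyTorch layout: batch x channels x height x width)
N, M, K = 28, 256, 64
x = torch.randn(1, M, N, N)

# K filters of shape 1x1xM: reduces the depth from M to K,
# while leaving the spatial size NxN unchanged
reduce = nn.Conv2d(in_channels=M, out_channels=K, kernel_size=1)

print(reduce(x).shape)  # torch.Size([1, 64, 28, 28]) -> an NxNxK feature map
```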
2
Q

ResNet

A
  • Deeper networks did not necessarily achieve a lower training error
    o Gradients no longer flow back to the input signal
    ▪ Use skip connections
    o As we go deeper, representations become difficult to learn
    ▪ Learn the residual instead
  • A skip connection skips one activation function; the skipped input is added back in before the next one (see the sketch after this list)
  • In DenseNet, skip connections go to all subsequent activation functions
    o Strong gradient flow
    o Computationally efficient -> controlled by a constant K channels per layer
    o Low-complexity features -> the final classifier sees features from all layers
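
A minimal PyTorch sketch of a residual block with a skip connection (simplified: fixed channel count, no batch normalisation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """Computes ReLU(F(x) + x), where F is the learned residual."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        residual = self.conv2(F.relu(self.conv1(x)))  # F(x): the residual
        return F.relu(residual + x)                   # skip connection adds x back

x = torch.randn(1, 64, 32, 32)
print(ResidualBlock(64)(x).shape)  # torch.Size([1, 64, 32, 32])
```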
3
Q

Using pre-trained Networks

A
  • Common networks (VGG16, Inception, ResNet) are trained on ImageNet and readily available
  • Helps when we have limited data
  • Very small dataset: feature extractor
    o Only train the final layer, freeze all other layers (see the sketch after this list)
  • Medium dataset: fine-tuning
    o Train the last few layers, freeze all other layers
  • Much faster, because low-level features are often similar between tasks
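
A minimal sketch of the feature-extractor setup in PyTorch/torchvision (assuming a recent torchvision; the 10-class output layer is an arbitrary assumption for illustration):

```python
import torch.nn as nn
from torchvision import models

# Load a ResNet-18 pre-trained on ImageNet
model = models.resnet18(weights="IMAGENET1K_V1")

# Very small dataset: freeze everything, use the network as a feature extractor
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer; only its (new, unfrozen) weights will be trained
model.fc = nn.Linear(model.fc.in_features, 10)  # assumed: 10 target classes
```

For the medium-dataset case, one would additionally unfreeze the last few layers instead of training only the new output layer.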
4
Q

Detection Tasks

A
  • Find certain structures in an image
    o E.g. can we find any nodules within this scan?
  • Possible approaches:
    o Patch classification (is the patch centred on the structure?)
    o Segmentation (U-Net, dilated networks)
    o Predict the location

5
Q

Sampling Strategies

A
  • Often most of the image area is easy to recognise
  • There are few target objects
  • Difficult negatives are hard to identify upfront
  • Thus, sampling on a uniform grid or random sampling is unlikely to find target objects or difficult negatives
  • Can train a first CNN to identify possible matches using random sampling
  • Then, use a second CNN to handle the (hard) negative samples (hard-negative mining; see the sketch after this list)
    o Negative samples with a high likelihood are sampled more often when training CNN-2
    o Can use a deep network to identify difficult negatives (often healthy cells that look a lot like unhealthy cells)
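
A minimal numpy sketch of the resampling idea behind hard-negative mining: negatives that the first CNN scores highly (its mistakes) are sampled more often when building the training set for CNN-2. The scores array here is a stand-in for CNN-1 outputs:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for CNN-1 scores on 10000 negative patches (high = looks positive)
scores = rng.random(10_000)

# Sample negatives for CNN-2 with probability proportional to CNN-1's score,
# so hard negatives are over-represented in the new training set
probs = scores / scores.sum()
hard_negatives = rng.choice(len(scores), size=1_000, replace=False, p=probs)
```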
6
Q

Intersection over Union (IoU)

A
  • Measure of object detection quality when using bounding boxes
    o A bounding box is a box drawn around an object of interest
  • IoU measures how large the intersection is compared to the total combined area (the union) of the ground truth and the detection (see the sketch after this list)
    o A larger score means that the detection covers more of the ground truth
  • Often a threshold (e.g. 0.5) is used as the hit criterion
    o I.e. the intersection must cover at least half of the combined area of the detected and ground-truth boxes
    o Also depends on the application; for higher accuracy, the threshold is often higher
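
A minimal Python sketch of the IoU computation for two axis-aligned boxes given as (x1, y1, x2, y2):

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Intersection rectangle (empty if the boxes do not overlap)
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # 50 / 150 ≈ 0.33
```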
7
Q

Region Proposal Networks

A
  • Using ConvNets for detection tasks
  • For example, the Regions with CNN features (R-CNN) network
  • Region proposal is based on selective search
    o Proposes around 2000 bounding boxes per image
    o Very slow
  • Each region is resized and processed with AlexNet
    o Regions may have different shapes, but need to match the input size of AlexNet exactly
    o Pre-trained on ImageNet, which does not have bounding boxes
    o Extract a vector of 4096 features from the last fully connected layer of AlexNet
    ▪ We remove the output layer from AlexNet
  • We train a linear Support Vector Machine (SVM) per class
    o Positive examples: bounding boxes with IoU > 0.5 with the ground truth
    o Negative examples: all other bounding boxes
    o Very slow: one SVM per class
  • R-CNN performs multiple forward passes
    o Fast R-CNN performs one single forward pass
    ▪ Uses VGG16 to process the whole image
    ▪ Obtains feature maps from the convolution + pooling layers
    ▪ Proposed bounding boxes are applied to the feature map
    ▪ The region of interest is cropped and downscaled (to a fixed size) via an RoI pooling layer
    ▪ The rest of the network processes the RoI:
      • Softmax predicts the class
      • A bbox regressor refines the bounding box
    ▪ Does not need an additional SVM
    ▪ But still uses region proposals -> slow, as most computation is in the region proposal
  • Faster R-CNN: another level of improvement by using a single network
    o Tasks performed by the network:
    ▪ Region proposal
    ▪ Classification
    ▪ Bounding box refinement
    o The new part is the region proposal network (RPN)
    ▪ Input: a feature map
      • Each feature map location is processed by a small convolutional layer, producing a 256-dimensional feature vector
      • The vector is processed further to produce an object bounding box and a score
    ▪ Output: a set of rectangular objects, each with an objectness score
      • The likelihood of containing an object vs. background
      • Anchor boxes: boxes of different sizes -> objects can appear at different sizes; the number of boxes is typically a hyper-parameter (see the sketch after this list)
    o For example, on the lecture slide, smaller object detections appear within the boxes that detect the lungs
    o The bottom part is a fully-convolutional network
    o The top part is Fast R-CNN
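
A minimal numpy sketch of anchor-box generation: at every feature-map location, boxes of several scales and aspect ratios are centred on the corresponding image position. The scales, ratios, and stride below are arbitrary assumptions:

```python
import numpy as np

def make_anchors(feat_h, feat_w, stride,
                 scales=(64, 128, 256), ratios=(0.5, 1.0, 2.0)):
    """Generate (x1, y1, x2, y2) anchor boxes for every feature-map location."""
    anchors = []
    for i in range(feat_h):
        for j in range(feat_w):
            # Centre of this location in image coordinates
            cx, cy = (j + 0.5) * stride, (i + 0.5) * stride
            for s in scales:
                for r in ratios:
                    # Box with area ~s^2 and aspect ratio w/h = r
                    w, h = s * np.sqrt(r), s / np.sqrt(r)
                    anchors.append((cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2))
    return np.array(anchors)

print(make_anchors(7, 7, stride=32).shape)  # (441, 4): 7*7 locations x 9 anchors
```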
8
Q

YOLO: You Only Look Once

A
  • A single network does:
    o Bounding box prediction
    o Class prediction for each bounding box
  • The image is divided into an SxS grid
    o Each cell predicts B bounding boxes and a confidence score for those boxes
    ▪ Bounding boxes = anchor boxes
    o If the centre of an object falls into a grid cell, that grid cell is responsible for detecting the object
  • Each bounding box consists of 5 predictions
    o (x, y): centre of the bounding box relative to the bounds of the grid cell
    o (w, h): predicted width and height relative to the whole image (0 <= w, h <= 1)
    o Confidence: IoU between the predicted box and any ground-truth box -> not per class, but background vs. foreground
    ▪ The probability of having an object in a cell multiplied by the IoU of the bounding box
    ▪ No object: the confidence should be 0
    ▪ If there is an object: the confidence should equal the IoU with the ground truth
  • Classification:
    o Each cell also predicts C conditional class probabilities
    ▪ Conditioned on the grid cell containing an object
    ▪ In practice, a vector of C values is produced for each grid cell
  • The output is a tensor with several components:
    o SxS: grid cells
    o B: bounding boxes per cell
    o x, y, w, h, confidence: per bounding box
    o C class probabilities per cell
    o The shape of the output tensor is S x S x (B*5 + C)
  • 24-layer convolutional network, pre-trained on ImageNet
  • Uses a custom loss function (written out after this list)
    o Minimises a sum-squared error
    o Sums over all bounding boxes B and all grid cells SxS
    o First term: predict the centre of the bounding box with low error
    o Second term: predict the size of the bounding box with low error
    ▪ The square root makes equal absolute size errors count more for small boxes than for large ones
    o Third term: the confidence should be high when there is an object in the ground truth
    ▪ But there are also many grid cells without objects, where the confidence should be very small
    o Fourth term: accounts for grid cells without objects and avoids instability during gradient descent
    ▪ Adds the coefficients λcoord and λnoobj (λcoord weights the two coordinate terms, λnoobj the no-object term)
    o Fifth term: the classification error should be small
  • Limitations:
    o Imposes strong spatial constraints on the bounding boxes
    ▪ Limits the number of nearby objects the model can predict
    o Struggles with small objects that appear in groups
  • Multiple (anchor) boxes can be found for the same object
    o Each has its own probability of containing an object
    o Typically a non-max suppression algorithm is used (see the sketch after the loss function below)
    ▪ Ensures we end up with a single bounding box per object
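
The loss described above, as given in the original YOLO paper (Redmon et al., 2016), where $\mathbb{1}_{ij}^{\text{obj}}$ indicates that box predictor $j$ in cell $i$ is responsible for an object:

$$
\begin{aligned}
\mathcal{L} ={} & \lambda_{\text{coord}} \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
 & + \lambda_{\text{coord}} \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
 & + \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left(C_i - \hat{C}_i\right)^2
   + \lambda_{\text{noobj}} \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{noobj}} \left(C_i - \hat{C}_i\right)^2 \\
 & + \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}} \sum_{c \in \text{classes}} \left(p_i(c) - \hat{p}_i(c)\right)^2
\end{aligned}
$$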
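And a minimal Python sketch of greedy non-max suppression; it assumes the iou() helper from the IoU card above, and the 0.5 overlap threshold is an arbitrary choice:

```python
def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: repeatedly keep the highest-scoring box and
    discard the remaining boxes that overlap it by more than the threshold."""
    # assumes iou(box_a, box_b) as defined in the IoU card above
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))  # [0, 2]: one box per object
```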