Unit 3: Convolutional Neural Networks Flashcards

1
Q

Cross-correlation process

A

The centre of the kernel is placed at each location of the image.

Each value in the kernel is then multiplied by the image value beneath it, and the resulting products are summed; this sum is inserted into a new image at the corresponding position.
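A minimal numpy sketch of the procedure (unpadded, so the output is smaller than the input; the function name is illustrative):

```python
import numpy as np

def cross_correlate(image, kernel):
    """Slide the kernel over the image; at each position, multiply each
    kernel value by the image value beneath it and sum the products."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.ones((3, 3)) / 9.0            # 3x3 mean filter
print(cross_correlate(image, kernel))     # 2x2 image of local means
```

Note the output is 2x2, smaller than the 4x4 input: the boundary effect that padding addresses.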

2
Q

Padding

A

Because the kernel can’t be centered on the boundary rows and columns of the image, there are fewer rows and columns in the output than the input.

To make the output image the same size as the input, it is common practice to expand the original image with additional rows and columns to the left and right and on the top and bottom.

This means that the kernel can be centered at all pixel locations of the original image.

The additional rows and columns are normally filled with zeros.
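For example, with numpy's `np.pad` (zero padding of width 1, as needed for a 3x3 kernel):

```python
import numpy as np

image = np.arange(9, dtype=float).reshape(3, 3)
# A (2k+1)x(2k+1) kernel needs k extra rows/columns of zeros on each
# side so it can be centered at every original pixel (k = 1 for 3x3).
padded = np.pad(image, pad_width=1, mode="constant", constant_values=0)
print(padded.shape)   # (5, 5); a 3x3 kernel now yields a 3x3 output
```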

3
Q

Stride

A

A variation of the procedure where the kernel is moved one or more steps at a time across the columns and down the rows.

The step size is called the stride.
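A sketch of strided cross-correlation, assuming the same unpadded setup as in the cross-correlation card:

```python
import numpy as np

def cross_correlate_strided(image, kernel, stride=1):
    """Cross-correlation where the kernel jumps `stride` pixels at a time
    across the columns and down the rows."""
    kh, kw = kernel.shape
    oh = (image.shape[0] - kh) // stride + 1
    ow = (image.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            r, c = i * stride, j * stride
            out[i, j] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
# With stride 2, a 3x3 kernel visits every other position: 2x2 output.
print(cross_correlate_strided(image, np.ones((3, 3)), stride=2).shape)
```

A stride greater than 1 shrinks the output, which is one way CNNs reduce spatial resolution between layers.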

4
Q

Dilation

A

Another variation of the procedure is to interspace the values in the kernel as they are applied to the image values.

We refer to the degree of interspacing as the dilation.
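A sketch of dilated cross-correlation: the kernel taps are sampled `dilation` pixels apart, so a 3x3 kernel with dilation 2 covers a 5x5 footprint without adding any weights.

```python
import numpy as np

def cross_correlate_dilated(image, kernel, dilation=1):
    """Kernel values are interspaced `dilation` pixels apart, enlarging
    the region each output value sees."""
    kh, kw = kernel.shape
    # Effective footprint of the kernel after interspacing its taps:
    eh = dilation * (kh - 1) + 1
    ew = dilation * (kw - 1) + 1
    oh = image.shape[0] - eh + 1
    ow = image.shape[1] - ew + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i:i + eh:dilation, j:j + ew:dilation]
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
# dilation=2: a 3x3 kernel covers a 5x5 footprint -> 1x1 output here.
print(cross_correlate_dilated(image, np.ones((3, 3)), dilation=2))  # [[108.]]
```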

5
Q

Spread of a Gaussian kernel

A

The amount of smoothing, controlled by the value of sigma.
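A sketch of building such a kernel; truncating at a radius of 3*sigma is a common convention, not part of the definition.

```python
import numpy as np

def gaussian_kernel(sigma, radius=None):
    """2-D Gaussian kernel: larger sigma -> wider spread -> more smoothing."""
    if radius is None:
        radius = int(np.ceil(3 * sigma))   # common 3-sigma truncation
    ax = np.arange(-radius, radius + 1)
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return k / k.sum()                     # normalise so the values sum to 1

print(gaussian_kernel(1.0).shape)   # (7, 7) for sigma = 1
```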

6
Q

Convolution

A

If we rotate the kernel by a half turn (180 degrees) before applying cross-correlation, the resulting operation is known as convolution.

The convolution of an image f with a kernel h is written as h * f for short.
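A sketch: rotate the kernel by a half turn (equivalently, flip both axes) and then cross-correlate. The deliberately asymmetric kernel shows that the flip matters; for a symmetric kernel such as a Gaussian, convolution and cross-correlation coincide.

```python
import numpy as np

def convolve(image, kernel):
    """Convolution = cross-correlation with the kernel rotated 180 degrees."""
    flipped = kernel[::-1, ::-1]          # half-turn rotation
    kh, kw = flipped.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * flipped)
    return out

image = np.arange(9, dtype=float).reshape(3, 3)
kernel = np.array([[1.0, 0.0], [0.0, 0.0]])
print(convolve(image, kernel))   # [[4. 5.] [7. 8.]]
```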

7
Q

Convolutional Neural Network

A

A ConvNet applied to images is organised as a series of layers.

Each layer uses convolution to produce a set of feature maps using a different kernel for each feature map.

The feature maps from one layer are passed as input to the next layer.

The first layer receives the image as input.

8
Q

Pooling

A

A pooling layer is used in CNNs to reduce the spatial size of the feature maps and give some invariance to small spatial transformations of the input image, which might arise from small translations or deformations.

The idea is to tile the feature maps with a fixed window and then to aggregate the values in the window at each position into a single scalar value.

The aggregate value provides a summary of the values within the window.

Max-pooling takes the maximum of the values in the window.
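A sketch of non-overlapping max-pooling with a 2x2 window, assuming the feature-map sides divide evenly by the window size:

```python
import numpy as np

def max_pool(fmap, window=2):
    """Tile the feature map with non-overlapping windows and keep the
    maximum value in each window."""
    h, w = fmap.shape
    tiled = fmap.reshape(h // window, window, w // window, window)
    return tiled.max(axis=(1, 3))

fmap = np.array([[1., 2., 5., 6.],
                 [3., 4., 7., 8.],
                 [9., 1., 2., 3.],
                 [1., 1., 4., 0.]])
print(max_pool(fmap))   # halves each spatial dimension
```

Because only the maximum survives, small shifts of a feature within a window leave the output unchanged, which is where the invariance comes from.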

9
Q

Data augmentation (5)

A

One way to avoid over-fitting is to increase the size of the training set.

Data augmentation produces new images by applying random transformations to the images in the training set:
- Simulating changes in camera position and orientation by translating, rotating and scaling the image.
- Simulating changes in scene lighting by scaling intensity values.
- Simulating intraclass variations in shape and appearance by applying small image deformations.
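A minimal sketch of the idea using only numpy: a random horizontal flip, a small translation, and an intensity scaling. Rotation and elastic deformation would need an image library (e.g. scipy.ndimage); all names and parameter ranges here are illustrative.

```python
import numpy as np

def augment(image, rng):
    """Return one randomly transformed copy of an image with values in [0, 1]."""
    out = image.copy()
    if rng.random() < 0.5:                 # random mirror (left-right flip)
        out = np.fliplr(out)
    shift = rng.integers(-2, 3)            # small horizontal translation;
    out = np.roll(out, shift, axis=1)      # np.roll wraps pixels around,
                                           # a crude stand-in for translation
    out = np.clip(out * rng.uniform(0.8, 1.2), 0.0, 1.0)  # lighting change
    return out

rng = np.random.default_rng(0)
image = rng.random((8, 8))
augmented = augment(image, rng)
print(augmented.shape)   # same size as the input, randomly transformed
```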

10
Q

Dropout

A

The idea is to perturb training examples by zeroing input values to a layer with some given probability p.

We are effectively removing features at a chosen layer in the CNN.

Dropout is a form of regularisation that can reduce generalisation error and thereby improve performance on unseen data.
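A sketch of dropout at training time. The 1/(1-p) rescaling of the surviving values ("inverted dropout") is a common convention not stated on the card; it keeps the expected activation unchanged so nothing needs rescaling at test time, when dropout is switched off.

```python
import numpy as np

def dropout(x, p, rng):
    """Zero each value with probability p; scale survivors by 1/(1-p)."""
    mask = rng.random(x.shape) >= p        # True where the value survives
    return x * mask / (1.0 - p)

rng = np.random.default_rng(0)
x = np.ones((4, 4))
print(dropout(x, p=0.5, rng=rng))   # entries are either 0.0 or 2.0
```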

11
Q

Batch normalisation

A

Training can be improved by reducing the variability of the data using batch normalisation.

Here we normalise the values input to a layer by standardising them over each batch of data.
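A sketch of the standardisation step: subtract the batch mean and divide by the batch standard deviation, per feature. Full batch normalisation also includes learnable scale and shift parameters, which are omitted here.

```python
import numpy as np

def batch_normalise(x, eps=1e-5):
    """Standardise each feature over the batch dimension (axis 0).
    eps avoids division by zero for constant features."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mean) / np.sqrt(var + eps)

batch = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]])
out = batch_normalise(batch)
print(out.mean(axis=0))   # ~[0, 0]: zero mean per feature
print(out.std(axis=0))    # ~[1, 1]: unit variance per feature
```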

12
Q

Receptive field

A

In a CNN, the value at each position in a feature map derives from a region of values in the input image known as the receptive field for that feature map position.

Typically the receptive field grows as we move through the layers of the CNN.

13
Q

Grad-CAM

A

A way to find out which parts of an input image contribute most to the selection of a given class label.

14
Q

Semantic segmentation task

A

To label each and every pixel in a given image with the object class to which it belongs, including a catch-all class for the background.

15
Q

Jaccard Index

A

Performance of a semantic segmentation process is normally measured using the Jaccard index, which is a statistic for measuring the similarity between a pair of sets, A and B.

Defined as the size of the intersection over the size of the union of the two sets.

J(A, B) = |A ∩ B| / |A ∪ B|

Also known as Intersection over Union (IoU).
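A sketch for binary masks; per-class IoU for a segmentation would apply this to each class's mask in turn.

```python
import numpy as np

def jaccard_index(pred, target):
    """IoU of two binary masks: |A ∩ B| / |A ∪ B|."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return intersection / union if union > 0 else 1.0

a = np.array([[1, 1, 0], [0, 1, 0]])
b = np.array([[1, 0, 0], [0, 1, 1]])
print(jaccard_index(a, b))   # 2 shared pixels / 4 total pixels = 0.5
```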

16
Q

Encoder-Decoder Architecture

A

One way to produce a semantic segmentation.

The idea is that the encoder produces a compact encoding of the input image within an embedding layer and the decoder expands this encoding into a semantic segmentation.