Unit 3: Convolutional Neural Networks Flashcards

1
Q

Cross-correlation process

A

The centre of the kernel is placed at each location of the image.

Each value in the kernel is then multiplied by the image value beneath it, and the resulting products are summed; this sum is inserted into a new image at the corresponding position.
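A minimal numpy sketch of the procedure (unpadded, so the output is smaller than the input; the function name is illustrative):

```python
import numpy as np

def cross_correlate(image, kernel):
    """Slide the kernel over the image; at each position, multiply each
    kernel value by the image value beneath it and sum the products."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.ones((3, 3)) / 9.0            # 3x3 mean filter
print(cross_correlate(image, kernel))     # 2x2 image of local means
```

Note the output is 2x2, smaller than the 4x4 input: the boundary effect that padding addresses.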

2
Q

Padding

A

Because the kernel can’t be centered on the boundary rows and columns of the image, there are fewer rows and columns in the output than the input.

To make the output image the same size as the input, it is common practice to expand the original image with additional rows and columns to the left and right and on the top and bottom.

This means that the kernel can be centered at all pixel locations of the original image.

The additional rows and columns are normally filled with zeros.
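For example, with numpy's `np.pad` (zero padding of width 1, as needed for a 3x3 kernel):

```python
import numpy as np

image = np.arange(9, dtype=float).reshape(3, 3)
# A (2k+1)x(2k+1) kernel needs k extra rows/columns of zeros on each
# side so it can be centered at every original pixel (k = 1 for 3x3).
padded = np.pad(image, pad_width=1, mode="constant", constant_values=0)
print(padded.shape)   # (5, 5); a 3x3 kernel now yields a 3x3 output
```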

3
Q

Stride

A

A variation of the procedure where the kernel is moved one or more steps at a time across the columns and down the rows.

The step size is called the stride.
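A sketch of strided cross-correlation, assuming the same unpadded setup as in the cross-correlation card:

```python
import numpy as np

def cross_correlate_strided(image, kernel, stride=1):
    """Cross-correlation where the kernel jumps `stride` pixels at a time
    across the columns and down the rows."""
    kh, kw = kernel.shape
    oh = (image.shape[0] - kh) // stride + 1
    ow = (image.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            r, c = i * stride, j * stride
            out[i, j] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
# With stride 2, a 3x3 kernel visits every other position: 2x2 output.
print(cross_correlate_strided(image, np.ones((3, 3)), stride=2).shape)
```

A stride greater than 1 shrinks the output, which is one way CNNs reduce spatial resolution between layers.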

4
Q

Dilation

A

Another variation of the procedure is to interspace the values in the kernel as they are applied to the image values.

We refer to the degree of interspacing as the dilation.
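A sketch of dilated cross-correlation: the kernel taps are sampled `dilation` pixels apart, so a 3x3 kernel with dilation 2 covers a 5x5 footprint without adding any weights.

```python
import numpy as np

def cross_correlate_dilated(image, kernel, dilation=1):
    """Kernel values are interspaced `dilation` pixels apart, enlarging
    the region each output value sees."""
    kh, kw = kernel.shape
    # Effective footprint of the kernel after interspacing its taps:
    eh = dilation * (kh - 1) + 1
    ew = dilation * (kw - 1) + 1
    oh = image.shape[0] - eh + 1
    ow = image.shape[1] - ew + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i:i + eh:dilation, j:j + ew:dilation]
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
# dilation=2: a 3x3 kernel covers a 5x5 footprint -> 1x1 output here.
print(cross_correlate_dilated(image, np.ones((3, 3)), dilation=2))  # [[108.]]
```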

5
Q

Spread of a Gaussian kernel

A

The amount of smoothing, controlled by the value of sigma.
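A sketch of building such a kernel; truncating at a radius of 3*sigma is a common convention, not part of the definition.

```python
import numpy as np

def gaussian_kernel(sigma, radius=None):
    """2-D Gaussian kernel: larger sigma -> wider spread -> more smoothing."""
    if radius is None:
        radius = int(np.ceil(3 * sigma))   # common 3-sigma truncation
    ax = np.arange(-radius, radius + 1)
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return k / k.sum()                     # normalise so the values sum to 1

print(gaussian_kernel(1.0).shape)   # (7, 7) for sigma = 1
```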

6
Q

Convolution

A

If we rotate the kernel by a half turn (180 degrees) before applying cross-correlation, the resulting operation is known as convolution.

The convolution of an image f with a kernel h is written as h * f for short.
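A sketch: rotate the kernel by a half turn (equivalently, flip both axes) and then cross-correlate. The deliberately asymmetric kernel shows that the flip matters; for a symmetric kernel such as a Gaussian, convolution and cross-correlation coincide.

```python
import numpy as np

def convolve(image, kernel):
    """Convolution = cross-correlation with the kernel rotated 180 degrees."""
    flipped = kernel[::-1, ::-1]          # half-turn rotation
    kh, kw = flipped.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * flipped)
    return out

image = np.arange(9, dtype=float).reshape(3, 3)
kernel = np.array([[1.0, 0.0], [0.0, 0.0]])
print(convolve(image, kernel))   # [[4. 5.] [7. 8.]]
```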

7
Q

Convolutional Neural Network

A

A ConvNet applied to images is organised as a series of layers.

Each layer uses convolution to produce a set of feature maps using a different kernel for each feature map.

The feature maps from one layer are passed as input to the next layer.

The first layer receives the image as input.

8
Q

Pooling

A

A pooling layer is used in CNNs to reduce the spatial size of the feature maps and give some invariance to small spatial transformations of the input image, which might arise from small translations or deformations.

The idea is to tile the feature maps with a fixed window and then to aggregate the values in the window at each position into a single scalar value.

The aggregate value provides a summary of the values within the window.

Max-pooling takes the maximum of the values in the window.
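A sketch of non-overlapping max-pooling with a 2x2 window, assuming the feature-map sides divide evenly by the window size:

```python
import numpy as np

def max_pool(fmap, window=2):
    """Tile the feature map with non-overlapping windows and keep the
    maximum value in each window."""
    h, w = fmap.shape
    tiled = fmap.reshape(h // window, window, w // window, window)
    return tiled.max(axis=(1, 3))

fmap = np.array([[1., 2., 5., 6.],
                 [3., 4., 7., 8.],
                 [9., 1., 2., 3.],
                 [1., 1., 4., 0.]])
print(max_pool(fmap))   # halves each spatial dimension
```

Because only the maximum survives, small shifts of a feature within a window leave the output unchanged, which is where the invariance comes from.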

9
Q

Data augmentation (5)

A

One way to avoid over-fitting is to increase the size of the training set.

Data augmentation produces new images by applying random transformations to the images in the training set:
- Simulating changes in camera position and orientation by translating, rotating and scaling the image.
- Simulating changes in scene lighting by scaling intensity values.
- Simulating intraclass variations in shape and appearance by applying small image deformations.
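A minimal sketch of the idea using only numpy: a random horizontal flip, a small translation, and an intensity scaling. Rotation and elastic deformation would need an image library (e.g. scipy.ndimage); all names and parameter ranges here are illustrative.

```python
import numpy as np

def augment(image, rng):
    """Return one randomly transformed copy of an image with values in [0, 1]."""
    out = image.copy()
    if rng.random() < 0.5:                 # random mirror (left-right flip)
        out = np.fliplr(out)
    shift = rng.integers(-2, 3)            # small horizontal translation;
    out = np.roll(out, shift, axis=1)      # np.roll wraps pixels around,
                                           # a crude stand-in for translation
    out = np.clip(out * rng.uniform(0.8, 1.2), 0.0, 1.0)  # lighting change
    return out

rng = np.random.default_rng(0)
image = rng.random((8, 8))
augmented = augment(image, rng)
print(augmented.shape)   # same size as the input, randomly transformed
```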

10
Q

Dropout

A

The idea is to perturb training examples by zeroing input values to a layer with some given probability p.

We are effectively removing features at a chosen layer in the CNN.

Dropout is a form of regularisation that can reduce generalisation error and thereby improve performance on unseen data.
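A sketch of dropout at training time. The 1/(1-p) rescaling of the surviving values ("inverted dropout") is a common convention not stated on the card; it keeps the expected activation unchanged so nothing needs rescaling at test time, when dropout is switched off.

```python
import numpy as np

def dropout(x, p, rng):
    """Zero each value with probability p; scale survivors by 1/(1-p)."""
    mask = rng.random(x.shape) >= p        # True where the value survives
    return x * mask / (1.0 - p)

rng = np.random.default_rng(0)
x = np.ones((4, 4))
print(dropout(x, p=0.5, rng=rng))   # entries are either 0.0 or 2.0
```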

11
Q

Batch normalisation

A

Training can be improved by reducing the variability of the data using batch normalisation.

Here we normalise the values input to a layer by standardising them over each batch of data.
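A sketch of the standardisation step: subtract the batch mean and divide by the batch standard deviation, per feature. Full batch normalisation also includes learnable scale and shift parameters, which are omitted here.

```python
import numpy as np

def batch_normalise(x, eps=1e-5):
    """Standardise each feature over the batch dimension (axis 0).
    eps avoids division by zero for constant features."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mean) / np.sqrt(var + eps)

batch = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]])
out = batch_normalise(batch)
print(out.mean(axis=0))   # ~[0, 0]: zero mean per feature
print(out.std(axis=0))    # ~[1, 1]: unit variance per feature
```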

12
Q

Receptive field

A

In a CNN, the value at each position in a feature map derives from a region of values in the input image known as the receptive field for that feature map position.

Typically the receptive field grows as we move through the layers of the CNN.

13
Q

Grad-CAM

A

A way to find out which parts of an input image contribute most to the selection of a given class label.

14
Q

Semantic segmentation task

A

To label each and every pixel in a given image with the object class to which it belongs, including a catch-all class for the background.

15
Q

Jaccard Index

A

Performance of a semantic segmentation process is normally measured using the Jaccard index, which is a statistic for measuring the similarity between a pair of sets, A and B.

Defined as the size of the intersection over the size of the union of the two sets.

J(A, B) = |A ∩ B| / |A ∪ B|

Also known as Intersection over Union (IoU).
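A sketch for binary masks; per-class IoU for a segmentation would apply this to each class's mask in turn.

```python
import numpy as np

def jaccard_index(pred, target):
    """IoU of two binary masks: |A ∩ B| / |A ∪ B|."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return intersection / union if union > 0 else 1.0

a = np.array([[1, 1, 0], [0, 1, 0]])
b = np.array([[1, 0, 0], [0, 1, 1]])
print(jaccard_index(a, b))   # 2 shared pixels / 4 total pixels = 0.5
```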

16
Q

Encoder-Decoder Architecture

A

One way to produce a semantic segmentation.

The idea is that the encoder produces a compact encoding of the input image within an embedding layer and the decoder expands this encoding into a semantic segmentation.