Lecture 3 Flashcards
(48 cards)
What is invariance in machine learning?
When a transformation is applied to a data point that does not change the target output, the network's output should also remain unchanged.
Invariance in machine learning is a property of a model that ensures its output remains the same when the input is transformed in a way that doesn’t change the task the model was designed for.
In what contexts is it important to consider invariance?
- Image recognition (classification: does the image contain a red square? It shouldn't matter where in the image the square is)
- Text recognition (scaling / enlarging the text shouldn't change the answer)
- Speech (invariance under speed: talking at a different rate shouldn't change the meaning of the sentence)
- Computational chemistry (asking whether a drug can be used to treat a disease should give the same answer regardless of the molecule's position in space)
What types of invariance are there?
- Translational
- Rotational
- Reflection
- Scaling
- Swapping (permutation)
What ways can we ensure that a model has the necessary invariances?
- Augment the training set with transformed examples [not clever, but easy]
- Extract features that are also invariant
- Build the invariance into the network structure (this leads to CNNs) [clever, but hard]
Give an example of augmenting the training set with transformed examples.
Rotating/tilting images
For invariance, if we apply some transformation T to the inputs xi, what happens to the output ti?
The output ti is unchanged.
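In symbols (writing the network as a function f): if ti = f(xi), then invariance under the transformation T means f(T(xi)) = f(xi) = ti.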
How can the network learn this invariance?
By augmenting the dataset: D -> D'.
The augmented dataset D' contains the original input-output pairs plus the transformed inputs paired with the same (unchanged) outputs. In this way we have doubled the size of the dataset, giving more data to train the model on.
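A minimal sketch of this augmentation step (assuming numpy arrays of images and invariance under horizontal flips; the function name and shapes are illustrative, not from the lecture):

```python
import numpy as np

def augment_with_flips(X, t):
    """Build the augmented dataset D' from D = (X, t).

    X: images with shape (N, H, W); t: targets with shape (N,).
    Each image is added again flipped left-right; the targets are copied
    unchanged, because the task is assumed invariant under the flip.
    """
    X_flipped = X[:, :, ::-1]                       # transformed inputs T(x_i)
    X_aug = np.concatenate([X, X_flipped], axis=0)  # D' now has 2N inputs
    t_aug = np.concatenate([t, t], axis=0)          # same outputs t_i
    return X_aug, t_aug
```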
What are the advantages and disadvantages of augmenting the dataset?
Advantages
- Very straightforward to implement
- Can produce a score ???
Disadvantages
- Makes the training set much bigger
- The invariance is only learned approximately (unless you have infinite data)
How can we enforce invariance exactly?
By making the features that are input into the network invariant to the transformation.
This is harder, but more robust (it guarantees the symmetry exactly).
What is an example of invariant features that can be used?
We explored the example of the energy of two atoms. Each atom has a position in space represented by three coordinates, and the energy of the pair is the output.
We could naively use the coordinates as features. However, these are not invariant to the relevant operations: the energy is invariant to translation, rotation and permutation of the atoms (i.e. swapping atom 1 for atom 2).
We find that the energy can only depend on the norm of the relative vector between the two atoms (the interatomic distance). This also reduces the number of features from 6 to 1. A single feature now contains all of the information we need and is invariant to all of the required transformations.
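A small sketch of this feature construction (numpy for illustration; the numerical values are made up just to check the invariances):

```python
import numpy as np

def invariant_feature(r1, r2):
    """Single invariant feature for a pair of atoms: the interatomic distance.

    r1, r2: 3D position vectors. The distance |r1 - r2| is unchanged by
    translating both atoms together, rotating them together, or swapping them,
    so the 6 raw coordinates reduce to 1 invariant feature.
    """
    return np.linalg.norm(r1 - r2)

# Quick checks of the invariances (illustrative values):
r1, r2 = np.array([0.0, 0.0, 0.0]), np.array([1.0, 2.0, 2.0])
shift = np.array([5.0, -3.0, 1.0])
assert np.isclose(invariant_feature(r1, r2), 3.0)                  # |(1, 2, 2)| = 3
assert np.isclose(invariant_feature(r1 + shift, r2 + shift), 3.0)  # translation
assert np.isclose(invariant_feature(r2, r1), 3.0)                  # permutation
```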
What does symmetry refer to?
In machine learning, “symmetry” refers to a transformation, such as rotation, translation, or reflection, under which an object remains unchanged. Models can learn more efficiently by leveraging these symmetries in the data.
Describe the difference between symmetry and invariance.
In machine learning, “symmetry” refers to a property of the data itself: certain transformations can be applied without changing its underlying structure. “Invariance” describes a model’s ability to produce the same output regardless of those transformations being applied to the input data. Essentially, symmetry is a characteristic of the data, while invariance is a desired property of a machine learning model that leverages that symmetry.
What kind of neural network architecture can we build that gives invariant predictions?
Convolutional neural networks (CNNs) account for translational invariance in the inputs.
In a deep NN, the outputs from one layer can be thought of as the features input into the next layer - they have this modularity.
How are images processed in the brain?
Hierarchically
We start off by collecting low-level information on small scales and build our way up, passing information up the hierarchy. It is transformed into higher-level (and larger-scale) information.
This is similar to a deep neural network.
What is the problem when using a fully-connected deep neural network for processing images?
It would require a lot of weights.
E.g. camera pictures can have 2M pixels, each with a corresponding RGB value, giving 6M input values. With 2000 nodes in the first hidden layer, there would be around 12 billion weights (6M x 2000) in that layer alone.
High-quality image recognition with a fully-connected deep neural network is therefore unfeasible. We can help matters by noting that these networks do not take spatial correlation into account.
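A quick back-of-the-envelope check of these counts (the convolutional-kernel numbers in the comparison are illustrative assumptions, not from the lecture):

```python
inputs = 2_000_000 * 3   # 2M pixels, each with an RGB value: 6M input values
hidden = 2_000           # nodes in the first hidden layer
print(inputs * hidden)   # 12_000_000_000 weights in the first layer alone

# For comparison, a convolutional layer reuses one small kernel across the image
# (64 filters of size 5x5 over 3 colour channels are illustrative numbers):
print(64 * 5 * 5 * 3)    # 4_800 weights, independent of the image size
```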
Why should spatial correlation be considered in image processing?
Parts of an image that are closer together are more likely to be similar to each other than those that are far apart. We want to combine information from input features that are correlated with each other spatially.
What do convolutional neural networks (CNNs) do and how do they achieve this?
CNNs mimic the processing of images in the human brain.
They consist of a feature-learning stage and a classification stage. Feature learning involves two new kinds of layers: convolutional and pooling layers. From these we get a much smaller set of features representing the image, and these are fed into standard fully connected layers for classification.
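A minimal sketch of this structure (assuming PyTorch and a 28x28 grayscale input; the layer sizes are illustrative, not from the lecture):

```python
import torch
import torch.nn as nn

# Feature learning (convolution + pooling, repeated), then classification
# with standard fully connected layers, as described above.
model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),  # convolutional layer: scan for visual elements
    nn.ReLU(),
    nn.MaxPool2d(2),                             # pooling layer: combine nearby pixels -> smaller image
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),                                # much smaller set of features representing the image
    nn.Linear(32 * 7 * 7, 64),                   # fully connected classification layers
    nn.ReLU(),
    nn.Linear(64, 10),
)

x = torch.randn(1, 1, 28, 28)  # one dummy 28x28 grayscale image
print(model(x).shape)          # torch.Size([1, 10])
```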
What are the two different layers of a CNN and what do they do?
- Convolutional layers - search for visual elements in groups of spatially correlated pixels
- Pooling layers - combine information from nearby pixels to create a smaller image
They usually appear in this order.
What does a convolutional unit / kernel do?
It looks at small groups of input features (pixels). It has a receptive field, which refers to the specific area of the input image that a single neuron in a convolutional layer can “see” and use to calculate its output.
The kernel scans over the whole image, combining the inputs in its receptive field at each position.
What is the output of a convolutional unit?
A smaller image
What is the size of a convolutional unit?
(n x m)
The number of input units it is connected to, i.e. the size of its receptive field.
What is the stride s?
The step size: how many pixels the convolutional unit moves between successive positions as it scans over the image.
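A small numpy sketch of one (n x m) kernel W scanning an image with stride s (illustrative only; no deep-learning library assumed):

```python
import numpy as np

def convolve2d(image, W, s=1):
    """Slide an (n x m) kernel W over a 2D image with stride s.

    At each position the kernel combines the pixels in its receptive field
    (here by a weighted sum), producing a smaller output image.
    """
    H, Wd = image.shape
    n, m = W.shape
    out = np.zeros(((H - n) // s + 1, (Wd - m) // s + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i * s:i * s + n, j * s:j * s + m]  # receptive field
            out[i, j] = np.sum(patch * W)
    return out

image = np.random.rand(8, 8)
W = np.ones((3, 3)) / 9.0                 # a simple averaging kernel
print(convolve2d(image, W, s=2).shape)    # (3, 3): a smaller output image
```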
How many convolutional units in a layer can you have?
Many, each one scans for “something different” and produces a separate output image.
What letter is used to represent the convolution kernel?
W - weights of the convolutional kernel