Week 5 Flashcards

1
Q

What is a deep NN? What is a shallow NN?

A

Deep is > 3 layers (typically many more than 3)

Shallow is <= 3 layers

2
Q

Why question the usefulness of deep NNs? Why should we use them?

A

A shallow NN can approximate any continuous non-linear function arbitrarily well (universal approximation)

Depth helps capture complexity

3
Q

Why is the RNN example unnecessarily confusing?

A

In general, the output of the last hidden layer feeds back into the input of the first hidden layer; in this example there is only one hidden layer, so it feeds back into itself

4
Q

Difficulties with backprop

A

Vanishing and exploding gradients

5
Q

Mitigate vanishing gradients

A

Use activation functions with non-vanishing gradients
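Not from the lecture, but a minimal NumPy sketch comparing the derivatives of sigmoid and ReLU, illustrating why sigmoid gradients vanish for large inputs while ReLU's do not:

```python
import numpy as np

def sigmoid_grad(x):
    # Derivative of the sigmoid: s(x) * (1 - s(x)); tends to 0 for large |x|
    s = 1.0 / (1.0 + np.exp(-x))
    return s * (1.0 - s)

def relu_grad(x):
    # Derivative of ReLU: 1 for x > 0, 0 otherwise; does not shrink for large positive x
    return (x > 0).astype(float)

x = np.array([-10.0, -1.0, 0.5, 1.0, 10.0])
print("sigmoid'(x):", sigmoid_grad(x))  # values near 0 at the extremes
print("relu'(x):   ", relu_grad(x))     # 0 for negative inputs, 1 otherwise
```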

6
Q

Mitigate exploding gradients

A

Batch normalisation

7
Q

Activation functions with non-vanishing derivatives

A

E.g. ReLU, Leaky ReLU, ELU (their derivatives do not vanish for positive inputs)
8
Q

Better ways to initialise weights

A

E.g. Xavier/Glorot or He initialisation, which scale the initial weights according to the number of inputs to each neuron
9
Q

How does backprop struggle with gradients of the cost function?

A

Variation in the magnitude of the gradient may occur between:
- different layers
- different parts of the cost function for a single neuron
- different directions for a multi-dimensional function

10
Q

Momentum in backprop? What is it useful for?

A

Adds a moving average of previous gradients to the current gradient (helps with plateaus and local minima)

Movement = negative of gradient + momentum
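For reference, the update is commonly written as follows (my notation, not necessarily the lecture's): with momentum coefficient $\mu$, learning rate $\eta$ and cost $J$,

$$v_t = \mu\,v_{t-1} - \eta\,\nabla_\theta J(\theta_{t-1}), \qquad \theta_t = \theta_{t-1} + v_t$$

The moving-average term $\mu\,v_{t-1}$ keeps the parameters moving through plateaus and past shallow local minima.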

11
Q

Adaptive learning rate and examples

A

Vary learning rate (for individual parameters) during training:
- increasing learning rate if cost is decreasing
- decreasing learning rate if cost is increasing

E.g. AdaGrad or RMSProp
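For reference, RMSProp's per-parameter update is commonly written as follows (notation assumed, not from the card): with decay rate $\rho$ and gradient $g_t$,

$$s_t = \rho\,s_{t-1} + (1-\rho)\,g_t^2, \qquad \theta_t = \theta_{t-1} - \frac{\eta}{\sqrt{s_t}+\epsilon}\,g_t$$

so each parameter's effective learning rate shrinks when its recent gradients have been large.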

13
Q

Examples of backprop algorithms with adaptive learning rate and momentum

A

ADAM
Nadam
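For reference, Adam combines both ideas; its update is commonly written as (notation assumed):

$$m_t = \beta_1 m_{t-1} + (1-\beta_1)\,g_t, \qquad v_t = \beta_2 v_{t-1} + (1-\beta_2)\,g_t^2$$
$$\hat m_t = \frac{m_t}{1-\beta_1^t}, \qquad \hat v_t = \frac{v_t}{1-\beta_2^t}, \qquad \theta_t = \theta_{t-1} - \frac{\eta\,\hat m_t}{\sqrt{\hat v_t}+\epsilon}$$

where $m_t$ is the momentum term (a moving average of gradients) and $v_t$ drives the adaptive per-parameter learning rate (a moving average of squared gradients).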

14
Q

Motivate batch normalisation

A

Different inputs to one neuron can have very different scales (depending on which neuron each input comes from)

15
Q

Batch normalisation

A

For each mini-batch, normalise each neuron's inputs to zero mean and unit variance over the batch, then rescale and shift them using two learned parameters
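The standard formulation (notation assumed, not from the card): for each value $x_i$ in the mini-batch $B$,

$$\hat x_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, \qquad y_i = \gamma\,\hat x_i + \beta$$

where $\mu_B$ and $\sigma_B^2$ are the mini-batch mean and variance, and $\gamma$, $\beta$ are learned.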
16
Q

Benefit of batch normalisation

A

Keeps the inputs to each layer on a consistent scale, which helps mitigate exploding gradients and makes training faster and more stable
17
Q

Skip connections

A

Connections that skip one or more layers, adding a layer's input directly to the output of a later layer (as in ResNets); they provide a shorter path for gradients during backprop

18
Q

Define CNN and motivate

A

Any NN in which at least one layer has a transfer function implemented using convolution/cross-correlation

Motivated by the desire to recognise patterns with tolerance to their location

19
Q

Weight sharing in CNNs, how?

A

The same mask (set of weights) is applied at every location of the input array, so all the neurons producing one feature map share the same weights

20
Q

Transfer function for CNN

A

Cross-correlation of the input array with the mask (plus a bias), followed by an activation function

21
Q

Output of 1 neuron of CNN layer

A

Array of numbers, sometimes called feature map

22
Q

Cross-correlation formula
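The answer is missing from this card; the standard 2-D cross-correlation of an input $I$ with a mask $K$ is usually written as (notation assumed):

$$(I \star K)(i,j) = \sum_m \sum_n I(i+m,\,j+n)\,K(m,n)$$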

23
Q

Convolution formula (the operation)
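The answer is missing here too; the standard 2-D convolution is the same operation with the mask flipped in both dimensions (notation assumed):

$$(I * K)(i,j) = \sum_m \sum_n I(i-m,\,j-n)\,K(m,n)$$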

24
Q

Mask for CNN

A

A small array of weights (also called a kernel or filter) that is learnt during training and applied at every location of the input array

25
Variations of standard cross correlation
Stride > 1, zero padding of the input, and dilated masks
26
Multiple masks
Each mask produces its own output array (feature map), so the number of output channels equals the number of masks
27
How to reduce number of channels and computational burden of next layer
Use a 1x1 convolution
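A worked example with hypothetical sizes: applying 64 masks of size 1x1x256 to an H x W x 256 input gives an H x W x 64 output; each output value is a weighted sum over the 256 channels at that location, and the layer has only 64 x (256 + 1) = 16,448 parameters (including biases).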
28
Reasons for pooling layers in CNNs
To increase tolerance to:
- location of patterns
- configuration of sub-patterns
30
How pooling layers work
Each region of the input array is replaced with a single calculated value (e.g. the mean for "average pooling", or the maximum for "max pooling")
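A minimal NumPy sketch (my own, not from the lecture) of 2x2 max pooling with stride 2 on a single-channel input:

```python
import numpy as np

def max_pool_2x2(x):
    # x: 2-D input array with even height and width
    h, w = x.shape
    # Group the array into non-overlapping 2x2 blocks, then take the max of each block
    blocks = x.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

x = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool_2x2(x))  # 2x2 output: the maximum of each 2x2 region
```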
31
Pooling typically has
Stride > 1 to decrease spatial size of the layer
32
Different sized pooling regions can be used
The pooling regions are receptive fields (RFs); if the RF is the same size as the input array, the output is a single scalar value ("global pooling")
33
Is the pooling operation learnt?
No, it is user-specified
34
Other names for pooling
Sub-sampling, down-sampling
35
Typically pooling is applied separately to each input array, therefore
The number of output channels is equal to the number of input channels (however, it is possible to perform cross-channel pooling to reduce the resulting number of channels)
36
Final layers of CNNs are typically? Especially when?
Typically fully connected, especially when applied to classification tasks
37
To create the input to the 1st fully connected layer (for CNN) it is necessary to
Flatten the output of the last convolutional or pooling layer
38
3 issues with Deep NNs and why
Deep NNs/CNNs have lots of tunable parameters, therefore:
1) they need lots of training data
2) training is computationally intensive
3) there is a danger of overfitting the training data
39
How to address need for lots of training data for deep NNs/CNNs
- Data augmentation (e.g. applying transformations to images; see the sketch below)
- Transfer learning
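A toy NumPy sketch of two common augmentations, a horizontal flip and a random crop (illustrative only; the crop size and flip probability are arbitrary choices, not the course's pipeline):

```python
import numpy as np

def augment(img, rng, crop=24):
    # img: H x W (x C) image array; returns a randomly flipped and cropped copy
    if rng.random() < 0.5:
        img = img[:, ::-1]            # horizontal flip
    h, w = img.shape[:2]
    top = rng.integers(0, h - crop + 1)
    left = rng.integers(0, w - crop + 1)
    return img[top:top + crop, left:left + crop]

rng = np.random.default_rng(0)
img = rng.random((32, 32, 3))         # a dummy 32x32 RGB image
print(augment(img, rng).shape)        # (24, 24, 3)
```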
40
How to address overfitting of CNNs/Deep NNs
Regularization
41
Dropout regularization
During usage, all neurons behave normally. Dropout forces random sub-networks to correctly classify each sample, preventing the recognition of individual samples from being reliant on individual neurons
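A minimal sketch, assuming the common "inverted dropout" formulation (which may differ from the lecture's), of applying dropout to a layer's activations:

```python
import numpy as np

def dropout(activations, p_drop, rng, training=True):
    # During training: zero each activation with probability p_drop and rescale
    # the survivors by 1/(1 - p_drop) so the expected value is unchanged.
    # During usage (inference): all neurons behave normally.
    if not training:
        return activations
    mask = rng.random(activations.shape) >= p_drop
    return activations * mask / (1.0 - p_drop)

rng = np.random.default_rng(0)
a = np.ones(8)
print(dropout(a, p_drop=0.5, rng=rng))  # roughly half the values zeroed, rest scaled to 2.0
```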
42
Even if all precautions are taken, what issue can still occur when training deep NNs/CNNs?
Failure to generalise