Week 5 Flashcards

1
Q

What is a deep NN? What is a shallow NN?

A

Deep is > 3 layers (typically many more than 3)

Shallow is <= 3 layers

2
Q

Why question the usefulness of deep NNs? Why should we use them?

A

A shallow NN can approximate any continuous non-linear function arbitrarily well (universal approximation)

Depth helps capture complexity

3
Q

Why is the RNN example unnecessarily confusing?

A

In general, the output of the last hidden layer feeds back into the input of the first hidden layer; in this example there is only one hidden layer, so it feeds back into itself

4
Q

Difficulties with backprop

A

Vanishing and exploding gradients

5
Q

Mitigate vanishing gradients

A

Use activation functions with non-vanishing gradients
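Not from the lecture, but a minimal NumPy sketch comparing the derivatives of sigmoid and ReLU, illustrating why sigmoid gradients vanish for large inputs while ReLU's do not:

```python
import numpy as np

def sigmoid_grad(x):
    # Derivative of the sigmoid: s(x) * (1 - s(x)); tends to 0 for large |x|
    s = 1.0 / (1.0 + np.exp(-x))
    return s * (1.0 - s)

def relu_grad(x):
    # Derivative of ReLU: 1 for x > 0, 0 otherwise; does not shrink for large positive x
    return (x > 0).astype(float)

x = np.array([-10.0, -1.0, 0.5, 1.0, 10.0])
print("sigmoid'(x):", sigmoid_grad(x))  # values near 0 at the extremes
print("relu'(x):   ", relu_grad(x))     # 0 for negative inputs, 1 otherwise
```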

6
Q

Mitigate exploding gradients

A

Batch normalisation

7
Q

Activation functions with non-vanishing derivatives

A

E.g. ReLU, Leaky ReLU, ELU (their derivatives do not vanish for positive inputs)
8
Q

Better ways to initialise weights

A

E.g. Xavier/Glorot or He initialisation, which scale the initial weights according to the number of inputs to each neuron
9
Q

How does backprop struggle with gradients of the cost function?

A

Variation in the magnitude of the gradient may occur between:
- different layers
- different parts of the cost function for a single neuron
- different directions for a multi-dimensional function

10
Q

Momentum in backprop? What is it useful for?

A

Adds a moving average of previous gradients to the current gradient (helps with plateaus and local minima)

Movement = negative of gradient + momentum
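For reference, the update is commonly written as follows (my notation, not necessarily the lecture's): with momentum coefficient $\mu$, learning rate $\eta$ and cost $J$,

$$v_t = \mu\,v_{t-1} - \eta\,\nabla_\theta J(\theta_{t-1}), \qquad \theta_t = \theta_{t-1} + v_t$$

The moving-average term $\mu\,v_{t-1}$ keeps the parameters moving through plateaus and past shallow local minima.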

11
Q

Adaptive learning rate and examples

A

Vary learning rate (for individual parameters) during training:
- increasing learning rate if cost is decreasing
- decreasing learning rate if cost is increasing

E.g. AdaGrad or RMSProp
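For reference, RMSProp's per-parameter update is commonly written as follows (notation assumed, not from the card): with decay rate $\rho$ and gradient $g_t$,

$$s_t = \rho\,s_{t-1} + (1-\rho)\,g_t^2, \qquad \theta_t = \theta_{t-1} - \frac{\eta}{\sqrt{s_t}+\epsilon}\,g_t$$

so each parameter's effective learning rate shrinks when its recent gradients have been large.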

13
Q

Examples of backprop algorithms with adaptive learning rate and momentum

A

ADAM
Nadam
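For reference, Adam combines both ideas; its update is commonly written as (notation assumed):

$$m_t = \beta_1 m_{t-1} + (1-\beta_1)\,g_t, \qquad v_t = \beta_2 v_{t-1} + (1-\beta_2)\,g_t^2$$
$$\hat m_t = \frac{m_t}{1-\beta_1^t}, \qquad \hat v_t = \frac{v_t}{1-\beta_2^t}, \qquad \theta_t = \theta_{t-1} - \frac{\eta\,\hat m_t}{\sqrt{\hat v_t}+\epsilon}$$

where $m_t$ is the momentum term (a moving average of gradients) and $v_t$ drives the adaptive per-parameter learning rate (a moving average of squared gradients).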

14
Q

Motivate batch normalisation

A

Different inputs to one neuron can have very different scales (depending on which neuron each input comes from)

15
Q

Batch normalisation

A

For each mini-batch, normalise each neuron's inputs to zero mean and unit variance over the batch, then rescale and shift them using two learned parameters
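The standard formulation (notation assumed, not from the card): for each value $x_i$ in the mini-batch $B$,

$$\hat x_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, \qquad y_i = \gamma\,\hat x_i + \beta$$

where $\mu_B$ and $\sigma_B^2$ are the mini-batch mean and variance, and $\gamma$, $\beta$ are learned.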
16
Q

Benefit of batch normalisation

A

Keeps the inputs to each layer on a consistent scale, which helps mitigate exploding gradients and makes training faster and more stable
17
Q

Skip connections

A

Connections that skip one or more layers, adding a layer's input directly to the output of a later layer (as in ResNets); they provide a shorter path for gradients during backprop

18
Q

Define CNN and motivate

A

Any NN in which at least one layer has a transfer function implemented using convolution/cross-correlation

Motivated by the desire to recognise patterns with tolerance to their location

19
Q

Weight sharing in CNNs, how?

A

The same mask (set of weights) is applied at every location of the input array, so all the neurons producing one feature map share the same weights

20
Q

Transfer function for CNN

A

Cross-correlation of the input array with the mask (plus a bias), followed by an activation function

21
Q

Output of 1 neuron of CNN layer

A

Array of numbers, sometimes called feature map

22
Q

Cross-correlation formula
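The answer is missing from this card; the standard 2-D cross-correlation of an input $I$ with a mask $K$ is usually written as (notation assumed):

$$(I \star K)(i,j) = \sum_m \sum_n I(i+m,\,j+n)\,K(m,n)$$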

23
Q

Convolution formula (the operation)
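The answer is missing here too; the standard 2-D convolution is the same operation with the mask flipped in both dimensions (notation assumed):

$$(I * K)(i,j) = \sum_m \sum_n I(i-m,\,j-n)\,K(m,n)$$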

24
Q

Mask for CNN

A

A small array of weights (also called a kernel or filter) that is learnt during training and applied at every location of the input array

25
Variations of standard cross correlation
Stride > 1, zero padding of the input, and dilated masks
26
Multiple masks
Each mask produces its own output array (feature map), so the number of output channels equals the number of masks
27
How to reduce number of channels and computational burden of next layer
Use a 1x1 convolution
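A worked example with hypothetical sizes: applying 64 masks of size 1x1x256 to an H x W x 256 input gives an H x W x 64 output; each output value is a weighted sum over the 256 channels at that location, and the layer has only 64 x (256 + 1) = 16,448 parameters (including biases).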
28
Reasons for pooling layers in CNNs
To increase tolerance to:
- location of patterns
- configuration of sub-patterns
30
How pooling layers work
Each region of the input array is replaced with a single calculated value (e.g. the mean for "average pooling", or the maximum for "max pooling")
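A minimal NumPy sketch (my own, not from the lecture) of 2x2 max pooling with stride 2 on a single-channel input:

```python
import numpy as np

def max_pool_2x2(x):
    # x: 2-D input array with even height and width
    h, w = x.shape
    # Group the array into non-overlapping 2x2 blocks, then take the max of each block
    blocks = x.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

x = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool_2x2(x))  # 2x2 output: the maximum of each 2x2 region
```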
31
Pooling typically has
Stride > 1 to decrease spatial size of the layer
32
Different sized pooling regions can be used
The pooling regions are receptive fields (RFs); if the RF is the same size as the input array, the output is a single scalar value ("global pooling")
33
Is the pooling operation learnt?
No, it is user-specified
34
Other names for pooling
Sub-sampling, down-sampling
35
Typically pooling is applied separately to each input array, therefore
The number of output channels is equal to the number of input channels (however, it is possible to perform cross-channel pooling to reduce the resulting number of channels)
36
Final layers of CNNs are typically? Especially when?
Typically fully connected, especially when applied to classification tasks
37
To create the input to the 1st fully connected layer (for CNN) it is necessary to
Flatten the output of the last convolutional or pooling layer
38
3 issues with Deep NNs and why
Deep NNs/CNNs have lots of tunable parameters, therefore:
1) they need lots of training data
2) training is computationally intensive
3) there is a danger of overfitting the training data
39
How to address need for lots of training data for deep NNs/CNNs
- Data augmentation (e.g. applying transformations to images; see the sketch below)
- Transfer learning
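A toy NumPy sketch of two common augmentations, a horizontal flip and a random crop (illustrative only; the crop size and flip probability are arbitrary choices, not the course's pipeline):

```python
import numpy as np

def augment(img, rng, crop=24):
    # img: H x W (x C) image array; returns a randomly flipped and cropped copy
    if rng.random() < 0.5:
        img = img[:, ::-1]            # horizontal flip
    h, w = img.shape[:2]
    top = rng.integers(0, h - crop + 1)
    left = rng.integers(0, w - crop + 1)
    return img[top:top + crop, left:left + crop]

rng = np.random.default_rng(0)
img = rng.random((32, 32, 3))         # a dummy 32x32 RGB image
print(augment(img, rng).shape)        # (24, 24, 3)
```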
40
How to address overfitting of CNNs/Deep NNs
Regularization
41
Dropout regularization
During usage, all neurons behave normally. Dropout forces random sub-networks to correctly classify each sample, preventing the recognition of individual samples from being reliant on individual neurons
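A minimal sketch, assuming the common "inverted dropout" formulation (which may differ from the lecture's), of applying dropout to a layer's activations:

```python
import numpy as np

def dropout(activations, p_drop, rng, training=True):
    # During training: zero each activation with probability p_drop and rescale
    # the survivors by 1/(1 - p_drop) so the expected value is unchanged.
    # During usage (inference): all neurons behave normally.
    if not training:
        return activations
    mask = rng.random(activations.shape) >= p_drop
    return activations * mask / (1.0 - p_drop)

rng = np.random.default_rng(0)
a = np.ones(8)
print(dropout(a, p_drop=0.5, rng=rng))  # roughly half the values zeroed, rest scaled to 2.0
```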
42
Even if all precautions are taken, what issue can still occur when training deep NNs/CNNs?
Failure to generalise