Dropout Flashcards

1
Q

Overview

What is dropout in machine learning?

A

Dropout is a regularization technique used to prevent overfitting by reducing the network's reliance on individual neurons (it deliberately makes the network "suffer" so it cannot depend on any single unit).

2
Q

Overview

What does dropout do during training?

A

During training, dropout randomly sets a fraction of the neurons’ outputs to zero.
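
A minimal sketch of this training-time masking in NumPy (the dropout rate and array shapes are illustrative assumptions, not tied to any particular framework):

import numpy as np

rng = np.random.default_rng(0)
drop_rate = 0.5                           # fraction of outputs to zero out
x = rng.normal(size=(4, 8))               # one layer's outputs (batch, units)

mask = rng.random(x.shape) >= drop_rate   # True = neuron kept for this pass
x_dropped = x * mask                      # dropped neurons output exactly 0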

3
Q

Overview

What benefit does dropout provide in terms of features?

A

Dropout encourages the learning of more robust and generalizable features

4
Q

Overview

What happens to the dropout layer during inference?

A

During inference, the dropout layer is deactivated.

5
Q

Overview

How are neuron outputs affected during inference in dropout?

A

During inference, all neurons are used, but their outputs are scaled down by the keep probability, i.e. the fraction of neurons that were active during training.

e.g. if only 2 out of 3 neurons were active during training, the next layer learned its weights on the magnitude of 2 inputs, not 3. Scaling each of the 3 inputs by 2/3 at inference keeps the total magnitude the same as it was during training.
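
A small numeric sketch of this inference-time scaling for standard (non-inverted) dropout, using the 2-out-of-3 numbers above (illustrative values only):

import numpy as np

keep_prob = 2 / 3                    # 2 out of 3 neurons were active in training
x = np.array([1.0, 1.0, 1.0])        # at inference, all three neurons fire

y = x * keep_prob                    # total magnitude 2.0, matching training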

6
Q

Forward phase of dropout

What is the effect of output masking in dropout during forward propagation?

A

The outputs of the selected neurons are set to zero, effectively dropping them out from the network for that particular forward pass.

7
Q

Forward phase of dropout

What happens to the modified outputs from non-dropped out neurons during forward propagation?

A

The modified outputs from the non-dropped out neurons are propagated forward to the next layer in the network.

8
Q

Forward phase of dropout

What happens to the dropped neurons on the next iteration?

A

On the next iteration a new set of neurons is dropped, chosen at random across the network according to the dropout probability, so each pass trains a different subnetwork.

9
Q

Backward phase of dropout

What happens to the gradients/weights of the dropped neurons during backpropagation?

A

The backward pass starts from the modified outputs of the forward pass. Since some neurons were dropped out during the forward pass, their gradients are also set to zero, so their weights receive no update for that iteration.

(Gradient Masking)
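
A rough NumPy sketch of this gradient masking: the same mask used in the forward pass is applied to the incoming gradient, so dropped neurons receive zero gradient (names and shapes are illustrative):

import numpy as np

rng = np.random.default_rng(0)
drop_rate = 0.5
x = rng.normal(size=(4, 8))

mask = rng.random(x.shape) >= drop_rate
y = x * mask                     # forward pass: masked outputs

grad_y = np.ones_like(y)         # gradient arriving from the next layer
grad_x = grad_y * mask           # backward pass: dropped neurons get zero gradient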

10
Q

Why is extra scaling necessary in the forward pass during dropout?

A

To ensure that the expected contribution of each neuron remains consistent between training and inference phases.

e.g. suppose dropout caused 1 out of 3 neurons not to fire during training, and each neuron has a maximum output of 1. The activation function then learned to expect a maximum input of 2, since at any given point only two neurons fed into it. When all 3 neurons are used during inference, the activation could receive a maximum input of 3. To keep the magnitude the same as in training, the outputs must be scaled by 2/3 (the keep probability).
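
A quick Monte Carlo check of the expected-contribution argument (illustrative numbers only): a neuron kept with probability 2/3 contributes 2/3 of its value on average during training, which is exactly what scaling by the keep probability reproduces at inference.

import numpy as np

rng = np.random.default_rng(0)
keep_prob = 2 / 3
neuron_output = 1.0

kept = rng.random(100_000) < keep_prob                     # dropout trials
avg_training_contribution = np.mean(kept * neuron_output)  # ~0.667
inference_contribution = neuron_output * keep_prob         # exactly 0.667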

11
Q

What is the concept of inverted dropout during training?

A

During training, all nodes that were not dropped out are scaled up by the inverse of the keep probability.

This means that if 1 out of 3 neurons is dropped out, the outputs of the remaining 2 are multiplied by 3/2. So if each neuron had a maximum output of 1 before, it now contributes 1.5, and the activation function learns off a magnitude of 3.

At inference time, when all three neurons are in use (no dropout), the input doesn't need scaling, since the activation already learned its values at a magnitude of 3 (the non-dropped-out magnitude).
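
A minimal inverted-dropout sketch in NumPy (the keep probability and helper name are illustrative): survivors are scaled up by 1/keep_prob during training, and inference uses the outputs unchanged.

import numpy as np

rng = np.random.default_rng(0)
keep_prob = 2 / 3                         # e.g. 1 out of 3 neurons dropped

def inverted_dropout(x, training):
    if not training:
        return x                          # inference: no scaling needed
    mask = rng.random(x.shape) < keep_prob
    return x * mask / keep_prob           # survivors scaled up by 3/2

x = np.ones((1, 3))
y_train = inverted_dropout(x, training=True)    # e.g. [[1.5, 1.5, 0.0]]
y_test = inverted_dropout(x, training=False)    # [[1.0, 1.0, 1.0]]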

12
Q

Why is scaling applied during training in inverted dropout?

A

To compensate for the dropped neurons, keeping the total contribution of the remaining neurons similar to what it would be with no dropout.

13
Q

How does the network behave at inference time in inverted dropout?

A

It works the same as if dropout wasn’t present, requiring no scaling.

14
Q

What problem arises when applying standard dropout to each activation of a convolutional feature map before a 1 × 1 convolution layer?

A

It leads to increased training time without effectively preventing overfitting, mainly due to spatial correlation among feature map activations in fully convolutional networks.

15
Q

In a CNN, what happens to the gradient contribution of certain neurons when dropout is applied?

A

Some neurons may contribute zero due to dropout, but in an image the neighbouring activations still exist, and despite the holes dropout creates, the remaining "pixels" carry enough information to overcome the missing data (because of the strong correlation between pixels).

16
Q

In a CNN, how does dropout affect the independence of neurons in the network?

A

The effective learning rate is simply scaled by the dropout probability, meaning the magnitude of the updates is reduced; the independence of the neurons (the reduction of their interdependence) is not actually improved.

17
Q

Why do some argue against using dropout in convolutional layers?

A

Convolutional layers have fewer parameters and are less likely to overfit, and dropout may slow down training because the gradient update for each convolutional weight is the average of the gradients from all the positions where the kernel is applied.

18
Q

What is the key difference between SpatialDropout and standard dropout?

A

SpatialDropout performs one dropout trial per feature map, dropping or keeping the entire map as a whole and thus maintaining spatial coherency, while standard dropout acts independently on each activation.

19
Q

How does SpatialDropout affect adjacent pixels in the dropped-out feature map?

A

In SpatialDropout, adjacent pixels in the dropped-out feature map are either all zero or all active, preserving spatial relationships.
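
A simplified SpatialDropout sketch in NumPy, assuming a channels-first (batch, channels, H, W) layout: the dropout trial is made once per feature map, so every pixel in a channel is either kept or zeroed together.

import numpy as np

rng = np.random.default_rng(0)
drop_rate = 0.5
x = rng.normal(size=(2, 4, 8, 8))                       # (batch, channels, H, W)

channel_mask = rng.random((2, 4, 1, 1)) >= drop_rate    # one trial per feature map
y = x * channel_mask / (1 - drop_rate)                  # inverted-style rescaling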

20
Q

What is Droplayer (StochasticDepth)?

A

Droplayer is a regularization technique that randomly skips entire layers during training instead of dropping out individual neurons, improving the robustness and reducing overfitting in deep networks like ResNets.
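
A rough sketch of the idea for a single residual block (illustrative only; the full stochastic-depth method also rescales the residual branch at test time): with some probability the whole branch is skipped and only the identity path is used for that training step.

import numpy as np

rng = np.random.default_rng(0)

def residual_block(x, branch, drop_prob, training):
    if training and rng.random() < drop_prob:
        return x                          # layer skipped: identity path only
    return x + branch(x)                  # normal residual computation

x = np.ones(4)
out = residual_block(x, branch=lambda v: 0.1 * v, drop_prob=0.5, training=True)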

21
Q

What is DropBlock?

A

DropBlock is a regularization technique that randomly masks out contiguous blocks of activations within a feature map, promoting diverse feature learning and spatial generalization in CNNs while preventing overfitting.
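
A very simplified DropBlock sketch in NumPy (illustrative; the real method also derives the seed probability from the desired keep rate and normalizes the output): contiguous block_size x block_size squares around randomly chosen seed positions are zeroed, rather than independent single activations.

import numpy as np

rng = np.random.default_rng(0)

def drop_block(feature_map, block_size=3, seed_prob=0.02):
    h, w = feature_map.shape
    mask = np.ones((h, w))
    seeds = rng.random((h, w)) < seed_prob           # centres of dropped blocks
    half = block_size // 2
    for i, j in zip(*np.nonzero(seeds)):
        mask[max(0, i - half):i + half + 1, max(0, j - half):j + half + 1] = 0.0
    return feature_map * mask

x = rng.normal(size=(8, 8))
y = drop_block(x)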

22
Q

How do Droplayer and DropBlock differ from traditional dropout?

A

Droplayer skips entire layers during training, while DropBlock masks spatial regions within feature maps, in contrast to traditional dropout that targets individual neurons.