Chapter 11 Flashcards

(29 cards)

1
Q

What are common challenges when training deep neural networks?

A

Vanishing/exploding gradients, insufficient data, slow training, and overfitting.

2
Q

What causes the vanishing gradient problem in deep neural networks?

A

Gradients shrink as they are propagated backward through the layers, so the lower layers' weights are barely updated; saturating activations such as the logistic sigmoid and poor weight initialization are common causes.

3
Q

What is the exploding gradient problem?

A

Gradients increase exponentially during backpropagation, causing divergence during training.

4
Q

How do Glorot and He initialization help in training?

A

They set the scale of the initial random weights so that the variance of activations and gradients stays roughly constant from layer to layer, stabilizing the signal during the forward and backward passes (Glorot for sigmoid/tanh, He for ReLU variants).
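
Example (a minimal sketch assuming the tf.keras API; the layer size is illustrative):

    from tensorflow import keras

    # He initialization suits ReLU-family activations; Glorot (the Keras
    # default) suits sigmoid/tanh layers.
    layer = keras.layers.Dense(100, activation="relu",
                               kernel_initializer="he_normal")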

5
Q

What activation function is commonly used to avoid vanishing gradients?

A

ReLU (Rectified Linear Unit), though it can suffer from the ‘dying ReLU’ problem.

6
Q

What is Leaky ReLU?

A

A ReLU variant that allows a small gradient when the neuron is not active, preventing dying neurons.
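
Example (a minimal sketch assuming the tf.keras API, where LeakyReLU is added as its own layer after a Dense layer with no activation):

    from tensorflow import keras

    model = keras.models.Sequential([
        keras.layers.Dense(100, kernel_initializer="he_normal"),
        keras.layers.LeakyReLU(alpha=0.2),  # slope used for negative inputs
        keras.layers.Dense(10, activation="softmax"),
    ])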

7
Q

What is ELU and its advantage?

A

Exponential Linear Unit, a smooth ReLU variant that takes on negative values for negative inputs, so mean activations are closer to zero and gradients stay non-zero; this reduces vanishing gradients and speeds up convergence, at a higher computational cost than ReLU.

8
Q

What is SELU?

A

Scaled ELU that self-normalizes layers, requiring specific input standardization and initialization.
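
Example (a minimal sketch assuming the tf.keras API):

    from tensorflow import keras

    # Self-normalization holds only for a plain stack of Dense layers with
    # LeCun normal initialization and standardized (mean 0, std 1) inputs.
    layer = keras.layers.Dense(100, activation="selu",
                               kernel_initializer="lecun_normal")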

9
Q

What is batch normalization?

A

A technique that normalizes layer inputs to reduce vanishing/exploding gradients and speed up training.

10
Q

What does batch normalization do during training?

A

It zero-centers and normalizes each layer's inputs using the current mini-batch's mean and standard deviation, then rescales and shifts the result with two learned parameters per input.
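
Example (a minimal sketch assuming the tf.keras API; the 28x28 input shape and layer sizes are illustrative):

    from tensorflow import keras

    model = keras.models.Sequential([
        keras.layers.Flatten(input_shape=[28, 28]),
        keras.layers.BatchNormalization(),
        keras.layers.Dense(300, activation="relu"),
        keras.layers.BatchNormalization(),
        keras.layers.Dense(100, activation="relu"),
        keras.layers.BatchNormalization(),
        keras.layers.Dense(10, activation="softmax"),
    ])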

11
Q

What is gradient clipping used for?

A

To address exploding gradients by limiting the gradient values during backpropagation.
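
Example (a minimal sketch assuming the tf.keras API):

    from tensorflow import keras

    # clipvalue clips each gradient component to [-1.0, 1.0];
    # clipnorm would instead rescale the whole gradient vector
    # whenever its norm exceeds the threshold.
    optimizer = keras.optimizers.SGD(clipvalue=1.0)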

12
Q

What is transfer learning?

A

Reusing layers from a pretrained network on a similar task to reduce training time and improve performance.

13
Q

How does transfer learning work in Keras?

A

By reusing the lower layers of a pretrained model, freezing them (trainable = False) at first, and training new upper layers on the new task; the reused layers can later be unfrozen for fine-tuning with a low learning rate.
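
Example (a rough sketch assuming the tf.keras API; "model_A.h5" is a hypothetical pretrained model, and in practice the reused layers should be cloned first so the original model is not modified):

    from tensorflow import keras

    model_A = keras.models.load_model("model_A.h5")

    # Reuse every layer except the old output layer, then add a new head.
    model_B = keras.models.Sequential(model_A.layers[:-1])
    model_B.add(keras.layers.Dense(1, activation="sigmoid"))

    # Freeze the reused layers for the first few epochs.
    for layer in model_B.layers[:-1]:
        layer.trainable = False

    model_B.compile(loss="binary_crossentropy",
                    optimizer=keras.optimizers.SGD(learning_rate=1e-3),
                    metrics=["accuracy"])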

14
Q

What is unsupervised pretraining?

A

Training each layer of a network sequentially using unlabeled data, followed by supervised fine-tuning.

15
Q

What are examples of faster optimizers?

A

Momentum, Nesterov Accelerated Gradient, AdaGrad, RMSProp, Adam, and Nadam.

16
Q

What is momentum optimization?

A

An optimizer that accelerates gradient descent by building momentum from past gradients.

17
Q

What is Nesterov Accelerated Gradient (NAG)?

A

A variation of momentum that looks ahead in the direction of momentum for more accurate gradient estimation.

18
Q

What is AdaGrad?

A

An optimizer that scales down the learning rate for parameters with a history of large gradients; its learning rate often decays too fast, so it can stop too early when training deep networks.

19
Q

What is RMSProp?

A

An optimizer that fixes AdaGrad's premature slowdown by accumulating only recent gradients through an exponentially decaying average.

20
Q

What is Adam optimizer?

A

Combines momentum and RMSProp by tracking both past gradients and squared gradients with bias correction.

21
Q

What is Nadam?

A

Adam optimizer with the Nesterov momentum technique.
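
Example (a minimal sketch of the optimizers above, assuming the tf.keras API; the hyperparameter values are typical defaults, and any one of these objects is passed to model.compile(optimizer=...)):

    from tensorflow import keras

    momentum = keras.optimizers.SGD(learning_rate=1e-3, momentum=0.9)
    nag      = keras.optimizers.SGD(learning_rate=1e-3, momentum=0.9, nesterov=True)
    adagrad  = keras.optimizers.Adagrad(learning_rate=1e-3)
    rmsprop  = keras.optimizers.RMSprop(learning_rate=1e-3, rho=0.9)
    adam     = keras.optimizers.Adam(learning_rate=1e-3, beta_1=0.9, beta_2=0.999)
    nadam    = keras.optimizers.Nadam(learning_rate=1e-3)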

22
Q

Why are second-order optimizers rarely used in deep learning?

A

They require Hessians, which are computationally expensive for large networks.

23
Q

What is a learning rate schedule?

A

A strategy to change the learning rate during training to balance speed and convergence.

24
Q

Name some learning rate scheduling strategies.

A

Power scheduling, exponential scheduling, piecewise constant scheduling, and performance scheduling.
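
Example (a minimal sketch of exponential and performance scheduling assuming the tf.keras API; the constants are illustrative):

    from tensorflow import keras

    # Exponential scheduling: lr(epoch) = 0.01 * 0.1**(epoch / 20)
    def exponential_decay_fn(epoch):
        return 0.01 * 0.1 ** (epoch / 20)

    lr_scheduler = keras.callbacks.LearningRateScheduler(exponential_decay_fn)

    # Performance scheduling: halve the lr when the validation loss plateaus.
    plateau_scheduler = keras.callbacks.ReduceLROnPlateau(factor=0.5, patience=5)

    # Either callback is passed to model.fit(..., callbacks=[...]).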

25
Q

What is regularization?

A

Techniques to prevent overfitting in deep neural networks, such as L1/L2 regularization and dropout.
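
Example (a minimal sketch of L2 weight regularization assuming the tf.keras API; the factor 0.01 is illustrative):

    from tensorflow import keras

    layer = keras.layers.Dense(100, activation="relu",
                               kernel_initializer="he_normal",
                               kernel_regularizer=keras.regularizers.l2(0.01))
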
26
Q

What is dropout regularization?

A

Randomly dropping units during training to prevent co-dependence among neurons and improve generalization.
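
Example (a minimal sketch assuming the tf.keras API; the 28x28 input shape, layer sizes, and 20% rate are illustrative):

    from tensorflow import keras

    model = keras.models.Sequential([
        keras.layers.Flatten(input_shape=[28, 28]),
        keras.layers.Dropout(rate=0.2),   # active only during training
        keras.layers.Dense(300, activation="relu"),
        keras.layers.Dropout(rate=0.2),
        keras.layers.Dense(10, activation="softmax"),
    ])
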
27
Q

What is Monte Carlo dropout?

A

Using dropout at prediction time to average multiple predictions, improving uncertainty estimation.
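
Example (a rough sketch assuming the tf.keras API, an existing model containing Dropout layers, and a NumPy array X_test):

    import numpy as np

    # Keep dropout active at inference time (training=True), run the model
    # many times, and average the predictions; the spread across runs
    # reflects the model's uncertainty.
    y_probas = np.stack([model(X_test, training=True) for _ in range(100)])
    y_proba = y_probas.mean(axis=0)
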
28
Q

What is max-norm regularization?

A

A constraint that limits the norm of the weight vector for each neuron to prevent large weights.
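
Example (a minimal sketch assuming the tf.keras API; the max norm of 1.0 is illustrative):

    from tensorflow import keras

    # Each neuron's incoming weight vector is rescaled after every training
    # step so that its norm never exceeds 1.0.
    layer = keras.layers.Dense(100, activation="relu",
                               kernel_initializer="he_normal",
                               kernel_constraint=keras.constraints.max_norm(1.0))
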
29
Q

What practical steps help train deep neural networks effectively?

A

Use proper initialization, batch normalization, dropout, fast optimizers, and transfer learning when possible.