Class Eight Flashcards

1
Q

What is Gradient Descent?

A

Gradient Descent is an optimization algorithm used to minimize the cost or loss function in machine learning. It iteratively adjusts model parameters by following the negative gradient of the cost function.
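
A minimal sketch of the update rule in Python, assuming a toy convex loss f(w) = (w - 3)^2 with a known closed-form gradient; the loss function, starting point, learning rate, and step count are illustrative choices, not part of the card.

```python
def loss(w):
    return (w - 3.0) ** 2          # toy convex loss with its minimum at w = 3

def grad(w):
    return 2.0 * (w - 3.0)         # analytic gradient of the loss

w = 0.0                            # initial parameter value
learning_rate = 0.1
for step in range(100):
    w -= learning_rate * grad(w)   # step in the direction of the negative gradient

print(w, loss(w))                  # w ends up close to 3, loss close to 0
```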

2
Q

What are the advantages of Gradient Descent?

A

Advantages of Gradient Descent include simplicity, applicability to a wide range of optimization problems, and the ability to find a global minimum for convex functions.

3
Q

What are the limitations of Gradient Descent?

A

Limitations of Gradient Descent include sensitivity to the learning rate and possible convergence to a local minimum for non-convex functions. It also requires choosing hyperparameters (such as the regularization parameter and the number of iterations) and is sensitive to feature scaling.

4
Q

What is Stochastic Gradient Descent (SGD)?

A

Stochastic Gradient Descent is a variant of Gradient Descent that updates model parameters using the gradient computed on a single training instance at each iteration. It is more computationally efficient but can have higher variance compared to regular Gradient Descent.
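
A hedged sketch of the per-example update for linear regression with squared error; the synthetic data, learning rate, and epoch count below are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                  # 200 examples, 3 features
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=200)    # noisy linear targets

w = np.zeros(3)
lr = 0.01
for epoch in range(10):
    for i in rng.permutation(len(X)):          # visit examples in random order
        error = X[i] @ w - y[i]                # residual on a single training instance
        w -= lr * error * X[i]                 # gradient of 0.5 * error^2 w.r.t. w

print(w)                                       # ends up close to true_w
```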

5
Q

What is Batch Gradient Descent?

A

Batch Gradient Descent computes the gradient of the cost function over the entire training dataset at each iteration. It provides a more accurate estimate of the true gradient but can be computationally expensive for large datasets.
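
For contrast with the stochastic variant, a sketch of the same linear-regression problem where each step uses the gradient averaged over the entire dataset; the data and learning rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=200)

w = np.zeros(3)
lr = 0.1
for step in range(500):
    grad = X.T @ (X @ w - y) / len(X)   # gradient averaged over the full training set
    w -= lr * grad

print(w)                                # ends up close to true_w
```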

6
Q

What is Mini-Batch Gradient Descent?

A

Mini-Batch Gradient Descent computes the gradient on a small randomly selected subset of the training dataset (mini-batch) at each iteration. It combines the advantages of both Stochastic and Batch Gradient Descent by providing a trade-off between computational efficiency and accuracy.
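
A sketch of the mini-batch variant on the same toy linear-regression setup; the batch size, learning rate, and epoch count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=200)

w = np.zeros(3)
lr = 0.05
batch_size = 32
for epoch in range(20):
    order = rng.permutation(len(X))                # reshuffle the data each epoch
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]      # indices of one mini-batch
        Xb, yb = X[idx], y[idx]
        grad = Xb.T @ (Xb @ w - yb) / len(idx)     # gradient on the mini-batch only
        w -= lr * grad

print(w)                                           # ends up close to true_w
```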

7
Q

What is the difference between gradient descent and stochastic gradient descent?

A

GD computes the gradient of the cost function using the entire dataset, leading to accurate but computationally expensive updates. SGD, on the other hand, samples individual training examples (or mini-batches) at random, producing noisier updates that are much cheaper to compute.

8
Q

What is regularization in machine learning?

A

Regularization is a technique used to prevent overfitting in machine learning models by adding a penalty term to the loss function. It encourages models to have simpler and smoother parameter values.

9
Q

What is L1 regularization?

A

L1 regularization, also known as Lasso regularization, adds the absolute values of the model parameters as a penalty term. It promotes sparsity in the model by driving some parameters to exactly zero.
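
A minimal sketch of how the L1 penalty enters the loss and its (sub)gradient for a squared-error model; the function names and the use of np.sign as the subgradient of |w| are illustrative, and the regularization strength lam is a hyperparameter to be chosen.

```python
import numpy as np

def l1_penalized_loss(w, X, y, lam):
    mse = np.mean((X @ w - y) ** 2)
    return mse + lam * np.sum(np.abs(w))        # data loss plus L1 penalty

def l1_penalized_grad(w, X, y, lam):
    grad_mse = 2 * X.T @ (X @ w - y) / len(X)
    return grad_mse + lam * np.sign(w)          # subgradient of |w| is sign(w)
```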

10
Q

What is L2 regularization?

A

L2 regularization, also known as Ridge regularization, adds the squared values of the model parameters as a penalty term. It encourages small parameter values and helps reduce the impact of irrelevant features.
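
The analogous sketch for the L2 penalty; note the penalty gradient 2·lam·w, which shrinks every weight toward zero in proportion to its size. The function names and lam are again illustrative.

```python
import numpy as np

def l2_penalized_loss(w, X, y, lam):
    mse = np.mean((X @ w - y) ** 2)
    return mse + lam * np.sum(w ** 2)           # data loss plus L2 penalty

def l2_penalized_grad(w, X, y, lam):
    grad_mse = 2 * X.T @ (X @ w - y) / len(X)
    return grad_mse + 2 * lam * w               # shrinks weights proportionally toward zero
```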

11
Q

What is Early Stopping regularization?

A

Early Stopping regularization is a technique where the training process is stopped early based on the performance on a validation set. It helps prevent overfitting by stopping the model when further training no longer improves generalization performance.
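
A sketch of a training loop with patience-based early stopping on a toy linear-regression task; the synthetic data, train/validation split, patience value, and the choice to restore the best checkpoint are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X[:, 0] - 2 * X[:, 1] + 0.5 * rng.normal(size=200)
X_train, y_train = X[:150], y[:150]
X_val, y_val = X[150:], y[150:]

w = np.zeros(5)
lr = 0.05
best_val, best_w, patience, bad_epochs = np.inf, w.copy(), 10, 0
for epoch in range(1000):
    w -= lr * X_train.T @ (X_train @ w - y_train) / len(X_train)
    val_loss = np.mean((X_val @ w - y_val) ** 2)
    if val_loss < best_val:              # validation improved: remember this model
        best_val, best_w, bad_epochs = val_loss, w.copy(), 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:       # no improvement for `patience` epochs: stop
            break

w = best_w                               # restore the best checkpoint
```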

12
Q

What is Dropout regularization?

A

Dropout regularization randomly sets a fraction of the output units in a layer to zero during training, effectively “dropping out” those units. It reduces co-dependencies between neurons and helps prevent overfitting.

Make sure to evaluate the training loss after training:
* If the model overfits the training set: increase the dropout rate.
* If the model underfits the training set: decrease the dropout rate.
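
A minimal sketch of a dropout forward pass in NumPy, using the common "inverted dropout" convention (rescaling kept activations at training time so nothing changes at inference); the rate, toy activations, and function name are illustrative assumptions.

```python
import numpy as np

def dropout_forward(activations, rate, rng, training=True):
    """Inverted dropout: zero a fraction `rate` of units and rescale the rest."""
    if not training or rate == 0.0:
        return activations                          # no dropout at inference time
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob           # keep expected activation unchanged

rng = np.random.default_rng(0)
h = np.ones((2, 8))                                 # toy layer output
print(dropout_forward(h, rate=0.5, rng=rng))        # roughly half the units zeroed
```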

13
Q

What is Batch Normalization?

A

Batch Normalization is a technique used to normalize the input to a layer by subtracting the batch mean and dividing by the batch standard deviation. It helps stabilize and speed up training by reducing internal covariate shift.
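
A sketch of the normalization step on a toy batch; the learnable scale (gamma) and shift (beta) are part of the standard formulation and are included here for completeness, while the batch shape and eps value are illustrative.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch, then scale and shift."""
    mean = x.mean(axis=0)                    # per-feature batch mean
    var = x.var(axis=0)                      # per-feature batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)  # standardized activations
    return gamma * x_hat + beta              # learnable scale and shift

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(32, 4))    # a batch of 32 with 4 features
out = batch_norm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0), out.std(axis=0))            # roughly 0 and 1 per feature
```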

14
Q

What is the Gradient Problem in neural networks?

A

The Gradient Problem refers to the issue of vanishing or exploding gradients during the training of deep neural networks. It can lead to slow convergence or numerical instability in the learning process.

15
Q

How can the Gradient Problem be mitigated?

A

The Gradient Problem can be mitigated by using activation functions that alleviate the saturation problem (e.g., ReLU), careful initialization of model weights, gradient clipping to limit the magnitude of gradients, and using techniques like residual connections or skip connections.
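
Of these, gradient clipping is the easiest to show in a few lines. A sketch of norm-based clipping, assuming a NumPy gradient vector; the threshold and example values are illustrative.

```python
import numpy as np

def clip_by_norm(grad, max_norm):
    """Rescale the gradient if its L2 norm exceeds max_norm; keep its direction."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

g = np.array([30.0, -40.0])              # "exploding" gradient with norm 50
print(clip_by_norm(g, max_norm=5.0))     # rescaled to norm 5, same direction
```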

16
Q

Why is the Gradient Problem important in deep learning?

A

The Gradient Problem is important in deep learning because it affects the convergence and stability of training deep neural networks. Addressing the Gradient Problem helps improve the training process and the performance of deep learning models.