Optimizers Flashcards

1
Q

What is optimization?

A

Optimization determines the algorithm we will use to vary our model’s parameters in order to minimize the loss.

2
Q

What are some types of optimization algorithms?

A
  1. Gradient descent (GD) - slow, because every update uses the entire dataset
  2. SGD - Stochastic Gradient Descent - each update uses a single randomly chosen sample (see the sketch below)
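A minimal Python sketch of the two update rules, assuming a linear model with a mean-squared-error loss; the mse_grad helper and the eta value are illustrative, not part of the deck.

import numpy as np

def mse_grad(w, X, y):
    # Gradient of the mean squared error for a linear model y_hat = X @ w
    return 2.0 * X.T @ (X @ w - y) / len(X)

def gd_step(w, X, y, eta=0.01):
    # Gradient descent: every update looks at the whole dataset, which is slow
    return w - eta * mse_grad(w, X, y)

def sgd_step(w, X, y, eta=0.01):
    # Stochastic gradient descent: each update uses one randomly drawn sample
    i = np.random.randint(len(X))
    return w - eta * mse_grad(w, X[i:i+1], y[i:i+1])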
3
Q

What is the difference between the local minimum and global minimum?

A

A local minimum is a point where the loss is lower than at every nearby point, but not necessarily the lowest possible value. The global minimum is the lowest value of the loss over the whole parameter space. Gradient-based methods can get stuck in a local minimum and never reach the global one.
4
Q

How do we extend GD and SGD?

A

using momentum

5
Q

What is momentum?

A

Momentum keeps a running "velocity" that is an exponentially weighted average of past gradients and updates the parameters along that velocity instead of along the current gradient alone. Steps accumulate in directions where the gradient consistently points, and oscillations cancel out. The update is v ← alpha * v - eta * gradient, then w ← w + v, where alpha controls how much of the previous velocity is kept.
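A minimal Python sketch of that update, reusing an mse_grad-style gradient function as in the earlier sketch; the alpha and eta defaults are common choices, not values prescribed by the deck.

def momentum_step(w, v, X, y, grad_fn, alpha=0.9, eta=0.01):
    # v is an exponentially decaying sum of past gradients (the "velocity");
    # alpha is the momentum hyperparameter from the next card
    v = alpha * v - eta * grad_fn(w, X, y)
    # Parameters move along the accumulated velocity, not just the current gradient
    return w + v, v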
6
Q

Is the alpha of momentum a hyperparameter?

A

Yes

7
Q

What is the learning rate?

A

Denoted eta (the Greek letter η). It should be:

Small enough - so we descend gently along the loss function instead of oscillating wildly around the minimum, never reaching it, or diverging to infinity

Big enough - so we reach the minimum in a reasonable amount of time
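A quick 1-D illustration of the trade-off, using gradient descent on the toy loss L(w) = w^2 (gradient 2w); the eta values are only illustrative.

def descend(eta, w=1.0, steps=20):
    # Gradient descent on L(w) = w**2, whose gradient is 2 * w
    for _ in range(steps):
        w = w - eta * 2 * w
    return w

print(descend(eta=0.01))  # too small: after 20 steps w has barely moved toward the minimum at 0
print(descend(eta=0.3))   # reasonable: converges quickly
print(descend(eta=1.1))   # too big: every step overshoots and w diverges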

8
Q

What are learning rate schedules?

A

The best of both worlds: the learning rate starts big enough and then becomes small enough.

This makes the loss converge much faster.
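A minimal sketch of one common schedule (exponential decay); the eta0 and decay values are illustrative assumptions.

import math

def exp_decay(epoch, eta0=0.1, decay=0.1):
    # Start with a large learning rate and shrink it exponentially each epoch
    return eta0 * math.exp(-decay * epoch)

# Early epochs take big steps, later epochs take small, careful steps
for epoch in (0, 10, 50):
    print(epoch, exp_decay(epoch))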

9
Q

What are some examples of adaptive learning rate schedules?

A

AdaGrad = Adaptive Gradient Algorithm

RMSProp = Root Mean Square Propagation

Adam = Adaptive Moment Estimation (combines momentum with an adaptive learning rate)
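A minimal Python sketch of the RMSProp update as one concrete example; the beta, eta, and eps defaults are common choices, not values given in the deck.

def rmsprop_step(w, s, grad, beta=0.9, eta=0.001, eps=1e-8):
    # s is a running average of the squared gradient (the "root mean square" part)
    s = beta * s + (1 - beta) * grad ** 2
    # Each parameter effectively gets its own learning rate, eta / sqrt(s)
    return w - eta * grad / (s ** 0.5 + eps), s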
