Optimizers Flashcards

1
Q

What is optimization?

A

Optimization determines the algorithm we will use to vary our model’s parameters in order to minimize the loss.

2
Q

What are some types of optimization algorithms?

A
  1. Gradient descent (GD) - slow, because every update uses the entire dataset
  2. SGD - Stochastic Gradient Descent - each update uses a single randomly chosen sample (see the sketch below)
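A minimal Python sketch of the two update rules, assuming a linear model with a mean-squared-error loss; the mse_grad helper and the eta value are illustrative, not part of the deck.

import numpy as np

def mse_grad(w, X, y):
    # Gradient of the mean squared error for a linear model y_hat = X @ w
    return 2.0 * X.T @ (X @ w - y) / len(X)

def gd_step(w, X, y, eta=0.01):
    # Gradient descent: every update looks at the whole dataset, which is slow
    return w - eta * mse_grad(w, X, y)

def sgd_step(w, X, y, eta=0.01):
    # Stochastic gradient descent: each update uses one randomly drawn sample
    i = np.random.randint(len(X))
    return w - eta * mse_grad(w, X[i:i+1], y[i:i+1])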
3
Q

What is the difference between the local minimum and global minimum?

A

A local minimum is a point where the loss is lower than at every nearby point, but not necessarily the lowest possible value. The global minimum is the lowest value of the loss over the whole parameter space. Gradient-based methods can get stuck in a local minimum and never reach the global one.
4
Q

How do we extend GD and SGD?

A

using momentum

5
Q

What is momentum?

A

Momentum keeps a running "velocity" that is an exponentially weighted average of past gradients and updates the parameters along that velocity instead of along the current gradient alone. Steps accumulate in directions where the gradient consistently points, and oscillations cancel out. The update is v ← alpha * v - eta * gradient, then w ← w + v, where alpha controls how much of the previous velocity is kept.
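A minimal Python sketch of that update, reusing an mse_grad-style gradient function as in the earlier sketch; the alpha and eta defaults are common choices, not values prescribed by the deck.

def momentum_step(w, v, X, y, grad_fn, alpha=0.9, eta=0.01):
    # v is an exponentially decaying sum of past gradients (the "velocity");
    # alpha is the momentum hyperparameter from the next card
    v = alpha * v - eta * grad_fn(w, X, y)
    # Parameters move along the accumulated velocity, not just the current gradient
    return w + v, v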
6
Q

Is the alpha of momentum a hyperparameter?

A

Yes

7
Q

What is the learning rate?

A

Denoted eta (the Greek letter η). It should be:

Small enough - so we descend gently along the loss function instead of oscillating wildly around the minimum, never reaching it, or diverging to infinity

Big enough - so we reach the minimum in a reasonable amount of time
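A quick 1-D illustration of the trade-off, using gradient descent on the toy loss L(w) = w^2 (gradient 2w); the eta values are only illustrative.

def descend(eta, w=1.0, steps=20):
    # Gradient descent on L(w) = w**2, whose gradient is 2 * w
    for _ in range(steps):
        w = w - eta * 2 * w
    return w

print(descend(eta=0.01))  # too small: after 20 steps w has barely moved toward the minimum at 0
print(descend(eta=0.3))   # reasonable: converges quickly
print(descend(eta=1.1))   # too big: every step overshoots and w diverges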

8
Q

What are learning rate schedules?

A

The best of both worlds: the learning rate starts big enough and then becomes small enough.

This makes the loss converge much faster.
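A minimal sketch of one common schedule (exponential decay); the eta0 and decay values are illustrative assumptions.

import math

def exp_decay(epoch, eta0=0.1, decay=0.1):
    # Start with a large learning rate and shrink it exponentially each epoch
    return eta0 * math.exp(-decay * epoch)

# Early epochs take big steps, later epochs take small, careful steps
for epoch in (0, 10, 50):
    print(epoch, exp_decay(epoch))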

9
Q

What are some examples of adaptive learning rate schedules?

A

AdaGrad = Adaptive Gradient Algorithm

RMSProp = Root Mean Square Propagation

Adam = Adaptive Moment Estimation (combines momentum with an adaptive learning rate)
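A minimal Python sketch of the RMSProp update as one concrete example; the beta, eta, and eps defaults are common choices, not values given in the deck.

def rmsprop_step(w, s, grad, beta=0.9, eta=0.001, eps=1e-8):
    # s is a running average of the squared gradient (the "root mean square" part)
    s = beta * s + (1 - beta) * grad ** 2
    # Each parameter effectively gets its own learning rate, eta / sqrt(s)
    return w - eta * grad / (s ** 0.5 + eps), s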
