Mention ways to make the cost J decrease faster
Normalize the input data
Use gradient descent with momentum
RMSprop
Initialize the weights randomly, choosing an initialization scheme that keeps the weights from being too large
Mini batch gradient descent
The Adam optimization algorithm, which combines gradient descent with momentum and RMSprop (the first and second moments of the gradient)
In very high dimensional spaces, gradient descent is more likely to end up at a local minimum than at a saddle point of the cost function. True/False?
False, in high dimensions most points where the gradient is zero are saddle points, not local minima
What are the steps required to create the mini batches
First shuffle the dataset, then partition it into mini batches of the chosen size. The last mini batch may be smaller: it holds the remaining examples up to m, the total number of examples
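The two steps above can be sketched as follows (a minimal sketch; the column-major layout with the m examples as columns, and the function and argument names, are assumptions):

```python
import numpy as np

def random_mini_batches(X, Y, batch_size=64, seed=0):
    """Shuffle (X, Y) and partition into mini batches.

    Assumes X has shape (n_features, m) and Y has shape (1, m);
    the last mini batch may hold fewer than batch_size examples.
    """
    rng = np.random.default_rng(seed)
    m = X.shape[1]
    perm = rng.permutation(m)                # step 1: shuffle
    X_shuf, Y_shuf = X[:, perm], Y[:, perm]
    mini_batches = []
    for k in range(0, m, batch_size):        # step 2: partition
        mini_batches.append((X_shuf[:, k:k + batch_size],
                             Y_shuf[:, k:k + batch_size]))
    return mini_batches
```

With m = 10 and batch_size = 4 this yields batches of 4, 4 and 2 examples, the last one being the smaller remainder.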
What is the usual size of mini batches
Powers of 2, like 64, 128 or 256
What does momentum do in gradient descent
It computes an exponentially weighted average of the gradients from previous steps, so the updates oscillate less
What are the steps required to use momentum in gradient descent
First, initialize a velocity v to zeros, one for each dW and db
What are the usual recommended values of the hyperparameters alpha, beta 1, beta 2 and epsilon
Alpha (the learning rate) needs to be tuned
Beta 1 (momentum) is usually 0.9
Beta 2 (RMSprop) is usually 0.999
Epsilon is a small number, usually 10^-8
What are the steps required to update the parameters with momentum
After initializing the velocities to zero:
1. Compute the new velocity for each parameter using beta
2. Update the parameter with that velocity
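The two update steps can be sketched like this (a sketch only; the dict layout with keys like "W1"/"dW1" and the helper names are assumptions):

```python
import numpy as np

def initialize_velocity(params):
    """One zero velocity per parameter (each dW and db)."""
    return {"d" + k: np.zeros_like(v) for k, v in params.items()}

def update_with_momentum(params, grads, v, beta=0.9, alpha=0.01):
    """One momentum step over dicts like {"W1": ...} and {"dW1": ...}."""
    for k in params:
        # 1. new velocity = exponentially weighted average of the gradients
        v["d" + k] = beta * v["d" + k] + (1 - beta) * grads["d" + k]
        # 2. update the parameter with that velocity
        params[k] = params[k] - alpha * v["d" + k]
    return params, v
```

On the first step the velocity is just (1 - beta) times the gradient; over many steps it smooths out oscillations across updates.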
How do you implement Adam optimization
First, initialize v and s to zeros
Then compute the velocity v, then its bias-corrected version
Then compute s, then its bias-corrected version
Finally, update the parameters using the corrected v and s, with epsilon for numerical stability
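The Adam steps above can be sketched as follows (a sketch under assumed dict layouts and names; the default hyperparameter values match the usual recommendations):

```python
import numpy as np

def adam_update(params, grads, v, s, t, alpha=0.001,
                beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step. v and s are dicts of zeros on the first call,
    t is the 1-based step count used for bias correction."""
    for k in params:
        g = grads["d" + k]
        # momentum part: first moment and its bias-corrected version
        v["d" + k] = beta1 * v["d" + k] + (1 - beta1) * g
        v_corr = v["d" + k] / (1 - beta1 ** t)
        # RMSprop part: second moment and its bias-corrected version
        s["d" + k] = beta2 * s["d" + k] + (1 - beta2) * g ** 2
        s_corr = s["d" + k] / (1 - beta2 ** t)
        # update with the corrected v and s; epsilon avoids division by zero
        params[k] = params[k] - alpha * v_corr / (np.sqrt(s_corr) + eps)
    return params, v, s
```

At t = 1 the bias correction exactly cancels the (1 - beta) factors, so the first update has magnitude close to alpha times the sign of the gradient.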
You need to make the model run faster and converge faster, what are different options to use
Mini batch gradient descent
Momentum in gradient descent
Adam (momentum + RMSprop)
What is learning rate decay
It makes the learning rate decrease over time, so we take smaller steps as we do more iterations and get closer to convergence
What is a problem that can occur if we add learning rate decay
The learning rate can go to zero: since it decreases with every iteration, it can quickly become near zero and stop the learning
What is a fix so the learning rate does not decay to zero too quickly
Add fixed interval scheduling. This is done in the same decay formula by dividing the epoch number by the time interval, so the rate only drops once per interval
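A sketch of this fix, assuming the common decay formula alpha = alpha0 / (1 + decay_rate * epoch_num) as the starting point (the function and parameter names are assumptions):

```python
import math

def decayed_lr(alpha0, decay_rate, epoch_num, time_interval=1):
    """Learning rate decay with fixed interval scheduling.

    With time_interval = 1 this is plain decay; with a larger
    time_interval, floor(epoch_num / time_interval) stays constant
    within each interval, so the rate only drops every
    time_interval epochs instead of every epoch.
    """
    return alpha0 / (1 + decay_rate * math.floor(epoch_num / time_interval))
```

For example, with alpha0 = 0.2, decay_rate = 1 and time_interval = 2, epochs 0 and 1 both use 0.2, and epochs 2 and 3 both use 0.1.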