Optimisation Flashcards

1
Q

What is the gradient of the error?

A

The vector of partial derivatives of the error with respect to each weight
2
Q

What is the error surface in weight space?

A

A function of the error of a fixed training set given each setting of weights

3
Q

Where does the gradient of the error point?

A

In the direction of steepest ascent of the error; its negative gives the direction of steepest error descent in weight space

4
Q

What are the two common inputs to a numerical optimization algorithm?

A
  • A procedure that computes E(w)
  • A procedure that computes the partial derivative for each weight
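
A minimal sketch of those two procedures for a linear model with sum-of-squares error; the function names and the choice of model are illustrative, not from the deck:

```python
import numpy as np

def error(w, X, y):
    # E(w): sum-of-squares error of a linear model on a fixed training set
    return 0.5 * np.sum((X @ w - y) ** 2)

def error_gradient(w, X, y):
    # Vector of partial derivatives dE/dw_i, one entry per weight
    return X.T @ (X @ w - y)
```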
5
Q

How does gradient descent work?

A
  1. Calculate the gradient of the error
  2. Move against the gradient (downhill) by a fixed step size (eta)
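
A minimal sketch of that two-step loop, assuming a `grad_fn(w)` that closes over the training set (as the `error_gradient` procedure above could); the names and defaults are illustrative:

```python
def gradient_descent(w, grad_fn, eta=0.01, n_steps=1000):
    for _ in range(n_steps):
        g = grad_fn(w)       # 1. calculate the gradient of the error
        w = w - eta * g      # 2. take a fixed-size step (eta) down the gradient
    return w
```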
6
Q

What is step size in gradient descent aka?

A

Learning rate

7
Q

What happens if the step size in gradient descent is too small?

A

Too slow to optimize (many tiny steps are needed)

8
Q

What happens if the step size in gradient descent is too large?

A

Instability (jumps over the minimum region)

9
Q

What happens when the next step increases the error rate in “bold driver” gradient descent?

A

Don't take the step; aggressively reduce the learning rate (eta ← 0.5 × eta)

10
Q

What happens when the next step decreases the error rate in “bold driver” gradient descent?

A

Cautiously increase the learning rate (eta ← 1.01 × eta)
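
A sketch of one "bold driver" update, combining this card and the previous one: reject a step that increases the error and halve eta; keep a step that decreases it and grow eta by 1%. Function names and signatures are illustrative:

```python
def bold_driver_step(w, eta, error_fn, grad_fn):
    proposed = w - eta * grad_fn(w)
    if error_fn(proposed) > error_fn(w):
        # Error went up: don't take the step, aggressively shrink the learning rate
        return w, 0.5 * eta
    # Error went down: take the step and cautiously grow the learning rate
    return proposed, 1.01 * eta
```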

11
Q

What is batch learning?

A

Uses all the instances in the training set, updating the weights using the gradient summed over the whole set

12
Q

What is online learning?

A

Adapt the weights after each instance, using the gradient of that single instance's error
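
A sketch contrasting the two regimes over one pass through the data, assuming a gradient procedure with the (w, X, y) signature used in the earlier sketch (illustrative names, not the deck's code):

```python
def batch_epoch(w, X, y, eta, grad_fn):
    # Batch: one update per pass, using the gradient over all instances
    return w - eta * grad_fn(w, X, y)

def online_epoch(w, X, y, eta, grad_fn):
    # Online: adapt the weights after every single instance
    for i in range(len(y)):
        w = w - eta * grad_fn(w, X[i:i + 1], y[i:i + 1])
    return w
```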

13
Q

Which (batch/online) has the more powerful optimization methods?

A

batch

14
Q

Which (batch/online) is easier to analyze?

A

batch

15
Q

Which (batch/online) is more feasible for large datasets?

A

online

16
Q

Which (batch/online) may have the ability to jump over local optima?

A

online

17
Q

What type (batch/online) is stochastic gradient ascent?

A

online

18
Q

How is a training instance picked in stochastic gradient ascent?

A

Pick an index uniformly at random from 1 … n
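
A sketch of one stochastic step with that uniform sampling, written here as error descent (flip the update's sign to ascend an objective instead); the names are illustrative:

```python
import numpy as np

def stochastic_gradient_step(w, X, y, eta, grad_fn, rng):
    # Pick a training instance uniformly at random from the n available
    i = rng.integers(len(y))
    return w - eta * grad_fn(w, X[i:i + 1], y[i:i + 1])

rng = np.random.default_rng(0)  # reusable generator for the uniform draws
```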

19
Q

What's the problem with using gradient descent on this error surface?

A

Gradient descent becomes very slow once it is in a shallow valley (the gradient is tiny, so the steps are tiny)

20
Q

What is the definition of momentum?

A

Adding a fraction of the previous weight update to the current update, so successive steps build up speed along directions of consistent descent
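
A sketch of that update rule: the new step keeps a fraction (alpha) of the previous step, the "velocity", plus the usual gradient term; alpha, eta, and the names are illustrative:

```python
def momentum_step(w, velocity, grad, eta=0.01, alpha=0.9):
    # New step = alpha * previous step - eta * current gradient
    velocity = alpha * velocity - eta * grad
    return w + velocity, velocity
```
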
21
Q

What is the problem with using gradient descent on this error surface?

A

It won't find the global minimum; it gets stuck in a local minimum

22
Q

How can we try to find a global minimum rather than a local minimum?

A
  • Rerun the optimizer with random starting points
  • Momentum
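
A sketch of the random-restart idea from the first bullet, assuming an `optimise(w0, grad_fn)` routine such as the gradient-descent sketch above (all names illustrative):

```python
def random_restarts(n_restarts, init_fn, optimise, error_fn, grad_fn):
    # Rerun the optimiser from random starting points and keep the best result
    best_w, best_error = None, float("inf")
    for _ in range(n_restarts):
        w = optimise(init_fn(), grad_fn)
        e = error_fn(w)
        if e < best_error:
            best_w, best_error = w, e
    return best_w
```
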
23
Q

What's the problem with momentum?

A

Another parameter to pick, with even fewer heuristics to help choose it