Optimisation Flashcards

1
Q

What is the gradient of the error?

A

The vector of partial derivatives of the error with respect to each weight
2
Q

What is the error surface in weight space?

A

A function of the error of a fixed training set given each setting of weights

3
Q

Where does the gradient of the error point?

A

In the direction of steepest ascent of the error; its negative gives the direction of steepest error descent in weight space

4
Q

What are the two common inputs to a numerical optimization algorithm?

A
  • A procedure that computes E(w)
  • A procedure that computes the partial derivative for each weight
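
A minimal sketch of those two procedures for a linear model with sum-of-squares error; the function names and the choice of model are illustrative, not from the deck:

```python
import numpy as np

def error(w, X, y):
    # E(w): sum-of-squares error of a linear model on a fixed training set
    return 0.5 * np.sum((X @ w - y) ** 2)

def error_gradient(w, X, y):
    # Vector of partial derivatives dE/dw_i, one entry per weight
    return X.T @ (X @ w - y)
```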
5
Q

How does gradient descent work?

A
  1. Calculate the gradient of the error
  2. Move against the gradient (downhill) by a fixed step size (eta)
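
A minimal sketch of that two-step loop, assuming a `grad_fn(w)` that closes over the training set (as the `error_gradient` procedure above could); the names and defaults are illustrative:

```python
def gradient_descent(w, grad_fn, eta=0.01, n_steps=1000):
    for _ in range(n_steps):
        g = grad_fn(w)       # 1. calculate the gradient of the error
        w = w - eta * g      # 2. take a fixed-size step (eta) down the gradient
    return w
```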
6
Q

What is step size in gradient descent aka?

A

Learning rate

7
Q

What happens if the step size in gradient descent is too small?

A

Too slow to optimize (many tiny steps are needed)

8
Q

What happens if the step size in gradient descent is too large?

A

Instability (jumps over the minimum region)

9
Q

What happens when the next step increases the error rate in “bold driver” gradient descent?

A

Don't take the step; aggressively reduce the learning rate (eta ← 0.5 × eta)

10
Q

What happens when the next step decreases the error rate in “bold driver” gradient descent?

A

Cautiously increase the learning rate (eta ← 1.01 × eta)
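
A sketch of one "bold driver" update, combining this card and the previous one: reject a step that increases the error and halve eta; keep a step that decreases it and grow eta by 1%. Function names and signatures are illustrative:

```python
def bold_driver_step(w, eta, error_fn, grad_fn):
    proposed = w - eta * grad_fn(w)
    if error_fn(proposed) > error_fn(w):
        # Error went up: don't take the step, aggressively shrink the learning rate
        return w, 0.5 * eta
    # Error went down: take the step and cautiously grow the learning rate
    return proposed, 1.01 * eta
```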

11
Q

What is batch learning?

A

Uses all the instances in the training set, updating the weights using the gradient summed over the whole set

12
Q

What is online learning?

A

Adapt the weights after each instance, using the gradient of that single instance's error
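
A sketch contrasting the two regimes over one pass through the data, assuming a gradient procedure with the (w, X, y) signature used in the earlier sketch (illustrative names, not the deck's code):

```python
def batch_epoch(w, X, y, eta, grad_fn):
    # Batch: one update per pass, using the gradient over all instances
    return w - eta * grad_fn(w, X, y)

def online_epoch(w, X, y, eta, grad_fn):
    # Online: adapt the weights after every single instance
    for i in range(len(y)):
        w = w - eta * grad_fn(w, X[i:i + 1], y[i:i + 1])
    return w
```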

13
Q

Which (batch/online) has the more powerful optimization methods?

A

batch

14
Q

Which (batch/online) is easier to analyze?

A

batch

15
Q

Which (batch/online) is more feasible for large datasets?

A

online

16
Q

Which (batch/online) may have the ability to jump over local optima?

A

online

17
Q

What type (batch/online) is stochastic gradient ascent?

A

online

18
Q

How is a training instance picked in stochastic gradient ascent?

A

Pick an index uniformly at random from 1 … n
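
A sketch of one stochastic step with that uniform sampling, written here as error descent (flip the update's sign to ascend an objective instead); the names are illustrative:

```python
import numpy as np

def stochastic_gradient_step(w, X, y, eta, grad_fn, rng):
    # Pick a training instance uniformly at random from the n available
    i = rng.integers(len(y))
    return w - eta * grad_fn(w, X[i:i + 1], y[i:i + 1])

rng = np.random.default_rng(0)  # reusable generator for the uniform draws
```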

19
Q

What's the problem with using gradient descent on this error surface?

A

Gradient descent becomes very slow once it is in a shallow valley (the gradient is tiny, so the steps are tiny)

20
Q

What is the definition of momentum?

A

Adding a fraction of the previous weight update to the current update, so successive steps build up speed along directions of consistent descent
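
A sketch of that update rule: the new step keeps a fraction (alpha) of the previous step, the "velocity", plus the usual gradient term; alpha, eta, and the names are illustrative:

```python
def momentum_step(w, velocity, grad, eta=0.01, alpha=0.9):
    # New step = alpha * previous step - eta * current gradient
    velocity = alpha * velocity - eta * grad
    return w + velocity, velocity
```
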
21
Q

What is the problem with using gradient descent on this error surface?

A

It won't find the global minimum; it gets stuck in a local minimum

22
Q

How can we try to find a global minimum rather than a local minimum?

A
  • Rerun the optimizer with random starting points
  • Momentum
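
A sketch of the random-restart idea from the first bullet, assuming an `optimise(w0, grad_fn)` routine such as the gradient-descent sketch above (all names illustrative):

```python
def random_restarts(n_restarts, init_fn, optimise, error_fn, grad_fn):
    # Rerun the optimiser from random starting points and keep the best result
    best_w, best_error = None, float("inf")
    for _ in range(n_restarts):
        w = optimise(init_fn(), grad_fn)
        e = error_fn(w)
        if e < best_error:
            best_w, best_error = w, e
    return best_w
```
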
23
Q

What's the problem with momentum?

A

Another parameter to pick, with even fewer heuristics to help choose it