SGD Flashcards

1
Q

What optimizer is used for a nonlinear function of θ?

A

A gradient-based optimizer.
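A minimal sketch of one gradient step, assuming PyTorch, a toy quadratic loss, and an illustrative learning rate (all names here are assumptions, not from the card):

```python
import torch

theta = torch.randn(3, requires_grad=True)  # parameters θ
lr = 0.1                                    # learning rate η (assumed)

loss = ((theta - 2.0) ** 2).sum()           # a toy nonlinear loss L(θ)
loss.backward()                             # compute the gradient ∇θ L

with torch.no_grad():
    theta -= lr * theta.grad                # gradient step: θ ← θ − η ∇θ L
    theta.grad.zero_()                      # clear grads before the next step
```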

2
Q

How do you compute the gradient of a NN?

A

Compute the partial derivatives of the loss with respect to all parameters θk (i.e., the weights and the biases of all layers): ∇θ L = (∂L/∂θ1, …, ∂L/∂θK).
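A minimal sketch of this, assuming PyTorch and a toy MSE regression setup (the model, data, and sizes are illustrative): the backward pass fills in ∂L/∂θk for every weight and bias at once.

```python
import torch

# Toy network: one hidden layer, scalar output.
model = torch.nn.Sequential(
    torch.nn.Linear(4, 8),
    torch.nn.ReLU(),
    torch.nn.Linear(8, 1),
)
x = torch.randn(16, 4)   # a batch of 16 inputs (assumed shapes)
y = torch.randn(16, 1)   # matching targets

loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward()          # fills p.grad for every parameter θk

for name, p in model.named_parameters():
    print(name, p.grad.shape)  # one partial derivative per weight/bias entry
```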

3
Q

What does splitting the training set into B minibatches do?

A
  • reduces the computation cost of one gradient estimate by a factor of B;
  • increases the standard deviation of the gradient estimate by a factor of only √B.

More iterations, but fewer epochs (hence a smaller total computation cost); see the sketch below.
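A minimal minibatch-SGD loop sketch, assuming PyTorch and an illustrative dataset size and batch count: each iteration computes a gradient from one minibatch (roughly 1/B the cost of a full-batch gradient), so one epoch performs B parameter updates for the same total work as a single full-batch step.

```python
import torch

model = torch.nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.01)

X = torch.randn(512, 4)      # full training set (N = 512, assumed)
Y = torch.randn(512, 1)
B = 8                        # number of minibatches per epoch
batch = X.shape[0] // B      # minibatch size N/B = 64

for epoch in range(5):
    perm = torch.randperm(X.shape[0])  # reshuffle each epoch
    for i in range(B):
        idx = perm[i * batch:(i + 1) * batch]
        loss = torch.nn.functional.mse_loss(model(X[idx]), Y[idx])
        opt.zero_grad()
        loss.backward()      # gradient from one minibatch: ~1/B the cost
        opt.step()           # B updates per epoch instead of one
```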
