wk3 Flashcards

(10 cards)

1
Q

What is the equation for gradient descent in linear regression?

A

w^(t+1) = w^(t) − η·∇C(w^(t)), where η is the learning rate and ∇C(w) is the gradient of the cost function with respect to the weights
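
A minimal sketch of this update in code, assuming a least-squares cost C(w) = (1/n)·||Xw − y||^2 (the function name and defaults are illustrative, not from the card):

```python
import numpy as np

def gradient_descent(X, y, lr=0.01, epochs=100):
    """Full-batch gradient descent for least-squares linear regression.

    Cost:     C(w) = (1/n) * ||Xw - y||^2
    Gradient: grad C(w) = (2/n) * X^T (Xw - y)
    """
    n, d = X.shape
    w = np.zeros(d)                          # initialise weights
    for _ in range(epochs):
        grad = (2 / n) * X.T @ (X @ w - y)   # gradient of the cost at w^(t)
        w = w - lr * grad                    # w^(t+1) = w^(t) - eta * grad
    return w
```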

2
Q

What is stochastic gradient descent (SGD)?

A

-initialise all weights
-for each epoch (or time step):
- select a random sample i from the set of points with equal probability, compute the gradient of the loss at that point with respect to the weights, and update the weights (as sketched below)
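
A minimal sketch, assuming the same least-squares cost as in card 1 (names and defaults are illustrative):

```python
import numpy as np

def sgd(X, y, lr=0.01, epochs=100):
    """Stochastic gradient descent: each update uses one random point."""
    n, d = X.shape
    w = np.zeros(d)                              # initialise all weights
    for _ in range(epochs):
        for _ in range(n):                       # n updates per epoch
            i = np.random.randint(n)             # pick i uniformly at random
            grad = 2 * X[i] * (X[i] @ w - y[i])  # gradient at point i only
            w = w - lr * grad                    # update the weights
    return w
```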

3
Q

What are the advantages and disadvantages of SGD vs GD?

A

-faster execution: each update costs O(1) gradient computations rather than O(n)
-it is not guaranteed to converge or arrive at the optimal solution, since each single-point gradient is a noisy estimate
-on average the gradient estimate is correct (unbiased), so with enough iterations SGD gets close to the right answer

4
Q

What is commonly a good choice of learning rate?

A

learning rate η_t = C / √t, where C is a tuning constant and t is the iteration number. The learning rate decreases over time, so the steps shrink as the algorithm begins to converge.
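
A sketch of this schedule (the constant C here is a tuning parameter, not the cost function; the default value is illustrative):

```python
import math

def learning_rate(t, C=1.0):
    """Decaying schedule: eta_t = C / sqrt(t), for iteration t >= 1."""
    return C / math.sqrt(t)

# e.g. inside an SGD loop:  w = w - learning_rate(t) * grad
```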

5
Q

What is minibatch SGD?

A

Rather than selecting individual points during gradient descent, we randomly select a batch of points and average the gradient over the batch for each update.
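
A minimal sketch, again assuming a least-squares cost; each step averages the gradient over one random batch (names and defaults are illustrative):

```python
import numpy as np

def minibatch_sgd(X, y, lr=0.01, steps=100, batch_size=32):
    """Minibatch SGD: each update averages the gradient over a random batch."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        idx = np.random.choice(n, size=batch_size, replace=False)  # random batch
        Xb, yb = X[idx], y[idx]
        grad = (2 / batch_size) * Xb.T @ (Xb @ w - yb)  # gradient averaged over batch
        w = w - lr * grad
    return w
```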

6
Q

What are common choices of batch size?

A

32, 64, 128

7
Q

What are the two approaches to sampling?

A

Sampling with replacement (a point may be drawn more than once) and sampling without replacement (each point is drawn at most once).
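
In NumPy the two approaches differ only in the `replace` flag (sizes are illustrative):

```python
import numpy as np

n, batch_size = 1000, 32
with_repl = np.random.choice(n, size=batch_size, replace=True)      # a point may repeat
without_repl = np.random.choice(n, size=batch_size, replace=False)  # all indices distinct

# Sampling without replacement over a whole epoch is often done by shuffling:
perm = np.random.permutation(n)  # then iterate over consecutive batch_size slices
```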

8
Q

Describe the intuition behind margin-based loss.

A

The margin y·w^T x is positive if the prediction is correct and negative if it is incorrect:
-w^T x is the raw prediction; we take its sign to get the predicted label (positive or negative). If the prediction has the same sign as the actual label y, it is correct; otherwise it is not.
-multiplying two values of opposite sign gives a negative result, and multiplying two values of the same sign gives a positive one, so y·w^T x > 0 exactly when the prediction is correct
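
A short sketch of the margin computation for labels y ∈ {−1, +1} (the function name is illustrative):

```python
import numpy as np

def margins(X, y, w):
    """Margin y_i * (w . x_i): positive exactly when sign(w . x_i) == y_i."""
    return y * (X @ w)

# correct = margins(X, y, w) > 0   # boolean mask of correct predictions
```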

9
Q

What is the loss function for margin-based loss?

A

L(w; x, y) = g(y·w^T x), where g is a decreasing function of the margin y·w^T x
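
Two standard examples of a decreasing g (not named on the card, but common choices) are the hinge loss and the logistic loss:

```python
import numpy as np

def hinge(m):
    """Hinge loss g(m) = max(0, 1 - m); decreasing in the margin m."""
    return np.maximum(0.0, 1.0 - m)

def logistic(m):
    """Logistic loss g(m) = log(1 + exp(-m)); smooth and strictly decreasing."""
    return np.logaddexp(0.0, -m)   # numerically stable log(1 + e^(-m))
```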

10
Q

Why do we want g to be a decreasing function?

A

-we want to minimise the loss L = g(y·w^T x). Since g is decreasing, minimising the loss is equivalent to maximising the margin y·w^T x
-maximising the margin means more confidently correct predictions, and hence better model performance
