wk3 Flashcards
(10 cards)
What is the equation for gradient descent in linear regression
w^(t+1) = w^(t) - η * ∇C(w^(t)), where η is the learning rate and ∇C is the gradient of the cost function
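A minimal sketch of one update in Python, assuming a squared-error cost C(w) = (1/n) * ||Xw - y||^2 (the card does not specify the cost; names are illustrative):

import numpy as np

def gradient_descent_step(w, X, y, lr):
    # full-batch gradient of C(w) = (1/n) * ||Xw - y||^2
    n = len(y)
    grad = (2.0 / n) * X.T @ (X @ w - y)
    return w - lr * grad      # w^(t+1) = w^(t) - lr * grad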
What is stochastic gradient descent
-initialise all weights
-for each epoch or time step:
- select a random sample i from the set of points with equal probability, compute the gradient of the loss at that point with respect to the weights, and update the weights (see the sketch below)
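A minimal sketch of this loop, again assuming a squared-error loss per point (variable names are illustrative):

import numpy as np

def sgd(X, y, lr=0.01, epochs=10, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)                  # initialise all weights
    for _ in range(epochs):
        for _ in range(n):
            i = rng.integers(n)      # random sample i, chosen with equal probability
            grad_i = 2.0 * X[i] * (X[i] @ w - y[i])   # gradient at that single point
            w = w - lr * grad_i      # update the weights
    return w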
What are advantages and disadvantages of SGD vs GD
Advantages:
-faster execution: each update costs O(1) rather than O(n), since it uses a single point instead of the whole dataset
Disadvantages:
-it is not guaranteed to converge or find the optimal solution on any single run
-however, the updates are correct on average, so with enough iterations it gets close to the optimum
What is commonly the best learning rate
learning rate η_t = C / √t, for some constant C; the learning rate decreases as the algorithm begins to converge
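In code this schedule is just the following (C is an arbitrary constant chosen by the user; 0.1 here is purely illustrative):

def learning_rate(t, C=0.1):
    # eta_t = C / sqrt(t), for t = 1, 2, 3, ...
    return C / (t ** 0.5)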
What is minibatch SGD
rather than selecting individual points during gradient descent, we randomly select a batch of points and average their gradients (see the sketch below)
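A sketch of one minibatch update under the same squared-error assumption as above:

import numpy as np

def minibatch_step(w, X, y, lr, batch_size, rng):
    idx = rng.choice(len(y), size=batch_size, replace=False)  # random batch of points
    Xb, yb = X[idx], y[idx]
    grad = (2.0 / batch_size) * Xb.T @ (Xb @ w - yb)          # gradient averaged over the batch
    return w - lr * grad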
What are common choices of batch size
32, 64, 128
What are the two approaches to sampling
sampling with replacement and sampling without replacement
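A quick illustration of the two options with numpy (10 points, drawing 5; purely illustrative):

import numpy as np

rng = np.random.default_rng(0)
rng.choice(10, size=5, replace=True)    # with replacement: a point can be drawn more than once
rng.choice(10, size=5, replace=False)   # without replacement: each point drawn at most once per pass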
Describe intuition behind margin-based loss
if y * w^T x is positive the prediction is correct; if it is negative the prediction is incorrect:
-w^T x is the raw prediction; we take its sign to get the predicted label (negative or positive). If the prediction has the same sign as the true label, it is correct; otherwise it is not.
-multiplying values of opposite sign gives a negative result, and multiplying values of the same sign gives a positive result, so y * w^T x > 0 exactly when the prediction is correct (see the sketch below)
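A small sketch of that sign check (labels assumed to be -1 or +1):

import numpy as np

def margin(w, x, y):
    return y * (w @ x)            # positive when sign(w^T x) matches y

def is_correct(w, x, y):
    return margin(w, x, y) > 0    # same signs multiply to a positive value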
What is loss function for margin based loss
L(y predicted, y actual) = g(y predicted * y actual), where g is a decreasing function of that product (the margin)
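Two standard choices of decreasing g, not stated on the card but commonly used, written as functions of the margin m = y * w^T x:

import numpy as np

def hinge(m):
    return np.maximum(0.0, 1.0 - m)     # hinge loss (used in SVMs)

def logistic(m):
    return np.log(1.0 + np.exp(-m))     # logistic loss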
Why do we want g to be a decreasing function
-we want to minimise the loss L; since g is a decreasing function of the margin, minimising L is equivalent to maximising the margin
-maximising the margin pushes predictions further onto the correct side of the boundary, which means better model performance