Topic 8: Intro to Gradient Descent Flashcards

1
Q

What can we say about ε in GD

A

ε is the “target accuracy”.
Gradient descent's runtime scales logarithmically in 1/ε:
O(log(1/ε)) steps are needed to reach accuracy ε.
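
A minimal sketch of why the step count grows like log(1/ε), using an assumed toy objective F(x) = x²/2 (not from the lecture): each GD step shrinks the distance to the minimiser by a constant factor, so repeatedly shrinking the error reaches accuracy ε in roughly log(1/ε) steps.

import math

def steps_to_accuracy(eps, x0=1.0, eta=0.5):
    # GD on F(x) = x^2 / 2, where F'(x) = x, so each step multiplies x by (1 - eta)
    x, steps = x0, 0
    while abs(x) > eps:
        x = x - eta * x   # x_{t+1} = x_t - eta * F'(x_t)
        steps += 1
    return steps

for eps in (1e-1, 1e-2, 1e-4, 1e-8):
    # the printed step counts grow linearly in log(1/eps), matching O(log(1/eps))
    print(eps, steps_to_accuracy(eps), math.log(1 / eps))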

2
Q

Why is x0 ≠ 0 in GD

A

The gradient descent update is iterative:
x_{t+1} = x_t − η_t F′(x_t)

When F′(x) is proportional to x, every later x_t is a multiple of x_0, so starting at x_0 = 0 keeps every iterate at 0 and the algorithm never moves.
More generally, if F′(x_0) = 0
(i.e. x_0 is a critical point), the algorithm never moves.
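
A toy illustration of the critical-point issue (example function assumed, not from the lecture): F(x) = x⁴ − x² has a critical point at x = 0 that is a local maximum, so GD started exactly there never moves, while any small nonzero start escapes to a minimiser.

def gd(x0, eta=0.1, T=100):
    # F(x) = x**4 - x**2, so F'(x) = 4*x**3 - 2*x and F'(0) = 0 (a critical point)
    x = x0
    for _ in range(T):
        x = x - eta * (4 * x**3 - 2 * x)
    return x

print(gd(0.0))    # stays exactly at 0: the gradient there is 0, so no step is ever taken
print(gd(0.01))   # a tiny nonzero start escapes and converges to about 0.707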

3
Q

Where do we want xt to tend

A

x_t → 0 as t → ∞ (the iterates should tend to the minimiser).

4
Q

What is an objective function

A

The loss function
The function you want to maximise or minimise

5
Q

Describe the GD algorithm steps on a differentiable function F

A

Input: an initial point w_1 and the number of steps T > 0
Choose step sizes η_t > 0
For t = 1 to T:
w_{t+1} = w_t − η_t · ∇F(w_t)
Output w_{T+1}
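
A minimal Python sketch of the steps above (the names and the fixed-vs-callable step size handling are my own choices, not from the lecture):

import numpy as np

def gradient_descent(grad_F, w1, T, eta):
    # Plain GD: w_{t+1} = w_t - eta_t * grad F(w_t) for t = 1, ..., T
    w = np.asarray(w1, dtype=float)
    for t in range(1, T + 1):
        eta_t = eta(t) if callable(eta) else eta   # allow a fixed or per-step step size
        w = w - eta_t * grad_F(w)
    return w

# usage: minimise F(w) = ||w||^2 / 2, whose gradient is w itself
w_out = gradient_descent(grad_F=lambda w: w, w1=[4.0, -2.0], T=50, eta=0.1)
print(w_out)  # close to the minimiser [0, 0]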

6
Q

what is a differentiable function

A

A function is differentiable if it has a well-defined derivative at every point in its domain.

7
Q

What properties does a function need to have for gd

A

Differentiable
Convexity and Lipschitzness are ideal
Existence of at least one minimum

8
Q

why does gradient descent converge faster on strongly convex functions compared to non-convex ones

A

Strong convexity:
guarantees a unique global minimum
gives gradient descent a geometric convergence rate guarantee (the O(log(1/ε)) step count from card 1)
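
A standard bound behind this (stated under the extra assumption that F is also L-smooth, which may not match the lecture's exact setting): if F is μ-strongly convex and L-smooth, then GD with constant step size η = 1/L satisfies

‖w_{t+1} − w*‖² ≤ (1 − μ/L) · ‖w_t − w*‖²

so the distance to the unique minimiser w* shrinks by a constant factor each step, which is exactly the geometric decay behind the O(log(1/ε)) step count in card 1. Non-convex functions give no such contraction, and GD may only reach a stationary point.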

9
Q

what is k in gd

A

k is a constant parameter chosen from the interval (0, 1).
It is a tuning parameter that controls the size of the step length.
It is not present in every variant of GD.

10
Q

Will GD find local minima for non-convex, non-Lipschitz functions?

A

Yes, it can.
In practice functions are often neither convex nor Lipschitz.
A local minimum can still be found through careful choice of the step size and other algorithm parameters (though there is no guarantee it is the global minimum).
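
A toy sketch of this (example function and step sizes assumed, not from the lecture): F(x) = (x² − 1)² is non-convex and its gradient is not globally Lipschitz, yet GD with a small enough step size still reaches a local minimum, while a step size that is too large makes the iterates blow up.

def gd_1d(grad, x0, eta, T=200):
    x = x0
    for _ in range(T):
        x = x - eta * grad(x)
        if abs(x) > 1e6:          # iterates are diverging; stop early
            return float("inf")
    return x

# F(x) = (x**2 - 1)**2, with gradient 4*x*(x**2 - 1); minima at x = -1 and x = 1
def grad_F(x):
    return 4 * x * (x**2 - 1)

print(gd_1d(grad_F, x0=3.0, eta=0.01))   # small step size: converges to the minimiser x = 1
print(gd_1d(grad_F, x0=3.0, eta=0.1))    # step size too large: the iterates diverge (inf)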
