GD, SGD, and SVM Flashcards

1
Q

Gradient Descent

A

General approach for minimizing a differentiable convex function f(w).

It measures the local gradient of the error function with respect to the parameter vector w and moves in the direction of descending gradient.

GD algorithm

w(0) <- 0
for t <- 0 to T-1 do
- w(t+1) <- w(t) - eta * grad(f(w(t)))
return barw = (1/T) SUM_t w(t)
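
A minimal Python sketch of the loop above, assuming a simple convex objective chosen for illustration (the quadratic f, the step size eta, and T are not from the card):

import numpy as np

def gradient_descent(grad_f, dim, eta=0.1, T=100):
    """Run T steps of GD from w = 0 and return the average iterate barw."""
    w = np.zeros(dim)
    iterates = []
    for _ in range(T):
        iterates.append(w.copy())
        w = w - eta * grad_f(w)        # step in the direction of descending gradient
    return np.mean(iterates, axis=0)   # barw = (1/T) * sum of the iterates

# Example: f(w) = ||w - c||^2 has gradient 2*(w - c) and minimizer w = c
c = np.array([1.0, -2.0])
print(gradient_descent(lambda w: 2 * (w - c), dim=2))   # close to c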

2
Q

Stochastic Gradient Descent

A

Instead of using the exact gradient, we take a (random) vector whose expected value equals the gradient direction.

SGD algorithm

w(0) <- 0
for t <- 0 to T-1 do
- choose vt at random from a distribution such that E[vt | w(t)] = grad(f(w(t)))
/// vt has expected value equal to the gradient of f at w(t)
- w(t+1) <- w(t) - eta * vt
return barw = (1/T) SUM_t w(t)
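
A minimal Python sketch, assuming the common case where f is an average of per-example losses and vt is the gradient on one randomly chosen example (the toy regression data, eta, and T are illustrative assumptions):

import numpy as np

def sgd(grad_on_example, m, dim, eta=0.01, T=1000, seed=0):
    """Each step uses the gradient on one random example; its expected
    value (given w(t)) is the gradient of the average loss f."""
    rng = np.random.default_rng(seed)
    w = np.zeros(dim)
    iterates = []
    for _ in range(T):
        iterates.append(w.copy())
        i = rng.integers(m)                  # pick a random example
        w = w - eta * grad_on_example(w, i)  # unbiased gradient step
    return np.mean(iterates, axis=0)         # barw = average iterate

# Example: squared loss on a toy linear-regression dataset
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = X @ np.array([1.0, -2.0])
grad_i = lambda w, i: 2 * (X[i] @ w - y[i]) * X[i]
print(sgd(grad_i, m=50, dim=2))              # close to [1, -2]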

3
Q

Hard-SVM optimization problem

A

Learning rule in which we return an ERM hyperplane that separates the training set with the largest possible margin (only for linearly separable data).

Computational problem:

argmax_{(w,b): ||w|| = 1} ( min_i |<w,xi> + b| )
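
A small Python sketch of the quantity being maximized: for a hyperplane <w,x> + b = 0, the margin is the minimum distance of the training points to it (the toy data and the candidate (w, b) are illustrative assumptions):

import numpy as np

def margin(w, b, X):
    """min_i |<w, x_i> + b| / ||w||: the distance of the closest point
    to the hyperplane; equals min_i |<w, x_i> + b| when ||w|| = 1."""
    return np.min(np.abs(X @ w + b)) / np.linalg.norm(w)

X = np.array([[2.0, 1.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -3.0]])
print(margin(np.array([1.0, 1.0]), 0.0, X))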

4
Q

Hard-SVM: equivalent formulation, and quadratic formulation

A

Due to the separability assumption:

argmax_{(w,b): ||w|| = 1} ( min_i yi ( <w,xi> + b ) )

Quadratic formulation: minimizing ||w|| under unit-margin constraints is equivalent (the solution, normalized by ||w||, gives the max-margin hyperplane):

min_{(w,b)} ||w||^2 subject to, for all i, yi ( <w,xi> + b ) >= 1
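
A hedged sketch of the quadratic formulation using the cvxpy library (assumed available), on toy linearly separable data chosen for illustration:

import numpy as np
import cvxpy as cp

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w = cp.Variable(2)
b = cp.Variable()
# min ||w||^2  subject to  yi(<w,xi> + b) >= 1 for all i
constraints = [cp.multiply(y, X @ w + b) >= 1]
problem = cp.Problem(cp.Minimize(cp.sum_squares(w)), constraints)
problem.solve()
print(w.value, b.value)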

5
Q

Definition of support vectors for Hard-SVM

A

Support vectors = the vectors at minimum distance from the separating hyperplane defined by w0;
they are the only ones that matter for defining w0.
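
A hedged sketch using scikit-learn (assumed available): a linear SVC with a very large C approximates Hard-SVM, and the fitted model exposes its support vectors (the toy data are illustrative):

import numpy as np
from sklearn.svm import SVC

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)   # large C ~ hard margin
print(clf.support_vectors_)                   # points at minimum distance from the hyperplane
print(clf.coef_, clf.intercept_)              # w0 and b are determined by these points alone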

6
Q

Soft-SVM optimization problem, hinge loss

A

Used when the training set is not linearly separable. It allows some points to lie inside the margin or to be wrongly classified, i.e. to violate the Hard-SVM constraint

yi ( <w,xi> + b ) >= 1

Soft-SVM constraints:
- slack variables: etai >= 0
- for each i : yi ( <w,xi> + b ) >= 1 - etai
- etai measures how much the constraint yi ( <w,xi> + b ) >= 1 is violated

Soft-SVM minimizes a combination of
- norm of w (margin)
- average of etai (violation of the constraints)

Soft-SVM optimization problem

input: (x1,y1)…(xm,ym), parameter lambda > 0
solve:

min_{(w,b,eta)} ( lambda ||w||^2 + (1/m) SUM_i etai )

subject to, for all i: yi ( <w,xi> + b ) >= 1 - etai and etai >= 0
output: w, b

Hinge loss: gives an equivalent unconstrained formulation, with
lhinge((w,b),(x,y)) = max { 0, 1 - y ( <w,x> + b ) }

LShinge(w,b) = (1/m) SUM_i lhinge((w,b),(xi,yi))

Hinge formulation:
min_{(w,b)} ( lambda ||w||^2 + LShinge(w,b) )
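
A minimal Python sketch of the hinge-formulation objective for a given (w, b), on a toy dataset (the data and lambda are illustrative assumptions):

import numpy as np

def soft_svm_objective(w, b, X, y, lam):
    """lambda * ||w||^2 + (1/m) * SUM_i max{0, 1 - yi(<w, xi> + b)}"""
    hinge = np.maximum(0.0, 1.0 - y * (X @ w + b))   # per-example hinge loss
    return lam * np.dot(w, w) + hinge.mean()

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])
print(soft_svm_objective(np.array([0.5, 0.5]), 0.0, X, y, lam=0.1))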

7
Q

SGD for solving soft-SVM

A

We want to solve

min_w ( (lambda/2) ||w||^2 + (1/m) SUM_i max { 0, 1 - yi <w,xi> } )

Note: it is standard to add a 1/2 in the regularization term to simplify some computations.
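
A hedged Python sketch of SGD on this objective in the Pegasos style: each step takes a subgradient on one random example (the toy data, lambda, and T are illustrative assumptions):

import numpy as np

def sgd_soft_svm(X, y, lam=0.1, T=1000, seed=0):
    """SGD for min_w (lam/2)||w||^2 + (1/m) SUM_i max{0, 1 - yi<w,xi>}."""
    rng = np.random.default_rng(seed)
    m, d = X.shape
    w = np.zeros(d)
    iterates = []
    for t in range(1, T + 1):
        iterates.append(w.copy())
        eta = 1.0 / (lam * t)            # common step-size choice for this objective
        i = rng.integers(m)
        grad = lam * w                   # gradient of the regularizer
        if y[i] * (X[i] @ w) < 1:        # hinge term is active: add its subgradient
            grad = grad - y[i] * X[i]
        w = w - eta * grad
    return np.mean(iterates, axis=0)     # return the average iterate barw

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])
print(sgd_soft_svm(X, y))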

8
Q

Hard-SVM dual formulation

A
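
A standard form of the dual (as usually derived for Hard-SVM with a bias term b):

max over alpha with alphai >= 0 and SUM_i alphai yi = 0 of

SUM_i alphai - (1/2) SUM_i SUM_j alphai alphaj yi yj <xi,xj>

The primal solution is recovered as w = SUM_i alphai yi xi, and the support vectors are exactly the xi with alphai > 0.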