Sequential Minimal Optimisation (SMO) Flashcards

(11 cards)

1
Q

Why might we not use gradient descent or Newton-Raphson methods to optimise the SVM dual objective?

A

Because their updates can violate the problem's constraints, and they can be too computationally costly.

2
Q

What is sequential minimal optimisation (SMO)?

A

Sequential Minimal Optimisation breaks the large optimisation problem into smaller sub-problems that can be solved analytically.

3
Q

What do we use to decide which sub-problem to solve first in SMO?

A

We use heuristics to decide which sub-problem to solve first.

4
Q

How do we update values without breaking constraints?

A

Updating one value on its own would break the constraint Σ a(n)y(n) = 0, so we update two values that act as counterweights: the change in a(j) offsets the change in a(i), as checked in the sketch below.
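A tiny Python check of the counterweight idea (toy numbers, hypothetical pair (i, j)): since y(n)^2 = 1, setting the change in a(j) to -y(i)y(j) times the change in a(i) leaves Σ a(n)y(n) unchanged.

y = [1, -1, 1, -1]
a = [0.2, 0.5, 0.3, 0.0]        # satisfies Σ a(n)y(n) = 0
i, j, delta = 0, 1, 0.1         # hypothetical pair and step size
a[i] += delta                   # move a(i) ...
a[j] -= y[i] * y[j] * delta     # ... and counterweight with a(j)
assert abs(sum(an * yn for an, yn in zip(a, y))) < 1e-12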

5
Q

How do we run SMO?

A

Initialise a to any value that satisfies the constraints, e.g. a(n) = 0, ∀n ∈ {1,..,N}. Then, for a maximum number of iterations, pick a pair (a(i), a(j)) and optimise L(a) w.r.t. a(i) and a(j) while holding the other multipliers fixed.
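A runnable Python sketch of this loop, assuming a precomputed kernel matrix K and a NumPy label array y in {-1, +1}; the random pair selection (in place of the heuristics of cards 10 and 11) and the bias-free decision function h(x(n)) = Σ a(m)y(m)k(x(m),x(n)) are our simplifications, not part of the cards:

import numpy as np

def smo_simplified(K, y, C, max_iters=1000, tol=1e-8):
    # Sketch only: random pair selection and no bias term.
    N = len(y)
    a = np.zeros(N)                            # feasible start: a(n) = 0 for all n
    rng = np.random.default_rng(0)
    for _ in range(max_iters):
        i, j = rng.choice(N, size=2, replace=False)
        if y[i] == y[j]:                       # bounds on a(j) (card 6)
            L, H = max(0.0, a[i] + a[j] - C), min(C, a[i] + a[j])
        else:
            L, H = max(0.0, a[j] - a[i]), min(C, C + a[j] - a[i])
        eta = K[i, i] + K[j, j] - 2 * K[i, j]  # curvature along the update direction
        if H - L < tol or eta < tol:
            continue                           # no room to move, or degenerate pair
        h = (a * y) @ K                        # h(x(n)) for every n (bias-free)
        E_i, E_j = h[i] - y[i], h[j] - y[j]    # E(n) = h(x(n)) - y(n)
        a_j_new = float(np.clip(a[j] + y[j] * (E_i - E_j) / eta, L, H))  # cards 7, 8
        a[i] += y[i] * y[j] * (a[j] - a_j_new) # card 9: keeps Σ a(n)y(n) fixed
        a[j] = a_j_new
    return a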

6
Q

What are the lower and upper bounds on a(j,new) in SMO?

A

If y(i) = y(j):

L = max(0, a(i) + a(j) - C)
H = min(C, a(i) + a(j))

If y(i) != y(j):

L = max(0, a(j) - a(i))
H = min(C, C + a(j) - a(i))
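As a Python sketch (the helper name compute_bounds is ours): this is the box that keeps both multipliers in [0, C] after the paired update.

def compute_bounds(a_i, a_j, y_i, y_j, C):
    # Bounds that a(j,new) must respect so both multipliers stay in [0, C].
    if y_i == y_j:
        return max(0.0, a_i + a_j - C), min(C, a_i + a_j)
    return max(0.0, a_j - a_i), min(C, C + a_j - a_i)

L, H = compute_bounds(0.3, 0.6, 1, -1, C=1.0)   # example: L = 0.3, H = 1.0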

7
Q

What is the formula for a(j,new)?

A

a(j,new) = a(j) + (y(j)(E(i) - E(j))) / (k(x(i),x(i)) + k(x(j),x(j)) - 2k(x(i),x(j)))

where E(i) = h(x(i)) - y(i) is the error
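A sketch of this update in Python, reusing the bias-free h(x(n)) from the card 5 sketch (the function name is ours):

import numpy as np

def a_j_unclipped(a, y, K, i, j):
    # Unconstrained optimum for a(j) along the chosen pair.
    h = (a * y) @ K                        # h(x(n)) for every n
    E_i, E_j = h[i] - y[i], h[j] - y[j]    # E(n) = h(x(n)) - y(n)
    eta = K[i, i] + K[j, j] - 2 * K[i, j]  # denominator of the update
    return a[j] + y[j] * (E_i - E_j) / eta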

8
Q

How do we calculate a(j,new,clipped)?

A

a(j,new,clipped) = {
- H if a(j,new) >= H
- L if a(j,new) <= L
- a(j,new) otherwise
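As a one-function Python sketch (the name clip_a_j is ours):

def clip_a_j(a_j_new, L, H):
    # Project the unconstrained optimum back into the box [L, H] from card 6.
    if a_j_new >= H:
        return H
    if a_j_new <= L:
        return L
    return a_j_new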

9
Q

What is the formula for a(i,new)?

A

a(i,new) = (ζ - a(j,new,clipped)y(j)) / y(i)

where ζ = a(i)y(i) + a(j)y(j)
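A sketch that uses y(i) ∈ {-1, +1}, so dividing by y(i) is the same as multiplying by it (the function name is ours):

def a_i_new(a_i, a_j, a_j_clipped, y_i, y_j):
    # zeta = a(i)y(i) + a(j)y(j) is conserved by the paired update,
    # so a(i,new) follows once a(j,new,clipped) is known.
    zeta = a_i * y_i + a_j * y_j
    return (zeta - a_j_clipped * y_j) * y_i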

10
Q

How do we pick the Lagrange multiplier a(i) to update in SMO?

A

a(i) is picked by randomly selecting a training example that violates the KKT conditions within a tolerance e around y(n)h(x(n)), or one that violates them given the tolerance e while 0 < a(i) < C (a non-bound example), as sketched below.
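A sketch of the KKT check behind this choice, under our reading of the conditions (a(n) = 0 requires y(n)h(x(n)) >= 1, 0 < a(n) < C requires it ≈ 1, a(n) = C requires it <= 1; the function name is hypothetical):

def violates_kkt(a_n, margin_n, C, eps=1e-3):
    # margin_n = y(n) * h(x(n)); eps is the tolerance e from the card.
    if a_n < C and margin_n < 1 - eps:    # margin too small, yet a(n) not at its cap C
        return True
    if a_n > 0 and margin_n > 1 + eps:    # margin already met, yet a(n) still positive
        return True
    return False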

11
Q

How do we pick the Lagrange multiplier a(j) to update in SMO?

A

a(j) is picked to obtain the largest change in a(j), approximated by maximising |E(i) - E(j)|. If this doesn't lead to a positive improvement in the objective, we pick a new a(j), or eventually a new a(i).
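A minimal sketch of this heuristic, assuming the errors E(n) = h(x(n)) - y(n) are cached in a NumPy array (the function name is ours):

import numpy as np

def pick_j(E, i):
    # Second-choice heuristic: maximise |E(i) - E(j)| as a proxy for
    # the largest possible step in a(j).
    j = int(np.argmax(np.abs(E - E[i])))
    return j if j != i else (j + 1) % len(E)  # fall back if all errors are equal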
