Sequential Minimal Optimisation (SMO) Flashcards
(11 cards)
Why might we not use gradient descent or Newton-Raphson methods?
Because their update steps can violate the constraints (0 <= a(n) <= C and Σ a(n)y(n) = 0), and they can be too costly on large problems.
What is sequential minimal optimisation (SMO)?
Sequential Minimal Optimisation breaks the large optimisation problem into the smallest possible sub-problems (two Lagrange multipliers at a time), each of which can be solved analytically.
What do we use to decide which sub-problem to solve next in SMO?
We use heuristics to decide which pair of multipliers to optimise next.
How do we update values without breaking constraints?
Updating a single value would break the constraint Σ a(n)y(n) = 0, so we update two values at a time, with one acting as a counterweight for the other.
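For example, if a(j) changes by δ, the equality constraint forces a(i) to change by -y(i)y(j)δ, because y(i)Δa(i) + y(j)Δa(j) must equal 0.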
How do we run SMO?
Initialise a to any value that satisfies the constraints, e.g. a(n) = 0 ∀n ∈ {1,...,N}; then, for a maximum number of iterations, pick a pair and optimise L(a) w.r.t. a(i) and a(j).
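As a bridge between this card and the ones that follow, here is a minimal runnable sketch in the style of the "simplified SMO" variant: i is scanned in order and j is chosen at random, rather than by the heuristics of the last two cards. The function names and the bias update are illustrative assumptions, not part of the cards.

```python
import numpy as np

def simplified_smo(X, y, C, k, max_iters=100, tol=1e-3):
    N = len(y)
    a = np.zeros(N)     # a(n) = 0 for all n satisfies both constraints
    b = 0.0

    def h(n):           # current decision value h(x(n))
        return sum(a[m] * y[m] * k(X[m], X[n]) for m in range(N)) + b

    for _ in range(max_iters):
        changed = 0
        for i in range(N):
            E_i = h(i) - y[i]
            # only optimise pairs whose a(i) violates the KKT conditions (within tol)
            if not ((y[i] * E_i < -tol and a[i] < C) or (y[i] * E_i > tol and a[i] > 0)):
                continue
            j = int(np.random.choice([m for m in range(N) if m != i]))
            E_j = h(j) - y[j]
            if y[i] == y[j]:                      # bounds on a(j); see the L/H card
                L, H = max(0.0, a[i] + a[j] - C), min(C, a[i] + a[j])
            else:
                L, H = max(0.0, a[j] - a[i]), min(C, C + a[j] - a[i])
            eta = k(X[i], X[i]) + k(X[j], X[j]) - 2.0 * k(X[i], X[j])
            if L == H or eta <= 0:
                continue
            ai_old, aj_old = a[i], a[j]
            aj_new = float(np.clip(aj_old + y[j] * (E_i - E_j) / eta, L, H))
            if abs(aj_new - aj_old) < 1e-5:       # step too small to count
                continue
            a[j] = aj_new
            a[i] = ai_old + y[i] * y[j] * (aj_old - aj_new)   # counterweight update
            # keep h consistent: standard bias update (not covered by the cards)
            b1 = b - E_i - y[i] * (a[i] - ai_old) * k(X[i], X[i]) \
                         - y[j] * (a[j] - aj_old) * k(X[i], X[j])
            b2 = b - E_j - y[i] * (a[i] - ai_old) * k(X[i], X[j]) \
                         - y[j] * (a[j] - aj_old) * k(X[j], X[j])
            b = b1 if 0 < a[i] < C else (b2 if 0 < a[j] < C else (b1 + b2) / 2.0)
            changed += 1
        if changed == 0:   # no KKT violations found in a full pass: done
            break
    return a, b
```

With a linear kernel, k could simply be lambda u, v: float(np.dot(u, v)).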
What are the lower and upper bounds L and H for a(j) in SMO?
If y(i) = y(j):
L = max(0, a(i) + a(j) - C)
H = min(C, a(i) + a(j))
If y(i) != y(j):
L = max(0, a(j) - a(i))
H = min(C, C + a(j) - a(i))
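A direct transcription of the two cases into a helper function (the name and argument names are assumptions):

```python
def bounds(a_i, a_j, y_i, y_j, C):
    # The box constraints 0 <= a(n) <= C plus the equality constraint confine
    # the pair to a line segment; [L, H] is that segment projected onto a(j).
    if y_i == y_j:
        return max(0.0, a_i + a_j - C), min(C, a_i + a_j)
    return max(0.0, a_j - a_i), min(C, C + a_j - a_i)
```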
What is the formula for a(j,new)?
a(j,new) = a(j) + (y(j)(E(i) - E(j))) / (k(x(i),x(i)) + k(x(j),x(j)) - 2k(x(i),x(j)))
where E(i) = h(x(i)) - y(i) is the prediction error on example i
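The same update as a hypothetical helper, where K_ii, K_jj, K_ij stand for the three kernel evaluations:

```python
def a_j_unclipped(a_j, y_j, E_i, E_j, K_ii, K_jj, K_ij):
    # eta is the curvature of L(a) along the constraint line; it is
    # non-negative whenever the kernel is positive semi-definite.
    eta = K_ii + K_jj - 2.0 * K_ij
    if eta <= 0:
        return None   # degenerate pair; full SMO evaluates the objective at L and H instead
    return a_j + y_j * (E_i - E_j) / eta
```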
How do we calculate a(j,new,clipped)?
a(j,new,clipped) =
- H if a(j,new) >= H
- L if a(j,new) <= L
- a(j,new) otherwise
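In code the three cases collapse to a single projection onto [L, H] (sketch; the name is an assumption):

```python
def clip(a_j_new, L, H):
    # Project the unconstrained optimum back onto the feasible segment [L, H].
    return max(L, min(H, a_j_new))
```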
What is the formula for a(i,new)?
a(i,new) = (ζ - a(j,new,clipped)y(j)) / y(i)
where ζ = a(i)y(i) + a(j)y(j)
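A sketch of the same step (names assumed); since y(i) ∈ {-1, +1}, dividing by y(i) is the same as multiplying by it:

```python
def a_i_new(a_i, a_j, a_j_clipped, y_i, y_j):
    zeta = a_i * y_i + a_j * y_j              # conserved by the pairwise update
    return (zeta - a_j_clipped * y_j) * y_i   # equals division by y_i, as y_i**2 == 1
```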
How do we pick the Lagrange multiplier a(i) to update in SMO?
a(i) is picked by (e.g. randomly) selecting a training example that violates the KKT conditions within a margin of error e around y(n)h(x(n)), or, more selectively, one that violates them given the margin e and also satisfies 0 < a(i) < C (a non-bound example).
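A sketch of the KKT-violation test such a picker relies on (eps plays the role of the margin e; the name and arguments are assumptions):

```python
def violates_kkt(a_n, y_n, h_n, C, eps=1e-3):
    # KKT conditions for the soft-margin dual, up to tolerance eps:
    #   a(n) = 0      =>  y(n)h(x(n)) >= 1
    #   0 < a(n) < C  =>  y(n)h(x(n)) == 1
    #   a(n) = C      =>  y(n)h(x(n)) <= 1
    r = y_n * h_n
    return (r < 1.0 - eps and a_n < C) or (r > 1.0 + eps and a_n > 0.0)
```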
How do we pick the Lagrange multiplier a(j) to update in SMO?
a(j) is picked to obtain the largest change in a(j), approximated by maximising |E(i) - E(j)|. If this does not lead to a positive improvement in the objective, we pick a different a(j), and eventually a different a(i).
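A sketch of this second-choice heuristic, assuming the errors E(n) are cached in a NumPy array:

```python
import numpy as np

def pick_j(i, E):
    # Maximise |E(i) - E(j)| as a cheap proxy for the largest step in a(j).
    diffs = np.abs(E - E[i])
    diffs[i] = -np.inf       # never pick j == i
    return int(np.argmax(diffs))
```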