Dual SVMs Flashcards
(15 cards)
What is the Lagrange relaxation for SVMs?
min w,b {1/2 ∥w∥^2 + Σ a(n)(1 − y(n)(wTϕ(x(n)) + b))}
Subject to: a(n) ≥ 0, ∀(x(n), y(n)) ∈ 𝒯
What is the primal formulation for SVMs?
min w,b {1/2 ∥w∥^2}
Subject to: y(n)(wTϕ(x(n)) + b) ≥ 1
∀(x(n), y(n)) ∈ 𝒯
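A minimal sketch of this primal QP in code, assuming cvxpy is available, using the identity feature map ϕ(x) = x and a tiny hand-made separable dataset (the data and variable names are illustrative, not from the source):

```python
import cvxpy as cp
import numpy as np

# Tiny, linearly separable toy data (illustrative only).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w = cp.Variable(X.shape[1])
b = cp.Variable()

# min 1/2 ||w||^2  subject to  y(n)(w^T x(n) + b) >= 1 for all n.
problem = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w)),
                     [cp.multiply(y, X @ w + b) >= 1])
problem.solve()
print(w.value, b.value)
```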
What is the dual formulation for SVMs?
max a min w,b {1/2 ∥w∥^2 + Σ a(n)(1 − y(n)(wTϕ(x(n)) + b))}
Subject to: a(n) ≥ 0, ∀(x(n), y(n)) ∈ 𝒯
What is the minimax primal formulation for SVMs?
min w,b max a {1/2 ∥w∥^2 + Σ a(n)(1 − y(n)(wTϕ(x(n)) + b))}
Subject to: a(n) ≥ 0, ∀(x(n), y(n)) ∈ 𝒯
How do we get from a primal formulation to a dual formulation?
- Rewrite the constraints into KKT standard form (condition ≤ 0), i.e. 1 − y(n)(wTϕ(x(n)) + b) ≤ 0
- Add a Lagrange multiplier a(n) ≥ 0 for each constraint to form the Lagrangian
- Wrap the Lagrangian in a max over the multipliers and a min over the original variables: max a min w,b (written out below)
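Written out as one chain (a LaTeX restatement of the cards above, nothing new assumed):

```latex
% Primal in KKT standard form:
\min_{w,b}\ \tfrac{1}{2}\|w\|^2
  \quad \text{s.t.} \quad 1 - y^{(n)}\!\left(w^\top\phi(x^{(n)}) + b\right) \le 0
% One multiplier a^{(n)} \ge 0 per constraint gives the Lagrangian:
L(w,b,a) = \tfrac{1}{2}\|w\|^2
  + \sum_n a^{(n)}\!\left(1 - y^{(n)}(w^\top\phi(x^{(n)}) + b)\right)
% Swapping min and max turns the minimax primal into the dual:
\max_{a \ge 0}\ \min_{w,b}\ L(w,b,a)
```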
How do we remove w and b from the dual formulation? What are the equations?
Once a is fixed, the inner minimisation over w and b is unconstrained, so at the optimum ∇wL = 0.
=> w − Σ a(n)y(n)ϕ(x(n)) = 0
=> w = Σ a(n)y(n)ϕ(x(n))
Since the minimisation is also w.r.t. b, setting ∂L/∂b = 0 gives:
=> Σ a(n)y(n) = 0
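Substituting these two results back into the Lagrangian eliminates w and b (the b term vanishes because Σ a(n)y(n) = 0), which yields the objective in the next card:

```latex
L(a) = \sum_n a^{(n)}
  - \tfrac{1}{2} \sum_n \sum_m a^{(n)} a^{(m)} y^{(n)} y^{(m)}
      \,\phi(x^{(n)})^\top \phi(x^{(m)})
```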
What is the dual formulation for SVMs after removing w and b?
argmax a L(a) = Σ a(n) − 1/2 Σ Σ a(n)a(m)y(n)y(m)k(x(n), x(m))
Subject to: a(n) ≥ 0, ∀n ∈ {1,⋯,N} and Σ a(n)y(n) = 0
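A minimal sketch of this dual QP, again assuming cvxpy and the same illustrative toy data as the primal sketch (psd_wrap just asserts that the Gram-based matrix is positive semidefinite so the solver accepts it):

```python
import cvxpy as cp
import numpy as np

# Same illustrative toy data as the primal sketch above.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

K = (1.0 + X @ X.T) ** 2           # polynomial kernel Gram matrix, p = 2
Q = np.outer(y, y) * K             # Q[n, m] = y(n) y(m) k(x(n), x(m))

a = cp.Variable(len(y))
objective = cp.Maximize(cp.sum(a) - 0.5 * cp.quad_form(a, cp.psd_wrap(Q)))
constraints = [a >= 0, cp.sum(cp.multiply(a, y)) == 0]
cp.Problem(objective, constraints).solve()

support = a.value > 1e-6           # a(n) > 0 marks the support vectors
```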
What is the formula for the kernel trick?
k(x, z) = ϕ(x)Tϕ(z)
For example, the polynomial kernel: k(x, z) = (1 + xTz)^p, where p is the polynomial order.
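A quick numerical check of the equality for p = 2 in two dimensions, where the explicit feature map has six basis functions (a sketch; this is the standard feature map for that kernel):

```python
import numpy as np

def poly_kernel(x, z, p=2):
    # Kernel trick: the feature-space inner product, computed in input space.
    return (1.0 + x @ z) ** p

def phi(x):
    # Explicit feature map matching (1 + x^T z)^2 for 2-D inputs.
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([1.0, s * x1, s * x2, x1 ** 2, s * x1 * x2, x2 ** 2])

x, z = np.array([0.5, -1.0]), np.array([2.0, 3.0])
assert np.isclose(poly_kernel(x, z), phi(x) @ phi(z))  # same value, no ϕ needed
```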
Why do we use the kernel trick?
Because the kernel computes the inner product of the transformed feature vectors directly from the original inputs, so we never have to evaluate the (possibly very high-dimensional) basis functions ϕ explicitly.
Whats the hypothesis for dual SVMs? (Hint: uses the kernel trick)
h(x) = Σ a(n)y(n)k(x, x(n)) + b
(summing over n ∈ S, the support vectors, only)
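A minimal sketch of this hypothesis, assuming the support-vector arrays come from a solver such as the dual sketch above (names are illustrative):

```python
import numpy as np

def poly_kernel(x, z, p=2):
    return (1.0 + x @ z) ** p

def h(x, sv_x, sv_y, sv_a, b, kernel=poly_kernel):
    # h(x) = sum over n in S of a(n) y(n) k(x, x(n)) + b
    return sum(a_n * y_n * kernel(x, x_n)
               for x_n, y_n, a_n in zip(sv_x, sv_y, sv_a)) + b

# The predicted class is sign(h(x)).
```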
What is the value of a(n) when the training example is a support vector?
a(n) ≥ 0 always; when a(n) > 0, the training example is a support vector.
KKT complementary slackness states a(n)(1 − y(n)(wTϕ(x(n)) + b)) = 0, so when a(n) > 0 the second factor must equal 0. This rearranges to y(n)h(x(n)) = 1, meaning the support vector lies exactly on the margin.
Why can we store only the support vectors when making predictions?
Because examples that aren't support vectors have a Lagrange multiplier a(n) = 0, so their terms contribute nothing to the sum in h(x).
How can we calculate the value of b and what is the final formula?
For a given support vector n, y(n)h(x(n)) = 1. We can then take the steps:
- multiply both sides by y(n)
- substitute y(n)^2 = 1 to get h(x(n)) = y(n)
- expand the hypothesis: y(n) = Σ a(m)y(m)k(x(n), x(m)) + b
- rearrange for b
=> b = y(n) − Σ a(m)y(m)k(x(n), x(m))
where m ∈ S
What is the equation for the average b over all support vectors?
b = (1 / Ns) Σ (y(n) − Σ a(m)y(m)k(x(n), x(m)))
where n, m ∈ S and Ns is the number of support vectors
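A minimal sketch of this average, reusing the illustrative support-vector names from the hypothesis sketch above:

```python
import numpy as np

def average_b(sv_x, sv_y, sv_a, kernel):
    # b = (1 / Ns) * sum over n in S of
    #     ( y(n) - sum over m in S of a(m) y(m) k(x(n), x(m)) )
    return np.mean([y_n - sum(a_m * y_m * kernel(x_n, x_m)
                              for x_m, y_m, a_m in zip(sv_x, sv_y, sv_a))
                    for x_n, y_n in zip(sv_x, sv_y)])
```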
What is the benefit of getting an average b?
We get a more numerically stable solution than relying on a single support vector, whose multiplier may carry solver error.