Dual SVMs Flashcards

(15 cards)

1
Q

What is the Lagrange relaxation for SVMs?

A

min_{w,b} { (1/2)∥w∥² + Σ_n a(n)(1 − y(n)(wᵀϕ(x(n)) + b)) }

Subject to: a(n) ≥ 0, ∀(x(n), y(n)) ∈ 𝒯

2
Q

What is the primal formulation for SVMs?

A

min_{w,b} { (1/2)∥w∥² }

Subject to: y(n)(wᵀϕ(x(n)) + b) ≥ 1, ∀(x(n), y(n)) ∈ 𝒯

3
Q

What is the dual formulation for SVMs?

A

max_a min_{w,b} { (1/2)∥w∥² + Σ_n a(n)(1 − y(n)(wᵀϕ(x(n)) + b)) }

Subject to: a(n) ≥ 0, ∀(x(n), y(n)) ∈ 𝒯

4
Q

What is the minimax primal formulation for SVMs?

A

min_{w,b} max_a { (1/2)∥w∥² + Σ_n a(n)(1 − y(n)(wᵀϕ(x(n)) + b)) }

Subject to: a(n) ≥ 0, ∀(x(n), y(n)) ∈ 𝒯

5
Q

How do we get from a primal formulation to a dual formulation?

A
  • Rewrite the constraints in KKT form (each constraint expressed as ≤ 0)
  • Introduce a Lagrange multiplier a(n) ≥ 0 for each constraint and add the weighted constraints to the objective
  • Wrap the resulting Lagrangian with max over the multipliers and min over w, b (worked through in the sketch below)
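
A minimal worked sketch of this recipe applied to the SVM margin constraint (same symbols as the other cards, written as LaTeX):

```latex
% Step 1: rewrite each margin constraint  y^{(n)}(w^T\phi(x^{(n)}) + b) \ge 1
%         in KKT form  g_n \le 0:
%             g_n(w, b) = 1 - y^{(n)}\bigl(w^T\phi(x^{(n)}) + b\bigr) \le 0
% Step 2: attach a multiplier a^{(n)} \ge 0 to each g_n and add it to the objective:
\[
  L(w, b, a) = \tfrac{1}{2}\lVert w \rVert^2
             + \sum_{n} a^{(n)} \bigl(1 - y^{(n)}(w^T\phi(x^{(n)}) + b)\bigr)
\]
% Step 3: wrap with max over the multipliers and min over (w, b):
\[
  \max_{a \ge 0} \; \min_{w,\, b} \; L(w, b, a)
\]
```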
6
Q

How do we remove w and b from the dual formulation? What are the equations?

A

Once a is fixed, the inner minimisation over w and b is unconstrained, so at the optimum ∇_w L = 0:

=> w − Σ_n a(n)y(n)ϕ(x(n)) = 0
=> w = Σ_n a(n)y(n)ϕ(x(n))

Since the minimisation is also w.r.t. b, setting ∂L/∂b = 0 gives:

=> Σ_n a(n)y(n) = 0

7
Q

What is the dual formulation for SVMs after removing w and b?

A

argmax_a L(a) = Σ_n a(n) − (1/2) Σ_n Σ_m a(n)a(m) y(n)y(m) k(x(n), x(m))

Subject to: a(n) ≥ 0, ∀n ∈ {1,⋯,N} and Σ_n a(n)y(n) = 0
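
A minimal numerical sketch of this dual problem, using a hand-made toy data set and scipy's general-purpose SLSQP solver rather than a dedicated SVM solver (all names below are assumptions for illustration):

```python
import numpy as np
from scipy.optimize import minimize

def poly_kernel(X1, X2, p=2):
    # polynomial kernel (1 + x^T z)^p for every pair of rows of X1 and X2
    return (1.0 + X1 @ X2.T) ** p

# tiny linearly separable toy set (illustration only, not from the cards)
X = np.array([[2.0, 2.0], [1.5, 2.5], [-1.0, -1.5], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
K = poly_kernel(X, X)
N = len(y)

def neg_dual(a):
    # negate L(a) = Σ_n a(n) − (1/2) (a∘y)^T K (a∘y) so we can minimise it
    return -(a.sum() - 0.5 * (a * y) @ K @ (a * y))

res = minimize(
    neg_dual,
    x0=np.zeros(N),
    bounds=[(0.0, None)] * N,                              # a(n) >= 0
    constraints=[{"type": "eq", "fun": lambda a: a @ y}],  # Σ_n a(n) y(n) = 0
)
a = res.x
print(np.round(a, 4))  # non-zero multipliers mark the support vectors
```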

8
Q

What is the formula for the kernel trick?

A

k(x(n), x(m)) = ϕ(x(n))ᵀϕ(x(m))

For example, the polynomial kernel k(x, z) = (1 + xᵀz)^p, where p is the polynomial order.
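
A minimal numerical check of the trick for p = 2 and 2-D inputs (the explicit feature map below is written out only for this check):

```python
import numpy as np

def phi(v):
    # explicit feature map whose dot product reproduces (1 + x^T z)^2 in 2-D
    x1, x2 = v
    return np.array([1.0,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 ** 2, x2 ** 2,
                     np.sqrt(2) * x1 * x2])

x = np.array([0.3, -1.2])
z = np.array([2.0, 0.5])

print((1.0 + x @ z) ** 2)  # kernel value computed in the original 2-D space
print(phi(x) @ phi(z))     # same number via the 6-D transformed space
```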

9
Q

Why do we use the kernel trick?

A

Because the kernel returns the inner product of the transformed feature vectors directly from the original inputs, so we never have to compute the basis functions ϕ(x) explicitly.

10
Q

What's the hypothesis for dual SVMs? (Hint: it uses the kernel trick)

A

h(x) = Σ_n a(n)y(n)k(x, x(n)) + b

(sum taken over n ∈ S, the support vectors only)
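
A minimal sketch of evaluating this hypothesis; a, X, y, b and poly_kernel are assumed to come from a solved dual problem (e.g. the sketch under card 7), and only rows with a(n) > 0 are used:

```python
import numpy as np

def predict(x_new, a, X, y, b, kernel, tol=1e-6):
    # sign of h(x) = Σ_{n∈S} a(n) y(n) k(x, x(n)) + b
    S = a > tol                                      # support-vector mask
    k = kernel(x_new.reshape(1, -1), X[S]).ravel()   # k(x, x(n)) for n ∈ S
    return np.sign((a[S] * y[S]) @ k + b)
```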

11
Q

What is the value of a(n) when the training example is a support vector?

A

a(n) is either 0 or > 0; when a(n) > 0, the training example is a support vector.

Since KKT complementary slackness states a(n)(1 − y(n)(wᵀϕ(x(n)) + b)) = 0 and a(n) > 0, the second factor must be 0. This rearranges to y(n)h(x(n)) = 1, so the support vector lies exactly on the margin.

12
Q

Why can we store only the support vectors when making predictions?

A

Because training examples that aren't support vectors have a Lagrange multiplier of 0, so their terms contribute nothing to the sum in h(x).

13
Q

How can we calculate the value of b and what is the final formula?

A

For a given support vector, y(n)h(x(n)) = 1. We can then take the following steps:
- multiply both sides by y(n)
- use y(n)² = 1 to get h(x(n)) = y(n)
- expand h(x(n)) = Σ_m a(m)y(m)k(x(n), x(m)) + b and set it equal to y(n)
- rearrange for b

=> b = y(n) − Σ_m a(m)y(m)k(x(n), x(m))
where m ∈ S

14
Q

What is the equation for the average b over all support vectors?

A

b = (1/N_S) Σ_n ( y(n) − Σ_m a(m)y(m)k(x(n), x(m)) )

where n, m ∈ S and N_S is the number of support vectors
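
A minimal sketch of this average; K is assumed to be the precomputed kernel matrix k(x(n), x(m)) over the training set and a the multipliers from the dual solution:

```python
import numpy as np

def intercept(a, y, K, tol=1e-6):
    # b = (1/N_S) Σ_{n∈S} ( y(n) − Σ_{m∈S} a(m) y(m) k(x(n), x(m)) )
    S = a > tol                                        # support-vector mask
    residuals = y[S] - K[np.ix_(S, S)] @ (a[S] * y[S])
    return residuals.mean()
```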

15
Q

What is the benefit of getting an average b?

A

We get a more numerically stable estimate of b than relying on a single support vector.
