Soft Margin SVMs Flashcards

(16 cards)

1
Q

What causes overfitting?

A

Overfitting happens when we fit the noise in our training data, which worsens generalisation. It can also be caused by high-dimensional embeddings.

2
Q

Why would we allow some data to be misclassified?

A

To reduce overfitting: we accept some misclassified training examples instead of strictly maximising the margin on every point.

3
Q

What are slack variables ξ?

A

Slack variables measure by how much an example is allowed to violate the margin: to lie within the margin, or on the wrong side of the decision boundary.

4
Q

What does the value of the slack variable ξ tell us?

A
  • ξ = 0: on or past the margin
  • 0 < ξ < 1: within the margin
  • ξ = 1: on the decision boundary
  • ξ > 1: misclassified (each case is reproduced in the sketch below)
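A minimal numpy sketch of these four cases, assuming a linear model h(x) = wTx + b and the usual slack formula ξ(n) = max(0, 1 − y(n)h(x(n))); the weights and points are made up for illustration.

```python
import numpy as np

def slack_values(w, b, X, y):
    """xi(n) = max(0, 1 - y(n) * h(x(n))) for a linear model h(x) = w.x + b."""
    margins = y * (X @ w + b)             # y(n) * h(x(n))
    return np.maximum(0.0, 1.0 - margins)

# Made-up weights and points, one per case from the card above.
w, b = np.array([1.0, 0.0]), 0.0
X = np.array([[ 2.0, 0.0],    # y*h = 2.0  -> xi = 0.0 (on or past the margin)
              [ 0.5, 0.0],    # y*h = 0.5  -> xi = 0.5 (within the margin)
              [ 0.0, 1.0],    # y*h = 0.0  -> xi = 1.0 (on the decision boundary)
              [-1.0, 0.0]])   # y*h = -1.0 -> xi = 2.0 (misclassified)
y = np.ones(4)

print(slack_values(w, b, X, y))  # [0.  0.5 1.  2. ]
```
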
5
Q

What is the constraint when maximising the margin using slack variables?

A

y(n)h(x(n)) >= 1 - ξ(n)

6
Q

What does the hyperparameter C represent in the new margin optimisation problem?

A

C controls the trade-off between the total amount of slack and the size of the margin.

7
Q

What is the new margin optimisation problem with slack introduced? What are the constraints?

A

argmin_{w,b,ξ} {1/2 ∥w∥^2 + C Σ ξ(n)}

subject to y(n)h(x(n)) >= 1 - ξ(n) and ξ(n) >= 0, for all n
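A short numpy sketch that evaluates this objective and checks its constraints for a candidate solution; the function names are made up, and h is assumed linear in the features.

```python
import numpy as np

def primal_objective(w, xi, C):
    """1/2 ||w||^2 + C * sum_n xi(n), the objective from the card above."""
    return 0.5 * np.dot(w, w) + C * np.sum(xi)

def primal_feasible(w, b, xi, X, y):
    """Check y(n) h(x(n)) >= 1 - xi(n) and xi(n) >= 0 for every n."""
    return bool(np.all(y * (X @ w + b) >= 1.0 - xi) and np.all(xi >= 0.0))
```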

8
Q

How does changing the value of C change the optimisation problem?

A

A smaller C allows more slack (more margin violations), which can give better generalisation; a larger C penalises slack more heavily, approaching the hard-margin SVM. The sketch below shows the effect empirically.
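One way to see this is to fit a soft-margin SVM at several values of C. This sketch uses scikit-learn's SVC on synthetic overlapping blobs; the library and dataset are illustrative choices, not part of the cards.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Overlapping classes, so some slack is unavoidable.
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.0, random_state=0)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # Smaller C tolerates more margin violations, so more support vectors.
    print(f"C={C:<6} support vectors={len(clf.support_)} "
          f"train accuracy={clf.score(X, y):.2f}")
```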

9
Q

What is a soft margin?

A

A margin where we allow slack ξ(n) > 0, so some examples may lie within the margin or be misclassified.

10
Q

What are the constraints after using Lagrange relaxation for soft SVMs?

A

1 - ξ(n) - y(n)h(x(n)) <= 0
(where h(x(n)) = wTϕ(x(n)) + b)
-ξ(n) <= 0

11
Q

What is the dual formulation for soft SVMs?

A

max_{a,β} min_{w,b,ξ} {1/2 ∥w∥^2 + C Σ ξ(n) + Σ a(n)(1 − ξ(n) − y(n)(wTϕ(x(n)) + b)) − Σ β(n)ξ(n)}

a(n) >= 0, β(n) >= 0

12
Q

What is the dual formulation for soft SVMs after replacing w and b? (i.e. at the optimum when a and β are fixed)

A

argmax_a L(a) = Σ a(n) − 1/2 Σ Σ a(n)a(m) y(n)y(m) k(x(n), x(m))

where 0 <= a(n) <= C and Σ a(n)y(n) = 0
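A numpy sketch of this dual objective and its constraints, assuming a precomputed kernel (Gram) matrix K with K[n, m] = k(x(n), x(m)); the helper names are made up.

```python
import numpy as np

def dual_objective(a, y, K):
    """L(a) = sum_n a(n) - 1/2 sum_n sum_m a(n) a(m) y(n) y(m) K[n, m]."""
    ay = a * y
    return np.sum(a) - 0.5 * ay @ K @ ay

def dual_feasible(a, y, C, tol=1e-8):
    """Box constraint 0 <= a(n) <= C and equality constraint sum_n a(n) y(n) = 0."""
    return bool(np.all(a >= -tol) and np.all(a <= C + tol) and abs(a @ y) < tol)
```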

13
Q

How do we replace w and b in soft margin SVMs? How do we remove β?

A

Replacing w and b works the same way as for hard-margin SVMs: setting ∂L/∂b = 0 and ∂L/∂w = 0 gives

Σ a(n)y(n) = 0 and w = Σ a(n)y(n)ϕ(x(n))

To remove β, set ∂L/∂ξ(n) = 0:

C − a(n) − β(n) = 0, so β(n) = C − a(n)
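Both substitutions are one-liners in code; a sketch assuming Phi stores the feature vectors ϕ(x(n)) row-wise.

```python
import numpy as np

def recover_w(a, y, Phi):
    """w = sum_n a(n) y(n) phi(x(n)), with Phi[n] = phi(x(n))."""
    return (a * y) @ Phi

def recover_beta(a, C):
    """beta(n) = C - a(n), from the stationarity condition C - a(n) - beta(n) = 0."""
    return C - a
```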

14
Q

How do we know if an example is a support vector when slack is added?

A

With slack variables, an example is a support vector when a(n) > 0 and y(n)h(x(n)) = 1 - ξ(n); it then lies on the margin, within the margin, or is misclassified.

If ξ(n) = 0, it lies exactly on the margin (see the sketch below).
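A sketch that labels each training point from its dual variable a(n) and slack ξ(n); the tolerance and function name are my own choices.

```python
import numpy as np

def sv_status(a, xi, tol=1e-8):
    """Classify each point's position from its dual variable a(n) and slack xi(n)."""
    status = np.full(len(a), "not a support vector", dtype=object)
    sv = a > tol                                   # support vectors have a(n) > 0
    status[sv & (xi <= tol)] = "on the margin"     # xi = 0
    status[sv & (xi > tol) & (xi < 1.0)] = "within the margin"
    status[sv & (xi >= 1.0)] = "misclassified (or on the boundary if xi = 1)"
    return status
```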

15
Q

What do we do when there are no support vectors on the margin for calculating b?

A

We can calculate b from any support vector, but the expression now depends upon ξ(n), since support vectors satisfy y(n)h(x(n)) = 1 - ξ(n) (see the sketch below).
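A sketch of the usual recipe, assuming the dual variables a, labels y, and kernel matrix K are given: average b over the margin support vectors, falling back to the ξ-dependent relation when none exist.

```python
import numpy as np

def compute_b(a, y, K, C, tol=1e-8):
    """b averaged over margin support vectors (0 < a(n) < C, so xi(n) = 0)."""
    ay = a * y
    on_margin = (a > tol) & (a < C - tol)
    idx = np.where(on_margin)[0]
    # On the margin, y(n) h(x(n)) = 1, so b = y(n) - sum_m ay(m) K[m, n].
    # If idx is empty, b must instead come from a bounded support vector
    # (a(n) = C), where y(n) h(x(n)) = 1 - xi(n) brings xi(n) into the formula.
    return float(np.mean(y[idx] - K[idx] @ ay))
```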

16
Q

What does a(n) say about the position of the training point for soft margin SVMs?

A

Using the KKT conditions, we get:
- 0 < a(n) < C iff y(n)h(x(n)) = 1: the point is on the margin
- a(n) = C iff y(n)h(x(n)) <= 1: the point is on or violating the margin
- a(n) = 0 iff y(n)h(x(n)) >= 1: the point is not a support vector

(See the sketch below.)
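The same conditions read off a(n) alone; a small sketch with made-up names and a numerical tolerance.

```python
import numpy as np

def kkt_position(a, C, tol=1e-8):
    """Position of each training point implied by the KKT conditions on a(n)."""
    pos = np.empty(len(a), dtype=object)
    pos[a <= tol] = "not a support vector (y h >= 1)"
    pos[(a > tol) & (a < C - tol)] = "on the margin (y h = 1)"
    pos[a >= C - tol] = "on or violating the margin (y h <= 1)"
    return pos
```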