Soft Margin SVMs Flashcards
(16 cards)
What causes overfitting?
Overfitting happens when we fit the noise in the training data, which worsens generalisation. It can also be caused by high-dimensional embeddings.
Why would we misclassify data?
To reduce overfitting: allowing some misclassifications, rather than insisting on the maximum hard margin, can improve generalisation
What are slack variables ξ ?
Slack variables measure how far an example is allowed to be inside the margin, or on the wrong side of the decision boundary
What does the value of the slack variable ξ tell us?
- ξ = 0: on or beyond the margin (correctly classified)
- 0 < ξ < 1: within the margin, but still correctly classified
- ξ = 1: on the decision boundary
- ξ > 1: misclassified
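At the optimum the slack equals the hinge loss, ξ(n) = max(0, 1 - y(n)h(x(n))); a minimal NumPy sketch (the scores h below are made up for illustration):

```python
import numpy as np

def slack(y, h):
    """Slack at the optimum: xi = max(0, 1 - y * h(x))."""
    return np.maximum(0.0, 1.0 - y * h)

y = np.array([+1, +1, +1, -1])        # labels in {-1, +1}
h = np.array([1.5, 0.4, 0.0, 0.8])    # hypothetical decision values h(x)
print(slack(y, h))
# [0.  0.6 1.  1.8] -> beyond margin, within margin, on boundary, misclassified
```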
What is the constraint when maximising the margin using slack variables?
y(n)h(x(n)) >= 1 - ξ(n)
What does the hyperparameter C represent in the new margin optimisation problem?
C controls the trade-off between the total amount of slack and the size of the margin
What is the new margin optimisation problem with slack introduced? What are the constraints?
argmin over w, b, ξ: {1/2 ∥w∥^2 + C Σ ξ(n)}
subject to y(n)h(x(n)) >= 1 - ξ(n) and ξ(n) >= 0 for all n
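One way to see what this objective does: substituting the smallest feasible slack turns it into the unconstrained hinge loss problem min 1/2 ∥w∥^2 + C Σ max(0, 1 - y(n)(wTx(n) + b)), which a short subgradient descent sketch can optimise (working directly in input space and the function name are assumptions; real solvers work on the dual):

```python
import numpy as np

def train_soft_margin(X, y, C=1.0, lr=1e-3, epochs=1000):
    """Subgradient descent on 1/2 ||w||^2 + C * sum(max(0, 1 - y*(Xw + b)))."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1                 # points with non-zero slack
        grad_w = w - C * (y[active, None] * X[active]).sum(axis=0)
        grad_b = -C * y[active].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```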
How does changing the value of C change the optimisation problem?
A smaller C allows more slack, so more misclassification is tolerated, which typically improves generalisation; a larger C penalises slack more heavily, approaching the hard margin SVM
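This is easy to see empirically with scikit-learn's SVC, whose C parameter is exactly this trade-off (the blob dataset is synthetic, made up for illustration):

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, cluster_std=2.0, random_state=0)

soft = SVC(kernel="linear", C=0.01).fit(X, y)    # small C: more slack, wider margin
hard = SVC(kernel="linear", C=1000.0).fit(X, y)  # large C: approaches hard margin

print(len(soft.support_), len(hard.support_))    # small C typically keeps more support vectors
```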
What is a soft margin?
When we allow slack ξ(n) > 0, so some examples may lie within the margin or be misclassified
What is the constraint after using Lagrange relaxation for soft SVMs?
1 - ξ(n) - y(n)h(x(n)) <= 0
(where h(x(n)) = wTϕ(x(n)) + b)
-ξ(n) <= 0
What is the dual formulation for soft SVMs?
max a,β min w,b,ξ {1/2 ∥w∥^2 + C Σ ξ(n) + Σ a(n)(1 − ξ(n) − y(n)(wTϕ(x(n)) + b)) − Σ β(n)ξ(n)}
subject to a(n) >= 0, β(n) >= 0
What is the dual formulation for soft SVMs after replacing w and b? (i.e at the optimum when a and β are fixed)
argmax over a: L(a) = Σ a(n) - 1/2 Σ Σ a(n)a(m) y(n)y(m) k(x(n), x(m))
where 0 <= a(n) <= C, Σ a(n)y(n) = 0
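A didactic sketch of solving this dual with SciPy's generic SLSQP solver, assuming a precomputed kernel matrix K (practical implementations use specialised solvers such as SMO instead):

```python
import numpy as np
from scipy.optimize import minimize

def solve_dual(K, y, C):
    """Maximise L(a) = sum(a) - 1/2 a^T Q a with Q[n,m] = y(n)y(m)k(x(n), x(m)),
    subject to 0 <= a(n) <= C and sum(a(n)y(n)) = 0."""
    n = len(y)
    Q = np.outer(y, y) * K
    objective = lambda a: 0.5 * a @ Q @ a - a.sum()   # negated, since we minimise
    constraint = {"type": "eq", "fun": lambda a: a @ y}
    result = minimize(objective, np.zeros(n), method="SLSQP",
                      bounds=[(0.0, C)] * n, constraints=[constraint])
    return result.x
```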
How do we replace w and b in soft margin SVMs? How do we remove β?
Replacing w and b works exactly as in hard margin SVMs; setting the derivatives with respect to b and w to zero gives:
Σ a(n)y(n) = 0 and w = Σ a(n)y(n)ϕ(x(n))
To remove β, set the derivative with respect to ξ(n) to zero:
C − a(n) − β(n) = 0, so β(n) = C − a(n) (and since β(n) >= 0, this implies a(n) <= C)
How do we know if an example is a support vector when slack is added?
With slack variables, an example with a(n) > 0 satisfies y(n)h(x(n)) = 1 - ξ(n), so it is on the margin, within the margin, or misclassified.
If ξ(n) = 0, then it is exactly on the margin
What do we do when there are no support vectors on the margin for calculating b?
We can calculate b from any support vector, but the condition y(n)h(x(n)) = 1 - ξ(n) now depends on ξ(n), so the slack has to be taken into account
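A NumPy sketch of both cases, assuming the dual variables a and a precomputed kernel matrix K (the fallback averaging is a common heuristic rather than an exact formula, because ξ(n) is unknown):

```python
import numpy as np

def compute_b(a, y, K, C, tol=1e-8):
    """b from support vectors: y(n)h(x(n)) = 1 - xi(n), where
    h(x(n)) = sum_m a(m) y(m) k(x(m), x(n)) + b."""
    decision = K @ (a * y)                  # h(x(n)) - b, for each n
    on_margin = (a > tol) & (a < C - tol)   # xi(n) = 0, so b = y(n) - decision(n)
    if np.any(on_margin):
        return np.mean(y[on_margin] - decision[on_margin])
    # No support vectors exactly on the margin: for the remaining ones
    # b = y(n) - y(n)*xi(n) - decision(n) depends on xi(n); averaging
    # over all support vectors is a common heuristic.
    sv = a > tol
    return np.mean(y[sv] - decision[sv])
```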
What does a(n) say about the position of the training point for soft margin SVMs?
Using the KKT conditions, we get:
- 0 < a(n) < C iff y(n)h(x(n)) = 1: the point is exactly on the margin
- a(n) = C iff y(n)h(x(n)) <= 1: the point is on or violating the margin
- a(n) = 0 iff y(n)h(x(n)) >= 1: it is not a support vector
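These conditions translate directly into a check on the learned dual variables; a small sketch (the tolerance is needed because numerical solvers rarely return exact zeros):

```python
import numpy as np

def categorise(a, C, tol=1e-8):
    """Label each training point by its dual variable a(n), per the KKT conditions."""
    labels = np.empty(len(a), dtype=object)
    labels[a <= tol] = "not a support vector"              # y(n)h(x(n)) >= 1
    labels[(a > tol) & (a < C - tol)] = "on the margin"    # y(n)h(x(n)) == 1
    labels[a >= C - tol] = "on or violating the margin"    # y(n)h(x(n)) <= 1
    return labels
```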