SVM Flashcards
(30 cards)
What is the goal of a Maximal Margin Classifier (MMC)?
To find the widest possible margin that separates two linearly separable classes.
What assumption does the MMC make about the data?
That the data is perfectly linearly separable.
What is the equation of a hyperplane used in MMC and SVM?
w · x - b = 0
How is a class label predicted in SVM?
By taking the sign of (w · x - b)
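The sign rule above can be sketched in a few lines; the weight vector and bias here are made-up illustrative values, not a trained model.

```python
import numpy as np

# Hypothetical "trained" parameters for a 2-D problem (illustrative only).
w = np.array([2.0, -1.0])
b = 0.5

def predict(x):
    """Classify a point by the sign of the decision function w·x - b.

    A value of exactly 0 (a point on the hyperplane) is assigned to +1 here.
    """
    return 1 if np.dot(w, x) - b >= 0 else -1

print(predict(np.array([1.0, 0.0])))  # decision value 2.0 - 0.5 = 1.5 -> +1
print(predict(np.array([0.0, 2.0])))  # decision value -2.0 - 0.5 = -2.5 -> -1
```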
What is the constraint for a positive class point in MMC?
w · x - b ≥ 1
What is the constraint for a negative class point in MMC?
w · x - b ≤ -1
What is the optimization goal of MMC?
Minimize ||w||² subject to margin constraints.
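Written out, the standard form of that optimization (using the hyperplane and constraints from the cards above, with y_i ∈ {-1, +1}) is:

```latex
\min_{w,\,b} \; \tfrac{1}{2}\|w\|^2
\quad \text{subject to} \quad
y_i \,(w \cdot x_i - b) \ge 1 \quad \forall i
```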
What is a support vector?
A training point that lies on the margin boundary (or, in soft-margin SVMs, inside or beyond it) and thereby determines the decision boundary.
Why are support vectors important?
They are the only points that influence the position of the hyperplane.
What is the limitation of MMC?
It cannot handle overlapping classes or outliers.
What does a Soft-Margin Classifier allow?
Margin violations and some misclassifications.
What loss function is used in soft-margin SVMs?
Hinge loss.
When is hinge loss zero?
When a point is correctly classified and lies on or outside the margin, i.e. y(w · x - b) ≥ 1.
What is the effect of hinge loss on points inside the margin?
It increases the cost linearly with the margin violation.
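The hinge-loss behavior described in the last two cards can be checked numerically; the weight vector below is an arbitrary illustrative choice.

```python
import numpy as np

def hinge_loss(w, b, x, y):
    """Hinge loss max(0, 1 - y(w·x - b)): zero for points on or outside
    the margin, growing linearly with the size of the violation."""
    return max(0.0, 1.0 - y * (np.dot(w, x) - b))

# Illustrative hyperplane: w = (1, 0), b = 0 (margin boundaries at x1 = ±1).
w, b = np.array([1.0, 0.0]), 0.0
print(hinge_loss(w, b, np.array([2.0, 0.0]), 1))   # outside margin -> 0.0
print(hinge_loss(w, b, np.array([0.5, 0.0]), 1))   # inside margin  -> 0.5
print(hinge_loss(w, b, np.array([-1.0, 0.0]), 1))  # misclassified  -> 2.0
```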
What does the regularization term in soft-margin SVM control?
The trade-off between margin width and misclassification penalty.
What does a high regularization parameter imply in SVM?
It allows more margin violations to reduce model complexity (more bias).
What does a low regularization parameter imply in SVM?
It tolerates fewer violations, producing a narrower margin that fits the training data more closely (more variance).
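A quick way to see this trade-off is scikit-learn's SVC, whose C parameter is the misclassification penalty (so a *large* C corresponds to *weak* regularization). The tiny 1-D dataset below is made up for illustration.

```python
import numpy as np
from sklearn.svm import SVC

# Tiny overlapping 1-D dataset (illustrative; not separable).
X = np.array([[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]])
y = np.array([-1, -1, 1, -1, 1, 1])

# Large C ~ weak regularization (nearly hard margin);
# small C ~ strong regularization (wide, soft margin).
hard = SVC(kernel="linear", C=1000.0).fit(X, y)
soft = SVC(kernel="linear", C=0.01).fit(X, y)

# The margin width is 2 / ||w||, so the strongly regularized model,
# which shrinks ||w||, ends up with the wider margin.
print(np.linalg.norm(hard.coef_), np.linalg.norm(soft.coef_))
```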
What is the general name for the model that includes kernels?
Support Vector Machine (SVM).
Why can’t linear SVM handle non-linearly separable data?
Because it can only draw linear (hyperplane) decision boundaries in the input space.
What is the kernel trick?
A method to compute dot products in higher-dimensional space without explicitly transforming the data.
What does the kernel trick allow SVMs to do?
Handle non-linear decision boundaries efficiently.
What is an example of a polynomial kernel?
K(x, y) = (x · y + 1)^d
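The kernel trick from the earlier cards can be verified for this polynomial kernel with d = 2: the kernel value equals the dot product under an explicit degree-2 feature map, which the kernel never has to construct.

```python
import numpy as np

def poly_kernel(x, y, d=2):
    """Polynomial kernel K(x, y) = (x·y + 1)^d."""
    return (np.dot(x, y) + 1.0) ** d

def phi(x):
    """Explicit degree-2 feature map for 2-D input. The kernel computes
    the same dot product without ever building this 6-D vector."""
    x1, x2 = x
    return np.array([1.0, np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 ** 2, x2 ** 2, np.sqrt(2) * x1 * x2])

x, y = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(poly_kernel(x, y))       # (1*3 + 2*(-1) + 1)^2 = 4.0
print(np.dot(phi(x), phi(y)))  # same value via the explicit mapping
```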
What is the effect of increasing the degree in a polynomial kernel?
It allows more complex, curved decision boundaries.
What is the RBF (Gaussian) kernel good for?
Highly flexible, non-linear decision boundaries in complex datasets.
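A minimal sketch of the RBF kernel itself: similarity is 1 for identical points and decays exponentially with squared distance (the gamma value here is an arbitrary illustrative choice).

```python
import numpy as np

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - y||^2)."""
    return np.exp(-gamma * np.sum((x - y) ** 2))

print(rbf_kernel(np.array([0.0]), np.array([0.0])))  # identical points -> 1.0
print(rbf_kernel(np.array([0.0]), np.array([3.0])))  # distant points -> near 0
```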