Chapter 5: Support Vector Machines Flashcards

1
Q

What is the fundamental idea behind Support Vector Machines?

A

The fundamental idea behind Support Vector Machines is to fit the widest possible “street” between the classes. In other words, the goal is to have the largest possible margin between the decision boundary that separates the two classes and the training instances. When performing soft margin classification, the SVM searches for a compromise between perfectly separating the two classes and having the widest possible street (i.e., a few instances may end up on the street). Another key idea is to use kernels when training on nonlinear datasets.
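The soft-margin compromise can be sketched with the standard hinge-loss form of the linear SVM objective, ½‖w‖² + C·Σ max(0, 1 − t⁽ⁱ⁾(w·x⁽ⁱ⁾ + b)), where the street width is 2/‖w‖. This is a minimal pure-Python sketch, assuming labels t ∈ {−1, +1}. The function names, the toy data, and the three candidate weight vectors are hypothetical. It only compares candidates and does not solve the optimization problem. It shows that a large C favors a street with no violations, while a small C accepts a few instances on the street in exchange for a wider one.

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def street_width(w):
    # Distance between the margin lines w·x + b = +1 and w·x + b = -1
    return 2.0 / math.sqrt(dot(w, w))

def soft_margin_objective(w, b, X, t, C):
    # 0.5*||w||^2 rewards a wide street; the hinge term penalizes instances
    # on the street or on the wrong side (labels t are -1 or +1)
    hinge = sum(max(0.0, 1.0 - ti * (dot(w, x) + b)) for x, ti in zip(X, t))
    return 0.5 * dot(w, w) + C * hinge

# Hypothetical toy data: two classes along the diagonal
X = [[1.0, 1.0], [2.0, 2.0], [-1.0, -1.0], [-2.0, -2.0]]
t = [1, 1, -1, -1]

narrow = [1.0, 1.0]    # street width ~1.41, no violations
medium = [0.5, 0.5]    # street width ~2.83, no violations
wide = [0.25, 0.25]    # street width ~5.66, two instances end up on the street

for C in (1.0, 0.01):
    costs = {name: soft_margin_objective(w, 0.0, X, t, C)
             for name, w in [("narrow", narrow), ("medium", medium), ("wide", wide)]}
    print(f"C={C}: best candidate is {min(costs, key=costs.get)}", costs)
```

With C=1.0 the "medium" street wins. With C=0.01 the violations are cheap enough that the "wide" street wins.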

2
Q

What is a support vector?

A

After training an SVM, a support vector is any instance located on the “street”, including its border. The decision boundary is entirely determined by the support vectors. Instances that are not support vectors (i.e., off the street) have no influence whatsoever: you could remove them, add more instances, or move them around, and as long as they stay off the street they won’t affect the decision boundary. Computing the predictions only involves the support vectors, not the whole training set.
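The claim that predictions involve only the support vectors is easiest to see in the dual form. There, the decision function is Σ α⁽ⁱ⁾ t⁽ⁱ⁾ K(x⁽ⁱ⁾, x) + b, and α⁽ⁱ⁾ is zero for every instance off the street. Below is a minimal pure-Python sketch on a hypothetical four-point dataset with labels in {−1, +1}. The weights w = (0.5, 0.5), b = 0 and the dual coefficients α = (0.25, 0, 0.25, 0) are the hard-margin solution for this toy set, worked out by hand. All function names are illustrative.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def support_vector_indices(w, b, X, t, tol=1e-9):
    # On the street or its border: t * (w·x + b) <= 1 (also catches margin violations)
    return [i for i, (x, ti) in enumerate(zip(X, t))
            if ti * (dot(w, x) + b) <= 1.0 + tol]

def decision_dual(alphas, X, t, b, x, kernel=dot):
    # Only instances with alpha > 0 (the support vectors) contribute to the sum
    return sum(a * ti * kernel(xi, x)
               for a, xi, ti in zip(alphas, X, t) if a > 0) + b

# Hypothetical toy data and its hand-derived hard-margin solution
X = [[1.0, 1.0], [2.0, 2.0], [-1.0, -1.0], [-2.0, -2.0]]
t = [1, 1, -1, -1]
w, b = [0.5, 0.5], 0.0
alphas = [0.25, 0.0, 0.25, 0.0]

print(support_vector_indices(w, b, X, t))   # instances 0 and 2 lie on the street's border
x_new = [3.0, -1.0]
print(dot(w, x_new) + b, decision_dual(alphas, X, t, b, x_new))  # primal and dual agree
```

Scikit-Learn’s SVC exposes the same information after fitting through its support_vectors_ and dual_coef_ attributes. The dual_coef_ attribute stores α⁽ⁱ⁾ t⁽ⁱ⁾ for each support vector.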

3
Q

Why is it important to scale the inputs when using SVMs?

A

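SVMs try to fit the largest possible “street” between the classes, and the width of that street is measured with Euclidean distances in feature space. So if the training set is not scaled, features with large values dominate those distances, and the SVM will tend to neglect features with small scales.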
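The effect of scaling on distances can be shown with a small pure-Python sketch. Both the street width and the RBF kernel are computed from Euclidean distances, so a feature measured in thousands swamps one measured in fractions. The data is hypothetical. The standardize() helper uses the population standard deviation, as Scikit-Learn’s StandardScaler does, and it is only an illustrative stand-in for that class.

```python
import math

def standardize(X):
    # Rescale each feature to zero mean and unit variance (population std);
    # constant features are only centered, to avoid dividing by zero
    n = len(X)
    cols = list(zip(*X))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((v - mu) ** 2 for v in c) / n) or 1.0
            for c, mu in zip(cols, means)]
    return [[(v - mu) / s for v, mu, s in zip(row, means, stds)] for row in X]

def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

# Hypothetical data: feature 0 ranges over [0, 1], feature 1 over [1000, 3000]
X = [[0.0, 1000.0], [1.0, 1000.0], [0.0, 3000.0], [1.0, 3000.0]]

# Unscaled: a full-range change in feature 0 is negligible next to feature 1
print(sq_dist(X[0], X[1]), sq_dist(X[0], X[2]))

# Scaled: a full-range change in either feature counts the same
Z = standardize(X)
print(sq_dist(Z[0], Z[1]), sq_dist(Z[0], Z[2]))
```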

4
Q

Can an SVM classifier output a confidence score when it classifies an instance? What about a probability?

A

An SVM classifier can output the distance between the test instance and the decision boundary, and you can use this as a confidence score. However, this score cannot be directly converted into an estimate of the class probability. If you set probability=True when creating an SVM in Scikit-Learn, then after training it will calibrate the probabilities using Logistic Regression on the SVM’s scores (trained by an additional five-fold cross-validation on the training data). This will add the predict_proba() and predict_log_proba() methods to the SVM.
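A short Scikit-Learn sketch contrasts the raw score from decision_function() with the calibrated output of predict_proba(). The dataset (make_moons), the hyperparameters, and the two new points are arbitrary choices for illustration. For a binary SVC, a positive score means the second class in classes_.

```python
from sklearn.datasets import make_moons
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=42)

# probability=True makes fit() also run the internal cross-validation
# that calibrates probabilities from the SVM's scores
svm_clf = make_pipeline(
    StandardScaler(),
    SVC(kernel="rbf", C=1.0, probability=True, random_state=42),
)
svm_clf.fit(X, y)

X_new = [[0.5, 0.0], [-1.0, 1.0]]
scores = svm_clf.decision_function(X_new)  # signed score: positive means class 1
probas = svm_clf.predict_proba(X_new)      # one column per class; rows sum to 1
print(scores)
print(probas.round(3))
```

Because the calibration is fitted separately, the Scikit-Learn documentation warns that predict_proba() can occasionally disagree with the sign of the decision score.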

5
Q

Should you use the primal or the dual form of the SVM problem to train a model on a training set with millions of instances and hundreds of features?

A

This question applies only to linear SVMs, since kernelized SVMs can only use the dual form. The computational complexity of the primal form of the SVM problem is proportional to the number of training instances m, while the computational complexity of the dual form is proportional to a number between m² and m³. So if there are millions of instances, you should definitely use the primal form, because the dual form will be much too slow.
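A quick order-of-magnitude calculation shows why the gap matters at this scale. The sketch ignores constant factors and the number of features, so the ratios are rough proxies, not runtime predictions. In Scikit-Learn, LinearSVC has a dual parameter: dual=False solves the primal problem, and its documentation recommends this when n_samples > n_features.

```python
def cost_ratio_vs_primal(m):
    # Rough scaling only: primal ~ m, dual somewhere between m**2 and m**3
    primal = m
    return m ** 2 // primal, m ** 3 // primal

for m in (1_000, 1_000_000):
    low, high = cost_ratio_vs_primal(m)
    print(f"m = {m:,}: the dual costs roughly {low:,} to {high:,} times more")
```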

6
Q

Say you trained an SVM classifier with an RBF kernel. It seems to underfit the training set: should you increase or decrease γ (gamma)? What about C?

A

If an SVM classifier trained with an RBF kernel underfits the training set, there might be too much regularization. To decrease it, you need to increase gamma or C (or both).
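The role of γ is visible directly in the Gaussian RBF kernel, K(a, b) = exp(−γ‖a − b‖²). A larger γ narrows each instance’s bell-shaped range of influence, so the decision boundary can follow individual instances more closely. In other words, regularization decreases. A larger C reduces regularization in a different way: it penalizes margin violations more heavily. In Scikit-Learn both are constructor arguments, as in SVC(kernel="rbf", gamma=..., C=...). The pure-Python function below is an illustrative sketch of the kernel formula, not a library API.

```python
import math

def rbf_kernel(a, b, gamma):
    # Gaussian RBF similarity: exp(-gamma * ||a - b||^2)
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

a, b = [0.0, 0.0], [1.0, 0.0]   # two points at distance 1
for gamma in (0.1, 1.0, 10.0):
    # Similarity drops faster with distance as gamma grows
    print(gamma, round(rbf_kernel(a, b, gamma), 5))
```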
