L7 Flashcards
(41 cards)
What are Parameters in the context of machine learning?
Values learned during training
What are Hyperparameters?
Set before training (like learning rate, number of neighbors, or regularization parameter C)
What is a Decision function?
Takes an input example and returns a decision as output (e.g. a predicted class)
What is the Loss function?
What you are trying to minimize for a single training example to achieve your objective (e.g. square loss)
What is a Cost function?
Average of your loss functions over the entire training set (e.g. mean square error)
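The loss/cost distinction above can be sketched in a few lines of Python; the functions and toy values here are illustrative assumptions, not from the lecture.

```python
# Square loss for ONE training example vs. mean square error (the cost)
# averaged over a toy training set. Values below are made up for illustration.

def square_loss(y_true, y_pred):
    """Loss for a single training example."""
    return (y_true - y_pred) ** 2

def mse_cost(ys_true, ys_pred):
    """Cost: average of the per-example losses over the whole training set."""
    losses = [square_loss(t, p) for t, p in zip(ys_true, ys_pred)]
    return sum(losses) / len(losses)

print(square_loss(3.0, 2.5))             # 0.25
print(mse_cost([3.0, 1.0], [2.5, 2.0]))  # (0.25 + 1.0) / 2 = 0.625
```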
What is a Training set used for?
Learn model parameters
What is a Validation set used for?
Tune hyperparameters
What is a Test set used for?
Evaluate final model performance
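The three splits above can be sketched as follows; the 60/20/20 ratio and the toy dataset are illustrative assumptions.

```python
# A simple 60/20/20 train/validation/test split of a toy "dataset".
data = list(range(10))                     # stand-in for 10 labeled examples
n = len(data)

train = data[: int(0.6 * n)]               # learn model parameters
val   = data[int(0.6 * n): int(0.8 * n)]   # tune hyperparameters
test  = data[int(0.8 * n):]                # evaluate final performance

print(len(train), len(val), len(test))     # 6 2 2
```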
Why do we need SVMs?
To find the best line (or hyperplane) possible with the largest margin between classes
Logistic regression draws a line to separate classes (good for linear problems).
But we want a model that finds the best line possible — not just any line.
What does SVM stand for?
Support Vector Machine
- A supervised learning algorithm used for both classification and regression
What is the main goal of SVM in classification?
To separate classes with the widest possible gap or margin
What is the Margin in SVM?
The distance between the decision boundary and the closest data points (support vectors)
SVM wants to maximize this margin.
The decision boundary is a straight line (or hyperplane in higher dimensions).
Goal : learn a boundary that leads to the largest margin (buffer) from points on both sides
What are Support Vectors?
The data points closest to the boundary that define the position of the decision boundary
- They “support” or define the position of the decision boundary.
- Only support vectors matter during prediction
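A minimal sketch of inspecting support vectors with scikit-learn's `SVC`; the toy data below is an assumption for illustration, not from the lecture.

```python
import numpy as np
from sklearn.svm import SVC

# Two small, linearly separable clusters (illustrative toy data).
X = np.array([[0, 0], [0, 1], [1, 0], [3, 3], [3, 4], [4, 3]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# Only these training points define the decision boundary:
print(clf.support_vectors_)
print(clf.predict([[0.5, 0.5], [3.5, 3.5]]))  # [0 1]
```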
What is a Hard Margin SVM?
No errors allowed – aims to find a hyperplane that perfectly separates the classes without any misclassification
Fails when there is overlap or noise
Max margin classification: focus on the observations at the edges of each cluster and use the midpoint between them as the threshold (maximal margin)
What is a Soft Margin SVM?
Allows some misclassification or overlap and measures how much an instance is allowed to violate the margin
What does hyperparameter C control in Soft Margin SVM?
Trade-off between margin size and classification errors
Two contradicting objectives:
- Make the slack variables as small as possible to reduce margin violations → fewer classification errors (fit the data better)
- Make wᵀ · w as small as possible to increase the margin → large margin (simpler model)
What do large vs. small C mean?
Large C: less tolerant to errors → narrow margin
Small C: more tolerant to errors → wider margin
What is the decision function for a new data point in SVM?
If the result ≥ 1 → positive class
If ≤ -1 → negative class
If in between → uncertain zone (inside the margin)
The smaller the norm of the weight vector 𝜃, the larger the margin
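The decision rule above can be sketched with NumPy; the weight vector `w`, bias `b`, and test points are illustrative assumptions, not a trained model.

```python
import numpy as np

w = np.array([2.0, -1.0])   # weight vector (assumed for illustration)
b = 0.5                     # bias term

def decision(x):
    """Classify a point by the signed score w·x + b, using the ±1 margin."""
    score = float(w @ x + b)
    if score >= 1:
        return "positive"
    if score <= -1:
        return "negative"
    return "inside margin"

print(decision(np.array([1.0, 0.0])))    # score  2.5 -> "positive"
print(decision(np.array([-1.0, 0.0])))   # score -1.5 -> "negative"
print(decision(np.array([0.1, 0.0])))    # score  0.7 -> "inside margin"

# Smaller ||w|| means a larger margin: margin width = 2 / ||w||.
print(2 / np.linalg.norm(w))
```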
What type of problems do SVMs solve?
Convex quadratic optimization problems with linear constraints
What is the kernel trick in SVM?
Projects data into a higher-dimensional space where it becomes linearly separable
- When we map this line back to the original plane, it corresponds to an ellipse-shaped boundary. These transformations are called kernels.
- As a function of the original features, the kernelized SVM model is no longer linear.
What is a Linear kernel used for?
Fast, simple, when data is already separable
Key parameters: C (regularization)
What does the RBF kernel offer?
Very flexible, works on complex data
Key parameters: C, gamma (width of kernel)
What is the purpose of the Gamma parameter in RBF kernels?
Controls how far a single point’s influence reaches
Kernels work best for "small" n_samples → long runtime for "large" datasets (~100k samples)
Real power in infinite-dimensional spaces: RBF! → RBF (Radial Basis Function / Gaussian kernel) is a "universal kernel" – it can learn (i.e. overfit) anything
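How gamma controls a point's reach can be seen directly from the RBF kernel formula k(x, z) = exp(-gamma · ||x - z||²); the points and gamma values below are illustrative assumptions.

```python
import numpy as np

def rbf(x, z, gamma):
    """RBF (Gaussian) kernel: similarity between points x and z."""
    diff = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    return np.exp(-gamma * np.sum(diff ** 2))

# Same pair of points, two gammas: larger gamma -> influence decays faster,
# so each point's reach is more local.
print(rbf([0, 0], [1, 0], gamma=0.1))  # ~0.905: distant points still "seen"
print(rbf([0, 0], [1, 0], gamma=10))   # ~4.5e-05: influence is very local
```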
What is One-vs-Rest (OvR) in multiclass classification?
One classifier per class vs all others
1vs{2, 3, 4}, 2vs{1, 3, 4}, 3vs{1, 2, 4}, 4vs{1, 2, 3}
Predict the class whose classifier outputs the highest score
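The OvR prediction step can be sketched in one line; the four per-class scores below are made-up values standing in for the outputs of the four binary classifiers.

```python
# One score per class from its "class vs. rest" binary classifier
# (values are illustrative assumptions).
scores = {1: 0.2, 2: 1.7, 3: -0.4, 4: 0.9}

# Predict the class with the highest score.
prediction = max(scores, key=scores.get)
print(prediction)  # 2
```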