L7 Flashcards

(41 cards)

1
Q

What are Parameters in the context of machine learning?

A

Values learned during training

2
Q

What are Hyperparameters?

A

Set before training (like learning rate, number of neighbors, or regularization parameter C)

3
Q

What is a Decision function?

A

Takes an input example and gives a decision (e.g., a predicted class) as output

4
Q

What is the Loss function?

A

What you are trying to minimize for a single training example to achieve your objective (e.g. square loss)

5
Q

What is a Cost function?

A

Average of your loss functions over the entire training set (e.g. mean square error)

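A minimal numeric sketch in Python (toy values assumed, not from the cards) of the loss vs. cost distinction: the squared loss is computed per training example, and the cost (mean squared error) averages those losses over the whole set.

    import numpy as np

    y_true = np.array([3.0, -0.5, 2.0])  # assumed toy targets
    y_pred = np.array([2.5,  0.0, 2.0])  # assumed toy predictions

    losses = (y_true - y_pred) ** 2  # squared loss: one value per example
    cost = losses.mean()             # cost: mean squared error over the set
    print(losses, cost)              # [0.25 0.25 0.  ] 0.1666...
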
6
Q

What is a Training set used for?

A

Learn model parameters

7
Q

What is a Validation set used for?

A

Tune hyperparameters

8
Q

What is a Test set used for?

A

Evaluate final model performance

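A minimal sketch in Python/scikit-learn of how the three sets in cards 6-8 are typically carved out (the 60/20/20 proportions and the synthetic data are assumptions):

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=1000, random_state=42)

    # Hold out 20% as the test set: used once, for final evaluation.
    X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Split the rest: 60% train (learn parameters), 20% validation (tune hyperparameters).
    X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=42)
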
9
Q

Why do we need SVMs?

A

To find the best line (or hyperplane) possible with the largest margin between classes

Logistic regression draws a line to separate classes (good for linear problems).
But we want a model that finds the best line possible — not just any line.

10
Q

What does SVM stand for?

A

Support Vector Machine

  • A supervised learning algorithm used for both classification and regression.
11
Q

What is the main goal of SVM in classification?

A

To separate classes with the widest possible gap or margin

12
Q

What is the Margin in SVM?

A

The distance between the decision boundary and the closest data points (support vectors)

SVM wants to maximize this margin.
The decision boundary is a straight line (or hyperplane in higher dimensions).
Goal: learn a boundary that leads to the largest margin (buffer) from points on both sides

13
Q

What are Support Vectors?

A

The data points closest to the decision boundary

  • They “support” (define) the position of the decision boundary.
  • Only the support vectors matter during prediction.
14
Q

What is a Hard Margin SVM?

A

No errors allowed – aims to find a hyperplane that perfectly separates the classes without any misclassification

Fails when there is overlap or noise.
Maximal margin classification: focus on the observations at the edges of each cluster and use the midpoint between them as the threshold (the maximal margin).

15
Q

What is a Soft Margin SVM?

A

Allows some misclassification or overlap; slack variables measure how much each instance is allowed to violate the margin

16
Q

What does hyperparameter C control in Soft Margin SVM?

A

Trade-off between margin size and classification errors

Two contradicting objectives:
  • Make the slack variables as small as possible to reduce margin violations (fewer classification errors, fits the data better).
  • Make wᵀ·w as small as possible to increase the margin (a larger margin means a simpler model).

17
Q

What does a large C mean?

A

Large C: less tolerant to errors → narrow margin
Small C: more tolerant to errors → wider margin
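
A minimal sketch in Python/scikit-learn of the C trade-off (toy blob data assumed): the small-C model tolerates margin violations for a wider margin, while the large-C model fits the training points more tightly.

    from sklearn.datasets import make_blobs
    from sklearn.svm import LinearSVC

    X, y = make_blobs(n_samples=100, centers=2, cluster_std=2.0, random_state=0)

    soft = LinearSVC(C=0.01, max_iter=10_000).fit(X, y)   # small C: wide margin, more tolerant
    hard = LinearSVC(C=100.0, max_iter=10_000).fit(X, y)  # large C: narrow margin, less tolerant

    print(soft.score(X, y), hard.score(X, y))  # hard usually fits the training set better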

18
Q

What is the decision function for a new data point in SVM?

A

Compute the score 𝜃ᵀx for the new point x:
If 𝜃ᵀx ≥ 1 → positive class
If 𝜃ᵀx ≤ -1 → negative class
If in between → uncertain zone (inside the margin)

The smaller the weight vector 𝜃, the larger the margin
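
A minimal sketch in Python/scikit-learn of reading these scores via decision_function (toy data assumed); the sign gives the class, and magnitudes below 1 fall inside the margin:

    import numpy as np
    from sklearn.datasets import make_blobs
    from sklearn.svm import SVC

    X, y = make_blobs(n_samples=100, centers=2, random_state=0)
    clf = SVC(kernel="linear", C=1.0).fit(X, y)

    scores = clf.decision_function(X[:5])
    print(scores)           # signed, scaled distances to the boundary
    print(np.sign(scores))  # sign -> predicted class; |score| < 1 -> inside the margin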

19
Q

What type of problems do SVMs solve?

A

Convex quadratic optimization problems with linear constraints

20
Q

What is the kernel trick in SVM?

A

Projects data into a higher-dimensional space where it becomes linearly separable

  • When we transform this line back to the original plane, it maps to an ellipse boundary. These transformations are called kernels.
  • As a function of the original features, the linear SVM model is not actually linear anymore.
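
A minimal sketch in Python/scikit-learn of the kernel trick (make_moons is an assumed stand-in for non-linearly-separable data): the RBF kernel separates data that no straight line can, without any explicit feature transformation.

    from sklearn.datasets import make_moons
    from sklearn.svm import SVC

    X, y = make_moons(n_samples=200, noise=0.1, random_state=0)

    linear = SVC(kernel="linear").fit(X, y)  # a straight line cannot separate the moons
    rbf = SVC(kernel="rbf").fit(X, y)        # implicit mapping to a higher-dimensional space

    print(linear.score(X, y), rbf.score(X, y))  # the rbf model should score much higher
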
21
Q

What is a Linear kernel used for?

A

Fast and simple; suited to data that is already linearly separable

Key parameters: C (regularization)

22
Q

What does the RBF kernel offer?

A

Very flexible, works on complex data

Key parameters: C, gamma (width of kernel)

23
Q

What is the purpose of the Gamma parameter in RBF kernels?

A

Controls how far a single point’s influence reaches

Kernels work best for “small” n_samples → long runtime for “large” datasets (100k+ samples)
Real power is in infinite-dimensional spaces: RBF (Radial Basis Function / Gaussian kernel) is a “universal kernel” and can learn (i.e., overfit) anything
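
A minimal sketch in Python/scikit-learn of gamma's reach (toy moons data assumed): a small gamma lets each point influence a wide region (smoother boundary), while a large gamma makes influence very local (a wiggly boundary that can overfit).

    from sklearn.datasets import make_moons
    from sklearn.svm import SVC

    X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

    smooth = SVC(kernel="rbf", gamma=0.1).fit(X, y)    # far-reaching influence per point
    wiggly = SVC(kernel="rbf", gamma=100.0).fit(X, y)  # very local influence, tends to overfit

    print(smooth.score(X, y), wiggly.score(X, y))  # wiggly nearly memorizes the training set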

24
Q

What is One-vs-Rest (OvR) in multiclass classification?

A

One classifier per class vs. all others:
1 vs {2, 3, 4}, 2 vs {1, 3, 4}, 3 vs {1, 2, 4}, 4 vs {1, 2, 3}
Predict the class whose classifier gives the highest score

25
Q

What is One-vs-One (OvO) in multiclass classification?

A

One classifier for every pair of classes:
1v2, 1v3, 1v4, 2v3, 2v4, 3v4

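A minimal sketch in Python of both strategies using scikit-learn's wrappers (4-class synthetic data assumed): OvR trains one classifier per class, OvO one per pair, i.e. 4·3/2 = 6 classifiers for 4 classes.

    from sklearn.datasets import make_classification
    from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
    from sklearn.svm import LinearSVC

    X, y = make_classification(n_samples=400, n_classes=4, n_informative=6, random_state=0)

    ovr = OneVsRestClassifier(LinearSVC(max_iter=10_000)).fit(X, y)
    ovo = OneVsOneClassifier(LinearSVC(max_iter=10_000)).fit(X, y)

    print(len(ovr.estimators_))  # 4: one classifier per class
    print(len(ovo.estimators_))  # 6: one classifier per pair of classes
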
26
Q

What is a One-Class SVM (OC-SVM) used for?

A

Learns from one class (e.g., only positives) → desirable when there are many types of negative examples; useful for novelty or anomaly detection.
Learns the “normal” pattern and flags anything different as an outlier.

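A minimal sketch in Python/scikit-learn of novelty detection with OneClassSVM (the Gaussian “normal” data is assumed): the model is fit on normal examples only and predicts +1 for inliers and -1 for outliers.

    import numpy as np
    from sklearn.svm import OneClassSVM

    rng = np.random.default_rng(0)
    X_normal = rng.normal(0.0, 1.0, size=(200, 2))  # only "normal" training examples
    X_new = np.array([[0.0, 0.5], [6.0, 6.0]])      # one typical point, one anomaly

    oc = OneClassSVM(kernel="rbf", nu=0.05).fit(X_normal)
    print(oc.predict(X_new))  # expected: [ 1 -1 ] (+1 inlier, -1 outlier)
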
27
Q

What are the advantages of Linear SVMs?

A

Accuracy; works well on smaller, cleaner datasets; can be more efficient

28
Q

What are the disadvantages of Linear SVMs?

A

Not suited to larger datasets; less effective on noisier datasets

29
Q

What is a key advantage of Kernel SVMs?

A

Allow for complex decision boundaries, even if the data has only a few features.
Work well on low-dimensional and high-dimensional data (i.e., few and many features).

30
Q

What is a key disadvantage of Kernel SVMs?

A

1. Do not scale well with the number of samples: running an SVM on data with up to 10,000 samples might work well, but datasets of 100,000 or more can become challenging in terms of runtime and memory usage.
2. Require careful preprocessing of the data and tuning of the parameters.
3. SVM models are hard to inspect: it can be difficult to understand why a particular prediction was made, and it might be tricky to explain the model to a nonexpert.

31
Q

What is the role of preprocessing in SVMs?

A

SVMs require careful preprocessing of the data and tuning of the parameters

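A minimal sketch in Python/scikit-learn of that preprocessing (standardization is an assumed but common choice, since SVMs are sensitive to feature scales); the pipeline fits the scaler on training data only, so its statistics do not leak from the test set.

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=300, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Scale to zero mean / unit variance, then fit the kernel SVM.
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X_train, y_train)
    print(model.score(X_test, y_test))
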
32
Q

What is the meaning of Support Vectors in SVM?

A

Closest points to the margin

33
Q

What does Margin refer to in SVM?

A

Distance between boundary and support vectors

34
Q

What is a Hyperplane in SVM?

A

Decision boundary

35
Q

What does parameter C control in SVM?

A

Trade-off between margin size and error

36
Q

What is a Kernel in SVM?

A

Function that transforms data to higher dimensions

37
Q

What does Gamma control in RBF kernels?

A

Influence of data points in RBF

38
Q

How do you train a margin-based classifier?

A

Decision function of SVMs: maximize the margin between the data points and the hyperplane.
Search for the optimal parameters (𝜃) by finding a solution that:
1. Correctly classifies the training examples
2. Maximizes the margin
3. Can be found through optimization (e.g., projected gradient descent)

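A minimal sketch in Python of margin-based training (toy blob data assumed), using scikit-learn's SGDClassifier on the hinge loss as a stand-in for the gradient-descent optimization the card names:

    from sklearn.datasets import make_blobs
    from sklearn.linear_model import SGDClassifier

    X, y = make_blobs(n_samples=200, centers=2, random_state=0)

    # hinge loss + L2 penalty is the linear SVM objective; SGD searches for 𝜃
    svm = SGDClassifier(loss="hinge", penalty="l2", alpha=0.001, random_state=0).fit(X, y)
    print(svm.score(X, y))
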
39
Q

What is a polynomial kernel?

A

Adds interaction terms (e.g., x², xy)

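A minimal sketch in Python/scikit-learn of the polynomial kernel (degree 3 and the moons data are assumptions): SVC computes the interaction terms implicitly rather than materializing them as features.

    from sklearn.datasets import make_moons
    from sklearn.svm import SVC

    X, y = make_moons(n_samples=200, noise=0.1, random_state=0)

    # degree-3 polynomial kernel: implicitly uses terms like x1², x1·x2, ... up to degree 3
    poly = SVC(kernel="poly", degree=3, coef0=1.0).fit(X, y)
    print(poly.score(X, y))
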
40
Q

What is Soft Margin SVM with 2 features?

A

With 2 features, the threshold is a line and the margins run parallel to it on either side.
Handling overlapping classifications → add a square of the dosages:
- Start with data in a relatively low dimension (in this example, one dimension: dosage in mg)
- Add higher dimensions (in this example, going from one to two dimensions)
- Find a Support Vector Classifier that separates the higher-dimensional data into two groups

Key parameters: C, degree (d)

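A minimal sketch in Python of the card's dosage example (all values assumed): 1-D data where the middle range is the positive class cannot be split by a single threshold, but adding the squared dosage makes it linearly separable.

    import numpy as np
    from sklearn.svm import LinearSVC

    dosage = np.arange(1.0, 11.0)                         # assumed dosages in mg
    effective = np.array([0, 0, 0, 1, 1, 1, 1, 0, 0, 0])  # middle dosages work: no 1-D threshold fits

    X = np.column_stack([dosage, dosage ** 2])  # add the squared dosage as a second feature
    clf = LinearSVC(C=1.0, max_iter=100_000).fit(X, effective)
    print(clf.score(X, effective))  # a line in 2-D now separates the two groups
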
41
Q

What is the similarity function of SVM?

A

Add features computed using a similarity function that measures how much each instance resembles a particular landmark

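A minimal sketch in Python of similarity features, using the Gaussian RBF as the similarity function (the instances, landmarks, and gamma are assumptions): each new feature measures how much an instance resembles one landmark.

    import numpy as np
    from sklearn.metrics.pairwise import rbf_kernel

    X = np.array([[1.0], [2.0], [3.0], [4.0]])  # assumed 1-D instances
    landmarks = np.array([[2.0], [4.0]])        # assumed landmark points

    # Each column holds exp(-gamma * ||x - landmark||²): similarity to one landmark
    features = rbf_kernel(X, landmarks, gamma=0.3)
    print(features.round(3))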