L7 Flashcards

(41 cards)

1
Q

What are Parameters in the context of machine learning?

A

Values learned during training

2
Q

What are Hyperparameters?

A

Set before training (like learning rate, number of neighbors, or regularization parameter C)

3
Q

What is a Decision function?

A

Takes an input example and gives a decision (e.g., a predicted class) as output

4
Q

What is the Loss function?

A

What you are trying to minimize for a single training example to achieve your objective (e.g. square loss)

5
Q

What is a Cost function?

A

Average of your loss functions over the entire training set (e.g. mean square error)

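A minimal numeric sketch in Python (toy values assumed, not from the cards) of the loss vs. cost distinction: the squared loss is computed per training example, and the cost (mean squared error) averages those losses over the whole set.

    import numpy as np

    y_true = np.array([3.0, -0.5, 2.0])  # assumed toy targets
    y_pred = np.array([2.5,  0.0, 2.0])  # assumed toy predictions

    losses = (y_true - y_pred) ** 2  # squared loss: one value per example
    cost = losses.mean()             # cost: mean squared error over the set
    print(losses, cost)              # [0.25 0.25 0.  ] 0.1666...
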
6
Q

What is a Training set used for?

A

Learn model parameters

7
Q

What is a Validation set used for?

A

Tune hyperparameters

8
Q

What is a Test set used for?

A

Evaluate final model performance

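A minimal sketch in Python/scikit-learn of how the three sets in cards 6-8 are typically carved out (the 60/20/20 proportions and the synthetic data are assumptions):

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=1000, random_state=42)

    # Hold out 20% as the test set: used once, for final evaluation.
    X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Split the rest: 60% train (learn parameters), 20% validation (tune hyperparameters).
    X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=42)
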
9
Q

Why do we need SVMs?

A

To find the best line (or hyperplane) possible with the largest margin between classes

Logistic regression draws a line to separate classes (good for linear problems).
But we want a model that finds the best line possible — not just any line.

10
Q

What does SVM stand for?

A

Support Vector Machine

  • A supervised learning algorithm used for both classification and regression.
11
Q

What is the main goal of SVM in classification?

A

To separate classes with the widest possible gap or margin

12
Q

What is the Margin in SVM?

A

The distance between the decision boundary and the closest data points (support vectors)

SVM wants to maximize this margin.
The decision boundary is a straight line (or hyperplane in higher dimensions).
Goal: learn a boundary that leads to the largest margin (buffer) from points on both sides

13
Q

What are Support Vectors?

A

The data points closest to the decision boundary

  • They “support” (define) the position of the decision boundary.
  • Only the support vectors matter during prediction.
14
Q

What is a Hard Margin SVM?

A

No errors allowed – aims to find a hyperplane that perfectly separates the classes without any misclassification

Fails when there is overlap or noise.
Maximal margin classification: focus on the observations at the edges of each cluster and use the midpoint between them as the threshold (the maximal margin).

15
Q

What is a Soft Margin SVM?

A

Allows some misclassification or overlap; slack variables measure how much each instance is allowed to violate the margin

16
Q

What does hyperparameter C control in Soft Margin SVM?

A

Trade-off between margin size and classification errors

Two contradicting objectives:
  • Make the slack variables as small as possible to reduce margin violations (fewer classification errors, fits the data better).
  • Make wᵀ·w as small as possible to increase the margin (a larger margin means a simpler model).

17
Q

What does a large C mean?

A

Large C: less tolerant to errors → narrow margin
Small C: more tolerant to errors → wider margin
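
A minimal sketch in Python/scikit-learn of the C trade-off (toy blob data assumed): the small-C model tolerates margin violations for a wider margin, while the large-C model fits the training points more tightly.

    from sklearn.datasets import make_blobs
    from sklearn.svm import LinearSVC

    X, y = make_blobs(n_samples=100, centers=2, cluster_std=2.0, random_state=0)

    soft = LinearSVC(C=0.01, max_iter=10_000).fit(X, y)   # small C: wide margin, more tolerant
    hard = LinearSVC(C=100.0, max_iter=10_000).fit(X, y)  # large C: narrow margin, less tolerant

    print(soft.score(X, y), hard.score(X, y))  # hard usually fits the training set better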

18
Q

What is the decision function for a new data point in SVM?

A

Compute the score 𝜃ᵀx for the new point x:
If 𝜃ᵀx ≥ 1 → positive class
If 𝜃ᵀx ≤ -1 → negative class
If in between → uncertain zone (inside the margin)

The smaller the weight vector 𝜃, the larger the margin
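
A minimal sketch in Python/scikit-learn of reading these scores via decision_function (toy data assumed); the sign gives the class, and magnitudes below 1 fall inside the margin:

    import numpy as np
    from sklearn.datasets import make_blobs
    from sklearn.svm import SVC

    X, y = make_blobs(n_samples=100, centers=2, random_state=0)
    clf = SVC(kernel="linear", C=1.0).fit(X, y)

    scores = clf.decision_function(X[:5])
    print(scores)           # signed, scaled distances to the boundary
    print(np.sign(scores))  # sign -> predicted class; |score| < 1 -> inside the margin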

19
Q

What type of problems do SVMs solve?

A

Convex quadratic optimization problems with linear constraints

20
Q

What is the kernel trick in SVM?

A

Projects data into a higher-dimensional space where it becomes linearly separable

  • When we transform this line back to the original plane, it maps to an ellipse boundary. These transformations are called kernels.
  • As a function of the original features, the linear SVM model is not actually linear anymore.
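
A minimal sketch in Python/scikit-learn of the kernel trick (make_moons is an assumed stand-in for non-linearly-separable data): the RBF kernel separates data that no straight line can, without any explicit feature transformation.

    from sklearn.datasets import make_moons
    from sklearn.svm import SVC

    X, y = make_moons(n_samples=200, noise=0.1, random_state=0)

    linear = SVC(kernel="linear").fit(X, y)  # a straight line cannot separate the moons
    rbf = SVC(kernel="rbf").fit(X, y)        # implicit mapping to a higher-dimensional space

    print(linear.score(X, y), rbf.score(X, y))  # the rbf model should score much higher
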
21
Q

What is a Linear kernel used for?

A

Fast and simple; suited to data that is already linearly separable

Key parameters: C (regularization)

22
Q

What does the RBF kernel offer?

A

Very flexible, works on complex data

Key parameters: C, gamma (width of kernel)

23
Q

What is the purpose of the Gamma parameter in RBF kernels?

A

Controls how far a single point’s influence reaches

Kernels work best for “small” n_samples → long runtime for “large” datasets (100k+ samples)
Real power is in infinite-dimensional spaces: RBF (Radial Basis Function / Gaussian kernel) is a “universal kernel” and can learn (i.e., overfit) anything
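
A minimal sketch in Python/scikit-learn of gamma's reach (toy moons data assumed): a small gamma lets each point influence a wide region (smoother boundary), while a large gamma makes influence very local (a wiggly boundary that can overfit).

    from sklearn.datasets import make_moons
    from sklearn.svm import SVC

    X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

    smooth = SVC(kernel="rbf", gamma=0.1).fit(X, y)    # far-reaching influence per point
    wiggly = SVC(kernel="rbf", gamma=100.0).fit(X, y)  # very local influence, tends to overfit

    print(smooth.score(X, y), wiggly.score(X, y))  # wiggly nearly memorizes the training set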

24
Q

What is One-vs-Rest (OvR) in multiclass classification?

A

One classifier per class vs. all others:
1 vs {2, 3, 4}, 2 vs {1, 3, 4}, 3 vs {1, 2, 4}, 4 vs {1, 2, 3}
Predict the class whose classifier gives the highest score

25
Q

What is One-vs-One (OvO) in multiclass classification?

A

One classifier for every pair of classes:
1v2, 1v3, 1v4, 2v3, 2v4, 3v4

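A minimal sketch in Python of both strategies using scikit-learn's wrappers (4-class synthetic data assumed): OvR trains one classifier per class, OvO one per pair, i.e. 4·3/2 = 6 classifiers for 4 classes.

    from sklearn.datasets import make_classification
    from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
    from sklearn.svm import LinearSVC

    X, y = make_classification(n_samples=400, n_classes=4, n_informative=6, random_state=0)

    ovr = OneVsRestClassifier(LinearSVC(max_iter=10_000)).fit(X, y)
    ovo = OneVsOneClassifier(LinearSVC(max_iter=10_000)).fit(X, y)

    print(len(ovr.estimators_))  # 4: one classifier per class
    print(len(ovo.estimators_))  # 6: one classifier per pair of classes
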
26
Q

What is a One-Class SVM (OC-SVM) used for?

A

Learns from one class (e.g., only positives) → desirable when there are many types of negative examples; useful for novelty or anomaly detection.
Learns the “normal” pattern and flags anything different as an outlier.

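A minimal sketch in Python/scikit-learn of novelty detection with OneClassSVM (the Gaussian “normal” data is assumed): the model is fit on normal examples only and predicts +1 for inliers and -1 for outliers.

    import numpy as np
    from sklearn.svm import OneClassSVM

    rng = np.random.default_rng(0)
    X_normal = rng.normal(0.0, 1.0, size=(200, 2))  # only "normal" training examples
    X_new = np.array([[0.0, 0.5], [6.0, 6.0]])      # one typical point, one anomaly

    oc = OneClassSVM(kernel="rbf", nu=0.05).fit(X_normal)
    print(oc.predict(X_new))  # expected: [ 1 -1 ] (+1 inlier, -1 outlier)
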
27
Q

What are the advantages of Linear SVMs?

A

Accuracy; works well on smaller, cleaner datasets; can be more efficient

28
Q

What are the disadvantages of Linear SVMs?

A

Not suited to larger datasets; less effective on noisier datasets

29
Q

What is a key advantage of Kernel SVMs?

A

Allow for complex decision boundaries, even if the data has only a few features.
Work well on low-dimensional and high-dimensional data (i.e., few and many features).

30
Q

What is a key disadvantage of Kernel SVMs?

A

1. Do not scale well with the number of samples: running an SVM on data with up to 10,000 samples might work well, but datasets of 100,000 or more can become challenging in terms of runtime and memory usage.
2. Require careful preprocessing of the data and tuning of the parameters.
3. SVM models are hard to inspect: it can be difficult to understand why a particular prediction was made, and it might be tricky to explain the model to a nonexpert.

31
Q

What is the role of preprocessing in SVMs?

A

SVMs require careful preprocessing of the data and tuning of the parameters

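A minimal sketch in Python/scikit-learn of that preprocessing (standardization is an assumed but common choice, since SVMs are sensitive to feature scales); the pipeline fits the scaler on training data only, so its statistics do not leak from the test set.

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=300, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Scale to zero mean / unit variance, then fit the kernel SVM.
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X_train, y_train)
    print(model.score(X_test, y_test))
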
32
Q

What is the meaning of Support Vectors in SVM?

A

Closest points to the margin

33
Q

What does Margin refer to in SVM?

A

Distance between boundary and support vectors

34
Q

What is a Hyperplane in SVM?

A

Decision boundary

35
Q

What does parameter C control in SVM?

A

Trade-off between margin size and error

36
Q

What is a Kernel in SVM?

A

Function that transforms data to higher dimensions

37
Q

What does Gamma control in RBF kernels?

A

Influence of data points in RBF

38
Q

How do you train a margin-based classifier?

A

Decision function of SVMs: maximize the margin between the data points and the hyperplane.
Search for the optimal parameters (𝜃) by finding a solution that:
1. Correctly classifies the training examples
2. Maximizes the margin
3. Can be found through optimization (e.g., projected gradient descent)

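A minimal sketch in Python of margin-based training (toy blob data assumed), using scikit-learn's SGDClassifier on the hinge loss as a stand-in for the gradient-descent optimization the card names:

    from sklearn.datasets import make_blobs
    from sklearn.linear_model import SGDClassifier

    X, y = make_blobs(n_samples=200, centers=2, random_state=0)

    # hinge loss + L2 penalty is the linear SVM objective; SGD searches for 𝜃
    svm = SGDClassifier(loss="hinge", penalty="l2", alpha=0.001, random_state=0).fit(X, y)
    print(svm.score(X, y))
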
39
Q

What is a polynomial kernel?

A

Adds interaction terms (e.g., x², xy)

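A minimal sketch in Python/scikit-learn of the polynomial kernel (degree 3 and the moons data are assumptions): SVC computes the interaction terms implicitly rather than materializing them as features.

    from sklearn.datasets import make_moons
    from sklearn.svm import SVC

    X, y = make_moons(n_samples=200, noise=0.1, random_state=0)

    # degree-3 polynomial kernel: implicitly uses terms like x1², x1·x2, ... up to degree 3
    poly = SVC(kernel="poly", degree=3, coef0=1.0).fit(X, y)
    print(poly.score(X, y))
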
40
Q

What is Soft Margin SVM with 2 features?

A

With 2 features, the threshold is a line and the margins run parallel to it on either side.
Handling overlapping classifications → add a square of the dosages:
- Start with data in a relatively low dimension (in this example, one dimension: dosage in mg)
- Add higher dimensions (in this example, going from one to two dimensions)
- Find a Support Vector Classifier that separates the higher-dimensional data into two groups

Key parameters: C, degree (d)

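A minimal sketch in Python of the card's dosage example (all values assumed): 1-D data where the middle range is the positive class cannot be split by a single threshold, but adding the squared dosage makes it linearly separable.

    import numpy as np
    from sklearn.svm import LinearSVC

    dosage = np.arange(1.0, 11.0)                         # assumed dosages in mg
    effective = np.array([0, 0, 0, 1, 1, 1, 1, 0, 0, 0])  # middle dosages work: no 1-D threshold fits

    X = np.column_stack([dosage, dosage ** 2])  # add the squared dosage as a second feature
    clf = LinearSVC(C=1.0, max_iter=100_000).fit(X, effective)
    print(clf.score(X, effective))  # a line in 2-D now separates the two groups
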
41
Q

What is the similarity function of SVM?

A

Add features computed using a similarity function that measures how much each instance resembles a particular landmark

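A minimal sketch in Python of similarity features, using the Gaussian RBF as the similarity function (the instances, landmarks, and gamma are assumptions): each new feature measures how much an instance resembles one landmark.

    import numpy as np
    from sklearn.metrics.pairwise import rbf_kernel

    X = np.array([[1.0], [2.0], [3.0], [4.0]])  # assumed 1-D instances
    landmarks = np.array([[2.0], [4.0]])        # assumed landmark points

    # Each column holds exp(-gamma * ||x - landmark||²): similarity to one landmark
    features = rbf_kernel(X, landmarks, gamma=0.3)
    print(features.round(3))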