Support Vector Machines Flashcards

1
Q

What is the length of the projection of x onto w if w is a unit vector?

A

w^T x

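As a quick numerical check of this card (a NumPy sketch, not part of the original deck; the vectors are made up):

```python
import numpy as np

# A unit vector w and an arbitrary point x.
w = np.array([0.6, 0.8])   # ||w|| = sqrt(0.36 + 0.64) = 1
x = np.array([3.0, 4.0])

# For a unit vector, the scalar projection of x onto w is just w^T x.
proj_len = w @ x           # 0.6*3 + 0.8*4 = 5.0

# For a general (non-unit) w you would divide by ||w||.
proj_len_general = (w @ x) / np.linalg.norm(w)
```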
2
Q

What is the margin?

A

Distance between the decision boundary (hyperplane) and the closest training point

3
Q

How is the modulus of a vector w written?

A

|| w ||

4
Q

What is || w ||?

A

sqrt(w^T w)

5
Q

If the hyperplane is defined as w^T x + w_0 = 0, what is the distance from the origin to the hyperplane?

A

b = -w_0 / ||w||

6
Q

What is the perpendicular distance from a point x to the hyperplane w^T x + w_0 = 0?

A

|w^T x + w_0| / ||w||

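The last two cards (||w|| = sqrt(w^T w) and the point-to-hyperplane distance) can be checked together; a NumPy sketch with a made-up hyperplane:

```python
import numpy as np

# Hyperplane w^T x + w_0 = 0, here: x1 + x2 - 2 = 0.
w = np.array([1.0, 1.0])
w0 = -2.0
x = np.array([3.0, 3.0])

# ||w|| = sqrt(w^T w)
norm_w = np.sqrt(w @ w)

# Perpendicular distance from x to the hyperplane: |w^T x + w_0| / ||w||.
dist = abs(w @ x + w0) / norm_w   # |3 + 3 - 2| / sqrt(2)
```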
7
Q

What is the value of the margin under the constraint min_i |w^T x_i + w_0| = 1?

A

1 / ||w||

8
Q

What is maximizing 1 / ||w|| the same as?

A

minimizing ||w||^2

9
Q

What is the SVM optimization problem?

A

min_w ||w||^2

such that y_i(w^T x_i + w_0) >= 1 for all i

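This problem can be solved in practice with scikit-learn; a sketch on a made-up, linearly separable dataset, using a very large C so the soft-margin solver approximates the hard-margin problem above (the data and C value are illustrative assumptions):

```python
import numpy as np
from sklearn.svm import SVC

# Tiny linearly separable dataset.
X = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 3.0], [4.0, 3.0]])
y = np.array([-1, -1, 1, 1])

# A linear SVC with a very large C approximates
# min_w ||w||^2  subject to  y_i (w^T x_i + w_0) >= 1.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w, w0 = clf.coef_[0], clf.intercept_[0]

# Every constraint y_i (w^T x_i + w_0) >= 1 should (approximately) hold.
margins = y * (X @ w + w0)
```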
10
Q

What are the two good properties of the optimal weight parameters (for SVMs)?

A
  • They are a linear function of the inputs and class labels
  • Solution is sparse (optimal hyperplane determined by just a few examples)
11
Q

What are support vectors?

A

The few training examples that determine the hyperplane

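scikit-learn exposes these directly via the `support_vectors_` attribute; a sketch on made-up data with several points per class, of which only the few nearest the boundary matter:

```python
import numpy as np
from sklearn.svm import SVC

# Six points, but only those closest to the other class
# determine the separating hyperplane.
X = np.array([[0, 0], [0, 1], [1, 0], [4, 4], [4, 5], [5, 4]], dtype=float)
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)

# The support vectors: the few training examples that determine the plane.
sv = clf.support_vectors_
```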
12
Q

What is the problem (for SVMs) if the data is not linearly separable?

A

The constraints y_i(w^T x_i + w_0) >= 1 cannot all be satisfied, so the optimization problem has no solution

13
Q

What can we add to solve the problem if the data is not linearly separable (for SVMs)?

A

Slack variables

14
Q

What is the SVM optimization problem (with slack variables)?

A

Minimize ||w||^2 + C Σ_i ξ_i^k

such that y_i(w^T x_i + w_0) >= 1 - ξ_i and ξ_i >= 0, for all i

15
Q

What is k (the power of the slack variable) usually set to (SVM)?

A

1

16
Q

What is C in the SVM optimization problem with slack variables?

A

Trade-off parameter: how heavily the slack variables (margin violations) are penalised relative to maximizing the margin

17
Q

What does the sum of the slack variables, Σ_i ξ_i, measure (SVMs)?

A

How well we fit the data

18
Q

Why does adding slack variables increase the number of support vectors?

A

Every point with a non-zero slack variable violates the margin, and every such point becomes a support vector (conceptually, the slack moves the point back onto the margin)
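This effect can be seen empirically with scikit-learn's SVC on overlapping (non-separable) data; the dataset and the C values below are illustrative assumptions:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two overlapping Gaussian blobs: not linearly separable.
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(1.5, 1.0, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)

# Small C: slack is cheap, many points get non-zero slack, many SVs.
n_sv_small_C = SVC(kernel="linear", C=0.01).fit(X, y).n_support_.sum()

# Large C: slack is expensive, fewer margin violations, fewer SVs.
n_sv_large_C = SVC(kernel="linear", C=100.0).fit(X, y).n_support_.sum()
```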

19
Q

What is a kernel function?

A

k(x_i, x_j) = phi(x_i)^T phi(x_j)

Takes two feature vectors, transforms them into a new feature space, and takes the dot product there
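The identity k(x_i, x_j) = phi(x_i)^T phi(x_j) can be verified concretely for the degree-2 polynomial kernel, whose explicit feature map in 2-D is known; a NumPy sketch:

```python
import numpy as np

def phi(x):
    # Explicit feature map for the degree-2 polynomial kernel in 2-D:
    # phi(x) = (x1^2, x2^2, sqrt(2) * x1 * x2)
    return np.array([x[0] ** 2, x[1] ** 2, np.sqrt(2) * x[0] * x[1]])

def poly_kernel(xi, xj, d=2):
    # k(xi, xj) = (xi^T xj)^d -- never computes phi explicitly.
    return (xi @ xj) ** d

xi = np.array([1.0, 2.0])
xj = np.array([3.0, 4.0])

# The kernel value equals the dot product in the transformed space.
lhs = poly_kernel(xi, xj)   # (1*3 + 2*4)^2 = 121
rhs = phi(xi) @ phi(xj)
```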

20
Q

What's special about kernel functions (compared to basis expansion)?

A

They can work with infinite-dimensional feature spaces, because the feature map phi is never computed explicitly

21
Q

What is the form of a polynomial kernel function?

A

k(x_i, x_j) = (x_i^T x_j)^d

22
Q

What is the form of a Gaussian radial basis function (RBF) kernel?

A

k(x_i, x_j) = exp(-||x_i - x_j||^2 / c)
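A minimal NumPy sketch of this kernel (the width parameter c and the test points are made up):

```python
import numpy as np

def rbf_kernel(xi, xj, c=2.0):
    # k(xi, xj) = exp(-||xi - xj||^2 / c); c controls the kernel width.
    return np.exp(-np.sum((xi - xj) ** 2) / c)

xi = np.array([0.0, 0.0])
xj = np.array([1.0, 1.0])

k_same = rbf_kernel(xi, xi)   # 1.0: a point is maximally similar to itself
k_diff = rbf_kernel(xi, xj)   # exp(-2/2) = exp(-1): similarity decays with distance
```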

23
Q

What does || w || mean?

A

The magnitude of vector w

24
Q

What do slack variables do (SVM)?

A

Allow points to violate the margin constraint: ξ_i measures how far x_i falls short of the margin (conceptually, the slack moves a violating point back onto the margin)