Week 4 Flashcards

(49 cards)

1
Q

What does the out-of-sample error measure?

A

How well our training on the data set has generalized to data we have not seen before.

2
Q

What does the in-sample error measure?

A

The training performance

3
Q

Generalization error

A

The discrepancy between Ein and Eout.

4
Q

What is the concept of the growth function?

A

A number that captures how different the hypotheses in H are, and how much overlap the different events have.

5
Q

dichotomy

A

The N-tuple that you get when you apply a hypothesis h (in H) to a finite sample. It splits the sample into two groups: h = +1 and h = -1.

6
Q

What does the number of dichotomies actually represent?

A

The diversity of the hypothesis class.

7
Q

What is the growth function in the case of positive rays?

A

mH(N) = N + 1

8
Q

What is the condition with positive rays?

A

H consists of all hypotheses h: R -> {-1, +1} of the form h(x) = sign(x - a):
-1 is returned to the left of a and +1 to the right of a.

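The count mH(N) = N + 1 from card 7 can be verified by brute force: there are only N + 1 essentially different places for the threshold a. A minimal sketch (function name and sample points are illustrative, not from the deck):

```python
def ray_dichotomies(xs):
    """Distinct dichotomies realisable on points xs by h(x) = sign(x - a)."""
    xs = sorted(xs)
    # One representative threshold a per region: left of all points,
    # between each consecutive pair, and right of all points.
    cuts = [xs[0] - 1]
    cuts += [(xs[i] + xs[i + 1]) / 2 for i in range(len(xs) - 1)]
    cuts += [xs[-1] + 1]
    return {tuple(+1 if x > a else -1 for x in xs) for a in cuts}

print(len(ray_dichotomies([0.5, 1.7, 3.2, 4.4])))  # N + 1 = 5
```

With N = 2 the same enumeration yields 3 < 2^2 dichotomies, so 2 is already a break point for positive rays.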
9
Q

Describe the condition with positive intervals:

A

H consists of all hypotheses in one dimension that return +1 if the point lies in some interval and -1 otherwise.
Each hypothesis is specified by two end points of that interval.

10
Q

How many different dichotomies are there in the positive intervals condition?

A

Choosing the two end points from the N+1 spots between and around the points gives (N+1 choose 2) dichotomies, plus 1 for the all -1 dichotomy: (N+1 choose 2) + 1 in total.

11
Q

What is the growth function in the case of positive intervals?

A

mH(N) = 0.5N^2 + 0.5N + 1

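The formula of card 11 can likewise be checked by enumeration: an interval's two end points each occupy one of the N + 1 spots around the points. A sketch (names and sample points are illustrative):

```python
from itertools import combinations

def interval_dichotomies(xs):
    """Distinct dichotomies realisable on points xs by hypotheses that
    are +1 inside an interval and -1 outside it."""
    xs = sorted(xs)
    n = len(xs)
    # The n + 1 "spots" an end point can occupy around the points.
    spots = [xs[0] - 1]
    spots += [(xs[i] + xs[i + 1]) / 2 for i in range(n - 1)]
    spots += [xs[-1] + 1]
    dichos = {tuple(+1 if lo < x < hi else -1 for x in xs)
              for lo, hi in combinations(spots, 2)}
    dichos.add((-1,) * n)  # the empty interval gives the all -1 dichotomy
    return dichos

n = 4
print(len(interval_dichotomies([0.5, 1.7, 3.2, 4.4])), 0.5*n*n + 0.5*n + 1)
```

For n = 4 both numbers come out to 11, matching mH(N) = 0.5N^2 + 0.5N + 1.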
12
Q

What is the hypothesis space in the case of convex sets?

A

H consists of all hypotheses in two dimensions h: R^2 -> {-1,+1} that are positive inside some convex set and negative elsewhere.

13
Q

When is a set convex?

A

If the line segment connecting any two points within the set lies entirely within the set.

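The definition in card 13 translates directly into a numeric check. This is only a heuristic sketch (all names and the sample sets are made up): it samples points on the segment between every pair of probe points, so a False answer is conclusive while a True answer is merely evidence.

```python
def looks_convex(contains, probe_points, steps=50):
    """Heuristic test of the definition: for every pair of probe points
    inside the set, sampled points on the connecting segment must also
    lie inside the set."""
    inside = [p for p in probe_points if contains(p)]
    for (x1, y1) in inside:
        for (x2, y2) in inside:
            for k in range(steps + 1):
                t = k / steps
                if not contains((x1 + t * (x2 - x1), y1 + t * (y2 - y1))):
                    return False
    return True

# Convex: the unit disk.
disk = lambda p: p[0] ** 2 + p[1] ** 2 <= 1.0
# Not convex: two disjoint disks centred at (-2, 0) and (2, 0).
two_disks = lambda p: min((p[0] - 2) ** 2, (p[0] + 2) ** 2) + p[1] ** 2 <= 1.0
```

`looks_convex(two_disks, [(2, 0), (-2, 0)])` fails because the midpoint (0, 0) of the connecting segment lies outside the set.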
14
Q

What is the growth function in the case of convex sets?

A

mH(N) = 2^N

15
Q

Given a hypothesis class, the VC dimension dvc of that class is...

A

the largest number of points N such that mH(N) = 2^N. Infinite if mH(N) = 2^N for all N.

16
Q

What is the smallest break point, expressed in dvc?

A

k = dvc + 1

17
Q

break point

A

When no data set of size k can be shattered by H, then k is the break point for H.

18
Q

What kind of dvc do good models have?

A

A finite dvc.

19
Q

What kind of dvc do bad models have?

A

An infinite dvc.

20
Q

How do you compute a dvc for the perceptron?

A

First show that the dvc is at least a certain value, then show it is at most a certain value.

21
Q

What can you conclude about dvc if there is a set of N points that can be shattered by H?

A

dvc >= N.

22
Q

What can you conclude about dvc if any set of N points can be shattered by H?

A

dvc >= N.

23
Q

What can you conclude about dvc if there is a set of N datapoints that cannot be shattered by H?

A

Nothing: some other set of N points might still be shattered, so this alone gives no bound on dvc.

24
Q

What can you conclude about dvc if there is no dataset of N points that can be shattered by H?

A

dvc < N, i.e. dvc <= N - 1.

25
Q

What is the VC dimension of a d-dimensional perceptron?

A

d+1
26
Q

Pocket algorithm

A

Adjustment of the PLA: save the hypothesis with the smallest Ein seen so far.
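Cards 26-28 can be made concrete with a small sketch, assuming a NumPy environment (function name, data layout and parameters are illustrative): ordinary PLA updates, plus the pocket bookkeeping that keeps the best weights seen so far.

```python
import numpy as np

def pocket_pla(X, y, max_iters=200, seed=0):
    """Pocket algorithm: run ordinary PLA updates, but keep ('pocket')
    the weight vector with the smallest in-sample error seen so far."""
    rng = np.random.default_rng(seed)
    Xb = np.hstack([np.ones((len(X), 1)), X])      # prepend bias coordinate
    w = np.zeros(Xb.shape[1])

    def e_in(w):
        return float(np.mean(np.sign(Xb @ w) != y))

    best_w, best_err = w.copy(), e_in(w)
    for _ in range(max_iters):
        mis = np.flatnonzero(np.sign(Xb @ w) != y)
        if len(mis) == 0:                          # separable data: converged
            return w, 0.0
        i = rng.choice(mis)                        # ordinary PLA update
        w = w + y[i] * Xb[i]
        err = e_in(w)                              # the extra cost vs. plain PLA
        if err < best_err:                         # pocket the best w so far
            best_w, best_err = w.copy(), err
    return best_w, best_err
```

The full pass over the data to compute Ein after every update is exactly the disadvantage card 28 refers to; the pocketed weights are what make non-separable data workable (card 27).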
27
Q

What is an advantage of the pocket algorithm over the PLA?

A

It works even if the data is not linearly separable.

28
Q

What is a disadvantage of the pocket algorithm?

A

It takes a long time to calculate Ein at every iteration.

29
Q

What is the plan for achieving a small Eout?

A

1) Make sure you pick the hypothesis with the smallest Ein. 2) Make sure Eout is approximately equal to Ein.
30
Q

If two hypotheses are similar...

A

then both their in-sample and out-of-sample errors will be similar, and the events Bi will overlap considerably.

31
Q

What does the number of dichotomies count?

A

The effective number of hypotheses.

32
Q

What does the growth function describe, conceptually?

A

The largest number of dichotomies you can obtain on N points, when you may choose the points yourself.

33
Q

What is M?

A

The number of hypotheses in the hypothesis class.

34
Q

How do you deal with M being pessimistic?

A

Replace M by the 'effective number of hypotheses'; this gives a tighter bound on the probability that Eout deviates greatly from Ein.

35
Q

Effective number of hypotheses

A

Pick N data points and look only at what the hypotheses do on those points.

36
Q

What is the difference between a hypothesis and a dichotomy?

A

Hypothesis: h: X -> {-1, +1}. Dichotomy: h: {x1, x2, ..., xN} -> {-1, +1}.

37
Q

What is the maximum number of dichotomies on a set of N points?

A

2^N

38
Q

Define the growth function:

A

mH(N) = max over x1, ..., xN in X of |H(x1, ..., xN)|
39
Q

Union bound

A

For any finite set of events, the probability that at least one of them happens is at most the sum of the probabilities of the individual events.
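A tiny exact illustration of the union bound, using a fair die as the sample space (the events are made up for the example):

```python
from fractions import Fraction

# Sample space: one roll of a fair six-sided die.
OMEGA = set(range(1, 7))
P = lambda event: Fraction(len(event), len(OMEGA))

events = [{1, 2, 3}, {3, 4}, {2, 5, 6}]          # deliberately overlapping
union = set().union(*events)
assert P(union) <= sum(P(e) for e in events)      # 1 <= 8/6
```

The gap between the two sides (here 1 versus 8/6) comes entirely from the overlap between the events, which is exactly why the bound is pessimistic when hypotheses are similar (card 30).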
40
In what situation do we say that H shatters the data set of N points?
If H is capable of generating all possible dichotomies on those N points. (H is as diverse as it can be on this set, the max of 2^N dichotomies are realised).
41
Q

If the condition mH(N) = 2^N breaks at any point, ...

A

we can bound mH(N) for all values of N by a simple polynomial based on this break point.
42
Q

What is the upper bound for any mH(N) that has a break point k?

A

mH(N) <= B(N,k), with B(N,k) the maximum number of dichotomies on N data points such that no subset of size k of the N points can be shattered by these dichotomies.
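B(N, k) satisfies a Pascal-style recursion with simple boundary cases, and equals the polynomial-in-N sum of binomial coefficients; a short sketch (assuming Python 3.8+ for `math.comb`):

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def B(N, k):
    """Max number of dichotomies on N points such that no k of the points
    are shattered.  Boundary cases: B(N, 1) = 1 (two distinct dichotomies
    already shatter some single point) and B(1, k) = 2 for k > 1."""
    if k == 1:
        return 1
    if N == 1:
        return 2
    return B(N - 1, k) + B(N - 1, k - 1)   # the Sauer-style recursion

# B(N, k) has the closed form sum_{i=0}^{k-1} C(N, i):
print(B(5, 3), sum(comb(5, i) for i in range(3)))  # 16 16
```

Since the closed form is a polynomial of degree k - 1 in N, any hypothesis class with a break point has a polynomially bounded growth function (card 41).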
43
Q

Given a hypothesis class, the VC dimension dvc of that class is...

A

the largest number of points N such that mH(N) = 2^N. Infinite if mH(N) = 2^N for all N.

44
Q

Express how mH(N) is bounded by the VC dimension:

A

mH(N) <= N^dvc + 1

45
Q

Least squares estimator

A

Obtained by setting the gradient of the in-sample error to zero and solving for wlin. You can then predict y for an arbitrary x.

46
Q

Noise term

A

A random variable in linear regression: y = f(x) + epsilon, where epsilon is the noise term.
47
Q

How do you determine the growth function of a hypothesis class H?

A

1) Take N=1 and a single data point x. Can you realise both dichotomies with hypotheses from H? Then mH(1) = 2^1 = 2. Otherwise N=1 is a break point. 2) Continue for N=2, N=3, etc. If you never encounter a break point, then mH(N) = 2^N.
48
Q

Give the generalization bound:

A

Eout(g) <= Ein(g) + sqrt( (1/(2N)) * ln(2M/delta) ), with probability at least 1 - delta.
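The slack term of the bound is easy to evaluate numerically; a sketch (function name and the example numbers are illustrative):

```python
import math

def generalization_bound(N, M, delta):
    """Slack term of the bound: with probability at least 1 - delta,
    Eout(g) <= Ein(g) + sqrt((1/(2N)) * ln(2M / delta))."""
    return math.sqrt((1.0 / (2 * N)) * math.log(2 * M / delta))

# Example: N = 1000 points, M = 10 hypotheses, 95% confidence.
print(round(generalization_bound(1000, 10, 0.05), 3))  # 0.055
```

The term grows only logarithmically in M, which is why replacing M by the effective number of hypotheses (card 34) and ultimately by the polynomial growth function makes learning with infinite hypothesis classes feasible.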