Week 4 Flashcards

(49 cards)

1
Q

What does the out-of-sample error measure?

A

How well our training on the data set has generalized to data we have not seen before.

2
Q

What does the in-sample error measure?

A

The training performance

3
Q

Generalization error

A

The discrepancy between Ein and Eout.

4
Q

What is the concept of the growth function?

A

A number that captures how different the hypotheses in H are, and how much overlap the different events have.

5
Q

dichotomy

A

The N-tuple that you get when you apply a hypothesis h (in H) to a finite sample. It splits the sample into two groups: h = +1 and h = -1.

6
Q

What does the number of dichotomies actually represent?

A

The diversity of the hypothesis class.

7
Q

What is the growth function in the case of positive rays?

A

mH(N) = N + 1

8
Q

What is the condition with positive rays?

A

H consists of all hypotheses h: R -> {-1, +1} of the form h(x) = sign(x - a):
-1 is returned to the left of a and +1 to the right of a.

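The count mH(N) = N + 1 from card 7 can be verified by brute force: there are only N + 1 essentially different places for the threshold a. A minimal sketch (function name and sample points are illustrative, not from the deck):

```python
def ray_dichotomies(xs):
    """Distinct dichotomies realisable on points xs by h(x) = sign(x - a)."""
    xs = sorted(xs)
    # One representative threshold a per region: left of all points,
    # between each consecutive pair, and right of all points.
    cuts = [xs[0] - 1]
    cuts += [(xs[i] + xs[i + 1]) / 2 for i in range(len(xs) - 1)]
    cuts += [xs[-1] + 1]
    return {tuple(+1 if x > a else -1 for x in xs) for a in cuts}

print(len(ray_dichotomies([0.5, 1.7, 3.2, 4.4])))  # N + 1 = 5
```

With N = 2 the same enumeration yields 3 < 2^2 dichotomies, so 2 is already a break point for positive rays.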
9
Q

Describe the condition with positive intervals:

A

H consists of all hypotheses in one dimension that return +1 if the point lies in some interval and -1 otherwise.
Each hypothesis is specified by two end points of that interval.

10
Q

How many different dichotomies are there in the positive intervals condition?

A

Choosing the two end points from the N+1 spots between and around the points gives (N+1 choose 2) dichotomies, plus 1 for the all -1 dichotomy: (N+1 choose 2) + 1 in total.

11
Q

What is the growth function in the case of positive intervals?

A

mH(N) = 0.5N^2 + 0.5N + 1

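The formula of card 11 can likewise be checked by enumeration: an interval's two end points each occupy one of the N + 1 spots around the points. A sketch (names and sample points are illustrative):

```python
from itertools import combinations

def interval_dichotomies(xs):
    """Distinct dichotomies realisable on points xs by hypotheses that
    are +1 inside an interval and -1 outside it."""
    xs = sorted(xs)
    n = len(xs)
    # The n + 1 "spots" an end point can occupy around the points.
    spots = [xs[0] - 1]
    spots += [(xs[i] + xs[i + 1]) / 2 for i in range(n - 1)]
    spots += [xs[-1] + 1]
    dichos = {tuple(+1 if lo < x < hi else -1 for x in xs)
              for lo, hi in combinations(spots, 2)}
    dichos.add((-1,) * n)  # the empty interval gives the all -1 dichotomy
    return dichos

n = 4
print(len(interval_dichotomies([0.5, 1.7, 3.2, 4.4])), 0.5*n*n + 0.5*n + 1)
```

For n = 4 both numbers come out to 11, matching mH(N) = 0.5N^2 + 0.5N + 1.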
12
Q

What is the hypothesis space in the case of convex sets?

A

H consists of all hypotheses in two dimensions h: R^2 -> {-1,+1} that are positive inside some convex set and negative elsewhere.

13
Q

When is a set convex?

A

If the line segment connecting any two points within the set lies entirely within the set.

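The definition in card 13 translates directly into a numeric check. This is only a heuristic sketch (all names and the sample sets are made up): it samples points on the segment between every pair of probe points, so a False answer is conclusive while a True answer is merely evidence.

```python
def looks_convex(contains, probe_points, steps=50):
    """Heuristic test of the definition: for every pair of probe points
    inside the set, sampled points on the connecting segment must also
    lie inside the set."""
    inside = [p for p in probe_points if contains(p)]
    for (x1, y1) in inside:
        for (x2, y2) in inside:
            for k in range(steps + 1):
                t = k / steps
                if not contains((x1 + t * (x2 - x1), y1 + t * (y2 - y1))):
                    return False
    return True

# Convex: the unit disk.
disk = lambda p: p[0] ** 2 + p[1] ** 2 <= 1.0
# Not convex: two disjoint disks centred at (-2, 0) and (2, 0).
two_disks = lambda p: min((p[0] - 2) ** 2, (p[0] + 2) ** 2) + p[1] ** 2 <= 1.0
```

`looks_convex(two_disks, [(2, 0), (-2, 0)])` fails because the midpoint (0, 0) of the connecting segment lies outside the set.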
14
Q

What is the growth function in the case of convex sets?

A

mH(N) = 2^N

15
Q

Given a hypothesis class, the VC dimension dvc of that class is...

A

the largest number of points N such that mH(N) = 2^N. Infinite if mH(N) = 2^N for all N.

16
Q

What is the smallest break point, expressed in dvc?

A

k = dvc + 1

17
Q

break point

A

When no data set of size k can be shattered by H, then k is the break point for H.

18
Q

What kind of dvc do good models have?

A

A finite dvc.

19
Q

What kind of dvc do bad models have?

A

An infinite dvc.

20
Q

How do you compute a dvc for the perceptron?

A

First show that the dvc is at least a certain value, then show it is at most a certain value.

21
Q

What can you conclude about dvc if there is a set of N points that can be shattered by H?

A

dvc >= N.

22
Q

What can you conclude about dvc if any set of N points can be shattered by H?

A

dvc >= N.

23
Q

What can you conclude about dvc if there is a set of N datapoints that cannot be shattered by H?

A

Nothing: some other set of N points might still be shattered, so this alone gives no bound on dvc.

24
Q

What can you conclude about dvc if there is no dataset of N points that can be shattered by H?

A

dvc < N, i.e. dvc <= N - 1.

25
Q

What is the VC dimension of a d-dimensional perceptron?

A

d+1
26
Q

Pocket algorithm

A

Adjustment of the PLA: save the hypothesis with the smallest Ein seen so far.
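Cards 26-28 can be made concrete with a small sketch, assuming a NumPy environment (function name, data layout and parameters are illustrative): ordinary PLA updates, plus the pocket bookkeeping that keeps the best weights seen so far.

```python
import numpy as np

def pocket_pla(X, y, max_iters=200, seed=0):
    """Pocket algorithm: run ordinary PLA updates, but keep ('pocket')
    the weight vector with the smallest in-sample error seen so far."""
    rng = np.random.default_rng(seed)
    Xb = np.hstack([np.ones((len(X), 1)), X])      # prepend bias coordinate
    w = np.zeros(Xb.shape[1])

    def e_in(w):
        return float(np.mean(np.sign(Xb @ w) != y))

    best_w, best_err = w.copy(), e_in(w)
    for _ in range(max_iters):
        mis = np.flatnonzero(np.sign(Xb @ w) != y)
        if len(mis) == 0:                          # separable data: converged
            return w, 0.0
        i = rng.choice(mis)                        # ordinary PLA update
        w = w + y[i] * Xb[i]
        err = e_in(w)                              # the extra cost vs. plain PLA
        if err < best_err:                         # pocket the best w so far
            best_w, best_err = w.copy(), err
    return best_w, best_err
```

The full pass over the data to compute Ein after every update is exactly the disadvantage card 28 refers to; the pocketed weights are what make non-separable data workable (card 27).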
27
Q

What is an advantage of the pocket algorithm over the PLA?

A

It works even if the data is not linearly separable.

28
Q

What is a disadvantage of the pocket algorithm?

A

It takes a long time to calculate Ein at every iteration.

29
Q

What is the plan for achieving a small Eout?

A

1) Make sure you pick the hypothesis with the smallest Ein. 2) Make sure Eout is approximately equal to Ein.
30
Q

If two hypotheses are similar...

A

then both their in-sample and out-of-sample errors will be similar, and the events Bi will overlap considerably.

31
Q

What does the number of dichotomies count?

A

The effective number of hypotheses.

32
Q

What does the growth function describe, conceptually?

A

The largest number of dichotomies you can obtain on N points, when you may choose the points yourself.

33
Q

What is M?

A

The number of hypotheses in the hypothesis class.

34
Q

How do you deal with M being pessimistic?

A

Replace M by the 'effective number of hypotheses'; this gives a tighter bound on the probability that Eout deviates greatly from Ein.

35
Q

Effective number of hypotheses

A

Pick N data points and look only at what the hypotheses do on those points.

36
Q

What is the difference between a hypothesis and a dichotomy?

A

Hypothesis: h: X -> {-1, +1}. Dichotomy: h: {x1, x2, ..., xN} -> {-1, +1}.

37
Q

What is the maximum number of dichotomies on a set of N points?

A

2^N

38
Q

Define the growth function:

A

mH(N) = max over x1, ..., xN in X of |H(x1, ..., xN)|
39
Q

Union bound

A

For any finite set of events, the probability that at least one of them happens is at most the sum of the probabilities of the individual events.
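A tiny exact illustration of the union bound, using a fair die as the sample space (the events are made up for the example):

```python
from fractions import Fraction

# Sample space: one roll of a fair six-sided die.
OMEGA = set(range(1, 7))
P = lambda event: Fraction(len(event), len(OMEGA))

events = [{1, 2, 3}, {3, 4}, {2, 5, 6}]          # deliberately overlapping
union = set().union(*events)
assert P(union) <= sum(P(e) for e in events)      # 1 <= 8/6
```

The gap between the two sides (here 1 versus 8/6) comes entirely from the overlap between the events, which is exactly why the bound is pessimistic when hypotheses are similar (card 30).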
40
In what situation do we say that H shatters the data set of N points?
If H is capable of generating all possible dichotomies on those N points. (H is as diverse as it can be on this set, the max of 2^N dichotomies are realised).
41
Q

If the condition mH(N) = 2^N breaks at any point, ...

A

we can bound mH(N) for all values of N by a simple polynomial based on this break point.
42
Q

What is the upper bound for any mH(N) that has a break point k?

A

mH(N) <= B(N,k), with B(N,k) the maximum number of dichotomies on N data points such that no subset of size k of the N points can be shattered by these dichotomies.
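B(N, k) satisfies a Pascal-style recursion with simple boundary cases, and equals the polynomial-in-N sum of binomial coefficients; a short sketch (assuming Python 3.8+ for `math.comb`):

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def B(N, k):
    """Max number of dichotomies on N points such that no k of the points
    are shattered.  Boundary cases: B(N, 1) = 1 (two distinct dichotomies
    already shatter some single point) and B(1, k) = 2 for k > 1."""
    if k == 1:
        return 1
    if N == 1:
        return 2
    return B(N - 1, k) + B(N - 1, k - 1)   # the Sauer-style recursion

# B(N, k) has the closed form sum_{i=0}^{k-1} C(N, i):
print(B(5, 3), sum(comb(5, i) for i in range(3)))  # 16 16
```

Since the closed form is a polynomial of degree k - 1 in N, any hypothesis class with a break point has a polynomially bounded growth function (card 41).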
43
Q

Given a hypothesis class, the VC dimension dvc of that class is...

A

the largest number of points N such that mH(N) = 2^N. Infinite if mH(N) = 2^N for all N.

44
Q

Express how mH(N) is bounded by the VC dimension:

A

mH(N) <= N^dvc + 1

45
Q

Least squares estimator

A

Obtained by setting the gradient of the in-sample error to zero and solving for wlin. You can then predict y for an arbitrary x.

46
Q

Noise term

A

A random variable in linear regression: y = f(x) + epsilon, where epsilon is the noise term.
47
Q

How do you determine the growth function of a hypothesis class H?

A

1) Take N=1 and a single data point x. Can you realise both dichotomies with hypotheses from H? Then mH(1) = 2^1 = 2. Otherwise N=1 is a break point. 2) Continue for N=2, N=3, etc. If you never encounter a break point, then mH(N) = 2^N.
48
Q

Give the generalization bound:

A

Eout(g) <= Ein(g) + sqrt( (1/(2N)) * ln(2M/delta) ), with probability at least 1 - delta.
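The slack term of the bound is easy to evaluate numerically; a sketch (function name and the example numbers are illustrative):

```python
import math

def generalization_bound(N, M, delta):
    """Slack term of the bound: with probability at least 1 - delta,
    Eout(g) <= Ein(g) + sqrt((1/(2N)) * ln(2M / delta))."""
    return math.sqrt((1.0 / (2 * N)) * math.log(2 * M / delta))

# Example: N = 1000 points, M = 10 hypotheses, 95% confidence.
print(round(generalization_bound(1000, 10, 0.05), 3))  # 0.055
```

The term grows only logarithmically in M, which is why replacing M by the effective number of hypotheses (card 34) and ultimately by the polynomial growth function makes learning with infinite hypothesis classes feasible.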