PAC Learning Flashcards
(44 cards)
What is proof by contraposition?
To prove a statement “P implies Q”, you instead prove the logically equivalent contrapositive “not Q implies not P”
What’s the intuition of the VC Dimension?
Even if you have infinitely many hypotheses in your hypothesis class, given a particular training sample, many of those hypotheses produce exactly the same labeling of that sample, so they are functionally identical with respect to it
What is h? What does it do?
- A hypothesis.
- Applied to some dataset S, generates a labeling of S
What is S?
the dataset
For the realizable setting, what are the relations between the bound and 1) epsilon, 2) abs(H)?
- Bound is inversely linear in epsilon (e.g. halving the error requires double the examples)
- Bound is only logarithmic in |H| (e.g. quadrupling the hypothesis space only requires double the examples)
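For reference, a minimal sketch of the standard finite-|H| realizable-case sample complexity bound (a standard PAC result; exact constants depend on the course's formulation):
% With probability at least 1 - \delta, every h \in H that is consistent with
% the training sample has true error at most \epsilon, provided
m \;\ge\; \frac{1}{\epsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right)
Halving epsilon doubles the required m (inversely linear), while |H| enters only through ln|H|.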
For the agnostic setting, what are the relations between the bound and 1) epsilon, 2) abs(H)?
- Bound is inversely quadratic in epsilon (e.g. halving the error requires 4x the examples)
- Bound is only logarithmic in |H| (i.e. same as Realizable case)
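For reference, a minimal sketch of the standard finite-|H| agnostic-case bound, obtained from Hoeffding's inequality plus a union bound over H (constants again depend on the formulation):
% With probability at least 1 - \delta, |true error - training error| <= \epsilon
% holds simultaneously for every h \in H, provided
m \;\ge\; \frac{1}{2\epsilon^2}\left(\ln|H| + \ln\frac{2}{\delta}\right)
The 1/\epsilon^2 dependence is why halving the error now costs roughly 4x the examples.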
What is shattering?
H shatters S if, for every possible labeling of S, there is some hypothesis in H that classifies S exactly according to that labeling
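Because shattering is a “for every labeling, there exists a hypothesis” condition, it can be checked by brute force on small sets. A minimal Python sketch (not from the course), using 1-D threshold classifiers h_t(x) = 1[x >= t] as the hypothesis class:

from itertools import product

def candidate_thresholds(points):
    # Enough thresholds to realize every distinct behavior on these points:
    # one below all points, one between each adjacent pair, one above all.
    xs = sorted(points)
    return [xs[0] - 1.0] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [xs[-1] + 1.0]

def shatters(points):
    # The thresholds shatter `points` iff every labeling in {0,1}^|S|
    # is realized by some threshold t (i.e. by some h in H).
    for labeling in product([0, 1], repeat=len(points)):
        realized = any(
            all((1 if x >= t else 0) == y for x, y in zip(points, labeling))
            for t in candidate_thresholds(points)
        )
        if not realized:
            return False
    return True

print(shatters([0.5]))       # True: any single point can be shattered
print(shatters([0.5, 1.5]))  # False: the labeling (1, 0) is unachievable, so the VC-dim of thresholds is 1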
What is the VC-dimension?
Formal definition: the VC-dimension (Vapnik–Chervonenkis dimension) of ℋ is the cardinality of the largest set S such that ℋ can shatter S.
Equivalent phrasing (from recitation): the VC dimension of a hypothesis space H is the maximum number of points such that there exists at least one arrangement of those points for which, for every labeling of the arrangement, some hypothesis h ∈ H is consistent with that labeling
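In symbols (standard notation, restating the definition above):
\mathrm{VC}(\mathcal{H}) \;=\; \max\{\, |S| \;:\; \mathcal{H} \text{ shatters } S \,\}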
When is the VC dimension infinity?
If ℋ can shatter arbitrarily large finite sets, then the VC-dimension of ℋ is infinity
To prove that VC(H) = some value M, what do you have to do?
- Exhibit one set of M points that H can shatter (shows VC(H) ≥ M)
- Show that no set of M+1 points can be shattered by H (shows VC(H) < M+1)
What’s the VC dimension of linear separators in n dimensions?
n+1
What does ∃ mean?
There exists
(high level) What’s the corollary to Theorem 1 of PAC learning?
It gives a numerical bound on the true error
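One standard form of such a bound is the VC-based generalization bound (the course's corollary may differ in constants): with probability at least 1 - \delta, simultaneously for every h \in \mathcal{H},
R(h) \;\le\; \hat{R}(h) \;+\; \sqrt{\frac{\mathrm{VC}(\mathcal{H})\left(\ln\frac{2m}{\mathrm{VC}(\mathcal{H})} + 1\right) + \ln\frac{4}{\delta}}{m}}
where R(h) is the true error, \hat{R}(h) the training error, and m the sample size. The second term grows with VC(ℋ), which is exactly the training-error vs. complexity tradeoff in the next card.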
What’s the key idea that makes Corollary 4 of Theorem 1 of PAC learning useful?
- We want to trade off between low training error and keeping H simple (i.e. low VC(H))
- We can tune the lambda parameter of the regularizer to hopefully land at the sweet spot in the graph

What are the practical ways we can trade off between low training error and keeping H simple? (1)
Use a regularizer
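A minimal Python sketch of this idea (hypothetical synthetic data; assumes scikit-learn's Ridge, whose alpha parameter plays the role of the lambda regularization weight): sweep the regularization strength and keep the value with the lowest validation error.

# Tune the regularization weight by validation error to trade off
# training error against hypothesis-class complexity.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = X @ rng.normal(size=10) + 0.5 * rng.normal(size=200)  # synthetic regression data

X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

best_alpha, best_val_err = None, float("inf")
for alpha in [0.001, 0.01, 0.1, 1.0, 10.0, 100.0]:
    model = Ridge(alpha=alpha).fit(X_tr, y_tr)
    val_err = mean_squared_error(y_val, model.predict(X_val))
    if val_err < best_val_err:
        best_alpha, best_val_err = alpha, val_err

print(best_alpha, best_val_err)  # the "sweet spot" between fit and simplicity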
What are discriminative models?
Models that directly learn the mapping from inputs x to labels y, e.g. by modeling p(y|x) or the decision boundary, rather than modeling the joint distribution p(x, y) the way generative models do
What’s a pmf?
p(x) : Function giving the probability that discrete r.v. X takes value x.
What’s a pdf?
f(x) : Function that returns a nonnegative real indicating the relative likelihood of a continuous r.v. X near the value x; the probability that X falls in an interval is the integral of f over that interval
What’s a cdf? What’s the symbol?
F(x) : Function that returns the probability that a random variable X is less than or equal to x: F(x) = P(X ≤ x)
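The three preceding definitions, as standard identities in one place:
p(x) = P(X = x) \quad \text{(pmf, discrete } X\text{)}
P(a \le X \le b) = \int_a^b f(x)\,dx \quad \text{(pdf, continuous } X\text{)}
F(x) = P(X \le x) = \sum_{t \le x} p(t) \;\text{ or }\; \int_{-\infty}^{x} f(t)\,dt \quad \text{(cdf)}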
What does a beta distribution look like?
A distribution on [0, 1] with density proportional to x^(α−1)(1−x)^(β−1): uniform when α = β = 1, U-shaped when both parameters are below 1, and unimodal on [0, 1] when both are above 1
What does a Dirichlet distribution look like?
The multivariate generalization of the beta distribution: a distribution over probability vectors (points on the simplex), with density proportional to ∏_k θ_k^(α_k − 1)
What’s the symbol for expected value of x?
E[X]
What are the equations for expected value?
- Discrete: E[X] = Σ_x x · p(x)
- Continuous: E[X] = ∫ x · f(x) dx
What’s the equation for variance (one that applies to both discrete and continuous)?
Var(X) = E[(X − E[X])²] = E[X²] − (E[X])²