Topic 10: Rademacher Complexity: a measure of capacity Flashcards

1
Q

What is Rademacher Complexity

A

A concept that helps us measure how well our model will do on new, unseen data without being tied to any specific training method

2
Q

What can be said about the space of all possible f that an algorithm is effectively searching through

A

It can be an infinite-dimensional space

Whilst the space of its parameters is finite-dimensional

3
Q

How can we define F in terms of the algorithm

A

F ∶= the set of all possible models (predictors) mapping R^d → R^k that a given training algorithm is searching over

4
Q

What is ALG

A

Any chosen training algorithm which takes as input the 3-tuple (ℓ, Sn, F) and searches through the above F for the “best” predictor

5
Q

What is inf f∈F R(f)

A

The infimum (greatest lower bound) of the population risk R(f) over all f ∈ F
It is precisely what we want the risk to be - as low as possible

6
Q

What randomness do we account for

A

The randomness inherent in the training algorithm, and the randomness introduced by sampling the training dataset from the underlying distribution

There may be other sources of randomness that we’re not taking into account

7
Q

What is stochasticity

A

Randomness or unpredictability in a process or system

8
Q

What is arginf

A

The argument of the infimum: the input value(s) at which a given function attains its infimum (the greatest lower bound)
f_ERM is the arginf of the empirical risk R̂(f, Sn)

9
Q

How do we account for all randomness in ALG

A

Replace f_ERM with ALG(ℓ, Sn, F) and average over all possible random outcomes of the algorithm to get a more comprehensive measure of risk

10
Q

What do we know about R(ALG(ℓ,Sn, F))

A

Not much
it is a random variable that we do not know how to control

11
Q

What is the worst generalisation gap between population and empirical risk

A

E Sn∼Dn [ sup f∈F ( R̂(f, Sn) − R(f) ) ]

Trying to control this instead frees us from the details of any particular ML algorithm trying to find arginf f∈F R(f)

We are instead focused on the broader concept of how the choice of predictor class F and data distribution D interact to find the best predictor f

12
Q

What is Empirical Rademacher Complexity

A

R̂n(H) =
Eϵ [ sup h∈H 1/n ∑ ϵi h(zi) ]

The empirical Rademacher complexity tells us how well the functions in H can adapt to random fluctuations in the dataset

It checks how much these functions can correlate with random sign noise, and then takes the average of the maximum correlation across all functions in H (ϵi is the random noise)
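As a sketch, the expectation over ϵ can be estimated by Monte Carlo when H is a finite class evaluated on a fixed sample; the helper name `empirical_rademacher`, the trial count, and the toy two-function class below are illustrative assumptions, not notation from the cards:

```python
import numpy as np

def empirical_rademacher(H_values, n_trials=2000, seed=0):
    """Monte Carlo estimate of R̂n(H) for a FINITE function class.

    H_values: array of shape (|H|, n); row j holds h_j evaluated
    on the fixed sample z_1, ..., z_n.
    """
    rng = np.random.default_rng(seed)
    _, n = H_values.shape
    total = 0.0
    for _ in range(n_trials):
        eps = rng.choice([-1.0, 1.0], size=n)  # random sign (noise) vector
        total += np.max(H_values @ eps) / n    # sup over h of 1/n ∑ ϵi h(zi)
    return total / n_trials

# Toy class: two constant functions on a sample of size n = 100.
# Constants cannot track random signs, so the estimate is small (~1/√n).
H_vals = np.array([[1.0] * 100, [-1.0] * 100])
print(empirical_rademacher(H_vals))
```

A richer class that can match more sign patterns would drive this number up, which is the sense in which it measures capacity.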

13
Q

What is Average Rademacher Complexity

A

Rn(H) =
E Sn∼Dn ( Eϵ [ sup h∈H 1/n ∑ ϵi h(zi) ] )

A way to measure the complexity of the function class H when considering all possible datasets drawn from the distribution D

note z = (x,y) and h(z) = ℓ(y, f(x))
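The extra outer expectation over datasets can also be estimated by Monte Carlo; the 1-D threshold class and the Uniform(0, 1) choice of D below are invented purely for illustration:

```python
import numpy as np

def avg_rademacher_thresholds(n, n_datasets=100, n_noise=100, seed=0):
    """Monte Carlo estimate of Rn(H) for the illustrative class
    H = { z -> sign(z - t) : t on a fixed grid }, with z ~ Uniform(0, 1)."""
    rng = np.random.default_rng(seed)
    thresholds = np.linspace(0.0, 1.0, 21)
    outer = 0.0
    for _ in range(n_datasets):              # E Sn∼Dn: resample the dataset
        z = rng.uniform(0.0, 1.0, size=n)
        H_vals = np.sign(z[None, :] - thresholds[:, None])  # |H| x n values
        inner = 0.0
        for _ in range(n_noise):             # Eϵ: resample the sign vector
            eps = rng.choice([-1.0, 1.0], size=n)
            inner += np.max(H_vals @ eps) / n
        outer += inner / n_noise
    return outer / n_datasets

print(avg_rademacher_thresholds(n=50))
```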

14
Q

What does Sn ∼ Dn mean

A

The elements of Sn are sampled independently and identically from a common distribution D

15
Q

What is the Expectation of the Worst Generalisation Gap equation

A

E Sn∼Dn [ sup h∈H ( 1/n ∑ h(zi) − Ez∼D [h(z)] ) ] ≤ 2Rn(H)

The theorem is about understanding how well the functions in H generalize to unseen data compared to how they perform on the data they were trained on

16
Q

What does ℓ ○ F mean

A

Applying the loss function ℓ to each predictor in F
H = ℓ ○ F for some class of predictors F

17
Q

What is the upper bound of the worst generalisation gap

A

2Rn(H)
An upper bound on the worst generalisation gap that depends on the loss, the predictor class, the distribution, and the number of samples - useful

18
Q

How do we limit rademacher complexity

A

Use a function C(H,D) to set an upper limit on this complexity

Ideally Rn(H) is less than or equal to C(H,D) / √n

where n is the number of training samples

It shows our model isn’t overly complex for our data size
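A minimal sketch of this budget, assuming an illustrative capacity constant C = 2.0 (`complexity_budget` is a made-up helper name):

```python
import math

def complexity_budget(C, n):
    """The desired upper limit C(H, D) / sqrt(n) on Rn(H):
    a fixed capacity constant that shrinks as the sample grows."""
    return C / math.sqrt(n)

# Quadrupling n halves the budget: the 1/sqrt(n) rate
for n in (100, 400, 1600):
    print(n, complexity_budget(2.0, n))
```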

19
Q

What is sample complexity

A

An estimate of how many training samples we need for a certain prediction accuracy

Use
n ≥ ( 2C(H,D) / ϵ )^2

Where ϵ is the worst-case prediction error we are willing to tolerate
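The estimate comes from solving 2C(H, D)/√n ≤ ϵ for n; a small sketch, where `sample_complexity` is a hypothetical helper and C = 1.0 an assumed capacity constant:

```python
import math

def sample_complexity(C, eps):
    """Smallest n with 2*C/sqrt(n) <= eps, i.e. n >= (2C / eps)^2,
    so the bound on the worst-case gap is at most eps."""
    return math.ceil((2.0 * C / eps) ** 2)

print(sample_complexity(C=1.0, eps=0.1))  # 400 samples for a tolerated gap of 0.1
```

Note the quadratic blow-up: halving ϵ quadruples the required number of samples.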

20
Q

When are sample complexity estimates wildly overestimated

A

When the function class F gets more complicated

21
Q

What does C tell us about the number of parameters

A

C is not dependent on the number of parameters in the functions in H
So we can vary parameters without changing C

22
Q

What is Symmetrisation

A

Introduces additional symmetry into a problem
Eg creating a “ghost sample” of data points that are sampled independently from the same distribution as the original data

Symmetrical datasets = sets of data where the statistical properties are the same when certain transformations are applied

23
Q

Why do we take the supremum in rademacher complexity

A

This operation gives us the maximum value of a certain expression across all functions in H
By taking the supremum, we are essentially finding the function in H that maximally contributes to the expression, which helps us understand the worst-case scenario, or the maximum complexity

24
Q

Why do we take the expectation in rademacher complexity

A

We take the expectation over a set of random variables that represent noise
This gives us the average behavior of the supremum over different realizations of the random noise (the typical behavior of the model)

25
Q

In the supervised setting, what is z in rademacher complexity formula

A

(x,y)

26
Q

In the supervised setting, what is h(z) in rademacher complexity formula

A

ℓ(y, f(x))

27
Q

What is H

A

H = real valued function class

28
Q

What is ϵ

A

ϵ = n-dimensional random vector where each coordinate is independently +1 or -1, each with probability 1/2
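A quick sketch of sampling such a vector with NumPy (the seed and the choice of n = 10 are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
# A Rademacher vector: n coordinates, each independently +1 or -1
# with probability 1/2, so each has mean 0 and variance 1
eps = rng.choice([-1, 1], size=10)
print(eps)
```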

29
Q

What is 1/n ∑h(zi)

A

the average performance on the sampled data

30
Q

What is Ez∼D [h(z)]

A

the average performance on all possible data