Topic 11: Bounding Rademacher Complexity Flashcards

1
Q

What is Massart’s Finite Class Lemma

A

Provides a bound on the empirical Rademacher complexity of a finite set of functions.
The lemma states that if we have a finite set of functions and we know the maximum norm M of the vectors of evaluations of these functions on a dataset, then we can bound the empirical Rademacher complexity of the set in terms of M, the size of the set, and the number of data points.
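As a reference, the standard statement (in notation assumed here: H finite, a dataset of n points, M the largest 2-norm of an evaluation vector) can be written as

\[
  \hat{\mathcal{R}}_S(H) = \mathbb{E}_{\sigma}\left[\sup_{h \in H} \frac{1}{n}\sum_{i=1}^{n} \sigma_i\, h(x_i)\right] \le \frac{M\sqrt{2\ln|H|}}{n},
\]

where \(M = \max_{h \in H} \lVert (h(x_1), \dots, h(x_n)) \rVert_2\) and the \(\sigma_i\) are i.i.d. Rademacher (±1) variables.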

2
Q

How can Massart's Lemma be applied to linear regression

A

Massart's lemma requires a finite set of functions H. For linear regression we obtain one by quantizing the weight vectors, giving a finite set of linear predictors (and their associated loss functions).

The lemma then bounds the empirical Rademacher complexity of this finite class, which in turn bounds the generalisation gap and hence the expected risk.
This gives a theoretical guarantee on the performance of these models.
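A minimal Python sketch (illustrative only; the quantization grid, the synthetic data, and every name here are assumptions, not taken from the course) that Monte Carlo-estimates the empirical Rademacher complexity of a small finite class of quantized linear predictors and compares it with Massart's bound:

import itertools
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 2                      # number of data points, data dimension
X = rng.normal(size=(n, d))       # assumed synthetic dataset

# Finite class H: linear predictors with weights quantized to {-1, -0.5, 0, 0.5, 1}^d
grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
W = np.array(list(itertools.product(grid, repeat=d)))   # |H| = 5^d weight vectors
evals = X @ W.T                   # (n, |H|) matrix of evaluations h(x_i)

# Monte Carlo estimate of the empirical Rademacher complexity
num_draws = 2000
sup_values = []
for _ in range(num_draws):
    sigma = rng.choice([-1.0, 1.0], size=n)
    sup_values.append(np.max(sigma @ evals) / n)
emp_rad = np.mean(sup_values)

# Massart's bound: M * sqrt(2 ln|H|) / n, with M the largest 2-norm of an evaluation vector
M = np.max(np.linalg.norm(evals, axis=0))
massart = M * np.sqrt(2 * np.log(W.shape[0])) / n

print(f"Monte Carlo estimate: {emp_rad:.4f}")
print(f"Massart bound:        {massart:.4f}")

The estimate should sit below the bound, and both shrink as n grows.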

3
Q

What is Jensen's Lemma (v1)

A

Let F: R -> R be a convex function and let X be a real-valued random variable

F(E[X]) ≤ E[F(X)]
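A quick worked instance (my own example, not from the cards): taking the convex function F(x) = x^2 gives

\[
  (\mathbb{E}[X])^2 \le \mathbb{E}[X^2],
\]

which is just the statement that Var(X) = E[X^2] - (E[X])^2 ≥ 0.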

4
Q

What is Jensen's Lemma (v2)

A

Let F: R -> R be a convex function, let X be an n-dimensional random vector, and let T: R^n -> R be a given function:

F(E[T(X)]) ≤ E[F(T(X))]

5
Q

What does it mean for a weight vector to be bounded by its 2-norm

A

The 2-norm of w is:
∥w∥2 = √(w1^2 + w2^2 + … + wd^2)

If it is bounded then:
∥w∥2 ≤ B
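A tiny Python illustration (the bound B = 5.0 and the example vector are assumptions): compute the 2-norm of a weight vector, check the bound, and project back onto the ball if it is violated.

import numpy as np

B = 5.0                                   # assumed norm bound
w = np.array([3.0, 4.0, 1.0])             # example weight vector

norm = np.linalg.norm(w)                  # ||w||_2 = sqrt(3^2 + 4^2 + 1^2)
print(norm <= B)                          # is the weight vector 2-norm bounded by B?

# Projection onto the ball {w : ||w||_2 <= B}, one common way to enforce the constraint
w_proj = w if norm <= B else w * (B / norm)
print(w_proj, np.linalg.norm(w_proj))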

6
Q

What is important about the Rademacher complexity of 2-norm bounded linear predictors

A

The bound on the Rademacher complexity does not scale with the dimension of the data, d

7
Q

What is the Rademacher complexity of 2-norm bounded linear predictors dependent on

A

The value B - the 2-norm bound on the weights (together with the data norm bound C and the sample size n, but notably not the dimension d)

8
Q

What happens to the average Rademacher and empirical Rademacher complexity when n goes to infinity

A

They vanish
(at different rates)

This decrease implies that as we gather more data, the generalisation gap tends to narrow

9
Q

What is the general equation for the Rademacher complexity of 2-norm bounded linear predictors

A

Rn(F) ≤ B · C / √n

B = 2-norm bound on the weights of the linear predictors
C = constant that bounds the expected squared norm of the data points (C² ≥ E[∥x∥²])
n = number of data points
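A minimal Python sketch (synthetic data and all names are assumptions) illustrating this bound. For the class {x ↦ ⟨w, x⟩ : ∥w∥2 ≤ B}, the inner supremum has the closed form B · ∥(1/n) Σ_i σ_i x_i∥2 (Cauchy-Schwarz), so the empirical Rademacher complexity can be estimated directly and compared with B · C / √n for growing n:

import numpy as np

rng = np.random.default_rng(0)
B, d = 2.0, 20                    # assumed weight-norm bound and data dimension

def emp_rademacher(X, B, num_draws=2000):
    # sup_{||w|| <= B} (1/n) sum_i sigma_i <w, x_i>  =  B * ||(1/n) sum_i sigma_i x_i||_2
    n = X.shape[0]
    sigmas = rng.choice([-1.0, 1.0], size=(num_draws, n))
    return np.mean(B * np.linalg.norm(sigmas @ X, axis=1) / n)

for n in [100, 1000, 10000]:
    X = rng.normal(size=(n, d))                   # synthetic data
    C = np.sqrt(np.mean(np.sum(X**2, axis=1)))    # empirical stand-in for the constant C
    print(n, emp_rademacher(X, B), B * C / np.sqrt(n))

The printed estimate stays below the bound, and both decay like 1/√n regardless of the dimension d.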

10
Q

How can we use the Rademacher complexity of 2-norm bounded linear predictors to show size is not always better

A

The bound does not depend on the dimension d - generalisation is controlled by the norm bound B rather than by the number of parameters

11
Q

What is the generalisation gap

A

The difference between the performance of the model on the training data and its performance on the test data.
Small generalisation gap -> the model has successfully learned to generalise from the training data to unseen data.
Large gap -> overfitting.

12
Q

What is the generalisation error

A

Difference between the training error and test error

13
Q

What does it mean for a neural network to become wider

A

Its hidden layers have more units, and hence it has more parameters
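A small Python illustration (a hypothetical one-hidden-layer net; d, the widths, and the names are assumptions) of how the parameter count grows with width:

# Parameter count of a one-hidden-layer net: d inputs -> h hidden units -> 1 output
d = 10
for h in [16, 64, 256, 1024]:
    params = (d * h + h) + (h * 1 + 1)   # first-layer weights+biases, output weight+bias
    print(f"width {h:5d}: {params} parameters")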

14
Q

What is surprising about increasingly wide nets (property 1)

A

The (estimated) Lipschitz constant of the trained nets will neither be very small nor be decreasing
– it might even be moderately increasing as the number of parameters increases.

15
Q

What is surprising about increasingly wide nets (property 2)

A

Such good nets can be found with as many parameters as
data-dimension × number-of-training-data = d × n, or even more.

16
Q

What is surprising about increasingly wide nets (property 3)

A

The test error would not be increasing and might even be improving as the number of parameters increases
(while all of them have nearly identical and nearly zero training errors).