Topic 11: Bounding Rademacher Complexity Flashcards

1
Q

What is Massart’s Finite Class Lemma

A

Provides a bound on the empirical Rademacher complexity of a finite set of functions.
The lemma states that if we have a finite set of functions and we know the maximum norm M of the vectors of evaluations of these functions on a dataset, then we can bound the empirical Rademacher complexity of the set in terms of M, the size of the set, and the number of data points.
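As a reference, the standard statement (in notation assumed here: H finite, a dataset of n points, M the largest 2-norm of an evaluation vector) can be written as

\[
  \hat{\mathcal{R}}_S(H) = \mathbb{E}_{\sigma}\left[\sup_{h \in H} \frac{1}{n}\sum_{i=1}^{n} \sigma_i\, h(x_i)\right] \le \frac{M\sqrt{2\ln|H|}}{n},
\]

where \(M = \max_{h \in H} \lVert (h(x_1), \dots, h(x_n)) \rVert_2\) and the \(\sigma_i\) are i.i.d. Rademacher (±1) variables.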

2
Q

How can Massart's Lemma be applied to linear regression

A

Massart's lemma requires a finite set of functions H. For linear regression we obtain one by quantizing the weight vectors, giving a finite set of linear predictors (and their associated loss functions).

The lemma then bounds the empirical Rademacher complexity of this finite class, which in turn bounds the generalisation gap and hence the expected risk.
This gives a theoretical guarantee on the performance of these models.
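A minimal Python sketch (illustrative only; the quantization grid, the synthetic data, and every name here are assumptions, not taken from the course) that Monte Carlo-estimates the empirical Rademacher complexity of a small finite class of quantized linear predictors and compares it with Massart's bound:

import itertools
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 2                      # number of data points, data dimension
X = rng.normal(size=(n, d))       # assumed synthetic dataset

# Finite class H: linear predictors with weights quantized to {-1, -0.5, 0, 0.5, 1}^d
grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
W = np.array(list(itertools.product(grid, repeat=d)))   # |H| = 5^d weight vectors
evals = X @ W.T                   # (n, |H|) matrix of evaluations h(x_i)

# Monte Carlo estimate of the empirical Rademacher complexity
num_draws = 2000
sup_values = []
for _ in range(num_draws):
    sigma = rng.choice([-1.0, 1.0], size=n)
    sup_values.append(np.max(sigma @ evals) / n)
emp_rad = np.mean(sup_values)

# Massart's bound: M * sqrt(2 ln|H|) / n, with M the largest 2-norm of an evaluation vector
M = np.max(np.linalg.norm(evals, axis=0))
massart = M * np.sqrt(2 * np.log(W.shape[0])) / n

print(f"Monte Carlo estimate: {emp_rad:.4f}")
print(f"Massart bound:        {massart:.4f}")

The estimate should sit below the bound, and both shrink as n grows.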

3
Q

What is Jensen's Lemma (v1)

A

Let F: R -> R be a convex function and let X be a real-valued random variable

F(E[X]) ≤ E[F(X)]
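A quick worked instance (my own example, not from the cards): taking the convex function F(x) = x^2 gives

\[
  (\mathbb{E}[X])^2 \le \mathbb{E}[X^2],
\]

which is just the statement that Var(X) = E[X^2] - (E[X])^2 ≥ 0.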

4
Q

What is Jensen's Lemma (v2)

A

Let F: R -> R be a convex function, let X be an n-dimensional random vector, and let T: R^n -> R be a given function:

F(E[T(X)]) ≤ E[F(T(X))]

5
Q

What does it mean for a weight vector to be bounded by its 2-norm

A

The 2-norm of w is:
∥w∥2 = √(w1^2 + w2^2 + … + wd^2)

If it is bounded then:
∥w∥2 ≤ B
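A tiny Python illustration (the bound B = 5.0 and the example vector are assumptions): compute the 2-norm of a weight vector, check the bound, and project back onto the ball if it is violated.

import numpy as np

B = 5.0                                   # assumed norm bound
w = np.array([3.0, 4.0, 1.0])             # example weight vector

norm = np.linalg.norm(w)                  # ||w||_2 = sqrt(3^2 + 4^2 + 1^2)
print(norm <= B)                          # is the weight vector 2-norm bounded by B?

# Projection onto the ball {w : ||w||_2 <= B}, one common way to enforce the constraint
w_proj = w if norm <= B else w * (B / norm)
print(w_proj, np.linalg.norm(w_proj))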

6
Q

What is important about the Rademacher complexity of 2-norm bounded linear predictors

A

The bound on the Rademacher complexity does not scale with the dimension of the data, d

7
Q

What is the Rademacher complexity of 2-norm bounded linear predictors dependent on

A

The value B - the 2-norm bound on the weights (together with the data norm bound C and the sample size n, but notably not the dimension d)

8
Q

What happens to the average Rademacher and empirical Rademacher complexity when n goes to infinity

A

They vanish
(at different rates)

This decrease implies that as we gather more data, the generalisation gap tends to narrow

9
Q

What is the general equation for the Rademacher complexity of 2-norm bounded linear predictors

A

Rn(F) ≤ B · C / √n

B = 2-norm bound on the weights of the linear predictors
C = constant that bounds the expected squared norm of the data points (C² ≥ E[∥x∥²])
n = number of data points
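A minimal Python sketch (synthetic data and all names are assumptions) illustrating this bound. For the class {x ↦ ⟨w, x⟩ : ∥w∥2 ≤ B}, the inner supremum has the closed form B · ∥(1/n) Σ_i σ_i x_i∥2 (Cauchy-Schwarz), so the empirical Rademacher complexity can be estimated directly and compared with B · C / √n for growing n:

import numpy as np

rng = np.random.default_rng(0)
B, d = 2.0, 20                    # assumed weight-norm bound and data dimension

def emp_rademacher(X, B, num_draws=2000):
    # sup_{||w|| <= B} (1/n) sum_i sigma_i <w, x_i>  =  B * ||(1/n) sum_i sigma_i x_i||_2
    n = X.shape[0]
    sigmas = rng.choice([-1.0, 1.0], size=(num_draws, n))
    return np.mean(B * np.linalg.norm(sigmas @ X, axis=1) / n)

for n in [100, 1000, 10000]:
    X = rng.normal(size=(n, d))                   # synthetic data
    C = np.sqrt(np.mean(np.sum(X**2, axis=1)))    # empirical stand-in for the constant C
    print(n, emp_rademacher(X, B), B * C / np.sqrt(n))

The printed estimate stays below the bound, and both decay like 1/√n regardless of the dimension d.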

10
Q

How can we use the Rademacher complexity of 2-norm bounded linear predictors to show size is not always better

A

The bound does not depend on the dimension d - generalisation is controlled by the norm bound B rather than by the number of parameters

11
Q

What is the generalisation gap

A

The difference between the performance of the model on the training data and its performance on the test data.
Small generalisation gap -> the model has successfully learned to generalise from the training data to unseen data.
Large gap -> overfitting.

12
Q

What is the generalisation error

A

Difference between the training error and test error

13
Q

What does it mean for a neural network to become wider

A

Its hidden layers have more units, and hence it has more parameters
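A small Python illustration (a hypothetical one-hidden-layer net; d, the widths, and the names are assumptions) of how the parameter count grows with width:

# Parameter count of a one-hidden-layer net: d inputs -> h hidden units -> 1 output
d = 10
for h in [16, 64, 256, 1024]:
    params = (d * h + h) + (h * 1 + 1)   # first-layer weights+biases, output weight+bias
    print(f"width {h:5d}: {params} parameters")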

14
Q

What is surprising about increasingly wide nets (property 1)

A

The (estimated) Lipschitz constant of the trained nets will neither be very small nor be decreasing
– it might even be moderately increasing as the number of parameters increases.

15
Q

What is surprising about increasingly wide nets (property 2)

A

Such good nets can be found with as many parameters as
data-dimension × number-of-training-data = d × n, or even more.

16
Q

What is surprising about increasingly wide nets (property 3)

A

The test error would not be increasing and might even be improving as the number of parameters increases
(while all of them have nearly identical and nearly zero training errors).