Topic 9: Understanding GD: Overparameterisation Flashcards

1
Q

What is the empirical loss function of linear regression?

A

(Mean squared error)
R̂(β) = 1/2 ‖y − Xβ‖₂²

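A minimal NumPy sketch of this loss; the shapes, seed, and data are illustrative, chosen so that n < d (the overparameterised setting assumed in the next card):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 20                   # overparameterised: fewer examples than parameters
X = rng.normal(size=(n, d))    # data matrix, one example per row
y = rng.normal(size=n)         # label vector
beta = rng.normal(size=d)      # linear predictor

# Empirical loss: R(beta) = 1/2 * ||y - X beta||_2^2
loss = 0.5 * np.linalg.norm(y - X @ beta) ** 2
print(loss)
```
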
2
Q

What assumptions do we make about the empirical loss function of linear regression?

A

The model is overparameterised,
i.e. n (number of training examples) < d (number of parameters)

The data matrix X is full rank

3
Q

What is an Invertible Matrix?

A

(non-singular)
Given a square matrix A, there exists an A⁻¹
such that AA⁻¹ = I (the identity matrix)

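A quick NumPy check of this definition; the matrix below is just an example:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])     # a square, non-singular matrix
A_inv = np.linalg.inv(A)       # would raise LinAlgError if A were singular

# A @ A_inv reproduces the identity matrix (up to floating-point error)
print(np.allclose(A @ A_inv, np.eye(2)))  # True
```
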
4
Q

m × n: which is the row count and which is the column count?

A

Rows by columns: m rows, n columns

5
Q

What is meant by "X has a trivial null space"?

A

Its null space contains only one element: the zero vector

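One way to see this numerically (example matrix mine): if no singular value of a matrix is zero, no non-zero vector is mapped to zero, so the null space contains only the zero vector:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])   # invertible, hence trivial null space

# All singular values are strictly positive, so Av = 0 forces v = 0
print(np.linalg.svd(A, compute_uv=False))  # approx [3.62, 1.38]
```
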
6
Q

What is a pseudoinverse?

A

X† = X⊺(XX⊺)⁻¹
It acts like an inverse in certain respects, even when X has no true inverse (e.g. when X is singular or not square)

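A sketch comparing this formula with NumPy's built-in pseudoinverse, assuming X is full rank with n < d so that XX⊺ is invertible (shapes and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 20
X = rng.normal(size=(n, d))    # full rank with probability 1

# Right pseudoinverse: X† = X^T (X X^T)^(-1), a d x n matrix
X_dagger = X.T @ np.linalg.inv(X @ X.T)

# Agrees with NumPy's SVD-based pseudoinverse in this full-rank case
print(np.allclose(X_dagger, np.linalg.pinv(X)))  # True

# X† acts as a right inverse: X X† = I_n
print(np.allclose(X @ X_dagger, np.eye(n)))      # True
```
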
7
Q

What can we say about XX⊺?

A

Given the assumptions above (X is full rank, with n < d):
It has a trivial null space
It does not map non-zero vectors to zero
It is therefore an invertible matrix

8
Q

When is the loss R̂(β) = 1/2 ‖y − Xβ‖₂² at a global minimum?

A

When
β = X⊺(XX⊺)⁻¹y

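A minimal check that this β drives the loss to zero (illustrative n < d, full-rank X):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 20
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

# beta = X^T (X X^T)^(-1) y interpolates the training data exactly
beta = X.T @ np.linalg.inv(X @ X.T) @ y
loss = 0.5 * np.linalg.norm(y - X @ beta) ** 2
print(loss)  # ~0: the loss is non-negative, so this is a global minimum
```
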
9
Q

How do we express the multiplicity of global minima?

A

‖y − Xβ‖₂² = 0 (i.e. at a global minimum)
when
β = X†y + ξ

where ξ is such that ξ⊺xᵢ = 0 for all i

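A sketch of this multiplicity. The construction of ξ below is mine: strip the row-space component from a random vector, leaving a vector with Xξ = 0, i.e. ξ⊺xᵢ = 0 for every row xᵢ:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 20
X = rng.normal(size=(n, d))
y = rng.normal(size=n)
X_dagger = X.T @ np.linalg.inv(X @ X.T)

# Project a random vector onto the null space of X
v = rng.normal(size=d)
xi = v - X_dagger @ (X @ v)     # X @ xi = 0: xi is orthogonal to every row

beta = X_dagger @ y + xi        # a different exact interpolator
print(np.linalg.norm(y - X @ beta) ** 2)  # ~0: still a global minimum
```
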
10
Q

What can we say about β = X†y?

A

Taking ξ = 0,
β = X†y is the global minimum with the least norm
It is the global minimum closest to the origin

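Continuing the sketch above: X†y lies in the row space of X while every valid ξ lies in its null space, so the two are orthogonal and ‖X†y + ξ‖₂² = ‖X†y‖₂² + ‖ξ‖₂² ≥ ‖X†y‖₂²:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 20
X = rng.normal(size=(n, d))
y = rng.normal(size=n)
X_dagger = X.T @ np.linalg.inv(X @ X.T)

beta_min = X_dagger @ y          # the minimum-norm global minimum
v = rng.normal(size=d)
xi = v - X_dagger @ (X @ v)      # any null-space perturbation

# The perturbed interpolator is strictly further from the origin
print(np.linalg.norm(beta_min), np.linalg.norm(beta_min + xi))
```
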
11
Q

What is the 2-norm?

A

The Euclidean distance, or standard vector length: ‖x‖₂ = √(Σᵢ xᵢ²)

12
Q

What conditions must be met for gradient descent to converge to the global minimum with the least norm?

A

The data matrix X must be full rank
The initialisation must be β₀ = 0
There must exist a continuum of step sizes η

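A gradient-descent sketch under these conditions: β₀ = 0, X full rank, and a small fixed step size η (the value 0.01 is my assumption of a "small enough" step). The iterates stay in the row space of X and converge to the minimum-norm solution X†y:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 20
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

beta = np.zeros(d)                 # initialise at the origin: beta_0 = 0
eta = 0.01                         # small fixed step size (assumed small enough)
for _ in range(20000):
    grad = X.T @ (X @ beta - y)    # gradient of 1/2 ||y - X beta||_2^2
    beta -= eta * grad             # each step stays in the row space of X

beta_min = X.T @ np.linalg.inv(X @ X.T) @ y
print(np.allclose(beta, beta_min, atol=1e-6))  # True: GD found the min-norm minimum
```
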
13
Q

What is Implicit Bias?

A

The inherent tendency of machine learning algorithms, particularly neural networks, to prefer certain solutions over others, even when these preferences are not explicitly programmed

14
Q

What is Algorithmic Regularization?

A

A regularization effect that emerges as a consequence of implicit bias:
the training algorithm itself prevents overfitting or improves generalisation performance, without an explicit regularization term

15
Q

What is β and what are its dimensions?

A

β is the linear predictor
β is d × 1

16
Q

What is X and what are its dimensions?

A

X is the input matrix
X is n × d

17
Q

What is y and what are its dimensions?

A

y is the vector of labels
y is n × 1

18
Q

What is ξ?

A

A vector that is orthogonal (perpendicular) to each row of X (each data point)
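A short shape check tying the last few cards together (sizes illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 20
X = rng.normal(size=(n, d))      # input matrix: n x d
y = rng.normal(size=(n, 1))      # label vector: n x 1
beta = rng.normal(size=(d, 1))   # linear predictor: d x 1

# Predictions X @ beta have the same shape as y
print((X @ beta).shape)  # (5, 1)
```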