Topic 4: The Bias-Variance Decomposition Flashcards

1
Q

What does the variance of a model measure?

A

Sensitivity to the training data

2
Q

What is the bias of a model

A

How far the cluster of predictions is from the target
Roughly translates to a measure of the strength of the predictor

High bias -> predictions centred around a point that is not the bullseye target

3
Q

What is a joint random variable

A

A pair (x, y) drawn from the joint distribution P(x, y)
Joint = the two variables x and y considered together
A training set of n observations of (x, y) is a draw from P(x, y)^n

4
Q

What is the expected squared risk

A

ESn[ R(f) ] = ESn[ E(x,y)[ (f(x) − y)^2 ] ]

5
Q

What is ESn

A

The average (expectation) over all possible training datasets Sn

6
Q

What is E(x,y) ~ D

A

The average over all possible testing points
The random variable (x,y) follows a certain probability distribution D

7
Q

What is the bias-variance decomposition for the squared risk

A

ESn[ R(f) ] = Ex[ noise + bias + variance ]

8
Q

What is the noise term

A

Ey∣x[ (y − Ey∣x[y])^2]

An irreducible constant, independent of any model parameters
Caused by choice of data/features and not by the model

9
Q

What is the bias term

A

(ESn [f(x)] − Ey∣x[y])^2

This is the loss of the expected model against Ey|x[y]
The expected model (ESn [f(x)]) is the average response we would get if we could average over all possible training data sets

10
Q

What is the variance term

A

ESn[ ( f(x) − ESn[f(x)] )^2 ]

Compares a single prediction f(x) with the average prediction ESn[f(x)], then takes the squared average over training sets
Captures variation in f due to different training sets, varying around the expected model
If the model is too flexible -> the variance term grows large (see the sketch below)
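
A minimal sketch (assuming NumPy; the 1-D regression problem, polynomial model and noise level are all made up for illustration) of how the noise, bias and variance terms can be estimated empirically by resampling many training sets:

import numpy as np

rng = np.random.default_rng(0)

def true_fn(x):
    # stands in for Ey|x[y], the true conditional mean
    return np.sin(2 * np.pi * x)

def sample_training_set(n=30, noise_sd=0.3):
    # one draw of Sn from P(x, y)^n
    x = rng.uniform(0, 1, n)
    y = true_fn(x) + rng.normal(0, noise_sd, n)
    return x, y

def fit_and_predict(x_train, y_train, x_test, degree=3):
    # a simple polynomial regressor standing in for f
    coeffs = np.polyfit(x_train, y_train, degree)
    return np.polyval(coeffs, x_test)

x_test = np.linspace(0, 1, 200)   # test points over which Ex[...] is averaged
noise_sd = 0.3

# predictions of f(x) from many independently drawn training sets
preds = np.array([fit_and_predict(*sample_training_set(noise_sd=noise_sd), x_test)
                  for _ in range(500)])

expected_model = preds.mean(axis=0)                    # ESn[f(x)]
bias_term = (expected_model - true_fn(x_test)) ** 2    # (ESn[f(x)] - Ey|x[y])^2
variance_term = preds.var(axis=0)                      # ESn[(f(x) - ESn[f(x)])^2]
noise_term = noise_sd ** 2                             # Ey|x[(y - Ey|x[y])^2]

print("bias^2:", bias_term.mean())
print("variance:", variance_term.mean())
print("noise:", noise_term)
print("sum (approx. expected risk):", bias_term.mean() + variance_term.mean() + noise_term)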

11
Q

How do you reduce the bias

A

Increase the flexibility of the model
So increase the model family size

Potentially can be reduced by adding more features

12
Q

How do you reduce the noise

A

Can only be reduced by getting better-quality labelled data (not by increasing the dataset size)
It is equal to R(y*), the Bayes risk

13
Q

How do you reduce the variance

A

(Potentially)
Increasing the number of training examples
Adding some regularization to the model
Using a bagging algorithm (see the sketch below)
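
A minimal sketch (assuming scikit-learn is available; the synthetic dataset is purely illustrative) of how bagging averages many high-variance trees into a lower-variance predictor:

import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (200, 1))
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(0, 0.3, 200)

# a single deep tree: low bias but high variance
single_tree = DecisionTreeRegressor().fit(X, y)

# bagging: fit 100 trees, each on a bootstrap resample of the data,
# and average their predictions, which reduces the variance
bagged_trees = BaggingRegressor(DecisionTreeRegressor(), n_estimators=100).fit(X, y)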

14
Q

For which losses does the bias-variance decomposition hold

A

squared loss
cross entropy loss

15
Q

What is the relationship between bias-variance decomposition and approximation-estimation decomposition

A

They are not equal but are strongly related
The noise term is equal to the Bayes risk

16
Q

What is the most common loss function used to train neural networks

A

Cross entropy

17
Q

What does Ey∣x [y] mean

A

The average value of y, given that the input variable takes the value x

18
Q

What is f with a small circle above it

A

represents a new function, a modified version of f

19
Q

What is ℓ(y, f(x))

A

A (non-negative) loss function, evaluated at a point x, y

20
Q

What is the geometric mean

A

Represents the central tendency of a finite set of real values
Calculated by:
GM = ( x_1 · x_2 · … · x_n )^(1/n)

To normalise it: divide by some constant so the resulting distribution integrates to 1

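A small worked illustration (made-up numbers) of the formula, e.g. in Python with NumPy:

import numpy as np

x = np.array([2.0, 8.0, 4.0])      # hypothetical values
gm = np.prod(x) ** (1 / len(x))    # (x1 * x2 * ... * xn)^(1/n)
print(gm)                          # 4.0 (the arithmetic mean would be ~4.67)
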
21
Q

what is ℓtrain(f)

A

Training error

22
Q

Cross entropy vs squared risk B-V decomposition

A

For cross-entropy, the geometric mean takes the place of the arithmetic mean used for squared risk
So we no longer have an ‘expected model’ but instead a ‘centroid model’

23
Q

What happens to bias and variance as the depth of a regression tree increases

A

Bias decreases
Variance increases

24
Q

What sort of bias and variance does linear regression exhibit

A

If the true relationship is too complex, linear regression will exhibit high bias, leading to underfitting
Variance is generally low in linear regression

25
Q

How does bias relate to fitting

A

High bias -> underfitting
If a model is too simple for the data, it has high bias and underfits

26
Q

How does variance relate to fitting

A

High variance -> overfitting
The model is too complex and captures noise in the training data

27
Q

What sort of bias and variance do decision trees exhibit

A

Trees can have low bias due to complexity
They are prone to high variance
Techniques like pruning can control variance and avoid over-fitting

28
Q

What sort of bias and variance does kNN exhibit

A

Low bias, especially in complex, non-linear datasets
May suffer from high variance due to noisy data
Choosing an appropriate k value helps manage the tradeoff (see the sketch below)
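
A minimal sketch (assuming scikit-learn; synthetic data) of sweeping k: small k gives low bias but high variance, large k smooths the predictions (more bias, less variance), and cross-validation can pick a good middle ground:

import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (200, 1))
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(0, 0.3, 200)

# cross-validated score for a range of k values; pick the best-scoring k
for k in (1, 5, 15, 50):
    score = cross_val_score(KNeighborsRegressor(n_neighbors=k), X, y, cv=5).mean()
    print(k, score)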

29
Q

What sort of bias and variance do neural networks exhibit

A

Can model highly complex relationships with low bias
Prone to high variance if network is too large or trained for too long

30
Q

What is the over-parameterisation ratio

A

With p parameters to learn and n training points, the over-parameterisation ratio is ρ = p/n
A model is said to be over-parameterised if ρ > 1, i.e. p > n
NOTE: ρ is the ratio, not the number of parameters p
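
For example (hypothetical numbers): a network with p = 1,000,000 weights trained on n = 50,000 examples has ρ = 1,000,000 / 50,000 = 20, so it is heavily over-parameterised.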

31
Q

For huge neural nets, what can we say about ρ

A

Often ρ ≫ 1
i.e. p ≫ n

32
Q

What is monotonic?

A

Something that changes in only one direction (always non-increasing or always non-decreasing), e.g. bias as model complexity increases

33
Q

What is non monotonic

A

Something that does not move in a single direction; it can both increase and decrease depending on the situation
E.g. variance in deep neural networks

34
Q

What is “Double descent”

A

The classic picture is a U shape: risk decreases to a sweet spot and then increases again due to overfitting
For deep neural networks, as complexity keeps increasing past this point, the risk then drops again (the second descent)

This is thought to be because the networks are implicitly regularised by stochastic gradient descent (not fully understood)

35
Q

Does the bias variance decomposition hold for all losses

A

NO
e.g. it does not hold for the 0/1 loss

36
Q

What does it generally mean when a model has low bias

A

complex
flexibility - have enough capacity to fit the training data closely, often resulting in low error on the training set
few assumptions - make fewer assumptions about the underlying data distribution, allowing them to learn complex functions

37
Q

What does it generally mean when a model has high bias

A

simple
underfitting
not enough parameters to capture data

38
Q

What does it generally mean when a model has low variance

A

consistent - often produce similar predictions across different datasets
robust - less sensitive to small fluctuations or noise in the training data
model’s ability to generalize from the training data to unseen data is strong - captures underlying patterns

39
Q

What does it generally mean when a model has high variance

A

overfitting
sensitive to noise
poor generalisation - perform badly on unseen data

40
Q

What can be said of variance in the bv decomposition

A

Variance is independent of y

41
Q

what can be said of bias in the bv decomposition

A

It is the loss of the predictor q̊ = argmin_{q ∈ Y} E_D[ ℓ(z, q) ]
It is not dependent on any particular training set

42
Q

what is Ey|x[y]

A

The average true label (for a given input x) if we had perfect knowledge of the underlying distribution of labels

43
Q

how do we control the bias variance tradeoff in linear regression

A

With L2 regularisation (ridge regression); see the sketch below
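
A minimal sketch (assuming scikit-learn; the alpha value is just illustrative) of L2-regularised linear regression:

from sklearn.linear_model import Ridge

# alpha controls the strength of the L2 penalty on the weights:
# alpha -> 0 recovers ordinary least squares (lower bias, higher variance),
# larger alpha shrinks the weights (more bias, less variance)
model = Ridge(alpha=1.0)
# model.fit(X_train, y_train)   # X_train / y_train: whatever training data is at hand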

44
Q

what techniques help reduce variance in neural networks

A

Dropout, early stopping and implicit regularization help manage variance (see the sketch below)
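
A minimal sketch (assuming PyTorch; the layer sizes and dropout rate are illustrative) of a network that uses dropout to help control variance; early stopping would simply halt training once validation loss stops improving:

import torch.nn as nn

# dropout randomly zeroes activations during training,
# acting as a regulariser that reduces the variance of the fitted network
model = nn.Sequential(
    nn.Linear(10, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # drop 50% of hidden activations at train time
    nn.Linear(64, 1),
)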