Regularisation Flashcards

1
Q

Regularisation

A

Try to make sure the model does not overfit to the training data, i.e. let's not trust the training data too much. The more iterations we train for, the more we fit the model to the training data while minimising the loss.

Regularisation does not depend on the features. It looks at the model's weights and tries to keep them small overall.

2
Q

Why regularisation?

A

We try to generalise: the model does not need to be perfect on the training data, but it should be useful for unseen data.

3
Q

A simple analogy for overfitting

A

If you learn English by speaking only to a teenager, you'll pick up slang so well that you might end up unable to speak English with anyone else. Regularisation helps to stop this over-learning, so you stay able to speak English in general.

4
Q

How to do regularisation

A

We still minimise the loss, but we add a penalty for a complex model:

Loss(data|model) + complexity(model)
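A rough sketch of this objective in Python (the squared-error data loss, the helper names, and the scaling factor lam are illustrative assumptions, not from the card):

import numpy as np

def data_loss(w, X, y):
    # Loss(data|model): here taken as mean squared error of a linear model.
    return np.mean((X @ w - y) ** 2)

def complexity(w):
    # complexity(model): one common choice is the sum of squared weights
    # (the L2 penalty covered in a later card).
    return np.sum(w ** 2)

def regularised_loss(w, X, y, lam=0.1):
    # Minimise Loss(data|model) + complexity(model); lam scales the penalty.
    return data_loss(w, X, y) + lam * complexity(w)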

5
Q

A strategy for measuring model complexity: complexity(model)

A

One popular strategy is to prefer smaller weights: make the parameters as small as we can get away with while still getting the training examples right.

6
Q

L2 regularisation

A

Also called ridge regularisation.
The penalty is the sum of the squared weights.

(for linear models)
Penalises big weights.
Weights should be centred around zero.
Weights should be normally distributed.

L(w|D) + lambda * sum(square(w))
(lambda is a weight indicating how much we care about the penalty)
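A minimal sketch of this penalty in Python (the helper names and the mean-squared-error data loss are assumptions for illustration):

import numpy as np

def l2_penalty(w, lam):
    # lambda * sum(square(w)): large weights are penalised quadratically,
    # pulling the solution towards small, zero-centred weights.
    return lam * np.sum(w ** 2)

def ridge_loss(w, X, y, lam=0.1):
    # L(w|D) + lambda * sum(square(w)), with L taken as mean squared error here.
    return np.mean((X @ w - y) ** 2) + l2_penalty(w, lam)

# Example: lambda controls how much we care about small weights.
w = np.array([0.5, -2.0, 3.0])
print(l2_penalty(w, lam=0.1))  # 0.1 * (0.25 + 4.0 + 9.0) = 1.325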
