Loss function 2 Flashcards

(27 cards)

1
Q

What is the main purpose of a loss function?

A

To quantify the error between a model’s prediction and the true target.

2
Q

What does minimising a loss function achieve?

A

It trains the model to improve prediction accuracy.

3
Q

How is the loss function typically constructed in probabilistic models?

A

As the negative log-likelihood of the data under the model.

4
Q

Why do we take the log of the likelihood?

A

To turn products into sums and avoid numerical instability.

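
The numerical-stability point can be seen directly. A minimal Python sketch (the numbers are illustrative only) comparing a raw product of likelihoods with a sum of log-likelihoods:

```python
import math

# 1000 i.i.d. observations, each assigned probability 0.01 by the model
probs = [0.01] * 1000

# Multiplying raw likelihoods underflows double precision to exactly 0.0
product = 1.0
for p in probs:
    product *= p

# Summing log-likelihoods keeps the same information, safely
log_likelihood = sum(math.log(p) for p in probs)

print(product)         # 0.0 (underflow)
print(log_likelihood)  # ≈ -4605.17
```
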
5
Q

What does maximum likelihood estimation correspond to in ML training?

A

Minimising the negative log-likelihood loss function.

6
Q

What are the steps to construct a loss function?

A

Choose a distribution, predict its parameters, write the log-likelihood, minimise NLL.

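
The four steps can be made concrete. A hedged sketch instantiating them with a Bernoulli distribution for a binary label (the function names are illustrative, not from the cards):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def nll_loss(y, z):
    # Step 1: choose a distribution — Bernoulli(lam) for binary label y
    # Step 2: predict its parameter — lam = sigmoid of the model's logit z
    lam = sigmoid(z)
    # Step 3: write the log-likelihood of the observed label
    log_lik = y * math.log(lam) + (1 - y) * math.log(1 - lam)
    # Step 4: minimise the negative log-likelihood
    return -log_lik
```

With z = 0 the model assigns probability 0.5 to each class, giving a loss of log 2 for either label; a logit pointing toward the true label lowers the loss.
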
7
Q

What distribution justifies the use of squared error?

A

Gaussian (Normal) distribution with fixed variance.

8
Q

What is the formula for squared error loss?

A

ℓ(y, ŷ) = (y - ŷ)²

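
A small sketch (illustrative values) showing why the Gaussian assumption justifies squared error: with fixed variance, the Gaussian negative log-likelihood is squared error up to constants, so both losses pick the same prediction.

```python
import math

def squared_error(y, y_hat):
    return (y - y_hat) ** 2

def gaussian_nll(y, y_hat, sigma=1.0):
    # -log N(y; y_hat, sigma^2)
    return 0.5 * math.log(2 * math.pi * sigma**2) + (y - y_hat) ** 2 / (2 * sigma**2)

# With sigma fixed, NLL = constant + squared error / (2 sigma^2),
# so minimising either loss yields the same y_hat
const = 0.5 * math.log(2 * math.pi)
assert abs(gaussian_nll(2.0, 0.5) - (const + 0.5 * squared_error(2.0, 0.5))) < 1e-12
```
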
9
Q

What is heteroscedastic regression?

A

Regression where the model predicts both the mean and variance of the output.

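
A minimal sketch of a heteroscedastic Gaussian NLL, assuming the model emits a mean and a log-variance (predicting the log-variance is one common way to keep the variance positive; the function name is illustrative):

```python
import math

def hetero_nll(y, mu, log_var):
    # -log N(y; mu, exp(log_var)) — the variance is itself a model output
    var = math.exp(log_var)
    return 0.5 * (math.log(2 * math.pi * var) + (y - mu) ** 2 / var)
```

A confidently wrong prediction (small variance, large error) is penalised more than the same error reported with a large variance, which is how this loss discourages overconfidence.
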
10
Q

Why model variance in regression tasks?

A

To capture uncertainty and avoid overconfidence in predictions.

11
Q

What distribution models binary classification tasks?

A

Bernoulli distribution.

12
Q

What activation function is used before binary cross-entropy loss?

A

Sigmoid function.

13
Q

What is the formula for binary cross-entropy loss?

A

ℓ(y, ŷ) = -y log(ŷ) - (1 - y) log(1 - ŷ)

14
Q

When do we predict class 1 in binary classification?

A

When the predicted probability of class 1 exceeds 0.5.

15
Q

What is used to model multiclass classification tasks?

A

Categorical distribution.

16
Q

What function converts logits to class probabilities?

A

Softmax function.

17
Q

What is the formula for softmax for class k?

A

λ_k = e^{z_k} / Σ_j e^{z_j}
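
A direct implementation of the formula; subtracting the maximum logit first is a standard stability trick (an addition here, not part of the card) that leaves the result unchanged while preventing overflow in exp:

```python
import math

def softmax(logits):
    m = max(logits)                       # for numerical stability only
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]      # lambda_k = e^{z_k} / sum_j e^{z_j}
```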

18
Q

What is the loss function used in multiclass classification?

A

Multiclass cross-entropy.

19
Q

What is one-hot encoding?

A

A vector where the true class has value 1 and all others are 0.
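
With a one-hot target only the true-class term survives the sum, so multiclass cross-entropy reduces to minus the log-probability of the correct class. A sketch (the eps clamp is an added guard against log 0, not part of the card):

```python
import math

def cross_entropy(one_hot, probs, eps=1e-12):
    # -sum_k y_k log(lambda_k); only the k with y_k = 1 contributes
    return -sum(y * math.log(max(p, eps)) for y, p in zip(one_hot, probs))
```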

20
Q

How does cross-entropy relate to KL divergence?

A

Cross-entropy H(p, q) = H(p) + KL(p ‖ q): the entropy of the true distribution plus the KL divergence of the predicted distribution from the true one.
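
The identity can be checked numerically (the two distributions below are illustrative):

```python
import math

p = [0.7, 0.2, 0.1]  # true distribution
q = [0.5, 0.3, 0.2]  # predicted distribution

entropy = -sum(pi * math.log(pi) for pi in p)
kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
cross_entropy = -sum(pi * math.log(qi) for pi, qi in zip(p, q))

# cross-entropy = entropy + KL divergence, term for term
assert abs(cross_entropy - (entropy + kl)) < 1e-12
```

Since the entropy of the true distribution is fixed by the data, minimising cross-entropy minimises the KL divergence.
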

21
Q

What is the loss for multivariate regression?

A

Sum of individual negative log-likelihoods or squared errors per output dimension.

22
Q

Why can multivariate regression outputs dominate the loss?

A

If they are on different numerical scales (e.g. height vs. weight).

23
Q

What are two ways to handle scale differences in multivariate outputs?

A

Normalize outputs or learn per-output variances.
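
A sketch of the first option, z-scoring each output dimension before computing the loss (the function name is illustrative):

```python
def standardise(values):
    """Z-score one output dimension so its scale cannot dominate the loss."""
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - mean) / std for v in values]
```

Applied per dimension, heights in cm and weights in kg are each mapped to zero mean and unit variance, so their squared errors contribute on comparable scales.
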

24
Q

What loss function is commonly used for classification tasks?

A

Cross-entropy loss.

25
Q

What loss function is commonly used for regression tasks?

A

Mean squared error (MSE).

26
Q

What is the advantage of using probabilistic loss formulations?

A

They reflect the data distribution and uncertainty more accurately.

27
Q

What does minimising cross-entropy achieve in classification?

A

It makes the predicted distribution match the true label distribution.