Loss function 2 Flashcards
(27 cards)
What is the main purpose of a loss function?
To quantify the error between a model’s prediction and the true target.
What does minimising a loss function achieve?
It drives the model's parameters toward values whose predictions more closely match the true targets.
How is the loss function typically constructed in probabilistic models?
As the negative log-likelihood of the data under the model.
Why do we take the log of the likelihood?
To turn products into sums and avoid numerical instability.
What does maximum likelihood estimation correspond to in ML training?
Minimising the negative log-likelihood loss function.
What are the steps to construct a loss function?
Choose a distribution over the output, have the model predict that distribution's parameters, write down the log-likelihood of the data, and minimise the negative log-likelihood (NLL).
What distribution justifies the use of squared error?
Gaussian (Normal) distribution with fixed variance.
What is the formula for squared error loss?
ℓ(y, ŷ) = (y - ŷ)²
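The Gaussian-to-squared-error connection can be checked in a few lines. A minimal sketch, assuming a scalar target and a fixed variance (function names are illustrative):

```python
import math

def gaussian_nll(y, mu, sigma=1.0):
    # Negative log-likelihood of y under Normal(mu, sigma^2).
    return 0.5 * math.log(2 * math.pi * sigma ** 2) + (y - mu) ** 2 / (2 * sigma ** 2)

def squared_error(y, y_hat):
    return (y - y_hat) ** 2

# With sigma fixed, the NLL is half the squared error plus a constant,
# so minimising either objective yields the same prediction mu.
```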
What is heteroscedastic regression?
Regression where the model predicts both the mean and variance of the output.
Why model variance in regression tasks?
To capture uncertainty and avoid overconfidence in predictions.
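A hedged sketch of the heteroscedastic idea, assuming the model emits both a mean `mu` and a variance `sigma2` for each input (names are illustrative):

```python
import math

def hetero_nll(y, mu, sigma2):
    # Gaussian NLL with a per-input predicted variance sigma2.
    return 0.5 * math.log(2 * math.pi * sigma2) + (y - mu) ** 2 / (2 * sigma2)

# A confidently wrong prediction (small sigma2, mu far from y) is penalised
# much more than an uncertain one, which discourages overconfidence; a
# confidently correct prediction is rewarded with a lower loss.
```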
What distribution models binary classification tasks?
Bernoulli distribution.
What activation function is used before binary cross-entropy loss?
Sigmoid function.
What is the formula for binary cross-entropy loss?
ℓ(y, ŷ) = -y log(ŷ) - (1 - y) log(1 - ŷ)
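The sigmoid-plus-binary-cross-entropy pairing translates directly from the formula; the `eps` clamp below is an illustrative numerical guard, not part of the definition:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce(y, y_hat, eps=1e-12):
    # Clamp the prediction away from 0 and 1 to avoid log(0).
    y_hat = min(max(y_hat, eps), 1.0 - eps)
    return -y * math.log(y_hat) - (1 - y) * math.log(1 - y_hat)
```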
When do we predict class 1 in binary classification?
When the predicted probability is greater than 0.5.
What is used to model multiclass classification tasks?
Categorical distribution.
What function converts logits to class probabilities?
Softmax function.
What is the formula for softmax for class k?
λ_k = e^{z_k} / Σ_j e^{z_j}
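The softmax formula can be sketched directly; subtracting the maximum logit first is a standard numerical-stability trick and does not change the result:

```python
import math

def softmax(logits):
    # lambda_k = exp(z_k) / sum_j exp(z_j), computed stably.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```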
What is the loss function used in multiclass classification?
Multiclass cross-entropy.
What is one-hot encoding?
A vector where the true class has value 1 and all others are 0.
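With a one-hot target, multiclass cross-entropy collapses to the negative log-probability the model assigns to the true class. A minimal sketch:

```python
import math

def cross_entropy(one_hot, probs):
    # Only the true-class term survives, since all other targets are 0.
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs) if t > 0)
```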
How does cross-entropy relate to KL divergence?
Cross-entropy H(p, q) equals the entropy of the true distribution p plus the KL divergence from p to the predicted distribution q: H(p, q) = H(p) + KL(p ‖ q).
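The identity H(p, q) = H(p) + KL(p ‖ q) can be verified numerically; the distributions below are illustrative:

```python
import math

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Since H(p) is fixed by the data, minimising cross-entropy in q is
# equivalent to minimising KL(p || q).
```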
What is the loss for multivariate regression?
Sum of individual negative log-likelihoods or squared errors per output dimension.
Why can multivariate regression outputs dominate the loss?
If they are on different numerical scales (e.g. height vs. weight).
What are two ways to handle scale differences in multivariate outputs?
Normalise the outputs, or learn a per-output variance.
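The first fix, z-score normalisation, can be sketched as follows (the height-like numbers in the test are illustrative):

```python
def zscore(values):
    # Rescale one output dimension to zero mean and unit variance so that
    # dimensions on different scales (e.g. cm vs. kg) contribute comparably
    # to a summed per-dimension loss.
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / var ** 0.5 for v in values]
```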
What loss function is commonly used for classification tasks?
Cross-entropy loss.