Loss function 2 Flashcards
(27 cards)
What is the main purpose of a loss function?
To quantify the error between a model’s prediction and the true target.
What does minimising a loss function achieve?
It drives the model's parameters toward values whose predictions more closely match the true targets.
How is the loss function typically constructed in probabilistic models?
As the negative log-likelihood of the data under the model.
Why do we take the log of the likelihood?
To turn products into sums and avoid numerical instability.
What does maximum likelihood estimation correspond to in ML training?
Minimising the negative log-likelihood loss function.
What are the steps to construct a loss function?
Choose a distribution over the output, have the model predict that distribution's parameters, write down the log-likelihood of the data, and minimise the negative log-likelihood (NLL).
What distribution justifies the use of squared error?
Gaussian (Normal) distribution with fixed variance.
What is the formula for squared error loss?
ℓ(y, ŷ) = (y - ŷ)²
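The Gaussian-to-squared-error connection can be checked in a few lines. A minimal sketch, assuming a scalar target and a fixed variance (function names are illustrative):

```python
import math

def gaussian_nll(y, mu, sigma=1.0):
    # Negative log-likelihood of y under Normal(mu, sigma^2).
    return 0.5 * math.log(2 * math.pi * sigma ** 2) + (y - mu) ** 2 / (2 * sigma ** 2)

def squared_error(y, y_hat):
    return (y - y_hat) ** 2

# With sigma fixed, the NLL is half the squared error plus a constant,
# so minimising either objective yields the same prediction mu.
```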
What is heteroscedastic regression?
Regression where the model predicts both the mean and variance of the output.
Why model variance in regression tasks?
To capture uncertainty and avoid overconfidence in predictions.
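A hedged sketch of the heteroscedastic idea, assuming the model emits both a mean `mu` and a variance `sigma2` for each input (names are illustrative):

```python
import math

def hetero_nll(y, mu, sigma2):
    # Gaussian NLL with a per-input predicted variance sigma2.
    return 0.5 * math.log(2 * math.pi * sigma2) + (y - mu) ** 2 / (2 * sigma2)

# A confidently wrong prediction (small sigma2, mu far from y) is penalised
# much more than an uncertain one, which discourages overconfidence; a
# confidently correct prediction is rewarded with a lower loss.
```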
What distribution models binary classification tasks?
Bernoulli distribution.
What activation function is used before binary cross-entropy loss?
Sigmoid function.
What is the formula for binary cross-entropy loss?
ℓ(y, ŷ) = -y log(ŷ) - (1 - y) log(1 - ŷ)
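The sigmoid-plus-binary-cross-entropy pairing translates directly from the formula; the `eps` clamp below is an illustrative numerical guard, not part of the definition:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce(y, y_hat, eps=1e-12):
    # Clamp the prediction away from 0 and 1 to avoid log(0).
    y_hat = min(max(y_hat, eps), 1.0 - eps)
    return -y * math.log(y_hat) - (1 - y) * math.log(1 - y_hat)
```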
When do we predict class 1 in binary classification?
When the predicted probability is greater than 0.5.
What is used to model multiclass classification tasks?
Categorical distribution.
What function converts logits to class probabilities?
Softmax function.
What is the formula for softmax for class k?
λ_k = e^{z_k} / Σ_j e^{z_j}
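The softmax formula can be sketched directly; subtracting the maximum logit first is a standard numerical-stability trick and does not change the result:

```python
import math

def softmax(logits):
    # lambda_k = exp(z_k) / sum_j exp(z_j), computed stably.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```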
What is the loss function used in multiclass classification?
Multiclass cross-entropy.
What is one-hot encoding?
A vector where the true class has value 1 and all others are 0.
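With a one-hot target, multiclass cross-entropy collapses to the negative log-probability the model assigns to the true class. A minimal sketch:

```python
import math

def cross_entropy(one_hot, probs):
    # Only the true-class term survives, since all other targets are 0.
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs) if t > 0)
```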
How does cross-entropy relate to KL divergence?
Cross-entropy H(p, q) equals the entropy of the true distribution p plus the KL divergence from p to the predicted distribution q: H(p, q) = H(p) + KL(p ‖ q).
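The identity H(p, q) = H(p) + KL(p ‖ q) can be verified numerically; the distributions below are illustrative:

```python
import math

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Since H(p) is fixed by the data, minimising cross-entropy in q is
# equivalent to minimising KL(p || q).
```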
What is the loss for multivariate regression?
Sum of individual negative log-likelihoods or squared errors per output dimension.
Why can multivariate regression outputs dominate the loss?
If they are on different numerical scales (e.g. height vs. weight).
What are two ways to handle scale differences in multivariate outputs?
Normalise the outputs, or learn a per-output variance.
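The first fix, z-score normalisation, can be sketched as follows (the height-like numbers in the test are illustrative):

```python
def zscore(values):
    # Rescale one output dimension to zero mean and unit variance so that
    # dimensions on different scales (e.g. cm vs. kg) contribute comparably
    # to a summed per-dimension loss.
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / var ** 0.5 for v in values]
```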
What loss function is commonly used for classification tasks?
Cross-entropy loss.