# ML Metrics Flashcards

1
Q

Offline metrics for classification models

A

Precision, recall, F1 score, accuracy, ROC-AUC, PR-AUC,
confusion matrix

2
Q

Offline metrics for regression

A

Mean squared error (MSE)
Mean absolute error (MAE)
Root mean squared error (RMSE)

3
Q

Offline metrics for ranking systems

A
• MRR
• mAP
• nDCG

4
Q

Online metric for ad click prediction

A
• Click-through rate

5
Q

Online metric for harmful content detection

A
• Number of reports
• Actioned reports

6
Q

Online metric for video recommendations

A
• Click-through rate
• Total watch time
• Number of completed videos

7
Q

Types of Loss Functions

A

Mean squared error
Categorical cross-entropy loss
Binary cross-entropy loss

8
Q

Mean squared error

A
• Measures the average squared difference between the predicted output and the true output
• Used to optimize the model parameters during training
• The quantity we minimize when training a model
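
As a quick sketch, MSE can be computed in a few lines of plain Python (the `mse` helper name is illustrative, not from any particular library):

```python
def mse(y_true, y_pred):
    # average of squared differences between predictions and ground truth
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# squared errors are 1, 0, and 4, so the mean is 5/3
print(mse([3.0, 0.0, 2.0], [2.0, 0.0, 4.0]))
```
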
9
Q

precision

A

positive predictive value

probability a sample classified as positive is actually positive

TP/(TP+FP)

10
Q

recall

A

same as true positive rate
true positives / all actual positives
TP / (TP+FN)

sensitivity of the classification
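
The two definitions above reduce to one-liners; a minimal sketch in plain Python:

```python
def precision(tp, fp):
    # of everything predicted positive, how much really was positive
    return tp / (tp + fp)

def recall(tp, fn):
    # of everything actually positive, how much did we find
    return tp / (tp + fn)

# 8 true positives, 2 false positives, 8 false negatives
print(precision(8, 2))  # 0.8
print(recall(8, 8))     # 0.5
```
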

11
Q

What’s the best metric when you have a large number of negative samples

A

Precision and recall

Precision is not affected by a large number of negative samples because it measures the fraction of true positives out of the number of predicted positives (TP + FP).

Precision measures the probability of correct detection of positive values while FPR, TPR, and ROC measure the ability to distinguish between classes.

12
Q

Highest value of F1

A

1.0, indicating perfect precision and recall

13
Q

Lowest value of F1

A

0 if either precision or recall is 0

14
Q

AUC range

A

0 to 1

15
Q

ROC

A
• true positive rate (recall) on the y axis
• false positive rate on the x axis
• captures the performance of a classification model at all classification thresholds (probability thresholds)
• does not depend on class distribution!

16
Q

AUC

A
• area under the ROC curve
• used to evaluate a binary classification model
• Quantifies the model’s ability to distinguish between the positive and negative classes

AUC ranges from 0 to 1
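
One way to see the 0-to-1 range: AUC equals the probability that a randomly chosen positive example scores higher than a randomly chosen negative one. A brute-force sketch of that interpretation (fine for small lists; real libraries sort instead of comparing all pairs):

```python
def roc_auc(labels, scores):
    # probability a random positive outranks a random negative (ties count half)
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# 3 of the 4 positive/negative pairs are ordered correctly
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.2]))  # 0.75
```
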

17
Q

AUC of 0

A

A model that is 100% wrong

18
Q

AUC of 1

A

A model that is 100% correct

19
Q

What’s the best metric when you have a large number of positive samples

A

ROC is a better metric

20
Q

What metric should you use when detection of both classes is equally important

A

ROC

21
Q

F1

A
• used to evaluate the performance of a binary classification model
• combines precision and recall into a single measure
• harmonic mean of precision and recall, which provides a balanced measure of the model’s accuracy
• F1 = 2 · precision · recall / (precision + recall)
• F1 is 0 if either precision or recall is 0
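
A minimal sketch of the harmonic mean, including the zero edge case:

```python
def f1(precision, recall):
    # harmonic mean of precision and recall; 0 when either component is 0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(f1(0.5, 0.5))  # 0.5
print(f1(1.0, 0.0))  # 0.0
```
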
22
Q

true positive rate

A

aka recall
true positives / all positives
TP / (TP + FN)

23
Q

Offline Metrics

A

Score the model when building it
Before the model is put into production (computed on the train, eval, and test datasets)
Examples of offline metrics: ROC, AUC, F1, R^2, MSE, intersection over union

24
Q

online metrics

A

scores from the model once it is running in production and serving traffic

domain specific: things like click-through rate or minutes spent watching a video

25
Q

MRR

A
• mean reciprocal rank
• only considers the rank of the first relevant item
• not a good measure of the quality of the list as a whole
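
A sketch of the computation, assuming you already know the 1-based rank of the first relevant item for each query:

```python
def mrr(first_relevant_ranks):
    # mean of 1/rank over queries; higher is better, maximum is 1.0
    return sum(1 / r for r in first_relevant_ranks) / len(first_relevant_ranks)

# first relevant item appeared at ranks 1, 3, and 2 across three queries
print(mrr([1, 3, 2]))
```
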
26
Q

mAP

A
• mean average precision
• good for ranking problems
• works well for binary relevance (relevant or irrelevant).
• For continuous relevance scores use nDCG
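
Average precision for a single query can be sketched as below; mAP is then the mean of this value over all queries (binary relevance assumed, per the card):

```python
def average_precision(relevances):
    # relevances: 1/0 flags for items in ranked order
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevances, start=1):
        if rel:
            hits += 1
            total += hits / rank  # precision at each relevant position
    return total / hits if hits else 0.0

# relevant items at ranks 1 and 3: (1/1 + 2/3) / 2
print(average_precision([1, 0, 1]))
```
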
27
Q

nDCG

A
• winner, winner for ranking problems
• continuous relevance score
• shows how good the ranking is compared to the ideal ranking
• takes into account the position of the relevant item in a ranked list
• Ranges from 0 to 1; higher values indicate better performance

28
Q

nDCG acronym

A

normalized discounted cumulative gain
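
A sketch of DCG and its normalized form, assuming graded relevance scores given in ranked order:

```python
import math

def dcg(relevances):
    # each position's gain is discounted by log2(position + 1)
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))

def ndcg(relevances):
    # divide by the DCG of the ideal (descending) ordering
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal else 0.0

print(ndcg([3, 2, 1]))  # 1.0, the list is already ideally ordered
```
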

29
Q

Cross entropy

A
• how close the model’s predicted probabilities are to the ground truth label.
• CE is zero if we have an ideal system that predicts a 0 for the negative classes and 1 for the positive classes.
• The lower the CE, the higher the accuracy of the prediction.
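
For the binary case, a sketch of cross-entropy as average negative log-likelihood (the epsilon clamp is a common guard against log(0), not part of the definition):

```python
import math

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    # lower is better; 0 only for a perfect predictor
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # keep log() finite
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

print(binary_cross_entropy([1, 0], [0.99, 0.01]))  # close to 0
```
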