Classification Flashcards

1
Q

Accuracy

A

A metric for the quality of a classification model: the number of predictions we got right, divided by the total number of predictions, i.e. (TP + TN) / (TP + TN + FP + FN).
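
A minimal sketch of the computation (my addition, with made-up labels), in plain Python:

# Accuracy = correct predictions / all predictions
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
accuracy = correct / len(y_true)
print(accuracy)  # 0.75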

2
Q

What can go wrong with accuracy?

A

Class imbalance: when positives or negatives are extremely rare, a model can score high accuracy simply by always predicting the majority class, while being useless on the class you actually care about.
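
A hedged illustration (my addition, made-up counts): with 1 positive in 1,000 examples, a model that always predicts the majority class still looks great on accuracy.

# 999 negatives, 1 positive; model always predicts "negative"
y_true = [0] * 999 + [1]
y_pred = [0] * 1000
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)  # 0.999, yet the single positive was missed entirely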

3
Q

True positives

A

One of the primary building blocks of the metrics we’ll use to evaluate classification models.

We correctly called wolf!
We saved the town.

4
Q

False Positives

A

One of the primary building blocks of the metrics we’ll use to evaluate classification models.

Error: we called wolf falsely.
Everyone is mad at us.

5
Q

False Negatives

A

One of the primary building blocks of the metrics we’ll use to evaluate classification models.

There was a wolf, but we didn’t spot it. It ate all our chickens.

6
Q

True Negatives

A

One of the primary building blocks of the metrics we’ll use to evaluate classification models.

No wolf, no alarm.
Everyone is fine.
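
A minimal sketch (my addition, made-up labels) counting all four building blocks at once:

# Confusion-matrix counts for binary labels (1 = wolf, 0 = no wolf)
y_true = [1, 1, 0, 0, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1, 0, 0, 0]
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # correctly called wolf
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # called wolf falsely
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # missed a wolf
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # no wolf, no alarm
print(tp, fp, fn, tn)  # 2 1 2 3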

7
Q

Precision

A

Precision = TP / (TP + FP)  (true positives over all positive predictions)

When the model said “positive”, was it right? What proportion of positive identifications was actually correct? Intuition: did the model cry “wolf” too often? Precision alone does not help much on imbalanced classification sets (e.g. identifying terrorists among all passengers, or infected persons in a population); it has to be read together with recall.
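
A minimal sketch (my addition), reusing the same made-up labels:

# Precision = TP / (TP + FP)
y_true = [1, 1, 0, 0, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1, 0, 0, 0]
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
precision = tp / (tp + fp)
print(precision)  # 2/3 ≈ 0.67 -- of our "wolf!" calls, how many were real wolves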

8
Q

Recall

A

Recall = TP / (TP + FN)  (true positives over all actual positives)

Out of all the possible positives, how many did the model correctly identify? What proportion of actual positives was identified correctly?
Intuition: did it miss any wolves? Recall can be thought of as a model’s ability to find all the data points of interest in a dataset.
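
The same sketch (my addition) for recall:

# Recall = TP / (TP + FN)
y_true = [1, 1, 0, 0, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1, 0, 0, 0]
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
recall = tp / (tp + fn)
print(recall)  # 2/4 = 0.5 -- of all real wolves, how many did we spot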

9
Q

ROC curve

A

Each point on the curve is the true positive rate (TPR) plotted against the false positive rate (FPR) at one decision threshold; sweeping the threshold traces out the whole curve.
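
A minimal sketch (my addition, made-up scores), assuming scikit-learn is available:

from sklearn.metrics import roc_curve

y_true  = [0, 0, 1, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]  # model's predicted probabilities

# One (FPR, TPR) point per candidate decision threshold
fpr, tpr, thresholds = roc_curve(y_true, y_score)
for f, t, th in zip(fpr, tpr, thresholds):
    print(f"threshold={th:.2f}  TPR={t:.2f}  FPR={f:.2f}")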

10
Q

AUC

A

Area under the ROC curve

Interpretation:
If we pick a random positive and a random negative, what’s the probability that the model ranks them in the correct order? E.g. an AUROC of 1 describes the best possible ROC curve: it ranks all positives above all negatives.

A ROC curve with an AUC between 0.5 and 1.0 means the model ranks a random positive example higher than a random negative example more than 50% of the time. Real-world binary classification AUC values generally fall into this range.

A ROC curve with an AUC of 0.5 means the model ranks a random positive example higher than a random negative example only 50% of the time. The corresponding classification model is basically worthless: its predictive ability is no better than random guessing.

The worst possible ROC curve ranks all negatives above all positives, and has an AUC of 0.0. If you were to reverse every prediction (flip negatives to positives and positives to negatives), you’d actually have a perfect classifier!
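
A minimal sketch (my addition, made-up scores) of the ranking interpretation, checked against scikit-learn’s roc_auc_score (scikit-learn assumed available):

from itertools import product
from sklearn.metrics import roc_auc_score

y_true  = [0, 0, 1, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]

# Fraction of (positive, negative) pairs ranked in the correct order
pos = [s for t, s in zip(y_true, y_score) if t == 1]
neg = [s for t, s in zip(y_true, y_score) if t == 0]
pairwise = sum((p > n) + 0.5 * (p == n) for p, n in product(pos, neg)) / (len(pos) * len(neg))

print(pairwise)                        # 0.888...
print(roc_auc_score(y_true, y_score))  # same value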

11
Q

Prediction Bias

A

Logistic Regression predictions should be unbiased.

average of predictions == average of observations
Prediction bias = (average of predictions) − (average of observations); if what we predict differs, on average, from what we observe, the model is biased.

Zero bias alone does not mean everything in your system is perfect, but it’s a great sanity check. If you have bias, you have a problem: an incomplete feature set? A buggy pipeline? A biased training sample?
Don’t fix bias with a calibration layer; fix it in the model.
Look for bias in slices of data – this can guide improvements.
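
A minimal sketch (my addition), assuming predicted probabilities and 0/1 observed labels:

# Prediction bias = average of predictions - average of observations
y_true = [0, 0, 1, 1, 0, 1, 0, 0]                   # observed labels
y_prob = [0.2, 0.1, 0.9, 0.6, 0.4, 0.7, 0.3, 0.2]   # predicted probabilities
bias = sum(y_prob) / len(y_prob) - sum(y_true) / len(y_true)
print(bias)  # ≈ 0.05: on average the model predicts slightly more positives than observed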

12
Q

Classification threshold

A

The cut-off applied to a model’s output score (e.g. a predicted probability) to turn it into a class: a value above the threshold indicates “spam”; a value below indicates “not spam.” It is tempting to assume that the classification threshold should always be 0.5, but thresholds are problem-dependent, and are therefore values that you must tune.
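
A minimal sketch (my addition, made-up scores and threshold):

# Turn scores into "spam" / "not spam" with a problem-dependent threshold
y_prob = [0.05, 0.45, 0.62, 0.91, 0.30]
threshold = 0.6  # tuned for this problem, not assumed to be 0.5
y_pred = ["spam" if p >= threshold else "not spam" for p in y_prob]
print(y_pred)  # ['not spam', 'not spam', 'spam', 'spam', 'not spam']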
