Evaluation Metrics for Classification Flashcards

1
Q

Accuracy

A

It tells us the fraction of correct predictions among all the predictions made by the model. It is computed by dividing the number of predictions that match the actual outcomes by the total number of records.
Since the matches form a boolean array, computing accuracy is the same as taking the mean of that array.
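A minimal sketch with NumPy; the toy arrays below stand in for the real validation labels and predicted probabilities:

import numpy as np

y_val = np.array([1, 0, 0, 1, 0])              # true labels (toy stand-in)
y_pred = np.array([0.8, 0.3, 0.6, 0.7, 0.1])   # predicted probabilities (toy stand-in)

decision = (y_pred >= 0.5)                 # convert probabilities to binary outcomes
accuracy = (y_val == decision).mean()      # mean of the boolean matches = fraction of correct predictions
print(accuracy)                            # 0.8 here: 4 out of 5 predictions are correct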

2
Q

How to decide the threshold for converting predicted probabilities into binary outcomes?

A

Normally, if a probability is greater than or equal to 0.5 we call it a positive outcome, and if it is smaller than 0.5 we call it a negative outcome. But we can try a range of thresholds (e.g. 0.3, 0.7, etc.) and check whether the accuracy improves.

In NumPy, you can generate candidate thresholds with np.linspace(0, 1, 21), as in the sketch below.
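A sketch of such a sweep, assuming y_val and y_pred hold the validation labels and predicted probabilities:

import numpy as np
from sklearn.metrics import accuracy_score

thresholds = np.linspace(0, 1, 21)   # 0.00, 0.05, ..., 1.00
for t in thresholds:
    acc = accuracy_score(y_val, y_pred >= t)   # accuracy when cutting at threshold t
    print('%.2f %.3f' % (t, acc))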

3
Q

How to count the number of values in Python?

A

from collections import Counter
Counter(y_pred >= 1.0)   # counts the True/False results of the comparison

It counts the occurrences of each value, i.e. how many predictions are at or above the threshold and how many are below.
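For example, with a toy array of predicted probabilities:

import numpy as np
from collections import Counter

y_pred = np.array([0.2, 0.9, 0.5, 1.0, 0.7])
print(Counter(y_pred >= 0.5))   # 4 predictions at or above 0.5, 1 below
print(Counter(y_pred >= 1.0))   # only 1 prediction reaches 1.0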

4
Q

Why is accuracy not the right metric?

A

If we calculate accuracy for thresholds from 0 to 1, then even at threshold 1.0, where the model predicts the negative class for every record, the accuracy can still look fairly good. A model that assigns the same class to everything is useless in a real-world scenario, yet its accuracy appears high. This happens when there is class imbalance: the majority class dominates the metric.
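A quick illustration with a hypothetical 80/20 imbalance:

import numpy as np

y_val = np.array([0] * 8 + [1] * 2)        # hypothetical labels: 80% negative, 20% positive
always_negative = np.zeros(len(y_val))     # "model" that predicts 0 for every record (threshold 1.0)
print((y_val == always_negative).mean())   # 0.8 - the accuracy looks decent, but the model is useless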

5
Q

Confusion Matrix

A

A way to evaluate the model that is not affected by class imbalance.
Given a threshold, there are two possible predictions for each of the two actual classes, so we get four cases: True Positive, False Positive, True Negative, False Negative. Both False Positive and False Negative are incorrect predictions.

True Positive:
g(xi) >= t & y = 1

True Negative:
g(xi) < t & y = 0

False Positive:
g(xi) >= t & y = 0

False Negative:
g(xi) < t & y = 1

We arrange these counts into a matrix:
[TN, FP
 FN, TP]

If we divide confusion_matrix / confusion_matrix.sum(), we get the relative frequency (rate) of each of the four outcomes.
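A minimal sketch with NumPy, assuming y_val and y_pred as in the earlier cards and a threshold t:

import numpy as np

t = 0.5
predict_positive = (y_pred >= t)
predict_negative = (y_pred < t)
actual_positive = (y_val == 1)
actual_negative = (y_val == 0)

tp = (predict_positive & actual_positive).sum()   # true positives
tn = (predict_negative & actual_negative).sum()   # true negatives
fp = (predict_positive & actual_negative).sum()   # false positives
fn = (predict_negative & actual_positive).sum()   # false negatives

confusion_matrix = np.array([[tn, fp],
                             [fn, tp]])
print(confusion_matrix / confusion_matrix.sum())  # relative frequency of each of the four outcomes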

6
Q

Precision and Recall

A

The values in the confusion matrix can be combined into different metrics, e.g.
Accuracy: (TP + TN) / (TP + TN + FP + FN)

Precision tells us what fraction of the positive predictions turned out to be correct.

Precision: TP / (TP + FP) (computed on the positive class)

Recall: the fraction of actual positive outcomes that were correctly identified.
Recall: TP / (TP + FN)

Both are useful when class imbalance is present.
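Continuing from the counts in the confusion-matrix sketch (previous card), with the scikit-learn functions as an equivalent shortcut:

precision = tp / (tp + fp)   # of all positive predictions, the share that were actually positive
recall = tp / (tp + fn)      # of all actual positives, the share that were caught

from sklearn.metrics import precision_score, recall_score
precision = precision_score(y_val, y_pred >= 0.5)   # same values computed directly from the labels
recall = recall_score(y_val, y_pred >= 0.5)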

7
Q

ROC Curves

A

ROC stands for Receiver Operating Characteristic.
It's a way to evaluate the performance of binary classifiers.

Historically, it was used in radar signal detection, e.g. to distinguish planes from noise in the received signal.

We are interested in the False Positive Rate (FPR) and the True Positive Rate (TPR).

FPR = FP / (TN + FP)   # first row of the confusion matrix (the actual negatives)
TPR = TP / (FN + TP)   # second row of the confusion matrix (the actual positives)

TPR is equal to recall.

These two values are computed for all possible thresholds, and plotting them forms the ROC curve.

df[::10] slices a DataFrame with a step of 10, i.e. it keeps every 10th row, which is handy for inspecting the table of thresholds.

We need to compare our ROC curve with a random model (the diagonal line).

We also want to compare it with an ideal model, i.e. one that separates the two classes perfectly, as in the sketch below.
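A sketch of computing both rates over a grid of thresholds, assuming y_val and y_pred as before; the [::10] slice from above is used to inspect the result:

import numpy as np
import pandas as pd

scores = []
for t in np.linspace(0, 1, 101):
    tp = ((y_pred >= t) & (y_val == 1)).sum()
    fp = ((y_pred >= t) & (y_val == 0)).sum()
    fn = ((y_pred < t) & (y_val == 1)).sum()
    tn = ((y_pred < t) & (y_val == 0)).sum()
    scores.append((t, tp, fp, fn, tn))

df_scores = pd.DataFrame(scores, columns=['threshold', 'tp', 'fp', 'fn', 'tn'])
df_scores['tpr'] = df_scores.tp / (df_scores.tp + df_scores.fn)   # true positive rate at each threshold
df_scores['fpr'] = df_scores.fp / (df_scores.fp + df_scores.tn)   # false positive rate at each threshold
print(df_scores[::10])   # inspect every 10th threshold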

8
Q

Calculate ROC curves using scikit-learn

A

from sklearn.metrics import roc_curve
FPR, TPR, thresholds = roc_curve(y_val, y_pred)   # arrays of false positive rate, true positive rate and the thresholds used
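A short follow-up sketch (using matplotlib) to plot the returned curve against the random-model diagonal:

import matplotlib.pyplot as plt

plt.plot(FPR, TPR, label='model')
plt.plot([0, 1], [0, 1], linestyle='--', label='random')   # diagonal = random baseline
plt.xlabel('FPR')
plt.ylabel('TPR')
plt.legend()
plt.show()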

9
Q

ROC AUC

A

Area Under the ROC Curve (AUC) is a useful metric for binary classification models.
For the ROC curve, we want to be as close to the ideal point (the top-left corner) as possible; these are the best models. If the curve is close to the random diagonal, the model is poor, and anything below the diagonal means something is wrong.

The greater the AUC, the better the model.
The random baseline (the diagonal) cuts the square in half, so its AUC is 0.5.

A perfect model has AUC 1.0; a curve close to the ideal one has AUC around 0.8 or 0.9.

A curve close to the random diagonal has AUC around 0.6.

from sklearn.metrics import auc
auc(FPR, TPR)   # integrates the curve produced by roc_curve

from sklearn.metrics import roc_auc_score

roc_auc_score(y_val, y_pred)   # computes the AUC directly from labels and predicted scores

AUC can also be interpreted as the probability that a randomly selected positive example gets a higher score than a randomly selected negative example.
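That interpretation can be checked with a quick random-sampling sketch, assuming y_val and y_pred are NumPy arrays as before:

import numpy as np

pos = y_pred[y_val == 1]   # scores of the positive examples
neg = y_pred[y_val == 0]   # scores of the negative examples

np.random.seed(1)
pos_sample = np.random.choice(pos, size=10000)   # sample positives with replacement
neg_sample = np.random.choice(neg, size=10000)   # sample negatives with replacement
print((pos_sample > neg_sample).mean())          # close to roc_auc_score(y_val, y_pred)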

10
Q

K-Fold Cross Validation

A

We keep the test data separate.
We split the full training dataset into k parts, e.g. 3.
We use parts 1 & 2 as the training dataset and part 3 as the validation dataset,
then parts 1 & 3 for training and part 2 for validation, and so on.
We calculate the AUC on each validation fold and report the mean AUC and its standard deviation.

from sklearn.model_selection import KFold

kfold = KFold(n_splits=10, shuffle=True, random_state=1)
for idx_train, idx_val in kfold.split(df_full_train):   # split() yields (train indices, validation indices) pairs
    df_train, df_val = df_full_train.iloc[idx_train], df_full_train.iloc[idx_val]

The usual hold-out validation is okay for bigger datasets; for smaller datasets, cross-validation with more splits gives a more reliable estimate.
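A sketch of the full cross-validation loop; train() and predict() are hypothetical helpers standing in for whatever fitting and scoring code the project uses, and 'target' is a placeholder column name:

import numpy as np
from sklearn.model_selection import KFold
from sklearn.metrics import roc_auc_score

kfold = KFold(n_splits=10, shuffle=True, random_state=1)

aucs = []
for idx_train, idx_val in kfold.split(df_full_train):
    df_train = df_full_train.iloc[idx_train]
    df_val = df_full_train.iloc[idx_val]

    model = train(df_train)              # hypothetical helper: fit a model on this fold
    y_pred = predict(df_val, model)      # hypothetical helper: predicted probabilities for the fold
    aucs.append(roc_auc_score(df_val.target, y_pred))   # 'target' is a placeholder column name

print('%.3f +- %.3f' % (np.mean(aucs), np.std(aucs)))   # mean AUC and its standard deviation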

11
Q

How to check how long an iteration is taking?

A

We can use the tqdm library, which wraps an iterable and displays a progress bar with timing information.

from tqdm.auto import tqdm

for i in tqdm(range(10)):
    pass   # the progress bar shows elapsed time, estimated remaining time and iterations per second

12
Q

ROC AUC Feature Importance

A

ROC AUC can also be used to evaluate the feature importance of numerical variables.

For each numerical variable, use its values as if they were prediction scores and compute the AUC against the target variable. An AUC well above 0.5 means the feature is informative on its own; an AUC below 0.5 means it is negatively correlated with the target (negating the feature gives 1 - AUC).
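A sketch of the loop, assuming df_train is a DataFrame whose label sits in a placeholder column 'target' and numerical is a placeholder list of the numerical column names:

from sklearn.metrics import roc_auc_score

numerical = ['feature_a', 'feature_b']   # placeholder list of numerical columns
for col in numerical:
    auc = roc_auc_score(df_train.target, df_train[col])        # treat the raw feature values as scores
    if auc < 0.5:
        auc = roc_auc_score(df_train.target, -df_train[col])   # invert negatively correlated features
    print('%s: %.3f' % (col, auc))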
