CH3 Classification Flashcards
What is a binary classifier?
Let’s simplify the problem for now and only try to identify one digit—for example, the number 5. This “5-detector” will be an example of a binary classifier, capable of distinguishing between just two classes: 5 and not-5.
What is the advantage of the SGD classifier?
The Stochastic Gradient Descent (SGD) classifier, available through Scikit-Learn’s SGDClassifier class, has the advantage of being capable of handling very large datasets efficiently. This is in part because SGD deals with training instances independently, one at a time (which also makes SGD well suited for online learning).
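For example, a minimal sketch of training such a 5-detector. It uses Scikit-Learn’s small built-in digits dataset as a stand-in for MNIST, and the variable names (X_train, y_train_5, sgd_clf) are illustrative, following the chapter’s running example:

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier

# Small MNIST-like dataset; build the binary target: 5 vs. not-5
digits = load_digits()
X_train, y_train = digits.data, digits.target
y_train_5 = (y_train == 5)

# SGD trains on one instance at a time, which is why it scales to large datasets
sgd_clf = SGDClassifier(random_state=42)
sgd_clf.fit(X_train, y_train_5)

print(sgd_clf.predict(X_train[:1]))  # e.g. [False] if the first image is not a 5
```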
What performance measures are available?
- measuring accuracy with cross-validation
- confusion matrix
- precision
- recall
- ROC curve
Why is accuracy generally not the preferred performance measure?
This demonstrates why accuracy is generally not the preferred performance measure for classifiers, especially when you are dealing with skewed datasets (i.e., when some
classes are much more frequent than others)
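As a hedged sketch of the point (reusing the illustrative digits-based setup from above, with DummyClassifier standing in for the chapter’s “never-5” baseline): a model that always predicts the majority class still reaches roughly 90% accuracy, simply because only about 10% of the images are 5s.

```python
from sklearn.datasets import load_digits
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score

digits = load_digits()
X_train, y_train_5 = digits.data, (digits.target == 5)

# A baseline that never predicts "5" (not-5 is the most frequent class)
never_5_clf = DummyClassifier(strategy="most_frequent")

# ~0.90 accuracy on every fold, even though the model learns nothing useful
print(cross_val_score(never_5_clf, X_train, y_train_5, cv=3, scoring="accuracy"))
```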
What is the general idea of the confusion matrix?
The general idea is to count the number of times instances of class A are classified as class B.
How to compute the confusion matrix?
To compute the confusion matrix, you first need to have a set of predictions, so they can be compared to the actual targets.
Just like the cross_val_score() function, cross_val_predict() performs K-fold cross-validation, but instead of returning the evaluation scores, it returns the predictions made on each test fold. This means that you get a clean prediction for each instance in the training set (“clean” meaning that the prediction is made by a model
that never saw the data during training)
Now you are ready to get the confusion matrix using the confusion_matrix() function.
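A short sketch of both steps, assuming the same illustrative digits-based setup as above:

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import confusion_matrix

digits = load_digits()
X_train, y_train_5 = digits.data, (digits.target == 5)
sgd_clf = SGDClassifier(random_state=42)

# "Clean" out-of-fold predictions: each prediction comes from a model
# that never saw that instance during training
y_train_pred = cross_val_predict(sgd_clf, X_train, y_train_5, cv=3)

# Rows = actual class (not-5, 5), columns = predicted class
print(confusion_matrix(y_train_5, y_train_pred))
```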
What does the confusion matrix tell you?
Each row in a confusion matrix represents an actual class, while each column represents a predicted class.
A perfect classifier would have only true positives and true negatives, so its confusion matrix would have nonzero values only on its main diagonal (top left to bottom right).
How is precision calculated?
precision = TP / (TP + FP), where TP is the number of true positives and FP the number of false positives
What is the formula for recall / sensitivity / true positive rate (TPR)?
recall = TP / (TP + FN), where FN is the number of false negatives
the ratio of positive instances that are correctly detected by the classifier
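A minimal sketch applying these two formulas with Scikit-Learn, reusing the illustrative out-of-fold predictions from the confusion-matrix sketch above:

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import precision_score, recall_score

digits = load_digits()
X_train, y_train_5 = digits.data, (digits.target == 5)
y_train_pred = cross_val_predict(SGDClassifier(random_state=42),
                                 X_train, y_train_5, cv=3)

print(precision_score(y_train_5, y_train_pred))  # TP / (TP + FP)
print(recall_score(y_train_5, y_train_pred))     # TP / (TP + FN)
```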
What is the F1-score?
It is often convenient to combine precision and recall into a single metric called the F1 score, in particular if you need a simple way to compare two classifiers. The F1 score is the harmonic mean of precision and recall (Equation 3-3). Whereas the regular mean treats all values equally, the harmonic mean gives much more weight to low values. As a result, the classifier will only get a high F1 score if both recall and precision are high.
What is the formula of F1-score?
F1 = 2 × (precision × recall) / (precision + recall) = TP / (TP + (FN + FP)/2)
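A quick sketch checking this with Scikit-Learn’s f1_score() (same illustrative setup as above); the result matches the harmonic mean of precision and recall:

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import f1_score, precision_score, recall_score

digits = load_digits()
X_train, y_train_5 = digits.data, (digits.target == 5)
y_train_pred = cross_val_predict(SGDClassifier(random_state=42),
                                 X_train, y_train_5, cv=3)

p = precision_score(y_train_5, y_train_pred)
r = recall_score(y_train_5, y_train_pred)
print(f1_score(y_train_5, y_train_pred))  # same value as the harmonic mean below
print(2 * p * r / (p + r))
```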
What is the precision/recall tradeoff?
increasing precision reduces recall, and vice versa
What are the decision function and decision threshold?
For each instance, the classifier computes a score based on a decision function; if that score is greater than a threshold, it assigns the instance to the positive class, and otherwise to the negative class.
How to set different thresholds to compute the precision and recall?
you can call its decision_function() method, which returns a score for each instance, and then make predictions based on those scores using any
threshold you want
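A sketch of this, assuming a fitted SGDClassifier as in the earlier illustrative setup (the thresholds used here are arbitrary, for demonstration only):

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier

digits = load_digits()
X_train, y_train_5 = digits.data, (digits.target == 5)
sgd_clf = SGDClassifier(random_state=42).fit(X_train, y_train_5)

# Scores instead of hard predictions
y_scores = sgd_clf.decision_function(X_train[:3])

threshold = 0
print(y_scores > threshold)        # same result as sgd_clf.predict(X_train[:3])

# Raising the threshold trades recall for precision; with a strict ">",
# no score clears its own maximum, so every prediction flips to not-5
print(y_scores > y_scores.max())
```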
How do you decide which threshold to use?
- First, get the scores of all instances in the training set using the cross_val_predict() function again, but this time specifying that you want it to return decision scores instead of predictions.
- Then compute precision and recall for all possible thresholds using the precision_recall_curve() function.
- Finally, plot precision and recall as functions of the threshold value using Matplotlib (see the sketch below).
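A hedged sketch of those three steps with the illustrative digits-based setup:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import precision_recall_curve

digits = load_digits()
X_train, y_train_5 = digits.data, (digits.target == 5)

# Out-of-fold decision scores rather than predictions
y_scores = cross_val_predict(SGDClassifier(random_state=42), X_train, y_train_5,
                             cv=3, method="decision_function")

precisions, recalls, thresholds = precision_recall_curve(y_train_5, y_scores)

# thresholds has one entry fewer than precisions/recalls
plt.plot(thresholds, precisions[:-1], "b--", label="Precision")
plt.plot(thresholds, recalls[:-1], "g-", label="Recall")
plt.xlabel("Threshold")
plt.legend()
plt.show()
```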
Why is the precision curve bumpier than the recall curve?
The reason is that precision may sometimes go down when you raise the threshold (although in general it will go up). To understand why, look back at Figure 3-3 and notice what happens when you start from the central threshold and move it just one digit to the right: precision goes from 4/5 (80%) down to 3/4 (75%). On the other hand, recall can only go down when the threshold is increased, which explains why its curve looks smooth.
What is another way to select a good precision/ recall tradeoff?
Another way to select a good precision/recall tradeoff is to plot precision directly against recall
You can see that precision really starts to fall sharply around 80% recall. You will probably want to select a precision/recall tradeoff just before that drop—for example,
at around 60% recall. But of course the choice depends on your project.
If someone says “let’s reach 99% precision,” you should ask, “At what recall?”
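One common way to make the choice concrete is to pick the lowest threshold that reaches a target precision. A sketch with an illustrative 90% target, reusing the arrays from the precision/recall-curve sketch above:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import precision_recall_curve, precision_score, recall_score

digits = load_digits()
X_train, y_train_5 = digits.data, (digits.target == 5)
y_scores = cross_val_predict(SGDClassifier(random_state=42), X_train, y_train_5,
                             cv=3, method="decision_function")
precisions, recalls, thresholds = precision_recall_curve(y_train_5, y_scores)

# Lowest threshold whose precision is at least 90%
threshold_90 = thresholds[np.argmax(precisions >= 0.90)]
y_train_pred_90 = (y_scores >= threshold_90)

print(precision_score(y_train_5, y_train_pred_90))  # at least 0.90
print(recall_score(y_train_5, y_train_pred_90))     # the recall you pay for it
```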
What is the ROC?
The receiver operating characteristic (ROC) curve is another common tool used with binary classifiers. It is very similar to the precision/recall curve, but instead of plotting precision versus recall, the ROC curve plots the true positive rate (another name for recall) against the false positive rate.
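A sketch plotting it with roc_curve(), again with the illustrative digits-based setup; the dashed diagonal is the purely random reference:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import roc_curve

digits = load_digits()
X_train, y_train_5 = digits.data, (digits.target == 5)
y_scores = cross_val_predict(SGDClassifier(random_state=42), X_train, y_train_5,
                             cv=3, method="decision_function")

fpr, tpr, thresholds = roc_curve(y_train_5, y_scores)

plt.plot(fpr, tpr, label="SGD")
plt.plot([0, 1], [0, 1], "k--", label="Random classifier")  # diagonal reference
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate (Recall)")
plt.legend()
plt.show()
```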
What is FPR?
The FPR is the ratio of negative instances that are incorrectly classified as positive. It is equal to one minus the true negative rate, which is the ratio of negative instances that are correctly classified as negative. The
TNR is also called specificity
What does the ROC curve plot?
Hence the ROC curve plots sensitivity (recall) versus 1 – specificity
What is the tradeoff in the ROC curve?
the higher the recall (TPR), the more false positives (FPR) the classifier produces. The dotted line represents the ROC curve of a purely random classifier; a good classifier stays as far away from that line as possible (toward
the top-left corner)
What is a way to compare classifiers?
One way to compare classifiers is to measure the area under the curve (AUC). A perfect classifier will have a ROC AUC equal to 1, whereas a purely random classifier will have a ROC AUC equal to 0.5.
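A minimal sketch with roc_auc_score(), same illustrative setup:

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import roc_auc_score

digits = load_digits()
X_train, y_train_5 = digits.data, (digits.target == 5)
y_scores = cross_val_predict(SGDClassifier(random_state=42), X_train, y_train_5,
                             cv=3, method="decision_function")

# 1.0 would be a perfect classifier, 0.5 a purely random one
print(roc_auc_score(y_train_5, y_scores))
```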
How to decide between the ROC curve and the precision/recall curve?
Since the ROC curve is so similar to the precision/recall (or PR) curve, you may wonder how to decide which one to use. As a rule of thumb, you should prefer the PR curve whenever the positive class is rare or when you care more about the false positives than the false negatives, and the ROC curve otherwise. For example, looking at the previous ROC curve (and the ROC AUC score), you may think that the classifier is really good. But this is mostly because there are few positives (5s) compared to the negatives (non-5s). In contrast, the PR curve makes it clear that the classifier has room for improvement (the curve could be closer to the top-right corner).