Section 3 Evaluation of a classifier Flashcards
(43 cards)
Name 3 measures to evaluate how a linear regression model performs
Root mean squared error
Mean absolute error
R^2
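A minimal numpy sketch (the arrays y and y_hat below are hypothetical) showing how these three measures could be computed:
```python
import numpy as np

# Hypothetical observed responses and model predictions
y = np.array([3.1, 2.4, 5.0, 4.2, 3.3])
y_hat = np.array([2.9, 2.7, 4.6, 4.4, 3.0])

rmse = np.sqrt(np.mean((y - y_hat) ** 2))                         # root mean squared error
mae = np.mean(np.abs(y - y_hat))                                  # mean absolute error
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)   # R^2
```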
What is the MAP rule
A popular choice of threshold τ along the continuous set [0, 1] is τ = 0.5, the "maximum a posteriori (MAP) rule", which assigns each observation to its most likely class, i.e. the class with the largest predicted probability.
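A minimal sketch of the MAP rule, assuming hypothetical predicted probabilities p_hat for the positive class:
```python
import numpy as np

# Hypothetical predicted probabilities P(y=1 | x) from some classifier
p_hat = np.array([0.10, 0.80, 0.55, 0.30, 0.95])

# MAP rule: threshold at tau = 0.5, i.e. assign each observation to its most likely class
y_hat = (p_hat > 0.5).astype(int)   # -> array([0, 1, 1, 0, 1])
```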
When might the MAP rule not be a correct choice to make?
The MAP rule implicitly assumes the two classes are balanced, so it may not be a sensible choice for imbalanced data.
It is also not sensible when false positives and false negatives carry different misclassification costs, e.g. medical applications, where the category of interest is rare.
How to check balance of data
The first thing to do with the data is to look at the class proportions in the sample.
If the data are imbalanced (skewed), then applying the MAP rule will produce skewed predictions.
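For example, a quick check of the class proportions (hypothetical labels y) might look like:
```python
import numpy as np

# Hypothetical binary labels
y = np.array([0, 0, 0, 0, 1, 0, 0, 1, 0, 0])

# Class proportions; a strong imbalance suggests the MAP rule (tau = 0.5) may be inappropriate
values, counts = np.unique(y, return_counts=True)
proportions = counts / len(y)   # -> array([0.8, 0.2])
```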
Explain threshold classifier evaluation
An observation is classified as y_hat_i = 1 if its predicted probability exceeds the threshold τ, and as y_hat_i = 0 otherwise.
If τ = 0, then every observation is predicted as y_hat_i = 1.
If τ = 1, then every observation is predicted as y_hat_i = 0.
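A sketch of the two extremes, assuming the convention y_hat_i = 1 when p_hat_i > τ:
```python
import numpy as np

# Hypothetical predicted probabilities
p_hat = np.array([0.10, 0.80, 0.55, 0.30, 0.95])

def classify(tau):
    # y_hat_i = 1 if p_hat_i > tau, else 0 (one possible convention)
    return (p_hat > tau).astype(int)

classify(0.0)   # -> array([1, 1, 1, 1, 1]): every observation predicted as 1
classify(1.0)   # -> array([0, 0, 0, 0, 0]): every observation predicted as 0
```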
Explain meaning of TN, TP,FN,FP
TN – True negatives, i.e. the number of 0s correctly classified as 0.
TP – True positives, i.e. the number of 1s correctly classified as 1.
FN – False negatives, i.e. the number of 1s wrongly classified as 0.
FP – False positives, i.e. the number of 0s wrongly classified as 1.
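A sketch of how the four counts could be computed from hypothetical label vectors:
```python
import numpy as np

# Hypothetical true labels and predicted labels
y = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_hat = np.array([0, 1, 1, 0, 1, 0, 1, 0])

TN = np.sum((y == 0) & (y_hat == 0))   # 0s correctly classified as 0
TP = np.sum((y == 1) & (y_hat == 1))   # 1s correctly classified as 1
FN = np.sum((y == 1) & (y_hat == 0))   # 1s wrongly classified as 0
FP = np.sum((y == 0) & (y_hat == 1))   # 0s wrongly classified as 1
```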
What is sensitivity
Sensitivity (also called recall) focuses on the positive cases, assessing P(y_hat=1 | y=1): of those truly positive, how many are classified as positive.
What is recall?
Recall is the same as sensitivity: it focuses on the positive cases, assessing P(y_hat=1 | y=1): of those truly positive, how many are classified as positive. Note that 1 − sensitivity gives the false negative rate.
What is accuracy
Accuracy assesses P(y_hat = y): the overall proportion of observations correctly classified.
What is specificity
Specificity focuses on the negative cases, assessing P(y_hat=0 | y=0): of those truly negative, how many are classified as negative.
The false positive rate is 1 − specificity.
What is precision
Precision assesses P(y=1 | y_hat=1): of those classified as positive, how many are truly positive.
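Putting the metric definitions above together, a sketch with hypothetical confusion-matrix counts:
```python
# Hypothetical confusion-matrix counts
TP, TN, FP, FN = 40, 30, 10, 20

sensitivity = TP / (TP + FN)                   # recall: P(y_hat=1 | y=1)
specificity = TN / (TN + FP)                   # P(y_hat=0 | y=0)
fpr         = 1 - specificity                  # false positive rate
accuracy    = (TP + TN) / (TP + TN + FP + FN)  # P(y_hat = y)
precision   = TP / (TP + FP)                   # P(y=1 | y_hat=1)
```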
Describe the relationship of precision vs recall
Precision and recall are typically in tension: as the threshold τ increases, fewer observations are predicted positive, so recall tends to decrease while precision tends to increase (and vice versa).
If positive class is rare what metrics would we focus on?
An important case is when the positive class is rare; we then want to focus on the positive class being predicted well. In this scenario we want a good balance between precision and recall.
Express recall in terms of precision
Recall = (Precision × prevalence from the model) / (prevalence from the data), where the prevalence from the model is the proportion of observations predicted as positive and the prevalence from the data is the proportion of truly positive observations.
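A small numeric check of this identity, using hypothetical confusion-matrix counts:
```python
# Hypothetical confusion-matrix counts
TP, TN, FP, FN = 40, 30, 10, 20
n = TP + TN + FP + FN

precision        = TP / (TP + FP)   # P(y=1 | y_hat=1)
model_prevalence = (TP + FP) / n    # proportion of observations predicted positive
data_prevalence  = (TP + FN) / n    # proportion of truly positive observations

recall_direct   = TP / (TP + FN)
recall_identity = precision * model_prevalence / data_prevalence
assert abs(recall_direct - recall_identity) < 1e-12
```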
What is the ROC curve
The receiver operating characteristic (ROC) curve plots sensitivity (true positive rate) against 1 − specificity (false positive rate). The curve illustrates the diagnostic ability of a binary classifier as the discrimination threshold τ is varied.
How can we evaluate a classifier by the ROC curve
The area under the ROC curve (AU-ROC) answers: how often will a randomly chosen true 1 be given a higher predicted probability of being a 1 than a randomly chosen true 0?
A perfect classifier would have AU-ROC = 1, with the ROC curve pushed to the top left corner; this implies both large sensitivity and large specificity.
A classifier no better than random guessing would have AU-ROC = 0.5.
A common way of choosing τ in relation to the ROC curve is to maximise the sum of sensitivity and specificity, balancing true positives and true negatives.
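A sketch of tracing out the ROC curve by hand and choosing τ to maximise sensitivity + specificity, using hypothetical labels and probabilities:
```python
import numpy as np

# Hypothetical true labels and predicted probabilities
y = np.array([0, 0, 1, 1, 1, 0, 1, 0, 1, 0])
p_hat = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.9, 0.5, 0.6, 0.3])

taus = np.linspace(0, 1, 101)
tpr, fpr = [], []
for tau in taus:
    y_hat = (p_hat > tau).astype(int)
    TP = np.sum((y == 1) & (y_hat == 1)); FN = np.sum((y == 1) & (y_hat == 0))
    TN = np.sum((y == 0) & (y_hat == 0)); FP = np.sum((y == 0) & (y_hat == 1))
    tpr.append(TP / (TP + FN))   # sensitivity
    fpr.append(FP / (FP + TN))   # 1 - specificity

# The ROC curve is the plot of tpr against fpr over all thresholds.
# Choose tau maximising sensitivity + specificity:
best_tau = taus[np.argmax(np.array(tpr) + 1 - np.array(fpr))]
```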
When does ROC curve not work
Sensitivity and specificity, in conjunction with the ROC curve, work well in mildly imbalanced situations.
However, because sensitivity and specificity are equally weighted, these metrics and the ROC curve can be misleading in very imbalanced applications and provide an overly optimistic view.
What is the PR curve
The precision/recall (PR) curve plots precision versus recall as a function of the classification threshold τ.
The area under the PR curve (AU-PR) is related to the average precision over varying threshold τ.
The larger the area under the curve, the better the classifier's ability to correctly identify the positive (rare) class.
How to evaluate a classifier based on the PR curve
The larger the area under the curve, the better the classifier's ability to correctly identify the positive (rare) class.
A good classifier will have both high precision and high recall, with the PR curve pushed to the top right corner.
Precision is lower bounded by the prevalence of the positive class; precision at this baseline means the classifier is no better than random guessing.
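A sketch of tracing out the PR curve, assuming hypothetical labels where the positive class is rare:
```python
import numpy as np

# Hypothetical true labels (positive class is rare) and predicted probabilities
y = np.array([0, 0, 0, 0, 1, 0, 0, 1, 0, 0])
p_hat = np.array([0.10, 0.30, 0.20, 0.40, 0.80, 0.15, 0.35, 0.70, 0.25, 0.05])

precisions, recalls = [], []
for tau in np.linspace(0, 0.99, 100):
    y_hat = (p_hat > tau).astype(int)
    TP = np.sum((y == 1) & (y_hat == 1))
    FP = np.sum((y == 0) & (y_hat == 1))
    FN = np.sum((y == 1) & (y_hat == 0))
    if TP + FP > 0:   # precision is undefined when nothing is predicted positive
        precisions.append(TP / (TP + FP))
        recalls.append(TP / (TP + FN))

# Baseline: a random guesser's precision equals the prevalence of the positive class
prevalence = y.mean()   # 0.2
```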
What is the F1 score
The F1 score quantifies and compares a classifier's ability to predict positive cases. It is the harmonic mean of precision and recall: F1 = 2 × (precision × recall) / (precision + recall).
Interpretation: the model's balanced ability to both detect positive cases (recall) and be accurate with the cases it detects (precision).
The score satisfies 0 ≤ F1 ≤ 1, with F1 = 1 denoting perfection.
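A one-line sketch with hypothetical precision and recall values:
```python
# Hypothetical precision and recall values
precision, recall = 0.6, 0.75

# F1 is the harmonic mean of precision and recall; 0 <= F1 <= 1
f1 = 2 * precision * recall / (precision + recall)   # -> 0.666...
```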
Summarise why we use ROC/ sensitivity+specificity
Mild to no imbalance
The "cost" of false positives vs false negatives is of interest
Interest also in the negative (0) cases
Summarise why we use Precision, recall and F1
Positive cases are rare
We want to maintain a low false positive rate
These metrics do not consider the (true) negative cases
Summarise the reasons for using accuracy measure
Easy to interpret and use
Popular
Not always appropriate
What measure is used typically for multinomial logistic regression
Typically simple accuracy is used, but one may also use:
Class-specific sensitivity, i.e. the proportion of correctly classified observations for class k.
Class-specific false positive rate, i.e. the proportion of instances incorrectly assigned to class k.
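A sketch of these class-specific measures for a hypothetical 3-class problem:
```python
import numpy as np

# Hypothetical true and predicted labels for a 3-class problem (classes 0, 1, 2)
y = np.array([0, 1, 2, 2, 1, 0, 2, 1, 0, 2])
y_hat = np.array([0, 2, 2, 1, 1, 0, 2, 1, 1, 2])

accuracy = np.mean(y == y_hat)   # simple accuracy

for k in np.unique(y):
    sens_k = np.mean(y_hat[y == k] == k)   # class-k sensitivity: true k correctly classified as k
    fpr_k = np.mean(y_hat[y != k] == k)    # class-k false positive rate: non-k assigned to k
    print(k, sens_k, fpr_k)
```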