Evaluation Flashcards
What is the formal definition of overfitting?
A predictor F is overfit if we can find another predictor F’ where:
- Etrain(F’) > Etrain(F)
- Egen(F’) < Egen(F)
What is the formal definition of underfitting?
A predictor F is underfit if we can find another predictor F’ with both smaller Etrain and smaller Egen
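How is Etrain (training error) computed?
Average the error of F's predictions over the n training instances: Etrain(F) = (1/n) * sum over i of error(F(x_i), y_i)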
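How is Egen (generalization error) calculated?
As the expected error over the distribution of all possible future instances: Egen(F) = expectation over (x, y) ~ p(x, y) of error(F(x), y)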
How can we estimate Egen?
Set aside a test set and compute Etest on it (the same way as Etrain)
Etest -> Egen as the size of the test set -> infinity
How can you compute the confidence interval for Egen from Etest?
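One common approach (an assumption here, since the original answer is missing) treats each test instance as an independent Bernoulli trial and uses the normal approximation to the binomial: Egen lies within Etest ± z * sqrt(Etest * (1 - Etest) / n), with z = 1.96 for roughly 95% confidence. A minimal sketch follows; the function names `compute_etest` and `error_confidence_interval` are hypothetical.

```python
import math


def compute_etest(predictions, labels):
    """Fraction of test instances that are misclassified (Etest)."""
    mistakes = sum(p != y for p, y in zip(predictions, labels))
    return mistakes / len(labels)


def error_confidence_interval(e_test, n, z=1.96):
    """Normal-approximation interval for Egen; z=1.96 gives about 95% confidence."""
    half_width = z * math.sqrt(e_test * (1.0 - e_test) / n)
    # Clip to [0, 1] because an error rate cannot leave that range.
    return max(0.0, e_test - half_width), min(1.0, e_test + half_width)


# Made-up test set: 100 instances, 15 of them misclassified.
labels = [1] * 50 + [0] * 50
predictions = [1] * 40 + [0] * 10 + [0] * 45 + [1] * 5

e_test = compute_etest(predictions, labels)
low, high = error_confidence_interval(e_test, len(labels))
print(f"Etest = {e_test:.2f}, approx. 95% interval = [{low:.3f}, {high:.3f}]")
```

The approximation is rough when the test set is small or Etest is very close to 0 or 1.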

What do we use the training/validation/testing sets for?
- Training set: construct classifier
- Validation set: pick algorithm + tune hyperparameters
- Testing set: estimate future error rate
How does cross-validation work?
- Randomly split the data into k sets (folds)
- For each fold in turn: test on that fold, train on the other k-1
- Average the error over all k folds
- The final classifier is trained on all the data
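The steps above can be sketched in plain Python. This is an illustration under assumptions, not a reference implementation: `k_fold_error` and `train_majority` are hypothetical names, and the majority-class learner only stands in for whatever algorithm is being evaluated.

```python
import random


def k_fold_error(instances, labels, train_fn, k=5, seed=0):
    """Average test error over k folds; train_fn(xs, ys) returns a predict function."""
    indices = list(range(len(instances)))
    random.Random(seed).shuffle(indices)          # random split
    folds = [indices[i::k] for i in range(k)]     # k roughly equal parts
    fold_errors = []
    for held_out in folds:
        held_out_set = set(held_out)
        train_idx = [i for i in indices if i not in held_out_set]
        predict = train_fn([instances[i] for i in train_idx],
                           [labels[i] for i in train_idx])
        mistakes = sum(predict(instances[i]) != labels[i] for i in held_out)
        fold_errors.append(mistakes / len(held_out))
    return sum(fold_errors) / len(fold_errors)


def train_majority(train_x, train_y):
    """Toy learner: always predicts the most frequent training label."""
    majority = max(sorted(set(train_y)), key=train_y.count)
    return lambda x: majority


# Made-up data: 15 instances of class A, 5 of class B.
xs = list(range(20))
ys = ["A"] * 15 + ["B"] * 5

cv_error = k_fold_error(xs, ys, train_majority, k=5)
final_classifier = train_majority(xs, ys)   # final model uses all the data
print(cv_error)
```

The cross-validated error is only an estimate of how the final classifier will perform; the final classifier itself is never tested here.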
What is leave-one-out?
Cross-validation where k = number of instances, so each fold tests on exactly one instance
What is the problem with leave-one-out validation?
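Classes are not balanced between the training and testing sets
With n instances split evenly between classes A and B: testing { 1 of A, 0 of B } vs. training { n/2 of B, n/2 - 1 of A }
A majority-class predictor would always predict B (the most frequent class in training), so it would be wrong on every fold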
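The failure case can be reproduced with a small sketch, assuming a perfectly balanced two-class dataset and a majority-class predictor (the helper names are hypothetical). Removing the held-out instance always makes the other class the majority, so every prediction is wrong.

```python
def majority_label(train_labels):
    """Return the most frequent label in the training set."""
    counts = {}
    for label in train_labels:
        counts[label] = counts.get(label, 0) + 1
    return max(counts, key=counts.get)


def leave_one_out_error(labels):
    """Leave-one-out error of the majority-class predictor."""
    mistakes = 0
    for i, held_out in enumerate(labels):
        training = labels[:i] + labels[i + 1:]   # everything except instance i
        if majority_label(training) != held_out:
            mistakes += 1
    return mistakes / len(labels)


# Balanced data: 10 instances of A and 10 of B.
labels = ["A"] * 10 + ["B"] * 10
loo_error = leave_one_out_error(labels)
print(loo_error)
```

On this data the estimate is 100% error, even though always predicting one class would be right about half the time on new balanced data.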
What does stratification do?
Keeps class labels balanced across training/testing sets
How do you do stratification?
- Split instances by class
- Split each class into k parts
- Assemble the ith fold by combining one part from each class
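A minimal sketch of these three steps, assuming labels are sortable values such as strings; `stratified_folds` is a hypothetical helper, not a library function.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Return k lists of instance indices with classes spread evenly across folds."""
    # Step 1: split instances by class.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Steps 2 and 3: cut the class into k parts, add the ith part to fold i.
        for part in range(k):
            folds[part].extend(members[part::k])
    return folds


labels = ["A"] * 10 + ["B"] * 10
folds = stratified_folds(labels, k=5)
for fold in folds:
    print(sorted(labels[i] for i in fold))
```

With 10 instances per class and k = 5, every fold receives two instances of each class.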
What is a true positive?
Classifier predicts positive, and it is actually positive
What is a true negative?
Classifier predicts negative, and it is actually negative
What is a false positive?
Classifier predicts positive, but it is actually negative
What is a false negative?
Classifier predicts negative, but it is actually positive
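The four outcomes can be counted with a short sketch; `confusion_counts` is a hypothetical helper, and the labels and predictions are made up for illustration.

```python
def confusion_counts(predictions, labels, positive=1):
    """Return (TP, FP, TN, FN) for binary predictions against true labels."""
    tp = fp = tn = fn = 0
    for predicted, actual in zip(predictions, labels):
        if predicted == positive and actual == positive:
            tp += 1   # predicted positive, actually positive
        elif predicted == positive:
            fp += 1   # predicted positive, actually negative
        elif actual == positive:
            fn += 1   # predicted negative, actually positive
        else:
            tn += 1   # predicted negative, actually negative
    return tp, fp, tn, fn


labels      = [1, 1, 1, 0, 0, 0, 0, 0]
predictions = [1, 1, 0, 1, 0, 0, 0, 0]
tp, fp, tn, fn = confusion_counts(predictions, labels)
print(tp, fp, tn, fn)
```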
What is the definition of classification error?
(FP + FN) / (TP + TN + FP + FN)
What is the definition of accuracy?
(TP + TN) / (TP + TN + FP + FN), i.e. 1 - classification error
What is the problem with classification error / accuracy?
Misleading when classes are unbalanced
- Predict earthquake: unlikely, so always predict no
- Decide if a webpage is relevant: 99.999% are not, so retrieve nothing
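A small sketch of the "always predict no" case, using made-up numbers (1 positive in 1000 instances): accuracy looks excellent while not a single positive is found.

```python
# Made-up, heavily unbalanced data: 1 positive and 999 negatives.
labels = [1] * 1 + [0] * 999

# A "classifier" that always predicts the negative class.
predictions = [0] * len(labels)

correct = sum(p == y for p, y in zip(predictions, labels))
accuracy = correct / len(labels)
positives_found = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))

print(f"accuracy = {accuracy:.3f}, positives found = {positives_found}")
```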
What is the definition of False Alarm Rate?
FP / (FP + TN)
What is the definition of Miss rate?
FN / (TP + FN)
What is the definition of Recall?
TP / (TP + FN)
What is the definition of Precision?
TP / (TP + FP)
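The four formulas can be computed directly from the confusion counts. This is a minimal sketch: `rates` is a hypothetical helper, and it assumes every denominator is non-zero.

```python
def rates(tp, fp, tn, fn):
    """Compute the four rates from confusion counts (denominators assumed non-zero)."""
    return {
        "false_alarm_rate": fp / (fp + tn),   # negatives wrongly flagged
        "miss_rate": fn / (tp + fn),          # positives that were missed
        "recall": tp / (tp + fn),             # positives that were found
        "precision": tp / (tp + fp),          # flagged items that are truly positive
    }


# Made-up counts: 2 TP, 1 FP, 4 TN, 1 FN.
result = rates(tp=2, fp=1, tn=4, fn=1)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Miss rate and recall share the denominator TP + FN, so they always sum to 1.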
What is the problem with False alarm rate / miss rate / recall / precision?
Each one is trivial to push to 100% or 0% on its own (e.g. predict everything positive for 100% recall), so they must be reported in pairs
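A quick illustration with made-up data (5 positives, 95 negatives): predicting positive for everything gives perfect recall, but the paired false alarm rate exposes the trick.

```python
# Made-up data: 5 positives, 95 negatives.
labels = [1] * 5 + [0] * 95

# Degenerate classifier: predict positive for every instance.
predictions = [1] * len(labels)

tp = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
fp = sum(p == 1 and y == 0 for p, y in zip(predictions, labels))
tn = sum(p == 0 and y == 0 for p, y in zip(predictions, labels))
fn = sum(p == 0 and y == 1 for p, y in zip(predictions, labels))

recall = tp / (tp + fn)             # 5 / 5
false_alarm_rate = fp / (fp + tn)   # 95 / 95
precision = tp / (tp + fp)          # 5 / 100

print(recall, false_alarm_rate, precision)
```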


