Evaluation Flashcards by Zoey Sheffield

What is the formal definition of overfitting?

A predictor F is overfit if we can find another predictor F’ where:

E_train(F’) > E_train(F)
E_gen(F’) < E_gen(F)

What is the formal definition of underfitting?

A predictor F is underfit if we can find another predictor F’ with smaller E_train and smaller E_gen

How is E_train (training error) computed?

How is E_gen (generalization error) calculated?

How can we estimate E_gen?

Set aside a test set and compute E_test (the same way as E_train)

E_test -> E_gen as the size of the test set -> infinity
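
A minimal sketch of the hold-out estimate described above. It assumes the error measure is the misclassification rate; the toy data, the `holdout_split` helper and the `threshold_rule` predictor are hypothetical stand-ins, not part of the original cards.

```python
import random

def error_rate(predict, examples):
    """Fraction of (x, y) examples where predict(x) != y."""
    wrong = sum(1 for x, y in examples if predict(x) != y)
    return wrong / len(examples)

def holdout_split(examples, test_fraction=0.25, seed=0):
    """Shuffle a copy of the data and set the last part aside as a test set."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[:-n_test], shuffled[-n_test:]

def threshold_rule(x):
    """Stand-in for a trained predictor (assumed, for illustration only)."""
    return int(x >= 5)

# Toy data: label is 1 when x >= 5, except x == 7, which is mislabeled noise.
data = [(x, int(x >= 5)) for x in range(20)]
data[7] = (7, 0)

train, test = holdout_split(data)
e_test = error_rate(threshold_rule, test)  # estimate of E_gen
```

With only 5 test examples the estimate is coarse; it tightens as the test set grows.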

How can you compute the confidence interval for E_gen from E_test?

What do we use training/validation/testing sets for?

Training set: construct classifier
Validation set: pick algorithm + tune hyperparameters
Testing set: estimate future error rate
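
The three roles can be sketched with a tiny 1-D k-nearest-neighbour classifier. The classifier, the toy data and the interleaved split are assumptions chosen for illustration: the training set builds the classifier, the validation set picks k, and the testing set is touched once at the end.

```python
from collections import Counter

def knn_predict(train, k, x):
    """Majority label among the k training points closest to x."""
    nearest = sorted(train, key=lambda ex: abs(ex[0] - x))[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]

def error_rate(train, k, examples):
    """Misclassification rate of the k-NN classifier on the given examples."""
    wrong = sum(1 for x, y in examples if knn_predict(train, k, x) != y)
    return wrong / len(examples)

# Toy data: label is 1 when x >= 15 (hypothetical).
data = [(x, int(x >= 15)) for x in range(30)]

# Deterministic interleaved split into three disjoint sets.
train, validation, test = data[0::3], data[1::3], data[2::3]

# Validation set: tune the hyperparameter k (odd values avoid voting ties).
candidate_ks = [1, 3, 5]
best_k = min(candidate_ks, key=lambda k: error_rate(train, k, validation))

# Testing set: estimate the future error rate of the chosen configuration.
e_test = error_rate(train, best_k, test)
```

Here the validation error is 0 while the test error is 0.1, a reminder that the error on data used for tuning tends to be optimistic.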

How does cross-validation work?

Randomly split the data into k sets (folds)
For each fold: test on that fold, train on the other k-1
Average the error over all k folds
The final classifier is trained on all the data
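
The steps above can be sketched as follows. The `train_majority` learner and the toy labels are hypothetical; any function that maps a training set to a predictor could be passed as `train_fn`.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Randomly split the indices 0..n-1 into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(examples, k, train_fn, seed=0):
    """Average test error over k folds, training on the other k-1 each time."""
    errors = []
    for test_idx in k_fold_indices(len(examples), k, seed):
        held_out = set(test_idx)
        train = [ex for j, ex in enumerate(examples) if j not in held_out]
        test = [examples[j] for j in test_idx]
        predict = train_fn(train)
        errors.append(sum(1 for x, y in test if predict(x) != y) / len(test))
    return sum(errors) / k

def train_majority(train):
    """Toy learner: always predict the most frequent training label."""
    labels = [y for _, y in train]
    majority = max(sorted(set(labels)), key=labels.count)
    return lambda x: majority

# Toy data: 9 instances of class A and 3 of class B (features unused here).
data = [(i, 'A') for i in range(9)] + [(i, 'B') for i in range(9, 12)]

cv_error = cross_validate(data, k=3, train_fn=train_majority)
final_classifier = train_majority(data)  # trained on all the data
```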

What is leave-one-out?

Cross-validation where k = number of training instances (each fold holds out a single instance)

What is the problem with leave-one-out validation?

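Classes are not balanced between the training and testing sets

Example with n instances, half A and half B: testing { 1 of A, 0 of B } vs training { n/2 - 1 of A, n/2 of B }

A majority-class predictor would always predict B (most frequent in training), and would always be wrong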
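
A small sketch of this failure case, assuming a majority-class predictor and a perfectly balanced toy data set (both are illustrative choices):

```python
def majority_label(labels):
    """Most frequent label in the list."""
    return max(sorted(set(labels)), key=labels.count)

labels = ['A'] * 5 + ['B'] * 5  # perfectly balanced classes

errors = 0
for i, held_out in enumerate(labels):
    # Leave one instance out; its class is now the minority in training.
    training = labels[:i] + labels[i + 1:]
    if majority_label(training) != held_out:
        errors += 1

loo_error = errors / len(labels)
```

Every fold is misclassified, so `loo_error` comes out as 1.0 even though always guessing one class would be right half the time on future data.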

What does stratification do?

Keeps class labels balanced across training/testing sets

How do you do stratification?

Split instances by class
Split each class into K parts
Assemble the i-th fold by combining one part from each class
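
These three steps can be sketched as below. The function name `stratified_folds` is hypothetical, and the sketch skips shuffling within each class, which one would normally add.

```python
from collections import defaultdict

def stratified_folds(labels, k):
    """Return k folds of indices with class proportions kept roughly equal."""
    # Step 1: split instances by class.
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        # Step 2: split the class into k parts.
        # Step 3: the i-th fold takes the i-th part of every class.
        for part in range(k):
            folds[part].extend(indices[part::k])
    return folds

labels = ['A'] * 6 + ['B'] * 3
folds = stratified_folds(labels, k=3)
```

Each of the three folds ends up with two A instances and one B instance, matching the 2:1 ratio of the full data set.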

What is a true positive?

Classifier predicts positive, and it is positive

What is a true negative?

Classifier predicts negative, and it is negative

What is a false positive?

Classifier predicts positive, but it is actually negative

What is a false negative?

Classifier predicts negative, but it is actually positive

What is the definition of classification error?

(FP + FN) / (TP + TN + FP + FN)

What is the definition of accuracy?

(TP + TN) / (TP + TN + FP + FN)

What is the problem with classification error / accuracy?

Misleading when classes are unbalanced

Predict earthquake: unlikely, so always predict no
Decide if a webpage is relevant: 99.999% are not, so retrieve nothing
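
A quick sketch of the effect, using hypothetical numbers (1 positive in 1000) and a predictor that always says no. Accuracy is computed here as the fraction of correct predictions.

```python
# 1 = event (earthquake / relevant page), 0 = no event
y_true = [1] + [0] * 999
y_pred = [0] * 1000  # always predict "no"

accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
positives_found = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
```

The accuracy is 99.9% although the predictor never finds a single positive.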

What is the definition of False Alarm Rate?

FP / (FP + TN)

What is the definition of Miss rate?

FN / (TP + FN)

What is the definition of Recall?

TP / (TP + FN)

What is the definition of Precision?

TP / (TP + FP)
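
The four rates above can be computed directly from the confusion counts. A minimal sketch on made-up labels (1 = positive, 0 = negative); the helper names are hypothetical:

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

def rates(tp, tn, fp, fn):
    """The four rates; assumes every denominator is non-zero."""
    return {
        "false_alarm_rate": fp / (fp + tn),
        "miss_rate": fn / (tp + fn),
        "recall": tp / (tp + fn),
        "precision": tp / (tp + fp),
    }

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
r = rates(tp, tn, fp, fn)
```

Note that miss rate and recall share a denominator and always sum to 1.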

What is the problem with False alarm rate / miss rate / recall / precision?

Each one is trivial to push to 100% or 0% on its own, so they must be reported in pairs
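
A short sketch of why a single number is not enough, using two degenerate predictors on made-up labels (the `counts` helper is hypothetical):

```python
def counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    return tp, tn, fp, fn

y_true = [1, 0, 0, 0, 1, 0]

# Predict positive for everything: perfect recall, worst false alarm rate.
tp, tn, fp, fn = counts(y_true, [1] * 6)
recall_all_pos = tp / (tp + fn)
false_alarm_all_pos = fp / (fp + tn)

# Predict negative for everything: no false alarms, but every positive missed.
tp, tn, fp, fn = counts(y_true, [0] * 6)
false_alarm_all_neg = fp / (fp + tn)
miss_rate_all_neg = fn / (tp + fn)
```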

What evaluation measure would we use for event detection?

Cost = C_FP * FP + C_FN * FN

e.g. C_FP is the cost of evacuating with no earthquake, C_FN is the cost of staying during an earthquake
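
A sketch of the cost formula with invented unit costs (a miss is assumed to be 100 times as expensive as a false alarm; the detector counts are also made up):

```python
def detection_cost(fp, fn, cost_fp, cost_fn):
    """Cost = C_FP * FP + C_FN * FN."""
    return cost_fp * fp + cost_fn * fn

COST_FP = 1    # hypothetical cost of one needless evacuation
COST_FN = 100  # hypothetical cost of one missed earthquake

cautious = detection_cost(fp=20, fn=0, cost_fp=COST_FP, cost_fn=COST_FN)
lax = detection_cost(fp=2, fn=1, cost_fp=COST_FP, cost_fn=COST_FN)
```

Under these costs the cautious detector is cheaper (20 vs 102) despite raising ten times as many false alarms.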

What is the definition of F-measure?

2 / (1 / Recall + 1 / Precision)

Similar to accuracy but without TN
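
A sketch showing the harmonic-mean form next to its equivalent count form, 2TP / (2TP + FP + FN), which makes the absence of TN explicit. The counts are made up.

```python
def f_measure(recall, precision):
    """Harmonic mean of recall and precision."""
    return 2 / (1 / recall + 1 / precision)

def f_measure_from_counts(tp, fp, fn):
    """Same quantity written with counts; TN never appears."""
    return 2 * tp / (2 * tp + fp + fn)

tp, fp, fn = 3, 1, 1
recall = tp / (tp + fn)
precision = tp / (tp + fp)

f_from_rates = f_measure(recall, precision)
f_from_counts = f_measure_from_counts(tp, fp, fn)
```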

What is a ROC curve?

Plot of the TP rate vs the FP rate as the decision threshold varies
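
A minimal sketch of how the points of such a curve can be generated. It assumes the classifier outputs a score, that "positive" means score >= threshold, and that the axes are rates (TP / positives and FP / negatives); the scores and labels are invented.

```python
def roc_points(scores, labels):
    """Return (FP rate, TP rate) pairs, one per threshold, from strict to lax."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / neg, tp / pos))
    return points

scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
points = roc_points(scores, labels)
```

Lowering the threshold can only add predicted positives, so the curve moves monotonically from (0, 0) to (1, 1).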

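What do a perfect and a random classifier look like on a ROC curve?

Perfect: the curve goes straight up to the top-left corner (TP rate 1 at FP rate 0)
Random: the diagonal line from (0, 0) to (1, 1)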

What are some problems with mean squared error?

Very sensitive to outliers (because of the squaring)
Sensitive to mean / scale (always predicting the mean value might give a lower MSE than a model which captures the pattern but whose mean is off)
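
Both problems can be seen on a tiny made-up example:

```python
def mse(pred, y):
    """Mean squared error."""
    return sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y)

y = [1.0, 2.0, 3.0, 4.0, 5.0]

# Outliers: four perfect predictions and a single one that is off by 10.
one_outlier = [1.0, 2.0, 3.0, 4.0, 15.0]
mse_outlier = mse(one_outlier, y)

# Mean / scale: a constant prediction vs the right pattern shifted by 2.
always_mean = [3.0] * 5
shifted_pattern = [t + 2.0 for t in y]
mse_mean = mse(always_mean, y)
mse_shifted = mse(shifted_pattern, y)
```

The single outlier alone contributes 100 / 5 = 20 to the MSE, and the constant predictor (MSE 2) beats the shifted model (MSE 4) even though the latter follows the trend exactly.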

What is the mean absolute error (MAE)?

mean { |f(x_i) - y_i| }, i.e. (1/n) * sum over i of |f(x_i) - y_i|

What is the median absolute deviation (MAD)?

med { |f(x_i) - y_i| }

What are the pros/cons of median absolute deviation (MAD)?

Completely ignores outliers, but we can't take its derivative
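
A sketch of med { |f(x_i) - y_i| } on made-up predictions, with and without one wild outlier:

```python
from statistics import median

def mad(pred, y):
    """Median of the absolute errors |f(x_i) - y_i|."""
    return median(abs(p - t) for p, t in zip(pred, y))

y = [10, 20, 30, 40, 50]
pred_clean = [11, 21, 29, 41, 49]     # every prediction is off by 1
pred_outlier = [11, 21, 29, 41, 150]  # last prediction is off by 100

mad_clean = mad(pred_clean, y)
mad_outlier = mad(pred_outlier, y)
```

Both values are 1: the outlier changes the largest error but not the middle one.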

What is the definition of the correlation coefficient?

What does the correlation coefficient capture?

Relative ordering - useful for ranking tasks
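
A small illustration, assuming the Pearson correlation coefficient is the one meant (the card leaves the definition blank). Predictions whose scale and mean are badly off still score 1.0 as long as they rise and fall with the targets, which is what matters when only the ranking is used.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

y = [1.0, 2.0, 3.0, 4.0, 5.0]
rescaled = [10 * t + 100 for t in y]  # wrong scale and mean, same ordering
reversed_order = y[::-1]              # exactly the opposite ordering

r_rescaled = pearson(rescaled, y)
r_reversed = pearson(reversed_order, y)
```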