Model Selection and Cross Evaluation Flashcards
(35 cards)
What is model selection?
1) Choice of learning algorithm, feature extraction, feature selection, and normalization.
2) Hyperparameter tuning.
What is model evaluation/assessment?
After selecting your model, estimating how it generalizes to new unseen data.
Holdout method of validation
Using separate test and training data.
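A minimal holdout sketch, assuming scikit-learn; X, y are placeholder data:
```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = np.random.rand(200, 5), np.random.randint(0, 2, 200)  # placeholder data

# Randomly split into training and test data (e.g. 70/30).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = DecisionTreeClassifier().fit(X_train, y_train)   # fit on training data only
print("holdout accuracy:", accuracy_score(y_test, model.predict(X_test)))
```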
Why do we need separate test data?
To get an unbiased estimate of how well the model trained on the training data performs on new data.
Variance in error
How large the difference between the estimated and the actual error tends to be on average; ideally close to zero.
Note: a method whose errors average to zero can still have a large variance, because errors in the positive and negative directions cancel each other out.
Usually, the more data, the smaller the variance.
How to combine model selection and final evaluation?
Split the data into three parts: a training set for fitting the models, a validation set for model selection (hyperparameter tuning), and a separate test set used only once, for the final evaluation.
What is cross-validation?
Used when there is too little data to split off a separate test set. The data set of n instances is divided into k folds; each fold in turn serves as test data while the model is trained on the remaining k−1 folds, and the resulting error estimates are averaged.
When would you use CV?
When there’s not enough data to do a separate test set.
Leave-one-out cross-validation.
With n instances, leave the ith record out for testing, train on the remaining n−1 records, and compute the prediction error on the held-out record; repeat for all n records and average. On each iteration the test record is not part of the training set, so the error estimate is not optimistically biased by overfitting. Allows using all the data for both training and testing. Does not always work perfectly.
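A minimal leave-one-out sketch, assuming scikit-learn; X, y are placeholder data:
```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = np.random.rand(50, 4), np.random.randint(0, 2, 50)  # placeholder data

# One round per record: train on n-1 records, test on the single held-out record.
scores = cross_val_score(KNeighborsClassifier(n_neighbors=3), X, y, cv=LeaveOneOut())
print("LOOCV accuracy estimate:", scores.mean())  # average over all n held-out records
```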
How to combine model selection and final evaluation when using cross-validation?
Use cross-validation on the training set for parameter tuning, then test the final model on independent test data. There is still a slight optimistic bias; to combat this, use nested cross-validation.
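A minimal nested cross-validation sketch, assuming scikit-learn; the inner loop tunes a hyperparameter, the outer loop estimates generalization error (X, y are placeholder data):
```python
import numpy as np
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = np.random.rand(100, 4), np.random.randint(0, 2, 100)  # placeholder data

# Inner CV: model selection (hyperparameter tuning).
inner = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": [1, 3, 5, 7]}, cv=5)

# Outer CV: evaluates the whole selection procedure on data it never tuned on.
outer_scores = cross_val_score(inner, X, y, cv=5)
print("nested CV accuracy estimate:", outer_scores.mean())
```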
What performance measures are out there for regression and
classification?
Mean squared error (or root mean squared error) for regression: 0 for a perfect prediction, and it grows quadratically with the error, so outliers have a large effect. Mean absolute error is an alternative that is less sensitive to outliers.
For classification: misclassification rate (1 for an error, 0 for a correct prediction, averaged over the data). Baseline: the majority classifier, which always predicts the most common class.
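A minimal sketch of these measures, assuming scikit-learn; the values are placeholders:
```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, accuracy_score

# Regression: squared error punishes the outlier (10.0 vs 4.0) much more than absolute error.
y_true, y_pred = np.array([1.0, 2.0, 3.0, 10.0]), np.array([1.1, 1.9, 3.2, 4.0])
print("MSE :", mean_squared_error(y_true, y_pred))
print("RMSE:", np.sqrt(mean_squared_error(y_true, y_pred)))
print("MAE :", mean_absolute_error(y_true, y_pred))

# Classification: misclassification rate = 1 - accuracy.
c_true, c_pred = np.array([0, 1, 1, 0, 1]), np.array([0, 1, 0, 0, 1])
print("misclassification rate:", 1 - accuracy_score(c_true, c_pred))
```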
What limitations does misclassification rate / classification
accuracy have as a performance measure?
A low misclassification rate does not necessarily mean good performance: on very unbalanced data, even a trivial classifier that always predicts the majority class achieves a low misclassification rate.
What are cost and confusion matrices?
If the costs of different misclassifications differ, use a cost matrix (e.g. calling a bad product good is worse than rejecting a good product).
Confusion matrix: Classification results are shown in a table where rows represent true classes and columns, predicted classes.
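A minimal sketch, assuming scikit-learn; the labels and cost values are made up for illustration:
```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])  # placeholder labels: 1 = bad product, 0 = good product
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

# Confusion matrix: rows = true classes, columns = predicted classes.
cm = confusion_matrix(y_true, y_pred)
print(cm)

# Hypothetical cost matrix: calling a bad product good costs 10,
# rejecting a good product costs 1, correct decisions cost 0.
cost = np.array([[0, 1],
                 [10, 0]])
print("total cost:", (cm * cost).sum())
```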
What are true positives, false positives etc., how can one
calibrate a classifier to achieve different tradeoffs between them?
True positive: predicted positive and really positive. False positive: predicted positive but really negative. False negative: predicted negative but really positive (often the most dangerous error). True negative: predicted negative and really negative.
Precision = TP/(TP+FP), Recall = TP/(TP+FN).
F-score = 2 × (Precision × Recall)/(Precision + Recall).
The tradeoff between false positives and false negatives can be calibrated by changing the classifier's decision threshold (e.g. the threshold on the predicted probability of the positive class).
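A minimal sketch of precision, recall, F-score, and threshold calibration, assuming scikit-learn; the probabilities and the 0.3 threshold are arbitrary examples:
```python
import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
proba  = np.array([0.9, 0.4, 0.35, 0.8, 0.1, 0.6, 0.2, 0.05])  # placeholder P(positive)

# Lowering the threshold turns some false negatives into true positives,
# at the price of more false positives.
for threshold in (0.5, 0.3):
    y_pred = (proba >= threshold).astype(int)
    print(threshold,
          "precision:", precision_score(y_true, y_pred),
          "recall:", recall_score(y_true, y_pred),
          "F1:", f1_score(y_true, y_pred))
```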
What is a ROC curve, how to interpret area under ROC curve?
TPR = TP/(TP+FN) - ability to find positives (recall, sensitivity)
FPR = FP/(FP+TN): the rate of negatives identified as positive (equals 1 − specificity).
ROC curve: the true positive rate plotted against the false positive rate as the decision threshold is varied.
Area under the ROC curve (AUC): 1.0 for the best possible curve (perfect ranking); 0.5 corresponds to random guessing.
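A minimal ROC/AUC sketch, assuming scikit-learn; the scores are placeholder classifier outputs:
```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
scores = np.array([0.9, 0.4, 0.35, 0.8, 0.1, 0.6, 0.2, 0.05])  # placeholder P(positive)

fpr, tpr, thresholds = roc_curve(y_true, scores)  # one (FPR, TPR) point per threshold
print("AUC:", roc_auc_score(y_true, scores))       # 1.0 = perfect ranking, 0.5 = random guessing
```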
What is a loss function?
We have a hypothesis h(x) = y', learned from training data by a machine learning algorithm (e.g. k-NN, a linear model, or a decision tree).
Loss (error) function l(y’, y) tells the price we pay for predicting y’ when the true value is y.
What is the expected loss of h on new data?
For example, the zero-one loss: l(y', y) = 1 if y' ≠ y, otherwise 0.
Misclassification rate
The average zero-one loss over a data set. If 10% of the instances are classified incorrectly, the misclassification rate is 0.1 (and the classification accuracy is 0.9).
Estimating the generalization error.
The training set estimate is unreliable. Standard approach: randomly assign the data to training and test sets. Averaging several estimates (e.g. over repeated random splits) also reduces the variance.
Estimators of generalization error.
Training set error (resubstitution error)
Test set error (aka hold-out error)
Cross-validation error (leave one out CV, k-fold CV)
Bootstrap error (not recommended)
Optimistic bias
Systematically estimating the generalization error to be smaller than it really is. E.g. the training set error.
Pessimistic bias
Systematically estimating the error to be larger than it actually is. E.g. the test set error when a large test set is held out and later merged back into the training set before the final training: the final model sees more data than the evaluated one.
Training set error
Also called resubstitution error. Has a high optimistic bias, since the model was chosen to fit the training data; in model selection it favors the most complicated models. Don't use it!
Prof's advice: Never, ever report your model's performance on its…
training data.
Test set error also known as…
holdout estimate