Model Selection and Cross Evaluation Flashcards
(35 cards)
What is model selection?
1) Choice of learning algorithm, feature extraction, feature selection, and normalization.
2) Hyperparameter tuning.
What is model evaluation/assessment?
After selecting your model, estimating how it generalizes to new unseen data.
Holdout method of validation
Using separate test and training data.
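A minimal holdout sketch, assuming scikit-learn; X, y are placeholder data:
```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = np.random.rand(200, 5), np.random.randint(0, 2, 200)  # placeholder data

# Randomly split into training and test data (e.g. 70/30).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = DecisionTreeClassifier().fit(X_train, y_train)   # fit on training data only
print("holdout accuracy:", accuracy_score(y_test, model.predict(X_test)))
```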
Why do we need separate test data?
To get an unbiased estimate of how well the model trained on the training data performs on new data.
Variance in error
How large the difference between the estimated and the actual error tends to be on average; ideally close to zero.
Note: a method whose errors average to zero can still have a large variance, because errors in the positive and negative directions cancel each other out.
Usually, the more data, the smaller the variance.
How to combine model selection and final evaluation?
Split the data into three parts: a training set for fitting the models, a validation set for model selection (hyperparameter tuning), and a separate test set used only once, for the final evaluation.
What is cross-validation?
Used when there is too little data to split off a separate test set. The data set of n instances is divided into k folds; each fold in turn serves as test data while the model is trained on the remaining k−1 folds, and the resulting error estimates are averaged.
When would you use CV?
When there’s not enough data to do a separate test set.
Leave-one-out cross-validation.
With n instances, leave the ith record out for testing, train on the remaining n−1 records, and compute the prediction error on the held-out record; repeat for all n records and average. On each iteration the test record is not part of the training set, so the error estimate is not optimistically biased by overfitting. Allows using all the data for both training and testing. Does not always work perfectly.
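A minimal leave-one-out sketch, assuming scikit-learn; X, y are placeholder data:
```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = np.random.rand(50, 4), np.random.randint(0, 2, 50)  # placeholder data

# One round per record: train on n-1 records, test on the single held-out record.
scores = cross_val_score(KNeighborsClassifier(n_neighbors=3), X, y, cv=LeaveOneOut())
print("LOOCV accuracy estimate:", scores.mean())  # average over all n held-out records
```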
How to combine model selection and final evaluation when using cross-validation?
Use cross-validation on the training set for parameter tuning, then test the final model on independent test data. There is still a slight optimistic bias; to combat this, use nested cross-validation.
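A minimal nested cross-validation sketch, assuming scikit-learn; the inner loop tunes a hyperparameter, the outer loop estimates generalization error (X, y are placeholder data):
```python
import numpy as np
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = np.random.rand(100, 4), np.random.randint(0, 2, 100)  # placeholder data

# Inner CV: model selection (hyperparameter tuning).
inner = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": [1, 3, 5, 7]}, cv=5)

# Outer CV: evaluates the whole selection procedure on data it never tuned on.
outer_scores = cross_val_score(inner, X, y, cv=5)
print("nested CV accuracy estimate:", outer_scores.mean())
```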
What performance measures are out there for regression and
classification?
Mean squared error (or root mean squared error) for regression: 0 for a perfect prediction, and it grows quadratically with the error, so outliers have a large effect. Mean absolute error is an alternative that is less sensitive to outliers.
For classification: misclassification rate (1 for an error, 0 for a correct prediction, averaged over the data). Baseline: the majority classifier, which always predicts the most common class.
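A minimal sketch of these measures, assuming scikit-learn; the values are placeholders:
```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, accuracy_score

# Regression: squared error punishes the outlier (10.0 vs 4.0) much more than absolute error.
y_true, y_pred = np.array([1.0, 2.0, 3.0, 10.0]), np.array([1.1, 1.9, 3.2, 4.0])
print("MSE :", mean_squared_error(y_true, y_pred))
print("RMSE:", np.sqrt(mean_squared_error(y_true, y_pred)))
print("MAE :", mean_absolute_error(y_true, y_pred))

# Classification: misclassification rate = 1 - accuracy.
c_true, c_pred = np.array([0, 1, 1, 0, 1]), np.array([0, 1, 0, 0, 1])
print("misclassification rate:", 1 - accuracy_score(c_true, c_pred))
```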
What limitations does misclassification rate / classification
accuracy have as a performance measure?
A low misclassification rate does not necessarily mean good performance: on very unbalanced data, even a trivial classifier that always predicts the majority class achieves a low misclassification rate.
What are cost and confusion matrices?
If the costs of different misclassifications differ, use a cost matrix (e.g. calling a bad product good is worse than rejecting a good product).
Confusion matrix: Classification results are shown in a table where rows represent true classes and columns, predicted classes.
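A minimal sketch, assuming scikit-learn; the labels and cost values are made up for illustration:
```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])  # placeholder labels: 1 = bad product, 0 = good product
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

# Confusion matrix: rows = true classes, columns = predicted classes.
cm = confusion_matrix(y_true, y_pred)
print(cm)

# Hypothetical cost matrix: calling a bad product good costs 10,
# rejecting a good product costs 1, correct decisions cost 0.
cost = np.array([[0, 1],
                 [10, 0]])
print("total cost:", (cm * cost).sum())
```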
What are true positives, false positives etc., how can one
calibrate a classifier to achieve different tradeoffs between them?
True positive: predicted positive and really positive. False positive: predicted positive but really negative. False negative: predicted negative but really positive (often the most dangerous error). True negative: predicted negative and really negative.
Precision = TP/(TP+FP), Recall = TP/(TP+FN).
F-score = 2 × (Precision × Recall)/(Precision + Recall).
The tradeoff between false positives and false negatives can be calibrated by changing the classifier's decision threshold (e.g. the threshold on the predicted probability of the positive class).
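A minimal sketch of precision, recall, F-score, and threshold calibration, assuming scikit-learn; the probabilities and the 0.3 threshold are arbitrary examples:
```python
import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
proba  = np.array([0.9, 0.4, 0.35, 0.8, 0.1, 0.6, 0.2, 0.05])  # placeholder P(positive)

# Lowering the threshold turns some false negatives into true positives,
# at the price of more false positives.
for threshold in (0.5, 0.3):
    y_pred = (proba >= threshold).astype(int)
    print(threshold,
          "precision:", precision_score(y_true, y_pred),
          "recall:", recall_score(y_true, y_pred),
          "F1:", f1_score(y_true, y_pred))
```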
What is a ROC curve, how to interpret area under ROC curve?
TPR = TP/(TP+FN) - ability to find positives (recall, sensitivity)
FPR = FP/(FP+TN): the rate of negatives identified as positive (equals 1 − specificity).
ROC curve: the true positive rate plotted against the false positive rate as the decision threshold is varied.
Area under the ROC curve (AUC): 1.0 for the best possible curve (perfect ranking); 0.5 corresponds to random guessing.
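A minimal ROC/AUC sketch, assuming scikit-learn; the scores are placeholder classifier outputs:
```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
scores = np.array([0.9, 0.4, 0.35, 0.8, 0.1, 0.6, 0.2, 0.05])  # placeholder P(positive)

fpr, tpr, thresholds = roc_curve(y_true, scores)  # one (FPR, TPR) point per threshold
print("AUC:", roc_auc_score(y_true, scores))       # 1.0 = perfect ranking, 0.5 = random guessing
```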
What is a loss function?
We have a hypothesis h(x) = y', learned from training data by a machine learning algorithm (e.g. k-NN, a linear model, or a decision tree).
Loss (error) function l(y’, y) tells the price we pay for predicting y’ when the true value is y.
What is the expected loss of h on new data?
For example, the zero-one loss: l(y', y) = 1 if y' ≠ y, otherwise 0.
Misclassification rate
The average zero-one loss over a data set. If 10% of the instances are classified incorrectly, the misclassification rate is 0.1 (and the classification accuracy is 0.9).
Estimating the generalization error.
The training set estimate is unreliable. Standard approach: randomly assign the data to training and test sets. Averaging several estimates (e.g. over repeated random splits) also reduces the variance.
Estimators of generalization error.
Training set error (resubstitution error)
Test set error (aka hold-out error)
Cross-validation error (leave one out CV, k-fold CV)
Bootstrap error (not recommended)
Optimistic bias
Systematically estimating the generalization error to be smaller than it really is. E.g. the training set error.
Pessimistic bias
Systematically estimating the error to be larger than it actually is. E.g. the test set error when a large test set is held out and later merged back into the training set before the final training: the final model sees more data than the evaluated one.
Training set error
Also called resubstitution error. Has a high optimistic bias, since the model was chosen to fit the training data; in model selection it favors the most complicated models. Don't use it!
Prof's advice: Never, ever report your model's performance on its…
training data.
Test set error also known as…
holdout estimate