L10 Flashcards
(35 cards)
What should a model be evaluated on to reflect its generalization ability?
- A model should be evaluated on independent test data.
- Performance on unseen data reflects generalization ability.
- Focus here is on supervised learning.
What is the basic evaluation setup for model evaluation?
Training set and test set
- Training set to train the model.
- Test set to evaluate generalization.
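A minimal sketch of this setup, assuming scikit-learn; the built-in breast-cancer dataset and the logistic-regression model are illustrative choices, not part of the card:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

# Hold out 25% of the data as an independent test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

model = LogisticRegression(max_iter=5000)
model.fit(X_train, y_train)            # train only on the training set
print(model.score(X_test, y_test))     # accuracy on unseen data reflects generalization
```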
What are the challenges in train-test split?
- Choosing train/test split sizes
- Test set differing from training data
- Detecting overfitting
Define Bias in the context of model evaluation.
Systematic error caused by wrong assumptions or underfitting
Bias leads to consistently incorrect predictions.
Define Variance in the context of model evaluation.
Sensitivity to data fluctuations, often leading to overfitting
High variance means the model is too complex and captures noise.
What is the ideal model behavior regarding bias and variance?
- Low bias + low variance -> ideal model behavior
- Low bias + high variance -> overfitting
- High bias + low variance -> underfitting
- High bias + high variance -> worst case
Total error = Bias² + Variance + Irreducible error
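For squared-error loss, the standard decomposition behind that total-error line can be written out as follows (standard form, not quoted from the card):

```latex
\mathbb{E}\!\left[(y - \hat{f}(x))^2\right]
  = \underbrace{\left(\mathbb{E}[\hat{f}(x)] - f(x)\right)^2}_{\text{Bias}^2}
  + \underbrace{\mathbb{E}\!\left[\left(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\right)^2\right]}_{\text{Variance}}
  + \underbrace{\sigma^2}_{\text{Irreducible error}}
```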
What is underfitting?
Low training performance
Indicates that the model is too simple to capture the underlying patterns.
What is overfitting?
High training performance but poor test performance
The model learns noise in the training data rather than generalizable patterns.
What is k-Fold Cross-Validation?
Data is split into k folds; each fold is used once as a test set
Pros: Reduces randomness, shows model sensitivity.
Cons: More computation; folds may end up with imbalanced class distributions.
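A short k-fold sketch, assuming scikit-learn; the 5-fold setting, dataset, and classifier are illustrative choices:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Each of the 5 folds is used exactly once as the test set.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=5000), X, y, cv=cv)
print(scores.mean(), scores.std())  # the spread of scores hints at model sensitivity
```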
What is the advantage of Stratified k-Fold Cross-Validation?
Class distribution is preserved across folds
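Only the splitter changes for the stratified variant; the dataset and classifier here are again illustrative assumptions:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# StratifiedKFold keeps the class proportions of y the same in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
print(cross_val_score(LogisticRegression(max_iter=5000), X, y, cv=cv).mean())
```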
What is Leave-One-Out (LOO) Cross-Validation?
Each example is its own test set
Very time-consuming but maximally uses data.
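A minimal LOO sketch, again assuming scikit-learn and a small illustrative dataset (one model is fit per sample, so this is only practical for small data):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = load_iris(return_X_y=True)

# n_samples separate fits: each sample serves as its own test set once.
scores = cross_val_score(LogisticRegression(max_iter=5000), X, y, cv=LeaveOneOut())
print(scores.mean())  # each score is 0 or 1, so the mean is the LOO accuracy
```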
What is Grid Search in model tuning?
- Trying combinations of hyperparameters
- Combine with CV for reliable tuning.
⚠️ Never tune hyperparameters on the test set; keep a final holdout test set for the last evaluation.
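A grid-search sketch, assuming scikit-learn; the SVC model and the parameter grid are placeholder choices:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Keep a final holdout test set that grid search never sees.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Every combination of C and gamma is evaluated with 5-fold CV on the training data.
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X_train, y_train)

print(search.best_params_)
print(search.score(X_test, y_test))  # reported once, on the untouched holdout set
```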
What is the formula for Accuracy in binary classification metrics?
(TP + TN) / (TP + TN + FP + FN)
What does Precision measure?
TP / (TP + FP)
What is Recall also known as?
Sensitivity = TP / (TP + FN)
What is the F1 Score?
2 × (precision × recall) / (precision + recall)
It is the harmonic mean of Precision and Recall.
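A tiny worked example connecting the four formulas above to raw counts; the numbers are made up:

```python
# Hypothetical confusion-matrix counts for a binary classifier.
TP, TN, FP, FN = 40, 45, 5, 10

accuracy  = (TP + TN) / (TP + TN + FP + FN)                # 0.85
precision = TP / (TP + FP)                                 # ~0.889
recall    = TP / (TP + FN)                                 # 0.80
f1        = 2 * precision * recall / (precision + recall)  # ~0.842

print(accuracy, precision, recall, f1)
```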
Why can accuracy be misleading for imbalanced datasets?
A model could achieve high accuracy by always predicting the majority class.
Example: if 90% of the samples belong to the majority class, a model that always predicts that class reaches 90% accuracy while learning nothing useful.
What is the ROC Curve?
Plots True Positive Rate vs False Positive Rate across classification thresholds
FPR = FP / (FP + TN)
TPR = TP / (TP + FN)
- AUC-ROC: 0.5 = random, 1.0 = perfect.
- ROC curves are most informative when the observations are roughly balanced between the classes
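A sketch of computing the ROC curve and AUC with scikit-learn; the dataset and classifier are illustrative assumptions:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=5000).fit(X_train, y_train)
scores = model.predict_proba(X_test)[:, 1]        # probability of the positive class

fpr, tpr, thresholds = roc_curve(y_test, scores)  # points of the ROC curve
print(roc_auc_score(y_test, scores))              # 0.5 = random, 1.0 = perfect
```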
What is the Precision-Recall Curve used for?
Better evaluation of models on imbalanced datasets
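Continuing the ROC sketch above (reusing its y_test and predicted scores), the precision-recall curve and its area come from the analogous scikit-learn helpers:

```python
from sklearn.metrics import average_precision_score, precision_recall_curve

# Assumes y_test and scores from the previous ROC sketch.
precision, recall, thresholds = precision_recall_curve(y_test, scores)
print(average_precision_score(y_test, scores))  # area under the PR curve
```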
What does a Confusion Matrix represent?
Matrix of actual vs predicted labels
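A toy confusion-matrix example with made-up labels, assuming scikit-learn; rows are actual classes, columns are predicted classes:

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(confusion_matrix(y_true, y_pred))
# [[3 1]
#  [1 3]]
```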
What is Macro F1?
Unweighted average of the per-class F1 scores, treating all classes equally
What is Weighted F1?
Weighted average of F1 by class size
What is Micro F1?
Precision and recall are computed from globally aggregated TP, FP, and FN counts
Every sample counts equally, regardless of its class.
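A small example contrasting the three averaging schemes, assuming scikit-learn and made-up multi-class labels:

```python
from sklearn.metrics import f1_score

y_true = [0, 0, 0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 1, 0, 1, 2, 2, 2, 1, 2]

print(f1_score(y_true, y_pred, average="macro"))     # unweighted mean of per-class F1
print(f1_score(y_true, y_pred, average="weighted"))  # weighted by class size (support)
print(f1_score(y_true, y_pred, average="micro"))     # from global TP/FP/FN counts
```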
What does R² represent in regression metrics?
Coefficient of determination: the proportion of variance in the target explained by the model
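The usual definition behind that name, written out (standard formula, not quoted from the card):

```latex
R^2 = 1 - \frac{\sum_{i}(y_i - \hat{y}_i)^2}{\sum_{i}(y_i - \bar{y})^2}
```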