L10 Flashcards
(35 cards)
What should a model be evaluated on to reflect its generalization ability?
- A model should be evaluated on independent test data.
- Performance on unseen data reflects generalization ability.
- Focus here is on supervised learning.
What is the basic evaluation setup for model evaluation?
Training set and test set
- Training set to train the model.
- Test set to evaluate generalization.
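A minimal sketch of this setup, assuming scikit-learn; the built-in breast-cancer dataset and the logistic-regression model are illustrative choices, not part of the card:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

# Hold out 25% of the data as an independent test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

model = LogisticRegression(max_iter=5000)
model.fit(X_train, y_train)            # train only on the training set
print(model.score(X_test, y_test))     # accuracy on unseen data reflects generalization
```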
What are the challenges in train-test split?
- Choosing train/test split sizes
- Test set differing from training data
- Detecting overfitting
Define Bias in the context of model evaluation.
Systematic error caused by wrong assumptions or underfitting
Bias leads to consistently incorrect predictions.
Define Variance in the context of model evaluation.
Sensitivity to data fluctuations, often leading to overfitting
High variance means the model is too complex and captures noise.
What is the ideal model behavior regarding bias and variance?
- Low bias + low variance -> ideal model behavior
- Low bias + high variance -> overfitting
- High bias + low variance -> underfitting
- High bias + high variance -> worst case
Total error = Bias² + Variance + Irreducible error
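For squared-error loss, the standard decomposition behind that total-error line can be written out as follows (standard form, not quoted from the card):

```latex
\mathbb{E}\!\left[(y - \hat{f}(x))^2\right]
  = \underbrace{\left(\mathbb{E}[\hat{f}(x)] - f(x)\right)^2}_{\text{Bias}^2}
  + \underbrace{\mathbb{E}\!\left[\left(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\right)^2\right]}_{\text{Variance}}
  + \underbrace{\sigma^2}_{\text{Irreducible error}}
```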
What is underfitting?
Low training performance
Indicates that the model is too simple to capture the underlying patterns.
What is overfitting?
High training performance but poor test performance
The model learns noise in the training data rather than generalizable patterns.
What is k-Fold Cross-Validation?
Data is split into k folds; each fold is used once as a test set
Pros: Reduces randomness, shows model sensitivity.
Cons: More computation; folds may end up with imbalanced class distributions.
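A short k-fold sketch, assuming scikit-learn; the 5-fold setting, dataset, and classifier are illustrative choices:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Each of the 5 folds is used exactly once as the test set.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=5000), X, y, cv=cv)
print(scores.mean(), scores.std())  # the spread of scores hints at model sensitivity
```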
What is the advantage of Stratified k-Fold Cross-Validation?
Class distribution is preserved across folds
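Only the splitter changes for the stratified variant; the dataset and classifier here are again illustrative assumptions:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# StratifiedKFold keeps the class proportions of y the same in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
print(cross_val_score(LogisticRegression(max_iter=5000), X, y, cv=cv).mean())
```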
What is Leave-One-Out (LOO) Cross-Validation?
Each example is its own test set
Very time-consuming but maximally uses data.
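A minimal LOO sketch, again assuming scikit-learn and a small illustrative dataset (one model is fit per sample, so this is only practical for small data):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = load_iris(return_X_y=True)

# n_samples separate fits: each sample serves as its own test set once.
scores = cross_val_score(LogisticRegression(max_iter=5000), X, y, cv=LeaveOneOut())
print(scores.mean())  # each score is 0 or 1, so the mean is the LOO accuracy
```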
What is Grid Search in model tuning?
- Trying combinations of hyperparameters
- Combine with CV for reliable tuning.
⚠️ Never tune hyperparameters on the test set; keep a final holdout test set for the last evaluation.
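A grid-search sketch, assuming scikit-learn; the SVC model and the parameter grid are placeholder choices:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Keep a final holdout test set that grid search never sees.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Every combination of C and gamma is evaluated with 5-fold CV on the training data.
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X_train, y_train)

print(search.best_params_)
print(search.score(X_test, y_test))  # reported once, on the untouched holdout set
```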
What is the formula for Accuracy in binary classification metrics?
(TP + TN) / (TP + TN + FP + FN)
What does Precision measure?
TP / (TP + FP)
What is Recall also known as?
Sensitivity = TP / (TP + FN)
What is the F1 Score?
2 × (precision × recall) / (precision + recall)
It is the harmonic mean of Precision and Recall.
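A tiny worked example connecting the four formulas above to raw counts; the numbers are made up:

```python
# Hypothetical confusion-matrix counts for a binary classifier.
TP, TN, FP, FN = 40, 45, 5, 10

accuracy  = (TP + TN) / (TP + TN + FP + FN)                # 0.85
precision = TP / (TP + FP)                                 # ~0.889
recall    = TP / (TP + FN)                                 # 0.80
f1        = 2 * precision * recall / (precision + recall)  # ~0.842

print(accuracy, precision, recall, f1)
```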
Why can accuracy be misleading for imbalanced datasets?
A model could achieve high accuracy by always predicting the majority class.
Example: if 90% of the samples belong to the majority class, a model that always predicts that class reaches 90% accuracy while learning nothing useful.
What is the ROC Curve?
Plots True Positive Rate vs False Positive Rate across classification thresholds
FPR = FP / (FP + TN)
TPR = TP / (TP + FN)
- AUC-ROC: 0.5 = random, 1.0 = perfect.
- ROC curves are most informative when the observations are roughly balanced between the classes
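A sketch of computing the ROC curve and AUC with scikit-learn; the dataset and classifier are illustrative assumptions:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=5000).fit(X_train, y_train)
scores = model.predict_proba(X_test)[:, 1]        # probability of the positive class

fpr, tpr, thresholds = roc_curve(y_test, scores)  # points of the ROC curve
print(roc_auc_score(y_test, scores))              # 0.5 = random, 1.0 = perfect
```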
What is the Precision-Recall Curve used for?
Better evaluation of models on imbalanced datasets
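Continuing the ROC sketch above (reusing its y_test and predicted scores), the precision-recall curve and its area come from the analogous scikit-learn helpers:

```python
from sklearn.metrics import average_precision_score, precision_recall_curve

# Assumes y_test and scores from the previous ROC sketch.
precision, recall, thresholds = precision_recall_curve(y_test, scores)
print(average_precision_score(y_test, scores))  # area under the PR curve
```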
What does a Confusion Matrix represent?
Matrix of actual vs predicted labels
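A toy confusion-matrix example with made-up labels, assuming scikit-learn; rows are actual classes, columns are predicted classes:

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(confusion_matrix(y_true, y_pred))
# [[3 1]
#  [1 3]]
```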
What is Macro F1?
Unweighted average of the per-class F1 scores, treating all classes equally
What is Weighted F1?
Weighted average of F1 by class size
What is Micro F1?
Precision and recall are computed from globally aggregated TP, FP, and FN counts
Every sample counts equally, regardless of its class.
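A small example contrasting the three averaging schemes, assuming scikit-learn and made-up multi-class labels:

```python
from sklearn.metrics import f1_score

y_true = [0, 0, 0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 1, 0, 1, 2, 2, 2, 1, 2]

print(f1_score(y_true, y_pred, average="macro"))     # unweighted mean of per-class F1
print(f1_score(y_true, y_pred, average="weighted"))  # weighted by class size (support)
print(f1_score(y_true, y_pred, average="micro"))     # from global TP/FP/FN counts
```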
What does R² represent in regression metrics?
Coefficient of determination: the proportion of variance in the target explained by the model
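The usual definition behind that name, written out (standard formula, not quoted from the card):

```latex
R^2 = 1 - \frac{\sum_{i}(y_i - \hat{y}_i)^2}{\sum_{i}(y_i - \bar{y})^2}
```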