L10 Flashcards

(35 cards)

1
Q

What should a model be evaluated on to reflect its generalization ability?

A
  • A model should be evaluated on independent test data.
  • Performance on unseen data reflects generalization ability.
  • Focus here is on supervised learning.

Performance on unseen data is crucial for understanding how well a model generalizes.

2
Q

What is the basic evaluation setup for model evaluation?

A

Training set and test set

  • Training set to train the model.
  • Test set to evaluate generalization.

The training set is used to train the model, while the test set evaluates its generalization.
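
A minimal sketch of this setup, assuming scikit-learn (the cards do not prescribe a library) and its built-in breast-cancer dataset:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load a built-in binary classification dataset.
X, y = load_breast_cancer(return_X_y=True)

# Hold out 25% of the data as an independent test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = LogisticRegression(max_iter=5000).fit(X_train, y_train)

# The test-set score estimates generalization to unseen data.
print("Test accuracy:", model.score(X_test, y_test))
```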

3
Q

What are the challenges in train-test split?

A
  • Choosing train/test split sizes
  • Test set differing from training data
  • Detecting overfitting
4
Q

Define Bias in the context of model evaluation.

A

Systematic error caused by wrong assumptions or underfitting

Bias leads to consistently incorrect predictions.

5
Q

Define Variance in the context of model evaluation.

A

Sensitivity to data fluctuations, often leading to overfitting

High variance means the model is too complex and captures noise.

6
Q

What is the ideal model behavior regarding bias and variance?

A

Low Bias + Low Variance → Ideal model behavior
Low Bias + High Variance → Overfitting
High Bias + Low Variance → Underfitting
High Bias + High Variance → Worst-case model behavior

Total error = Bias² + Variance + Irreducible error

7
Q

What is underfitting?

A

Low training performance

Indicates that the model is too simple to capture the underlying patterns.

8
Q

What is overfitting?

A

High training performance but poor test performance

The model learns noise in the training data rather than generalizable patterns.

9
Q

What is k-Fold Cross-Validation?

A

Data is split into k folds; each fold is used once as a test set

Pros: Reduces randomness, shows model sensitivity.
Cons: More computation, may cause class imbalance.

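A brief sketch, again assuming scikit-learn's cross_val_score:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# 5 folds: each fold is used exactly once as the test set.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=5000), X, y, cv=cv)

print("Per-fold accuracy:", scores)  # the spread shows sensitivity to the split
print("Mean accuracy:", scores.mean())
```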

10
Q

What is the advantage of Stratified k-Fold Cross-Validation?

A

Class distribution is preserved across folds
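
A possible illustration with scikit-learn's StratifiedKFold; the imbalanced toy dataset and model are assumptions for the example:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Imbalanced toy data: roughly 90% negatives, 10% positives.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

# Stratification keeps the 90/10 class ratio roughly constant in every fold.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=skf)
print(scores)
```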

11
Q

What is Leave-One-Out (LOO) Cross-Validation?

A

Each example is its own test set

Very time-consuming but maximally uses data.

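A short sketch, assuming scikit-learn's LeaveOneOut and the small iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = load_iris(return_X_y=True)

# One fold per example: 150 models are trained for the 150 iris samples.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=LeaveOneOut())
print("Folds:", len(scores), "Mean accuracy:", scores.mean())
```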

12
Q

What is Grid Search in model tuning?

A
  • Trying combinations of hyperparameters
  • Combine with cross-validation (CV) for reliable tuning.
    ⚠️ Never tune on the test set; use a final holdout test set.

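A sketch of grid search combined with CV and a final holdout test set, assuming scikit-learn's GridSearchCV and an SVC model:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Every combination of C and gamma is scored with 5-fold CV on the training data only.
param_grid = {"C": [0.01, 0.1, 1, 10], "gamma": [0.001, 0.01, 0.1]}
search = GridSearchCV(SVC(), param_grid, cv=5).fit(X_train, y_train)

print("Best params:", search.best_params_)
# The held-out test set is touched exactly once, after tuning is finished.
print("Test accuracy:", search.score(X_test, y_test))
```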

13
Q

What is the formula for Accuracy in binary classification metrics?

A

(TP + TN) / (TP + TN + FP + FN)

14
Q

What does Precision measure?

A

TP / (TP + FP)

15
Q

What is Recall also known as?

A

Sensitivity = TP / (TP + FN)

16
Q

What is the F1 Score?

A

2 × (Precision × Recall) / (Precision + Recall)

It is the harmonic mean of Precision and Recall.
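
Cards 13–16 map directly onto scikit-learn's metric functions; a tiny worked example on made-up predictions:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]  # TP=2, FN=1, FP=1, TN=4

print("Accuracy :", accuracy_score(y_true, y_pred))   # (TP + TN) / total = 6/8
print("Precision:", precision_score(y_true, y_pred))  # TP / (TP + FP) = 2/3
print("Recall   :", recall_score(y_true, y_pred))     # TP / (TP + FN) = 2/3
print("F1       :", f1_score(y_true, y_pred))         # harmonic mean = 2/3
```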

17
Q

Why can accuracy be misleading for imbalanced datasets?

A

A model could achieve high accuracy by always predicting the majority class.

Example: with a 90/10 class split, always predicting the majority class already yields 90% accuracy.

18
Q

What is the ROC Curve?

A

Plots True Positive Rate vs False Positive Rate

FPR = FP / (FP + TN)
TPR = TP / (TP + FN)

  • AUC-ROC: 0.5 = random, 1.0 = perfect.
  • ROC curves are most appropriate when the classes are roughly balanced.

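A short sketch, assuming a scikit-learn classifier that exposes predict_proba:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=5000).fit(X_train, y_train)
scores = model.predict_proba(X_test)[:, 1]  # probability of the positive class

fpr, tpr, thresholds = roc_curve(y_test, scores)  # points of the ROC curve
print("AUC-ROC:", roc_auc_score(y_test, scores))
```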

19
Q

What is the Precision-Recall Curve used for?

A

Better evaluation of models on imbalanced datasets
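
A possible sketch on an imbalanced toy problem, assuming scikit-learn's precision_recall_curve and average_precision_score:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, precision_recall_curve
from sklearn.model_selection import train_test_split

# Imbalanced toy problem: roughly 5% positives.
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

scores = LogisticRegression(max_iter=1000).fit(X_train, y_train).predict_proba(X_test)[:, 1]

precision, recall, thresholds = precision_recall_curve(y_test, scores)
print("Average precision (area under the PR curve):", average_precision_score(y_test, scores))
```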

20
Q

What does a Confusion Matrix represent?

A

Matrix of actual vs predicted labels
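
A minimal example with scikit-learn's confusion_matrix (rows = actual, columns = predicted):

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

# scikit-learn convention for binary labels:
# [[TN FP]
#  [FN TP]]
print(confusion_matrix(y_true, y_pred))
```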

21
Q

What is Macro F1?

A

Average F1 of each class, treating all equally

22
Q

What is Weighted F1?

A

Weighted average of F1 by class size

23
Q

What is Micro F1?

A

Aggregate counts for precision/recall computation

All samples are treated equally.
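
A sketch contrasting the three averaging modes from cards 21–23, assuming scikit-learn's f1_score on a made-up multi-class example:

```python
from sklearn.metrics import f1_score

y_true = [0, 0, 0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 1, 0, 1, 2, 2, 2, 2, 0]

print("Macro F1   :", f1_score(y_true, y_pred, average="macro"))     # unweighted mean over classes
print("Weighted F1:", f1_score(y_true, y_pred, average="weighted"))  # weighted by class support
print("Micro F1   :", f1_score(y_true, y_pred, average="micro"))     # global TP/FP/FN counts
```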

24
Q

What does R² represent in regression metrics?

A

Coefficient of determination

25
Q

What does MSE stand for?

A

Mean Squared Error

26
Q

What is the advantage of using MAE or MedAE?

A

More robust to outliers

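A small sketch of these regression metrics (cards 24–26), assuming scikit-learn and made-up numbers:

```python
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             median_absolute_error, r2_score)

y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

print("R^2  :", r2_score(y_true, y_pred))               # 1.0 = perfect fit
print("MSE  :", mean_squared_error(y_true, y_pred))     # squaring punishes large errors
print("MAE  :", mean_absolute_error(y_true, y_pred))    # more robust to outliers
print("MedAE:", median_absolute_error(y_true, y_pred))  # even more robust to outliers
```
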
27
Q

What is Random Undersampling?

A
  • A solution for imbalance by changing the data
  • Drops random samples from the majority class
  • Pros: Fast; reduces size
  • Cons: Loses valuable data

It reduces the dataset size but can lose valuable data.

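A minimal sketch, assuming the imbalanced-learn package (imported as imblearn), which the cards do not name explicitly:

```python
from collections import Counter

from imblearn.under_sampling import RandomUnderSampler
from sklearn.datasets import make_classification

# Imbalanced toy data: roughly 90% majority class.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
print("Before:", Counter(y))

# Randomly drop majority-class samples until both classes are the same size.
X_res, y_res = RandomUnderSampler(random_state=0).fit_resample(X, y)
print("After :", Counter(y_res))
```
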
28
Q

What does SMOTE stand for?

A

Synthetic Minority Oversampling Technique

  • Generates synthetic data points near minority samples.
  • Uses interpolation (not duplication).
  • Works well in many real-world applications.

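A possible sketch, again assuming imbalanced-learn:

```python
from collections import Counter

from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

# Interpolate new minority samples between existing minority neighbours.
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print("Before:", Counter(y), "After:", Counter(y_res))
```
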
29
Q

What is random oversampling?

A
  • A solution for imbalance by changing the data
  • Duplicates minority samples
  • Pros: Balances the data
  • Cons: Overfitting risk; slow

30
Q

What should be considered when choosing evaluation metrics?

A

Class balance and application goals

31
Q

What is Shuffle-Split CV?

A

Randomly splits the data multiple times; both the split size and the number of iterations can be controlled

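A brief sketch, assuming scikit-learn's ShuffleSplit:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ShuffleSplit, cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# 10 independent random splits, each holding out 25% of the data.
cv = ShuffleSplit(n_splits=10, test_size=0.25, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=5000), X, y, cv=cv)
print(scores.mean())
```
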
32
Q

What is Group CV?

A

Ensures that groups (e.g., patients, speakers) don’t cross between train and test sets

Useful in medical and person-dependent tasks.

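A sketch with scikit-learn's GroupKFold; the synthetic data and the "12 patients with 10 samples each" grouping are assumptions for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 5))
y = rng.integers(0, 2, size=120)
groups = np.repeat(np.arange(12), 10)  # e.g., 12 patients, 10 samples each

# No patient appears in both the training and the test fold.
cv = GroupKFold(n_splits=4)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, groups=groups, cv=cv)
print(scores)
```
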
33
Q

What is threshold tuning?

A
  • You can adjust the decision threshold to trade off Precision against Recall.
  • Use the PR Curve to visualize this trade-off.

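A possible sketch, assuming a scikit-learn classifier with predict_proba and a manually varied threshold:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

proba = LogisticRegression(max_iter=1000).fit(X_train, y_train).predict_proba(X_test)[:, 1]

# Lowering the threshold below the default 0.5 raises recall at the cost of precision.
for threshold in (0.5, 0.3, 0.1):
    y_pred = (proba >= threshold).astype(int)
    print(threshold, precision_score(y_test, y_pred), recall_score(y_test, y_pred))
```
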
34
Q

What are imbalance sources?

A
  • Asymmetric data: Classes are naturally imbalanced
  • Asymmetric cost: Errors have unequal importance

35
Q

What is Edited Nearest Neighbors (ENN)?

A
  • A solution for imbalance by changing the data
  • Removes noisy or borderline examples.
  • Uses KNN to filter the training data.

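A minimal sketch, assuming imbalanced-learn's EditedNearestNeighbours:

```python
from collections import Counter

from imblearn.under_sampling import EditedNearestNeighbours
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

# Remove majority samples whose k nearest neighbours disagree with their label.
X_res, y_res = EditedNearestNeighbours(n_neighbors=3).fit_resample(X, y)
print("Before:", Counter(y), "After:", Counter(y_res))
```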