Model Evaluation, Hyperparameter Tuning, Classification & Regression Metrics Flashcards

1
Q

What is the solution to model evaluation problems?

A

Split the data into training, validation, and test sets.

2
Q

What is a Training Set?

A

Used to train the model.

3
Q

What is a Validation Set?

A

Used during training to tune hyperparameters.

4
Q

What is a Test Set?

A

Used after training to check final performance.

5
Q

What is the Holdout method in model validation?

A

One-time split: Train (e.g. 60%), Val (20%), Test (20%).

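A minimal sketch of the 60/20/20 holdout split using scikit-learn's train_test_split (the dataset and exact ratios are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)

# Hold out 20% of the data as the test set...
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0)

# ...then carve another 20% of the original data (25% of the remainder)
# out as the validation set.
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # 600 200 200
```
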
6
Q

When is the Holdout method best used?

A

For large datasets.

7
Q

What is K-Fold Cross Validation (KCV)?

A

Split the data into k folds; each fold takes a turn as the test set while the rest train the model, and the k scores are averaged.

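A minimal sketch of 5-fold cross validation with scikit-learn's cross_val_score (the dataset and model are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, random_state=0)

# 5 folds: each fold is held out once while the other 4 train the model;
# the 5 scores are then averaged for a more stable estimate.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores.mean(), scores.std())
```
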
8
Q

When is K-Fold Cross Validation best used?

A

Best for small datasets; gives a more reliable performance estimate than a single split.

9
Q

What is Overfitting?

A

Too good on training, bad on new data.

10
Q

What is Underfitting?

A

Bad on both training and new data.

11
Q

What are the characteristics of Overfitting?

A

High variance; the model memorizes the training data instead of learning general patterns.

12
Q

What are the characteristics of Underfitting?

A

High bias; the model is too simple and effectively guesses.

13
Q

What is Early Stopping?

A

Stop training once validation loss stops improving (starts going up), to avoid overfitting.

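One way to see early stopping in practice is scikit-learn's MLPClassifier, which monitors a held-out validation score while training; a minimal sketch with illustrative parameter values:

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Reserve 10% of the training data as a validation set and stop when the
# validation score has not improved for 10 consecutive epochs.
clf = MLPClassifier(early_stopping=True, validation_fraction=0.1,
                    n_iter_no_change=10, max_iter=500, random_state=0)
clf.fit(X, y)
print(clf.n_iter_)  # epochs actually run before stopping
```
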
14
Q

What is L2 Regularization?

A

Penalizes large weights to keep the model simple.

15
Q

What is the L2 Regularization formula?

A

Penalty added to the loss: λ * Σ(weights²) → encourages smaller weights.

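A small illustration of the effect using scikit-learn's Ridge regression, where alpha plays the role of λ (the data and alpha values are made up):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=100, n_features=10, noise=10, random_state=0)

# Ridge adds alpha * Σ(weights²) to the loss; larger alpha shrinks the weights.
for alpha in (0.01, 10.0, 1000.0):
    w = Ridge(alpha=alpha).fit(X, y).coef_
    print(alpha, np.sum(w ** 2))  # sum of squared weights drops as alpha grows
```
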
16
Q

What are Hyperparameters?

A

Settings you pick before training (not learned from data).

17
Q

Give examples of Hyperparameters.

A
  • Learning rate
  • Batch size
  • Number of layers
  • Activation functions
18
Q

What is Grid Search?

A

Try every combo of settings.

19
Q

When is Grid Search effective?

A

Good for small search spaces.

20
Q

What is a disadvantage of Grid Search?

A

Becomes very slow when there are many options – the number of combinations grows exponentially with the number of hyperparameters.
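
A minimal GridSearchCV sketch; the model and parameter grid are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)

# Try every combination of the listed settings (3 x 2 = 6 candidates),
# each evaluated with 5-fold cross validation.
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```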

21
Q

What is Random Search?

A

Pick random combinations of settings instead of trying them all.

22
Q

When is Random Search better?

A

Better for large/continuous spaces.
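
A minimal RandomizedSearchCV sketch that samples from continuous ranges; the model and distributions are illustrative:

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)

# Sample 10 random combinations from continuous distributions instead of
# exhaustively trying a fixed grid.
param_dist = {"C": loguniform(1e-2, 1e2), "gamma": loguniform(1e-4, 1e0)}
search = RandomizedSearchCV(SVC(), param_dist, n_iter=10, cv=5, random_state=0)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```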

23
Q

What are Classification Models used for?

A

To assign a class (label) to data.

24
Q

What is an example of a Classification Model?

A

Is this email spam? Is the tumor benign or malignant?

25
Q

What is Accuracy in model evaluation?

A

% of correct predictions.

26
Q

What is a limitation of Accuracy?

A

Doesn't work well when classes are imbalanced.

27
Q

What is a Confusion Matrix?

A

A table used to describe the performance of a classification model.

28
Q

What does TP stand for in a Confusion Matrix?

A

True Positives.

29
Q

What does FN stand for in a Confusion Matrix?

A

False Negatives.

30
Q

What does FP stand for in a Confusion Matrix?

A

False Positives.

31
Q

What does TN stand for in a Confusion Matrix?

A

True Negatives.

32
Q

What is Precision?

A

TP / (TP + FP) → How many predicted positives were correct?

33
Q

What is Recall?

A

TP / (TP + FN) → How many actual positives were found?

34
Q

What is the F1 Score?

A

Harmonic mean of Precision & Recall → 2 * (P * R) / (P + R).

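A tiny worked example of these classification metrics with scikit-learn (the label vectors are made up for illustration):

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score, f1_score)

y_true = [1, 0, 1, 1, 0, 1, 0, 0]  # actual labels (made up)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]  # model predictions (made up)

print(confusion_matrix(y_true, y_pred))  # [[TN FP], [FN TP]] = [[3 1], [1 3]]
print(accuracy_score(y_true, y_pred))    # 6/8 correct = 0.75
print(precision_score(y_true, y_pred))   # TP / (TP + FP) = 3/4
print(recall_score(y_true, y_pred))      # TP / (TP + FN) = 3/4
print(f1_score(y_true, y_pred))          # 2PR / (P + R) = 0.75
```
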
35
Q

What is AUC?

A

Area Under the ROC Curve → measures the model’s ability to distinguish between classes.

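A minimal sketch with scikit-learn's roc_auc_score (the labels and predicted probabilities are made up):

```python
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1, 0, 1]                # actual labels (made up)
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]  # predicted probabilities (made up)

# AUC is the chance that a random positive example gets a higher score than
# a random negative one: 1.0 is perfect separation, 0.5 is random guessing.
print(roc_auc_score(y_true, y_score))
```
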
36
Q

When should you use F1 or AUC?

A

When classes are imbalanced, or when missing a positive is worse than a few false alarms.

37
Q

What are Regression Models used for?

A

Predicting a number (not a category).

38
Q

Give examples of Regression Models.

A
  • House prices
  • Stock market trends
39
Q

What is MAE?

A

Mean Absolute Error – average of absolute errors; less sensitive to outliers.

40
Q

What is MSE?

A

Mean Squared Error – squares the errors; punishes big mistakes more.

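A tiny worked example of both regression metrics with scikit-learn (the values are made up):

```python
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = [3.0, 5.0, 2.5, 7.0]  # actual target values (made up)
y_pred = [2.5, 5.0, 4.0, 8.0]  # model predictions (made up)

print(mean_absolute_error(y_true, y_pred))  # average |error| = 0.75
print(mean_squared_error(y_true, y_pred))   # average error² = 0.875; big misses weigh more
```
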
41
Q

What are Clustering Models used for?

A

When you don’t have labels – model tries to find natural groupings.

42
Q

Give an example of a Clustering Model use case.

A

Segmenting customers into behavior types.

43
Q

What metric is used for evaluating clustering?

A

Silhouette Coefficient.

44
Q

What does a higher Silhouette Coefficient indicate?

A

Better clustering.

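A minimal sketch scoring a k-means clustering with scikit-learn's silhouette_score (the data and number of clusters are illustrative):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# Fit a clustering model and score it: values near 1 mean tight, well-separated
# clusters; values near 0 mean overlapping clusters.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(silhouette_score(X, labels))
```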