L9 Flashcards

(35 cards)

1
Q

What is the formula for estimating generalization error?

A

Generalization error = bias² + variance + noise (irreducible error)

Bias and variance typically trade off in relation to model complexity.
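
For squared-error loss, the decomposition is usually written with the bias squared (a standard result, given here only as a reference):

\mathbb{E}[(y - \hat{f}(x))^2] = (\mathbb{E}[\hat{f}(x)] - f(x))^2 + \mathbb{E}[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2] + \sigma^2
= \text{bias}^2 + \text{variance} + \text{noise}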

2
Q

Define bias in the context of model prediction.

A

Bias: the difference between the model's average prediction and the true value.

3
Q

What does variance measure in a model?

A

Variance: a measure of how sensitive the model's predictions are to variations in the training data.

4
Q

What is overfitting in relation to model complexity?

A

Choosing an overly complex model can lead to overfitting: the model learns to capture noise in the training data rather than the underlying patterns (high variance).

5
Q

What are ensemble methods?

A

Combine multiple base learners (weak models) to improve overall performance.

Leverage diversity and aggregation to reduce bias, variance, or both.

6
Q

List three advantages of ensemble methods.

A
  • Higher accuracy
  • More stable than individual models
  • Better generalization on unseen data
7
Q

What is hard voting in ensemble methods?

A

Each model makes a class prediction. Final class is the majority vote.

8
Q

What is soft voting in ensemble methods?

A

Each model outputs class probabilities. Average the probabilities; pick the class with the highest mean probability.
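
A minimal scikit-learn sketch of both schemes (hard voting from the previous card, soft voting from this one); the dataset and base models are arbitrary placeholders:

from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
base = [("lr", LogisticRegression(max_iter=1000)),
        ("knn", KNeighborsClassifier()),
        ("tree", DecisionTreeClassifier(random_state=0))]

hard = VotingClassifier(estimators=base, voting="hard").fit(X, y)  # majority vote on class labels
soft = VotingClassifier(estimators=base, voting="soft").fit(X, y)  # average class probabilities
print(hard.predict(X[:5]), soft.predict_proba(X[:5]).round(2))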

9
Q

What is bagging (Bootstrap Aggregating)?

A

Train models on different random samples drawn with replacement, then aggregate their outputs.

Each new bootstrap sample acts as another independent dataset → Fit a weak learner on each sample → Aggregate them (average the outputs)
Final prediction:
  • Classification: majority vote (hard voting) / highest average probability (soft voting)
  • Regression: average of outputs
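
A minimal bagging sketch with scikit-learn (the estimator parameter name assumes scikit-learn ≥ 1.2; older versions call it base_estimator):

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
bag = BaggingClassifier(
    estimator=DecisionTreeClassifier(),  # weak learner fit on each bootstrap sample
    n_estimators=50,                     # number of bootstrap samples / models
    bootstrap=True,                      # sample with replacement
    random_state=0,
).fit(X, y)
print(bag.predict(X[:5]))                # classification: aggregated by majority vote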

10
Q

What is the final prediction method for classification in bagging?

A

Majority vote (hard voting) or highest average probability (soft voting).

11
Q

What does Random Forest combine?

A

Bagging + Decision Trees + Random Features

Each tree is trained on a bootstrap sample and, at each split, considers only a random subset of features.
✅ Reduces variance
❌ Still may overfit if trees are too deep.
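
A minimal Random Forest sketch (hyperparameter values are illustrative, not recommendations):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, random_state=0)
rf = RandomForestClassifier(
    n_estimators=200,     # number of bootstrapped trees
    max_features="sqrt",  # random subset of features considered at each split
    max_depth=None,       # unlimited depth; limit it if the trees overfit
    random_state=0,
).fit(X, y)
print(rf.feature_importances_[:5])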

12
Q

What is out-of-bag (OOB) error?

A

Each bootstrap sample leaves out roughly 1/3 of the observations; each tree is evaluated on its left-out (out-of-bag) data, which gives an internal error estimate without cross-validation.
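
In scikit-learn the OOB estimate can be requested directly; a minimal sketch:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, random_state=0)
rf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0).fit(X, y)
print(rf.oob_score_)  # accuracy estimated on the out-of-bag samples, no separate validation set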

13
Q

Define bootstrapping in the context of data sampling.

A

Generating samples of size B from an initial dataset of size N by randomly drawing with replacement B observations.
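
A minimal NumPy illustration of drawing one bootstrap sample (N = B = 10 here, chosen arbitrarily):

import numpy as np

rng = np.random.default_rng(0)
data = np.arange(10)                              # initial dataset of size N = 10
sample = rng.choice(data, size=10, replace=True)  # draw B = 10 observations with replacement
print(sample)                                     # some observations repeat, others are left out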

14
Q

How does boosting differ from bagging?

A

Boosting trains models sequentially, while bagging trains models in parallel.

Each model learns to fix the mistakes of the previous ones: the weak learner is run repeatedly on reweighted training data (AdaBoost) or fit to the residuals of the current ensemble (Gradient Boosting), and the learned classifiers are then combined.
✅ Reduces bias
❌ Can be sensitive to noise and overfitting if not tuned.

15
Q

What is AdaBoost (Adaptive Boosting)?

A

Updates the weights attached to the results, focusing on misclassified observations.

  1. Start with uniform weights
  2. For each iteration t:
    - Fit the best possible weak model with the current observation weights
    - Compute the update coefficient, which indicates how much this weak learner should count in the ensemble model
    - Update the strong learner by adding the new weak learner multiplied by its update coefficient
    - Compute new observation weights expressing which observations to focus on at the next iteration (weights of observations wrongly predicted by the aggregated model increase, and weights of correctly predicted observations decrease); see the sketch after this list
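
A minimal AdaBoost sketch with scikit-learn (the estimator parameter name assumes scikit-learn ≥ 1.2; decision stumps are the usual weak learner):

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
ada = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # decision stump as the weak learner
    n_estimators=100,
    learning_rate=0.5,
    random_state=0,
).fit(X, y)
print(ada.estimator_weights_[:5])  # the per-learner update coefficients described above
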
16
Q

What is Gradient Boosting?

A

Minimizes a loss function by sequentially adding new models, each one fit to the negative gradient (pseudo-residuals) of the loss.

✅ More flexible and powerful than AdaBoost
❌ Slower and prone to overfitting
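
A toy illustration of the idea for squared loss, where the negative gradient is simply the residual (synthetic data, arbitrary settings):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

pred = np.zeros_like(y)
learning_rate = 0.1
for _ in range(50):
    residuals = y - pred                     # negative gradient of squared loss
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    pred += learning_rate * tree.predict(X)  # add the new weak model, shrunk by the learning rate
print(np.mean((y - pred) ** 2))              # training MSE shrinks as models are added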

17
Q

List three hyperparameters of Gradient Boosting.

A
  • n_estimators: Number of trees
  • learning_rate: Shrinks impact of each tree
  • max_depth: Tree complexity
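
A minimal sketch showing where these hyperparameters go in scikit-learn (values are illustrative only):

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=500, random_state=0)
gb = GradientBoostingClassifier(
    n_estimators=200,    # number of trees
    learning_rate=0.05,  # shrinks the impact of each tree
    max_depth=3,         # tree complexity
    random_state=0,
).fit(X, y)
print(gb.score(X, y))
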
18
Q

Compare error handling in AdaBoost and Gradient Boosting.

A
  • AdaBoost: Focuses on misclassified points
  • Gradient Boosting: Fits residuals (gradients)
19
Q

What is stacking in ensemble methods?

A

Trains many models in parallel and combines them by training a meta-model to output a prediction.

20
Q

What are Level-0 and Level-1 models in stacking?

A
  • Level-0 models: diverse models (e.g. logistic, KNN, SVM)
  • Level-1 model: learns how to combine their predictions

Process:
  1. Split training data
  2. Train base models on one part
  3. Predict on held-out part
  4. Use those predictions to train the meta-learner
✅ Captures different perspectives from base learners
❌ Requires careful design and cross-validation
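
A minimal stacking sketch with scikit-learn's StackingClassifier (base models and meta-learner are arbitrary examples; cv handles the held-out predictions internally):

from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)
level0 = [("knn", KNeighborsClassifier()), ("svm", SVC(probability=True))]
stack = StackingClassifier(
    estimators=level0,                     # Level-0 base models
    final_estimator=LogisticRegression(),  # Level-1 meta-learner
    cv=5,                                  # out-of-fold predictions train the meta-learner
).fit(X, y)
print(stack.predict(X[:5]))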

21
Q

List the pros and cons of voting methods in ensemble.

A
  • Pros: Simple, effective
  • Cons: No model interaction
  • Focus: Aggregation
22
Q

When should you use Gradient Boosting?

A

When you want accuracy with flexibility.

23
Q

What is the Gini index used for?

A

An impurity measure for choosing tree splits; less computationally expensive than entropy (no logarithm) and typically used with binary splits (as in CART).

24
Q

What is entropy used for?

A

Suitable for multi-class classification and works better in high class-imbalance cases.
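
A minimal sketch of the two impurity measures from the last two cards (standard formulas, computed here for a vector of class proportions):

import numpy as np

def gini(p):
    # Gini impurity: 1 - sum(p_i^2)
    p = np.asarray(p, dtype=float)
    return 1.0 - np.sum(p ** 2)

def entropy(p):
    # Shannon entropy in bits: -sum(p_i * log2(p_i))
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

print(gini([0.5, 0.5]), entropy([0.5, 0.5]))  # maximum impurity for two balanced classes
print(gini([0.9, 0.1]), entropy([0.9, 0.1]))  # both drop for a purer node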

25
Q

Compare learner weights in AdaBoost and Gradient Boosting.

A
  • AdaBoost: based on classification error
  • Gradient Boosting: based on gradient step size
26
Q

Compare base models in AdaBoost and Gradient Boosting.

A
  • AdaBoost: stumps (shallow trees)
  • Gradient Boosting: trees with greater depth
27
Q

Compare flexibility in AdaBoost and Gradient Boosting.

A
  • AdaBoost: less flexible
  • Gradient Boosting: very flexible
28
Q

When should you use Voting or Bagging?

A

When you want a quick and simple performance boost.

29
Q

When should you use AdaBoost?

A

When you want to focus on hard examples.

30
Q

When should you use Stacking?

A

When you want to combine diverse models.

31
Q

When should you use Random Forest / Boosting?

A

When you want feature importance.

32
Q

When should you use Trees or AdaBoost (with stumps)?

A

When you want interpretability.
33
Q

List the pros and cons of bagging methods in ensemble.

A
  • Focus: Variance
  • Pros: Reduces overfitting
  • Cons: Needs many models
34
Q

List the pros and cons of boosting methods in ensemble.

A
  • Focus: Bias
  • Pros: High accuracy, powerful
  • Cons: Overfitting, slow training
35
Q

List the pros and cons of stacking methods in ensemble.

A
  • Focus: Diversity
  • Pros: Very flexible, custom combos
  • Cons: Complex to set up