L9 Flashcards
What is the formula for estimating generalization error?
Generalization error = bias² + variance + noise (irreducible error)
Bias and variance typically trade off in relation to model complexity.
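A standard way to write the decomposition for squared-error loss (a sketch, assuming y = f(x) + ε with noise variance σ², and writing f̄ for the model's prediction averaged over training sets):

```latex
\mathbb{E}\left[\big(y - \hat{f}(x)\big)^2\right]
  = \underbrace{\big(\bar{f}(x) - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\left[\big(\hat{f}(x) - \bar{f}(x)\big)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{noise}}
```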
Define bias in the context of model prediction.
Bias: the difference between the model's average (expected) prediction and the true value.
What does variance measure in a model?
Variance: a measure of how sensitive the model's predictions are to variations in the training data.
What is overfitting in relation to model complexity?
Choosing an overly complex model can lead to overfitting: the model learns to capture noise in the training data rather than the underlying patterns (high variance).
What are ensemble methods?
Combine multiple base learners (weak models) to improve overall performance.
Leverage model diversity and aggregation to reduce bias, variance, or both.
List three advantages of ensemble methods.
- Higher accuracy
- More stable than individual models
- Better generalization on unseen data
What is hard voting in ensemble methods?
Each model makes a class prediction. Final class is the majority vote.
What is soft voting in ensemble methods?
Each model outputs class probabilities. Average the probabilities; pick the class with the highest mean probability.
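A minimal sketch of both voting schemes using scikit-learn's VotingClassifier (the base models and synthetic data are illustrative choices, not from the lecture):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
base = [("lr", LogisticRegression()),
        ("knn", KNeighborsClassifier()),
        ("tree", DecisionTreeClassifier(max_depth=3))]

# Hard voting: each model predicts a class, the majority wins.
hard = VotingClassifier(estimators=base, voting="hard").fit(X, y)

# Soft voting: average the predicted probabilities, pick the highest mean.
soft = VotingClassifier(estimators=base, voting="soft").fit(X, y)
```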
What is bagging (Bootstrap Aggregating)?
Train models on different bootstrap samples (random samples drawn with replacement) and average their outputs.
Each bootstrap sample acts as an (approximately) independent dataset → fit a weak learner on each sample → aggregate them (average the outputs)
Final prediction:
Classification: majority vote (hard voting) / highest average probability (soft voting)
Regression: average of outputs
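A minimal bagging sketch with scikit-learn's BaggingClassifier (illustrative settings; the base-learner parameter is named estimator in recent scikit-learn versions, base_estimator in older ones):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Each of the 50 trees is trained on its own bootstrap sample (drawn with replacement).
bag = BaggingClassifier(estimator=DecisionTreeClassifier(),
                        n_estimators=50,
                        bootstrap=True,
                        random_state=0).fit(X, y)

print(bag.predict(X[:5]))        # class predictions (majority vote)
print(bag.predict_proba(X[:5]))  # averaged class probabilities (soft voting view)
```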
What is the final prediction method for classification in bagging?
Majority vote (hard voting) or highest average probability (soft voting).
What does Random Forest combine?
Bagging + Decision Trees + Random Features
Each tree: Trained on a bootstrap sample + At each split, randomly selects a subset of features.
✅ Reduces variance
❌ Still may overfit if trees are too deep.
What is out-of-bag (OOB) error?
Each bootstrap sample leaves out roughly 1/3 of the observations; each tree is validated on its own out-of-bag observations, giving an internal error estimate without cross-validation.
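A minimal random-forest sketch showing the OOB estimate (illustrative data; oob_score=True asks scikit-learn to evaluate each tree on the observations left out of its bootstrap sample):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, random_state=0)

rf = RandomForestClassifier(n_estimators=200,
                            max_features="sqrt",  # random feature subset at each split
                            oob_score=True,       # internal validation, no CV needed
                            random_state=0).fit(X, y)

print(rf.oob_score_)  # out-of-bag accuracy estimate
```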
Define bootstrapping in the context of data sampling.
Generating samples of size B from an initial dataset of size N by randomly drawing with replacement B observations.
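A minimal sketch of drawing one bootstrap sample with NumPy (variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100                            # size of the original dataset
data = rng.normal(size=N)

B = N                              # bootstrap sample size (often B = N)
idx = rng.integers(0, N, size=B)   # draw B indices with replacement
bootstrap_sample = data[idx]
```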
How does boosting differ from bagging?
Boosting trains models sequentially, while bagging trains models in parallel.
Each model learns to fix the mistakes of the previous one:
run the base learner multiple times on (reweighted) training data, then let the learned classifiers vote (as in AdaBoost; Gradient Boosting instead fits residuals).
✅ Reduces bias
❌ Can be sensitive to noise and overfitting if not tuned.
What is AdaBoost (Adaptive Boosting)?
Updates the weights attached to the observations, focusing on misclassified ones.
- Start with uniform observation weights
- For each iteration t:
- Fit the best possible weak model with the current observation weights
- Compute the update coefficient, which indicates how much this weak learner counts in the ensemble model
- Update the strong learner by adding the new weak learner multiplied by its update coefficient
- Update the observation weights to indicate which observations to focus on at the next iteration (weights of observations wrongly predicted by the aggregated model increase; weights of correctly predicted observations decrease)
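A minimal AdaBoost sketch with scikit-learn (decision stumps as weak learners are an illustrative choice; the base-learner parameter is named estimator in recent scikit-learn versions):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Depth-1 trees (stumps) as weak learners; at each iteration the observation
# weights are updated to focus on the points the ensemble still misclassifies.
ada = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=1),
                         n_estimators=100,
                         learning_rate=0.5,
                         random_state=0).fit(X, y)
```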
What is Gradient Boosting?
Minimizes a loss function by sequentially adding new models, each fit to the negative gradient (pseudo-residuals) of the loss.
✅ More flexible and powerful than AdaBoost
❌ Slower and prone to overfitting
List three hyperparameters of Gradient Boosting.
- n_estimators: Number of trees
- learning_rate: Shrinks impact of each tree
- max_depth: Tree complexity
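A minimal sketch wiring these three hyperparameters into scikit-learn's GradientBoostingClassifier (values are illustrative, not tuned):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=500, random_state=0)

gb = GradientBoostingClassifier(n_estimators=200,    # number of trees
                                learning_rate=0.05,  # shrinks each tree's contribution
                                max_depth=3,         # tree complexity
                                random_state=0).fit(X, y)
```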
Compare error handling in AdaBoost and Gradient Boosting.
- AdaBoost: Focuses on misclassified points
- Gradient Boosting: Fits residuals (gradients)
What is stacking in ensemble methods?
Trains many models in parallel and combines them by training a meta-model to output a prediction.
What are Level-0 and Level-1 models in stacking?
- Level-0 models: diverse models (e.g. logistic, KNN, SVM)
- Level-1 model: learns how to combine their predictions
Process:
Split training data
Train base models on one part
Predict on held-out part
Use those predictions to train meta-learner
✅ Captures different perspectives from base learners
❌ Requires careful design and cross-validation
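A minimal stacking sketch with scikit-learn's StackingClassifier (base models mirror the Level-0 examples above; cv=5 handles the split/predict-on-held-out step internally):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)

level0 = [("lr", LogisticRegression()),
          ("knn", KNeighborsClassifier()),
          ("svm", SVC(probability=True))]

# Level-1 meta-model learns how to combine the base models' predictions;
# it is trained on out-of-fold predictions via internal cross-validation.
stack = StackingClassifier(estimators=level0,
                           final_estimator=LogisticRegression(),
                           cv=5).fit(X, y)
```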
List the pros and cons of voting methods in ensemble.
- Pros: Simple, effective
- Cons: No model interaction
- focus: Aggregation
When should you use Gradient Boosting?
When you want accuracy with flexibility.
What is the Gini index used for?
Measuring split impurity in decision trees; it is less computationally expensive than entropy because no logarithm is needed.
What is entropy used for?
Suitable for multi-class classification and works better in high class-imbalance cases.
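A minimal sketch comparing the two impurity measures on one (illustrative) class distribution, using Gini = 1 − Σ pᵢ² and entropy = −Σ pᵢ log₂ pᵢ:

```python
import numpy as np

p = np.array([0.7, 0.2, 0.1])       # class proportions at a node (illustrative)

gini = 1.0 - np.sum(p ** 2)         # no logarithm: cheaper to compute
entropy = -np.sum(p * np.log2(p))   # in bits; handles many classes naturally

print(gini, entropy)
```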