L9 Flashcards
What is the formula for estimating generalization error?
Generalization error = bias² + variance + noise (irreducible error)
Bias and variance typically trade off in relation to model complexity.
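A standard way to write the decomposition for squared-error loss (a sketch, assuming y = f(x) + ε with noise variance σ², and writing f̄ for the model's prediction averaged over training sets):

```latex
\mathbb{E}\left[\big(y - \hat{f}(x)\big)^2\right]
  = \underbrace{\big(\bar{f}(x) - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\left[\big(\hat{f}(x) - \bar{f}(x)\big)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{noise}}
```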
Define bias in the context of model prediction.
Bias: the difference between the model's average (expected) prediction and the true value.
What does variance measure in a model?
Variance: a measure of how sensitive the model's predictions are to variations in the training data.
What is overfitting in relation to model complexity?
Choosing an overly complex model can lead to overfitting: the model learns to capture noise in the training data rather than the underlying patterns (high variance).
What are ensemble methods?
Combine multiple base learners (weak models) to improve overall performance.
Leverage model diversity and aggregation to reduce bias, variance, or both.
List three advantages of ensemble methods.
- Higher accuracy
- More stable than individual models
- Better generalization on unseen data
What is hard voting in ensemble methods?
Each model makes a class prediction. Final class is the majority vote.
What is soft voting in ensemble methods?
Each model outputs class probabilities. Average the probabilities; pick the class with the highest mean probability.
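A minimal sketch of both voting schemes using scikit-learn's VotingClassifier (the base models and synthetic data are illustrative choices, not from the lecture):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
base = [("lr", LogisticRegression()),
        ("knn", KNeighborsClassifier()),
        ("tree", DecisionTreeClassifier(max_depth=3))]

# Hard voting: each model predicts a class, the majority wins.
hard = VotingClassifier(estimators=base, voting="hard").fit(X, y)

# Soft voting: average the predicted probabilities, pick the highest mean.
soft = VotingClassifier(estimators=base, voting="soft").fit(X, y)
```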
What is bagging (Bootstrap Aggregating)?
Train models on different bootstrap samples (random samples drawn with replacement) and average their outputs.
Each bootstrap sample acts as an (approximately) independent dataset → fit a weak learner on each sample → aggregate them (average the outputs)
Final prediction:
Classification: majority vote (hard voting) / highest average probability (soft voting)
Regression: average of outputs
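A minimal bagging sketch with scikit-learn's BaggingClassifier (illustrative settings; the base-learner parameter is named estimator in recent scikit-learn versions, base_estimator in older ones):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Each of the 50 trees is trained on its own bootstrap sample (drawn with replacement).
bag = BaggingClassifier(estimator=DecisionTreeClassifier(),
                        n_estimators=50,
                        bootstrap=True,
                        random_state=0).fit(X, y)

print(bag.predict(X[:5]))        # class predictions (majority vote)
print(bag.predict_proba(X[:5]))  # averaged class probabilities (soft voting view)
```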
What is the final prediction method for classification in bagging?
Majority vote (hard voting) or highest average probability (soft voting).
What does Random Forest combine?
Bagging + Decision Trees + Random Features
Each tree: Trained on a bootstrap sample + At each split, randomly selects a subset of features.
✅ Reduces variance
❌ Still may overfit if trees are too deep.
What is out-of-bag (OOB) error?
Each bootstrap sample leaves out roughly 1/3 of the observations; each tree is validated on its own out-of-bag observations, giving an internal error estimate without cross-validation.
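A minimal random-forest sketch showing the OOB estimate (illustrative data; oob_score=True asks scikit-learn to evaluate each tree on the observations left out of its bootstrap sample):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, random_state=0)

rf = RandomForestClassifier(n_estimators=200,
                            max_features="sqrt",  # random feature subset at each split
                            oob_score=True,       # internal validation, no CV needed
                            random_state=0).fit(X, y)

print(rf.oob_score_)  # out-of-bag accuracy estimate
```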
Define bootstrapping in the context of data sampling.
Generating samples of size B from an initial dataset of size N by randomly drawing with replacement B observations.
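A minimal sketch of drawing one bootstrap sample with NumPy (variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100                            # size of the original dataset
data = rng.normal(size=N)

B = N                              # bootstrap sample size (often B = N)
idx = rng.integers(0, N, size=B)   # draw B indices with replacement
bootstrap_sample = data[idx]
```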
How does boosting differ from bagging?
Boosting trains models sequentially, while bagging trains models in parallel.
Each model learns to fix the mistakes of the previous one:
run the base learner multiple times on (reweighted) training data, then let the learned classifiers vote (as in AdaBoost; Gradient Boosting instead fits residuals).
✅ Reduces bias
❌ Can be sensitive to noise and overfitting if not tuned.
What is AdaBoost (Adaptive Boosting)?
Updates the weights attached to the observations, focusing on misclassified ones.
- Start with uniform observation weights
- For each iteration t:
- Fit the best possible weak model with the current observation weights
- Compute the update coefficient, which indicates how much this weak learner counts in the ensemble model
- Update the strong learner by adding the new weak learner multiplied by its update coefficient
- Update the observation weights to indicate which observations to focus on at the next iteration (weights of observations wrongly predicted by the aggregated model increase; weights of correctly predicted observations decrease)
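A minimal AdaBoost sketch with scikit-learn (decision stumps as weak learners are an illustrative choice; the base-learner parameter is named estimator in recent scikit-learn versions):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Depth-1 trees (stumps) as weak learners; at each iteration the observation
# weights are updated to focus on the points the ensemble still misclassifies.
ada = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=1),
                         n_estimators=100,
                         learning_rate=0.5,
                         random_state=0).fit(X, y)
```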
What is Gradient Boosting?
Minimizes a loss function by sequentially adding new models, each fit to the negative gradient (pseudo-residuals) of the loss.
✅ More flexible and powerful than AdaBoost
❌ Slower and prone to overfitting
List three hyperparameters of Gradient Boosting.
- n_estimators: Number of trees
- learning_rate: Shrinks impact of each tree
- max_depth: Tree complexity
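A minimal sketch wiring these three hyperparameters into scikit-learn's GradientBoostingClassifier (values are illustrative, not tuned):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=500, random_state=0)

gb = GradientBoostingClassifier(n_estimators=200,    # number of trees
                                learning_rate=0.05,  # shrinks each tree's contribution
                                max_depth=3,         # tree complexity
                                random_state=0).fit(X, y)
```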
Compare error handling in AdaBoost and Gradient Boosting.
- AdaBoost: Focuses on misclassified points
- Gradient Boosting: Fits residuals (gradients)
What is stacking in ensemble methods?
Trains many models in parallel and combines them by training a meta-model to output a prediction.
What are Level-0 and Level-1 models in stacking?
- Level-0 models: diverse models (e.g. logistic, KNN, SVM)
- Level-1 model: learns how to combine their predictions
Process:
Split training data
Train base models on one part
Predict on held-out part
Use those predictions to train meta-learner
✅ Captures different perspectives from base learners
❌ Requires careful design and cross-validation
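A minimal stacking sketch with scikit-learn's StackingClassifier (base models mirror the Level-0 examples above; cv=5 handles the split/predict-on-held-out step internally):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)

level0 = [("lr", LogisticRegression()),
          ("knn", KNeighborsClassifier()),
          ("svm", SVC(probability=True))]

# Level-1 meta-model learns how to combine the base models' predictions;
# it is trained on out-of-fold predictions via internal cross-validation.
stack = StackingClassifier(estimators=level0,
                           final_estimator=LogisticRegression(),
                           cv=5).fit(X, y)
```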
List the pros and cons of voting methods in ensemble.
- Pros: Simple, effective
- Cons: No model interaction
- focus: Aggregation
When should you use Gradient Boosting?
When you want accuracy with flexibility.
What is the Gini index used for?
Measuring split impurity in decision trees; it is less computationally expensive than entropy because no logarithm is needed.
What is entropy used for?
Suitable for multi-class classification and works better in high class-imbalance cases.
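A minimal sketch comparing the two impurity measures on one (illustrative) class distribution, using Gini = 1 − Σ pᵢ² and entropy = −Σ pᵢ log₂ pᵢ:

```python
import numpy as np

p = np.array([0.7, 0.2, 0.1])       # class proportions at a node (illustrative)

gini = 1.0 - np.sum(p ** 2)         # no logarithm: cheaper to compute
entropy = -np.sum(p * np.log2(p))   # in bits; handles many classes naturally

print(gini, entropy)
```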