Chapter 7 Flashcards
(18 cards)
Rationale behind Random Forests
“Wisdom of the crowd”
* Aggregate the predictions of a group of predictors (an ensemble) to get a better prediction
* Ensembles can be applied to different classifiers and regressors
* Random Forest uses many Decision Trees
– One of the most powerful Machine Learning algorithms
Voting Classifiers
Train different classifiers (algorithms) on the same data
Then predict by majority vote
An ensemble of weak learners that are each only slightly better than random (e.g. 51% accurate) can still reach around 75% accuracy, provided there are enough of them and they are sufficiently independent
How do Voting Classifiers compare against individual classifiers?
The voting classifier often outperforms every individual classifier in the ensemble
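A minimal sketch of such a hard-voting ensemble, assuming scikit-learn and a toy make_moons dataset (classifiers and hyperparameters are illustrative, not from the flashcards):

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Three different algorithms trained on the same data
voting_clf = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(random_state=42)),
        ("rf", RandomForestClassifier(random_state=42)),
        ("svc", SVC(random_state=42)),
    ],
    voting="hard",  # predict the class that receives the most votes
)
voting_clf.fit(X_train, y_train)
print(voting_clf.score(X_test, y_test))  # often higher than any single classifier's score
```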
When would you use Bagging and Pasting?
When the same algorithm is used for all the predictors in the ensemble (rather than different algorithms, as in a voting classifier)
How do Bagging and Pasting work?
Use the same algorithm, but train each predictor on different training data
– Each predictor gets a random subset of the training set
Bagging
Sampling with replacement
Pasting
Sampling without replacement
How do Bagging and Pasting models predict?
Aggregation of each predictor’s output:
* Statistical mode (most frequent prediction) for classification
* Average for regression
– The ensemble has roughly the same bias but lower variance than a single predictor trained on the full training set
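A minimal sketch of bagging vs. pasting, assuming scikit-learn and the same toy make_moons data (hyperparameters are illustrative):

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)

bag_clf = BaggingClassifier(
    DecisionTreeClassifier(),  # the same base algorithm for every predictor
    n_estimators=500,
    max_samples=100,           # each tree sees a random subset of 100 instances
    bootstrap=True,            # True = bagging (with replacement); False = pasting
    random_state=42,
)
bag_clf.fit(X, y)
y_pred = bag_clf.predict(X)    # aggregates the 500 trees' predictions per instance
```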
Why do Bagging and Pasting scale well?
Training and predictions can be done in parallel on different CPU cores
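A minimal sketch of that parallelism, assuming scikit-learn (n_jobs=-1 uses all available cores):

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)

bag_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=500, n_jobs=-1, random_state=42
)
bag_clf.fit(X, y)            # the 500 trees are trained in parallel
y_pred = bag_clf.predict(X)  # predictions are computed in parallel as well
```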
What is Out-of-Bag (OOB) Evaluation?
A bagging classifier samples m instances from a training set of size m
– Sampling with replacement: some training instances (about 37% on average) are never picked
These out-of-bag (OOB) instances can be used for evaluation
– No need for a separate validation set or cross-validation
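A minimal sketch of OOB evaluation, assuming scikit-learn and toy data (hyperparameters are illustrative):

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)

bag_clf = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=500,
    bootstrap=True,   # sampling with replacement, so some instances are left out
    oob_score=True,   # score each tree on the instances it never saw
    random_state=42,
)
bag_clf.fit(X, y)
print(bag_clf.oob_score_)  # estimate of test accuracy without a validation set
```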
Do Random Forests use bagging or pasting?
Bagging
How can Random Forests measure the relative importance of each feature?
By measuring how much the tree nodes that use a feature reduce impurity
– Weighted average across all trees, with each node weighted by the number of training samples it applies to
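A minimal sketch of reading those importances, assuming scikit-learn and the iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
rnd_clf = RandomForestClassifier(n_estimators=500, random_state=42)
rnd_clf.fit(iris.data, iris.target)

# Impurity-based importances, averaged over all trees; they sum to 1
for name, score in zip(iris.feature_names, rnd_clf.feature_importances_):
    print(name, round(score, 3))
```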
Explain Boosting
“Boosting” (originally “hypothesis boosting”) combines weak learners into a strong learner
– Learners are trained sequentially
– Each learner tries to correct its predecessor
Most popular boosting methods
– AdaBoost (“Adaptive Boosting”)
– Gradient Boosting
How does AdaBoost work?
Each new predictor corrects its predecessor by paying more attention to the training instances the predecessor underfitted
– The relative weights of misclassified instances are increased for the next iteration
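A minimal sketch of AdaBoost with shallow Decision Trees (“stumps”) as weak learners, assuming scikit-learn and toy data (hyperparameters are illustrative; older scikit-learn versions name the first parameter base_estimator):

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)

ada_clf = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1),  # decision stumps as weak learners
    n_estimators=200,
    learning_rate=0.5,
    random_state=42,
)
# Each new stump pays more attention to the instances its predecessors misclassified
ada_clf.fit(X, y)
```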
How does Gradient Boosting work?
– Also uses a sequence of predictors
– Instead of tweaking instance weights as in AdaBoost, each new predictor is fit to the residual errors of its predecessor
– With Decision Trees as base estimators, it is called “Gradient Tree Boosting” or “Gradient Boosted Regression Trees” (GBRT)
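A minimal hand-rolled sketch of fitting to residuals, assuming scikit-learn and toy 1-D data; GradientBoostingRegressor packages the same idea:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(42)
X = rng.rand(100, 1) - 0.5
y = 3 * X[:, 0] ** 2 + 0.05 * rng.randn(100)  # noisy quadratic toy data

tree1 = DecisionTreeRegressor(max_depth=2, random_state=42).fit(X, y)
y2 = y - tree1.predict(X)                      # residuals of the first tree
tree2 = DecisionTreeRegressor(max_depth=2, random_state=42).fit(X, y2)
y3 = y2 - tree2.predict(X)                     # residuals of the first two trees
tree3 = DecisionTreeRegressor(max_depth=2, random_state=42).fit(X, y3)

# The ensemble predicts by summing the trees' predictions
X_new = np.array([[0.2]])
y_pred = sum(tree.predict(X_new) for tree in (tree1, tree2, tree3))
```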
Explain Stacking
“Stacking” or “Stacked
Generalization” trains
aggregation function
– Final predictor that aggregates
predictors is called “blender” or
“meta learner”
Training of blender
based on “hold-out set”
– Reserve some training
instances
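A minimal sketch of stacking, assuming scikit-learn (whose StackingClassifier uses cross-validated predictions instead of a single hold-out set) and toy data:

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)

stacking_clf = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(random_state=42)),
        ("svc", SVC(random_state=42)),
    ],
    final_estimator=LogisticRegression(),  # the "blender" / meta learner
    cv=5,  # the blender is trained on out-of-fold predictions of the base models
)
stacking_clf.fit(X, y)
```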