Random Forest Flashcards
What is a random forest?
An ensemble of decision trees whose individual predictions are combined, by voting or averaging, into a single prediction.
Why use multiple decision trees in a random forest?
To reduce overfitting and improve generalisation.
What type of model is a random forest?
An ensemble learning method.
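A minimal usage sketch, assuming scikit-learn's RandomForestClassifier and a synthetic dataset (the cards do not name a library):

```python
# Minimal sketch: fitting a random forest classifier with scikit-learn.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# An ensemble of 100 decision trees, each trained on a bootstrapped sample.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)
print(forest.predict(X[:5]))
```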
How does a random forest make a prediction?
By aggregating the predictions of all its trees: a majority vote for classification, an average for regression.
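A minimal sketch of the majority-vote step itself, using made-up per-tree predictions rather than a real model:

```python
# Minimal sketch of majority voting (illustrative values, not real model output).
import numpy as np

# Rows = trees, columns = samples; each entry is one tree's predicted class.
tree_predictions = np.array([
    [0, 1, 1],
    [0, 1, 0],
    [1, 1, 1],
])
# Majority vote per sample (column).
final = np.apply_along_axis(lambda v: np.bincount(v).argmax(), axis=0, arr=tree_predictions)
print(final)  # [0 1 1]
```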
What problem does a random forest solve that a single decision tree can’t?
Overfitting and poor generalisation.
What is bootstrapping in random forests?
Sampling training data with replacement to create varied datasets for each tree.
Why is sampling done with replacement in bootstrapping?
To allow some examples to appear multiple times and some to be left out, increasing model diversity.
What is the size of each bootstrapped sample?
The same as the original dataset size, but with duplicates and omissions.
What percentage of data is typically included in each bootstrapped set?
About 63% of the unique training examples (roughly 1 - 1/e); the rest are left out of that tree's sample.
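A minimal sketch of where the ~63% figure comes from, assuming NumPy for the sampling:

```python
# Minimal sketch of bootstrapping: a sample the same size as the data,
# drawn with replacement, typically contains ~63% of the unique rows.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
indices = rng.integers(0, n, size=n)          # sample with replacement
unique_fraction = len(np.unique(indices)) / n
print(f"unique rows in bootstrap sample: {unique_fraction:.2%}")  # ~63%
```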
What are out-of-bag (OOB) samples?
The ~37% of data not included in a tree’s bootstrapped training set.
What is the purpose of out-of-bag samples?
To estimate model error without needing a separate validation set.
How is OOB error calculated?
For each training example, aggregate the predictions of only the trees that did not see it during training, then compute the error rate over all such examples.
What does a low OOB error indicate?
That the model is generalising well to unseen data.
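A minimal sketch of reading the OOB estimate, assuming scikit-learn's oob_score option (an implementation detail not stated on the cards):

```python
# Minimal sketch of the out-of-bag estimate via scikit-learn.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

forest = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
forest.fit(X, y)

# oob_score_ is the accuracy on OOB samples; OOB error is its complement.
print("OOB error:", 1 - forest.oob_score_)
```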
What evaluation method can be used if OOB is not suitable?
A traditional train/validation/test split or cross-validation.
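A minimal sketch of the cross-validation alternative, again assuming scikit-learn:

```python
# Minimal sketch of k-fold cross-validation as an alternative to OOB estimation.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, random_state=0)
scores = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=5)
print("mean CV accuracy:", scores.mean())
```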
How does each tree in a random forest avoid using the same features at each split?
By randomly selecting a subset of features at each split (feature bagging).
What is feature bagging?
Randomly selecting a subset of features at each decision node to promote tree diversity.
Why is feature bagging important in random forests?
It reduces correlation between trees, increasing the strength of the ensemble.
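A minimal sketch of how feature bagging is typically exposed, assuming scikit-learn's max_features parameter corresponds to the per-split subset described above:

```python
# Minimal sketch: feature bagging via the max_features parameter.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=30, random_state=0)

# "sqrt" -> consider sqrt(n_features) randomly chosen features at each split.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
forest.fit(X, y)
```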
What is the final prediction in a random forest classification task?
The class with the majority vote from all trees.
What kind of learning algorithm is each tree in a random forest?
A high-variance base learner: each tree is typically grown deep (low bias, high variance), and averaging across the ensemble reduces that variance.
What kind of data can random forests handle?
Both categorical and continuous features.
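A hedged note: the algorithm accommodates both feature types, but scikit-learn's implementation expects numeric input, so categorical columns are usually encoded first; the column names below are hypothetical:

```python
# Minimal sketch: one-hot encoding a categorical column before fitting.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

df = pd.DataFrame({
    "colour": ["red", "blue", "red", "green"],   # categorical feature
    "size_cm": [4.2, 5.1, 3.8, 6.0],             # continuous feature
    "label": [0, 1, 0, 1],
})
X = pd.get_dummies(df[["colour", "size_cm"]])    # encodes "colour", leaves "size_cm" as-is
y = df["label"]

RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)
```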
Do random forests require feature scaling?
No; splits depend only on the ordering of feature values, so rescaling or normalising has no effect.
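A minimal sketch illustrating scale invariance on a synthetic dataset (the agreement is expected, not guaranteed to the last bit):

```python
# Minimal sketch: rescaling a feature leaves tree-based predictions (essentially) unchanged,
# since splits depend only on the ordering of values.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, random_state=0)
X_scaled = X * 1000.0                            # same ordering, different magnitude

a = RandomForestClassifier(random_state=0).fit(X, y).predict(X)
b = RandomForestClassifier(random_state=0).fit(X_scaled, y).predict(X_scaled)
print((a == b).mean())  # expected 1.0
```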
Are random forests robust to outliers?
Fairly robust, yes: splits are based on value orderings rather than magnitudes, so extreme feature values have limited influence.
What is a key advantage of random forests over single trees?
They reduce variance and overfitting through averaging.
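A minimal comparison sketch of a single tree versus a forest under cross-validation; the dataset and fold count are illustrative assumptions:

```python
# Minimal sketch: single decision tree vs. random forest on the same data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, n_informative=5, random_state=0)

tree_acc = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5).mean()
forest_acc = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=5).mean()
print(f"single tree: {tree_acc:.3f}  forest: {forest_acc:.3f}")  # the forest usually scores higher
```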
What happens if too few trees are used in a random forest?
The variance reduction from averaging is weak, so predictions are noisier and accuracy and stability suffer.
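A minimal sketch of accuracy stabilising as trees are added; the specific tree counts and split are illustrative:

```python
# Minimal sketch: held-out accuracy as the number of trees grows.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for n in (1, 5, 25, 200):
    forest = RandomForestClassifier(n_estimators=n, random_state=0).fit(X_tr, y_tr)
    print(n, round(forest.score(X_te, y_te), 3))
```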