Random Forest Flashcards

(45 cards)

1
Q

What is a random forest?

A

An ensemble of decision trees that votes to make predictions.

2
Q

Why use multiple decision trees in a random forest?

A

To reduce overfitting and improve generalisation.

3
Q

What type of model is a random forest?

A

An ensemble learning method.

4
Q

How does a random forest make a prediction?

A

By taking a majority vote across all decision trees.

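As a concrete illustration of cards 1-4, here is a minimal scikit-learn sketch (assuming scikit-learn is installed; the dataset, tree count, and random seed are arbitrary choices for illustration, not part of the cards):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Toy dataset; any labelled classification data would work here.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An ensemble of 100 decision trees: each tree is trained on a
# bootstrapped sample and a random subset of features per split.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# The forest combines the trees' individual predictions into one answer.
print(forest.predict(X_test[:5]))
print("test accuracy:", forest.score(X_test, y_test))
```
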
5
Q

What problem does a random forest solve that a single decision tree can’t?

A

Overfitting and poor generalisation.

6
Q

What is bootstrapping in random forests?

A

Sampling training data with replacement to create varied datasets for each tree.

7
Q

Why is sampling done with replacement in bootstrapping?

A

To allow some examples to appear multiple times and some to be left out, increasing model diversity.

8
Q

What is the size of each bootstrapped sample?

A

The same as the original dataset size, but with duplicates and omissions.

9
Q

What percentage of data is typically included in each bootstrapped set?

A

About 63% of the distinct training examples; the remaining ~37% are left out and become the out-of-bag samples.

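Cards 6-9 can be checked empirically; below is a small NumPy sketch (the dataset size of 1,000 is arbitrary) showing that sampling with replacement yields duplicates, omissions, and roughly 63% unique examples per bootstrap sample:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000                      # size of the original training set (arbitrary)
indices = np.arange(n)

# Bootstrap sample: same size as the original, drawn with replacement,
# so some indices repeat and others are left out (the out-of-bag samples).
bootstrap = rng.choice(indices, size=n, replace=True)

unique_fraction = np.unique(bootstrap).size / n
print(f"unique examples in bootstrap: {unique_fraction:.1%}")      # ~63%
print(f"out-of-bag examples:          {1 - unique_fraction:.1%}")  # ~37%

# The ~63% figure is the limit of P(included) = 1 - (1 - 1/n)^n as n grows.
print("theoretical limit:", 1 - 1 / np.e)                          # ~0.632
```
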
10
Q

What are out-of-bag (OOB) samples?

A

The ~37% of data not included in a tree’s bootstrapped training set.

11
Q

What is the purpose of out-of-bag samples?

A

To estimate model error without needing a separate validation set.

12
Q

How is OOB error calculated?

A

By testing each OOB sample on trees that did not train on it, then averaging the results.

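In scikit-learn this estimate is available directly; a sketch assuming RandomForestClassifier with oob_score=True (the synthetic data and parameter values are illustrative only):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data purely for illustration.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# With oob_score=True, each sample is scored only by the trees that
# did not see it during training (bootstrap sampling is on by default).
forest = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
forest.fit(X, y)

print("OOB accuracy:", forest.oob_score_)   # OOB error = 1 - OOB accuracy
```
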
13
Q

What does a low OOB error indicate?

A

That the model is generalising well to unseen data.

14
Q

What evaluation method can be used if OOB is not suitable?

A

A traditional train/validation/test split or cross-validation.

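A sketch of the cross-validation alternative using scikit-learn's cross_val_score (the synthetic dataset and the choice of 5 folds are arbitrary):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)

# 5-fold cross-validation as an alternative to the OOB estimate.
scores = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=5)
print("mean CV accuracy:", scores.mean())
```
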
15
Q

How does each tree in a random forest avoid using the same features at each split?

A

By randomly selecting a subset of features at each split (feature bagging).

16
Q

What is feature bagging?

A

Randomly selecting a subset of features at each decision node to promote tree diversity.

17
Q

Why is feature bagging important in random forests?

A

It reduces correlation between trees, increasing the strength of the ensemble.

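In scikit-learn, feature bagging is controlled by the max_features parameter; a sketch with illustrative values ("sqrt" is a common default for classification):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=25, random_state=0)

# max_features="sqrt": each split considers a random subset of about
# sqrt(25) = 5 features, which decorrelates the trees; max_features=None
# would let every split see all 25 features.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
forest.fit(X, y)
```
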
18
Q

What is the final prediction in a random forest classification task?

A

The class with the majority vote from all trees.

19
Q

What kind of learning algorithm is each tree in a random forest?

A

A high-variance base learner (often described as a weak learner); the ensemble reduces this variance by averaging.

20
Q

What kind of data can random forests handle?

A

Both categorical and continuous features.

21
Q

Do random forests require feature scaling?

A

No; splits depend only on the ordering of feature values, not their scale, so scaling is unnecessary.

22
Q

Are random forests robust to outliers?

A

Yes, largely; splits depend on the ordering of values rather than their magnitude, so extreme values have limited influence.

23
Q

What is a key advantage of random forests over single trees?

A

They reduce variance and overfitting through averaging.

24
Q

What happens if too few trees are used in a random forest?

A

The model may underperform due to insufficient averaging.

25

Q

Why is a random forest less transparent than a decision tree?

A

You can't easily trace how a prediction was made across many trees.

26

Q

What is a major strength of random forests?

A

They reduce overfitting compared to single decision trees.

27

Q

Why are random forests robust to noisy data?

A

Because individual errors are averaged out across many trees.

28

Q

Why are random forests considered flexible?

A

They work well with both classification and regression problems.

29

Q

Can random forests handle missing data?

A

To a degree; some implementations handle missing values directly, while others require imputation before training or prediction.

30

Q

How do random forests support feature importance analysis?

A

They estimate how much each feature contributes to prediction accuracy.

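For card 30, scikit-learn exposes impurity-based importances after fitting; a minimal sketch (the built-in breast-cancer dataset is just a convenient example):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(data.data, data.target)

# Impurity-based importance: how much each feature reduces impurity,
# averaged over all trees (permutation importance is a common alternative).
ranked = sorted(zip(data.feature_names, forest.feature_importances_),
                key=lambda pair: pair[1], reverse=True)
for name, score in ranked[:5]:
    print(f"{name}: {score:.3f}")
```
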
31

Q

Why are random forests good for high-dimensional data?

A

Because they can model complex interactions and tend to down-weight irrelevant features.

32

Q

What kind of model variance do random forests have compared to single trees?

A

Lower variance due to ensemble averaging.

33

Q

Can random forests capture non-linear patterns?

A

Yes, the ensemble of trees can model complex decision boundaries.

34

Q

Why are random forests often used as baselines in ML tasks?

A

Because they tend to perform well out-of-the-box without much tuning.

35

Q

What is a key limitation of random forests in terms of interpretability?

A

They are difficult to interpret compared to individual decision trees.

36

Q

Why are random forests computationally expensive?

A

They require training and storing many trees.

37

Q

How does prediction speed compare between decision trees and random forests?

A

Random forests are slower because many tree outputs must be computed and combined.

38

Q

What is a memory drawback of random forests?

A

They consume more memory because all trees must be stored.

39

Q

Can random forests extrapolate beyond the training range?

A

No; like decision trees, their regression predictions stay within the range of the training targets, so they cannot extrapolate.

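Card 39 can be demonstrated with a quick regression sketch on synthetic data: outside the training range the predictions plateau near the edge of the training targets instead of following the trend.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Train on y = 2x for x in [0, 10).
X_train = np.arange(0, 10, 0.1).reshape(-1, 1)
y_train = 2 * X_train.ravel()

forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# Each prediction is an average of training targets reached in the leaves,
# so for x far outside [0, 10) the output stays near 2 * 10 = 20, not 2 * x.
print(forest.predict([[5.0], [20.0], [100.0]]))   # roughly [10, ~20, ~20]
```
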
40

Q

What kind of overfitting risk still exists in random forests?

A

If the number of trees is too small or trees are too deep, overfitting may still occur.

41

Q

Are random forests sensitive to class imbalance?

A

Yes, they may favor the majority class unless sampling or weighting is adjusted.

42

Q

What is a drawback when using random forests with very large datasets?

A

Training can be slow and require parallelization or optimization.

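For cards 41 and 42, scikit-learn exposes knobs for both concerns; a sketch with illustrative parameter choices (class_weight addresses imbalance, n_jobs parallelizes tree training):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Imbalanced synthetic data: roughly a 95% / 5% class split.
X, y = make_classification(n_samples=5000, weights=[0.95, 0.05], random_state=0)

forest = RandomForestClassifier(
    n_estimators=200,
    class_weight="balanced",   # reweight classes inversely to their frequency
    n_jobs=-1,                 # train trees in parallel on all available cores
    random_state=0,
)
forest.fit(X, y)
```
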
43

Q

Why is explaining individual predictions difficult in random forests?

A

Because the final decision results from many complex tree paths.

44

Q

How do random forests handle categorical variables with many levels?

A

They may overfit unless levels are grouped or encoded properly.

45

Q

What is ensemble learning?

A

Combining multiple models to produce a stronger overall model.