Random Forest Flashcards

(5 cards)

1
Q

What is Bagging in ensemble models?

A

Bagging (Bootstrap Aggregating) combines three ideas:

Bootstrap Sampling = randomly creating new datasets from the original dataset by sampling with replacement.

Aggregation = Voting (Classification) or Averaging (Regression).

Ensemble = a group of models working together.
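
A minimal sketch of bagging in Python (assumes NumPy; the helper names below are illustrative, and class labels are assumed to be non-negative integers):

import numpy as np

def bootstrap_sample(X, y, rng):
    # Bootstrap Sampling: draw N row indices with replacement from N rows
    idx = rng.integers(0, len(X), size=len(X))
    return X[idx], y[idx]

def aggregate_classification(models, X):
    # Aggregation (classification): majority vote across the ensemble
    preds = np.array([m.predict(X) for m in models])  # (n_models, n_samples)
    return np.array([np.bincount(col).argmax() for col in preds.T])

def aggregate_regression(models, X):
    # Aggregation (regression): average of the numeric predictions
    return np.mean([m.predict(X) for m in models], axis=0)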

2
Q

What are Random Sampling of Data (Bootstrap Samples) and Random Sampling of Features at Each Split?

A

Random Sampling of Data
From the original dataset with N rows, we randomly sample N rows with replacement to create a new training set for each tree.
Each tree sees a slightly different version of the dataset.
This is the Bagging part.
✅ This increases diversity and prevents overfitting.

Random Sampling of Features at Each Split
Unlike a normal decision tree, where all features are considered at each split, a Random Forest considers only a random subset of features at each split.
Example: If we have 10 features, we might randomly select only 3 features to decide the best split at a node.
✅ This makes trees even more different from each other — more diversity = better forest!
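
A minimal NumPy sketch of both sources of randomness (the dataset shape and subset size are illustrative; scikit-learn's default for classification is the square root of the number of features):

import numpy as np

rng = np.random.default_rng(42)
n_rows, n_features = 100, 10

# Bagging part: sample 100 row indices with replacement, so some rows
# repeat and some are left out; each tree sees a different dataset
row_idx = rng.integers(0, n_rows, size=n_rows)

# Random feature selection: at a given split, consider only 3 of the
# 10 features when searching for the best split
split_features = rng.choice(n_features, size=3, replace=False)
print(split_features)  # three distinct feature indices, e.g. [1 7 4]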

3
Q

How do we train many trees in a Random Forest?

A

Say we build 100 trees, each trained on a different bootstrap sample and using a random subset of features at each split.

All trees are trained independently and in parallel (this makes Random Forest fast to train).
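
In scikit-learn this is one line; a minimal sketch (the synthetic dataset is only for illustration):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# 100 trees, trained independently and in parallel on all CPU cores
forest = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0)
forest.fit(X, y)
print(len(forest.estimators_))  # 100 fitted decision trees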

4
Q

How does a Random Forest aggregate predictions?

A

For Classification:
Each tree gives a predicted class label.
The final output is the majority vote (whichever class is predicted by most trees).

For Regression:
Each tree gives a numeric prediction.
The final output is the average of all predictions.
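
A minimal NumPy sketch of both aggregation rules (the tree outputs are made-up values for one sample):

import numpy as np

# Classification: five trees vote on class labels for one sample
tree_votes = np.array([1, 0, 1, 1, 0])
print(np.bincount(tree_votes).argmax())  # 1, predicted by 3 of the 5 trees

# Regression: five trees give numeric predictions for one sample
tree_preds = np.array([3.2, 2.9, 3.5, 3.1, 3.0])
print(tree_preds.mean())  # 3.14, the averaged final prediction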

5
Q

Explain the entire Random Forest process.

A

Random Forest
├── Ensemble of Decision Trees
├── Main Techniques:
│   ├── Bootstrap Sampling
│   │   → Random data samples for each tree (with replacement)
│   └── Random Feature Selection
│       → Random subset of features at each split
├── Training Process:
│   ├── Build multiple trees independently
│   └── Use Parallel Processing (trees trained simultaneously)
├── Prediction:
│   ├── Classification → Majority Voting
│   └── Regression → Averaging outputs
├── Strengths:
│   ├── High Accuracy
│   ├── Reduces Overfitting
│   ├── Handles Missing Values
│   ├── Measures Feature Importance
│   └── Works well even without heavy tuning
├── Bonus Tip:
│   └── In Scikit-learn, use n_jobs=-1 for full parallel power
└── Keyword Formula:
    “Random Data + Random Features + Many Trees + Aggregation”
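
Putting the whole pipeline together in scikit-learn; a minimal sketch (the Iris dataset is used only for illustration):

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Random data + random features + many trees + aggregation, in one
# estimator; n_jobs=-1 trains the trees in parallel on all cores
forest = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0)
forest.fit(X_train, y_train)

print(forest.score(X_test, y_test))   # accuracy from majority voting
print(forest.feature_importances_)    # built-in feature importance measure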
