Topic 19 Flashcards

(8 cards)

1
Q

Decision Trees Bias and Variance

A

High bias: the model is too simple and underfits the data. High variance: the model fits the training data too closely and performs poorly on test data (overfitting).
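A quick illustration with scikit-learn (the synthetic dataset and parameter values are my own, for demonstration only): a depth-1 stump underfits, while an unconstrained tree fits the training set almost perfectly but scores lower on held-out data.

```python
# High bias vs. high variance, shown via train/test accuracy.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for depth in (1, None):  # too simple vs. unconstrained
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    print(f"max_depth={depth}: train={tree.score(X_tr, y_tr):.2f}, "
          f"test={tree.score(X_te, y_te):.2f}")
```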

2
Q

Pre-Pruning

A

Stops the tree from growing once a stopping condition is met (e.g., a maximum depth, a minimum number of samples per node, or a minimum information gain per split)
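In scikit-learn, for example, the stopping conditions are hyperparameters set before fitting (the iris data and the specific values below are arbitrary stand-ins):

```python
# Pre-pruning: growth stops as soon as any stopping condition is hit.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

tree = DecisionTreeClassifier(
    max_depth=3,                 # never grow below depth 3
    min_samples_split=10,        # don't split nodes with < 10 samples
    min_impurity_decrease=0.01,  # require a minimum impurity reduction
).fit(X, y)
print(tree.get_depth(), tree.get_n_leaves())
```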

3
Q

Post-Pruning

A

Removes less significant branches once the tree is fully grown
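One concrete post-pruning method is minimal cost-complexity pruning; a scikit-learn sketch (iris as a stand-in dataset):

```python
# Post-pruning: grow the tree fully, then cut back weak branches.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

full_tree = DecisionTreeClassifier(random_state=0).fit(X, y)
path = full_tree.cost_complexity_pruning_path(X, y)  # candidate pruning strengths

# Refitting with a nonzero ccp_alpha grows the full tree and then removes
# the branches whose contribution does not justify their complexity cost.
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=path.ccp_alphas[-2])
pruned.fit(X, y)
print(full_tree.get_n_leaves(), "->", pruned.get_n_leaves())
```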

4
Q

Purpose of Pruning Methods

A

Prevents overfitting (reduces complexity), improves efficiency, and enhances interpretability (by removing unnecessary splits)

5
Q

Dealing With Continuous Predictors

A

Sort the training instances by the continuous feature and create a candidate split midway between each adjacent pair of instances, keeping only the candidate boundaries where the class label changes. Choose the candidate split with the maximum information gain.
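A sketch of this procedure for a single continuous feature (the binary labels and toy arrays are assumptions for illustration):

```python
# Choose a split threshold for one continuous feature: candidate
# thresholds sit midway between adjacent sorted values where the
# class label changes; pick the one with maximum information gain.
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def best_threshold(values, labels):
    order = np.argsort(values)
    v, y = values[order], labels[order]
    parent = entropy(y)
    best_gain, best_t = -1.0, None
    for i in range(1, len(v)):
        if y[i] == y[i - 1]:       # only boundaries where the class changes
            continue
        t = (v[i] + v[i - 1]) / 2  # midpoint between adjacent instances
        left, right = y[v <= t], y[v > t]
        child = (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
        gain = parent - child
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_t, best_gain

vals = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
labs = np.array([0, 0, 0, 1, 1, 1])
print(best_threshold(vals, labs))  # threshold 3.5, gain 1.0
```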

6
Q

Decision Trees Bagging

A

Obtain N training datasets from our population (in practice, N bootstrap samples drawn with replacement) and fit a decision tree on each training dataset separately (some of the fitted trees may be quite different).
Then for a new test item, we run it through each of our N decision trees and record the predicted class labels.
Each decision tree "votes" on what it thinks the class label should be; we pick the class label that gets the most votes across all the trees.
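A minimal bagging sketch following the card's recipe (iris and N=25 are stand-ins of my own):

```python
# Bagging: N bootstrap datasets, one tree per dataset, majority vote.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)
N = 25

trees = []
for _ in range(N):
    idx = rng.integers(0, len(X), size=len(X))  # sample with replacement
    trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

def bagged_predict(x):
    votes = [t.predict(x.reshape(1, -1))[0] for t in trees]  # each tree votes
    return np.bincount(votes).argmax()  # most-voted class label wins

print(bagged_predict(X[0]))
```

scikit-learn's BaggingClassifier packages the same idea in one estimator.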

7
Q

Estimating Accuracy Using Bootstrap Sampling

A

Because not all of the training items are used in each bag (we sample with replacement), we can evaluate the accuracy of the bagged model using the out-of-bag training items. For each training item di, select the set of tree models B~di that were not trained using di, and take the majority vote for di across the models in B~di.
Do this for every di to estimate the overall accuracy across all the training items.
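A sketch of this out-of-bag estimate (iris and the tree count are stand-ins; B~di is the set of trees whose bootstrap sample missed item i):

```python
# Out-of-bag accuracy: each training item is scored only by the trees
# that never saw it during training (the set B~di from the card).
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)
N = 50

trees, bags = [], []
for _ in range(N):
    idx = rng.integers(0, len(X), size=len(X))  # bootstrap sample
    trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    bags.append(set(idx.tolist()))

correct = scored = 0
for i in range(len(X)):
    votes = [t.predict(X[i].reshape(1, -1))[0]       # majority vote over B~di
             for t, bag in zip(trees, bags) if i not in bag]
    if votes:
        correct += int(np.bincount(votes).argmax() == y[i])
        scored += 1
print("estimated OOB accuracy:", correct / scored)
```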

8
Q

Random Forest Algorithm

A

* For each feature in a random subset of the features, perform a decision split using that feature and calculate the resulting expected entropy using the current training examples.
* Pick the feature, Fbest, that gives the maximum information gain within that subset of features.
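A rough sketch of this per-node split rule in plain NumPy (the synthetic data and the single median threshold per feature are simplifications of my own; real trees scan many thresholds):

```python
# One random-forest split: score only a random subset of features and
# pick the best of that subset (Fbest), maximizing information gain.
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def choose_split(X, y, n_sub, rng):
    features = rng.choice(X.shape[1], size=n_sub, replace=False)
    parent = entropy(y)
    best = (None, None, -1.0)  # (feature, threshold, information gain)
    for f in features:
        t = np.median(X[:, f])  # one candidate threshold (simplified)
        left, right = y[X[:, f] <= t], y[X[:, f] > t]
        if len(left) == 0 or len(right) == 0:
            continue
        expected = (len(left) * entropy(left)
                    + len(right) * entropy(right)) / len(y)
        gain = parent - expected  # info gain = parent - expected entropy
        if gain > best[2]:
            best = (f, t, gain)
    return best  # Fbest is best only within the sampled subset

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
y = (X[:, 3] > 0).astype(int)  # the label depends on feature 3
print(choose_split(X, y, n_sub=3, rng=rng))
```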
