Topic 19 Flashcards
(8 cards)
Decision Trees Bias and Variance
High bias: the model is too simple. High variance: the model fits the training data too well and performs poorly on test data.
Pre Pruning
Stops the tree from growing once it reaches a certain condition
Post Pruning
Removes less significant branches once the tree is fully grown
Purpose of Pruning Methods
Prevents overfitting (reduces complexity), Improves efficiency, enhances interpretability (by reducing unnecessary splits)
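A minimal sketch of both pruning styles, assuming scikit-learn and the iris dataset purely for illustration (max_depth, min_samples_leaf, and ccp_alpha are scikit-learn parameter names, not notation from the cards):

```python
# Hypothetical sketch: pre-pruning vs. post-pruning with scikit-learn decision trees.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Pre-pruning: stop growing once a condition is reached (here: depth and leaf size).
pre_pruned = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5).fit(X_train, y_train)

# Post-pruning: grow the tree fully, then prune back less significant branches
# via cost-complexity pruning (ccp_alpha > 0 removes weak splits after fitting).
post_pruned = DecisionTreeClassifier(ccp_alpha=0.02).fit(X_train, y_train)

print("pre-pruned accuracy: ", pre_pruned.score(X_test, y_test))
print("post-pruned accuracy:", post_pruned.score(X_test, y_test))
```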
Dealing With Continuous Predictors
Sort the training instances by the predictor's value and create a candidate split midway between each adjacent pair of instances where the classification changes; choose the candidate boundary with the maximum information gain.
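A small sketch of this idea, assuming a single numeric predictor and Shannon entropy (the function names and the example values are made up for illustration):

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def best_threshold(values, labels):
    """Pick the midpoint split with maximum information gain,
    considering only boundaries where the class label changes."""
    order = np.argsort(values)
    v, y = np.asarray(values)[order], np.asarray(labels)[order]
    base = entropy(y)
    best_gain, best_t = -1.0, None
    for i in range(len(v) - 1):
        if y[i] == y[i + 1]:            # skip boundaries with no class change
            continue
        t = (v[i] + v[i + 1]) / 2.0     # candidate split midway between adjacent instances
        left, right = y[v <= t], y[v > t]
        expected = (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
        gain = base - expected
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_t, best_gain

# Example: a continuous predictor whose class label changes a few times
print(best_threshold([64, 65, 68, 69, 70, 71, 72, 75],
                     ["no", "no", "yes", "yes", "yes", "no", "no", "yes"]))
```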
Decision Trees Bagging
Obtain N training datasets from our population, and fit a decision tree on each training dataset separately (some of the fitted trees might be quite different).
Then for a new test item, we run it through each of our N decision trees, and record the predicted class labels.
Each decision tree "votes" on what it thinks the class label should be. We pick the class label that gets the most votes across all the trees.
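A minimal sketch of bagging by hand, assuming scikit-learn trees and the iris dataset just for illustration (N and the variable names are arbitrary choices):

```python
# Hypothetical sketch: fit N trees on N bootstrap samples, then majority-vote per test item.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
N = 25
trees = []
for _ in range(N):
    # bootstrap sample: draw len(X_train) items with replacement
    idx = rng.integers(0, len(X_train), size=len(X_train))
    trees.append(DecisionTreeClassifier().fit(X_train[idx], y_train[idx]))

# each tree votes; the label with the most votes across the N trees wins
votes = np.array([t.predict(X_test) for t in trees])            # shape (N, n_test)
majority = np.array([np.bincount(col).argmax() for col in votes.T])
print("bagged accuracy:", np.mean(majority == y_test))
```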
Estimating Accuracy Using Bootstrap Sampling
Because not all of the training items are used in each bag (we sample with replacement), we can evaluate the accuracy of each bagged model using the out-of-bag training items. For each training item di, select all tree models B~di that were not trained using di, and take the majority vote for di across the models in B~di. Do this for every di to estimate the total accuracy across all the training items.
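A sketch of the out-of-bag estimate under the same assumptions as the bagging sketch above (scikit-learn trees and the iris dataset are stand-ins; in_bag and the loop structure are illustrative):

```python
# Hypothetical sketch: for each training item di, majority-vote only over the
# trees (B~di) whose bootstrap sample did not contain di.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)
N, n = 25, len(X)

trees, in_bag = [], []
for _ in range(N):
    idx = rng.integers(0, n, size=n)                 # bootstrap sample (with replacement)
    trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    in_bag.append(np.zeros(n, dtype=bool))
    in_bag[-1][idx] = True

correct, counted = 0, 0
for i in range(n):
    # the models that never saw item i during training
    oob_trees = [t for t, bag in zip(trees, in_bag) if not bag[i]]
    if not oob_trees:
        continue                                     # item i appeared in every bag
    votes = [t.predict(X[i:i + 1])[0] for t in oob_trees]
    counted += 1
    correct += (np.bincount(votes).argmax() == y[i])

print("OOB accuracy estimate:", correct / counted)
```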
Random Forest Algorithm
* For each feature in a random subset of the features, perform a decision split using that feature and calculate the resulting expected entropy using the current training examples
* Pick the feature, Fbest, that gives the maximum information gain for that subset of features
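A small sketch of a single such split, assuming binary categorical features and synthetic data made up for illustration (entropy, n_subset, and the data-generating step are my own choices, not from the cards):

```python
# Hypothetical sketch: consider only a random subset of the features and pick
# the one, Fbest, that gives the maximum information gain.
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def best_feature_from_subset(X, y, n_subset, rng):
    """Return Fbest among a random subset of n_subset features."""
    base = entropy(y)
    subset = rng.choice(X.shape[1], size=n_subset, replace=False)
    best_gain, best_f = -1.0, None
    for f in subset:
        # expected entropy after splitting the current training examples on feature f
        expected = sum(
            np.mean(X[:, f] == v) * entropy(y[X[:, f] == v])
            for v in np.unique(X[:, f])
        )
        gain = base - expected
        if gain > best_gain:
            best_gain, best_f = gain, f
    return best_f, best_gain

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(100, 8))                      # 100 items, 8 binary features
y = np.where(rng.random(100) < 0.9, X[:, 3], 1 - X[:, 3])  # label mostly follows feature 3
print(best_feature_from_subset(X, y, n_subset=4, rng=rng))
```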