Chapter 6 Flashcards
(19 cards)
What tasks can decision trees accomplish?
– Classification
– Regression
– Multioutput tasks
Do decision trees require feature scaling or centering?
Decision Tree training does not require feature scaling or centering
Define "samples" for a decision tree node
The number of training instances the node applies to
Define "value" for a decision tree node
The number of training instances of each class that the node applies to
Define "class" for a decision tree node
The class the node predicts – the majority class among its training instances
What type of trees does Scikit-Learn's CART algorithm produce?
Binary Trees
What value of the Gini impurity makes a node "pure"?
gini = 0
In the Gini impurity formula G_i = 1 − Σ_k (p_i,k)², what does p_i,k stand for?
p_i,k is the ratio of class k instances among the training instances of the i-th node
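The formula can be checked with a few lines of plain Python (the helper name `gini` is just for illustration):

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a node: G = 1 - sum_k (p_k)^2, where p_k is the
    ratio of class-k instances among the instances in the node."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

print(gini(["a", "a", "a"]))       # a pure node: 0.0
print(gini(["a", "a", "b", "b"]))  # a 50/50 node: 0.5
```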
How can a Decision Tree estimate class probabilities?
It returns the ratio of each class among the training instances in the leaf node the instance falls into
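A minimal Scikit-Learn sketch of this (assuming Scikit-Learn is installed; the petal-only feature choice and `max_depth=2` are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X = iris.data[:, 2:]  # petal length and width only (illustrative choice)
y = iris.target

tree_clf = DecisionTreeClassifier(max_depth=2, random_state=42)
tree_clf.fit(X, y)

# predict_proba returns, for each class, the ratio of training
# instances of that class in the leaf node the sample falls into.
print(tree_clf.predict_proba([[5.0, 1.5]]))
```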
What does CART stand for?
Classification and Regression Tree
How does the CART algorithm work?
Repeated splitting of the training set:
– Find the split that produces the purest subsets (weighted by their size)
– Stop when the maximum depth is reached or no split can be found that reduces impurity
– It is a greedy algorithm
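The greedy split search can be sketched in plain Python (stdlib only; `best_split` and the toy data are illustrative, not Scikit-Learn's actual implementation):

```python
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    """Greedy CART-style search: try every (feature, threshold) pair and
    return the one minimizing the size-weighted Gini impurity of the
    resulting subsets."""
    n = len(y)
    best, best_cost = None, float("inf")
    for j in range(len(X[0])):                   # each feature
        for t in sorted({row[j] for row in X}):  # each candidate threshold
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            if not left or not right:
                continue
            cost = len(left) / n * gini(left) + len(right) / n * gini(right)
            if cost < best_cost:
                best, best_cost = (j, t), cost
    return best, best_cost

# One feature, perfectly separable: the best split is x <= 2.
print(best_split([[1], [2], [3], [4]], [0, 0, 1, 1]))  # ((0, 2), 0.0)
```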
Which is faster to compute, entropy or Gini impurity?
Gini
What is the advantage of Entropy over Gini?
Entropy provides slightly more balanced trees
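For comparison, entropy in plain Python (the helper name is illustrative); the log calls are the reason it is slightly slower to compute than Gini:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy of a node: H = -sum_k p_k * log2(p_k), summed over the
    classes present in the node."""
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

# A pure node has entropy 0; a 50/50 node has entropy 1 bit.
print(entropy(["a", "a", "a"]))
print(entropy(["a", "a", "b", "b"]))
```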
Decision Trees are a nonparametric model; what does that mean?
– The number of parameters is not fixed before training, so the model structure adapts freely to the data
– Left unconstrained, this typically leads to overfitting
Hyperparameters to constrain Decision Trees to avoid overfitting
– Maximum tree depth: max_depth
– Minimum samples in node before splitting: min_samples_split
– Minimum samples in leaf: min_samples_leaf or min_weight_fraction_leaf
– Maximum leaf nodes: max_leaf_nodes
– Maximum features evaluated for splitting: max_features
How can the max_* and min_* hyperparameters be tuned to regularize a model?
Increasing min_* hyperparameters and decreasing max_* hyperparameters will regularize the model
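A small Scikit-Learn sketch of this effect (assuming Scikit-Learn is installed; the moons dataset and `min_samples_leaf=4` are illustrative values):

```python
from sklearn.datasets import make_moons
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=100, noise=0.25, random_state=53)

# Unconstrained tree: grows until every leaf is pure, so it fits
# the training set perfectly and likely overfits.
deep_tree = DecisionTreeClassifier(random_state=42).fit(X, y)

# Increasing a min_* hyperparameter constrains growth and
# regularizes the model.
reg_tree = DecisionTreeClassifier(min_samples_leaf=4,
                                  random_state=42).fit(X, y)

print(deep_tree.get_depth(), reg_tree.get_depth())
print(deep_tree.score(X, y), reg_tree.score(X, y))
```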
Benefits of decision trees
– Easy to use
– Easy to understand and interpret
– Versatile and powerful
Shortcomings of decision trees
– Orthogonal decision boundaries
– Sensitive to small variations in training data
* A very different model may result if just one training instance is removed
* The stochastic training algorithm may produce different models on the same data
(unless the random_state hyperparameter is set)
How do Random Forests address this instability?
By averaging predictions over many trees
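A minimal sketch with Scikit-Learn's RandomForestClassifier (assuming Scikit-Learn is installed; the dataset and `n_estimators=100` are illustrative):

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# A Random Forest averages the predictions of many randomized trees,
# which reduces the variance (instability) of a single decision tree.
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_train, y_train)
print(forest.score(X_test, y_test))
```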