Decision Trees Flashcards
(15 cards)
Root Node
Decision Tree
Topmost node; it represents the entire population or sample and is split on the best predictor into two or more homogeneous sets.
Splitting
Decision Tree
The process of dividing a node into two or more sub-nodes.
Decision node
Decision Tree
A sub-node that splits further into subsequent sub-nodes (child nodes).
Leaf node
Decision Tree
A node that does not have any children; it carries the final prediction.
Pruning
Decision Tree
Trimming nodes to reduce the number of nodes and the size of the tree, typically to combat overfitting.
Branch
Decision Tree
sub-section of the tree
Parent and child nodes
Decision Tree
A node that is divided into sub-nodes is known as the parent;
the sub-nodes are its children.
What is the decision tree algorithm?
1) Start with an empty tree
2) Select the best attribute to split on at the current node (e.g., the one with the highest information gain)
3) Recurse on each resulting sub-node, stopping when a node is pure, no attributes remain, or another stopping criterion is met
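The steps above can be sketched as a greedy recursion. This is a minimal illustration, not a full implementation: it assumes categorical attributes stored in dicts, uses entropy to pick the split, and all function names (`entropy`, `build_tree`) are hypothetical.

```python
from collections import Counter
import math

def entropy(labels):
    # Shannon entropy of a list of class labels
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def build_tree(rows, labels, attributes):
    # 1) Stop (make a leaf) when the node is pure or no attributes remain
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class

    # 2) Greedily pick the attribute whose split minimizes weighted child entropy
    def weighted_entropy(attr):
        total = 0.0
        for v in set(r[attr] for r in rows):
            sub = [l for r, l in zip(rows, labels) if r[attr] == v]
            total += len(sub) / len(labels) * entropy(sub)
        return total

    best = min(attributes, key=weighted_entropy)

    # 3) Recurse on each sub-node, with the used attribute removed
    branches = {}
    for v in set(r[best] for r in rows):
        sub_rows = [r for r in rows if r[best] == v]
        sub_labels = [l for r, l in zip(rows, labels) if r[best] == v]
        branches[v] = build_tree(sub_rows, sub_labels,
                                 [a for a in attributes if a != best])
    return {best: branches}
```

The returned tree is a nested dict: internal nodes are `{attribute: {value: subtree}}`, leaves are class labels.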
Entropy
A measure of disorder or impurity.
Entropy = -p1*log2(p1) - p0*log2(p0), where p1 and p0 are the proportions of each class (binary case)
Information Gain
Tells us how much information an attribute gives us about the class.
IG = entropy(parent) - weighted average of the children's entropies
Weighted average = (# samples in left child / # samples in parent) * entropy(left child) + (# samples in right child / # samples in parent) * entropy(right child)
Usually want to select the attribute with the highest information gain
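The IG formula on this card can be sketched for a binary split with binary class counts. All names and the count-based parameterization are illustrative assumptions.

```python
import math

def entropy(pos, total):
    # Entropy of a node holding `pos` positives out of `total` samples
    out = 0.0
    for c in (pos, total - pos):
        if c:
            p = c / total
            out -= p * math.log2(p)
    return out

def information_gain(parent_pos, parent_total, left_pos, left_total):
    # IG = entropy(parent) - weighted average entropy of the two children
    right_pos = parent_pos - left_pos
    right_total = parent_total - left_total
    weighted = (left_total / parent_total) * entropy(left_pos, left_total) \
             + (right_total / parent_total) * entropy(right_pos, right_total)
    return entropy(parent_pos, parent_total) - weighted
```

A split that produces two pure children recovers the full parent entropy (IG = 1 bit for a 50/50 parent); a split that leaves both children at the parent's mix gains nothing.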
Can a decision tree be too large?
Yes, a big tree can affect computational efficiency and can lead to overfitting.
Describe the pruning process for a decision tree.
1) Use a validation set to measure the effect of post-pruning: remove a subtree when doing so does not hurt validation accuracy
2) Use statistical tests to estimate whether a split is likely to improve performance beyond the training set
3) Minimum Description Length principle: measure the combined cost of encoding the decision tree and its training-set errors, and stop once that encoding is minimized
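Step 1 (validation-set pruning, often called reduced-error pruning) can be sketched as follows. This is a simplified illustration with hypothetical names: a tree is either a class label (leaf) or a nested dict `{attribute: {value: subtree}}`, and a subtree is collapsed to the majority validation label whenever that does not reduce validation accuracy.

```python
from collections import Counter

def predict(tree, row, default):
    # Walk the nested dict until a leaf (non-dict) is reached
    while isinstance(tree, dict):
        attr, branches = next(iter(tree.items()))
        tree = branches.get(row.get(attr), default)
    return tree

def accuracy(tree, rows, labels, default):
    hits = sum(predict(tree, r, default) == l for r, l in zip(rows, labels))
    return hits / len(labels)

def prune(tree, rows, labels, default):
    # Bottom-up reduced-error pruning on a validation set
    if not isinstance(tree, dict) or not rows:
        return tree
    attr, branches = next(iter(tree.items()))
    new_branches = {}
    for v, sub in branches.items():
        # Route only the validation rows that reach this branch
        idx = [i for i, r in enumerate(rows) if r.get(attr) == v]
        new_branches[v] = prune(sub, [rows[i] for i in idx],
                                [labels[i] for i in idx], default)
    pruned = {attr: new_branches}
    # Candidate: replace this whole subtree with the majority validation label
    leaf = Counter(labels).most_common(1)[0][0]
    if accuracy(leaf, rows, labels, default) >= accuracy(pruned, rows, labels, default):
        return leaf
    return pruned
```

A subtree whose branches all agree collapses to a single leaf; a subtree the validation set still needs is kept.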
Explain how to create a random forest.
Decision trees work great on the data they were trained on, but often generalize poorly to new data; a random forest combines many trees to reduce this overfitting.
Step 1: Create a bootstrap dataset by randomly selecting samples from the original dataset with replacement (the same sample can appear more than once)
Step 2: Create a decision tree from the bootstrap dataset, but at each split consider only a random subset of the variables
Step 3: Build the tree as usual
Step 4: Go back to step 1 and repeat hundreds of times
When running new data through the model, pass it through all the trees and take the majority vote.
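The bootstrap/random-subset/majority-vote loop above can be sketched in miniature. To stay short, each "tree" here is a one-split decision stump rather than a full tree; everything else (names, the stump learner, parameter defaults) is an illustrative assumption, not a standard API.

```python
import random
from collections import Counter

def fit_stump(X, y, feature_subset):
    # Tiny stand-in for "build a tree": pick the single (feature, threshold)
    # split with the fewest misclassifications, searching only the given
    # random subset of features (Step 2)
    if len(set(y)) == 1:
        c = y[0]
        return lambda x: c
    best = None
    for f in feature_subset:
        for t in sorted(set(x[f] for x in X)):
            left = [yi for x, yi in zip(X, y) if x[f] <= t]
            right = [yi for x, yi in zip(X, y) if x[f] > t]
            if not left or not right:
                continue
            err = (len(left) - Counter(left).most_common(1)[0][1]
                   + len(right) - Counter(right).most_common(1)[0][1])
            if best is None or err < best[0]:
                best = (err, f, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    if best is None:  # no valid split in this feature subset
        c = Counter(y).most_common(1)[0][0]
        return lambda x: c
    _, f, t, l, r = best
    return lambda x: l if x[f] <= t else r

def fit_forest(X, y, n_trees=100, n_features=1, seed=0):
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        # Step 1: bootstrap sample — draw with replacement
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        bx, by = [X[i] for i in idx], [y[i] for i in idx]
        # Step 2: random feature subset for this tree
        feats = rng.sample(range(len(X[0])), n_features)
        trees.append(fit_stump(bx, by, feats))  # Step 3; loop = Step 4
    return trees

def predict_forest(trees, x):
    # Run x through every tree and take the majority vote
    return Counter(t(x) for t in trees).most_common(1)[0][0]
```

In practice a library such as scikit-learn's `RandomForestClassifier` handles all of this; the sketch only mirrors the four steps on the card.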
Gini Impurity
Measures how mixed the classes at a node are: Gini = 1 - sum(p_i^2) over the class proportions p_i.
Example:
Suppose we have a dataset split as follows:
10 samples total
4 are Class A
6 are Class B
So:
p_A = 4/10 = 0.4
p_B = 6/10 = 0.6
Now plug into the formula:
Gini = 1 - (0.4^2 + 0.6^2) = 1 - (0.16 + 0.36)
     = 1 - 0.52 = 0.48
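The worked example translates directly to code; the function name and count-based input are illustrative.

```python
def gini_impurity(counts):
    # Gini = 1 - sum(p_i^2), where p_i is each class's share of the node
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)
```

With the card's counts of 4 Class A and 6 Class B, this returns 0.48; a pure node returns 0.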
What are the advantages and disadvantages of random forest?
Adv
1) Versatile algorithm
2) Predictions are more accurate than those of a single decision tree
Disadv
1) Computationally slow, since many trees must be built and evaluated
2) As a predictive modelling tool, it does not describe or explain the data