Lecture 4 - Tree models Flashcards

1
Q

What are the three models based on trees

A
  1. decision trees
  2. random forest
  3. gradient boosting
2
Q

What are the three parts of a decision tree

A
  1. root node
  2. internal/decision nodes
  3. leaf nodes
3
Q

what is the max depth and max number of leaves of a decision tree with d binary features?

A

max depth: d + 1; max number of leaves: 2^d (each of the d features can be tested at most once along any root-to-leaf path)

4
Q

how many leaves does a decision tree with d binary features have at most

A

2^d
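For example, with d = 3 binary features a decision tree has at most 2^3 = 8 leaves and depth at most 3 + 1 = 4.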

5
Q

Decision tree

The set of literals at a node is called a ___

A

split

6
Q

each leaf of the tree represents a ___ ___, which is a conjunction of literals encountered on the path from the root of the tree to the leaf

A

logical expression

7
Q

Give 3 decision tree algorithms

A
  1. ID3
  2. C4.5
  3. CART
8
Q

Growing a tree is recursive
true/false

A

true

9
Q

how to decide the best split

A

To assign the majority class label to the leaves, we look for a split that separates the two classes as cleanly as possible -> we measure the purity of the children of the split

10
Q

Give 3 impurity metrics

A
  1. Minority class
  2. Entropy
  3. Gini index
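A minimal sketch (my own helper functions, not from the lecture) of these three impurity measures for a binary node containing a fraction p of positives:

```python
import math

def minority_class(p):
    # error made when predicting the majority class
    return min(p, 1 - p)

def gini(p):
    # Gini index of a binary node
    return 2 * p * (1 - p)

def entropy(p):
    # entropy in bits; a pure node has entropy 0
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in (0.5, 0.9):  # maximally impure node vs nearly pure node
    print(minority_class(p), gini(p), round(entropy(p), 3))
```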
11
Q

Both the entropy and the Gini index are smooth and concave upper bounds of the training error. These properties can be advantageous in some situations

A

True

12
Q

What is entropy

A

Entropy is the expected average level of information, surprise, or uncertainty inherent in all the n possible outcomes of an event

13
Q

Entropy for a binary classification task:

A

H(p, 1-p) = -p log_2(p) - (1-p) log_2(1-p)
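For example (my own numbers): H(0.5, 0.5) = 1 bit (maximum uncertainty), while H(0.9, 0.1) = -0.9 log_2(0.9) - 0.1 log_2(0.1) ≈ 0.137 + 0.332 ≈ 0.47 bits for a nearly pure node.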

14
Q

How do we assess if a split is useful at all?

A

In assessing the quality of a feature for splitting a parent node D into children D1, ..., Dj, it is customary to look at the purity gain: Imp(D) - Imp({D1, ..., Dj}), where Imp({D1, ..., Dj}) = sum_j (|Dj|/|D|) * Imp(Dj) is the weighted average impurity of the children.
Purity gain = original entropy - weighted average entropy after splitting
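A small numerical sketch (my own example, not from the lecture) of the purity gain with entropy as the impurity measure:

```python
import math

def entropy(pos, neg):
    # entropy (in bits) of a node with the given class counts
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h

# Parent D with 10 positives and 10 negatives, split into D1 = (9+, 2-) and D2 = (1+, 8-)
parent = entropy(10, 10)                                    # 1.0
children = (11 / 20) * entropy(9, 2) + (9 / 20) * entropy(1, 8)
print(parent - children)                                    # purity gain, roughly 0.40
```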

15
Q

Finding the best split for a decision tree is recursive?

A

False

16
Q

How to prevent overfitting in decision trees. Give 2

A
  • Limit the number of iterations of the algorithm leading to a tree with a bounded number of nodes.
  • Prune the tree after it is built by removing weak branches.
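A minimal sketch with scikit-learn (assuming it is available; the dataset and parameter values are only for illustration). max_depth bounds the size of the tree up front, while ccp_alpha prunes the grown tree via cost-complexity pruning, a different scheme than the reduced error pruning on the next card:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

limited = DecisionTreeClassifier(max_depth=3).fit(X_train, y_train)    # bounded number of nodes
pruned = DecisionTreeClassifier(ccp_alpha=0.01).fit(X_train, y_train)  # pruned after growing
print(limited.score(X_val, y_val), pruned.score(X_val, y_val))
```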
17
Q

How to do Reduced Error Pruning

A
  1. Starting at the leaves, each node is replaced with the majority class.
  2. If the prediction accuracy is not affected, the change is kept.
  3. Keep a validation set and measure the pruned tree's performance on it; of course the pruning will not improve accuracy on the training set (see the sketch below).
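A minimal sketch of reduced error pruning on a toy tree representation (my own structure, not the lecture's): internal nodes are dicts {"feature": i, "left": ..., "right": ...} over binary features, leaves are class labels. For simplicity the majority label is taken from the validation examples reaching a node; comparing accuracy only on those examples is equivalent to checking that accuracy on the full validation set does not drop, since the other examples are unaffected.

```python
from collections import Counter

def predict(node, x):
    # follow the splits until a leaf (a plain class label) is reached
    while isinstance(node, dict):
        node = node["left"] if x[node["feature"]] == 0 else node["right"]
    return node

def prune(node, X_val, y_val):
    """Bottom-up: replace a subtree by the majority class whenever that does
    not hurt accuracy on the validation examples reaching the node."""
    if not isinstance(node, dict) or not y_val:
        return node
    left = [i for i, x in enumerate(X_val) if x[node["feature"]] == 0]
    right = [i for i in range(len(X_val)) if i not in left]
    node["left"] = prune(node["left"], [X_val[i] for i in left], [y_val[i] for i in left])
    node["right"] = prune(node["right"], [X_val[i] for i in right], [y_val[i] for i in right])
    majority = Counter(y_val).most_common(1)[0][0]
    subtree_correct = sum(predict(node, x) == t for x, t in zip(X_val, y_val))
    leaf_correct = sum(t == majority for t in y_val)
    return majority if leaf_correct >= subtree_correct else node
```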
18
Q

What are the two sources of imbalance

A
  • Asymmetric class distribution
  • Asymmetric mis-classification cost
19
Q

What does adding more samples accomplish?

A

Adds data for training and increases the training time
It is possible that it does not make any difference

20
Q

what is sqrt(gini)

A

Sqrt(gini) is designed to minimise relative impurity, which makes it insensitive to changes in the class distribution, whereas Gini emphasises children covering more examples

21
Q

Entropy and Gini index are sensitive to fluctuations in the class distributions, sqrt(gini) isn't. We want distribution-insensitive impurities.

22
Q

In regression trees we can replace Imp with ____

A

variance (the weighted variance of the children)

23
Q

What is weighted variance

A

If a split partitions the set of target values Y into mutually exclusive sets {Y1, ..., Yj}, the weighted variance is sum_j (|Yj|/|Y|) * Var(Yj).
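A small sketch (my own helper functions) computing the weighted variance of a split in a regression tree:

```python
def variance(ys):
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys) / len(ys)

def weighted_variance(children):
    # children: the child target-value sets Y1, ..., Yj produced by a split
    n = sum(len(ys) for ys in children)
    return sum(len(ys) / n * variance(ys) for ys in children)

# A split that separates small from large targets scores much better (lower):
print(weighted_variance([[1, 2, 1], [10, 11, 12]]))   # roughly 0.44
print(weighted_variance([[1, 10, 12], [2, 1, 11]]))   # roughly 21.6
```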

24
Q

The variance of a Boolean variable with success probability p is p(1-p), which is half of the Gini index. So we could interpret the goal of tree learning as minimising the class variance in the leaves.

25
In regression trees our goal is
to find a split that minimises the weighted average of the variance of the child nodes.