Lecture 4 - Tree models Flashcards

1
Q

What are the three models based on trees

A
  1. decision trees
  2. random forest
  3. gradient boosting
2
Q

What are the three parts of a decision tree

A
  1. root node
  2. internal/decision nodes
  3. leaf nodes
3
Q

what is the max depth and max number of leaves of a decision tree with d binary features?

A

max depth: d + 1; max number of leaves: 2^d (each of the d features can be tested at most once along any root-to-leaf path)

4
Q

how many leaves does a decision tree with d binary features have at most

A

2^d
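For example, with d = 3 binary features a decision tree has at most 2^3 = 8 leaves and depth at most 3 + 1 = 4.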

5
Q

Decision tree

The set of literals at a node is called a ___

A

split

6
Q

each leaf of the tree represents a ___ ___, which is a conjunction of literals encountered on the path from the root of the tree to the leaf

A

logical expression

7
Q

Give 3 decision tree algorithms

A
  1. ID3
  2. C4.5
  3. CART
8
Q

Growing a tree is recursive
true/false

A

true

9
Q

how to decide the best split

A

To assign the majority class label to the leaves, we look for a split that separates the two classes as cleanly as possible -> we measure the purity of the children of the split

10
Q

Give 3 impurity metrics

A
  1. Minority class
  2. Entropy
  3. Gini index
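A minimal sketch (my own helper functions, not from the lecture) of these three impurity measures for a binary node containing a fraction p of positives:

```python
import math

def minority_class(p):
    # error made when predicting the majority class
    return min(p, 1 - p)

def gini(p):
    # Gini index of a binary node
    return 2 * p * (1 - p)

def entropy(p):
    # entropy in bits; a pure node has entropy 0
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in (0.5, 0.9):  # maximally impure node vs nearly pure node
    print(minority_class(p), gini(p), round(entropy(p), 3))
```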
11
Q

Both the entropy and the Gini index are smooth and concave upper bounds of the training error. These properties can be advantageous in some situations

A

True

12
Q

What is entropy

A

Entropy is the expected average level of information, surprise, or uncertainty inherent in all the n possible outcomes of an event

13
Q

Entropy for a binary classification task:

A

H(p, 1-p) = -p log_2(p) - (1-p) log_2(1-p)
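For example (my own numbers): H(0.5, 0.5) = 1 bit (maximum uncertainty), while H(0.9, 0.1) = -0.9 log_2(0.9) - 0.1 log_2(0.1) ≈ 0.137 + 0.332 ≈ 0.47 bits for a nearly pure node.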

14
Q

How do we assess if a split is useful at all?

A

In assessing the quality of a feature for splitting a parent node D into children D1, ..., Dj, it is customary to look at the purity gain: Imp(D) - Imp({D1, ..., Dj}), where Imp({D1, ..., Dj}) = sum_j (|Dj|/|D|) * Imp(Dj) is the weighted average impurity of the children.
Purity gain = original entropy - weighted average entropy after splitting
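A small numerical sketch (my own example, not from the lecture) of the purity gain with entropy as the impurity measure:

```python
import math

def entropy(pos, neg):
    # entropy (in bits) of a node with the given class counts
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h

# Parent D with 10 positives and 10 negatives, split into D1 = (9+, 2-) and D2 = (1+, 8-)
parent = entropy(10, 10)                                    # 1.0
children = (11 / 20) * entropy(9, 2) + (9 / 20) * entropy(1, 8)
print(parent - children)                                    # purity gain, roughly 0.40
```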

15
Q

Finding the best split for a decision tree is recursive?

A

False

16
Q

How to prevent overfitting in decision trees. Give 2

A
  • Limit the number of iterations of the algorithm leading to a tree with a bounded number of nodes.
  • Prune the tree after it is built by removing weak branches.
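A minimal sketch with scikit-learn (assuming it is available; the dataset and parameter values are only for illustration). max_depth bounds the size of the tree up front, while ccp_alpha prunes the grown tree via cost-complexity pruning, a different scheme than the reduced error pruning on the next card:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

limited = DecisionTreeClassifier(max_depth=3).fit(X_train, y_train)    # bounded number of nodes
pruned = DecisionTreeClassifier(ccp_alpha=0.01).fit(X_train, y_train)  # pruned after growing
print(limited.score(X_val, y_val), pruned.score(X_val, y_val))
```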
17
Q

How to do Reduced Error Pruning

A
  1. Starting at the leaves, each node is replaced with the majority class.
  2. If the prediction accuracy is not affected, the change is kept.
  3. Keep a validation set and measure the pruned tree's performance on it; of course the pruning will not improve accuracy on the training set (see the sketch below).
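A minimal sketch of reduced error pruning on a toy tree representation (my own structure, not the lecture's): internal nodes are dicts {"feature": i, "left": ..., "right": ...} over binary features, leaves are class labels. For simplicity the majority label is taken from the validation examples reaching a node; comparing accuracy only on those examples is equivalent to checking that accuracy on the full validation set does not drop, since the other examples are unaffected.

```python
from collections import Counter

def predict(node, x):
    # follow the splits until a leaf (a plain class label) is reached
    while isinstance(node, dict):
        node = node["left"] if x[node["feature"]] == 0 else node["right"]
    return node

def prune(node, X_val, y_val):
    """Bottom-up: replace a subtree by the majority class whenever that does
    not hurt accuracy on the validation examples reaching the node."""
    if not isinstance(node, dict) or not y_val:
        return node
    left = [i for i, x in enumerate(X_val) if x[node["feature"]] == 0]
    right = [i for i in range(len(X_val)) if i not in left]
    node["left"] = prune(node["left"], [X_val[i] for i in left], [y_val[i] for i in left])
    node["right"] = prune(node["right"], [X_val[i] for i in right], [y_val[i] for i in right])
    majority = Counter(y_val).most_common(1)[0][0]
    subtree_correct = sum(predict(node, x) == t for x, t in zip(X_val, y_val))
    leaf_correct = sum(t == majority for t in y_val)
    return majority if leaf_correct >= subtree_correct else node
```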
18
Q

What are the two sources of imbalance

A
  • Asymmetric class distribution
  • Asymmetric mis-classification cost
19
Q

What does adding more samples accomplish?

A

Adds data for training and increases the training time
It is possible that it does not make any difference

20
Q

what is sqrt(gini)

A

Sqrt(gini) is designed to minimise relative impurity, which makes it insensitive to changes in the class distribution, whereas Gini emphasises children covering more examples

21
Q

Entropy and Gini index are sensitive to fluctuations in the class distributions, sqrt(gini) isn't. We want distribution-insensitive impurities.

22
Q

In regression trees we can replace Imp with ____

A

variance (the weighted variance of the children)

23
Q

What is weighted variance

A

If a split partitions the set of target values Y into mutually exclusive sets {Y1, ..., Yj}, the weighted variance is sum_j (|Yj|/|Y|) * Var(Yj).
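A small sketch (my own helper functions) computing the weighted variance of a split in a regression tree:

```python
def variance(ys):
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys) / len(ys)

def weighted_variance(children):
    # children: the child target-value sets Y1, ..., Yj produced by a split
    n = sum(len(ys) for ys in children)
    return sum(len(ys) / n * variance(ys) for ys in children)

# A split that separates small from large targets scores much better (lower):
print(weighted_variance([[1, 2, 1], [10, 11, 12]]))   # roughly 0.44
print(weighted_variance([[1, 10, 12], [2, 1, 11]]))   # roughly 21.6
```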

24
Q

The variance of a Boolean variable with success probability p is p(1-p), which is half of the Gini index. So we could interpret the goal of tree learning as minimising the class variance in the leaves.

25
In regression trees our goal is
to find a split that minimises the weighted average of the variance of the child nodes.