4-Decision Trees Flashcards

1
Q

How easy is it to construct optimal decision trees?

A

Constructing an optimal decision tree is NP-hard, which is why practical algorithms such as ID3 use greedy heuristics instead.

2
Q

What is information gain?

A

The reduction in entropy achieved when the data at a node is partitioned by the attribute A: the entropy before the split minus the mean information of the resulting child nodes.

IG(Attr | Node) = H(Node) - mean_info(children)
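
A minimal Python sketch of the calculation (the function names are my own, not from the source):

    import math

    def entropy(labels):
        # Shannon entropy (in bits) of a list of class labels
        n = len(labels)
        return sum((labels.count(c) / n) * math.log2(n / labels.count(c))
                   for c in set(labels))

    def information_gain(parent, children):
        # IG = H(parent) - weighted average entropy of the child partitions
        n = len(parent)
        mean_info = sum(len(c) / n * entropy(c) for c in children)
        return entropy(parent) - mean_info

    # Splitting a perfectly mixed node into two pure children removes all
    # uncertainty, so the gain is the full 1 bit:
    print(information_gain(["y", "y", "n", "n"], [["y", "y"], ["n", "n"]]))  # 1.0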

3
Q

What is entropy?

A

The expected (average) level of surprise or uncertainty. It is a measure of unpredictability.

Given a probability distribution, the entropy is the expected amount of information required to predict an event drawn from it

4
Q

What is the level of unpredictability of a single event?

A

The level of unpredictability of a single event is its self-information.

Self-info(i) = log2(1/P(i)) = -log2(P(i))
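
A quick numeric check in Python (the function name is my own):

    import math

    def self_info(p):
        # Self-information (in bits) of an event with probability p
        return math.log2(1 / p)

    print(self_info(0.5))    # 1.0 bit: a fair coin flip
    print(self_info(0.125))  # 3.0 bits: rarer events carry more surprise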

5
Q

How is entropy calculated?

A

H(x) = sum from i = 1 to N of (P(x_i) * self_info(x_i))

H(x) = sum from i = 1 to N of (-P(x_i) * log2(P(x_i)))
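
The same sum as a minimal Python sketch (names are illustrative):

    import math

    def entropy(probs):
        # H(x) = sum_i P(x_i) * self_info(x_i), with self_info(x_i) = log2(1/P(x_i))
        return sum(p * math.log2(1 / p) for p in probs if p > 0)

    print(entropy([0.5, 0.5]))  # 1.0: two equally likely classes, maximum uncertainty
    print(entropy([0.9, 0.1]))  # ~0.47: one dominant class, low entropy
    print(entropy([1.0]))       # 0.0: a certain outcome carries no surprise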

6
Q

How does entropy inform decision trees?

A

If entropy at a node is low, the outcome is more predictable. This occurs when a single class holds most of the probability mass, so decision trees favour splits that produce low-entropy (near-pure) child nodes.

7
Q

What is mean information?

A

The weighted average of the entropies of the child nodes, weighted by the proportion of instances reaching each child.
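
A minimal Python sketch (names are my own):

    import math

    def entropy(labels):
        n = len(labels)
        return sum((labels.count(c) / n) * math.log2(n / labels.count(c))
                   for c in set(labels))

    def mean_info(children):
        # Entropy of each child, weighted by the fraction of instances it receives
        n = sum(len(c) for c in children)
        return sum(len(c) / n * entropy(c) for c in children)

    # One pure child (entropy 0) and one perfectly mixed child (entropy 1):
    print(mean_info([["y", "y"], ["y", "n"]]))  # 0.5 = 0.5*0 + 0.5*1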

8
Q

What are the cons of information gain?

A

It is biased towards highly branching attributes, because their many small partitions are more likely to be homogeneous

9
Q

What is gain ratio?

A

Gain ratio reduces the bias of information gain by normalising it against the split info:

GR(Attr | Node) = IG(Attr | Node) / SI(Attr | Node)

It discourages the selection of attributes with many uniformly distributed values
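
A small Python sketch (split info, defined on the next card, is computed here as the entropy of the split proportions; names are my own):

    import math

    def gain_ratio(ig, child_sizes):
        # GR = IG / SI, where SI is the entropy of the split proportions
        n = sum(child_sizes)
        si = sum((s / n) * math.log2(n / s) for s in child_sizes)
        return ig / si

    # The same 1-bit gain is worth less when bought with a 4-way split:
    print(gain_ratio(1.0, [2, 2]))        # 1.0 (SI = 1 bit)
    print(gain_ratio(1.0, [1, 1, 1, 1]))  # 0.5 (SI = 2 bits)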

10
Q

What is split info?

A

Split info is the entropy of the split itself, i.e. of the distribution of instances across the branches (ignoring class labels).

SI(Attr | Node) = H(Attr) = sum from i = 1 to N of (-P(x_i) * log2(P(x_i)))
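
A quick numeric illustration in Python (the function name is my own):

    import math

    def split_info(child_sizes):
        # Entropy of the split proportions: how finely the attribute fragments the node
        n = sum(child_sizes)
        return sum((s / n) * math.log2(n / s) for s in child_sizes)

    print(split_info([4, 4]))        # 1.0 bit for a balanced binary split
    print(split_info([2, 2, 2, 2]))  # 2.0 bits: many-way splits score higher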

11
Q

When does ID3 stop?

A

ID3 stops when all instances at a node are of the same class, since the information gain of any further split is then 0. It can be modified to stop once a threshold is reached

12
Q

What are the characteristics of ID3?

A

It searches a space of hypotheses for one that fits the training data

The space searched is the set of possible decision trees

Performs a greedy simple-to-complex search through the hypothesis space with no backtracking
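
A compact sketch of this greedy search in Python (the toy data and helper names are invented for illustration):

    import math
    from collections import Counter

    def entropy(rows, target):
        n = len(rows)
        return sum((c / n) * math.log2(n / c)
                   for c in Counter(r[target] for r in rows).values())

    def id3(rows, attrs, target):
        # Greedy, simple-to-complex, no backtracking: commit to the attribute
        # with the highest information gain and never revisit the choice.
        classes = {r[target] for r in rows}
        if len(classes) == 1:  # pure node: any further split has IG = 0
            return classes.pop()
        if not attrs:          # no attributes left: predict the majority class
            return Counter(r[target] for r in rows).most_common(1)[0][0]

        def info_gain(a):
            parts = {}
            for r in rows:
                parts.setdefault(r[a], []).append(r)
            return entropy(rows, target) - sum(
                len(p) / len(rows) * entropy(p, target) for p in parts.values())

        best = max(attrs, key=info_gain)
        rest = [a for a in attrs if a != best]
        return {best: {v: id3([r for r in rows if r[best] == v], rest, target)
                       for v in {r[best] for r in rows}}}

    # Hypothetical toy data: the class depends only on "windy"
    data = [
        {"outlook": "sunny", "windy": "no",  "play": "yes"},
        {"outlook": "sunny", "windy": "yes", "play": "no"},
        {"outlook": "rain",  "windy": "no",  "play": "yes"},
        {"outlook": "rain",  "windy": "yes", "play": "no"},
    ]
    print(id3(data, ["outlook", "windy"], "play"))
    # {'windy': {'no': 'yes', 'yes': 'no'}}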

13
Q

What are the pros of decision trees?

A

A strong, simple supervised learner

Fast to train and fast to classify

Very transparent: the learned tree is easy to inspect and interpret

14
Q

What are the cons of decision trees?

A

Prone to overfitting
Loss of information when continuous variables are discretised
Calculations become complex when there are many variables
No guarantee of returning the globally optimal tree (the search is greedy)

15
Q

What are alternatives to ID3?

A

Oblivious trees - require the same attribute to be tested at every node in a layer

Random tree - considers only a random subset of attributes at each split
