Decision Trees Flashcards

1
Q

Things that make decision trees unique

A

supervised learning
batch processing of training examples
uses a preference bias

2
Q

Decision tree non-leaf node

A

associated with an attribute/feature

3
Q

decision tree leaf node

A

associated with a classification

4
Q

decision tree arc

A

associated with one of the possible values of the parent node's attribute

5
Q

How does decision tree work?

A

the attribute at the root poses a question; the answer is determined by the value of that attribute in the input example; the answer determines which child to move to; repeat until a leaf is reached (the class label at the leaf is the classification given to the input example)

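A minimal sketch of this traversal in Python (an assumed nested-dict representation, not anything prescribed by the cards): an internal node stores an attribute and a map from attribute value to subtree, and a leaf is a plain class label.

```python
# Minimal sketch of decision-tree classification (assumed nested-dict representation):
# an internal node is {"attribute": name, "children": {value: subtree}}, a leaf is a label.

def classify(node, example):
    # Leaf: the stored class label is the classification given to the example.
    if not isinstance(node, dict):
        return node
    # Internal node: "ask the question" by reading the example's value for this attribute,
    # then move to the corresponding child and repeat.
    value = example[node["attribute"]]
    return classify(node["children"][value], example)

# Hypothetical example tree: play tennis? The root asks about "outlook".
tree = {
    "attribute": "outlook",
    "children": {
        "sunny": {"attribute": "humidity",
                  "children": {"high": "no", "normal": "yes"}},
        "overcast": "yes",
        "rain": {"attribute": "wind",
                 "children": {"strong": "no", "weak": "yes"}},
    },
}

print(classify(tree, {"outlook": "sunny", "humidity": "high", "wind": "weak"}))  # -> "no"
```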
6
Q

Ockham’s Razor

A

Preference bias; the smallest decision tree that correctly classifies all training examples is best

7
Q

Decision Tree Construction (greedy), aka ID3, C5.0

A
  1. select the best attribute to use for the new node at the current level
  2. partition the examples using the possible values of this attribute and assign the subsets to the appropriate child nodes; recursively generate child nodes until all examples at a node have the same label (sketched below)
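A compact sketch of this greedy recursion, assuming each example is a dict of attribute values with its class stored under a "label" key (the representation and helper names are illustrative, not the cards' notation). The best attribute is chosen by maximum information gain, as in the later cards, and the result uses the same nested-dict tree shape as the classification sketch above.

```python
import math
from collections import Counter

def entropy(examples):
    # H(Y) over the class labels of a set of examples.
    counts = Counter(e["label"] for e in examples)
    total = len(examples)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())

def information_gain(examples, attr):
    # I(Y;X) = H(Y) - H(Y|X): entropy before the split minus the weighted
    # entropy of the subsets produced by splitting on attr.
    total = len(examples)
    subsets = {}
    for e in examples:
        subsets.setdefault(e[attr], []).append(e)
    remainder = sum(len(s) / total * entropy(s) for s in subsets.values())
    return entropy(examples) - remainder

def id3(examples, attributes):
    labels = {e["label"] for e in examples}
    if len(labels) == 1 or not attributes:
        # All examples share a label (or nothing left to split on): make a leaf.
        return Counter(e["label"] for e in examples).most_common(1)[0][0]
    # 1. select the best attribute for the new node at this level
    best = max(attributes, key=lambda a: information_gain(examples, a))
    # 2. partition the examples by the values of that attribute and recurse
    node = {"attribute": best, "children": {}}
    remaining = [a for a in attributes if a != best]
    subsets = {}
    for e in examples:
        subsets.setdefault(e[best], []).append(e)
    for value, subset in subsets.items():
        node["children"][value] = id3(subset, remaining)
    return node
```
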
8
Q

How to select best attribute to construct best tree?

A

random; least values; most values; max gain

9
Q

Information Value

A

Given a set S of size |S|, the expected work required to determine a specific element is log2|S|

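A quick numeric illustration (an assumed example): identifying one element of an 8-element set takes log2(8) = 3 yes/no questions, e.g. by repeatedly halving the set.

```python
import math

# Expected work (in bits) to identify one element of a set S via yes/no questions.
S = list(range(8))
print(math.log2(len(S)))  # 3.0 -> three binary "is it in this half?" questions suffice
```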
10
Q

Entropy Interpretation

A

The number of yes/no questions (in bits) needed on average to determine the value of Y in a random drawing

11
Q

Entropy H(Y)

A

H measures the information content (in bits) associated with a set of examples; its minimum value is 0

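A small sketch of the standard formula H(Y) = -Σ p(y) log2 p(y), computed from the class proportions of a set of labels (the explicit formula is assumed from the usual definition; the card states it only in words). The two prints also illustrate the balanced and homogeneous cases in the cards below.

```python
import math
from collections import Counter

def entropy(labels):
    # H(Y) = -sum over classes of p * log2(p), from label frequencies.
    counts = Counter(labels)
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())

print(entropy(["+", "+", "-", "-"]))  # 1.0 bit: perfectly balanced
print(entropy(["+", "+", "+", "+"]))  # 0.0 bits: perfectly homogeneous
```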
12
Q

bit

A

the information needed to answer a yes/no question; a real-valued scalar

13
Q

Perfect balance (maximum inhomogeneity)

A

high entropy - Y is from a nearly uniform distribution

14
Q

Perfect homogeneity

A

low entropy - Y is from a highly non-uniform (peaks/valleys) distribution

15
Q

max value of H is

A

log2(c), where c is the number of classes

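A tiny check of this bound (assumed numbers): a uniform distribution over c = 4 classes gives H = log2(4) = 2 bits.

```python
import math

c = 4
p = 1 / c
# Uniform distribution over c classes: H = -sum p*log2(p) = log2(c).
H = sum(-p * math.log2(p) for _ in range(c))
print(H, math.log2(c))  # both 2.0
```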
16
Q

How is entropy related to tree size?

A

small entropy = small tree size

17
Q

conditional entropy H(Y|X)

A

weighted sum of the entropy of each subset of the examples partitioned by the possible values of attribute X; equivalently, the weighted sum of the entropy at each child node generated by X

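A sketch of that weighted sum, assuming the examples at the node are given as (attribute value, label) pairs (a hypothetical representation):

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    counts = Counter(labels)
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())

def conditional_entropy(pairs):
    # H(Y|X): weight each subset's entropy by the fraction of examples
    # that take that value of X (i.e. that reach that child node).
    total = len(pairs)
    by_value = defaultdict(list)
    for x, y in pairs:
        by_value[x].append(y)
    return sum(len(ys) / total * entropy(ys) for ys in by_value.values())

data = [("sunny", "no"), ("sunny", "no"), ("overcast", "yes"), ("rain", "yes")]
print(conditional_entropy(data))  # 0.0: every child node is pure
```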
18
Q

What does conditional entropy measure

A

the total impurity, disorder, or inhomogeneity at all the child nodes

19
Q

Information gain

A

measures the difference between the entropy at a node and the entropy remaining after the node’s examples are “split” among the children using a chosen attribute X, i.e. I(Y;X) = H(Y) - H(Y|X); choose the attribute that maximizes I(Y;X)

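A small worked example of I(Y;X) = H(Y) - H(Y|X), using assumed numbers: four examples split evenly between two classes, and a split on an attribute that makes every child pure.

```python
import math

# Four examples with labels [no, no, yes, yes]: H(Y) = 1.0 bit before the split.
H_Y = -(0.5 * math.log2(0.5) + 0.5 * math.log2(0.5))

# Splitting on an attribute whose children are all pure leaves H(Y|X) = 0,
# so this attribute achieves the maximum possible gain at this node.
H_Y_given_X = 0.0
print(H_Y - H_Y_given_X)  # 1.0 bit of information gain
```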
20
Q

Why is high information gain desirable?

A

Means more of the examples in each child node share the same class; the decision trees rooted at each child that are needed to differentiate between the classes are likely to be small

21
Q

The best attribute for a node is the attribute with

A

maximum information gain (equivalently, minimum conditional entropy)