Decision Tree Flashcards

1
Q

Entropy

A

A measure of disorder that can be applied to a set.
Disorder corresponds to how mixed (impure) the segment is with respect to the properties of interest.

2
Q

Entropy formula

A

entropy = −p1 × log2(p1) − p2 × log2(p2) − ⋯ (summed over the proportion p_i of each class)
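As a sketch, the formula can be computed directly from a list of class proportions (assuming log base 2, the usual convention for measuring entropy in bits):

```python
import math

def entropy(proportions):
    """Entropy of a set, given the proportion of each class: -sum(p * log2(p))."""
    # Terms with p == 0 contribute nothing and are skipped to avoid log2(0).
    return -sum(p * math.log2(p) for p in proportions if p > 0)

print(entropy([1.0]))       # 0.0 (a pure set has no disorder)
print(entropy([0.5, 0.5]))  # 1.0 (a 50/50 two-class mix is maximally impure)
```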

3
Q

Information Gain measures

A

The change in entropy due to new information being added (e.g., splitting a parent set on an attribute).

4
Q

Information Gain formula

A

IG(parent, children) = entropy(parent) − [p(c1) × entropy(c1) + p(c2) × entropy(c2) + ⋯ + p(ck) × entropy(ck)]
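A minimal sketch of this formula, representing the parent and each child partition as lists of class labels (the function names here are illustrative, not from the card):

```python
import math
from collections import Counter

def entropy_of(labels):
    """Entropy of a partition given as a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """IG = entropy(parent) minus the weighted sum of child entropies,
    where each child is weighted by p(ci), its share of the parent's examples."""
    n = len(parent)
    weighted = sum(len(c) / n * entropy_of(c) for c in children)
    return entropy_of(parent) - weighted

# A split that separates the classes perfectly recovers all of the entropy:
print(information_gain(['y', 'y', 'n', 'n'], [['y', 'y'], ['n', 'n']]))  # 1.0
```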

5
Q

Classification trees

A

Each interior node in the tree contains a test of an attribute, with each branch from the node representing a distinct value of the attribute.

6
Q

Decision tree - Basic algorithm (a greedy algorithm)

A

Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain).
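One greedy selection step can be sketched as follows; the toy data, attribute names, and helper names are made up for illustration:

```python
import math
from collections import Counter

def entropy_of(labels):
    """Entropy of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def best_attribute(rows, labels, attributes):
    """Greedy step: pick the attribute whose split maximizes information gain."""
    def gain(attr):
        groups = {}  # attribute value -> labels of rows with that value
        for row, label in zip(rows, labels):
            groups.setdefault(row[attr], []).append(label)
        weighted = sum(len(g) / len(labels) * entropy_of(g) for g in groups.values())
        return entropy_of(labels) - weighted
    return max(attributes, key=gain)

# Hypothetical data: 'outlook' separates the labels perfectly, 'windy' does not.
rows = [{'outlook': 'sun',  'windy': 'y'},
        {'outlook': 'sun',  'windy': 'n'},
        {'outlook': 'rain', 'windy': 'y'},
        {'outlook': 'rain', 'windy': 'n'}]
labels = ['play', 'play', 'stay', 'stay']
print(best_attribute(rows, labels, ['outlook', 'windy']))  # outlook
```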

7
Q

When does decision tree growth stop?

A

  • There are no remaining attributes for further partitioning.
  • All samples for a given node belong to the same class.

8
Q

Information Gain drawback

A

biased towards multivalued attributes

9
Q

Gain ratio drawback

A

tends to prefer unbalanced splits in which one partition is much smaller than the others

10
Q

Gini Index drawbacks

A
  • biased towards multivalued attributes
  • has difficulty when the number of classes is large
  • tends to favor tests that result in equal-sized partitions and purity in both partitions
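The Gini index formula itself is not stated on this card; as a reference for the drawbacks above, a sketch of the standard definition (gini = 1 − Σ p_i²):

```python
from collections import Counter

def gini(labels):
    """Gini index: 1 - sum(p_i ** 2) over the proportion p_i of each class.
    Zero for a pure node; largest when classes are evenly mixed."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

print(gini(['a', 'a', 'a']))  # 0.0 (pure partition)
print(gini(['a', 'b']))       # 0.5 (maximally mixed two-class partition)
```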