4.1 Decision trees Flashcards

1
Q

What are the stopping criteria in a decision tree?

A

The bigger a decision tree grows, the more complex it becomes. One way to limit this complexity is to stop growing the tree early, rather than letting every branch run to completion, using one or more stopping criteria (a small check along these lines is sketched below). These criteria can include the following:

· The tree has reached a preset maximum depth

· Further splitting no longer reduces impurity (e.g. the node is already pure)

· The information gain of the best available split is very small
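For illustration, here is a minimal Python sketch of such a check; the function name, parameters and default thresholds are assumptions for the example, not values from the cards.

# Hypothetical stopping check applied before splitting a node further.
def should_stop(depth, labels, best_gain, max_depth=5, min_gain=1e-3):
    if depth >= max_depth:         # preset maximum depth reached
        return True
    if len(set(labels)) <= 1:      # node is pure: impurity cannot be reduced further
        return True
    if best_gain < min_gain:       # best split offers only a tiny information gain
        return True
    return False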

2
Q

What is information gain?

A

Information gain is simply the expected reduction in entropy caused by partitioning all our examples according to a given attribute.

3
Q

How is information gain calculated?

A

In the information gain formula, S refers to the entire set of training examples, A is the attribute we want to split on, |S| is the number of examples, and |Sv| is the number of examples for which attribute A takes the value v.
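The formula being described is the standard information gain definition:

Gain(S, A) = H(S) − Σ_{v ∈ Values(A)} (|Sv| / |S|) · H(Sv)

where H(·) is entropy (see card 13) and the sum runs over the values v of attribute A; the sum itself is the mean information of the split (card 14). A minimal Python sketch of this calculation (the function and variable names are illustrative, not from the cards):

from collections import Counter
from math import log2

def entropy(labels):
    # Entropy H of a list of class labels, in bits.
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(examples, labels, attribute):
    # Gain(S, A): entropy of the whole set minus the mean information
    # of the subsets Sv obtained by splitting on `attribute`.
    # `examples` is a list of dicts (attribute name -> value),
    # `labels` is the corresponding list of class labels.
    n = len(labels)
    subsets = {}
    for example, label in zip(examples, labels):
        subsets.setdefault(example[attribute], []).append(label)
    mean_info = sum(len(sv) / n * entropy(sv) for sv in subsets.values())
    return entropy(labels) - mean_info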

4
Q

How is mean information calculated?

A
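Mean information (see card 14) is the weighted average of the entropies of the child nodes produced by the split:

MeanInfo(S, A) = Σ_{v ∈ Values(A)} (|Sv| / |S|) · H(Sv)

where Sv is the subset of examples for which attribute A takes the value v. Information gain (card 3) is then H(S) − MeanInfo(S, A).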
5
Q

What is ID3

A

The ID3 algorithm begins with the original set S as the root node. On each iteration, it iterates through every unused attribute of the set S and calculates the entropy of the split induced by that attribute (equivalently, the information gain IG(S, A)). It then selects the attribute with the smallest entropy (largest information gain). The set S is then split (partitioned) by the selected attribute to produce subsets of the data. (For example, a node can be split into child nodes based upon the subsets of the population whose ages are less than 50, between 50 and 100, and greater than 100.) The algorithm continues to recurse on each subset, considering only attributes never selected before.
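A compact Python sketch of this procedure; the nested-dict tree representation, function names, and majority-class fallback are illustrative assumptions, not part of the original description.

from collections import Counter
from math import log2

def _entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def id3(examples, labels, attributes):
    # `examples`: list of dicts (attribute name -> value); `labels`: class labels;
    # `attributes`: names of attributes not yet used on this branch.
    # Returns a class label (leaf) or {"attribute": name, "branches": {value: subtree}}.
    if len(set(labels)) == 1:
        return labels[0]                              # pure node: stop
    if not attributes:
        return Counter(labels).most_common(1)[0][0]   # no attributes left: majority class

    # Choose the attribute whose split has the smallest mean information
    # (equivalently, the largest information gain).
    def mean_info(attr):
        groups = {}
        for ex, lab in zip(examples, labels):
            groups.setdefault(ex[attr], []).append(lab)
        n = len(labels)
        return sum(len(g) / n * _entropy(g) for g in groups.values())

    best = min(attributes, key=mean_info)

    # Partition on the chosen attribute and recurse, never reusing it below.
    partitions = {}
    for ex, lab in zip(examples, labels):
        partitions.setdefault(ex[best], []).append((ex, lab))
    remaining = [a for a in attributes if a != best]
    tree = {"attribute": best, "branches": {}}
    for value, pairs in partitions.items():
        tree["branches"][value] = id3([ex for ex, _ in pairs],
                                      [lab for _, lab in pairs],
                                      remaining)
    return tree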

6
Q

How will the decision tree shown below classify the test instance Outlook=rainy, Temp=mild, Humidity=normal, Windy=true? Which node will be used to determine the class?

A

The first decision is based on Outlook, and the algorithm takes the rightmost branch (rainy). The next decision is based on Windy, and the algorithm takes the left branch (true) to end up at node C.

7
Q

Can a decision tree classify an entire training set with zero errors?

A

A decision tree can classify an entire training set correctly so long as there are no instances which have exactly the same attribute values but different classes. If every instance has a unique combination of attribute values, this is guaranteed not to happen. Classifying an entire training set correctly may require a very large tree!
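As a quick sketch of this point, assuming scikit-learn is available (the toy data below is made up for illustration and is not the dataset from these cards):

from sklearn.tree import DecisionTreeClassifier

# Every row is unique, so an unpruned tree can separate the training set perfectly.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 1, 0]

clf = DecisionTreeClassifier()   # no max_depth: keep splitting until leaves are pure
clf.fit(X, y)
print(clf.score(X, y))           # expected output: 1.0 (zero training errors)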

8
Q

What is the maximum possible height of a decision tree?


A

The longest possible branch in a decision tree would consider every attribute exactly once before making a final classification decision. So the maximum possible height of a tree is the same as the number of attributes.

9
Q

For the next two questions, consider the following dataset, which has four attributes (Outlook, Temperature, Humidity, Windy). The class to predict is Play. Which attribute has the highest information gain?

A

Outlook has information gain = 1 - 0 = 1.

It can perfectly predict the class (if only because every instance in the dataset has a different Outlook value).

The next best attribute is Humidity, which has information gain = 1 - 0.8113 = 0.1887.

Temperature and Windy each have information gain = 1 - 1 = 0.

10
Q

Which attribute has the highest gain ratio?

A

Outlook has gain ratio 1 / 3 = 0.33, while Humidity has gain ratio 0.1887 / 1 = 0.1887. Temperature and Windy each have an information gain of 0, so they have a gain ratio of 0. Outlook therefore still has the highest gain ratio.
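For reference, gain ratio (the normalisation of information gain used in C4.5) is:

GainRatio(S, A) = Gain(S, A) / SplitInfo(S, A), where
SplitInfo(S, A) = − Σ_{v ∈ Values(A)} (|Sv| / |S|) · log2(|Sv| / |S|)

An attribute such as Outlook, which splits the data into many small subsets, is penalised by a large SplitInfo (3 here), whereas Humidity's SplitInfo is 1.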

11
Q

Which of the following statements are true of decision trees? Select all which are true.

A
  • Different branches of a tree may use different attributes to arrive at a final decision.

Decision trees can handle missing values: generally, if the algorithm hits a missing attribute value, it will continue down all branches and average or aggregate the resulting decisions. Continuous attributes can be included in a decision tree by discretising them or by splitting on a threshold. Some types of decision trees (e.g., oblivious decision trees) require every branch to use the same attributes, but this is not required in general. The leaf nodes of a decision tree may include instances with different classes.

12
Q

Which of the following criteria may be used to terminate a branch in a decision tree? (Select all criteria which may be used.)

A
  • The information gain drops below a threshold
  • All attributes have been used
  • All of the instances at this node have the same class

The decision tree will stop branching once all attributes have been used or the information gain drops below a threshold. When all instances at a node have the same class, the mean information is zero and no further branching can yield any additional information gain, so this case is already covered by the case where the information gain drops below a threshold. Information gain can never drop below zero, so that cannot be used as a stopping condition.

13
Q

What is the formula for calculating entropy?

A

• The entropy of a discrete random event x with possible states 1, …, n is:

H(x) = − Σ_{i=1..n} P(x = i) · log2 P(x = i)
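A quick numeric check of this formula in Python (the example distributions are illustrative, not taken from the cards):

from math import log2

def entropy(probs):
    # H(x) = -sum_i P(x=i) * log2 P(x=i); zero-probability states contribute nothing.
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))    # 1.0 bit: an even two-class split
print(entropy([0.75, 0.25]))  # ~0.8113 bits: a 3-to-1 split
print(entropy([1.0]))         # 0.0 bits: a pure node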

14
Q

What is Mean Information?

A

Weighted average of the entropy over the children after the split.

15
Q

Briefly explain, in at most two sentences, the basic logic behind the “ID3” algorithmic approach toward building decision trees. This should be focussed on labelling the nodes and leaves; you do not have to explain edge cases.

A

ID3 labels each internal node with the attribute that gives the largest information gain (equivalently, the smallest mean information) over the examples reaching that node, and adds one branch per value of that attribute. A node becomes a leaf, labelled with a class (the majority class of its examples), when all of its examples share one class or no unused attributes remain.