Decision Trees and ID3 Algorithm Flashcards
Decision Tree Learning
A practical inductive inference (supervised learning) method.
Finds a Boolean function of the attributes.
Decision trees can be extended to functions with more than two output values
Widely used. Robust to noise
Can handle OR and AND expressions
Completely expressive hypothesis space
Easily interpretable.
Object Attribute Value (OAV) Data
Objects have attributes, and attributes have values. For example: a car has a door, and the door can be red, etc.
Usually stored as labelled CSV, JSON, or XML.
One of the attributes, usually the first, is the object ID.
Attributes can be numeric, categorical, or Boolean.
Many Kaggle datasets are OAV data in CSV or JSON format.
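To make this concrete, here is a minimal sketch of a hypothetical labelled OAV dataset for cars, represented as a Python list of dicts; the attribute names and values are invented for illustration.

```python
# Hypothetical labelled OAV data: each dict is one object (a car),
# each key is an attribute, and each value is that attribute's value.
# "ObjectID" is the ID attribute; "Desirable" is the class label.
cars = [
    {"ObjectID": 1, "Colour": "red",   "Doors": 2, "Desirable": "Yes"},
    {"ObjectID": 2, "Colour": "green", "Doors": 4, "Desirable": "No"},
    {"ObjectID": 3, "Colour": "red",   "Doors": 4, "Desirable": "Yes"},
]
```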
Concept
A concept is a subset of all possible OAV tuples of an object. The number of possible tuples is the number of all attribute-value combinations. A class label is used to define the concept, e.g. which cars are desirable and which are not.
One-Hot Encoding
Scheme where categorical variables are encoded as sparse Boolean vectors. This helps ML algorithms because the encodings are high-dimensional vectors rather than scalars or symbols.
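A minimal sketch of one-hot encoding in plain Python, with a hypothetical Colour attribute; library encoders (e.g. scikit-learn's OneHotEncoder) do the same thing at scale.

```python
# One-hot encode a categorical value into a sparse Boolean vector.
# The attribute ("Colour") and its categories are invented for illustration.
def one_hot(value, categories):
    return [1 if value == c else 0 for c in categories]

colours = ["red", "green", "blue"]
print(one_hot("red", colours))    # [1, 0, 0]
print(one_hot("blue", colours))   # [0, 0, 1]
```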
Classification
The process of adding a new attribute, the class label, to OAV data. We start with a labelled OAV dataset, which has a class label column. Find a model for the class attribute as a function of the values of the other attributes. Using the learned model, previously unseen records should be assigned to classes as accurately as possible.
What is a Decision Tree
A predictive model in the form of an upside-down tree. Each node (except the leaves) represents an attribute of a given object. Each edge represents a value of the parent attribute. Each leaf is a class label. Boolean attribute values are not usually shown: left means yes and right means no.
The tree itself forms a hypothesis:
Disjunctions (ORs) of conjunctions (ANDs)
Each path from the root to a leaf forms a conjunction of constraints on attributes (aka a rule)
Separate branches are disjunctions
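For illustration, the PlayTennis tree from Mitchell's book corresponds to the hypothesis:
(Outlook = Sunny ∧ Humidity = Normal) ∨ (Outlook = Overcast) ∨ (Outlook = Rain ∧ Wind = Weak)
Each parenthesised conjunction is one root-to-leaf path (a rule), and the paths are combined by disjunction.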
Naive Decision Tree Construction
Given a labelled dataset, we must construct a decision tree that segments the dataset into subsets which are either all labelled Yes or all labelled No. Each subset is associated with a rule: a path from the root to a leaf. Ideally we should create the shortest possible tree, since shorter trees are more general. Constructing the shortest possible decision tree requires brute-force search.
For any given dataset there are a combinatorial number of decision trees that are consistent with the training set.
Types of problems decision tree learning is good for:
Instances represented by OAV data
Attributes take on a small number of discrete values (As in Mitchell’s book)
Can be extended to real-valued attributes
Target function has discrete output values
Assume Boolean functions
Can be extended to use multiple output values
Decision Tree Uses
Hypothesis space can include disjunctive expressions. In fact, the hypothesis space is the complete space of finite discrete-valued functions.
Robust to imperfect training data: Classification errors, errors in attribute values, missing attribute values.
Ex. Medical diagnosis
Decision Tree Construction
Given a labelled OAV training set, how does one create a decision tree for the class label? If the training set is small, this can probably be done by visual inspection.
Decision trees are usually not unique. Given a labelled OAV training set, there may be many trees that are consistent with the training examples.
Which Decision Trees do we choose?
The set of all decision trees that are consistent with (fit) a labelled OAV training set is called the Hypothesis Space. Occam’s razor says we should choose the simplest one. It has been shown that the shortest decision tree is the best (more general).
How do we find the shortest decision tree?
Not by inspection, unless the tree is very small. Use the ID3 algorithm (Iterative Dichotomiser 3). It finds a short tree, but not necessarily the shortest, as that requires exhaustive brute-force search.
ID3 Algorithm Definition
Top-down, greedy search through the space of possible decision trees. Decision trees represent hypotheses, so this is a search through hypothesis space. As you proceed down, choose an attribute for each successive node. No backtracking. Goes from root to leaves.
What is Greedy Search?
At each step, make the decision which gives the greatest improvement in what you are trying to optimize. Don't backtrack unless you hit a dead end. Not likely to find the global optimum, but generally works well.
At each node, decide which attribute best classifies the training data at that point. Do this for each branch of the tree; the end result will be a tree structure representing the hypothesis which works best for the training data.
ID3 Algorithm (How it works)
Finds a shallow tree consistent with the training data. Uses two important concepts:
Entropy
Information Gain
Both used in order to determine which attribute is the best class discriminator. Choosing the best discriminator makes the tree shorter, which generalizes better.
ID3 has better time complexity but C4.5 is better at finding a shorter tree.
Entropy, used via information gain, determines which attribute best classifies the data.
Information Gain
Statistical quantity measuring how well an attribute classifies the data. Calculate the information gain for each attribute. Choose attribute with greatest information gain.
Entropy
Entropy measures information content of a random process:
Takes on largest value when events are equi-probable.
Takes on smallest value when only one event has a non-zero probability.
Computing Entropy
H(S) = -p(+) log2(p(+)) - p(-)log2(p(-))
Example: to find the entropy of a class of 50 students, 30 girls and 20 boys, let girls be positive and boys be negative, then substitute the corresponding proportions, p(+) = 30/50 and p(-) = 20/50, into the equation.
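A minimal sketch of this calculation in Python, using the proportions from the example above:

```python
# Boolean entropy H(S) = -p(+)log2(p(+)) - p(-)log2(p(-)).
# 0*log2(0) is treated as 0 so that a pure set has zero entropy.
from math import log2

def entropy(p_pos, p_neg):
    return -sum(p * log2(p) for p in (p_pos, p_neg) if p > 0)

p_girls, p_boys = 30 / 50, 20 / 50   # girls positive, boys negative
print(entropy(p_girls, p_boys))      # about 0.971 bits
```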
Computing Information Gain
Gain(S, Ai) = H(S) - Σ P(Ai=v) H(Sv), where the sum runs over the values v of attribute Ai and Sv is the subset of S for which Ai = v.
The attribute with the highest information gain is usually chosen as the splitting attribute at each node in the decision tree. The goal is to maximize information gain at each decision node, thereby incrementally reducing uncertainty or entropy and making the data subsets at the leaves of the tree as pure as possible.
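A minimal sketch of Gain(S, A) for a labelled OAV dataset stored as a list of dicts; the attribute and class-label names are placeholders, not tied to any particular dataset.

```python
from collections import Counter
from math import log2

def entropy(labels):
    # H(S) computed from the class-label proportions in S.
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attribute, label="Class"):
    # Gain(S, A) = H(S) - sum over values v of P(A=v) * H(S_v).
    n = len(rows)
    total = entropy([r[label] for r in rows])
    remainder = 0.0
    for v in {r[attribute] for r in rows}:
        subset = [r[label] for r in rows if r[attribute] == v]
        remainder += (len(subset) / n) * entropy(subset)
    return total - remainder
```

The splitting attribute at a node is then the one with the largest information_gain over the examples that reach that node.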
ID3 for Boolean Valued Class Labels
Calculate the entropy once for the full training set, using its positive and negative subsets. Determine which single attribute best classifies the training examples using information gain: for each attribute, find the information gain, then use the attribute with the greatest information gain as the root. The function p() denotes the proportion of positive (or negative) examples.
ID3 Choosing the Root Node
Loop over all attributes and compute the information gain for each one (using the entropy of the resulting subsets). Choose the attribute with the greatest information gain as the root.
ID3 Restrictions
Attributes are excluded from consideration if they already appear higher in the tree (on the same path).
The process continues at each new leaf node until either every attribute has already been included along the path through the tree, or the training examples associated with that leaf all have the same target attribute value.
ID3 Considerations
If there are contradictions in the data, we always take the majority vote. This handles noisy data. Attributes are eliminated once they are assigned to a node and are never reconsidered on the same root-to-leaf path.
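Tying the cards together, here is a minimal recursive ID3 sketch under the same assumptions as the earlier sketches (dataset as a list of dicts, discrete attribute and class values); the helper names are invented, and this is an illustration rather than a reference implementation.

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain(rows, attr, label):
    # Gain(S, A) = H(S) - sum over values v of P(A=v) * H(S_v).
    n = len(rows)
    remainder = 0.0
    for v in {r[attr] for r in rows}:
        subset = [r[label] for r in rows if r[attr] == v]
        remainder += (len(subset) / n) * entropy(subset)
    return entropy([r[label] for r in rows]) - remainder

def id3(rows, attributes, label="Class"):
    # e.g. id3(cars, ["Colour", "Doors"], label="Desirable") with the
    # hypothetical cars data sketched earlier.
    labels = [r[label] for r in rows]
    if len(set(labels)) == 1:                 # pure node: a single class label
        return labels[0]
    if not attributes:                        # contradictions/noise: majority vote
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: gain(rows, a, label))
    remaining = [a for a in attributes if a != best]   # never reused on this path
    return {best: {v: id3([r for r in rows if r[best] == v], remaining, label)
                   for v in {r[best] for r in rows}}}
```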