classification (decision trees) Flashcards by Zcjanin Ollesca

What is the goal of decision trees?

To create a model that predicts the value of a target variable by learning simple decision rules inferred from features

How well did you know this?

Not at all

Perfectly

What are decision trees?

A non-parametric supervised learning method used for classification

How well did you know this?

Not at all

Perfectly

What are benefits of decision trees?

don’t need as much data and gives rules that it learned
no underlying assumptions
handles multidimensional data
achieves good accuracy

How well did you know this?

Not at all

Perfectly

What is the splitting criterion?

Tells us which attribute to test at node N by determining the “best” way to separate or partition tuples into individual classes

How well did you know this?

Not at all

Perfectly

What does it mean for a partition to be pure?

If all tuples belong to the same class

How well did you know this?

Not at all

Perfectly

What does information gain do?

Minimizes the info needed to classify tuples in resulting partitions and reflects least randomness or “impurity”

How well did you know this?

Not at all

Perfectly

(T/F): Info gain guarantees that a simple tree is found

True

How well did you know this?

Not at all

Perfectly

What does Gain(A) tell us?

Tells us how much would be gained by branching on A. It is the difference between original info required and new info required.

How well did you know this?

Not at all

Perfectly

How can you determine the “best” split point for A if it is CONTINUOUS-VALUED?

sort of values of A in increasing order
consider midpoint as split point ( ( ai + ai + 1 ) / 2 )

How well did you know this?

Not at all

Perfectly

(T/F): For info gain, pick the feature and split that MAXIMIZES weighted entropy

False; pick the feature and split that MINIMIZES weighted entropy

How well did you know this?

Not at all

Perfectly

What kind of DT is info gain used in?

ID3

How well did you know this?

Not at all

Perfectly

What kind of DT is the Gini index used in?

CART

How well did you know this?

Not at all

Perfectly

What is Gini Index?

An alternative measure of impurity of class labels among a particular set of tuples

How well did you know this?

Not at all

Perfectly

How many possible ways are there to form 2 partitions of the data based on a binary split on A?

2^v - 2 ways

How well did you know this?

Not at all

Perfectly

(T/F): We want to MINIMIZE the gini index

True

How well did you know this?

Not at all

Perfectly

How do you pick the splitting subset for DISCRETE-VALUED attribute using Gini index?

Study These Flashcards

Subset that gives minimum Gini index for that attribute

How do you pick the splitting subset for CONTINUOUS-VALUED attribute using Gini index?

Study These Flashcards

Consider each possible split-point
midpoint bw each pair of sorted adjacent values is taken as a possible split point

What is the range of the Gini index for a BINARY label?

Study These Flashcards

[0, 0.50]

What does maximum purity mean?

Study These Flashcards

Every instance has the same label (in the binary case)

What does maximum impurity mean?

Study These Flashcards

50% of data are one label and 50% are another

Explain the bias of info gain.

Study These Flashcards

Biased towards multivalued attributes

Explain the bias of the gain ratio.

Study These Flashcards

adjusts for bias, but prefers unbalanced splits in which one partition is much smaller than others

Explain the bias of Gini index.

Study These Flashcards

biased towards multivalued attributes and has difficulty when the number of classes is large. It also favors tests that result in equal size partitions and purity in both partitions.

What are the 3 partitioning scenarios?

Study These Flashcards

1) A is discrete-valued
2) A is continuous-valued
3) A is discrete-valued and a binary tree

Why is pruning performed?

To address overfitting

What is pre-pruning?

"Pruned" by halting its construction early

What are 2 examples of pre-pruning?

- max depth - min leaf size

What does controlling for min leaf size do?

The higher the leaf size, the less overfit it will get because it won't create rules past a certain threshold. This means it could miss out on more detailed rules that capture something more generalizable.

What is post-pruning?

Removes rules after decision tree algorithm has completed.

Explain the cost-complexity ratio

This refers to post-pruning where we are trying to minimize an increase in error from pruning while also minimizing the number of rules used (less rules, less complex)

classification (decision trees) Flashcards

(30 cards)