Tree Construction Flashcards
(12 cards)
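What is entropy and how is it measured?
Entropy is the amount of information, in bits, needed to specify the class of an instance in a set. It is computed from the class proportions: entropy(p1, ..., pn) = -p1 log2(p1) - ... - pn log2(pn)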
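To make the formula concrete, here is a minimal Python sketch. The function name `entropy` and the example label counts are illustrative assumptions, not part of the cards.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy, in bits, of the class distribution in a list of labels."""
    total = len(labels)
    # Sum -p * log2(p) over the proportion p of each class.
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())

pure = entropy(["yes"] * 4)               # one class only: no uncertainty
even = entropy(["yes", "no"])             # 50/50 split: one full bit
mixed = entropy(["yes"] * 9 + ["no"] * 5) # hypothetical 9-to-5 split
print(pure, even, round(mixed, 3))
```

A pure set has entropy 0 and an evenly split two-class set has entropy 1 bit; everything else falls in between.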
What is information gain?
A value that increases with the average ‘purity’ of the subsets an attribute produces
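How is information gain measured?
As a reduction in entropy: the entropy before the split minus the weighted average entropy of the subsets after the split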
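Information gain can be computed as the entropy before a split minus the weighted average entropy of the resulting subsets. The sketch below assumes that definition; the function names and the class counts in the example are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy, in bits, of the class distribution in a list of labels."""
    total = len(labels)
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(parent, subsets):
    """Entropy of `parent` minus the size-weighted entropy of `subsets`."""
    total = len(parent)
    after = sum(len(s) / total * entropy(s) for s in subsets)
    return entropy(parent) - after

# Hypothetical split of 14 instances into three subsets.
subsets = [
    ["yes"] * 2 + ["no"] * 3,
    ["yes"] * 4,
    ["yes"] * 3 + ["no"] * 2,
]
parent = [label for subset in subsets for label in subset]
print(round(information_gain(parent, subsets), 3))
```

The second subset is pure, so it contributes nothing to the entropy after the split, which is what raises the gain.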
What are the steps to recursively construct a tree using the divide and conquer method?
- Choose an attribute for the root node
- Create a branch for each possible value of that attribute
- Split the instances into subsets, one per branch
- Repeat recursively for each branch until all instances in a subset have the same class value
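The steps above can be sketched in Python roughly as follows. This is a simplified illustration, not a full learner: instances are assumed to be dicts, the names `build_tree`, `split` and `gain` are hypothetical, branches are created only for values that actually occur in the data, and attributes are ranked by information gain.

```python
from collections import Counter
from math import log2

def entropy(labels):
    total = len(labels)
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())

def split(rows, attr):
    """Group rows by their value for `attr` (one subset per branch)."""
    subsets = {}
    for row in rows:
        subsets.setdefault(row[attr], []).append(row)
    return subsets

def gain(rows, attr, target):
    before = entropy([r[target] for r in rows])
    after = sum(len(s) / len(rows) * entropy([r[target] for r in s])
                for s in split(rows, attr).values())
    return before - after

def build_tree(rows, attrs, target):
    labels = [r[target] for r in rows]
    # Stop when the subset is pure or no attributes are left to split on;
    # the leaf then predicts the most common class.
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: gain(rows, a, target))
    remaining = [a for a in attrs if a != best]
    # One branch per observed value, each built recursively.
    return {best: {value: build_tree(subset, remaining, target)
                   for value, subset in split(rows, best).items()}}

# Hypothetical toy data: "windy" alone determines the class.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]
tree = build_tree(rows, ["outlook", "windy"], "play")
print(tree)
```

The tree is a nested dict: an inner node maps an attribute name to its branches, and a leaf is just a class label.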
How do we choose the best attribute?
Choose the attribute with the highest information gain, i.e. the one that produces the purest subsets
Do all leaves need to be ‘pure’?
No. Sometimes identical instances have different classes, so some leaves cannot be made pure
When does splitting stop?
When the data cannot be split any further
What are highly branching attributes?
Attributes with a large number of values (for example, an ID code). The subsets they produce are more likely to be pure, so information gain is biased toward choosing them
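A small Python sketch of why this matters, assuming attributes are ranked by information gain. The rows and the attribute names (`id`, `windy`, `play`) are made up for illustration: the `id` attribute has a different value for every instance, so each subset holds a single instance and is trivially pure.

```python
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    total = len(labels)
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())

def gain(rows, attr, target="play"):
    """Information gain of splitting `rows` on `attr`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attr]].append(row[target])
    after = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([r[target] for r in rows]) - after

rows = [
    {"id": "a", "windy": "no", "play": "yes"},
    {"id": "b", "windy": "no", "play": "no"},
    {"id": "c", "windy": "yes", "play": "yes"},
    {"id": "d", "windy": "yes", "play": "no"},
]
id_gain = gain(rows, "id")        # every subset has one instance
windy_gain = gain(rows, "windy")  # both subsets are still 50/50
print(id_gain, windy_gain)
```

The ID-like attribute gets the maximum possible gain even though it says nothing useful about unseen instances.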
What are ensembles of trees?
A collection of different trees that vote on the classification
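Voting can be sketched as a simple majority count. In this illustration each "tree" is assumed to be any callable that maps an instance to a class label; `ensemble_predict` and the stub trees are hypothetical names.

```python
from collections import Counter

def ensemble_predict(trees, instance):
    """Return the class predicted by the most trees (simple majority vote)."""
    votes = Counter(tree(instance) for tree in trees)
    return votes.most_common(1)[0][0]

# Stand-in "trees": fixed rules used only to demonstrate the vote.
trees = [
    lambda x: "yes" if x["windy"] == "no" else "no",
    lambda x: "yes",
    lambda x: "no",
]
calm = ensemble_predict(trees, {"windy": "no"})    # votes: yes, yes, no
gusty = ensemble_predict(trees, {"windy": "yes"})  # votes: no, yes, no
print(calm, gusty)
```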
What is the bagging method?
Vary the input data: build each tree on a different resampled version of the training set
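One common way to change the input data is bootstrap sampling: draw as many instances as the training set holds, with replacement, once per tree. A minimal sketch, where `bootstrap_sample` and the seed are illustrative assumptions:

```python
import random

def bootstrap_sample(rows, rng):
    """Draw len(rows) instances from `rows` with replacement."""
    return rng.choices(rows, k=len(rows))

rows = list(range(10))   # stand-in for ten training instances
rng = random.Random(0)   # fixed seed so the run is repeatable
samples = [bootstrap_sample(rows, rng) for _ in range(3)]  # one per tree
for sample in samples:
    print(sorted(sample))
```

Each sample has the same size as the original data, but some instances typically appear more than once and others not at all, so each tree sees a slightly different training set.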
What is the randomization method?
Introduce randomness into tree construction through semi-random split selection
What is the boosting method?
Build trees sequentially, with each new tree focusing on the mistakes of the previous ones
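The card does not name a specific algorithm. As one possible illustration of "focusing on mistakes", the sketch below uses an AdaBoost.M1-style weight update: correctly classified instances have their weights scaled down by e / (1 - e), where e is the weighted error of the current tree, and the weights are then renormalized. The function name `reweight` is hypothetical.

```python
def reweight(weights, correct):
    """Scale down weights of correctly classified instances, then normalize.

    `weights` are the current instance weights; `correct[i]` is True if the
    current tree classified instance i correctly.
    """
    total = sum(weights)
    error = sum(w for w, ok in zip(weights, correct) if not ok) / total
    if not 0 < error < 0.5:
        raise ValueError("this update assumes a weighted error in (0, 0.5)")
    factor = error / (1 - error)
    updated = [w * factor if ok else w for w, ok in zip(weights, correct)]
    norm = sum(updated)
    return [w / norm for w in updated]

# Four equally weighted instances; the tree gets the last one wrong.
new_weights = reweight([0.25] * 4, [True, True, True, False])
print([round(w, 3) for w in new_weights])
```

After the update the misclassified instance carries half of the total weight, so the next tree is pushed to get it right.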