Chapter 4 Flashcards
(80 cards)
What’s the most common splitting criterion?
information gain
What’s the role of Decision Trees?
Create a formula/algorithm that evaluates how well each attribute splits a set of examples into segments, with respect to a chosen target variable
To what does disorder correspond?
to how mixed (impure) the segment is with respect to the values of the attribute of interest
Formula of Entropy
entropy = -p1 log(p1) - p2 log(p2) - ...
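The entropy formula can be tried out with a short Python sketch. The function name `entropy` and the choice of log base 2 (entropy measured in bits) are assumptions for illustration; the card itself does not state a base.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels, using log base 2 (an assumption)."""
    total = len(labels)
    if total == 0:
        return 0.0
    result = 0.0
    for count in Counter(labels).values():
        p = count / total            # p_i: share of value i within the set
        result -= p * math.log2(p)   # values with p_i = 0 never show up in the counter
    return result

print(entropy(["yes"] * 5 + ["no"] * 5))  # evenly mixed set -> 1.0
print(entropy(["yes"] * 10))              # pure set -> 0.0
```

A perfectly mixed two-class set gives the maximum entropy of 1 bit, and a pure set gives 0.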
Define Pi
probability of value i within the set (relative percentage/share)
When is Pi = 1?
when all members of the set have attribute i
When is Pi = 0?
when no members of the set have attribute i
What is the parent set?
the original set of examples
What does an attribute do?
It segments a set of instances into k subsets.
What are the k children sets?
The result of splitting on the attribute values.
What does information gain measure?
- how much an attribute improves (decreases) entropy
- change in entropy due to new info added
Formula IG(parent)
IG(parent) = entropy(parent) - p(c1) entropy(c1) - p(c2) entropy(c2) - ...
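The information gain formula can be sketched in Python as follows. The names `entropy` and `information_gain`, the log base 2, and the toy label lists are assumptions for illustration; p(c) is taken to be the share of parent instances that fall into child c.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels, using log base 2 (an assumption)."""
    total = len(labels)
    if total == 0:
        return 0.0
    result = 0.0
    for count in Counter(labels).values():
        p = count / total
        result -= p * math.log2(p)
    return result

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the children."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

# Hypothetical target labels for a parent set and two candidate splits.
parent = ["yes"] * 5 + ["no"] * 5
perfect_split = [["yes"] * 5, ["no"] * 5]
useless_split = [["yes", "no"] * 2 + ["yes"], ["yes", "no"] * 2 + ["no"]]

print(information_gain(parent, perfect_split))  # children are pure, so all entropy is removed
print(information_gain(parent, useless_split))  # children are nearly as mixed as the parent
```

The split whose children are purest removes the most entropy and therefore has the highest information gain.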
Formula Entropy (HS = square)
Formula Entropy (HS = circle)
Formula IG = entropy (Write-off)..
What reduces entropy substantially?
splitting the parent data set by the body shape attribute
- select attribute that reduces entropy the most
How do you find the best attribute to partition the sets?
recursively apply attribute selection
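Recursive attribute selection can be sketched as a small ID3-style tree builder. This is a minimal illustration under stated assumptions, not a full implementation: the function names, the dictionary-based tree, the log base 2, and the four toy rows (attributes `body` and `head`, target `write_off`) are all invented for the example.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a non-empty list of class labels (log base 2, an assumption)."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def split(rows, attribute):
    """Group rows into children sets, one per value of the attribute."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row)
    return groups

def information_gain(rows, attribute, target):
    parent = [row[target] for row in rows]
    children = split(rows, attribute).values()
    weighted = sum(
        len(child) / len(rows) * entropy([row[target] for row in child])
        for child in children
    )
    return entropy(parent) - weighted

def build_tree(rows, attributes, target):
    labels = [row[target] for row in rows]
    # Stop when the segment is pure or no attributes remain; return the majority label.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Select the attribute that reduces entropy the most, then recurse on each child set.
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    remaining = [a for a in attributes if a != best]
    return {
        best: {
            value: build_tree(subset, remaining, target)
            for value, subset in split(rows, best).items()
        }
    }

# Hypothetical toy data, invented for illustration only.
rows = [
    {"body": "rectangle", "head": "square", "write_off": "yes"},
    {"body": "rectangle", "head": "circle", "write_off": "yes"},
    {"body": "oval", "head": "square", "write_off": "no"},
    {"body": "oval", "head": "circle", "write_off": "no"},
]
tree = build_tree(rows, ["body", "head"], "write_off")
print(tree)
```

In this toy data `body` separates the target perfectly while `head` tells us nothing, so the tree splits once on `body` and stops.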
Disadvantages of ID3
- tends to prefer splits that result in large numbers of small but pure partitions
- overfitting, less generalization capacity
- cannot handle numeric values, missing values
List the key terms of ANNs (artificial neural networks)
- neurons
- nucleus
- dendrite
- axon
- synapse
Define neurons
cells (processing elements) of a biological or artificial neural network
Define the nucleus
the central processing portion of a neuron
Define the dendrite
the part of a biological neuron that provides inputs to the cell
Define the axon
an outgoing connection (i.e., terminal) from a biological neuron
Define synapse
the connection (where the weights are) between processing elements in a neural network
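As a rough illustration of how these terms map onto an artificial processing element: the inputs play the role of dendrites, the weights sit on the connections (synapses), the summation and activation stand in for the nucleus, and the returned value is what travels along the axon. The function name `neuron_output`, the bias term, and the sigmoid activation are assumptions for this sketch and are not taken from the cards.

```python
import math

def neuron_output(inputs, weights, bias=0.0):
    """One artificial neuron: weighted sum of inputs passed through a sigmoid."""
    # Each input is scaled by the weight on its connection (the "synapse").
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Sigmoid activation (an assumed choice) squashes the sum into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-total))

print(neuron_output([0.0, 0.0], [0.5, -0.5]))  # zero weighted sum -> 0.5
print(neuron_output([1.0, 1.0], [2.0, 2.0]))   # strongly positive sum, output close to 1
```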