Classification and regression Flashcards
(100 cards)
classification
predicts discrete class labels
example of classification
labelling emails as spam or ham
decision tree classifier
flowchart-like structure in which each internal node represents a "test" on an attribute (e.g. whether a coin flip comes up heads or tails), each branch represents an outcome of the test, and each leaf node represents a class label; a tree-like model that makes decisions by splitting data into subsets based on feature values, creating branches that lead to outcomes (class labels)
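a minimal sketch (not from the cards) of that flowchart idea as nested attribute tests in Python; the email features contains_link and sender_known are made-up examples, not part of the deck:

def classify_email(email):
    # each "if" is a test on one attribute (an internal node);
    # each return is a leaf carrying a class label
    if email["contains_link"]:
        if not email["sender_known"]:
            return "spam"
        return "ham"
    return "ham"

print(classify_email({"contains_link": True, "sender_known": False}))  # spam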
decision tree makes a sequence
of partitions of the training data, one attribute at a time
probability in classification
probability helps determine the likelihood of each class label given a set of features
relates to confidence in predictions
ordering in classification
attributes are selected and split based on a measure like information gain, creating an order of importance for the features
entropy
entropy is a measure of uncertainty or disorder in a system
info entropy in classification
entropy measures how hard it is to guess the label of a randomly drawn sample from the dataset
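a small Python sketch of the entropy calculation over class labels; the helper name entropy and the spam/ham examples are ours, not from the cards:

from collections import Counter
from math import log2

def entropy(labels):
    # entropy = -sum over classes of p * log2(p), where p is the class proportion
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

print(entropy(["spam", "spam", "spam"]))        # 0 bits (prints -0.0): one class, easy to guess
print(entropy(["spam", "ham", "spam", "ham"]))  # 1.0 bits: 50/50 mix, hardest to guess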
choose the split with ___ entropy as ___
lowest
as the data labels are more uniform, so it's easier to guess
how is entropy used in data splits for decision trees?
decision trees use information gain, based on entropy, to decide the best feature to split the data on at each node
entropy is calculated before and after the split to determine how well a feature divides the data into pure subsets
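a hedged sketch of that before/after comparison as information gain for a nominal attribute; it reuses the entropy helper sketched above, and rows is assumed to be a list of dicts keyed by attribute name:

def information_gain(rows, labels, attribute):
    # entropy before the split minus the weighted entropy of the subsets after it
    before = entropy(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    after = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return before - after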
3 steps of entropy and data splits
1) partition the examples recursively by choosing one attribute at a time
2) choose the attribute that separates the classes of the training examples best
3) choose a goodness function (info gain, gain ratio, gini index)
3 attribute types
nominal (categorical values with no order like animal, food)
ordinal (categorical values that have order like hot, warm, cold)
numerical
how do you handle a numerical attribute in a decision tree? and what are 3 ways to do it?
convert it to a nominal attribute
1) assign a category to the numerical values and keep trying until you find a good split
2) use the entropy value until you find the best split point (sketched below)
3) frequency binning
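a sketch of way 2, searching for the best split point on a numerical attribute by weighted entropy; using midpoints between sorted values as candidate thresholds is an assumption, and it reuses the entropy helper above:

def best_numeric_split(values, labels):
    # try each midpoint threshold and keep the one whose two subsets have the lowest weighted entropy
    pairs = sorted(zip(values, labels))
    best_threshold, best_score = None, float("inf")
    for i in range(1, len(pairs)):
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for v, lab in pairs if v <= threshold]
        right = [lab for v, lab in pairs if v > threshold]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold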
attribute resulting in ____ info gain is selected for split
highest
process of splitting the decision tree by attributes is continued recursively ____
building the tree by splitting the data using the features that minimise uncertainty at each step
Th is the
entropy threshold
What is the purpose of Th
criterion for deciding when to stop splitting the data at a node or to continue
When entropy of a node is below Th?
If the entropy of a node is below a certain threshold, it means that the data at that node is sufficiently pure (i.e., it mostly contains examples of one class). As a result, the decision tree can stop splitting further at that node, and the node is labeled with the majority class
When entropy of a node is above Th?
If the entropy is above the threshold, it indicates that the data at the node is still impure, meaning there’s a mix of different class labels. In this case, the decision tree continues splitting by choosing the attribute that reduces entropy the most (maximizing information gain)
only use Th=0 when
the example is really simple (the data can reach perfectly pure nodes)
Th=0, Th>0
=0 requires perfect order (pure nodes)
>0 can tolerate some mixed labels
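a sketch of the recursive splitting loop with Th as the stopping rule, following the two cards above: below the threshold the node becomes a majority-class leaf, above it the attribute with the highest information gain is chosen; it reuses the entropy and information_gain helpers sketched earlier, and the default th=0.2 is an arbitrary example value:

from collections import Counter

def build_tree(rows, labels, attributes, th=0.2):
    # stop: node is pure enough (entropy below Th) or no attributes left -> majority-class leaf
    if entropy(labels) <= th or not attributes:
        return Counter(labels).most_common(1)[0][0]
    # continue: split on the attribute with the highest information gain
    best = max(attributes, key=lambda a: information_gain(rows, labels, a))
    node = {}
    for value in set(row[best] for row in rows):
        subset = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        sub_rows, sub_labels = zip(*subset)
        node[(best, value)] = build_tree(list(sub_rows), list(sub_labels),
                                         [a for a in attributes if a != best], th)
    return node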
avoid overfitting by using 1), 2) and 3)
1) entropy threshold
2) pruning
3) limiting the depth of the tree
gain ratio formula
gain ratio(A) = information gain(A) / split info(A), where split info(A) = −Σ (|Dj|/|D|) log2(|Dj|/|D|) is the entropy of the partition that splitting on A produces (attributes with many values get a large denominator)
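a sketch of gain ratio with the split-info denominator described above; it reuses the information_gain helper, and reading the card's "#A x A entropy" shorthand as split information is our interpretation, not something stated on the cards:

from collections import Counter
from math import log2

def split_info(rows, attribute):
    # entropy of the partition sizes produced by splitting on the attribute
    counts = Counter(row[attribute] for row in rows)
    n = sum(counts.values())
    return -sum((c / n) * log2(c / n) for c in counts.values())

def gain_ratio(rows, labels, attribute):
    si = split_info(rows, attribute)
    return information_gain(rows, labels, attribute) / si if si > 0 else 0.0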
want big or small gain ratio and why?
big; the attribute with the highest gain ratio is selected, and the split-info denominator penalises attributes that would overfit the model with many small, specific splits