Decision Tree Modeling Flashcards
(97 cards)
What is tree-based learning? What does it do and how?
Tree-based learning is a type of
- supervised machine learning
- that performs classification and regression tasks.
- It uses a decision tree as a predictive model to go from observations about an item (represented by the branches) to conclusions about the item's target value (represented by the leaves).
Ensemble Learning
A technique that enables you to use multiple decision trees simultaneously to produce very powerful models.
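A minimal sketch of the ensemble idea: several trees each make a prediction for the same sample, and the class predictions are combined by majority vote. The fruit labels are hypothetical, and this is the aggregation step only, not a full ensemble implementation.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine class predictions from several trees by majority vote."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical predictions from three trees for one sample:
votes = ["apple", "orange", "apple"]
print(majority_vote(votes))  # apple
```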
What’s the benefit of hyperparameter tuning?
Knowing how and when to tune a model can significantly increase its performance.
What is a Decision Tree?
- non-parametric supervised learning algorithm (not based on assumptions about distribution)
- for classification and regression tasks
- It has a hierarchical tree structure consisting of a root node, branches, internal nodes, and leaf nodes.
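The hierarchical structure above can be sketched as a nested dictionary: decision nodes hold a test, leaf nodes hold a prediction, and prediction follows branches from the root to a leaf. The fruit features here are made up for illustration; this is not how any library represents trees internally.

```python
# A toy decision tree: decision nodes hold a test, leaf nodes a prediction.
tree = {
    "question": ("diameter", 3.0),        # root node: is diameter < 3.0?
    "yes": {"predict": "mandarin"},       # leaf node
    "no": {                               # internal decision node
        "question": ("color", "yellow"),  # is color == "yellow"?
        "yes": {"predict": "lemon"},      # leaf node
        "no": {"predict": "orange"},      # leaf node
    },
}

def predict(node, sample):
    """Follow branches from the root until a leaf node is reached."""
    while "predict" not in node:
        feature, value = node["question"]
        if isinstance(value, (int, float)):
            node = node["yes"] if sample[feature] < value else node["no"]
        else:
            node = node["yes"] if sample[feature] == value else node["no"]
    return node["predict"]

print(predict(tree, {"diameter": 5.0, "color": "orange"}))  # orange
```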
How do data professionals use decision trees?
to make predictions about future events based on the information that is currently available.
Decision Tree PROs
- require no assumptions on data’s distribution
- handle collinearity easily.
- require little preprocessing to prepare data for training
Decision Tree CONs
- susceptible to overfitting.
- sensitive to variations in the training data.
The model might get extremely good at predicting the data it was trained on, but as soon as new data is introduced, it may not perform nearly as well.
What happens at each node?
A decision is made at each node.
Edges
The edges connect the nodes, directing the flow from one node to the next along the tree.
What is a Root Node?
- It’s the first node in the tree
- all decisions needed to make the prediction will stem from it
- It’s a special type of decision node because it has no predecessors.
What is a Decision Node?
- All the nodes above the leaf nodes.
- The nodes where a decision is made
- They always point to leaf nodes or other decision nodes within the tree.
Leaf Node
- where a final prediction is made.
- The whole process ends here as they do not split anymore
What are Child Nodes?
- Any node that results from a split.
- The nodes that are pointed to: either leaf nodes or other decision nodes.
What are Parent Nodes?
The node that a child node splits from.
What prediction outcomes types can decision tree be used for?
- classification: where a specific class or outcome is predicted
- regression: where a continuous variable is predicted—like the price of a car.
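The two outcome types differ in what a leaf node predicts: the majority class for classification, and typically the mean of the target values for regression. A toy sketch with made-up labels and prices:

```python
from collections import Counter

def classify_leaf(labels):
    """Classification leaf: predict the most common class."""
    return Counter(labels).most_common(1)[0][0]

def regress_leaf(values):
    """Regression leaf: predict the mean of the target values."""
    return sum(values) / len(values)

print(classify_leaf(["orange", "orange", "lemon"]))  # orange
print(regress_leaf([21000, 19000, 20000]))           # 20000.0 (e.g., car prices)
```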
What is the criteria to split a Decision node?
A decision node is split on the criterion that minimizes the impurity of the classes in the resulting child nodes.
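This criterion can be sketched as follows: for each candidate split, score the impurity of each child weighted by its share of the samples, and keep the split with the lowest score. Gini impurity is used here as the impurity measure; the labels are hypothetical.

```python
def gini(labels):
    """Gini impurity of a set of labels: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def weighted_impurity(left, right):
    """Impurity of a split: child impurities weighted by child size."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# A perfect split has zero impurity in both children:
print(weighted_impurity(["a", "a"], ["b", "b"]))  # 0.0
# A poor split leaves both children equally mixed:
print(weighted_impurity(["a", "b"], ["a", "b"]))  # 0.5
```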
What is Impurity?
- the degree of mixture with respect to class.
- A perfect split would have no impurity in the resulting child nodes; it would partition the data with each child containing only a single class.
Name 4 metrics to determine impurity
- Gini impurity
- entropy
- information gain
- log loss
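Two of these metrics are closely related: information gain is the reduction in entropy achieved by a split. A toy sketch (entropy in bits, working from class counts, which is an assumption of this example):

```python
from math import log2

def entropy(counts):
    """Entropy from class counts: -sum of p_i * log2(p_i)."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def information_gain(parent, children):
    """Entropy of the parent minus the weighted entropy of the children."""
    n = sum(parent)
    weighted = sum(sum(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

# Splitting a 50/50 parent into two pure children gains a full bit:
print(information_gain([2, 2], [[2, 0], [0, 2]]))  # 1.0
```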
What’s the requirement for choosing split points?
- identify what type of variable it is—categorical or continuous
- the range of values that exist for that variable
Choosing split for categorical predictor variable
consider splitting based on the categories of the variable, e.g., color.
Choosing split for continuous predictor variable
splits can be made anywhere along the range of numbers that exist in the data
E.g., sorting the fruit based on diameter: 2.25, 2.75, 3.25, 3.75, 5, and 6.5 centimeters.
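For a continuous variable like diameter, candidate split points are commonly taken midway between consecutive sorted values (a common convention in CART-style trees, assumed here):

```python
diameters = [2.25, 2.75, 3.25, 3.75, 5, 6.5]  # already sorted

# Candidate split points midway between consecutive values:
candidates = [(a + b) / 2 for a, b in zip(diameters, diameters[1:])]
print(candidates)  # [2.5, 3.0, 3.5, 4.375, 5.75]
```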
Describe Gini impurity score
- most straightforward
- the best scores are those closest to 0
- The worst score is 0.5 (for binary classification), which occurs when each child node contains an equal number of each class.
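The best and worst scores can be checked directly from the Gini formula, 1 minus the sum of squared class proportions (computed from class counts in this sketch):

```python
def gini(class_counts):
    """Gini impurity from class counts: 1 - sum of squared proportions."""
    total = sum(class_counts)
    return 1 - sum((c / total) ** 2 for c in class_counts)

print(gini([4, 0]))  # 0.0 -> pure node: the best possible score
print(gini([2, 2]))  # 0.5 -> equal mix of two classes: the worst score
```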
Classification trees PROs
- Require few pre-processing steps.
- Can work with all types of variables (continuous, categorical, discrete).
- No normalization or scaling required
- Decisions are transparent.
- Not affected by extreme univariate values
Name 2 disadvantages of classification trees
- Can be computationally expensive relative to other algorithms.
- Sensitive to data changes: small changes in data can result in significant changes in predictions.