7. Tree-based algorithms Flashcards

1
Q

Purpose of tree-based algorithms

A

Prediction

(applicable to both regression and classification)

2
Q

Impurity measures

A

Used to determine the quality of a split:
- Gini index
- Entropy
- Re-substitution (misclassification) error
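
A minimal NumPy sketch of the three measures, computed from a node's vector of class proportions (function names are my own):

```python
import numpy as np

def gini(p):
    """Gini index: sum_k p_k * (1 - p_k); 0 for a pure node."""
    p = np.asarray(p, dtype=float)
    return float(np.sum(p * (1 - p)))

def entropy(p):
    """Entropy: -sum_k p_k * log2(p_k), skipping zero proportions."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def resubstitution_error(p):
    """Misclassification (re-substitution) error: 1 - max_k p_k."""
    return float(1 - np.max(p))

# A 50/50 node is maximally impure; a pure node scores 0 on all three.
print(gini([0.5, 0.5]), entropy([0.5, 0.5]), resubstitution_error([0.5, 0.5]))
print(gini([1.0, 0.0]), entropy([1.0, 0.0]), resubstitution_error([1.0, 0.0]))
```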

3
Q

Explain the decision tree algorithm

A

(0. Pre-processing, e.g. binarization of the response; otherwise use a regression tree instead of a classification tree)
1. Recursive binary splitting; determine the splits via impurity measures
2. Improve with cost-complexity pruning → grow a large tree and prune it back
( Select the tuning parameter via (k-fold) cross-validation )
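
A sketch of steps 1 and 2 using scikit-learn's cost-complexity pruning (the ccp_alpha parameter); the dataset here is just a stand-in:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Step 1: grow a large tree and read off the candidate pruning levels.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)

# Step 2: pick the pruning parameter alpha by k-fold cross-validation (k=5).
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"ccp_alpha": path.ccp_alphas},
    cv=5,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```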

4
Q

Decision Tree advantages

A
  1. Low pre-processing effort (normalization and scaling not required)
  2. Missing values have little effect
  3. Easy to explain and interpret (closely mimic human decision-making)
  4. Handle qualitative predictors without needing to create dummy variables
     (work with qualitative, quantitative, continuous, and discrete variables)
  5. Faster than random forests (RF)
5
Q

Decision Tree disadvantages

A
  1. Lower predictive accuracy (vs. other regression / classification approaches)
     → improved by aggregation, at the cost of interpretability and speed
  2. Risk of overfitting (unlike bagging/RF)
  3. Instability: small changes in the data can change the tree substantially
  4. Can become quite complex (expensive)
6
Q

Effect of Bagging, Boosting, RF

A

+ Increased predictive accuracy (lower variance)

  • Lower interpretability
  • Lower speed (higher complexity)

RF: adds random predictor selection to bagging to produce decorrelated trees → less risk of overfitting
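
In scikit-learn terms the difference is a single parameter (a sketch, assuming the library's usual conventions):

```python
from sklearn.ensemble import RandomForestClassifier

# Bagged trees: every split may consider all p predictors -> trees stay correlated.
bagged_trees = RandomForestClassifier(n_estimators=500, max_features=None)

# Random forest: each split considers only a random subset (~sqrt(p)),
# which decorrelates the trees and lowers variance further.
random_forest = RandomForestClassifier(n_estimators=500, max_features="sqrt")
```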

7
Q

Explain Bagging

A
  • Many weak learners are trained, one per bootstrap sample.
  • Regression: take the mean of these estimates over the collection of bootstrap samples.
  • Classification: the overall prediction is the most commonly occurring class among the trees' predictions (majority vote).
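
A from-scratch sketch of bagging for classification (the function name bagged_predict is mine; classes are assumed to be coded 0..K-1):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagged_predict(X_train, y_train, X_test, n_trees=100, seed=0):
    """Fit one tree per bootstrap sample, then take the majority vote."""
    rng = np.random.default_rng(seed)
    votes = []
    for _ in range(n_trees):
        # Bootstrap sample: n draws with replacement from the training set.
        idx = rng.integers(0, len(X_train), size=len(X_train))
        tree = DecisionTreeClassifier().fit(X_train[idx], y_train[idx])
        votes.append(tree.predict(X_test))
    votes = np.stack(votes)  # shape: (n_trees, n_test)
    # Majority vote per test point (for regression: np.mean(votes, axis=0)).
    return np.array([np.bincount(col).argmax() for col in votes.T])
```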
8
Q

Explain Boosting

A

An ensemble of learners trained sequentially: each new model is fit, via gradient boosting, to the observations that previous iterations modelled poorly.

( Unlike bagging, boosting does not draw bootstrap samples; each tree is fit to a modified version of the full training set, e.g. the current residuals )
+ Improves performance by learning slowly (primarily reduces bias, in contrast to bagging's variance reduction)
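
A minimal sketch of squared-error gradient boosting for regression, showing how each tree targets what earlier rounds modelled poorly (gradient_boost is a hypothetical helper name):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost(X, y, n_rounds=100, learning_rate=0.1, max_depth=2):
    """Each round fits a small tree to the current residuals."""
    prediction = np.zeros(len(y), dtype=float)
    trees = []
    for _ in range(n_rounds):
        residuals = y - prediction  # the part modelled poorly so far
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residuals)
        prediction += learning_rate * tree.predict(X)  # small sequential correction
        trees.append(tree)
    # To predict new data: sum learning_rate * tree.predict(X_new) over all trees.
    return trees
```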

9
Q

Boosting vs. RF vs. Bagging

A

Boosting:
— Selects from all predictor variables at each split
— Sequential: each tree depends on the error (residuals) of the previous iteration
— 3 tuning parameters (number of trees B, shrinkage λ, tree depth d)
Benefit: learns from the previous iterations' errors

RF:
— Selects from a random subset of predictor variables at each split
— Trees are built independently at each iteration
— 2 tuning parameters (number of trees B, subset size m)

Bagging:
— Aggregation (mean / majority vote) of trees fit to bootstrap samples
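
In scikit-learn the tuning parameters map roughly as follows (a sketch; the settings are illustrative, not recommendations):

```python
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

# Boosting: number of trees B, shrinkage (learning rate) lambda, tree depth d.
boost = GradientBoostingClassifier(n_estimators=500, learning_rate=0.01, max_depth=2)

# RF: number of trees B and the size m of the random predictor subset per split.
rf = RandomForestClassifier(n_estimators=500, max_features="sqrt")
```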
