IFN580 Week 4: DT Modelling (11%) Flashcards

(21 cards)

1
Q

Which of the following is not an advantage of a decision tree:
a) Decision making is explainable.
b) Fast inference time.
c) Reasonable training time.
d) Can learn non-linear decision boundaries.
e) Can only handle a small number of features

A

e) Can only handle a small number of features

2
Q

True or false? Random forests can reduce variance (overfitting) through the use of bagging (bootstrap aggregating).

A

TRUE

3
Q

How does a boosting model (e.g. XGBoost) achieve greater performance over a single tree?

A

It fits a new tree to the residual errors of the previous model, and repeats this process multiple times so that each tree corrects the mistakes of the ensemble built so far.
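A rough sketch of this loop (assuming scikit-learn and NumPy; the function name boost and all hyperparameter values are illustrative, not from the course material):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost(X, y, n_rounds=50, learning_rate=0.1):
    # Illustrative boosting loop: each tree is fit to the current residuals.
    pred = np.full(len(y), y.mean())    # start from a constant prediction
    trees = []
    for _ in range(n_rounds):
        residuals = y - pred            # errors of the ensemble so far
        tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
        pred = pred + learning_rate * tree.predict(X)   # nudge predictions toward y
        trees.append(tree)
    return trees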

4
Q

What approach do Decision Trees belong to?

A

Supervised Learning

5
Q

What is the purpose of a Decision Tree?

A

To split the data based on ATTRIBUTES in order to classify/predict

6
Q

What is entropy?

A

A measure of impurity in a set of examples.

E(S) = -Σᵢ pᵢ log₂(pᵢ)

where pᵢ is the proportion of examples in S that belong to class i.
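A quick sketch of this formula in Python (NumPy assumed; class_counts is a hypothetical argument holding the per-class frequencies at a node):

import numpy as np

def entropy(class_counts):
    p = np.asarray(class_counts, dtype=float)
    p = p / p.sum()      # class proportions p_i
    p = p[p > 0]         # convention: 0 * log2(0) = 0
    return -np.sum(p * np.log2(p))

entropy([5, 5])    # 1.0 -> maximally impure two-class node
entropy([10, 0])   # 0.0 -> pure node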

7
Q

What is the root node?

A

The node at the top of the tree where the first split is made; the starting point.

8
Q

What is a decision node?

A

An internal node where the data is split on an attribute.

9
Q

What is a branch?

A

The path from one node to another

10
Q

What are leaf nodes?

A

Terminal nodes that contain the prediction

11
Q

What is the Gini formula?

A

Gini(S) = 1 - Σᵢ pᵢ²

where pᵢ is the proportion of examples in S that belong to class i.
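A companion to the entropy sketch above, under the same assumptions (NumPy; class_counts holds hypothetical per-class frequencies at a node):

import numpy as np

def gini(class_counts):
    p = np.asarray(class_counts, dtype=float)
    p = p / p.sum()      # class proportions p_i
    return 1.0 - np.sum(p ** 2)

gini([5, 5])    # 0.5 -> maximally impure two-class node
gini([10, 0])   # 0.0 -> pure node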

12
Q

What is Pruning?

A

A technique that removes branches which add little predictive value. It prevents overfitting and helps the model generalise.
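One concrete pruning mechanism, sketched with scikit-learn's cost-complexity pruning; the ccp_alpha value here is illustrative and would normally be tuned on validation data:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
unpruned = DecisionTreeClassifier(random_state=0).fit(X, y)
pruned = DecisionTreeClassifier(ccp_alpha=0.02, random_state=0).fit(X, y)
# The pruned tree keeps fewer nodes, trading training fit for generalisation.
print(unpruned.tree_.node_count, pruned.tree_.node_count)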

13
Q

A large tree depth results in:

A

Low bias, high variance, which leads to overfitting.

14
Q

What is a decision boundary?

A

The line that separates different classes/predicted values in the feature space

15
Q

What is a maximal tree?

A

A tree grown to its maximum possible depth. It has low bias and high variance, and is therefore overfitted.

16
Q

What is an optimal tree?

A

A tree whose size balances bias and variance, giving good generalisation to unseen data.

17
Q

What is ensemble modelling?

A

When multiple trees are combined to improve predictive performance.

18
Q

What is random forest?

A

The same as bagging, plus each node is split using only a random subset of the features.
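A minimal scikit-learn sketch; max_features controls the random feature subset tried at each split (all values shown are illustrative):

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(
    n_estimators=100,     # number of bagged trees
    max_features="sqrt",  # random subset of features per split
    random_state=0,
).fit(X, y)
print(forest.predict(X[:3]))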

19
Q

What is Bagging?

A

An ensemble method that trains multiple models on bootstrap samples (random subsets of the data drawn with replacement) and combines their predictions.
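Sketched with scikit-learn's BaggingClassifier (the estimator keyword is the name used in recent scikit-learn versions; hyperparameters are illustrative):

from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
bag = BaggingClassifier(
    estimator=DecisionTreeClassifier(),  # the model being bagged
    n_estimators=50,                     # number of bootstrap models
    bootstrap=True,                      # sample with replacement
    random_state=0,
).fit(X, y)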

20
Q

What is boosting?

A

When multiple trees are built sequentially, each one attempting to fix the errors of the previous one (e.g. XGBoost).
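Assuming the xgboost package is installed, a typical usage sketch (the hyperparameters shown are illustrative, not tuned values):

from sklearn.datasets import load_iris
from xgboost import XGBClassifier

X, y = load_iris(return_X_y=True)
model = XGBClassifier(n_estimators=100, learning_rate=0.1, max_depth=3)
model.fit(X, y)              # trees are added sequentially,
print(model.predict(X[:3]))  # each correcting the previous ones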

21
Q

What is early stopping?

A

A pre-pruning technique that stops tree growth early, for example by setting a minimum size for the tree leaves or a maximum depth.
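A sketch of pre-pruning in scikit-learn, where growth is limited while the tree is being built (both thresholds are illustrative):

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(
    max_depth=3,         # stop splitting beyond depth 3
    min_samples_leaf=5,  # every leaf must keep at least 5 samples
).fit(X, y)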