Chp 5 Decision Tree Flashcards

(56 cards)

1
Q

Greedy Strategy

A

Split the records based on an attribute test that optimizes a certain criterion

2
Q

Binary Split

A

Divides values into two subsets

3
Q

Multi-way split

A

Use as many partitions as there are distinct values

4
Q

How to specify continuous attributes

A

Sort the attribute values
Create candidate split positions at the halfway points between adjacent values
Compute the Gini index for each candidate split and pick the best (see the sketch below)

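A minimal Python sketch of the procedure in the card above (the toy data, NumPy usage, and function names are illustrative assumptions, not from the cards):

```python
import numpy as np

def gini(labels):
    """Gini index of a set of class labels: 1 - sum(p_i^2)."""
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_continuous_split(values, labels):
    """Sort the attribute, take halfway points as candidate splits,
    and pick the one with the lowest weighted Gini index."""
    order = np.argsort(values)
    values, labels = np.asarray(values)[order], np.asarray(labels)[order]
    best = (None, float("inf"))
    for i in range(len(values) - 1):
        if values[i] == values[i + 1]:
            continue                                    # no split between equal values
        threshold = (values[i] + values[i + 1]) / 2     # halfway point
        left, right = labels[: i + 1], labels[i + 1 :]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if weighted < best[1]:
            best = (threshold, weighted)
    return best  # (threshold, weighted Gini) -- lower is better

# Example: a continuous attribute with binary class labels
print(best_continuous_split([60, 70, 75, 85, 90, 95, 100, 120, 125, 220],
                            [0, 0, 0, 1, 1, 1, 0, 0, 0, 0]))
```
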
5
Q

The lower the gini index

A

the better it is

6
Q

Gini Index

A

Measures node impurity: Gini = 1 − Σ (p_i)², where p_i is the fraction of records of class i at the node; it is 0 for a pure node and largest when the classes are evenly mixed

7
Q

Entropy

A

Amount of uncertainty involved in a distribution

8
Q

In decision tree algorithms, entropy measures

A

Purity: entropy is 0 for a pure node and highest when the classes are evenly mixed

9
Q

Purity

A

The fraction of observations belonging to a particular class in a node

10
Q

Pure node

A

If all observations belong to the same class

11
Q

Information

A

Amount of certainty involved in a distribution

12
Q

We need to choose a split in a decision tree that maximizes

A

Information Gain

13
Q

Information Gain formula

A

Entropy before split - Entropy after split

14
Q

Steps to determine split?

A

Calculate the entropy at the root node
Calculate the information gain for each candidate attribute split
Pick the attribute split with the highest information gain (see the sketch below)

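A minimal Python sketch of the split-selection steps in the card above, for categorical attributes (the toy data and function names are illustrative assumptions):

```python
import numpy as np

def entropy(labels):
    """Entropy of a label array: -sum(p_i * log2 p_i)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(attribute, labels):
    """Entropy before the split minus weighted entropy after a multi-way split."""
    before = entropy(labels)
    after = 0.0
    for v in np.unique(attribute):
        subset = labels[attribute == v]
        after += len(subset) / len(labels) * entropy(subset)
    return before - after

# Pick the attribute with the highest information gain at the root
labels = np.array([1, 1, 1, 0, 0, 0, 1, 0])
attributes = {
    "outlook": np.array(["sun", "sun", "rain", "rain", "sun", "rain", "sun", "rain"]),
    "windy":   np.array([0, 1, 0, 1, 0, 1, 0, 1]),
}
gains = {name: information_gain(col, labels) for name, col in attributes.items()}
print(max(gains, key=gains.get), gains)
```
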
15
Q

Decision trees are non/parametric

A

Non-parametric, because no parametric form (fixed set of parameters) is specified in advance

16
Q

Time complexity of building a decision tree

A

O(log n) (on the order of the depth of a balanced tree)

17
Q

Does multicollinearity affect decision tree accuracy?

A

No, it just adds extra height to the tree (redundant splits)

18
Q

Finding the optimal decision tree is

A

Expensive: it is an NP-complete problem, which is why greedy heuristics are used in practice

19
Q

Decision trees can create

A

Rectilinear (axis-parallel) decision boundaries

20
Q

Decision trees cannot create

A

Oblique (diagonal) boundaries: its splits only produce straight, axis-parallel (up-down or left-right) lines

21
Q

Model =

A

Algorithm + hypothesis

22
Q

4 steps of selecting a model

A

Prepare the training data
Choose the hypothesis set and algorithm
Tune the algorithm
Train the model, fit it to out-of-sample data (the test set), and evaluate the results

23
Q

Goal of model selection

A

Select the best model from the training phase

24
Q

Two ways to evaluate models

A

Model Checking
Performance Estimation

25
Q

The best model is the one that gives you _ and _ well on testing set

A

Smallest prediction error
Generalizes

26
Q

Model Checking

A

Given a dataset, divide it into training and testing datasets

27
Q

What happens if you randomly select test points that are not representative of the population in general?

A

The performance estimate is misleading; cross-validation addresses this

28
Q

Cross Validation approaches (3)

A

Holdout
K-fold cross validation
Leave one out cross validation

29
Q

Cross validation

A

An approach to systematically create and evaluate multiple models on multiple subsets of the dataset

30
Q

Holdout method

A

Randomly split the data into 80% training data and 20% testing data

31
Q

k-fold cross validation

A

Split the data into k chunks, train on k-1 chunks, and test on the remaining chunk; repeat k times and average the error

32
Q

Leave one out cross-validation

A

Extreme version of k-fold where each chunk is a single observation (k = n chunks)

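A rough sketch of the three approaches from the cards above, assuming scikit-learn and synthetic data (neither is specified in the cards):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, cross_val_score, KFold, LeaveOneOut
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=100, random_state=0)
model = DecisionTreeClassifier(random_state=0)

# Holdout: one random 80/20 split
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
holdout_acc = model.fit(X_tr, y_tr).score(X_te, y_te)

# k-fold: train on k-1 chunks, test on the remaining chunk, average over k runs
kfold_acc = cross_val_score(model, X, y,
                            cv=KFold(n_splits=5, shuffle=True, random_state=0)).mean()

# Leave-one-out: k-fold with one observation per chunk (k = n)
loo_acc = cross_val_score(model, X, y, cv=LeaveOneOut()).mean()

print(holdout_acc, kfold_acc, loo_acc)
```
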
33
Q

Use of k-fold or LOOCV resampling methods is more robust if

A

The data is split into training, validation, and testing sets

34
Q

A typical application of holdout methods is to

A

Determine a stopping point with respect to error: stop training when the test-set error starts increasing (this indicates overfitting)

35
Q

Why split data into three parts?

A

If your model has hyperparameters, you can tune them on the validation dataset

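A minimal sketch of tuning a hyperparameter (tree depth here, as an assumed example) on a validation set, using scikit-learn and synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)
# Three-way split: 60% train, 20% validation, 20% test
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

# Tune the hyperparameter on the validation set...
best_depth = max(range(1, 11),
                 key=lambda d: DecisionTreeClassifier(max_depth=d, random_state=0)
                               .fit(X_train, y_train).score(X_val, y_val))

# ...then report final performance once on the untouched test set
final = DecisionTreeClassifier(max_depth=best_depth, random_state=0).fit(X_train, y_train)
print(best_depth, final.score(X_test, y_test))
```
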
36
Q

Two ways to do performance evaluation of a model

A

Confusion Matrix
Receiver Operating Characteristic (ROC) curve

37
Q

Confusion Matrix

A

A table of predicted vs. actual classes from which numerous performance metrics can be computed

38
Q

ROC curve

A

Characterizes the trade-off between positive hits (true positives) and false alarms (false positives)

39
Q

Can you minimize both FP and FN?

A

No, there is a trade-off between them

40
Q

What does the ROC curve plot?

A

The true positive rate (y-axis) against the false positive rate (x-axis)

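A tiny worked example of the two rates the ROC curve plots, computed from a made-up 2x2 confusion matrix (the counts are illustrative assumptions):

```python
# Counts from a hypothetical confusion matrix
TP, FN = 80, 20   # actual positives: correctly vs. incorrectly classified
FP, TN = 30, 70   # actual negatives: incorrectly vs. correctly classified

tpr = TP / (TP + FN)   # true positive rate (hit rate), y-axis of the ROC curve
fpr = FP / (FP + TN)   # false positive rate (false alarms), x-axis of the ROC curve
print(tpr, fpr)        # lowering FN (by moving the threshold) generally raises FP, and vice versa
```
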
41
Q

What does the middle (diagonal) line on an ROC plot mean?

A

A random guess (AUC = 0.5)

42
Q

AUC standards

A

0.5-0.6: fail
0.6-0.7: worthless
0.7-0.8: poor
0.8-0.9: good
>0.9: excellent

43
Q

Overfitting

A

The model picks up nuances (noise) in the training data and matches the training data too closely

44
Q

Underfitting

A

The model is too simple to capture the patterns in the data

45
Q

Prepruning

A

Halt the growth of the tree early based on some constraint
Pro: produces shorter trees
Con: hard to know when to stop

46
Q

Post pruning

A

Grow the tree to maximum size, then trim it back
Pro: gives better results
Con: wastes compute cycles

47
Q

How to prune

A

Focus on the complexity parameter (cp): keep splitting until cp reaches a certain value

48
Q

How to find cp

A

Use the cross-validation error: find where it starts to rise again after decreasing, and use the cp at that point

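A rough scikit-learn sketch of this idea; cp comes from rpart in R, and mapping it onto scikit-learn's cost-complexity parameter ccp_alpha is an assumption/analogy, not the exact rpart procedure:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)

# Candidate complexity values from the pruning path of a fully grown tree
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)
alphas = np.unique(path.ccp_alphas)

# Cross-validation error for each candidate; pick the value with the lowest error
# (i.e. just before the error curve starts rising again)
cv_error = [1 - cross_val_score(DecisionTreeClassifier(ccp_alpha=a, random_state=0),
                                X, y, cv=5).mean() for a in alphas]
best_alpha = alphas[int(np.argmin(cv_error))]

pruned_tree = DecisionTreeClassifier(ccp_alpha=best_alpha, random_state=0).fit(X, y)
print(best_alpha, pruned_tree.get_depth())
```
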
49
Q

What does pruning do

A

Gives a tree that generalizes better

50
Q

Most datasets have what type of class distribution?

A

Imbalanced

51
Q

Cost sensitive learning

A

Penalizes the model more heavily when it commits a false negative error (an error on the rare class)

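A minimal sketch of cost-sensitive learning via class weights in scikit-learn (the 10x weight and the synthetic data are illustrative assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Imbalanced data: class 1 is the rare class
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Make errors on the rare class (false negatives) 10x more costly than false positives
weighted = DecisionTreeClassifier(max_depth=3, class_weight={0: 1, 1: 10},
                                  random_state=0).fit(X_tr, y_tr)
plain = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)

# Recall on the rare class typically improves with the cost-sensitive weights
print(recall_score(y_te, plain.predict(X_te)), recall_score(y_te, weighted.predict(X_te)))
```
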
52
Q

Class-imbalance mitigation techniques (3)

A

Cost sensitive learning
Sampling techniques
Synthetic data

53
Q

Synthetic data

A

May be generated, if possible, to ensure that the class distribution is equivalent

54
Q

Sampling techniques

A

Modify the class distribution such that the rare class is well represented in the training set

55
Q

Undersampling + con

A

Uses fewer of the majority-class observations for training
Con: useful observations may not be part of the sample

56
Q

Oversampling + con

A

Gathers more of the minority-class observations for training
Con: if the training data is noisy, oversampling may amplify the noise

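A rough NumPy sketch of random under- and oversampling (synthetic labels only; real code would resample the feature rows together with the labels):

```python
import numpy as np

rng = np.random.default_rng(0)
y = np.array([0] * 950 + [1] * 50)          # imbalanced: class 1 is rare
idx_major, idx_minor = np.where(y == 0)[0], np.where(y == 1)[0]

# Undersampling: keep only as many majority observations as there are minority ones
# (con: useful majority observations may be dropped)
under_idx = np.concatenate([rng.choice(idx_major, size=len(idx_minor), replace=False),
                            idx_minor])

# Oversampling: replicate minority observations until the classes match
# (con: noisy minority records get amplified)
over_idx = np.concatenate([idx_major,
                           rng.choice(idx_minor, size=len(idx_major), replace=True)])

print(np.bincount(y[under_idx]), np.bincount(y[over_idx]))
```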