Decision Trees Flashcards

1
Q

arrange the data into predefined groups

A

classification

2
Q

What is the difference between clustering and classification?

A

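it depends on whether the categories are predefined: classification assigns records to predefined categories, while clustering forms groups that are not predefined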

3
Q

What are examples of classification?

A

classifying emails as “legit” or “spam”

4
Q

What are common classification algorithms?

A

decision tree analysis

6
Q

What is the process by which classification works?

A

1) choose classes and a set of classifying attributes
2) choose a set of records (randomly) for the training set
3) choose a set of records (randomly) for the test set
4) using the training set, create a model to predict the chosen class as a function of the other classifying attributes
5) evaluate the model using the test set
6) classify future records by applying the “best” model
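
The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy email records, the `has_link` attribute, and the helper names `split_records`, `train_one_rule`, and `accuracy` are all hypothetical, and the "model" is a deliberately tiny one-attribute rule rather than a full decision tree.

```python
import random
from collections import Counter, defaultdict

def split_records(records, test_fraction=0.3, seed=0):
    """Steps 2-3: randomly divide the records into a training set and a test set."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def train_one_rule(training, attribute, target):
    """Step 4: predict the majority class seen for each value of one attribute."""
    counts = defaultdict(Counter)
    for row in training:
        counts[row[attribute]][row[target]] += 1
    # Fallback for attribute values never seen in training: overall majority class.
    default = Counter(row[target] for row in training).most_common(1)[0][0]
    rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
    return lambda row: rules.get(row[attribute], default)

def accuracy(model, test, target):
    """Step 5: share of test records whose class the model predicts correctly."""
    return sum(model(row) == row[target] for row in test) / len(test)

# Step 1 (hypothetical toy data): class = "label", classifying attribute = "has_link".
records = [
    {"has_link": i % 2 == 0, "label": "spam" if i % 2 == 0 else "legit"}
    for i in range(20)
]
training, test = split_records(records)
model = train_one_rule(training, "has_link", "label")
print(accuracy(model, test, "label"))

# Step 6: classify a future record with the chosen model.
print(model({"has_link": True}))
```

Because the toy data is perfectly separable on `has_link`, the test accuracy here is trivially perfect; real data would not be this clean.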

7
Q

What attributes of families made it more likely they rented a certain class of apartment?

A

To answer this, we can run a classification algorithm over the training data
- the data has known outcomes: we know what class of apartment each family ended up renting
- the outcome can be displayed as a decision tree

8
Q

The classification of events, outcomes, things, etc.

A

Decision tree

9
Q

What is the first question of a decision tree called?

A

root node

10
Q

What is the second question of a decision tree called?

A

split or partition

11
Q

What is the third or final part of a decision tree called?

A

terminal node or leaf

12
Q

How do you create a decision tree using training data?

A

find a way to split (partition) the training dataset into smaller sub-groups

13
Q

How do you decide which attributes and rules to split on at each node (i.e., which partition is better than the others)?

A

Several algorithms exist. One approach: recursive partitioning

14
Q

What are the steps of recursive partitioning?

A
  1. Pick one of the predictor variables, Xi
  2. Pick a value of Xi (say, si) that divides the training data into two portions
  3. Measure how “pure” each of the resulting portions (subgroups) is
  4. Choose Xi and si to maximize the purity improvement in that one step
  5. Repeat the process for each of the subgroups until a predetermined number of subgroups is reached, or until improvements in purity become too small
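
Steps 1 to 4 amount to a search over every predictor and every candidate split value. The sketch below shows one such step, assuming Gini impurity as the purity measure (the cards do not say which measure the course uses) and using hypothetical apartment-rental data; `best_split` is an illustrative name, not a library function. Step 5 would call it again on each resulting subgroup.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0.0 for a pure group, 0.5 for an even two-class group."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(rows, labels):
    """Try every predictor X_i and value s_i; return (gain, i, s_i) with the largest impurity drop."""
    parent = gini(labels)
    best = None
    for i in range(len(rows[0])):                      # step 1: pick a predictor X_i
        for s in sorted({row[i] for row in rows}):     # step 2: pick a split value s_i
            left = [y for row, y in zip(rows, labels) if row[i] <= s]
            right = [y for row, y in zip(rows, labels) if row[i] > s]
            if not left or not right:
                continue
            # Step 3: size-weighted impurity of the two portions.
            weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            gain = parent - weighted
            # Step 4: keep the split with the largest purity improvement.
            if best is None or gain > best[0]:
                best = (gain, i, s)
    return best

# Hypothetical data: (income in $1000s, family size) -> class of apartment rented.
rows = [(30, 2), (35, 4), (60, 3), (80, 2), (90, 5), (40, 1)]
labels = ["basic", "basic", "luxury", "luxury", "luxury", "basic"]
gain, i, s = best_split(rows, labels)
print(i, s, gain)  # splits on predictor 0 (income) at 40
```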
15
Q

What does pure mean?

A

containing records of mostly one class

16
Q

What occurs at each split?

A

the algorithm maximizes the purity of the resulting nodes
- a node is 100% pure when all of its data belongs to a single class
- a node is 100% impure (0% pure) when its data is split evenly between the classes (50/50 for two classes)
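
To make the two extremes concrete, here is a small sketch that scores a pure node and a 50/50 node with two commonly used impurity measures, Gini impurity and entropy. Which measure the course uses is not stated on these cards, so both are shown as assumptions; the node contents are made up.

```python
from collections import Counter
from math import log2

def class_shares(labels):
    """Fraction of the node's records that fall in each class."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]

def gini(labels):
    return 1.0 - sum(p * p for p in class_shares(labels))

def entropy(labels):
    return sum(p * log2(1 / p) for p in class_shares(labels))

pure_node = ["spam"] * 6                    # all records in one class
even_node = ["spam"] * 3 + ["legit"] * 3    # split evenly 50/50

print(gini(pure_node), entropy(pure_node))  # 0.0 0.0
print(gini(even_node), entropy(even_node))  # 0.5 1.0
```

Both measures are zero for the pure node and reach their two-class maximum for the 50/50 node.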

17
Q

what is the objective of producing a decision tree verizon?

A

split the training population to maximize overall purity gain

18
Q

I(N) is the impurity of

A

node
- formula is on slide 19 of decision trees powerpoint

19
Q

What is overfitting?

A

when the model fits the training data so closely that it performs poorly on new data

20
Q

How can you reduce the risk of overfitting?

A

- pruning
- specifying a minimum bucket (node) size
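
The effect of a minimum bucket size can be seen on a tiny example. The sketch below is an assumption-laden illustration, not the course's algorithm: it grows a tree on one numeric predictor using Gini impurity and refuses any split that would create a node with fewer than `min_bucket` records. The data, with one deliberately noisy record at x = 4, and the names `grow` and `count_leaves` are hypothetical. Pruning is not shown.

```python
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def grow(points, min_bucket):
    """Grow a tree on (x, label) pairs; min_bucket must be at least 1."""
    labels = [y for _, y in points]
    best = None  # (gain, split value, left points, right points)
    for s in sorted({x for x, _ in points}):
        left = [p for p in points if p[0] <= s]
        right = [p for p in points if p[0] > s]
        # Minimum bucket size: skip splits that leave a node too small.
        if len(left) < min_bucket or len(right) < min_bucket:
            continue
        children = (len(left) * gini([y for _, y in left])
                    + len(right) * gini([y for _, y in right])) / len(points)
        gain = gini(labels) - children
        if gain > 1e-12 and (best is None or gain > best[0]):
            best = (gain, s, left, right)
    if best is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    _, s, left, right = best
    return (s, grow(left, min_bucket), grow(right, min_bucket))

def count_leaves(tree):
    """A leaf is a class label; an internal node is (split value, left, right)."""
    if not isinstance(tree, tuple):
        return 1
    return count_leaves(tree[1]) + count_leaves(tree[2])

# Hypothetical data: class "a" for small x, "b" for large x, one noisy "b" at x = 4.
points = [(1, "a"), (2, "a"), (3, "a"), (4, "b"), (5, "a"),
          (6, "b"), (7, "b"), (8, "b"), (9, "b"), (10, "b")]

print(count_leaves(grow(points, min_bucket=1)))  # 4
print(count_leaves(grow(points, min_bucket=3)))  # 2
```

With `min_bucket=1` the tree carves out a separate leaf for the single noisy record, which is the kind of fit to training noise that overfitting describes; with `min_bucket=3` it stops at one split.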

21
Q

What is pruning?

A

it reduces the size of decision trees by removing splits that provide little power to classify instances