Data Mining - Lecture Decision Trees Flashcards

1
Q

Why is the decision tree a popular classification technique?

A
  • Performs well in a wide range of situations
  • Does not require much effort from the analyst
  • Easy to understand
2
Q

Each node contains brackets. Based on the lecture, what order do we assume for the values in the brackets?

A

[Non-acceptor, acceptor] -> meaning [Negative, Positive]

Unless it is stated otherwise.

3
Q

Where do we look to find the incorrectly predicted records or the number of TN/TP/FP/FN?

A

At the leaf nodes: first at their color, then at the values in their brackets.

4
Q

Which two types of split are there for nominal attributes?

A
  1. Multi-way split.
    You can split it into as many categories as you want, at most one per distinct value.
  2. Binary split.
    You split it into two subsets. You might need to combine attribute values into groups.
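
To make "combining values" concrete, here is a minimal Python sketch that lists every way to group the values of a nominal attribute into two non-empty subsets. The function name `binary_groupings` and the car-type values are hypothetical, chosen only for illustration.

```python
from itertools import combinations

def binary_groupings(values):
    """Return all ways to split the distinct values into two non-empty subsets."""
    values = sorted(set(values))
    first, rest = values[0], values[1:]
    groupings = []
    # Always keep `first` on the left so mirrored splits are not counted twice.
    for size in range(len(rest)):
        for extra in combinations(rest, size):
            left = {first, *extra}
            right = set(values) - left
            groupings.append((left, right))
    return groupings

# Hypothetical nominal attribute with three values
for left, right in binary_groupings(["family", "luxury", "sports"]):
    print(sorted(left), "|", sorted(right))
```

With k distinct values there are 2^(k-1) - 1 such groupings, so three values give three candidate binary splits.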
5
Q

Which two types of split are there for continuous attributes?

A
  1. Discretization.
    Basically a multi-way split, where each category is a range of values.
  2. Binary decision.
    Two subsets. You have to find the best cut point among the possible splits.
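
One possible way to search for the best cut is sketched below in Python. It assumes candidate cut points are the midpoints between consecutive distinct values and that splits are scored by weighted Gini; both are common conventions, not something the lecture states. The names `best_cut`, `ages` and `classes` are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0  # assumption: an empty node counts as pure
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_cut(values, labels):
    """Return (threshold, weighted Gini) of the best binary cut 'value <= threshold'."""
    distinct = sorted(set(values))
    best = None
    # Assumption: candidates are midpoints between consecutive distinct values.
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical data: age versus acceptance
ages = [20, 25, 30, 40, 50, 60]
classes = ["no", "no", "no", "yes", "yes", "yes"]
print(best_cut(ages, classes))  # (35.0, 0.0)
```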
6
Q

How do you determine the best split?

A

You look at the information gain per split. You compute a measure of impurity, which can be either the Gini index or entropy, and then you look at which split has the highest gain, i.e. the largest drop in impurity from the parent node to the child nodes.

The lower the combined Gini of the child nodes, the higher the information gain and the better the split.

You use this when you are comparing splits!
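
A small Python sketch of such a comparison, assuming gain is defined as the parent's Gini minus the weighted Gini of the children. The counts and the names `split_a` and `split_b` are made up for illustration.

```python
def gini(counts):
    """Gini impurity from per-class record counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def gain(parent, children):
    """Impurity of the parent minus the weighted impurity of its children."""
    n = sum(parent)
    weighted = sum(sum(child) / n * gini(child) for child in children)
    return gini(parent) - weighted

# Hypothetical parent node [negatives, positives] and two candidate splits
parent = [5, 5]
split_a = [[4, 1], [1, 4]]
split_b = [[3, 2], [2, 3]]

print(round(gain(parent, split_a), 2))  # 0.18
print(round(gain(parent, split_b), 2))  # 0.02
```

Here split A leaves purer child nodes, so it has the higher gain and would be preferred.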

7
Q

How do you compute the GINI index?

A

A node contains records from two (or more) classes.

For each class, you divide the #records in that class by the total #records in that node. This way you have the proportion per class.

You square those proportions and subtract each of them from 1. That is your GINI.

Example: Class 1 has 2 and Class 2 has 4. Total in the node = 6.

GINI = 1 - (2/6)^2 - (4/6)^2 = 1 - 1/9 - 4/9 = 4/9 ≈ 0.444
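
The same computation as a minimal Python sketch. The function name `gini` and the choice to treat an empty node as pure are assumptions made for this example.

```python
def gini(counts):
    """Gini impurity of a node, given the number of records per class."""
    total = sum(counts)
    if total == 0:
        return 0.0  # assumption: an empty node is treated as pure
    return 1.0 - sum((c / total) ** 2 for c in counts)

# The node from the example: Class 1 has 2 records, Class 2 has 4
print(round(gini([2, 4]), 3))  # 0.444
```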

8
Q

What are the numbers in the nodes?

A

The main number is the total number of records in the node.
The numbers in the brackets are the counts per class.
Remember to compute TP, TN etc. only with the leaf nodes.
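
As a rough illustration, the Python sketch below adds up TP, TN, FP and FN from leaf nodes only. It assumes the brackets are ordered [Negative, Positive] and that each leaf predicts its majority class, with ties going to negative. The function name `confusion_from_leaves` and the leaf counts are hypothetical.

```python
def confusion_from_leaves(leaves):
    """Sum TP/TN/FP/FN over leaf nodes given as [negatives, positives]."""
    tp = tn = fp = fn = 0
    for negatives, positives in leaves:
        if positives > negatives:
            # Assumption: the leaf predicts positive by majority vote.
            tp += positives
            fp += negatives
        else:
            # Assumption: otherwise the leaf predicts negative (ties included).
            tn += negatives
            fn += positives
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}

# Hypothetical tree with three leaf nodes
leaves = [[40, 5], [3, 30], [10, 2]]
print(confusion_from_leaves(leaves))
# {'TP': 30, 'TN': 50, 'FP': 3, 'FN': 7}
```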

9
Q

What is the combined impurity?

A

You calculate the GINI index for each of the child nodes into which the node in the layer above is split.

You then perform a weighted average to get the combined GINI:

(#records node 1 / (#records node 1 + #records node 2)) * GINI node 1 + (#records node 2 / (#records node 1 + #records node 2)) * GINI node 2
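
A minimal Python sketch of this weighted average, generalised to any number of child nodes. The function names and the example counts are assumptions for illustration.

```python
def gini(counts):
    """Gini impurity from per-class record counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def combined_gini(children):
    """Weighted average of the children's Gini, weighted by #records per child."""
    n = sum(sum(child) for child in children)
    return sum(sum(child) / n * gini(child) for child in children)

# Hypothetical split: node 1 holds [2, 4] and node 2 holds [3, 1]
print(round(combined_gini([[2, 4], [3, 1]]), 3))  # 0.417
```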

10
Q

What is the entropy measure?

A

Similar to GINI, but a different computation.

  • Entropy = - (proportion class 1) * log2(proportion class 1) - (proportion class 2) * log2(proportion class 2) etc.
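
A short Python sketch of the entropy computation, using the usual convention that a class with zero records contributes nothing. The function name `entropy` is an assumption for this example.

```python
import math

def entropy(counts):
    """Entropy (in bits) of a node, given the number of records per class."""
    total = sum(counts)
    result = 0.0
    for c in counts:
        if c > 0:  # convention: 0 * log2(0) is taken as 0
            p = c / total
            result -= p * math.log2(p)
    return result

# A node with 2 records of one class and 4 of the other
print(round(entropy([2, 4]), 3))  # 0.918
```

Like GINI, entropy is 0 for a pure node and highest when the classes are evenly mixed (1.0 for two classes).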