Decision Trees Flashcards

1
Q

What is a decision tree?

A

A model that classifies examples by testing a set of features (parameters) in a chosen order: each node tests a feature, each branch corresponds to a value of that feature, and each leaf gives a label.
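A toy sketch of the idea as nested feature tests (the weather features and values here are illustrative, not from the source):

```python
# A decision tree written out by hand as nested feature tests.
# The features ("outlook", "humidity", "windy") are made-up examples.
def classify(example):
    if example["outlook"] == "sunny":
        return "play" if example["humidity"] == "normal" else "don't play"
    elif example["outlook"] == "overcast":
        return "play"
    else:  # rainy
        return "don't play" if example["windy"] else "play"

print(classify({"outlook": "sunny", "humidity": "normal", "windy": False}))  # play
```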

2
Q

What is the idea of Entropy of information?

A

The information carried by a message increases as its probability decreases: rare (low-probability) messages convey more information than common ones.

3
Q

What are the two equations for the entropy of information?

A

Information of a single message:

I(x) = -log2 P(x)

Entropy (expected information):

H = E[I] = -∑i pi log2 pi
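A quick sketch of both quantities in Python (the probabilities used below are only illustrative):

```python
import math

def information(p):
    """Information content of an outcome with probability p: I(x) = -log2 P(x)."""
    return -math.log2(p)

def entropy(probs):
    """Entropy H = -sum_i p_i log2 p_i (terms with p_i = 0 are skipped)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(information(0.5))       # 1.0 bit
print(entropy([0.5, 0.5]))    # 1.0 bit (fair coin)
print(entropy([0.9, 0.1]))    # ~0.469 bits (biased coin)
```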

4
Q

What is the equation for total entropy?

A

It is the sum over all classes of the per-class term -pi log2 pi, i.e. H(S) = -∑i pi log2 pi, where pi is the proportion of examples belonging to class i.

5
Q

How can the entropy of information be applied to compression algorithms?

A

For a binary source the two symbol probabilities are related by p0 + p1 = 1, so the entropy H = -p0 log2 p0 - p1 log2 p1 is at most 1 bit. Entropy gives the minimum average number of bits needed per symbol, so a biased source (p0 far from 0.5) can be compressed to fewer than 1 bit per symbol on average.
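A small illustration of how the required bits per symbol fall as the source becomes more biased (the probabilities chosen are arbitrary examples):

```python
import math

def binary_entropy(p0):
    """Entropy of a binary source with P(0) = p0 and P(1) = 1 - p0."""
    return -sum(p * math.log2(p) for p in (p0, 1 - p0) if p > 0)

# The more predictable the source, the fewer bits per symbol are needed.
for p0 in (0.5, 0.9, 0.99):
    print(p0, round(binary_entropy(p0), 3))
# 0.5 -> 1.0, 0.9 -> 0.469, 0.99 -> 0.081 bits per symbol
```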

6
Q

How do you decide what parameters to use and in what order?

A

Choose the feature whose split gives the greatest change (reduction) in entropy, i.e. the greatest information gain; applying this criterion at every node fixes the order in which the features are used.

7
Q

What is the equation for information gain?

A

G(S, F) = H(S) - ∑f∈values(F) (|Sf| / |S|) × H(Sf), where Sf is the subset of examples for which feature F takes value f
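A sketch of the gain computation in Python, assuming examples are stored as dicts of feature → value (a representation chosen here for illustration):

```python
from collections import Counter
import math

def entropy(labels):
    """H(S) = -sum_i p_i log2 p_i over the class labels in S."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, labels, feature):
    """G(S, F) = H(S) - sum_f |S_f|/|S| * H(S_f)."""
    n = len(labels)
    gain = entropy(labels)
    for f in set(ex[feature] for ex in examples):
        subset = [lab for ex, lab in zip(examples, labels) if ex[feature] == f]
        gain -= (len(subset) / n) * entropy(subset)
    return gain
```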

8
Q

What is the ID3 philosophy?

A

Greedily choose the split that gives the greatest information (entropy) gain at each node.

9
Q

What is the ID3 algorithm?

A

IF: all examples have the same label => return a leaf with that label

ELSE IF: there are no features left to test => return a leaf with the most common label

ELSE: Choose the feature F that maximises the information gain on S to be the next node. Add a branch from the node for each possible value f of F.

For each branch:

> Compute Sf, the subset of examples for which F takes the value f, and remove F from the set of features

> Recursively call the algorithm on Sf with the remaining features, computing the gain relative to the current set of examples

A minimal recursive sketch is given after this list.
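The sketch below assumes examples are dicts of feature → value and represents the tree as nested dicts; both are representation choices made here, not part of the source:

```python
from collections import Counter
import math

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(examples, labels, feature):
    n = len(labels)
    gain = entropy(labels)
    for f in set(ex[feature] for ex in examples):
        sub = [lab for ex, lab in zip(examples, labels) if ex[feature] == f]
        gain -= (len(sub) / n) * entropy(sub)
    return gain

def id3(examples, labels, features):
    # IF all examples share one label, return a leaf with that label.
    if len(set(labels)) == 1:
        return labels[0]
    # ELSE IF no features remain, return a leaf with the most common label.
    if not features:
        return Counter(labels).most_common(1)[0][0]
    # ELSE pick the feature F with the greatest information gain...
    best = max(features, key=lambda f: info_gain(examples, labels, f))
    remaining = [f for f in features if f != best]
    node = {}
    # ...and recurse on Sf for each value f of F, with F removed from the features.
    for value in set(ex[best] for ex in examples):
        sub_ex = [ex for ex in examples if ex[best] == value]
        sub_lab = [lab for ex, lab in zip(examples, labels) if ex[best] == value]
        node[(best, value)] = id3(sub_ex, sub_lab, remaining)
    return node
```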

10
Q

What are the characteristics of the ID3 Algorithm?

A

> Greedy with respect to G: it always takes the split with the greatest information gain, which can leave it stuck in a local optimum

> Deals with noisy data by assigning the most common label

> Uses all of the features, so it is prone to overfitting

  • Uses pruning to reduce overfitting
  • Can use continuous variables
  • Can deal with missing attributes
11
Q

What is the CART algorithm?

A

It is the same as ID3, but instead of entropy (information gain) it uses the Gini impurity to choose splits.

12
Q

What is the equation for Gini impurity?

A

G(S) = ∑i=1..C pi (1 - pi), where pi is the proportion of examples in S belonging to class i and C is the number of classes
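A small sketch of the impurity computation (the labels used are made-up examples):

```python
from collections import Counter

def gini(labels):
    """Gini impurity G(S) = sum_i p_i (1 - p_i) over the classes in S."""
    n = len(labels)
    return sum((c / n) * (1 - c / n) for c in Counter(labels).values())

print(gini(["a", "a", "b", "b"]))  # 0.5 (maximally mixed for two classes)
print(gini(["a", "a", "a", "a"]))  # 0.0 (pure node)
```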

13
Q

What is the equation for the gini split?

A

G(S, F) = G(S) - ∑f∈values(F) (|Sf| / |S|) × G(Sf)
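The split score mirrors the information-gain computation, with Gini impurity in place of entropy; a sketch using the same illustrative dict-of-features format as before:

```python
from collections import Counter

def gini(labels):
    n = len(labels)
    return sum((c / n) * (1 - c / n) for c in Counter(labels).values())

def gini_gain(examples, labels, feature):
    """G(S, F) = G(S) - sum_f |S_f|/|S| * G(S_f)."""
    n = len(labels)
    gain = gini(labels)
    for f in set(ex[feature] for ex in examples):
        sub = [lab for ex, lab in zip(examples, labels) if ex[feature] == f]
        gain -= (len(sub) / n) * gini(sub)
    return gain
```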

14
Q

What is the issue with ID3 and Gini?

A

Both algorithms are greedy, so they can get stuck in a local optimum rather than finding the best tree. This motivates building many different decision trees.

15
Q

What is a random forest?

A

Create many (often thousands of) trees, each trained on a different random subset; the collection is called a forest.

Each tree uses a random fraction of the features and a random fraction of the training data points.
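A sketch using scikit-learn's RandomForestClassifier (assuming scikit-learn is available; the iris data is only a stand-in dataset):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# n_estimators sets the number of trees; bootstrap sampling and max_features
# provide the per-tree random subsets of training points and features.
forest = RandomForestClassifier(n_estimators=1000, max_features="sqrt", bootstrap=True)
forest.fit(X, y)
print(forest.predict(X[:5]))
```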

16
Q

What are the sources of randomness?

A

> Random subset of features

> Random subset of training points

17
Q

What is the idea of combining trees?

A

Each tree votes for a class, and the random forest outputs the class chosen by the majority of the trees. This is called an ensemble: many weak classifiers are combined into one strong classifier.
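A minimal sketch of the voting step (the "trees" here are stand-in functions, not real trained trees):

```python
from collections import Counter

def forest_predict(trees, example):
    """Majority vote over the per-tree predictions."""
    votes = [tree(example) for tree in trees]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical example: three "trees" that are simple functions.
trees = [lambda x: "spam", lambda x: "ham", lambda x: "spam"]
print(forest_predict(trees, {"subject": "win money"}))  # spam
```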

18
Q

How does ID3 choose what feature to split with?

A

The feature that produces the greatest information gain

19
Q

What type of data can decision trees classify that normal MLPs cannot?

A

Non-numerical (categorical) data
