Random Forest Flashcards

1
Q

Inductive Learning

A

Also known as discovery learning; a process where the learner discovers rules by observing examples. This is different from deductive learning, where students are given rules that they then need to apply.

2
Q

Decision Tree Structure

A

Consists of a root node (where the tree starts)
Branches (splits with children)
Leaf nodes (ends of the tree - represent possible outcomes)
Nodes (where a parent and child meet)
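
A minimal sketch of that structure in Python (the class and field names are illustrative, not from the course):

    class Node:
        # A decision-tree node: internal nodes test an attribute and branch on its
        # values; leaf nodes carry a predicted outcome; the root is simply the top node.
        def __init__(self, attribute=None, outcome=None):
            self.attribute = attribute   # attribute tested at this node (None for a leaf)
            self.outcome = outcome       # predicted class at a leaf (None for internal nodes)
            self.children = {}           # branches: attribute value -> child Node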

3
Q

Experience Table

A

A labeled data set with your target variable and all of the features for which data was collected
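
For example, a tiny experience table (values made up for illustration) with three features and a target column might look like:

    Outlook    Temperature    Windy    PlayTennis (target)
    sunny      hot            no       no
    overcast   mild           no       yes
    rain       cool           yes      no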

4
Q

What kind of algorithm will we use for our decision trees?

A

ID3

5
Q

Decision Tree Algorithm

A

(1) Choose the best attribute to split the remaining instances - that attribute becomes the root
(2) Repeat the process with the children
(3) Stop when all instances have the same target attribute value, there are no more attributes, or there are no more instances
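
A minimal Python sketch of this loop, assuming categorical attributes and rows stored as dicts (function names are illustrative; the information-gain measure it relies on is covered in later cards):

    import math
    from collections import Counter

    def entropy(labels):
        # Shannon entropy (in bits) of a list of class labels
        total = len(labels)
        return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

    def best_attribute(rows, attributes, target):
        # The attribute whose split yields the highest information gain
        base = entropy([r[target] for r in rows])
        def gain(attr):
            remainder = 0.0
            for value in set(r[attr] for r in rows):
                subset = [r[target] for r in rows if r[attr] == value]
                remainder += len(subset) / len(rows) * entropy(subset)
            return base - remainder
        return max(attributes, key=gain)

    def id3(rows, attributes, target):
        labels = [r[target] for r in rows]
        if len(set(labels)) == 1:                     # stop: all instances agree on the target
            return labels[0]
        if not attributes:                            # stop: no attributes left -> majority class
            return Counter(labels).most_common(1)[0][0]
        attr = best_attribute(rows, attributes, target)
        tree = {attr: {}}
        for value in set(r[attr] for r in rows):      # repeat the process with the children
            subset = [r for r in rows if r[attr] == value]
            tree[attr][value] = id3(subset, [a for a in attributes if a != attr], target)
        return tree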

6
Q

How do you identify the best attribute to become the root of your decision tree?

A

Information gain

7
Q

What makes a good decision tree?

A

It must be small AND classify accurately.

Small trees are less susceptible to overfitting and are easier to understand.

8
Q

Information Gain and Impurity Levels

A

{xxxxxyxxxxyxxx} not pure
{xxxxxxxxxxxxxx} as pure as it gets
{xxxxxxxyyyyyyyy} least pure
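
As a quick worked check (assuming Shannon entropy in bits, H = -sum of p * log2(p) over the class proportions): the all-x set has entropy 0 (as pure as it gets), the set with 2 y's out of 14 has entropy of roughly 0.59, and the roughly half-x, half-y set has entropy close to 1.0, the maximum for two classes (least pure).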

9
Q

Information Gain

A

We want to determine which attribute in a given set of training feature vectors is most useful for discriminating between classes to be learned.

Information gain tells us how important a given attribute of the feature vectors is

We use it to decide the order of attributes in the nodes of a decision tree
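
A small Python sketch of the idea, assuming a split already expressed as groups of child labels (function names are illustrative):

    import math
    from collections import Counter

    def entropy(labels):
        total = len(labels)
        return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

    def information_gain(parent_labels, child_label_groups):
        # Gain = entropy before the split minus the weighted entropy of the children
        n = len(parent_labels)
        remainder = sum(len(g) / n * entropy(g) for g in child_label_groups)
        return entropy(parent_labels) - remainder

    # A split that separates 4 x's and 4 y's into two pure children gains the full 1 bit
    print(information_gain(['x'] * 4 + ['y'] * 4, [['x'] * 4, ['y'] * 4]))  # 1.0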

10
Q

Decision Tree CONS

A

Suffer from errors propagating throughout the tree (this becomes more of an issue as the number of classes increases)

11
Q

Error Propagation

A

Since decision trees work by a series of local decisions, what happens when one of these local decisions is wrong? Everything beyond that point is incorrect, and we may never return to the right path

12
Q

Noisy data in decision trees

A

When two instances have the same attribute/value pairs but different classifications

Some values of the attributes are incorrect because of errors in the data acquisition process or the preprocessing phase

Some attributes may be irrelevant to the decision-making process (e.g., the color of the die used in a roll)

13
Q

Overfitting in Decision Trees

A

Irrelevant attributes can VERY EASILY lead to overfitting

Too little training data can also lead to overfitting

14
Q

How to avoid overfitting in Decision Trees

A

Stop growing the tree when the data split is not statistically significant

Acquire more training data

Remove irrelevant attributes

Grow a full tree, then post-prune
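
A sketch of what those options look like with scikit-learn's DecisionTreeClassifier (the parameter values are arbitrary examples):

    from sklearn.tree import DecisionTreeClassifier

    # Stop growing early: require a minimum amount of data before/after each split
    early_stop = DecisionTreeClassifier(min_samples_split=20, min_samples_leaf=5)

    # Grow a full tree, then post-prune via cost-complexity pruning
    # (a larger ccp_alpha prunes more aggressively)
    post_pruned = DecisionTreeClassifier(ccp_alpha=0.01)

    # early_stop.fit(X_train, y_train)   # X_train / y_train: your experience table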

15
Q

How to select the best decision tree

A

Measure performance over training data
Measure performance over separate validation sets
Add complexity penalty to performance measure
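
For instance, measuring performance on a held-out validation set with scikit-learn (the iris data here is just a stand-in):

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import accuracy_score

    X, y = load_iris(return_X_y=True)
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

    tree = DecisionTreeClassifier(max_depth=3).fit(X_train, y_train)
    print(accuracy_score(y_val, tree.predict(X_val)))   # performance on unseen validation data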

16
Q

Bootstrapping

A

Very important across all of statistics. You create new datasets by sampling with replacement from your original dataset. Some values may get repeated in your set.

The closer your bootstrap N is to the original N, the more overlap you will get
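
A quick illustration with NumPy (the toy dataset is made up):

    import numpy as np

    rng = np.random.default_rng(0)
    data = np.arange(1, 11)                              # original dataset, N = 10

    # Bootstrap sample: draw N values with replacement, so some repeat and some are left out
    bootstrap = rng.choice(data, size=len(data), replace=True)
    print(bootstrap)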

17
Q

Evaluating decision trees

A

Accuracy - how many things can it classify correctly?
Scalability - how well model generation and prediction scale to larger datasets, and how fast they run
Robustness - how well does it perform with missing or noisy data?
Intuitive appeal - results are easily understood, so decisions can be made from them

18
Q

Ensemble Learning

A

Combining weak classifiers in order to produce a strong classifier

19
Q

Random Forest

A

Solves the weaknesses of decision trees by introducing randomness into the equation.

A Random Forest is an ensemble classifier that consists of many decision trees and outputs the class that is the mode of the classes predicted by the individual trees.

Introduces the idea of bagging.

20
Q

Bagging

A

Bootstrap Aggregation. Used to avoid overfitting (important since RF trees are unpruned) and to improve accuracy / stability

It is broken into two steps - bootstrap a sample set and aggregate

3 variables:
n - number of data points in your original dataset
n' - number of data points you want in each bag
m - number of bags to create

Works best when n' < n (about 60% is a typical number)

bagging reduces the variance of the base learner but has a limited effect on the bias.

Strongest if you are using strong learners
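
A minimal sketch of the bootstrap step with those three variables, in NumPy (names and defaults are illustrative):

    import numpy as np

    def make_bags(n, m=25, frac=0.6, seed=0):
        # Bootstrap step: m bags, each holding n' = frac * n indices
        # drawn with replacement from the n original data points
        rng = np.random.default_rng(seed)
        n_prime = int(frac * n)
        return [rng.choice(n, size=n_prime, replace=True) for _ in range(m)]

    bags = make_bags(n=1000, m=25)
    # Aggregation step: fit one model per bag (e.g. model.fit(X[idx], y[idx]) for idx in bags),
    # then majority-vote (classification) or average (regression) their predictions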

21
Q

AdaBoost

A

A variation on bagging where points that are modeled poorly by your ensemble are weighted so they are more likely to be picked in the subsequent 'random' bags of data.
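
A simplified, resampling-style sketch of that weighting idea (real AdaBoost also weights each learner's vote by its accuracy; that part is omitted, and the doubling factor here is arbitrary):

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def boosted_bags(X, y, rounds=10, frac=0.6, seed=0):
        rng = np.random.default_rng(seed)
        n = len(X)
        weights = np.full(n, 1.0 / n)                 # start with every point equally likely
        learners = []
        for _ in range(rounds):
            idx = rng.choice(n, size=int(frac * n), replace=True, p=weights)
            learner = DecisionTreeClassifier(max_depth=1).fit(X[idx], y[idx])
            learners.append(learner)
            wrong = learner.predict(X) != y
            weights[wrong] *= 2.0                     # poorly modeled points get picked more often
            weights /= weights.sum()                  # keep the weights a probability distribution
        return learners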

22
Q

Random Forest Algorithm

A

1 - Assign variables: N = number of training cases, M = total number of features available for classification

2 - m = number of input variables to be used at each node of the tree (should be smaller than M)

3 - Choose a training set for the tree (either bagging or AdaBoost)

4 - At each node of the tree, randomly choose m variables to use. Calculate the best split on these m variables in the training set

5 - Grow each tree fully, and do not prune

For a new prediction, the new sample is pushed down all the trees and the majority vote (or average) of all the trees is the prediction it is given
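
A compact sketch of steps 1-5 in Python, leaning on scikit-learn's DecisionTreeClassifier for the per-node random choice of m features (its max_features parameter); the rest of the setup is illustrative:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def grow_forest(X, y, n_trees=100, m="sqrt", seed=0):
        rng = np.random.default_rng(seed)
        n = len(X)                                            # N training cases
        forest = []
        for _ in range(n_trees):
            idx = rng.choice(n, size=n, replace=True)         # bagged training set for this tree
            tree = DecisionTreeClassifier(max_features=m)     # m random features tried per node,
            forest.append(tree.fit(X[idx], y[idx]))           # grown fully, no pruning
        return forest

    def forest_predict(forest, X_new):
        # Push each new sample down every tree, then take the majority vote
        # (assumes integer class labels)
        votes = np.array([t.predict(X_new) for t in forest]).astype(int)
        return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)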

23
Q

Random forest algorithm simplified

A

Grow a forest of many trees. (R default is 500)

Grow each tree on an independent bootstrap sample* from the training data.

At each node:
Select m variables at random out of all M possible variables (independently for each node).

Find the best split on the selected m variables.

Grow the trees to maximum depth (classification).

Vote/average the trees to get predictions for new data.

*Sample N cases at random with replacement.
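
With scikit-learn, the same recipe looks roughly like this (500 trees to mirror the R default mentioned above; "sqrt" means m = sqrt(M) features are tried at each node; the iris data is just a placeholder):

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    X, y = load_iris(return_X_y=True)

    forest = RandomForestClassifier(n_estimators=500,     # number of trees in the forest
                                    max_features="sqrt",  # m variables considered at each node
                                    max_depth=None)       # grow each tree to maximum depth
    forest.fit(X, y)
    print(forest.predict(X[:3]))                          # vote of all 500 trees per sample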

24
Q

Random Forest Pros

A

-Can classify and regress
-Handles categorical predictors
-Computationally simple and quick
-No distribution assumptions
-Can handle highly non-linear data / classifications
-Automatic variable selection
-Resistant to overfitting
-Handles missing values (using proximities)

25
Q

Random forest cons

A

HARD TO UNDERSTAND - a forest of hundreds of trees is far less interpretable than a single decision tree

26
Q

How does Random Forest improve on decision trees?

A

Accuracy & stability (if you change the data a little, the individual trees may change but the forest remains stable)