Regression Forest Flashcards

1
Q

What are the pros and cons of CART?

A

Pros: great at capturing non-linear, complicated patterns, yielding a model that can be explained
Cons: poor prediction performance; prone to overfitting

2
Q

What is the CART problem?

A

Dependence on individual observations is high, and early splits may depend on small differences between candidate choices, so a small change in the data can produce a very different tree.

3
Q

Define ensemble methods.

A

Methods that combine the predictions of many imperfect models to produce a single prediction.
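Averaging is the simplest way to combine models. A minimal sketch, assuming each "model" is just a function from x to a numeric prediction; `ensemble_predict` and the three example models are hypothetical:

```python
from statistics import mean
from typing import Callable, Sequence

# A "model" here is any function mapping an input x to a numeric prediction.
Model = Callable[[float], float]

def ensemble_predict(models: Sequence[Model], x: float) -> float:
    """Combine many imperfect models by averaging their predictions at x."""
    return mean(model(x) for model in models)

# Three deliberately imperfect models of the true relationship y = 2x.
models = [lambda x: 2 * x + 1, lambda x: 2 * x - 1.5, lambda x: 2 * x + 0.5]
print(ensemble_predict(models, 10.0))  # → 20.0
```

Each model is off by 0.5 to 1.5 at x = 10, but their errors point in different directions, so the average lands on the true value of 20.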

4
Q

Explain the bootstrap process.

A
  1. Start with the original dataset of n observations and draw observations at random, with replacement
  2. Stop when the sample reaches the size of the original dataset (n observations)
  3. Repeat to create a new sample for another tree
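The three steps above can be sketched in plain Python. This is an illustration only; `bootstrap_sample` is a hypothetical helper, and the seed is fixed just to make the run reproducible:

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")

def bootstrap_sample(data: Sequence[T], rng: random.Random) -> List[T]:
    """Draw observations one at a time, with replacement, until the
    sample is the same size as the original dataset."""
    sample: List[T] = []
    while len(sample) < len(data):       # step 2: stop at the original size
        sample.append(rng.choice(data))  # step 1: draw with replacement
    return sample

rng = random.Random(0)  # fixed seed so the example is reproducible
data = ["a", "b", "c", "d", "e"]
# Step 3: one bootstrap sample per tree (here, three trees).
samples = [bootstrap_sample(data, rng) for _ in range(3)]
for s in samples:
    print(s)
```

Because draws are with replacement, a sample usually repeats some observations and leaves others out. `rng.choices(data, k=len(data))` does the same in one call.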
5
Q

Define bagging

A

B - bootstrap: draw many bootstrap samples from the data (and fit a model to each)
Agg - aggregate: average the results of those models
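A minimal bagging sketch, assuming the base model is a one-split regression tree (a "stump") fitted by minimizing squared error; real random forests use much deeper trees. All function names and the synthetic data are hypothetical:

```python
import random
from statistics import mean
from typing import List, Sequence, Tuple

# A one-split regression tree ("stump"): (threshold, left_mean, right_mean).
Stump = Tuple[float, float, float]

def fit_stump(xs: Sequence[float], ys: Sequence[float]) -> Stump:
    """Pick the threshold that minimizes the sum of squared errors."""
    best = None
    for t in sorted(set(xs))[:-1]:  # every candidate leaves both sides non-empty
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # all x values identical: no split, predict the overall mean
        m = mean(ys)
        return (xs[0], m, m)
    return (best[1], best[2], best[3])

def predict_stump(stump: Stump, x: float) -> float:
    t, left_mean, right_mean = stump
    return left_mean if x <= t else right_mean

def bagging_fit(xs: Sequence[float], ys: Sequence[float],
                n_models: int, rng: random.Random) -> List[Stump]:
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # B: bootstrap sample (with replacement)
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models

def bagging_predict(models: List[Stump], x: float) -> float:
    return mean(predict_stump(m, x) for m in models)  # Agg: average the predictions

rng = random.Random(42)
xs = [float(i) for i in range(20)]
ys = [x + rng.gauss(0, 2) for x in xs]  # synthetic data: y = x + noise
models = bagging_fit(xs, ys, n_models=50, rng=rng)
print(bagging_predict(models, 5.0), bagging_predict(models, 15.0))
```

A single stump predicts only two values, but the 50 bagged stumps split at different thresholds, so their average is a smoother step function.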

6
Q

What are the benefits of the bagging process?

A

It reduces variance: averaging many trees makes the results more stable, which gives better out-of-sample performance
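A small simulation of the variance reduction, assuming each "model" is the true value 10 plus independent Gaussian noise (standard deviation 3). Real bagged trees are not independent, so the reduction in practice is smaller. The function name is hypothetical:

```python
import random
from statistics import mean, pvariance

rng = random.Random(1)

def noisy_model_prediction() -> float:
    """Stand-in for one unstable model: the true value 10 plus independent noise."""
    return 10 + rng.gauss(0, 3)

trials = 2000
single = [noisy_model_prediction() for _ in range(trials)]
averaged = [mean(noisy_model_prediction() for _ in range(25)) for _ in range(trials)]

print(pvariance(single))    # roughly 3**2 = 9
print(pvariance(averaged))  # roughly 9 / 25 = 0.36
```

Averaging 25 independent predictions divides the variance by about 25 while leaving the average prediction centered on 10.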

7
Q

Why do we limit the number of x variables used for the random forest?

A

It helps reduce the risk of overfitting.

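If one variable is much more prominent than the others, most trees will split on it first, follow the same path, and look similar. This leads to highly correlated trees, and averaging highly correlated trees reduces variance much less than averaging uncorrelated ones.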
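The role of correlation can be made precise. Assuming B trees whose predictions each have variance σ² and pairwise correlation ρ, the variance of their average is:

```latex
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B}\hat f_b(x)\right)
  = \frac{1}{B^2}\Big(B\,\sigma^2 + B(B-1)\,\rho\,\sigma^2\Big)
  = \rho\,\sigma^2 + \frac{1-\rho}{B}\,\sigma^2
```

Adding more trees shrinks only the second term; the first term ρσ² stays no matter how large B gets. Limiting the variables each tree can use lowers ρ, which lowers that floor, even if each individual tree becomes somewhat worse (larger σ²).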

8
Q

T/F: By decorrelating trees, we are artificially making each model worse, but together the model is better.

A

True

9
Q

What are the most important elements of a random forest?

A

Bagging: aggregating the predictions of many trees grown on bootstrap samples of data
Stopping Rule: use a less restrictive rule to let trees grow large
Decorrelate Trees: at each split, consider only a random subset of the variables
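The decorrelation step can be sketched as a split search that only looks at a random subset of variables. This assumes squared-error splits; the parameter name `mtry` follows a common convention, and `best_split` and `split_sse` are hypothetical helpers, not a library API:

```python
import random
from statistics import mean
from typing import List, Optional, Sequence, Tuple

def split_sse(col: Sequence[float], y: Sequence[float], t: float) -> float:
    """Sum of squared errors when splitting one variable at x <= t."""
    left = [yi for xi, yi in zip(col, y) if xi <= t]
    right = [yi for xi, yi in zip(col, y) if xi > t]
    lm, rm = mean(left), mean(right)
    return sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)

def best_split(X: List[List[float]], y: List[float], mtry: int,
               rng: random.Random) -> Optional[Tuple[int, float, float]]:
    """Search only a random subset of mtry variables for the best split."""
    p = len(X[0])
    candidates = rng.sample(range(p), mtry)  # decorrelation: random subset of variables
    best = None
    for j in candidates:
        col = [row[j] for row in X]
        for t in sorted(set(col))[:-1]:  # thresholds that leave both sides non-empty
            sse = split_sse(col, y, t)
            if best is None or sse < best[2]:
                best = (j, t, sse)
    return best  # (variable index, threshold, SSE), or None if no split is possible

rng = random.Random(0)
# Variable 0 separates y perfectly; variables 1 and 2 are weaker.
X = [[0.0, 2.0, 1.0], [1.0, 0.0, 0.0], [2.0, 1.0, 1.0],
     [3.0, 2.0, 0.0], [4.0, 0.0, 1.0], [5.0, 1.0, 0.0]]
y = [0.0, 0.0, 0.0, 10.0, 10.0, 10.0]
print(best_split(X, y, mtry=3, rng=rng))  # → (0, 2.0, 0.0)
print([best_split(X, y, mtry=1, rng=rng)[0] for _ in range(10)])
```

With all three variables available, every tree would split on variable 0 first. With `mtry=1`, different trees start from different variables, which is what makes them less alike.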

10
Q

What does a partial dependence plot show?

A

How the average predicted y changes across values of x_i, with all other x variables held at their observed values

(partial because differences are conditional on all other x variables)
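A minimal sketch of how one partial dependence curve is computed, assuming the fitted model is available as a prediction function. `partial_dependence` and the toy model are hypothetical, not a specific library's API:

```python
from statistics import mean
from typing import Callable, List, Sequence

Row = List[float]

def partial_dependence(predict: Callable[[Row], float], X: Sequence[Row],
                       i: int, grid: Sequence[float]) -> List[float]:
    """For each grid value v: set x_i = v in every row, keep the other x
    variables at their observed values, and average the predictions."""
    curve = []
    for v in grid:
        preds = []
        for row in X:
            modified = list(row)  # copy so the original data is untouched
            modified[i] = v
            preds.append(predict(modified))
        curve.append(mean(preds))
    return curve

def model(row: Row) -> float:
    """Hypothetical fitted model: y_hat = 3*x0 + x0*x1."""
    return 3 * row[0] + row[0] * row[1]

X = [[1.0, 0.0], [2.0, 1.0], [3.0, 2.0]]
print(partial_dependence(model, X, i=0, grid=[0.0, 1.0, 2.0]))  # → [0.0, 4.0, 8.0]
```

Here the observed x1 values average to 1, so the curve for x0 is 3v + v·1 = 4v. Plotting `grid` against the returned curve gives the partial dependence plot.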

11
Q

How do we decide which variables are most useful in predictions?

A

Use a variable importance plot, which ranks the x variables by how much each one contributes to the model's predictive accuracy
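One common way to compute the scores behind such a plot is permutation importance: shuffle one variable and measure how much the prediction error rises. This is a swapped-in illustration (random forests can also report impurity-based importance); the function names and toy model are hypothetical:

```python
import random
from statistics import mean
from typing import Callable, List, Sequence

Row = List[float]
Predict = Callable[[Row], float]

def mse(predict: Predict, X: Sequence[Row], y: Sequence[float]) -> float:
    return mean((predict(row) - target) ** 2 for row, target in zip(X, y))

def permutation_importance(predict: Predict, X: Sequence[Row], y: Sequence[float],
                           rng: random.Random) -> List[float]:
    """Importance of x_j = increase in MSE after shuffling column j, which breaks
    its link with y while leaving every other column unchanged."""
    base = mse(predict, X, y)
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        rng.shuffle(col)
        X_perm = [list(row[:j]) + [v] + list(row[j + 1:]) for row, v in zip(X, col)]
        scores.append(mse(predict, X_perm, y) - base)
    return scores

def model(row: Row) -> float:
    """Hypothetical fitted model that uses x0 only."""
    return 2 * row[0]

rng = random.Random(0)
X = [[float(i), float(i % 3)] for i in range(10)]
y = [2 * row[0] for row in X]
print(permutation_importance(model, X, y, rng))
```

Because the model ignores x1, shuffling it changes nothing and its score is exactly 0. Shuffling x0 raises the error, so its score is positive (unless the shuffle happens to leave the column unchanged). Sorting the scores gives the bars of the plot.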
