Using a Holdout Set and LASSO Flashcards

1
Q

Define a holdout set.

A

A portion of the original data that is set aside and not used for estimation.

2
Q

Why do we use a holdout set?

A

To assess external validity, that is, how well the model will predict on live data, rather than merely finding the model that best fits the original data.

3
Q

What are the steps to evaluating a prediction using a holdout set?

A
  1. Split the original data into a larger work set and a smaller holdout set
  2. Further split the work set into k folds for cross-validation
  3. Build models and select the best model using k-fold cross-validation
  4. Re-estimate the best model using all observations in the work set
  5. Take the estimated best model and apply it to the holdout set
  6. Evaluate the prediction using the holdout set
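The steps above can be sketched in a hypothetical toy example (the data, the one-parameter model, and the fold count are all made up for illustration):

```python
import random

# Hypothetical toy data: y is roughly 2*x plus noise (invented for illustration).
random.seed(0)
data = [(x, 2 * x + random.gauss(0, 1)) for x in range(100)]

# Step 1: split the original data into a larger work set and a smaller holdout set.
random.shuffle(data)
holdout, work = data[:20], data[20:]

def fit_slope(pairs):
    # A deliberately simple one-parameter model: y = b * x, least squares.
    return sum(x * y for x, y in pairs) / sum(x * x for x, y in pairs)

def mse(b, pairs):
    return sum((y - b * x) ** 2 for x, y in pairs) / len(pairs)

# Steps 2-3: k-fold cross-validation on the work set to compare candidate models.
k = 4
fold = len(work) // k
cv_mse = []
for i in range(k):
    test = work[i * fold:(i + 1) * fold]
    train = work[:i * fold] + work[(i + 1) * fold:]
    cv_mse.append(mse(fit_slope(train), test))

# Step 4: re-estimate the chosen model on all work-set observations.
b_final = fit_slope(work)

# Steps 5-6: apply the final model to the holdout set and evaluate it there, once.
holdout_mse = mse(b_final, holdout)
```

The key discipline is that `holdout` is touched exactly once, at the very end; all model comparison happens inside the work set.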
4
Q

T/F: Model selection is straightforward because we can try out every potential combination of variables.

A

False: we cannot try out every single combination; there would be far too many.

5
Q

What are the two methods to build models?

A
  1. By hand - specifying variables and model
  2. Using smart algorithms
6
Q

What are the pros and cons of using LASSO?

A

Pros: no need to use outside information to specify the model
Cons: may be sensitive to overfitting, hard to interpret

7
Q

What is the LASSO method?

A

LASSO - least absolute shrinkage and selection operator
- A method to select variables to include in a linear regression to produce good predictions and avoid overfitting

8
Q

What two things does LASSO accomplish at the same time?

A
  1. It selects a subset of the right-hand-side variables, dropping the other variables.
  2. It shrinks coefficients for some variables that it keeps in the regression
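Both effects show up in the soft-thresholding rule, which is the LASSO solution in the special case of orthonormal regressors (the coefficient values below are made-up numbers for illustration):

```python
def soft_threshold(b_ols, lam):
    # LASSO coefficient in the orthonormal-regressor special case:
    # coefficients with |b_ols| <= lam are dropped (set exactly to 0),
    # and the survivors are shrunk toward zero by lam.
    if b_ols > lam:
        return b_ols - lam
    if b_ols < -lam:
        return b_ols + lam
    return 0.0

ols_coefs = [3.0, 0.4, -1.2, -0.1]   # hypothetical OLS estimates
lasso_coefs = [soft_threshold(b, 0.5) for b in ols_coefs]
# The two small coefficients are dropped entirely;
# the two kept coefficients are shrunk toward zero by 0.5.
```

This is exactly selection (some coefficients become zero) and shrinkage (the rest move toward zero) happening in one operation.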
9
Q

Define the Tuning Parameter.

A

The weight on the penalty term relative to the OLS fit; a larger lambda means stronger variable selection.

Note: A lambda of 0 means the regression is OLS.
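The role of lambda is easiest to see in the standard LASSO objective: the usual OLS sum of squared errors plus a lambda-weighted penalty on the absolute sizes of the coefficients.

```latex
\min_{\beta_0,\beta_1,\dots,\beta_p} \;
\sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}\Big)^2
\;+\; \lambda \sum_{j=1}^{p}\lvert\beta_j\rvert
```

With lambda = 0 the penalty vanishes and this is exactly OLS; as lambda grows, more coefficients are shrunk all the way to zero.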

10
Q

What happens when we have an aggressive threshold? A lenient one?

A

Aggressive: higher threshold, so fewer variables are left in the regression
Lenient: lower threshold, so more variables are left in the regression

11
Q

T/F: LASSO modifies the way regression coefficients are estimated by adding a penalty term for too many coefficients

A

True

12
Q

T/F: Big data leads to larger estimation error

A

False: more data leads to smaller estimation error.

13
Q

T/F: LASSO creates biased estimates.

A

True. This is because it shrinks the coefficients, which creates slight bias in the estimates.

14
Q

Why is it okay for LASSO to create bias in its estimates? Does this bias mean LASSO is inferior to OLS?

A

Remember that lambda is chosen to minimize total loss, so it balances bias and variance. Although OLS does not create biased estimates, its higher variance may increase the total loss. So LASSO is not inferior to OLS because of this bias.

15
Q

Which steps of model building does LASSO take over? What do we still decide?

A

We decide the initial set of right-hand-side variables.

LASSO decides the lambda and which variables to include in the final regression.
