Week 8 Flashcards
(19 cards)
Two main reasons we want to limit factors
1) Overfitting: when the # of factors is close to or larger than the # of data points, the model might fit too closely to random effects. Too few data points per factor can lead to overfitting, causing bad estimates.
2) Simplicity: simple models are better than complex ones. Less data is required. Less chance of including insignificant factors. Easier to interpret.
Illegal factors for credit decisions
1) Race, sex, religion, marital status for credit decisions
2) Can't use factors highly correlated with forbidden ones
3) Hard to demonstrate that a complex model is ok
Forward selection
Start with a model that has no factors. At each step, find the best new factor to add and put it in as long as it is a good enough improvement. Stop when there are no factors good enough to add or when we have added as many factors as we want. We can optionally remove factors that are no longer good enough. The definitions of "good" and "good enough" are parameters we can set. It is common to allow new factors to enter the model if their p-value is below .1 or .15, just for exploration. When it is time to remove factors, we may remove those with a p-value greater than .05.
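One way to sketch this in Python (not part of the original card), using statsmodels OLS with a p-value criterion; the DataFrame X of candidate factors, the Series y, and the 0.15 entry threshold are illustrative assumptions:

import statsmodels.api as sm

def forward_select(X, y, enter_p=0.15):
    selected, remaining = [], list(X.columns)
    while remaining:
        # p-value each candidate would have if added to the current model
        pvals = {c: sm.OLS(y, sm.add_constant(X[selected + [c]])).fit().pvalues[c]
                 for c in remaining}
        best = min(pvals, key=pvals.get)
        if pvals[best] > enter_p:      # no candidate is good enough to add
            break
        selected.append(best)
        remaining.remove(best)
    return selected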
Backward selection
Opposite of forward selection: start with a model of all factors and at each step find the worst factor and remove it from the model. Repeat until there aren't factors bad enough to remove and the model doesn't have more factors than we want.
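A matching sketch of backward elimination under the same assumptions as above (statsmodels, p-value criterion; the 0.05 removal threshold is illustrative):

import statsmodels.api as sm

def backward_eliminate(X, y, remove_p=0.05):
    selected = list(X.columns)
    while selected:
        pvals = sm.OLS(y, sm.add_constant(X[selected])).fit().pvalues.drop('const')
        worst = pvals.idxmax()
        if pvals[worst] <= remove_p:   # nothing is bad enough to remove
            break
        selected.remove(worst)
    return selected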
Stepwise regression
Start with all factors or no factors. At each step, we add or remove a factor. After adding each new factor, and at the end, we eliminate all factors that no longer appear to be good. This allows the model to adjust if a factor we earlier thought we needed no longer seems necessary thanks to new factors added to the model.
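A compact stepwise sketch combining the two ideas (add the best candidate, then purge factors whose p-values have grown too large); the thresholds and the cycle guard are illustrative choices, not part of the card:

import statsmodels.api as sm

def stepwise_select(X, y, enter_p=0.15, remove_p=0.05):
    selected, seen = [], set()
    while True:
        key = tuple(sorted(selected))
        if key in seen:                # guard against add/remove cycles
            break
        seen.add(key)
        remaining = [c for c in X.columns if c not in selected]
        pvals = {c: sm.OLS(y, sm.add_constant(X[selected + [c]])).fit().pvalues[c]
                 for c in remaining}
        if not pvals or min(pvals.values()) > enter_p:
            break
        selected.append(min(pvals, key=pvals.get))
        # eliminate factors that no longer look good after the addition
        while selected:
            p = sm.OLS(y, sm.add_constant(X[selected])).fit().pvalues.drop('const')
            if p.max() <= remove_p:
                break
            selected.remove(p.idxmax())
    return selected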
How do we choose good variables?
p-value, R-squared, BIC, AIC
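All four can be read off a fitted statsmodels OLS model; a minimal sketch, assuming the same X and y as in the earlier sketches:

import statsmodels.api as sm

fit = sm.OLS(y, sm.add_constant(X)).fit()
print(fit.pvalues)        # p-value per coefficient
print(fit.rsquared)       # R-squared
print(fit.aic, fit.bic)   # information criteria (lower is better)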
Stepwise selection
Decisions are made step by step
Known as a greedy algorithm
At each step take one thing that looks best
Future options are not considered
Do we need to scale data before using Lasso?
Yes
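A minimal sketch of scaling before LASSO with scikit-learn; the pipeline and the alpha value are illustrative assumptions:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Lasso

# scale each factor to mean 0 / variance 1 so the coefficient budget
# treats all factors equally, then fit LASSO
model = make_pipeline(StandardScaler(), Lasso(alpha=0.1))
# model.fit(X, y)   # X, y as in the earlier sketches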
How do we choose T?
Depends on the number of variables, and on the quality of the model as you allow more variables in.
The best route is to run LASSO with different values of T and see which value gives the best tradeoff.
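scikit-learn's LassoCV does this kind of search, trying a range of penalty values (its alpha parameter plays the role of how tight T is, with larger alpha meaning a tighter budget) and picking the one with the best cross-validated error; a minimal sketch:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LassoCV

# try 100 penalty values with 5-fold cross-validation
model = make_pipeline(StandardScaler(), LassoCV(n_alphas=100, cv=5))
# model.fit(X, y); model[-1].alpha_ is the chosen penalty,
# and coefficients exactly at 0 mark the variables LASSO dropped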
Elastic net
Almost the same as LASSO, but the constraint is a combination of the absolute values of the coefficients and their squares. Need to choose appropriate values of T and gamma.
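A minimal scikit-learn sketch; its ElasticNet is parameterized by a penalty strength alpha and a mix l1_ratio rather than T and gamma directly, but it fits the same combined-penalty idea (values are illustrative):

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import ElasticNet

# l1_ratio mixes the LASSO (absolute value) and ridge (squared) penalties
model = make_pipeline(StandardScaler(), ElasticNet(alpha=0.1, l1_ratio=0.5))
# model.fit(X, y)   # X, y as in the earlier sketches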
Ridge Regression
Take out the absolute value term from Elastic Net
Doesn't do variable selection but can lead to better models
Lasso regression vs ridge regression
The objective of the two, to choose coefficients a1 and a2 that minimize the total error, is exactly the same. The difference is in the restriction, or constraint, on the coefficients.
Because the set of possible coefficients for lasso is a diamond, the quadratic error function can touch it at a corner, where some of the coefficients are 0, so those variables are not selected.
Because the set of possible coefficients for ridge is a circle defined by a quadratic function, the error function is unlikely to touch it at a corner, so all coefficients stay nonzero and all variables are part of the model.
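Written out, the two problems the card contrasts differ only in the constraint region (a_j are the coefficients and T the budget, following the notation of the cards above):

\min_{a}\ \sum_i \big(y_i - \hat{y}_i\big)^2 \quad \text{subject to} \quad \sum_j |a_j| \le T \qquad \text{(LASSO: diamond)}

\min_{a}\ \sum_i \big(y_i - \hat{y}_i\big)^2 \quad \text{subject to} \quad \sum_j a_j^2 \le T \qquad \text{(ridge: circle)}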
Bias-variance trade-off
Underfit model
High bias: miss or minimize real effects
Low variance: less differentiation between predictions
We underfit the real effects by eliminating variance from random effects
Overfit model
Low bias: real effects are modeled well
High variance: more differentiation between predictions
By fitting too much, we get unwanted variance from random patterns. Better fit to real patterns, but more fit to random patterns.
What is ridge regression used for?
If there is no ridge constraint, the regression solution would be the point in the middle of the ellipses. You can think of it as a zero-sized ellipse with the lowest error that can be achieved in linear regression. Adding the ridge constraint moves the solution point inward toward the origin, with all coefficients getting smaller than in the linear regression solution.
The same thing happens no matter where the original point is and no matter what the size of the circle is. If the original linear regression point is outside the circle in any direction, ridge regression decreases the magnitude of all of the coefficients. The tighter the constraint gets, the smaller the circle gets, and the smaller the magnitude of each coefficient gets.
If the original linear regression point is already inside the circle, then adding the ridge regression constraint doesn't change the solution. To decrease the coefficients you'd have to reduce the size of the circle by changing the value of tau (the ridge constraint) until the linear regression point is outside the circle, and then the coefficients would decrease.
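A minimal sketch of the shrinkage described here, using scikit-learn's Ridge; its alpha penalty grows as the circle shrinks, and the synthetic data and alpha values are illustrative assumptions:

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([3.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=100)

print("linear:", LinearRegression().fit(X, y).coef_)
for alpha in (1.0, 10.0, 100.0):
    # larger alpha = tighter constraint = smaller coefficient magnitudes
    print("ridge alpha", alpha, ":", Ridge(alpha=alpha).fit(X, y).coef_)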
What does reducing the magnitude of each coefficient do?
We reduce our fit to both real and random patterns so it gets less overfit. We reduce variance in our model.
What type of approach is ridge regression?
Regularization. It doesn't select variables, but it can help reduce overfitting by simplifying not the number of variables but the magnitude of each one's effect on the model. If tau (the ridge constraint) is too small and reduces the coefficients too much, the model may be underfit. Have to be careful not to go too far.
If the linear regression model is overfit, ridge regression can be used to fix the problem
Forward selection, backward elimination, stepwise:
Good for initial data analysis. They point out variables that are worth exploring further. Stepwise regression is most common, as a generalization of the other two. They can give a set of variables that fits more to random effects than you'd like and appears to have a better fit (R-squared) than it really has. When tested on different data they don't perform as well.
Lasso and Elastic Net are slower to compute but result in better predictions.
Recommendation: use the more advanced, slower models unless you are just doing introductory data exploration, where you can use the greedy methods first and then build a more advanced model with LASSO or Elastic Net.
Advantages of elastic net
Variable selection benefits of LASSO
Predictive benefits of ridge regression
Disadvantages of elastic net
Arbitrarily rules out some correlated variables, like LASSO
Underestimates coefficients of very predictive variables, like Ridge Regression