Regularization Flashcards

1
Q

Definition

A

Regularization aims to lower the flexibility of a model by restricting, or shrinking, the coefficient estimates toward 0. Variance is reduced at the expense of some bias. This is done by optimizing a loss function (e.g. minimizing SSE or minimizing the negative log-likelihood) that includes a penalty term with a hyperparameter (lambda) that penalizes large coefficients.
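As a minimal sketch of the penalized loss (in Python; an assumption, since the deck's implementation notes reference R's glmnet), the ridge objective can be written out directly. `ridge_loss` is a hypothetical helper name for illustration:

```python
import numpy as np

def ridge_loss(b, X, y, lam):
    """SSE plus a penalty of lambda times the sum of squared coefficients.

    Illustrative only; for the lasso, the penalty would instead be
    lam * np.sum(np.abs(b)).
    """
    resid = y - X @ b          # residuals
    sse = resid @ resid        # sum of squared errors
    penalty = lam * np.sum(b ** 2)
    return sse + penalty
```

For example, with X the 2x2 identity, y = [1, 2], b = [1, 1], and lambda = 1, the SSE is 1 and the penalty is 2.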

2
Q

Ridge Regression

A

b’s estimated by minimizing SSE + penalty -> penalty = lambda * (sum of squares of model coefficients)

Lambda is called the shrinkage parameter; lambda is tuned using cross-validation (CV).
-With a finite lambda, none of the ridge estimates will equal 0.
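A short sketch of a ridge fit using scikit-learn (an assumed substitute for the R glmnet workflow the deck references; sklearn's `alpha` argument plays the role of the deck's lambda):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
# True model uses coefficients (3, -2, 0, 1, 0.5) plus noise.
y = X @ np.array([3.0, -2.0, 0.0, 1.0, 0.5]) + rng.normal(size=100)

# alpha is sklearn's name for lambda, the shrinkage parameter.
ridge = Ridge(alpha=10.0).fit(X, y)

# With a finite lambda, the estimates shrink toward 0,
# but none of them are exactly 0.
print(ridge.coef_)
```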

3
Q

Lasso Regression

A

Least Absolute Shrinkage and Selection Operator

b’s estimated by minimizing SSE + penalty -> penalty = lambda * (sum of absolute values of model coefficients)

Lambda is called the shrinkage parameter; lambda is tuned using cross-validation (CV).
-With a finite lambda, the lasso/elastic net estimates can equal exactly 0 -> removes predictors, i.e. variable selection
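A lasso sketch, again using scikit-learn as an assumed stand-in for glmnet (sklearn's `alpha` is the deck's lambda). With a large enough lambda, some coefficients are set exactly to 0:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
# Only the first two predictors actually matter.
y = X @ np.array([3.0, -2.0, 0.0, 0.0, 0.0]) + rng.normal(size=100)

# Unlike ridge, the lasso zeroes out some coefficients entirely,
# performing variable selection.
lasso = Lasso(alpha=0.5).fit(X, y)
print(lasso.coef_)
```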

4
Q

Elastic Net Regression

A

A more general regularization where b’s are estimated by minimizing SSE + penalty -> penalty = lambda * [alpha * (sum of absolute values of model coefficients) + (1 - alpha) * (sum of squares of model coefficients)]

This penalty is a weighted average of the lasso and ridge quantities: when alpha = 0 it reduces to ridge, and when alpha = 1 it reduces to lasso.

alpha is tuned manually (usually over a limited number of options) and lambda is tuned using CV -> compare the CV results for each alpha (e.g. test RMSE) and select the best alpha/lambda combination.
-With a finite lambda, the lasso/elastic net estimates can equal exactly 0 -> removes predictors.

There are two parameters to select when performing regularization with elastic net. The alpha parameter selects ridge regression, the lasso, or a combination of the two. The lambda parameter controls the size of the regularization penalty. For a given alpha, the program selects the lambda that minimizes cross-validation error on the training set. A model fit with that lambda can then be evaluated against the test set.
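The alpha/lambda search can be sketched with scikit-learn's ElasticNetCV (an assumed substitute for cv.glmnet; note the naming swap: sklearn's `l1_ratio` is the deck's alpha, and sklearn's `alpha` is the deck's lambda):

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = X @ np.array([2.0, -1.0, 0.0, 0.0, 1.5, 0.0, 0.0, 0.0]) + rng.normal(size=200)

# Try a limited grid of mixing weights (the deck's alpha); for each one,
# 5-fold CV picks the lambda that minimizes cross-validation error.
model = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9, 1.0], cv=5).fit(X, y)
print(model.l1_ratio_, model.alpha_)  # the selected alpha/lambda combination
```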

5
Q

Notes

A

Lambda (non-negative):
–controls the size of the overall penalty, i.e. the shrinkage of the estimates. When lambda is 0 (no shrinkage), the estimates match the OLS estimates; as lambda approaches infinity (maximum shrinkage), all estimates except the intercept approach 0 (i.e. no predictors).
–lambda is inversely related to flexibility.
–Increasing lambda causes the penalized quantity to approach 0; however, it is possible for individual b_j’s to temporarily move away from 0 along the way.

-lambda and alpha are hyperparameters which should be tuned to find their optimal values.

-Regularization ignores the hierarchical principle (e.g. it can retain an interaction term while shrinking one of its main effects to 0).
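The two lambda extremes above can be checked numerically (a sketch using scikit-learn's Ridge, an assumption here; its `alpha` argument is the deck's lambda):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(size=50)

ols = LinearRegression().fit(X, y)
tiny = Ridge(alpha=1e-8).fit(X, y)  # lambda ~ 0: essentially no shrinkage
huge = Ridge(alpha=1e8).fit(X, y)   # lambda -> infinity: maximum shrinkage

print(ols.coef_)   # OLS estimates
print(tiny.coef_)  # nearly identical to OLS
print(huge.coef_)  # all slope estimates driven toward 0
```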

6
Q

Model Performance

A

Evaluate model using test RMSE or another performance metric.
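A test-RMSE evaluation sketch (scikit-learn assumed, with a held-out test set):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
model = Ridge(alpha=1.0).fit(X_train, y_train)

# Test RMSE: root of the mean squared prediction error on held-out data.
rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))
print(rmse)
```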

7
Q

Pros

A

–Useful in high-dimensional settings; avoids overfitting, especially when the number of observations is small compared to the number of predictors.
–Can reduce the size of coefficients without entirely eliminating variables (ridge)
–Can perform variable selection to reduce model complexity (lasso/elastic net)

–binarization of factor variables is required and is done automatically when setting up the matrix object
–cv.glmnet performs CV using 10 folds by default
–requires standardized variables, which glmnet standardizes by default
->variables on a larger scale typically have smaller coefficients; without standardizing, the regularization would focus on shrinking the small-scale variables (whose coefficients are larger) over those on a large scale.
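glmnet standardizes by default; when using scikit-learn instead (an assumed substitute throughout these sketches), standardization must be requested explicitly, e.g. via a pipeline:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Three predictors on wildly different scales.
X = rng.normal(size=(100, 3)) * np.array([1.0, 100.0, 0.01])
# Each predictor has the same effect per standard deviation of X.
y = X @ np.array([2.0, 0.02, 200.0]) + rng.normal(size=100)

# Standardizing first puts every predictor on the same scale, so the
# penalty shrinks them evenly instead of targeting small-scale variables.
pipe = make_pipeline(StandardScaler(), Lasso(alpha=0.1)).fit(X, y)
print(pipe.named_steps["lasso"].coef_)
```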
