Regularization Flashcards

1
Q

Definition

A

Regularization aims to lower the flexibility of a model by restricting, or shrinking, the coefficient estimates toward 0. Variance is reduced at the expense of some bias. This is done by optimizing a loss function (e.g. minimizing SSE or minimizing the negative log-likelihood) that includes a penalty term with a hyperparameter (lambda) that penalizes large coefficients.
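As a minimal sketch of the penalized loss (in Python; an assumption, since the deck's implementation notes reference R's glmnet), the ridge objective can be written out directly. `ridge_loss` is a hypothetical helper name for illustration:

```python
import numpy as np

def ridge_loss(b, X, y, lam):
    """SSE plus a penalty of lambda times the sum of squared coefficients.

    Illustrative only; for the lasso, the penalty would instead be
    lam * np.sum(np.abs(b)).
    """
    resid = y - X @ b          # residuals
    sse = resid @ resid        # sum of squared errors
    penalty = lam * np.sum(b ** 2)
    return sse + penalty
```

For example, with X the 2x2 identity, y = [1, 2], b = [1, 1], and lambda = 1, the SSE is 1 and the penalty is 2.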

2
Q

Ridge Regression

A

b’s estimated by minimizing SSE + penalty -> penalty = lambda * (sum of squares of model coefficients)

Lambda is called the shrinkage parameter; lambda is tuned using cross-validation (CV).
-With a finite lambda, none of the ridge estimates will equal 0.
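A short sketch of a ridge fit using scikit-learn (an assumed substitute for the R glmnet workflow the deck references; sklearn's `alpha` argument plays the role of the deck's lambda):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
# True model uses coefficients (3, -2, 0, 1, 0.5) plus noise.
y = X @ np.array([3.0, -2.0, 0.0, 1.0, 0.5]) + rng.normal(size=100)

# alpha is sklearn's name for lambda, the shrinkage parameter.
ridge = Ridge(alpha=10.0).fit(X, y)

# With a finite lambda, the estimates shrink toward 0,
# but none of them are exactly 0.
print(ridge.coef_)
```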

3
Q

Lasso Regression

A

Least Absolute Shrinkage and Selection Operator

b’s estimated by minimizing SSE + penalty -> penalty = lambda * (sum of absolute values of model coefficients)

Lambda is called the shrinkage parameter; lambda is tuned using cross-validation (CV).
-With a finite lambda, the lasso/elastic net estimates can equal exactly 0 -> removes predictors, i.e. variable selection
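A lasso sketch, again using scikit-learn as an assumed stand-in for glmnet (sklearn's `alpha` is the deck's lambda). With a large enough lambda, some coefficients are set exactly to 0:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
# Only the first two predictors actually matter.
y = X @ np.array([3.0, -2.0, 0.0, 0.0, 0.0]) + rng.normal(size=100)

# Unlike ridge, the lasso zeroes out some coefficients entirely,
# performing variable selection.
lasso = Lasso(alpha=0.5).fit(X, y)
print(lasso.coef_)
```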

4
Q

Elastic Net Regression

A

A more general regularization where b’s are estimated by minimizing SSE + penalty -> penalty = lambda * [alpha * (sum of absolute values of model coefficients) + (1 - alpha) * (sum of squares of model coefficients)]

This penalty is a weighted average of the lasso and ridge quantities: when alpha = 0 it reduces to ridge, and when alpha = 1 it reduces to lasso.

alpha is tuned manually (usually over a limited number of options) and lambda is tuned using CV -> compare the CV results for each alpha (e.g. test RMSE) and select the best alpha/lambda combination.
-With a finite lambda, the lasso/elastic net estimates can equal exactly 0 -> removes predictors.

There are two parameters to select when performing regularization with elastic net. The alpha parameter selects ridge regression, the lasso, or a combination of the two. The lambda parameter controls the size of the regularization penalty. For a given alpha, the program selects the lambda that minimizes cross-validation error on the training set. A model fit with that lambda can then be evaluated against the test set.
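The alpha/lambda search can be sketched with scikit-learn's ElasticNetCV (an assumed substitute for cv.glmnet; note the naming swap: sklearn's `l1_ratio` is the deck's alpha, and sklearn's `alpha` is the deck's lambda):

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = X @ np.array([2.0, -1.0, 0.0, 0.0, 1.5, 0.0, 0.0, 0.0]) + rng.normal(size=200)

# Try a limited grid of mixing weights (the deck's alpha); for each one,
# 5-fold CV picks the lambda that minimizes cross-validation error.
model = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9, 1.0], cv=5).fit(X, y)
print(model.l1_ratio_, model.alpha_)  # the selected alpha/lambda combination
```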

5
Q

Notes

A

Lambda (non-negative):
–controls the size of the overall penalty, i.e. the shrinkage of the estimates. When lambda is 0 (no shrinkage), the estimates match the OLS estimates; as lambda approaches infinity (maximum shrinkage), all estimates except the intercept approach 0 (i.e. no predictors).
–lambda is inversely related to flexibility.
–Increasing lambda causes the penalized quantity to approach 0; however, it is possible for individual b_j’s to temporarily move away from 0 along the way.

-lambda and alpha are hyperparameters which should be tuned to find their optimal values.

-Regularization ignores the hierarchical principle (e.g. it can retain an interaction term while shrinking one of its main effects to 0).
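The two lambda extremes above can be checked numerically (a sketch using scikit-learn's Ridge, an assumption here; its `alpha` argument is the deck's lambda):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(size=50)

ols = LinearRegression().fit(X, y)
tiny = Ridge(alpha=1e-8).fit(X, y)  # lambda ~ 0: essentially no shrinkage
huge = Ridge(alpha=1e8).fit(X, y)   # lambda -> infinity: maximum shrinkage

print(ols.coef_)   # OLS estimates
print(tiny.coef_)  # nearly identical to OLS
print(huge.coef_)  # all slope estimates driven toward 0
```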

6
Q

Model Performance

A

Evaluate model using test RMSE or another performance metric.
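A test-RMSE evaluation sketch (scikit-learn assumed, with a held-out test set):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
model = Ridge(alpha=1.0).fit(X_train, y_train)

# Test RMSE: root of the mean squared prediction error on held-out data.
rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))
print(rmse)
```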

7
Q

Pros

A

–Useful in high-dimensional settings; avoids overfitting, especially when the number of observations is small compared to the number of predictors.
–Can reduce the size of coefficients without entirely eliminating variables (ridge)
–Can perform variable selection to reduce model complexity (lasso/elastic net)

–binarization of factor variables is required and is done automatically when setting up the matrix object
–cv.glmnet performs CV using 10 folds by default
–requires standardized variables, which glmnet standardizes by default
->variables on a larger scale typically have smaller coefficients; without standardizing, the regularization would focus on shrinking the small-scale variables (whose coefficients are larger) over those on a large scale.
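glmnet standardizes by default; when using scikit-learn instead (an assumed substitute throughout these sketches), standardization must be requested explicitly, e.g. via a pipeline:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Three predictors on wildly different scales.
X = rng.normal(size=(100, 3)) * np.array([1.0, 100.0, 0.01])
# Each predictor has the same effect per standard deviation of X.
y = X @ np.array([2.0, 0.02, 200.0]) + rng.normal(size=100)

# Standardizing first puts every predictor on the same scale, so the
# penalty shrinks them evenly instead of targeting small-scale variables.
pipe = make_pipeline(StandardScaler(), Lasso(alpha=0.1)).fit(X, y)
print(pipe.named_steps["lasso"].coef_)
```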
