Session 10 - Elastic Net Flashcards

1
Q

For regularised regression, what determines the strength of the penalty?

A

The value of the tuning parameter lambda (λ)

2
Q

Lasso is a function of….

A

Sum of absolute values of coefficients

3
Q

For regularised regression the estimates of β are obtained by what?

A

Minimising the penalised RSS, RSS(λ)

4
Q

Which method does not force coefficients to 0, meaning it cannot be used for variable selection and is not easy to interpret?

A

Ridge regression

5
Q

Lasso regression….

A

Produces 0 coefficients and thus performs variable selection – adds interpretability.

Grouped variables: from several strongly correlated covariates, typically only one is selected.

If the number of variables (p) is larger than the sample size (n), the lasso selects at most n variables – this may be a disadvantage with very high-dimensional data.

6
Q

Shrinking coefficient values has the effect of…..

A

Stabilizing the variability of the estimation, i.e. reducing the complexity of the model; the penalization does, however, add some bias.

7
Q

If we want to predict GP visits based on 8 predictor variables what regularised regression method should we choose?

A

Ridge

Ridge: allows correlated variables to be included in the model but does not perform variable selection:

  • not useful for a large number of variables
8
Q

We have a cohort database with several hundred variables collected at baseline from patients with an “at risk mental state” (ARMS).

About 15% of the patients developed a psychosis.

Based on the available variables we want to build a prognostic model to predict the likelihood of developing psychosis.

This model should be used in clinical practice for a risk assessment.

What regularised regression method should we choose?

A

Lasso – as this is a clinical assessment, we want to ensure clinicians do not have to take too many variables into account.

Lasso performs variable selection but has problems with selecting groups of correlated variables, such as sets of genes or brain voxels.

9
Q

The aim of DNA microarray experiments is to detect differential gene expression.

E.g. to identify gene expression changes under different treatment conditions or among different types of cell samples.

Often thousands or hundreds of thousands of genes are tested on the array for expression changes.

Research question:
We want to identify predictors of depression using a case-control dataset with 521135 gene expressions.

What regularised regression method should we choose?

A

We need a method which allows variable selection but selects a set of correlated variables

A combination of the ridge and lasso would sometimes be useful:
- Elastic net regression is a hybrid approach that blends penalization of both the L2 and L1 norms.

10
Q

We want to predict autism based on brain activity differences when someone looks at a person.

3-dimensional data: a 64×64 voxel matrix with 43 slices through the brain ≈ 176,128 voxels. About 1/3 of this area is studied, i.e. ≈ 50,000 voxels/hypothesis tests. Adding time as a 4th dimension gets us into the 100,000s – a massive multiple comparisons problem!

What regularised regression method should we use?

A

We need a method which allows variable selection but selects a set of correlated variables

A combination of the ridge and lasso would sometimes be useful:
- Elastic net regression is a hybrid approach that blends penalization of both the L2 and L1 norms.

11
Q

What is elastic net regression?

A

A hybrid approach that blends penalization of both the L2 and L1 norms.

It is a generalization of ridge, lasso and unregularized linear regression

12
Q

What is the elastic net regression formula?

A

The penalised RSS with both penalty terms:

RSS(λ1, λ2) = Σi (yi − β0 − Σj xij βj)² + λ1 Σj |βj| + λ2 Σj βj²

The L1 (“lasso”) part of the penalty generates a sparse model and performs variable selection.

The L2 (“ridge”) part of the penalty removes the limitation on the number of selected variables and encourages a grouping effect: it selects groups of correlated variables.
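As a minimal numpy sketch of this doubly penalised objective (the “naive” elastic net criterion; the function and variable names here are illustrative, not from the course):

```python
import numpy as np

def penalised_rss(beta, X, y, lam1, lam2):
    """Doubly penalised RSS: RSS + lambda1 * L1 penalty + lambda2 * L2 penalty."""
    rss = np.sum((y - X @ beta) ** 2)
    l1 = lam1 * np.sum(np.abs(beta))  # lasso part: encourages sparsity
    l2 = lam2 * np.sum(beta ** 2)     # ridge part: encourages grouping
    return rss + l1 + l2

# Larger coefficients are penalised more heavily for the same lambdas
X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, 2.0])
print(penalised_rss(np.array([0.5, 0.5]), X, y, 1.0, 1.0))  # 4.0
print(penalised_rss(np.array([5.0, 5.0]), X, y, 1.0, 1.0))  # 85.0
```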

13
Q

What do we get if we set λ1 to 0 for elastic net?

A

L2 penalty or ridge regression

14
Q

What do we get if we set λ2 to 0 for elastic net?

A

L1 penalty or lasso regression

15
Q

What do we get if we set λ1 and λ2 to 0 for elastic net?

A

“Normal” linear regression
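These special cases can be checked numerically. A sketch using scikit-learn (note its naming differs from these cards: `alpha` is the overall penalty strength and `l1_ratio` the L1/L2 mix); the data are synthetic:

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.5, 0.0, -2.0, 0.0, 0.5]) + rng.normal(scale=0.1, size=100)

# L2 part switched off -> the elastic net reduces to the lasso
en_as_lasso = ElasticNet(alpha=0.1, l1_ratio=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)
print(np.allclose(en_as_lasso.coef_, lasso.coef_))  # True

# Both penalties (near) zero -> approximately "normal" linear regression
en_tiny = ElasticNet(alpha=1e-5, l1_ratio=0.5).fit(X, y)
ols = LinearRegression().fit(X, y)
print(np.allclose(en_tiny.coef_, ols.coef_, atol=1e-2))  # True
```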

16
Q

The usual approach to optimizing the lambda “hyper-parameter” is through cross-validation by minimizing the mean squared prediction error.

But in elastic net regression…

A

the optimal lambda1 hyper-parameter (tuning parameter) also depends on the lambda2 hyper-parameter (“hyper-hyper-parameter”).

17
Q

How do we find best lambda for elastic net?

A

By tuning the elastic net: a grid search over both tuning parameters, selecting the pair that minimises the cross-validated prediction MSE.

18
Q

In the elastic net we need to tune two parameters

How can this be pursued?

A
  1. We start with a fixed λ2 and then select the best λ1 using cross-validation, recording the cross-validated prediction MSE.
  2. We then choose another λ2 and again select the best λ1, recording the cross-validated MSE.
  3. We repeat this procedure for a large number of λ2 values and then select the model with the λ1 and λ2 that minimise the cross-validated MSE.

However, glmnet uses a slightly different approach, because this selection of lambda causes computational problems and may not always find the best solution.

19
Q

Why does the EN not tune separate L1 and L2 penalty parameters (λ1, λ2) to find the estimates of the regression coefficients?

A

Because finding two lambdas using cross validation doesn’t work well.

20
Q

L1 and L2 Elastic net estimate of β is obtained by minimising what?

A

The doubly penalised RSS:

RSS(λ1, λ2) = Σi (yi − β0 − Σj xij βj)² + λ1 Σj |βj| + λ2 Σj βj²

21
Q

Finding the two lambdas using CV does not work very well in EN.

How can prediction performance be improved?

A

A mixing parameter alpha is used:

There is then only one lambda, rather than separate λ1 and λ2 penalty terms.

The L1 penalty term is weighted by alpha × lambda, and the L2 penalty term by (1 − alpha) × lambda.

Unless alpha = 0.5, the effective penalties differ between the two terms, and by varying alpha we can test a range of different penalty mixes.

Alpha ranges between 0 and 1, whilst lambda can be any value greater than 0.
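A small numeric sketch of this reparameterisation (illustrative names; alpha weights the L1 term and 1 − alpha the L2 term, both scaled by a single lambda):

```python
import numpy as np

def en_penalty(beta, lam, alpha):
    """Single-lambda elastic net penalty with mixing parameter alpha."""
    l1 = np.sum(np.abs(beta))  # lasso term
    l2 = np.sum(beta ** 2)     # ridge term
    return lam * (alpha * l1 + (1 - alpha) * l2)

beta = np.array([1.0, -2.0, 0.5])
print(en_penalty(beta, lam=2.0, alpha=1.0))  # 7.0  -> pure L1 (lasso)
print(en_penalty(beta, lam=2.0, alpha=0.0))  # 10.5 -> pure L2 (ridge)
print(en_penalty(beta, lam=2.0, alpha=0.5))  # 8.75 -> equal mix
```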

22
Q

What is lambda?

A

The shrinkage parameter

23
Q

What is alpha?

A

Alpha is the mixing parameter of the elastic net: the ratio between the L1 and L2 penalties! (for details: see the machine learning course)

24
Q

If α = 1 =

A

Lasso

25
Q

If α = 0 =

A

Ridge

26
Q

EN requires an α value between what?

A

0 and 1

27
Q

The best α and lambda are those values that ..

A

Minimize the cross-validation error.

28
Q

How can we tune the model?

A

Loop 1 (outer loop):
For alpha = 0, 0.01, 0.02, … 1

Loop 2 (inner loop):
For lambda = 0, 0.01, 0.02, …
Find the best lambda through cross-validation
= the lambda with the smallest MSE of prediction

Save lambda and MSE
Next lambda
Next alpha

Compare the MSE for all alphas.

The alpha with the smallest MSE is the best tuning parameter.

Use this alpha and its respective lambda for the final model with all data.
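The loop above can be sketched with scikit-learn's `ElasticNetCV`, which runs the inner cross-validation over the penalty strength for each candidate mixing value (note scikit-learn calls the cards' lambda `alpha` and the mixing parameter `l1_ratio`); the data here are synthetic:

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
true_coefs = np.array([2.0, -1.0, 1.5] + [0.0] * 7)  # only 3 informative predictors
y = X @ true_coefs + rng.normal(scale=0.5, size=200)

# Outer grid over the mixing parameter; inner CV picks the penalty
# strength with the smallest cross-validated prediction MSE.
model = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9, 1.0], n_alphas=50, cv=5)
model.fit(X, y)
print("best mix:", model.l1_ratio_, "best penalty strength:", model.alpha_)
```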

29
Q

The EN performs simultaneous…

A

Regularization and variable selection.

30
Q

EN has ability to perform what?

A

Grouped selection

31
Q

What is EN appropriate for?

A

The p ≫ n problems – more variables than sample size

Appropriate for bioinformatics and brain imaging studies

32
Q

For EN, two tuning parameters need to be…

A

Trained by performing a grid search and identifying the pair of parameters, which minimize the cross-validated MSE of prediction

33
Q

EN is extended to…

A

Generalized linear models (using glmnet), but also to sparse PCA, DFA, support vector machines, etc.
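For instance, an elastic net penalised logistic regression (a generalised linear model for a binary outcome) can be fitted in scikit-learn; this is an illustration on synthetic data rather than glmnet itself:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 20))
logits = 2.0 * X[:, 0] - 1.5 * X[:, 1]  # only 2 of 20 predictors are informative
y = (rng.random(300) < 1.0 / (1.0 + np.exp(-logits))).astype(int)

# penalty="elasticnet" requires the saga solver; C is the inverse penalty
# strength and l1_ratio the L1/L2 mix (cf. glmnet's lambda and alpha).
clf = LogisticRegression(penalty="elasticnet", solver="saga",
                         l1_ratio=0.5, C=0.5, max_iter=5000)
clf.fit(X, y)
print("non-zero coefficients:", int((clf.coef_ != 0).sum()), "of", X.shape[1])
```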