Session 9: Logistic Regression Flashcards

1
Q

What is the basic principle of penalised methods?

A

Improve prediction accuracy by reducing the variability of the regression estimates at the cost of increased bias

2
Q

Penalised/regularized regression allows what?

A

Automatic variable selection by shrinkage:

- The coefficients of the weaker predictors are shrunk towards zero.
- Useful for high-dimensional data, where the number of variables is close to or larger than the sample size.
- Deals effectively with multicollinearity.
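The shrinkage idea can be illustrated with a minimal pure-Python sketch of one-predictor ridge regression (the data and penalty values here are made up for illustration):

```python
import random

def ridge_slope(xs, ys, lam):
    """One-predictor ridge estimate (no intercept): beta = sum(x*y) / (sum(x^2) + lam).
    lam = 0 gives the ordinary least-squares slope."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

# Toy data (made up for illustration): y is roughly 2*x plus noise
random.seed(1)
xs = [random.gauss(0, 1) for _ in range(100)]
ys = [2 * x + random.gauss(0, 1) for x in xs]

# Increasing the penalty lam shrinks the estimate towards zero
for lam in [0.0, 10.0, 100.0]:
    print(lam, round(ridge_slope(xs, ys, lam), 3))
```

Because the denominator grows with the penalty while the numerator stays fixed, the estimate is pulled towards zero — the shrinkage described above.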
3
Q

What does the variance-bias trade-off due to shrinkage usually result in?

A

A model that predicts unseen cases better than ordinary (unpenalised) regression

4
Q

What is a big advantage of regularized regression methods, especially lasso or elastic net?

A

They also perform automatic variable selection

5
Q

What is popular, if you have a small number of variables but some of them are highly correlated, to get a more stable result/model?

A

Ridge

6
Q

What can be easily extended from linear regression for continuous outcomes to a larger class of models, including:
Logistic, Multinomial, Ordinal, Poisson and Cox regressions?

A

Regularized regression

7
Q

Logistic regression and related models such as multinomial, ordinal and Poisson regression are usually referred to as what?

A

Generalized Linear Models

8
Q

What does Logistic regression allow modelling of?

A

The relationship between a binary outcome, such as recurrence of psychosis (yes/no), and our predictor variables, as a non-linear function.

Other models can be fitted in a similar way, for example time-to-event data using Cox regression.

9
Q

What are problems with linear regression?

A

- A linear model does not make sense for a binary outcome; we rather assume a non-linear relationship.

- The outcome variable is limited to [0, 1], but with linear regression some predictions fall outside this range.

- Our goal is to best separate the two groups, not to minimize the MSE; linear regression would be highly sensitive to influential cases.

- The assumptions of linear regression are violated (especially homogeneity of variances), and hence inference is not valid.

10
Q

What is an s-shaped/sigmoid relationship?

A

A non-linear relationship

11
Q

How can a sigmoid relationship be modelled by the logistic function?

A

f(x) = e^(α + x) / (e^(α + x) + 1)
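The logistic function above can be written directly in Python (a minimal sketch; the parameter name `a` follows the formula):

```python
import math

def logistic(x, a):
    """f(x) = e^(a + x) / (e^(a + x) + 1); output is always between 0 and 1."""
    return math.exp(a + x) / (math.exp(a + x) + 1)

print(logistic(0, 0))   # 0.5 - the midpoint of the s-shaped curve
```

The output is squeezed into (0, 1) no matter how large or small the input, which is exactly the s-shape needed for modelling probabilities.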

12
Q

How is Logistic function flexible?

A

Varying α shifts the curve, giving different logistic functions.

Varying the coefficient of x (β) also gives very different functions - e.g., the modelled age relationship can vary.

13
Q

The simple linear regression describes the linear relationship between the outcome y and the predictor variable x.
𝜖 describes the random component (error) and is assumed to be normally distributed.

In a logistic regression we….

A

Relate x and y by way of a function, known as link function g():
g(y) = β0 + β1 x1 + ε

so that we can model a linear relationship between the left and right hand side of the equation.

14
Q

We use the so-called logistic function to produce our model's output:

P(Y=1|X) = e^(β0 + β1 x1) / (e^(β0 + β1 x1) + 1)

Here, we are modelling the probability that our outcome belongs to class 1 given the input feature variable 𝑋1.

  1. What is the underlying probability distribution of this logistic model?
  2. How can we rearrange this function? and what is the term on the left side known as?
A
  1. The Bernoulli distribution.
  2. ln[ P(Y=1|X) / (1 − P(Y=1|X)) ] = β0 + β1 x1

The term on the left side is known as the logit-link function!
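The rearrangement can be checked numerically (a minimal sketch; the β0, β1 and x values are arbitrary):

```python
import math

def p_of_x(x, b0, b1):
    """Logistic model: P(Y=1|X=x) = e^(b0 + b1*x) / (e^(b0 + b1*x) + 1)."""
    eta = b0 + b1 * x
    return math.exp(eta) / (math.exp(eta) + 1)

def logit(p):
    """Logit link: ln(p / (1 - p))."""
    return math.log(p / (1 - p))

# Applying the logit to the predicted probability recovers the
# linear predictor b0 + b1*x on the right-hand side
b0, b1, x = -1.0, 0.5, 2.0
p = p_of_x(x, b0, b1)
print(logit(p))   # equals b0 + b1*x = 0.0
```

So the logit link turns the s-shaped probability scale back into a scale on which the model is linear.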

16
Q

In linear regression we estimated the coefficients 𝛽𝑖 by minimizing the sum of the squared error terms.

How can we estimate parameters in logistic regression?

A

Maximizing the likelihood of the data!

Likelihood of an observation is the probability of seeing that observation under a particular model

In logistic regression, the likelihood of seeing an observation in class 1 is P(Y=1|X), and of seeing an observation in class 0 it is 1 − P(Y=1|X).

17
Q

What does the Maximum Likelihood estimation choose?

A

The parameters that maximize the likelihood of the model given the data.

It is computationally easier to work with the natural log of the likelihoods (Log-Likelihood, LL) rather than the likelihoods themselves.
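The log-likelihood idea can be sketched as follows (the toy data and coefficient values are made up; in practice an optimiser searches for the β that maximise this quantity):

```python
import math

def predicted_prob(x, b0, b1):
    """P(Y=1|X=x) under the logistic model."""
    eta = b0 + b1 * x
    return math.exp(eta) / (math.exp(eta) + 1)

def log_likelihood(xs, ys, b0, b1):
    """LL = sum of ln P(Y=1|x) for observations with y=1
    and ln(1 - P(Y=1|x)) for observations with y=0."""
    ll = 0.0
    for x, y in zip(xs, ys):
        p = predicted_prob(x, b0, b1)
        ll += math.log(p) if y == 1 else math.log(1 - p)
    return ll

# Toy data: the outcome tends to be 1 for larger x
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [0, 0, 1, 1, 1]

# A slope in the right direction fits the data better (higher LL) than no slope
print(log_likelihood(xs, ys, 0.0, 1.0), log_likelihood(xs, ys, 0.0, 0.0))
```

Maximum likelihood estimation picks the β0, β1 for which this sum is largest.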