Lecture 10: Logistic regression Flashcards

1
Q

What is a contingency table?

A

A table that shows the frequency distribution of each of two categorical variables and is used to assess the association between them

  • To form one in SPSS use crosstabs
2
Q

What does the expected frequencies table represent?

A

The frequencies we would expect in each cell if the null hypothesis (no association between the two variables) were true
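
A minimal sketch of how expected frequencies can be worked out from a contingency table, using the usual rule (row total x column total / grand total) for each cell. The counts and the function name `expected_frequencies` are made up for illustration and are not from the lecture.

```python
def expected_frequencies(observed):
    """Expected cell counts if there were no association between the variables."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # Each expected count is row total * column total / grand total.
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Hypothetical 2x2 table: rows = exposed / not exposed,
# columns = outcome / no outcome.
observed = [[30, 70],
            [10, 90]]
expected = expected_frequencies(observed)
print(expected)
```

The further the observed counts sit from these expected counts, the stronger the evidence against the null hypothesis.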

3
Q

What is the risk of an outcome?

A

The risk of an outcome is the number of times the outcome of interest occurred / the total number of cases in which it could have occurred (those where it did and those where it didn’t)

4
Q

How is a risk ratio calculated?

A

Calculate the risk of having the outcome in each group (with exposure and without exposure)

Risk of outcome with exposure
= number of times outcome occurred with exposure / total number exposed (with & without outcome)

Risk of outcome without exposure
= number of times outcome occurred without exposure / total number not exposed (with & without outcome)

To calculate the risk ratio
Risk with exposure / risk without exposure

  • risk ratio > 1: the risk of the outcome is higher with the exposure
  • risk ratio = 1: there is no difference between groups
  • risk ratio < 1: the risk of the outcome is lower with the exposure
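
The steps above can be sketched in Python. The counts are hypothetical (not from the lecture) and the helper name `risk` is only illustrative.

```python
def risk(with_outcome, without_outcome):
    """Risk = number with the outcome / everyone in the group."""
    return with_outcome / (with_outcome + without_outcome)

# Hypothetical counts: 30 of 100 exposed and 10 of 100 unexposed
# had the outcome.
risk_exposed = risk(30, 70)
risk_unexposed = risk(10, 90)

risk_ratio = risk_exposed / risk_unexposed
print(f"risk ratio = {risk_ratio:.2f}")
```

With these made-up numbers the risk ratio is above 1, so the outcome is more likely with the exposure.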
5
Q

How does the calculation of odds differ from risk?

A

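Rather than dividing by the total number of events, the number of times the outcome occurred is divided by the number of times it did not occur

Odds ratio is calculated by dividing the odds in the exposed group / odds in the non-exposed group

(Odds of 1 in a single group mean the outcome occurs half the time; the odds ratio compares the odds between the two groups)

OR = 1 - the odds of the outcome are the same in both groups, so the outcome is not related to exposure

OR > 1 - the odds of the outcome are higher with exposure, so the outcome is positively related to exposure

OR < 1 - the odds of the outcome are lower with exposure, so exposure is not associated with an increased risk of disease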
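
A short Python sketch of the odds and odds ratio calculation, using hypothetical counts (30 of 100 exposed and 10 of 100 unexposed had the outcome). The function name `odds` is only illustrative.

```python
def odds(with_outcome, without_outcome):
    """Odds = number with the outcome / number without the outcome."""
    return with_outcome / without_outcome

# Hypothetical counts, not from the lecture.
odds_exposed = odds(30, 70)
odds_unexposed = odds(10, 90)

odds_ratio = odds_exposed / odds_unexposed
print(f"odds ratio = {odds_ratio:.2f}")
```

For the same counts the risk ratio would be (30/100) / (10/100) = 3, so the two measures give different values when the outcome is fairly common.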

6
Q

When can risk and odds ratios be used?

A

Risk ratio - cannot be used in a case-control study, i.e. when participants have already been selected on the outcome of interest. Only the odds ratio can be used here

Rare outcomes - both risk and odds ratios can be used (when the outcome is rare the odds ratio is close to the risk ratio)

Odds ratios can be used in many study designs and form the basis for logistic regression
Risk ratios are often preferred for clinical practice

7
Q

Why can’t simple linear regression be used for a binary outcome?

A

Linear regression assumes the response is normally distributed around the mean (for each value of X) - this is not going to be the case if there is a binary response

A linear relationship doesn’t make sense for a binary outcome

  • Output variable is limited to 0 and 1 - some of the predicted values would fall outside this range
  • Our goal is to best separate the two groups rather than to minimise the least squares error
  • A linear regression fit would be very sensitive to influential outliers
  • Homogeneity of variance would be violated

A sigmoid (S-shaped) curve is used instead

8
Q

What function is used in logistic regression?

A

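A link function g - it links the probability of the outcome (p) on the left side of the equation to the linear predictor on the right side

g(p) = alpha + beta * X

In logistic regression the link is the logit (the log of the odds):

ln(p / (1 - p)) = alpha + beta * X

The logit transformation allows the relationship with X to be modelled as linear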

9
Q

If beta increases in a simple logistic regression how will the log of the odds increase?

A

The log of the odds will increase by more for each unit increase in X, so the steepness of the sigmoid curve will increase

10
Q

What is used to estimate the beta coefficient and the constant in logistic regression?

A

Maximum likelihood - an iterative process in which candidate coefficient values are tried until the best fit is found

It finds the coefficient values which make the observed data most likely

(In simple linear regression, ordinary least squares is used instead)
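
To make the iterative idea concrete, here is a rough sketch of maximum likelihood for a simple logistic regression using Newton-Raphson steps. This is one common way to do the iteration and is not necessarily the exact routine SPSS uses. The data and the function name `fit_logistic` are made up for illustration.

```python
import math

def fit_logistic(xs, ys, iterations=25):
    """Estimate alpha and beta in ln(p / (1 - p)) = alpha + beta * x."""
    alpha, beta = 0.0, 0.0
    for _ in range(iterations):
        ps = [1 / (1 + math.exp(-(alpha + beta * x))) for x in xs]
        # Gradient of the log-likelihood with respect to alpha and beta.
        g_a = sum(y - p for y, p in zip(ys, ps))
        g_b = sum((y - p) * x for x, y, p in zip(xs, ys, ps))
        # Entries of the information matrix (negative Hessian).
        ws = [p * (1 - p) for p in ps]
        h_aa = sum(ws)
        h_ab = sum(w * x for w, x in zip(ws, xs))
        h_bb = sum(w * x * x for w, x in zip(ws, xs))
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: solve the 2x2 system directly.
        alpha += (h_bb * g_a - h_ab * g_b) / det
        beta += (h_aa * g_b - h_ab * g_a) / det
    return alpha, beta

# Hypothetical data: binary outcome y recorded at eight values of x.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 1, 0, 1, 1, 0, 1]
alpha, beta = fit_logistic(xs, ys)
print(alpha, beta)
```

Each pass updates the coefficients so that the observed data become more likely. The sketch assumes the two groups overlap in x, since perfectly separated data have no finite maximum likelihood estimate.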

11
Q

How do we interpret the coefficient in logistic regression?

A

An increase in X of one unit changes the log(odds) by the value of the coefficient

To work out how the odds change for a one-unit increase in X - use the antilog function

antilog = e (to the power of the beta coefficient value) = odds ratio for the outcome occurring per one-unit increase in X (concentration in the lecture example)
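
A quick numerical check of this interpretation, as a sketch with hypothetical coefficient values (alpha = -2.0, beta = 0.5): the odds at X + 1 divided by the odds at X should equal e to the power of beta, whatever X is.

```python
import math

# Hypothetical coefficients, chosen only for illustration.
alpha, beta = -2.0, 0.5

def odds_at(x):
    """Odds of the outcome at a given x: exp(alpha + beta * x)."""
    return math.exp(alpha + beta * x)

# Odds ratio for a one-unit increase in X, here from 3 to 4.
odds_ratio = odds_at(4) / odds_at(3)
print(odds_ratio, math.exp(beta))
```

The two printed numbers agree, which is why exp(B) in the output can be read directly as an odds ratio.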

12
Q

What are the assumptions of binary logistic regression?

A

Binary dependent variable which has a Bernoulli distribution

The binary variable is only linearly related to the predictor variables after transforming into the logit scale

The observations are independent

Continuous variables have a linear effect on the log-odds scale

Use for binary dependent variable with continuous predictors

13
Q

What is the probability of an event in logistic regression?

A

p = exp(L) / (1 + exp(L))
= 1 / (1 + exp(-L))

Odds of an event: p / (1 - p)

Logit model: ln(p / (1 - p)) = alpha + beta * X

L = alpha + beta * X is the linear predictor

exp(L) = e^L is the odds of an event
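
These relationships can be checked numerically. The sketch below uses hypothetical values for alpha, beta and X. The function names are only illustrative.

```python
import math

def probability(L):
    """Probability of the event from the linear predictor L."""
    return 1 / (1 + math.exp(-L))

def odds(p):
    """Odds of the event from its probability."""
    return p / (1 - p)

def logit(p):
    """Log of the odds."""
    return math.log(odds(p))

# Hypothetical coefficients and predictor value.
alpha, beta, x = -2.0, 0.5, 6.0
L = alpha + beta * x      # linear predictor
p = probability(L)

print(p, odds(p), logit(p))
```

Here odds(p) equals exp(L) and logit(p) returns L again, so the probability, odds and log-odds forms all describe the same model.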

14
Q

How is binary logistic regression run in SPSS?

A

Select Regression - Binary Logistic

Choose variables - for any categorical predictor, select Categorical, change the reference category from Last to First, then click Change

Choose the confidence interval for exp(B)

15
Q

What does the omnibus test of model coefficients explain?

A

The omnibus test of model coefficients tests whether the inclusion of a block of variables contributes to a better model fit

The coefficient of determination (R²) gives an indication of how much variation in y is explained by the model

Nagelkerke R² is used

The classification table indicates how much the correct classification improves when the predictors are included in the model

16
Q

What is the regression equation from the output?

A

ln(p / (1 - p)) = constant + beta * X

17
Q

Can a binary logistic regression involve more than one predictor?

A

Yes, these can be numerical or categorical

Categorical variables with more than two levels need to be recoded as dummy variables

Interaction terms should be considered for each dummy variable
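
A small sketch of dummy coding for a categorical predictor with three levels. The variable `smoking`, its levels and the helper `dummy_code` are hypothetical. The reference category gets no column of its own and is represented by all dummies being 0.

```python
def dummy_code(values, reference):
    """One 0/1 indicator per level, leaving out the reference level."""
    levels = sorted(set(values) - {reference})
    return [{f"is_{level}": int(v == level) for level in levels}
            for v in values]

# Hypothetical three-level predictor with "never" as the reference.
smoking = ["never", "former", "current", "never"]
rows = dummy_code(smoking, reference="never")
for value, row in zip(smoking, rows):
    print(value, row)
```

A predictor with k levels therefore enters the model as k - 1 dummy variables, each with its own coefficient compared against the reference category.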

18
Q

How can a prediction be estimated using the linear predictor?

A

Linear predictor (L) = constant + beta coefficient * X

Probability = exp(L) / (1 + exp(L))

This gives the predicted probability of Y for the chosen value of X

19
Q

What is classification?

A

Establishes a cut-off probability value: predicted probabilities at or above it indicate a positive and those below it indicate a negative

The predicted classes are then cross-tabulated against the true (observed) values

20
Q

How can the classification table help?

A

It gives the percentage accuracy in classification:

Sensitivity - percentage of cases with the observed characteristic that were correctly predicted by the model - true positives

Specificity - percentage of cases without observed characteristic that were correctly predicted - true negatives

Positive predictive value - percentage of correctly predicted cases with observed characteristic compared to total number of cases predicted as having characteristics

Negative predictive value - percentage of correctly predicted cases without observed characteristic compared to total number of cases predicted as not having the characteristics
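
The four measures can be sketched in Python from observed outcomes and predicted probabilities. The data, the 0.5 cut-off and the function name `classification_metrics` are assumptions for illustration only.

```python
def classification_metrics(observed, probabilities, cutoff=0.5):
    """Cross-tabulate observed vs predicted classes and summarise them."""
    predicted = [1 if p >= cutoff else 0 for p in probabilities]
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, q in pairs if o == 1 and q == 1)  # true positives
    fn = sum(1 for o, q in pairs if o == 1 and q == 0)  # false negatives
    fp = sum(1 for o, q in pairs if o == 0 and q == 1)  # false positives
    tn = sum(1 for o, q in pairs if o == 0 and q == 0)  # true negatives
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / len(pairs),
    }

# Hypothetical observed outcomes and model-predicted probabilities.
observed = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
probabilities = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.1, 0.05]
metrics = classification_metrics(observed, probabilities)
print(metrics)
```

Moving the cut-off trades sensitivity against specificity, so the table depends on the cut-off chosen.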

21
Q

How do we interpret the Hosmer and Lemeshow goodness-of-fit test?

A
  • A non-significant result indicates good fit
  • A large chi-square value with a small p value indicates poor fit
  • Whereas a small chi-square value with a large p value (close to 1) indicates good logistic regression model fit