Logistic regression Flashcards
(96 cards)
What makes logistic regression different from the other types of regression (linear regression, multiple linear regression, non-linear regression)?
For the others
They model ratio/scale data: the DV must be ratio/scale.
This is necessary because we fit the model by minimising the sum of squared residuals, which is a parametric approach.
Logistic regression
Used if the DV has a limited range, e.g., either 0 or 1, or between 0 and 100, such as:
- marks on a test
- accuracy scores
- etc.
What is the typical linear regression equation?
$\hat{Y} = c + bX$
Here we assume a linear relationship between the IV and the DV.
Why can we not use linear regression if our DV is limited in range, e.g., pass (1) or fail (0)?
Because certain values of the IV will correspond to 0 (e.g., a score of 40%) and others will correspond to 1 (e.g., a score of 90%).
Problem: for values below 40, the model will predict values lower than 0, and for values above 90 it will predict values higher than 1. Also, anything in between will be predicted as something between 0 and 1.
This doesn't make sense:
- we can't have values between 0 and 1; we want to predict ONLY 0 or ONLY 1
- and we can't have values above 1 or below 0, but the regression equation predicts values outside the 0-to-1 range
This is a serious problem because it creates very large residuals, which will distort or bias our regression fit.
The residuals will also violate the assumption of homoscedasticity, because the data range is limited to 0 and 1.
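To see the problem concretely, here is a minimal sketch that fits an ordinary least-squares line to pass/fail data. The scores and outcomes are hypothetical, invented for illustration, and `ols_fit` is a hypothetical helper using the textbook least-squares formulas. The fitted line predicts a value below 0 for a score of 0 and a value above 1 for a score of 120.

```python
# Hypothetical data: test scores (X) and pass (1) / fail (0) outcomes (Y)
scores = [20, 30, 40, 50, 60, 70, 80, 90, 100]
passed = [0, 0, 0, 0, 1, 1, 1, 1, 1]


def ols_fit(x, y):
    """Return (c, b) that minimise the sum of squared residuals for Y = c + bX."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    c = mean_y - b * mean_x
    return c, b


c, b = ols_fit(scores, passed)

# Predictions for scores outside and inside the observed range
for x in [0, 50, 120]:
    print(f"score {x:>3}: predicted Y = {c + b * x:.3f}")
```

Only the middle prediction lands in the 0-to-1 range; the other two are impossible values for a pass/fail outcome.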
What is logistic regression?
A special case of non-linear regression.
Why is logistic regression a special case of non-linear regression?
Because it deals with the limited range of the DV.
Different types of logistic regression
Logistic regression
Used if the DV has a limited range, e.g., the proportion of correct answers on a test. This gives a continuous prediction.
Binary logistic regression
A type of logistic regression where the DV is binary, e.g., 0 or 1. The model predicts the probability that the outcome is 1, which can then be turned into a binary outcome of either 0 or 1.
Both cases deal with the limited range of the DV.
If I asked n participants to answer a 7-point Likert scale and then averaged the scores, would I use logistic regression, since technically there is a limited range of answers?
No. While the scale is limited in range, you are analysing the average, which, according to the central limit theorem, is approximately normally distributed.
What's the big problem with using linear regression on data limited in range?
The linear equation will fit the 0/1 values at certain points, but everywhere else the residuals are large. The big problem is that it will predict values larger than 1 and smaller than 0.
This gives us a real problem with the residuals, and whenever we fit linear regression models, the residuals are what we use to do the fitting.
This will bias any result we get.
Can't we just fit a non-linear curve to the binary/limited-range DV?
Yes, a curve that levels off nicely at 0 and at 1 fits much better.
Let's say we invent and fit a logistic curve to the binary data, and it seems to do quite well. Can we be satisfied with this?
No. While it fits reasonably, we want to find the best-fitting logistic curve. That's what logistic regression does:
it finds the best-fitting curve that has an S shape.
What is the equation for the non-linear curve we fit in logistic regression?
$P(Y = 1) = \dfrac{e^{c + bX}}{1 + e^{c + bX}}$
What is e?
It's a constant called Euler's number (e ≈ 2.718).
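The logistic equation $P(Y = 1) = e^{c + bX} / (1 + e^{c + bX})$ can be sketched directly with Python's `math.exp`, which computes powers of Euler's number. The coefficients `c` and `b` below are hypothetical values chosen for illustration, not fitted to any data. The sketch shows that the output always stays strictly between 0 and 1 and rises in an S shape as X increases.

```python
import math


def logistic_probability(x, c, b):
    """P(Y = 1) = e^(c + bX) / (1 + e^(c + bX))."""
    z = c + b * x
    return math.exp(z) / (1 + math.exp(z))


# Hypothetical coefficients, for illustration only
c, b = -6.0, 0.1

for x in [0, 30, 60, 90, 120]:
    p = logistic_probability(x, c, b)
    print(f"X = {x:>3}: P(Y = 1) = {p:.3f}")
```

The curve passes through 0.5 where c + bX = 0 and flattens out towards 0 and 1 on either side. For very large |c + bX| this direct form can overflow `math.exp`; the values here are small enough to avoid that.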
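What is the OLS regression equation?
$\hat{Y} = c + bX$, with c and b chosen to minimise the sum of squared residuals (ordinary least squares).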
What is the rationale for using the logistic regression equation?
- it deals with the limited range of the DV, e.g., 0 to 100
- its functional form is very flexible, so it fits a wide range of data
- it has a simple analytical form: you only need to raise Euler's number to a power (e^X)
- it is easier to compute than general non-linear regression problems
Just like in linear regression, the form of the equation we are fitting is _____?
Fixed.
Thus, when fitting the model, we are just finding the best-fitting numerical values for the coefficients in the equation (c and b).
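Because the form of the equation is fixed, fitting only means searching for c and b. Logistic regression does this by maximum likelihood rather than least squares. The sketch below is a minimal illustration under stated assumptions: the hours-studied data are hypothetical, `fit_logistic` is a hypothetical helper, and it uses Newton-Raphson on the log-likelihood. Statistical software uses a similar iterative method, but the exact algorithm and stopping rules vary by package.

```python
import math

# Hypothetical data: hours studied (X) and pass (1) / fail (0) outcomes (Y)
hours = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
passed = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]


def fit_logistic(x, y, iterations=50):
    """Find (c, b) that maximise the log-likelihood via Newton-Raphson."""
    c, b = 0.0, 0.0
    for _ in range(iterations):
        p = [1 / (1 + math.exp(-(c + b * xi))) for xi in x]
        # Gradient of the log-likelihood with respect to c and b
        g_c = sum(yi - pi for yi, pi in zip(y, p))
        g_b = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Information matrix [[a, h], [h, d]] (the negative Hessian)
        w = [pi * (1 - pi) for pi in p]
        a = sum(w)
        h = sum(wi * xi for wi, xi in zip(w, x))
        d = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = a * d - h * h
        # Newton step: solve [[a, h], [h, d]] @ [step_c, step_b] = [g_c, g_b]
        step_c = (d * g_c - h * g_b) / det
        step_b = (a * g_b - h * g_c) / det
        c, b = c + step_c, b + step_b
        if max(abs(step_c), abs(step_b)) < 1e-10:
            break
    return c, b


c, b = fit_logistic(hours, passed)
print(f"c = {c:.3f}, b = {b:.3f}")
```

At the best-fitting values, the gradient is zero: the predicted probabilities add up to the number of observed 1s. The data must overlap (some passes at low X, some fails at high X), otherwise the coefficients grow without bound.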
What is logistic regression doing?
Modelling/predicting data between 0 and 1.
Mathematically, what is a prediction, and what do we call it in statistics?
Mathematically, a prediction is the probability that a case has a value of 1 (rather than 0).
How do we get the probability/prediction?
Plug X into the logistic equation: $P(Y = 1) = \dfrac{e^{c + bX}}{1 + e^{c + bX}}$
What can we use the probability (prediction) to compute?
The odds.
The equation: the probability of an event happening divided by the probability of it not happening, $\text{odds} = \dfrac{P}{1 - P}$.
Essentially, the odds are Euler's number raised to the power of the regression equation with our best-fitting coefficients, $\text{odds} = e^{c + bX}$. So if you know the logistic regression equation, you can compute the probability directly, but you can also compute the odds.
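A quick numerical check that both routes give the same odds: computing P / (1 - P) from the probability, and computing $e^{c + bX}$ directly. The coefficients and the X value are hypothetical, chosen only for illustration.

```python
import math

# Hypothetical coefficients and X value, for illustration only
c, b = -3.0, 0.5
x = 4.0

z = c + b * x                        # the regression equation c + bX
p = math.exp(z) / (1 + math.exp(z))  # probability that Y = 1
odds_from_p = p / (1 - p)            # P(event) / P(no event)
odds_direct = math.exp(z)            # e^(c + bX)

print(f"P(Y = 1)             = {p:.4f}")
print(f"odds via P / (1 - P) = {odds_from_p:.4f}")
print(f"odds via e^(c + bX)  = {odds_direct:.4f}")
```

The two odds values agree up to floating-point rounding, because $\frac{e^{z}/(1 + e^{z})}{1/(1 + e^{z})} = e^{z}$.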
What do we use to measure effect size in logistic regression?
The odds ratio.
How do we compute the log odds?
The natural logarithm of the odds.
This is basically the inverse of raising e to a power: $\ln(e^{c + bX}) = c + bX$.
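One common way to read the odds ratio for a predictor: it is the factor by which the odds change when X increases by one unit, and it equals $e^{b}$. The sketch checks this numerically. The coefficients and the helper `odds` are hypothetical, for illustration only.

```python
import math

# Hypothetical coefficients, for illustration only
c, b = -3.0, 0.5


def odds(x):
    """Odds of Y = 1 at a given X: e^(c + bX)."""
    return math.exp(c + b * x)


# The ratio of odds for a one-unit increase in X is the same at every X
for x in [0, 2, 5]:
    print(f"odds({x + 1}) / odds({x}) = {odds(x + 1) / odds(x):.4f}")
print(f"e^b = {math.exp(b):.4f}")
```

Because the constant c cancels in the ratio, the odds ratio depends only on b, which is why it works as an effect size for X.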
What kind of relationship does logistic regression have with the log odds?
Any logistic regression is linear with respect to the log odds (just like with OLS regression).
So by taking the natural logarithm of the odds, you create a new unit (or DV, if you will) that is linear in terms of the independent variable X.
The log odds vary from negative infinity to infinity as the probability moves from 0 to 1.
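The linearity can be checked numerically. The sketch computes the log odds $\ln(P / (1 - P))$ of each predicted probability and confirms it recovers c + bX exactly. It then shows the log odds becoming very negative as P approaches 0 and very positive as P approaches 1. The coefficients are hypothetical and `logit` is a hypothetical helper name.

```python
import math

# Hypothetical coefficients, for illustration only
c, b = -3.0, 0.5


def logit(p):
    """Log odds: natural logarithm of p / (1 - p)."""
    return math.log(p / (1 - p))


# The log odds of the predicted probability is a straight line in X
for x in [0, 2, 4, 6, 8]:
    z = c + b * x
    p = 1 / (1 + math.exp(-z))
    print(f"X = {x}: P = {p:.3f}, log odds = {logit(p):.3f}, c + bX = {z:.3f}")

# The log odds are unbounded as P approaches 0 or 1
for p in [0.001, 0.5, 0.999]:
    print(f"P = {p}: log odds = {logit(p):.3f}")
```

Each step of 2 in X moves the log odds by the same amount (2b = 1.0), while the probability itself changes by unequal amounts.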
What is another word for log odds?
Logit.
Logit regression equation
$\text{logit} = \ln\!\left(\dfrac{P}{1 - P}\right) = c + bX$
- So the result of this is your logit
- or "logistic unit" (hence the name logit)
Here the logit for a yes (vs no) answer is -2113.056.
So we have the normal regression equation with the constant and coefficient. To get the logit, we just plug X in (-6 in this particular case).
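The card does not give c and b, so the sketch below uses hypothetical values to show the two steps. First, plug X = -6 into c + bX to get the logit. Second, convert a logit back to a probability. A logit as extreme as -2113.056 corresponds to a probability that is effectively 0. Writing this naively as `1 / (1 + math.exp(-z))` would raise an `OverflowError`, so the hypothetical helper `probability_from_logit` branches on the sign of the logit.

```python
import math


def logit_from_equation(x, c, b):
    """Plug X into the regression equation c + bX to get the logit."""
    return c + b * x


def probability_from_logit(z):
    """Convert a logit back to P(Y = 1) without overflowing math.exp."""
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    ez = math.exp(z)  # z < 0, so e^z is between 0 and 1 (it may underflow to 0.0)
    return ez / (1 + ez)


# Hypothetical constant and coefficient; the card does not give c and b
c, b = 2.0, 0.75
z = logit_from_equation(-6, c, b)
print(f"logit at X = -6: {z}")
print(f"P(Y = 1): {probability_from_logit(z):.4f}")

# The logit quoted on the card: the probability is effectively zero
print(f"P(Y = 1) for logit -2113.056: {probability_from_logit(-2113.056)}")
```

The second branch uses the algebraically equivalent form $e^{z} / (1 + e^{z})$, which stays finite for large negative logits.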