Linear and logistic regression Flashcards
What is regression
It is a statistical method for estimating the numerical relationship between one dependent variable and one or more independent variables
It can be used to address a wide variety of research questions involving a dependency relationship between two or more variables
What are dependent and independent variables
Variables
Independent (predictors/controls)
Dependent (outcomes/measures)
Regression models
The type of model is determined by the dependent variable and the research question
Most common types
- linear regression
- logistic regression
Others
- Cox regression
- ordered logistic regression
- multinomial regression
- Poisson regression
Data types
Can be broadly categorised into two groups
- categorical (qualitative)
  - nominal
  - binary
  - ordinal
- numeric (quantitative)
  - discrete
  - continuous
What is linear regression
It summarises the observed data using the equation of the line that best fits the data, describing the dependency relationship between the dependent variable (y) and the independent variable (x)
Interpretation
- when linear regression models are presented in publications, the statistic usually quoted is the beta coefficient
The beta coefficient measures the increase or decrease in the dependent variable for each one-unit increase in the independent variable.
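As a minimal sketch of where the beta coefficient comes from, the snippet below fits a straight line by ordinary least squares in plain Python. The data values and the helper name `fit_line` are made up for illustration and are not part of the original notes.

```python
def fit_line(x, y):
    """Return (intercept, beta) of the least-squares line y = intercept + beta * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares around the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    beta = sxy / sxx                      # slope: change in y per unit of x
    intercept = mean_y - beta * mean_x    # value of y when x is 0
    return intercept, beta


# Hypothetical data (illustration only)
x = [1, 2, 3, 4, 5]
y = [2.1, 4.0, 6.2, 7.9, 10.1]

intercept, beta = fit_line(x, y)
print(f"intercept = {intercept:.2f}, beta = {beta:.2f}")
```

With these made-up numbers beta is about 1.99, which would be read as: y increases by roughly 1.99 units for each one-unit increase in x.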
Assumptions of linear regression
- independence (the observations are independent)
- linearity (the relationship between (y) and (x) is linear)
- normality (the residuals are normally distributed)
- homoscedasticity (the residuals have constant variance)
What are residuals
They are the differences between the observed values of the dependent variable and the values predicted by the model.
A good model will have small residuals
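A short sketch of how residuals could be computed, assuming a straight line fitted by least squares. The data are hypothetical, and the residual is taken as observed minus predicted, which is the usual sign convention.

```python
# Hypothetical data (illustration only)
x = [1, 2, 3, 4, 5]
y = [2.1, 4.0, 6.2, 7.9, 10.1]

# Fit the least-squares line y = intercept + beta * x
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
beta = sxy / sxx
intercept = mean_y - beta * mean_x

# Residual = observed value minus the value predicted by the model
predicted = [intercept + beta * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]

# Residual sum of squares: smaller means the line sits closer to the data
rss = sum(r ** 2 for r in residuals)

for xi, yi, pi, ri in zip(x, y, predicted, residuals):
    print(f"x={xi}  observed={yi:.2f}  predicted={pi:.2f}  residual={ri:+.2f}")
print(f"residual sum of squares = {rss:.3f}")
```

For a least-squares line with an intercept, the residuals sum to zero, so their size is usually summarised by the sum of squares rather than the plain sum.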
Multivariable regression
Sometimes we are interested in including more than one predictor variable in the same regression model.
This is usually done because we want to control for a variable that is related to both the exposure and the outcome of interest (a ‘confounder’)
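The effect of adjusting for a confounder can be sketched with a small, noise-free example. Everything here is hypothetical: the data are constructed so that y = 1 + 0.5x + 3z, where z is a binary confounder that also shifts x, and `fit_least_squares` is an illustrative helper that solves the normal equations directly rather than any particular library's method.

```python
def fit_least_squares(rows, y):
    """Least-squares coefficients, solving (X'X) b = X'y by Gauss-Jordan elimination.

    rows: list of predictor rows (include a leading 1 for the intercept).
    """
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    aug = [xtx[i] + [xty[i]] for i in range(k)]  # augmented matrix [X'X | X'y]
    for col in range(k):
        # Partial pivoting: bring the largest entry in this column to the diagonal
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Eliminate this column from every other row
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


# Hypothetical data: z (confounder) raises both x (exposure) and y (outcome)
z = [0, 0, 0, 0, 1, 1, 1, 1]
x = [1, 2, 3, 4, 3, 4, 5, 6]
y = [1 + 0.5 * xi + 3 * zi for xi, zi in zip(x, z)]

# Crude model: y ~ x only
crude = fit_least_squares([[1, xi] for xi in x], y)
# Adjusted model: y ~ x + z
adjusted = fit_least_squares([[1, xi, zi] for xi, zi in zip(x, z)], y)

print(f"crude beta for x    = {crude[1]:.3f}")
print(f"adjusted beta for x = {adjusted[1]:.3f}")
print(f"beta for z          = {adjusted[2]:.3f}")
```

In this constructed example the crude coefficient for x (about 1.17) overstates the association, while the adjusted model recovers the 0.5 that was built into the data.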
What is logistic regression
Logistic regression is used when the outcome is binary (e.g. disease / disease free)
We want our regression model to estimate the probability p of the outcome occurring (e.g. the probability of having disease X)
The odds ratio is usually reported instead of the beta coefficient (the odds ratio is the exponential of the beta coefficient)
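The link between the probability p, the beta coefficient and the odds ratio can be sketched with a hypothetical 2x2 table. This assumes a model with a single binary predictor, where the fitted model reproduces the observed proportions, so the coefficients can be read off the table. In general, logistic regression is fitted by iterative maximum likelihood, which is not shown here. The counts and variable names are made up for illustration.

```python
import math


def logistic(linear_predictor):
    """Map a linear predictor (log-odds) to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Hypothetical 2x2 table (illustration only)
exposed_disease, exposed_healthy = 30, 70
unexposed_disease, unexposed_healthy = 10, 90

# Odds = number with the outcome / number without it
odds_exposed = exposed_disease / exposed_healthy
odds_unexposed = unexposed_disease / unexposed_healthy
odds_ratio = odds_exposed / odds_unexposed

# Model: log(p / (1 - p)) = intercept + beta * exposed
intercept = math.log(odds_unexposed)  # log-odds in the unexposed group
beta = math.log(odds_ratio)           # so odds ratio = exp(beta)

p_unexposed = logistic(intercept)
p_exposed = logistic(intercept + beta)

print(f"odds ratio = {odds_ratio:.2f}, beta = {beta:.2f}")
print(f"p(disease | unexposed) = {p_unexposed:.2f}")
print(f"p(disease | exposed)   = {p_exposed:.2f}")
```

Here the odds ratio is about 3.86: the odds of disease are nearly four times higher in the exposed group, and the model returns the observed probabilities of 0.10 and 0.30.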