Simple Regression Flashcards
linear regression
used when the relationship between two variables can be described with a straight line
- proposes a model of the relationship
correlation vs regression
- correlation determines strength of relationship between x and y
- regression allows us to estimate how much Y will change as a result of a given change in X
terminology in regression
- regression distinguishes between variable being predicted and variable(s) used to predict
variable being predicted: y
- outcome variable
- DV (only ever one)
- criterion variable
- vertical axis
variable used to predict: x
- predictor variable
- IV(s)
- explanatory variable
- horizontal axis
when might we use regression
- to investigate strength of effect x has on y
- estimate how much y will change as a result of a given change in x
- predict future value of y based on x
what does regression assume + what does it not tell us
- y is dependent (to some extent) on x
- regression doesn’t tell us if this dependency is causal
3 stages of linear regression
- analysing the relationship between variables: strength and direction (correlation)
- proposing a model to explain that relationship: model is a line of best fit
- evaluating the model: assessing goodness of fit
regression line
(step 2)
- line of best fit
- intercept: value of y (on line of best fit) when x is 0
- slope: how much y changes as a result of 1 unit increase in x
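As a sketch of how the slope and intercept fall out of the data (toy made-up numbers, pure Python, not an SPSS procedure):

```python
# Least-squares slope and intercept for simple regression.
# xs/ys are made-up toy data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# slope b: covariance of x and y divided by the variance of x
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
    / sum((x - mean_x) ** 2 for x in xs)
# intercept a: forces the line through the point (mean_x, mean_y)
a = mean_y - b * mean_x

print(round(b, 2), round(a, 2))  # slope 1.96, intercept 0.14
```

The slope says y rises about 1.96 units per unit of x; the intercept is the predicted y when x = 0.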
evaluating the model; simplest model vs best model
simplest model:
- using the average/mean value of y as the prediction for every case
- assumes no relationship between x and y
best model:
- based on relationship between x and y
- regression line
sum of squares total
the sum of squared differences between the observed values of y and the mean of y
- variance in y not explained by simplest model
- not required to perform in exam
sum of squares residual
the sum of squared differences between the observed values of y and those predicted by the regression line
- variance in y not explained by regression model
- not required to perform in exam
difference between SST and SSR
reflects improvement in prediction using the regression model compared to the simplest model
- goodness-of-fit
- sum of squares of the model
- not required to perform in exam
the larger the SSm…
… the bigger the improvement in prediction using the regression model over the simplest model
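The three sums of squares can be illustrated on toy made-up data (as the cards note, you never compute these by hand in the exam):

```python
# SST, SSR and SSM on toy data (values invented for illustration).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n

# fit the least-squares regression line
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
    / sum((x - mean_x) ** 2 for x in xs)
a = mean_y - b * mean_x
y_hat = [a + b * x for x in xs]

# SST: squared distances of observed y from the mean of y (simplest model)
sst = sum((y - mean_y) ** 2 for y in ys)
# SSR: squared distances of observed y from the regression line
ssr = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))
# SSM: improvement of the regression model over the simplest model
ssm = sst - ssr

print(round(sst, 3), round(ssr, 3), round(ssm, 3))
```

Here SSM is close to SST, i.e. the regression line explains almost all of the variance the mean-only model leaves unexplained.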
final thing in goodness-of-fit test
- use ANOVA for F-test to evaluate the improvement due to the model (SSm), relative to the variance the model does not explain (SSr)
- ANOVA uses mean square values instead of SS
- this takes d.f. into account
- provides f-ratio
F-ratio
measure of how much the model has improved the prediction of y, relative to the level of inaccuracy of the model
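Continuing the same toy data, the F-ratio divides each sum of squares by its degrees of freedom first (assuming the standard d.f.: 1 for a model with one predictor, n − 2 for the residuals):

```python
# F-ratio = MSm / MSr on toy made-up data.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
    / sum((x - mean_x) ** 2 for x in xs)
a = mean_y - b * mean_x

ssr = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
ssm = sum((y - mean_y) ** 2 for y in ys) - ssr

# mean squares take degrees of freedom into account
ms_m = ssm / 1          # df(model) = number of predictors = 1
ms_r = ssr / (n - 2)    # df(residual) = n - 2 in simple regression
f_ratio = ms_m / ms_r   # large F → big improvement relative to model error

print(round(f_ratio, 1))
```

On these invented numbers F is very large, reflecting a near-perfect linear fit; in practice SPSS reports this value alongside its p-value in the ANOVA table.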
interpreting F-ratio
- if the regression model is good at predicting y (relative to the simplest model), the improvement in prediction due to the model (MSm) will be large, while the inaccuracy of the model (MSr) will be small
e.g. a large F value (well above 1)
H0 when assessing goodness of fit
regression model and simplest model are equal (in terms of predicting y)
MSm = 0
p < .05: reject H0; the regression model fits the data better than the simplest model
note of SS
you never need to calculate it by hand
regression equation
y = bx + a
a = intercept
b = slope
y = predicted value of y
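Using the equation to predict: with a hypothetical slope b = 1.96 and intercept a = 0.14 (toy values, not from real data), a new x plugs straight in:

```python
# Hypothetical fitted coefficients for illustration
a, b = 0.14, 1.96

def predict(x):
    """Predicted y for a given x under the model y = bx + a."""
    return b * x + a

print(round(predict(10), 2))  # 19.74
```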
linear regression assumptions
- linearity: x and y must be linearly related
- absence of outliers (should be removed)
- normality, linearity and homoscedasticity, independence of residuals
- NO NON-PARAMETRIC EQUIVALENT
homoscedasticity of residuals
the variance of the residuals should be constant across all values of the predicted scores
SPSS output for regression
in model summary
- don’t need this in write-up
ANOVA SPSS output for regression
F = MSm / MSr
if p < .05 it is significant improvement when using regression model vs simplest model