Simple linear regression Flashcards
What is the equation for simple linear regression?
Y = b0 + b1X + e
What is b0?
The intercept
- the point at which the regression line crosses the Y axis
- The value of Yi when X = 0
(labelled as the constant in SPSS)
What is b1?
The slope/gradient
- a measure of how much Y changes as X changes
- regardless of sign (pos/neg), the larger the value of b1, the steeper the slope
What is e?
Residual/prediction error
- difference between observed value of outcome variable and what the model predicts (e=Yobs - Ypred)
- represents how wrong we are in making the prediction for the particular case
What is the equation for Ypred?
Ypred = b0 + b1X
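The b0, b1, Ypred and e definitions above can be sketched in Python with least-squares formulas (the x/y values below are made up for illustration):

```python
# Illustrative data (not from any real study)
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# b1 = sum((x - x̄)(y - ȳ)) / sum((x - x̄)²); b0 = ȳ - b1·x̄
b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / \
     sum((xi - mean_x) ** 2 for xi in x)
b0 = mean_y - b1 * mean_x

# Ypred = b0 + b1X, and residual e = Yobs - Ypred for each case
y_pred = [b0 + b1 * xi for xi in x]
residuals = [yo - yp for yo, yp in zip(y, y_pred)]
```

For a least-squares fit, the residuals always sum to (approximately) zero.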
What is a regression line?
Line of best fit - line that best represents the data and minimises residuals
What is a prediction?
Best guess at Y given X
X doesn’t have to cause Y or come before Y in time
What values show how well the model fits the observed data? (goodness of fit)
R2
F-ratio
What does the model refer to?
The regression line
What values show how the variables relate to each other?
The Intercept
Beta values (slope)
What is residual sum of squares? (SSR)
Square residuals and then add them up - a gauge of how well the model (line) fits the data: The smaller SSR, the better the fit
- also called error variance - how much error there is in the model
(Residual/error variance)
What is the equation for total sum of squares (SST)?
SSTotal = SSModel + SSResidual
What is the model sum of squares (SSM)?
Sum of squared differences between Ypred and sample mean - represents improvement from baseline model to regression model
(Model variance)
In any regression model, what is the overall variation of the outcome variable (Y) due to?
- Model/regression - how much variance in the observed Y the predicted values explain. This variance would be measured by the deviations of the predicted values from the sample mean, Y̅.
- Error/residual - how much variance is left over in observed Y after we accounted for the predicted values - measured by deviations of observed values from predicted values
What is the Total sum of squares (SST)?
Total variance in outcome variable - partitioned into model variance and residual/error variance
What is the equation for R2?
R2 = SSM/SST
Variance in outcome explained by model / total variance in outcome variable to be explained
What is R2?
- provides proportion of variance accounted for by model
- Value ranges between 0-1 (the higher the value, the better the model)
- can be interpreted as a percentage by multiplying by 100, e.g. R2 = .69 means 69% of the variance in the outcome variable is explained by the model
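The partition SST = SSM + SSR and the R2 formula can be checked numerically; a minimal sketch, using made-up observed values and hypothetical least-squares predictions for them:

```python
# Illustrative observed values and their least-squares fitted values
y_obs  = [2.1, 3.9, 6.2, 7.8, 10.1]
y_pred = [2.04, 4.03, 6.02, 8.01, 10.00]
mean_y = sum(y_obs) / len(y_obs)

# SST: deviations of observed values from the sample mean
sst = sum((yo - mean_y) ** 2 for yo in y_obs)
# SSM: deviations of predicted values from the sample mean
ssm = sum((yp - mean_y) ** 2 for yp in y_pred)
# SSR: deviations of observed values from predicted values
ssr = sum((yo - yp) ** 2 for yo, yp in zip(y_obs, y_pred))

r_squared = ssm / sst   # proportion of variance explained by the model
```

For a least-squares model, ssm + ssr equals sst, so r_squared falls between 0 and 1.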
What is the equation for the F ratio?
F = MSM / MSR
Model mean squares / residual or error mean squares
What is the equation for model mean squares (MSM)?
MSM = SSM / dfM
What is the equation for residual/error mean squares?
MSR = SSR / dfR
What is the F ratio?
The ratio of explained variance to unexplained variance (error) in the model
- MSM should be larger than MSR (F-statistic greater than 1)
- also called ANOVA - comparing ratio of systematic variance to unsystematic variance
What is dfM?
K
Number of predictors
What is the equation for dfR?
dfR = N - k - 1 (N minus the number of estimated coefficients: the k predictors plus the intercept)
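Putting the mean-squares and df definitions together, the F ratio can be sketched as follows (the SSM/SSR values are illustrative, not from real data):

```python
# Hypothetical sums of squares from a fitted model
ssm, ssr = 39.601, 0.107
n, k = 5, 1             # N observations, k predictors

df_m = k                # model degrees of freedom
df_r = n - k - 1        # residual degrees of freedom

msm = ssm / df_m        # model mean squares
msr = ssr / df_r        # residual/error mean squares
f_ratio = msm / msr     # > 1 when explained variance exceeds error variance
```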
What are the 2 ways the hypothesis (overall test) in regression can be phrased?
Can the scores on Y be predicted based on the scores on X and the regression line?
- Null hyp: Predicted values of Y are the same regardless of the value of X (or simply, there is no relationship between Y and X).
Does the model (Ypred) explain significant amount of variance in outcome variable (Yobs)?
- Null hyp: Population R2 = 0
- Ratio of model variance to error variance tested using F-test (ANOVA)
OR:
H1: The regression line is a significantly better model than the flat model
H0: The flat model (predicting the mean of Y for every case) fits just as well as the regression line