9: REGRESSION Flashcards

1
Q

linear regression

A

used when the relationship between variables x and y can be described with a straight line

correlation measures the strength of the relationship between x and y (doesn’t tell us how much y changes for a given change in x)

regression allows us to estimate how much y will change as a result of a given change in x
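
a minimal sketch of the contrast in Python (data made up for illustration): pearsonr gives only strength and direction, while the regression slope estimates the change in y per unit change in x

```python
# correlation vs regression on illustrative data
from scipy import stats

x = [1, 2, 3, 4, 5, 6]                  # predictor
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]    # outcome

r, _ = stats.pearsonr(x, y)     # strength/direction only
fit = stats.linregress(x, y)    # slope b and intercept a of the best-fit line

print(f"r = {r:.3f} (how strong, not how much)")
print(f"b = {fit.slope:.3f} (estimated change in y per 1-unit change in x)")
```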

2
Q

terminology (regression): variables x and y

A

regression distinguishes between the variable being predicted and the variable(s) used to predict (in simple linear regression, only one predictor variable)

predicted variable: y
- the outcome variable
- the dependent variable
- the criterion variable

predictor variable: x
- the predictor variable
- the independent variable
- the explanatory variable

3
Q

uses of linear regression (interpretation)

A

researchers might use regression to
- investigate the strength of the effect x has on y
- estimate how much y will change as a result of a given change in x
- predict a future value of y, based on a known value of x

unlike correlation, regression makes the assumption that y is (to some extent) dependent on x
- this may not reflect causal dependency

regression does NOT provide direct evidence of causality

4
Q

stages of linear regression

A
  1. analysing the relationship between variables
    - determining the strength and direction of relationship (correlation)
  2. proposing a model to explain that relationship
    - a line of best fit
    - find a (the intercept), b (the gradient)
  3. evaluating the model
    - assessing the goodness of fit
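
a minimal sketch of stage 2 in Python, computing b and a directly from their definitions (b = covariance(x, y) / variance(x); a = mean(y) - b * mean(x); the data are made up):

```python
# stage 2: line of best fit by least squares (illustrative data)
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)  # gradient
a = y.mean() - b * x.mean()                         # intercept

print(f"y = {a:.3f} + {b:.3f}x")
```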
5
Q

linear regression: stage 3, evaluating the model: goodness of fit

A

simplest model:
- no relationship between x and y (b = 0, so the mean of y is predicted for every value of x)

best model:
- based on relationship between x and y
- the regression line

is our regression model better at predicting y than the simplest model?

6
Q

linear regression: calculating goodness of fit

A
  • SSt: the sum of squared differences between the observed values of y and the mean of y (i.e. the variance in y not explained by the simplest model, b = 0)
  • SSr: the sum of squared differences between the observed values of y and the values predicted by the regression line (i.e. the variance in y not explained by the regression model)
  • the difference between SSt and SSr reflects the improvement in prediction using the regression model over the simplest model (i.e. the reduction in unexplained variance): SSm = SSt - SSr
  • the larger SSm, the bigger the improvement in prediction using the regression model over the simplest model
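
a minimal sketch computing the three sums of squares in Python (illustrative data):

```python
# SSt, SSr and SSm for a simple regression (illustrative data)
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

fit = stats.linregress(x, y)
y_hat = fit.intercept + fit.slope * x   # regression model predictions

sst = np.sum((y - y.mean()) ** 2)   # left unexplained by the simplest model (b = 0)
ssr = np.sum((y - y_hat) ** 2)      # left unexplained by the regression model
ssm = sst - ssr                     # improvement in prediction

print(f"SSt = {sst:.2f}, SSr = {ssr:.2f}, SSm = {ssm:.2f}")
```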
7
Q

linear regression: assessing the goodness of fit

A
  • we can use an F-test (i.e. ANOVA) to evaluate the improvement due to the model (SSm) relative to variance the model does not explain (SSr)
  • rather than using the Sums of Squares (SS) values, the F-test uses Mean Squares (MS) values - takes the degrees of freedom into account
  • F ratio provides a measure of how much the model has improved the prediction of y, relative to the level of inaccuracy of the model

F = MSm / MSr
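
a minimal sketch in Python (illustrative data; for simple regression the model df is the number of predictors, k = 1, and the residual df is n - k - 1):

```python
# F ratio from Mean Squares (illustrative data, one predictor)
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])
fit = stats.linregress(x, y)
y_hat = fit.intercept + fit.slope * x

sst = np.sum((y - y.mean()) ** 2)
ssr = np.sum((y - y_hat) ** 2)
ssm = sst - ssr

n, k = len(y), 1
ms_m = ssm / k              # MSm: model Mean Square (df = k)
ms_r = ssr / (n - k - 1)    # MSr: residual Mean Square (df = n - k - 1)
F = ms_m / ms_r
print(f"F({k}, {n - k - 1}) = {F:.1f}")
```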

8
Q

linear regression: interpreting goodness of fit

A
  • if the regression model is good, MSm will be large relative to MSr, giving a large F value
  • null hypothesis: the regression model and the simplest model are equal (MSm = 0)
  • p expresses the probability of finding an improvement of the magnitude we have obtained (or larger), when the null is true
  • a significant result suggests regression model provides a better fit than the simplest model
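
a one-step sketch of obtaining p for an observed F ratio with scipy (the F and df values here are made up):

```python
# p value for an observed F ratio (illustrative values)
from scipy import stats

F, df_m, df_r = 385.0, 1, 3      # made-up F ratio and degrees of freedom
p = stats.f.sf(F, df_m, df_r)    # P(F >= observed), given the null is true
print(f"p = {p:.4f}")            # significant => model beats the simplest model
```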
9
Q

linear regression: assumptions

A
  • linearity: x and y must be linearly related
  • absence of outliers
  • normality of residuals: residuals should be normally distributed around the predicted outcome
  • homoscedasticity: variance of residuals about the outcome should be the same for all predicted scores

Dancey and Reidy state that the outcome variable should be normally distributed, but this is a simplification

no non-parametric equivalent exists - if assumptions are violated, we can only attempt a ‘fix’ (e.g. transforming the data)
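
a minimal sketch of two quick checks on the residuals in Python (the residuals are made up; Shapiro-Wilk is one common normality test, not the only option):

```python
# quick residual checks (illustrative residuals)
import numpy as np
from scipy import stats

residuals = np.array([0.3, -0.5, 0.1, 0.8, -0.4, -0.2, 0.6, -0.7])

# normality of residuals: Shapiro-Wilk (null: drawn from a normal distribution)
w, p = stats.shapiro(residuals)
print(f"Shapiro-Wilk: W = {w:.3f}, p = {p:.3f}")

# absence of outliers: standardised residuals beyond +/-3.3
z = (residuals - residuals.mean()) / residuals.std(ddof=1)
print("outlying residuals:", z[np.abs(z) > 3.3])
```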

10
Q

linear regression: Assumptions: Normal P-P Plot of Regression Standardized Residual

A

ideally, data points will lie in a reasonably straight diagonal line, from bottom left to top right
- would suggest no major deviations from normality
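
a minimal sketch of building such a plot by hand in Python (matplotlib; the residuals are made up):

```python
# normal P-P plot of standardised residuals (illustrative residuals)
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

resid = np.array([0.3, -0.5, 0.1, 0.8, -0.4, -0.2, 0.6, -0.7])
z = np.sort((resid - resid.mean()) / resid.std(ddof=1))

observed = (np.arange(1, len(z) + 1) - 0.5) / len(z)  # empirical cumulative probs
expected = stats.norm.cdf(z)                          # normal cumulative probs

plt.plot(observed, expected, "o")
plt.plot([0, 1], [0, 1], "--")   # points near this diagonal => roughly normal
plt.xlabel("observed cumulative probability")
plt.ylabel("expected cumulative probability")
plt.show()
```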

11
Q

linear regression: Assumptions: Scatterplot of Regression Standardized Residual

A

ideally, residuals will be roughly rectangularly distributed, with most scores concentrated in the centre (0)
- don’t want to see a systematic pattern in the residuals

outliers: standardised residuals > 3.3 or < -3.3
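
a minimal sketch of the plot in Python, with the outlier cut-offs drawn in (all values made up):

```python
# standardised residuals vs standardised predicted values (illustrative data)
import numpy as np
import matplotlib.pyplot as plt

pred = np.array([2.1, 4.0, 6.0, 7.9, 10.0, 5.2, 3.3, 8.8])
resid = np.array([0.3, -0.5, 0.1, 0.8, -0.4, -0.2, 0.6, -0.7])

zp = (pred - pred.mean()) / pred.std(ddof=1)     # standardised predicted values
zr = (resid - resid.mean()) / resid.std(ddof=1)  # standardised residuals

plt.scatter(zp, zr)
plt.axhline(3.3, linestyle="--")    # outlier cut-offs
plt.axhline(-3.3, linestyle="--")
plt.xlabel("standardised predicted value")
plt.ylabel("standardised residual")
plt.show()
```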

12
Q

linear regression: SPSS coefficients (location)

A

coefficients table
- intercept (a): the B value in the (Constant) row
- slope (b): the B value in the predictor variable’s row
- standardised b value: the Beta column, in the predictor variable’s row

t statistic tests the null hypothesis that the value of b is 0
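
outside SPSS, the same quantities can be read off e.g. statsmodels’ coefficient output - a sketch with made-up data:

```python
# a, b and their t statistics via statsmodels (illustrative data)
import numpy as np
import statsmodels.api as sm

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

model = sm.OLS(y, sm.add_constant(x)).fit()
print(model.params)   # [a (intercept), b (slope)]
print(model.tvalues)  # t statistics testing the null that each coefficient is 0
```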

13
Q

estimating variance explained

A
  • R^2: the amount of variance in y explained by the model, relative to the total variance in y (R^2 = SSm/SSt)
  • can express R^2 as a percentage (×100)
  • r^2 expresses the proportion of shared variance between 2 variables

in regression we assume x explains the variance in y
- note that r^2 = R^2 if we only have 1 predictor

variance not explained by x = (1-R^2)
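
a minimal sketch in Python (one predictor, illustrative data):

```python
# R^2 = SSm / SSt, compared with r^2 (illustrative data)
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

fit = stats.linregress(x, y)
y_hat = fit.intercept + fit.slope * x

sst = np.sum((y - y.mean()) ** 2)
ssm = sst - np.sum((y - y_hat) ** 2)
r2 = ssm / sst

print(f"R^2 = {r2:.3f} ({r2 * 100:.1f}% of the variance in y explained)")
print(f"r^2 = {fit.rvalue ** 2:.3f} (equal, since there is only 1 predictor)")
print(f"unexplained: {1 - r2:.3f}")
```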

14
Q

multiple regression

A
  • allows us to assess the influence of several predictor variables (x1, x2, …) on y
  • we obtain a measure of how much variance in the outcome variable (y) the predictor variables explain in combination (via a model which incorporates the slope of each predictor variable)
  • we also obtain measures of how much variance in the outcome variable each predictor variable explains when considered separately
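
a minimal sketch with two predictors in Python (statsmodels; all data made up):

```python
# multiple regression with two predictors (illustrative data)
import numpy as np
import statsmodels.api as sm

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.9])

X = sm.add_constant(np.column_stack([x1, x2]))  # intercept plus both predictors
model = sm.OLS(y, X).fit()

print(model.params)    # a, b1, b2: the combined model
print(model.rsquared)  # variance in y explained by the predictors combined
```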
15
Q

multiple regression: assumptions (sample size)

A

sufficient sample size
advice:
- combined effect of several predictors: N ≥ 50 + 8M, where M is the number of predictors (e.g. for 3 predictors, at least 74 participants)
- separate effects of the predictors: N ≥ 104 + M (e.g. for 3 predictors, at least 107 participants)

too few participants may produce over-optimistic results
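
the two rules of thumb as a minimal sketch:

```python
# rule-of-thumb minimum sample sizes for M predictors
def n_combined(m: int) -> int:
    return 50 + 8 * m    # testing the combined effect of the predictors

def n_separate(m: int) -> int:
    return 104 + m       # testing each predictor's separate effect

print(n_combined(3), n_separate(3))  # 74 107, as in the examples above
```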

16
Q

multiple regression: all assumptions

A
  • sufficient sample size
  • absence of outliers
  • absence of multicollinearity: ideally, predictor variables will be correlated with the outcome variable but not with one another
  • normality, linearity and homoscedasticity, independence of residuals
17
Q

multiple regression: assumption: multicollinearity

A
  • Ideally, predictor variables will be correlated with the outcome variable but not with one another
  • Check the correlation matrix before performing the regression analysis
  • Predictor variables which are highly correlated with one another (r = .9 and above) are measuring much the same thing
  • it may be appropriate to combine the correlated predictor variables, or to remove one
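
a minimal sketch of the screening step in Python (made-up predictors; x2 is deliberately almost a copy of x1):

```python
# correlation matrix of the predictors (illustrative data)
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([1.1, 2.1, 2.9, 4.2, 5.1, 5.9])  # nearly identical to x1
x3 = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0])

corr = np.corrcoef([x1, x2, x3])  # rows/cols: x1, x2, x3
print(np.round(corr, 2))          # r >= .9 between x1 and x2 suggests combining
                                  # them or removing one before the regression
```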
18
Q

multiple regression: Assumptions: Normal P-P Plot of Regression Standardized Residual

A

ideally, data points will lie in a reasonably straight diagonal line, from bottom left to top right
- would suggest no major deviations from normality

19
Q

multiple regression: Assumptions: Scatterplot of Regression Standardized Residual

A

ideally, residuals will be roughly rectangularly distributed, with most scores concentrated in the centre (0)
- don’t want to see a systematic pattern in the residuals

outliers: standardised residuals > 3.3 or < -3.3