Research methods and statistics 3 (year two) Flashcards
Explain and define P-Hacking
Method of manipulating data or analyses until significant results are obtained. Common forms:
Running multiple analyses and reporting only the significant ones
Omitting information
Selectively controlling for variables
Analysing partway through data collection, then collecting more data if the result is not yet significant
Changing the DV after seeing the data
Explain how outliers can be an issue
Outliers in small sample sizes can be the difference between a significant and a non-significant result
Non-parametric correlations can combat this (e.g. if the original test is a Pearson correlation, the non-parametric equivalent is Spearman's rho)
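A quick numerical sketch of this point (invented data, Python with SciPy assumed): five essentially uncorrelated pairs plus one aligned outlier give a large Pearson correlation, while rank-based Spearman's rho stays modest because the outlier is just another rank.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Made-up data: the first five pairs are essentially uncorrelated,
# and the sixth point (20, 25) is an extreme outlier.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 20.0])
y = np.array([5.0, 1.0, 4.0, 2.0, 3.0, 25.0])

r, _ = pearsonr(x, y)       # dominated by the outlier
rho, _ = spearmanr(x, y)    # rank-based, far less affected
print(f"Pearson r = {r:.2f}, Spearman rho = {rho:.2f}")
```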
Define regression and what it tells us
A test of whether one or more predictor variables can predict variance in an outcome variable
E.g. a clinical psychologist may want to know which variables are associated with psychosis symptoms
Tells us:
If model is a good fit
If there are significant relationships between a predictor variable and an outcome variable
The direction of the relationships
Can then make predictions beyond our data
Predicts a line of best fit for association between variables
Give the linear regression equation
Yi = (B0 + B1Xi) + ei
Yi = outcome – the variable you're predicting
B0 = intercept (constant) – the mean value of the outcome variable when the predictor in the model is 0; positions the line at the intercept
B1 = slope coefficient for predictor Xi – determines the gradient of the line of best fit (also called the parameter estimate)
ei = error term – the amount of variance left over in the model
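The equation can be sketched numerically (invented data, NumPy assumed). One useful check: residuals from a least-squares fit with an intercept always sum to zero.

```python
import numpy as np

# Yi = (B0 + B1*Xi) + ei, fitted by least squares on made-up data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])    # roughly y = 2x

b1, b0 = np.polyfit(x, y, deg=1)           # slope (B1) and intercept (B0)
y_hat = b0 + b1 * x                        # predicted values on the line
e = y - y_hat                              # error term (residuals)
print(f"B0 = {b0:.2f}, B1 = {b1:.2f}, sum of residuals = {e.sum():.1e}")
```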
Define beta slope
Slope (aka beta, B1): the number of units of change in the dependent variable for every 1-unit change in the IV
Give the assumptions for regression
Normally distributed continuous outcome
Independent data
Ratio/interval predictors
Nominal predictors with two categories (dichotomous)
No multicollinearity for multiple regression
Be careful of influential cases (outliers that disproportionately affect the fitted line)
Give the parameters needed to work out how well the regression model fits the data
To work out how well the model fits the data we need to know:
Sum of squares total (SST) – uses the difference between the observed data and the mean value of the outcome
Sum of squares residual (SSR) – uses the difference between the observed data and the regression line
Sum of squares model (SSM) – uses the difference between the mean value of Y and the regression line
SSM represents the proportion of improvement due to the model – ideally as high as possible – and is used to generate the test statistic
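A minimal sketch of the three sums of squares (invented data, NumPy assumed), including the identity SST = SSM + SSR:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b1, b0 = np.polyfit(x, y, deg=1)
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean()) ** 2)       # observed data vs mean of outcome
ssr = np.sum((y - y_hat) ** 2)          # observed data vs regression line
ssm = np.sum((y_hat - y.mean()) ** 2)   # regression line vs mean of Y
print(f"SST = {sst:.3f}, SSM + SSR = {ssm + ssr:.3f}")
```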
Give the equation and components for generating a regression test statistic
The test statistic tells us the ratio of explained to unexplained variance in the outcome
F test (model fit) = MSM / MSR
MSM = mean squares of the model (SSM divided by its degrees of freedom)
MSR = mean squares of the residual (SSR divided by its degrees of freedom)
The F test tells us whether the model is a good fit to the data – are we explaining variance?
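The F ratio can be sketched end-to-end (invented data; NumPy and SciPy assumed, with SciPy's F distribution giving the p value):

```python
import numpy as np
from scipy.stats import f as f_dist

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.3, 3.8, 6.1, 7.9, 10.2, 12.1])

b1, b0 = np.polyfit(x, y, deg=1)
y_hat = b0 + b1 * x

ssm = np.sum((y_hat - y.mean()) ** 2)
ssr = np.sum((y - y_hat) ** 2)
df_m, df_r = 1, len(x) - 2            # 1 predictor; n - 2 residual df
msm, msr = ssm / df_m, ssr / df_r     # mean squares
F = msm / msr                         # explained vs unexplained variance
p = f_dist.sf(F, df_m, df_r)          # upper-tail p value
print(f"F({df_m},{df_r}) = {F:.1f}, p = {p:.5f}")
```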
Define proportion of total variation and give the equation
The proportion of the total variation (SST) that is explained by the model (SSM) is known as the coefficient of determination, referred to as R²
R² = SSM / SST
R² can vary between 0 and 1 and is often expressed as a %
R² is less useful if you have more than one predictor variable – with multiple predictors, report adjusted R²
Adjusted R² corrects for the number of predictors and indicates how effective the model is
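A short sketch of R² and its adjustment (invented data, NumPy assumed; the adjustment formula 1 − (1 − R²)(n − 1)/(n − k − 1) is the standard one):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.3, 3.8, 6.1, 7.9, 10.2, 12.1])

b1, b0 = np.polyfit(x, y, deg=1)
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean()) ** 2)
ssm = np.sum((y_hat - y.mean()) ** 2)
r2 = ssm / sst                                  # coefficient of determination
n, k = len(x), 1                                # n observations, k predictors
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)   # penalises extra predictors
print(f"R2 = {r2:.3f}, adjusted R2 = {adj_r2:.3f}")
```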
Explain when multiple regression is needed
Two or more variables to predict our outcome
To improve explanatory potential – examine which predictors are statistically significant
Give the equation for multiple regression
Yi= (B0+B1X1i+B2X2i) + ei
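A sketch of the two-predictor case (made-up data generated from known coefficients, NumPy assumed): stacking an intercept column with the predictors and solving by least squares recovers B0, B1 and B2.

```python
import numpy as np

# Made-up data generated from Yi = 1 + 2*X1i + 0.5*X2i + ei
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
e  = np.array([0.1, -0.1, 0.05, -0.05, 0.1, -0.1])
y  = 1.0 + 2.0 * x1 + 0.5 * x2 + e

X = np.column_stack([np.ones_like(x1), x1, x2])   # intercept column + predictors
b, *_ = np.linalg.lstsq(X, y, rcond=None)         # b = [B0, B1, B2]
print(f"B0 = {b[0]:.2f}, B1 = {b[1]:.2f}, B2 = {b[2]:.2f}")
```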
Explain the spss output for simple regression
Variables entered/removed allows you to double check the info you put in
Model summary: gives R2 statistic – always report adjusted R square
ANOVA: tells us about our model fit (is model a better fit than just using the mean) – F-test
Coefficients: tells us about the individual predictors in our model – whether they are significant and their direction (unstandardized coefficients)
Give an example APA style writeup for simple regression
A simple regression was carried out to investigate the relationship between ——- and ——. The regression model was significant and predicted approximately -% of variance (adjusted R² = .-; F(-, -) = -, p = -). ——– was a significant/non-significant predictor of ——– (b = .- (s.e. = .-); -% CI .- to .-; t = -, p = -)
Define multicollinearity
Multicollinearity: occurs when independent variables in a regression model are highly correlated
If two/more predictor variables in model are highly correlated with each other they do not provide unique/independent info to the model
Can adversely affect regression estimates
Large amounts of variance explained but no significant predictors
Explain how to identify multicollinearity
Identifying multi-collinearity
Look for high correlations between variables in a correlation matrix ( r>.8)
r = 1 is perfect multicollinearity – a data issue
Tolerance statistic
Percentage of variance in IV not accounted for by other IVs
1 – R2
High tolerance = low multicollinearity
Low tolerance = high multicollinearity
Variance inflation factor
1/tolerance
Indicates how much the standard error will be inflated by
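The tolerance and VIF calculations can be sketched for the two-predictor case (invented, deliberately near-duplicate predictors; NumPy assumed), where the R² for one IV regressed on the other is simply their squared correlation:

```python
import numpy as np

# Two predictors that are nearly copies of each other (made-up data)
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([1.1, 2.2, 2.9, 4.2, 4.8, 6.1])

r = np.corrcoef(x1, x2)[0, 1]   # r > .8 already signals a problem
tolerance = 1 - r ** 2          # % of variance in one IV not shared with the other
vif = 1 / tolerance             # variance inflation factor
print(f"r = {r:.3f}, tolerance = {tolerance:.3f}, VIF = {vif:.1f}")
```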
Give ways of fixing multi-collinearity issues
Fixing multi-collinearity issues
Increase sample size
Remove redundant variable
If two or more variables are important, create a variable that takes both of them into account
Give an example APA style writeup for multiple regression
A multiple regression was conducted to investigate the roles of –, – and – on —. The regression model was — and predicted -% of variance (adjusted R² = -; F(-, -) = -, p = -). Variance inflation factors suggest multicollinearity was/was not a concern (– = -, – = -, – = -). – was a significant/non-significant predictor of — (b = - (s.e. = -); -% CI – to -; t = -, p = -) and – was/was not a significant predictor (b = - (s.e. = -); -% CI – to -; t = -, p = -)
- Explain what a mediator is
- Links two variables
- Mediator is a variable that is affected by the IV; the mediator in turn influences the DV
- Effect of the IV on the DV (IV-DV) is partially dependent on the mediator (IV-M-DV)
- IV-DV = direct effect ( c )
- IV-M = a-path
- M-DV = b-path
- Full mediation: inclusion of mediator renders direct IV-DV effect non-significant
- Partial mediation: inclusion of mediator reduces the direct IV-DV effect, but it remains significant
- Explain the difference between mediation and moderation
Mediator: a variable that accounts for an association between a predictor and a DV
- Moderator: affects the strength of a relationship between a predictor/DV
o Moderator does not have to be associated with the IV or DV
- Mediator MUST be something that can change, e.g age cannot be a mediator, craving can
o IV has to influence mediator, nothing can influence age
- Give some issues of the causal steps approach
- Has little or no sensitivity (needs a huge sample)
- Mathematically incorrect (a significant IV-DV association is not actually necessary for mediation)
- Is unable to detect suppression effects
- Explain why the Sobel test is a bad solution
- Gives a p value for the indirect effect
- Based upon a product of the coefficients calculation
- Assumes the product of the coefficient is normally distributed – this is almost never the case
- This method also requires more participants to detect indirect effects than the methods used today
- Explain why the joint significance test is a good solution to mediation
This method ignores the IV-DV association (doesn’t have to be significant)
o If a path and b path are significant there is evidence of mediation
- Also gives confidence intervals for the indirect effect
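One common modern way to get a confidence interval for the indirect (a x b) effect is percentile bootstrapping. A sketch on simulated data (invented effect sizes; NumPy assumed): fit the a-path, fit the b-path controlling for the IV, multiply, and repeat over resamples.

```python
import numpy as np

# Simulated data with a true indirect effect of 0.5 * 0.4 = 0.2
rng = np.random.default_rng(0)
n = 200
iv = rng.normal(size=n)
m = 0.5 * iv + rng.normal(size=n)               # a-path: IV -> M
dv = 0.4 * m + 0.2 * iv + rng.normal(size=n)    # b-path: M -> DV, plus direct effect

def indirect_effect(idx):
    # a*b product of coefficients on one (re)sample
    a = np.polyfit(iv[idx], m[idx], 1)[0]                   # IV -> M slope
    X = np.column_stack([np.ones(len(idx)), m[idx], iv[idx]])
    b = np.linalg.lstsq(X, dv[idx], rcond=None)[0][1]       # M -> DV, controlling for IV
    return a * b

boot = [indirect_effect(rng.integers(0, n, size=n)) for _ in range(2000)]
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"indirect effect 95% CI: {lo:.3f} to {hi:.3f}")  # a CI excluding 0 is evidence of mediation
```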
- Explain how to run an SPSS analysis for mediation
example: IV = personality disorder, M= enhancement, DV = alcohol units consumed
- Firstly we produce two regressions
1. IV-M Personality disorder to enhancement
2. M-DV enhancement (+personality disorder) to alcohol units consumed
- We control for personality disorder in the second regression so we can be sure that the mediator is predicting variance beyond that accounted by the IV
- Analyse – regression – linear
- Regression 1: IV to M
- To do our additional test for mediation we need to take the unstandardized regression coefficient and its standard error and use them in RMediation (we can use three d.p. when entering values into the program)
- Then run the second regression M-DV but controlling for PDQ-4 (personality disorder)
o Enhancement (and PDQ-4) to Units consumed
o If enhancement is significant in this regression then there is evidence of mediation
- Explain how to writeup mediation in APA format
The first regression IV-M
o The regression is significant (R² adjusted = 0.15, F(1,225)=41.32, p