Paul's Correlation and Regression Shite Flashcards
Correlation background
Effect sizes for r: 0.1 is small, 0.3 is medium, 0.5 is large
p below alpha means the correlation is significantly different from 0, not that there is a significant difference between the variables
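A minimal sketch of this test in Python (scipy assumed; the variable names and data are made up):

from scipy.stats import pearsonr

anxiety = [2, 4, 5, 7, 9, 10]          # hypothetical scores
exam    = [55, 60, 58, 70, 75, 80]

r, p = pearsonr(anxiety, exam)
# p < alpha: r is significantly different from 0,
# not a significant difference between the two variables
print(f"r = {r:.2f}, p = {p:.3f}")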
Partial correlations / third variables
A third variable can be confounding, e.g. you suspect the correlation between two variables is really due to a third
Partial correlations control for the third variable and look at the relationship between the first two to see if it is still there
Written as r12.3 = correlation between variables 1 and 2 controlling for variable 3. E.g. r12.3 = 0.02, p > .05: the correlation is only 0.02 once variable 3 is controlled, so it is non-significant
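A rough sketch of a partial correlation in Python, done by correlating residuals (numpy/scipy assumed; partial_corr is a made-up helper, and its p value ignores the df lost to the control variable):

import numpy as np
from scipy.stats import pearsonr

def partial_corr(x1, x2, x3):
    # r12.3: residualise variables 1 and 2 on the control variable 3, then correlate
    def residuals(y, z):
        b1, b0 = np.polyfit(z, y, 1)      # slope and intercept of y on z
        return y - (b0 + b1 * z)
    x1, x2, x3 = (np.asarray(v, float) for v in (x1, x2, x3))
    return pearsonr(residuals(x1, x3), residuals(x2, x3))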
Reliability
Test-retest: administer the test once, then again later, and correlate the two sets of scores
Split half: split the questionnaire into two halves, calculate a score for each half and correlate them (sketch after this list)
Cronbach's alpha: measures internal reliability using the individual items, from 0 (no reliability) to 1 (complete reliability); based on the correlations between all items, e.g. are all items testing the same thing?
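A minimal sketch of the split-half method in Python (numpy assumed; the item data here are random, so real scores would correlate far better):

import numpy as np

items = np.random.default_rng(1).integers(1, 6, size=(50, 10))   # 50 cases x 10 items
half1 = items[:, 0::2].sum(axis=1)     # odd-numbered items
half2 = items[:, 1::2].sum(axis=1)     # even-numbered items
r_half = np.corrcoef(half1, half2)[0, 1]
r_full = 2 * r_half / (1 + r_half)     # Spearman-Brown: reliability of the full test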
Cronbach's alpha
A scale is reliable if α is above 0.7. More questions means a bigger α. α assumes the test measures only one thing, so keep subscales separate; all items must be scored in the same direction; α changes across populations as it only describes the scores at hand. Weak reliability lowers observed correlations (attenuation) because of more random variation in scores: observed score = true score + error (random variation)
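A minimal sketch of the α calculation in Python (numpy assumed; cronbach_alpha is a made-up name, not an SPSS function):

import numpy as np

def cronbach_alpha(items):
    # items: cases x items matrix of scores
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                          # number of items
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)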
α in SPSS
SPSS gives the α value in the Reliability Statistics table
The Item-Total Statistics table is used to spot trouble items: the "Cronbach's Alpha if Item Deleted" column shows whether removing an item would increase α
Regression background
Regression predicts an outcome (criterion) from a predictor; the relationship is asymmetrical. It needs a model, e.g. the line of best fit: y = mx + c, where m is the gradient, x is the score and c is the intercept
Example of a linear relationship (diagonal line sloping up): c = 0 because the line crosses the y-axis at the origin, and m = 0.5 because for every unit of x, y goes up 0.5; so at x = 2, y is 1
Ordinary least squares
The method for drawing the line through the scores: it minimises the squared distances between the predicted scores (the line) and the actual scores (the dots)
The slope is called b1 and the intercept is b0, so the equation is Y = b0 + b1(X)
If b1 is 0.469, y goes up 0.469 for every unit of x, on top of the intercept. E.g. with an intercept of 3.75, for 7 on x: y = 3.75 + 0.469 × 7 = 7.03
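A minimal OLS sketch in Python (numpy assumed; the data are made up):

import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)
y = np.array([4.1, 4.8, 5.2, 5.9, 6.1, 6.6, 7.1])

b1, b0 = np.polyfit(x, y, 1)    # slope and intercept minimising the squared residuals
y_at_7 = b0 + b1 * 7            # predicted y for a score of 7 on x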
Slope and intercept in SPSS
Coefficients table in SPSS: the intercept is the Constant row (b0); the slope is the Unstandardised B for the predictor (b1)
Y = b0 + b1(X)
Residuals
The gap between the actual data and the slope (the predicted values). Find it by subtracting the predicted score from the actual score for each case, squaring each difference, then summing the column to get the residual sum of squares (SSR): the unexplained variation
Total sum of squares and explained variance
Take each score away from the mean, square and sum: this is the total variation within the data set, the total sum of squares (SST). Explained variance is SS total − SS residual. Bigger means the model explains more
R squared
Explained variance is affected by sample size and can't be compared between studies. R² shows the proportion of variance predicted by the model: SS regression divided by SS total, giving a number between 0 and 1, bigger is better. R² = 0.8 means 80% of the variance is explained by the model
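The same sums of squares by hand, as a Python sketch (numpy assumed; data made up):

import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)
y = np.array([4.1, 4.8, 5.2, 5.9, 6.1, 6.6, 7.1])
b1, b0 = np.polyfit(x, y, 1)

ss_residual = ((y - (b0 + b1 * x)) ** 2).sum()   # unexplained variation
ss_total    = ((y - y.mean()) ** 2).sum()        # total variation around the mean
ss_model    = ss_total - ss_residual             # variation explained by the model
r_squared   = ss_model / ss_total                # e.g. 0.8 = 80% explained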
F ratio
Tests whether the model predicts a significant amount of the variation: the ratio of predicted variance to unpredicted variance (the residuals). If high, the effect is strong. Each SS is divided by its df to give a mean square, and F = MS regression / MS residual; it comes with a p value for significance. The t score shows whether each b is significantly different from 0
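A sketch of the F ratio by hand (scipy assumed for the p value; same made-up data as above):

import numpy as np
from scipy.stats import f

x = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)
y = np.array([4.1, 4.8, 5.2, 5.9, 6.1, 6.6, 7.1])
b1, b0 = np.polyfit(x, y, 1)

n, k = len(y), 1                                 # cases and predictors
ss_res = ((y - (b0 + b1 * x)) ** 2).sum()
ss_tot = ((y - y.mean()) ** 2).sum()
ms_model = (ss_tot - ss_res) / k                 # MS regression
ms_resid = ss_res / (n - k - 1)                  # MS residual
F = ms_model / ms_resid
p = f.sf(F, k, n - k - 1)                        # significance of the whole model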
Assumptions
The outcome must be continuous, but predictors can be continuous or categorical. Predictors must have non-zero variance. All values of the outcome should be independent (from different people). The relationship should be linear, e.g. a straight diagonal line. Homoscedasticity: the variance of the residuals (error term) must be constant across all values of the predictors. The residuals must be normally distributed; check normality of errors with a P-P plot
Checking for bias/outliers
Check for large standardised residuals in SPSS; these show the outliers. Only about 5% of cases should be more than 2 SD out
Cook's distance: measures the influence of each case on the model; if it is above 1 the case is having an undue influence on the model. SPSS gives the maximum Cook's distance
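A rough sketch of both checks for a simple regression in Python (numpy assumed; the formulas are the standard leverage-based ones, and the last case is a deliberate outlier):

import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 20.0])
y = np.array([4.1, 4.8, 5.2, 5.9, 6.1, 6.6, 7.1, 30.0])
b1, b0 = np.polyfit(x, y, 1)

n, p = len(x), 2                                 # cases and parameters (b0, b1)
resid = y - (b0 + b1 * x)
mse = (resid ** 2).sum() / (n - p)
h = 1 / n + (x - x.mean()) ** 2 / ((x - x.mean()) ** 2).sum()    # leverage
std_resid = resid / np.sqrt(mse * (1 - h))       # only ~5% should exceed |2|
cooks_d = resid ** 2 / (p * mse) * h / (1 - h) ** 2              # > 1 = undue influence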
Multiple regression background
Multiple regression has multiple predictors predicting one outcome, and asks how each predictor affects it. Use either forced entry or hierarchical regression. You find the explained variance and how each predictor contributes, e.g. does personality as a whole predict the outcome (R² and the F test), and how much does each of the OCEAN traits contribute (the individual b values)?
Adjusted R square
R² estimates the variance explained in the sample; adjusted R² estimates it for the whole population, allowing for the overestimation. The bigger the sample, the less need for adjustment. Report both
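A one-line sketch of the usual adjustment formula (numbers in, numbers out; I'm assuming this matches the adjusted R² SPSS reports):

def adjusted_r2(r2, n, k):          # n = sample size, k = number of predictors
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)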
Output for forced entry
The Model Summary table has the R values, meaning x% of the variance is explained. The ANOVA table has F and p; its regression df is the number of predictors. A significant R² means the model accounts for a significant amount of variance and the explained:unexplained ratio is high, e.g. the predictors as a whole significantly predict the outcome. You also get individual b values for each predictor (its individual contribution)
Equation
b0 is still the intercept; with more than one predictor (x1, x2, x3…) the equation is Y = b0 + b1(x1) + b2(x2) + b3(x3) + … + bn(xn)
bn is the regression coefficient for the nth variable
Y is the outcome
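A minimal multiple-regression sketch in Python (numpy assumed; the data are simulated):

import numpy as np

rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=50), rng.normal(size=50)
y = 2 + 0.5 * x1 - 0.3 * x2 + rng.normal(scale=0.5, size=50)

X = np.column_stack([np.ones(50), x1, x2])   # design matrix: intercept column + predictors
b, *_ = np.linalg.lstsq(X, y, rcond=None)    # b[0] = b0, b[1] = b1, b[2] = b2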
Beta weights
Ordinary b values depend on the units of measurement, so they can't be compared across different measures; beta weights are standardised. E.g. beta1 = 0.594 means that as the predictor increases by 1 SD, the outcome increases by 0.594 of an SD. Each b also gets a t value to say whether the variance it explains is significant, i.e. b is the size of its contribution (controlling for the others) and the significance says whether it impacts the model or not. Relationships can be negative. Regression is more important than correlation
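A sketch of beta weights in Python: z-score everything first, then the slopes come out in SD units (numpy assumed; data simulated):

import numpy as np

rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=50), rng.normal(scale=3, size=50)   # deliberately different scales
y = 2 + 0.5 * x1 + 0.1 * x2 + rng.normal(scale=0.5, size=50)

def z(v):                                    # standardise: mean 0, SD 1
    return (v - v.mean()) / v.std(ddof=1)

X = np.column_stack([z(x1), z(x2)])          # no intercept needed once everything is z-scored
betas, *_ = np.linalg.lstsq(X, z(y), rcond=None)
# betas are comparable across predictors because they are all in SD units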
Dummy variables
Use these to code categorical data as 1s and 0s, e.g. gender. If b is positive, the group coded 1 scores higher than the group coded 0; if b is negative, the 0 group scores higher. E.g. if males are coded 1, a positive b means males score higher. Look at p for significance
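A tiny dummy-coding sketch in Python (numpy assumed; data made up):

import numpy as np

sex = np.array(["m", "f", "m", "f", "m", "f", "m", "f"])
score = np.array([12.0, 9.0, 11.0, 8.0, 13.0, 10.0, 12.0, 9.0])

dummy = (sex == "m").astype(float)   # males coded 1, females 0
b1, b0 = np.polyfit(dummy, score, 1)
# b0 = mean of the 0 group (females); b1 = how much higher the 1 group (males) scores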
Assumptions
All the same as simple regression, plus multicollinearity bias: predictors can't be highly correlated with each other (as it affects R²). Check in the Coefficients table: VIF measures this for each predictor; you want it low, nowhere near 10. Tolerance is 1/VIF and needs to be above 0.2, but it is less important
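A sketch of VIF by hand in Python: regress each predictor on the others and take 1/(1 − R²) (numpy assumed; vif is a made-up helper, not an SPSS function):

import numpy as np

def vif(X):
    # X: cases x predictors matrix (no intercept column)
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = []
    for j in range(k):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        b, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ b
        r2 = 1 - (resid ** 2).sum() / ((X[:, j] - X[:, j].mean()) ** 2).sum()
        out.append(1 / (1 - r2))             # values approaching 10 are worrying
    return out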
Violations
Robust regression is regression that allows you to ignore the assumption of normally distributed residuals
Bootstrapping uses the sample to build up other samples (resampling with replacement) to get around the normality assumption - don't need to know this?
The results will be in a bootstrapping table
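A rough sketch of bootstrapping a slope in Python (numpy assumed; data simulated):

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=40)
y = 1 + 0.5 * x + rng.normal(scale=2, size=40)

slopes = []
for _ in range(2000):
    idx = rng.integers(0, len(x), size=len(x))   # resample cases with replacement
    b1, _ = np.polyfit(x[idx], y[idx], 1)
    slopes.append(b1)
lo, hi = np.percentile(slopes, [2.5, 97.5])      # 95% bootstrap CI for the slope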
Hierarchical regression
This is for looking at predictors while controlling for another variable. It gives 2 models: the control variable on its own, then with the other predictors added; you need to tell if one is better than the other. The F Change column: if the change is big and significant, significantly more variance is explained by the second model. The ANOVA table doesn't compare the models, it just gives an F for each. Adding another variable also changes the other predictors' values in the Coefficients table
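A sketch of the R² change test in Python (numpy/scipy assumed; r2 is a made-up helper; the data are simulated):

import numpy as np
from scipy.stats import f

def r2(X, y):                                # R squared for the predictors in X
    X = np.column_stack([np.ones(len(y)), X])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    return 1 - (resid ** 2).sum() / ((y - y.mean()) ** 2).sum()

rng = np.random.default_rng(0)
control = rng.normal(size=60)
new_pred = rng.normal(size=60)
y = 1 + 0.4 * control + 0.6 * new_pred + rng.normal(size=60)

k1, k2, n = 1, 2, 60
r2_1 = r2(control[:, None], y)                        # model 1: control variable only
r2_2 = r2(np.column_stack([control, new_pred]), y)    # model 2: control + new predictor
F_change = ((r2_2 - r2_1) / (k2 - k1)) / ((1 - r2_2) / (n - k2 - 1))
p_change = f.sf(F_change, k2 - k1, n - k2 - 1)        # big and sig = model 2 explains more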
When to use the standardised vs unstandardised B columns
The unstandardised column is where you look when calculating the change in the outcome, in its original units, from a change in a predictor: use the original formula, with the Constant row as b0 and the next row down as b1. The standardised column is for working in standardised units, when you want to see which predictor has the strongest influence