lecture 21 + 22 (stats) Flashcards
(25 cards)
covariance?
- deviations of the two variables from their means are multiplied with each other, summed, and divided by N-1 to get the average of the combined deviation
- dependent on scales of measurement (aka affected by the standard deviation of either variable)
- formula in notebook
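For reference, the standard sample covariance formula (a textbook form, since the notebook itself isn't included here):

```latex
\mathrm{cov}(x, y) = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{N - 1}
```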
correlation coefficient?
- Pearson’s correlation coefficient (r)
- standardized measure of the covariance (aka NOT affected by the standard deviation of either variable)
- quantifies the association between two continuous variables
- it can be positive, negative, or 0 (which means that there is no correlation)
- ranges between -1 and 1
- formula and plot in the notebook
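Likewise, the standard form of Pearson's r (the covariance divided by both standard deviations, which is what makes it scale-free):

```latex
r = \frac{\mathrm{cov}(x, y)}{s_x \, s_y}
```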
calculating r?
- standardize the data to z-scores (Z): this means that both variables will have the same average (0) and the same variability (SD = 1)
- calculate covariance and correlation
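A minimal numpy sketch of this recipe (the data are made up for illustration; ddof=1 matches the N-1 used for the covariance):

```python
import numpy as np

def pearson_r(x, y):
    """Pearson's r the flashcard way: z-score both variables,
    then take the covariance of the z-scores."""
    zx = (x - x.mean()) / x.std(ddof=1)  # mean 0, SD 1
    zy = (y - y.mean()) / y.std(ddof=1)
    return np.sum(zx * zy) / (len(x) - 1)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
print(pearson_r(x, y))          # close to 1 for this toy data
print(np.corrcoef(x, y)[0, 1])  # cross-check with numpy's built-in
```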
meaning of Z scores?
- if Z score is 2 it means that the observation is 2 standard deviations greater than the mean
- if z score is -2 it means that the observation is 2 standard deviations below the mean
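In formula form (a standard definition):

```latex
z_i = \frac{x_i - \bar{x}}{s_x}
```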
significance of correlation?
- transform r into t and compare it against the t-distribution to get a p value
- formula in notebook
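The usual conversion (a standard result; the t value is compared against a t-distribution with N - 2 degrees of freedom):

```latex
t = \frac{r\sqrt{N - 2}}{\sqrt{1 - r^2}}, \qquad df = N - 2
```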
semi partial correlation?
- examine the unique overlap in variance between variables X and Y: the shared overlap between X and Z is removed, but the shared overlap between Y and Z is not
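The textbook formula for the semipartial correlation, with Z controlled out of X only (matching the description above):

```latex
r_{y(x.z)} = \frac{r_{xy} - r_{yz}\,r_{xz}}{\sqrt{1 - r_{xz}^2}}
```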
Partial correlation?
- how can we quantify an association between two variables while taking another variable into account?
- the higher the correlation between X and Y, the more shared variance they'll have and the more predictive power X has for Y
- the partial correlation can increase compared to the regular correlation
- Z is a confounding variable if it correlates highly with both X and Y -> in that case, when controlling for Z, the partial correlation will be lower than the regular correlation
- also turn the partial r into t to assess its significance (df = N-3)
- check notebook for venn diagram and formula
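For reference, the standard partial correlation formula (Z controlled out of both X and Y):

```latex
r_{xy.z} = \frac{r_{xy} - r_{xz}\,r_{yz}}{\sqrt{(1 - r_{xz}^2)(1 - r_{yz}^2)}}
```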
correlation in Jasp?
- regression -> correlation
- put both dependent and independent variable in variables table
- if you want to do partial correlation put variable you want to control for in partial out table
- for easier analysis, request a scatterplot and display the comparisons pairwise
regression formula?
Y = b0 + b1*X + e
- Y: is the outcome variable
- b0: the intercept (baseline that we are predicting with)
- b1: the regression coefficient for our single predictor variable (X)
- b1 quantifies how strong of an association there is between your predictor and outcome variable
- positive relationship if b1 is positive and vice versa (if b1=0 there is no association)
- beta coefficients are influenced by measurement scales
- e (residuals): the distances between the model's predicted values and the observed data points
assumptions?
- sensitivity
- homoscedasticity
- normality
- additivity and linearity (checked through matrix scatterplot with predictors and outcome variable)
sensitivity?
- outliers: extreme residuals
- Cook’s distance: how influential is this observation (values larger than 1 are potentially problematic)
- you can also check Q-Q, residuals plots, casewise diagnostics
check outliers in Jasp?
- casewise diagnostics: usually standardized residual > 3; if that happens it means we were very mistaken about our prediction for that case
- check how influential an outlier is with the Cook's distance > 1 option
- both are in the statistics tab
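Outside JASP, the same two checks can be sketched with statsmodels (a minimal sketch with made-up data; the last point is deliberately extreme):

```python
import numpy as np
import statsmodels.api as sm

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8, 30.0])  # last observation is an outlier

model = sm.OLS(y, sm.add_constant(x)).fit()
influence = model.get_influence()

std_resid = influence.resid_studentized_internal  # casewise diagnostics
cooks_d = influence.cooks_distance[0]             # influence of each observation

print(np.where(np.abs(std_resid) > 3)[0])  # cases with |standardized residual| > 3
print(np.where(cooks_d > 1)[0])            # cases with Cook's distance > 1
```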
homoscedasticity?
- model errors are equal for every value of the predictor variable
- if you don't add an intercept to your model you introduce systematic structure into your errors
- Look at scatterplot of standardized predicted values vs residuals (roughly round shape is needed)
- we don't want any systematic pattern/strong correlation between predicted values and errors -> it messes with the estimates and the Type I error rate
- checked after the analysis is complete because it’s based on the residuals
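A minimal matplotlib sketch of that check (made-up data; in practice you would standardize the predicted values and residuals first, as JASP does):

```python
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8, 12.1])
model = sm.OLS(y, sm.add_constant(x)).fit()

plt.scatter(model.fittedvalues, model.resid)  # want a roughly round cloud
plt.axhline(0, linestyle="--")
plt.xlabel("predicted values")
plt.ylabel("residuals")
plt.show()
```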
check homoscedasticity in Jasp?
plots tab -> residuals plots -> residuals vs predicted
check normality Jasp?
QQ plot of residuals
calculate regression parameters?
- calculate b1 and b0 (formulas in notebook)
- the slope is determined by b1
- with the slope we can determine how many units our outcome variable increases if we increase our predictor variable by one unit -> that value is b1
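A numpy sketch of those formulas (standard least-squares forms; the data are made up):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# standard least-squares estimates:
# b1 = cov(x, y) / var(x), b0 = mean(y) - b1 * mean(x)
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()

print(b0, b1)
print(np.polyfit(x, y, 1))  # cross-check: returns [b1, b0]
```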
error/residuals?
- the difference between the model predictions (ŷ) and the observed values (y)
- the stronger the correlation between observed and predicted values, the better our model did
multiple regression formula?
Y = b0 + b1*X1 + b2*X2 + … + bn*Xn + e
added assumption for multiple regression?
- multicollinearity: how associated are the predictor variables (we don’t want high associations)
- you can still run a regression if you have multicollinearity, but if the predictor variables are very dependent on each other the statistical method cannot really distinguish between them and their effects
how to assess multicollinearity in Jasp?
- correlations between predictor variables
- in statistics tab in linear regression -> Tolerance and VIF (VIF: max < 10, mean close to 1; Tolerance > 0.2 is good; however, we want VIF and tolerance to be as close to 1 as possible)
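How the two statistics relate (standard definitions, where R_j^2 comes from regressing predictor j on all the other predictors):

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}, \qquad \mathrm{Tolerance}_j = 1 - R_j^2 = \frac{1}{\mathrm{VIF}_j}
```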
model fit?
- The fit of the model can be viewed in terms of the correlation between the predictions and the observed values: if the predictions are perfect, the correlation will be 1
- if we have a single predictor variable then correlation between prediction and observation equals the correlation between predictor variable and outcome variable
explained variance?
- the squared correlation (R^2) between predictions and observations
- one of the central metrics to assess model fit in regression
- R^2 can also be calculated through:
1. SS_M/SS_T which is the proportion of improvement with the predictive model compared to the baseline model
2. 1 - (SS_R/SS_T), with SS_R being the residual sum of squares (the squared distances between the regression line and the observed values)
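Both routes written out (using the standard decomposition of the total sum of squares):

```latex
R^2 = \frac{SS_M}{SS_T} = 1 - \frac{SS_R}{SS_T}, \qquad SS_T = SS_M + SS_R
```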
test model fit?
can be done in different ways:
- F statistic (check notebook for formula)
- t statistic (check notebook for formula)
- attention: signal-to-noise ratio (MS_model/MS_error) = F statistic = t statistic squared (with a single predictor)
- they are all methods to test the model fit and they will all lead to the same conclusion
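Written out (the F = t^2 identity holds for a single-predictor model):

```latex
F = \frac{MS_{model}}{MS_{error}} = \frac{SS_M / df_M}{SS_R / df_R}, \qquad t^2 = F
```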
regression in jasp?
- regression -> classical linear regression
- put outcome variable in dependent variable tab and predictor variable in covariate tab
- Adjusted R^2: penalization due to the complexity of the model
- you are also going to get an ANOVA table where SS_M is the regression row, SS_R is the residuals row, and SS_T is the total row in the sum of squares column -> tells you how much better your alternative model did compared to the null
- coefficients table (the intercept is b0 and the predictor, e.g. adverts, is b1): look at the standardized values -> allows us to say something about the individual performance of our predictors