lecture 21 + 22 (stats) Flashcards
(25 cards)
covariance?
- deviations of the two variables from their means are multiplied with each other, summed, and divided by N-1 to get the average of the combined deviation
- dependent on scales of measurement (aka affected by the standard deviation of either variable)
- formula in notebook
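For reference, the standard sample covariance formula (a textbook form, since the notebook itself isn't included here):

```latex
\mathrm{cov}(x, y) = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{N - 1}
```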
correlation coefficient?
- Pearson’s correlation coefficient (r)
- standardized measure of the covariance (aka NOT affected by the standard deviation of either variable)
- quantifies the association between two continuous variables
- it can be positive, negative, or 0 (which means that there is no correlation)
- ranges between -1 and 1
- formula and plot in the notebook
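Likewise, the standard form of Pearson's r (the covariance divided by both standard deviations, which is what makes it scale-free):

```latex
r = \frac{\mathrm{cov}(x, y)}{s_x \, s_y}
```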
calculating r?
- standardize the data to z-scores (Z): this means that both variables will have the same average (0) and the same variability (SD = 1)
- calculate covariance and correlation
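A minimal numpy sketch of this recipe (the data are made up for illustration; ddof=1 matches the N-1 used for the covariance):

```python
import numpy as np

def pearson_r(x, y):
    """Pearson's r the flashcard way: z-score both variables,
    then take the covariance of the z-scores."""
    zx = (x - x.mean()) / x.std(ddof=1)  # mean 0, SD 1
    zy = (y - y.mean()) / y.std(ddof=1)
    return np.sum(zx * zy) / (len(x) - 1)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
print(pearson_r(x, y))          # close to 1 for this toy data
print(np.corrcoef(x, y)[0, 1])  # cross-check with numpy's built-in
```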
meaning of Z scores?
- if Z score is 2 it means that the observation is 2 standard deviations greater than the mean
- if z score is -2 it means that the observation is 2 standard deviations below the mean
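In formula form (a standard definition):

```latex
z_i = \frac{x_i - \bar{x}}{s_x}
```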
significance of correlation?
- transform r into t and compare it against the t-distribution to get a p value
- formula in notebook
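The usual conversion (a standard result; the t value is compared against a t-distribution with N - 2 degrees of freedom):

```latex
t = \frac{r\sqrt{N - 2}}{\sqrt{1 - r^2}}, \qquad df = N - 2
```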
semi partial correlation?
- examine the unique overlap in variance between variables X and Y: the shared overlap between X and Z is removed, but the shared overlap between Y and Z is not
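The textbook formula for the semipartial correlation, with Z controlled out of X only (matching the description above):

```latex
r_{y(x.z)} = \frac{r_{xy} - r_{yz}\,r_{xz}}{\sqrt{1 - r_{xz}^2}}
```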
Partial correlation?
- how can we quantify an association between two variables while taking another variable into account?
- the higher the correlation between X and Y, the more shared variance they'll have and the more predictive power X has for Y
- the partial correlation can increase compared to the regular correlation
- Z is a confounding variable if it correlates highly with both X and Y -> in that case, when controlling for Z, the partial correlation will be lower than the regular correlation
- also turn the partial r into t to assess its significance (df = N-3)
- check notebook for venn diagram and formula
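For reference, the standard partial correlation formula (Z controlled out of both X and Y):

```latex
r_{xy.z} = \frac{r_{xy} - r_{xz}\,r_{yz}}{\sqrt{(1 - r_{xz}^2)(1 - r_{yz}^2)}}
```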
correlation in Jasp?
- regression -> correlation
- put both dependent and independent variable in variables table
- if you want to do partial correlation put variable you want to control for in partial out table
- for easier analysis, request a scatterplot and display the comparisons pairwise
regression formula?
Y = b0 + b1*X + e
- Y: is the outcome variable
- b0: the intercept (baseline that we are predicting with)
- b1: the regression coefficient for our single predictor variable (X)
- b1 quantifies how strong of an association there is between your predictor and outcome variable
- positive relationship if b1 is positive and vice versa (if b1=0 there is no association)
- beta coefficients are influenced by measurement scales
- e (residuals): the distances between the model's predicted values and the observed data points
assumptions?
- sensitivity
- homoscedasticity
- normality
- additivity and linearity (checked through matrix scatterplot with predictors and outcome variable)
sensitivity?
- outliers: extreme residuals
- Cook’s distance: how influential is this observation (values larger than 1 are potentially problematic)
- you can also check Q-Q, residuals plots, casewise diagnostics
check outliers in Jasp?
- casewise diagnostics: usually standardized residual > 3; if that happens it means we were very mistaken about our prediction for that case
- check how influential an outlier is with the Cook's distance > 1 option
- both are in the statistics tab
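Outside JASP, the same two checks can be sketched with statsmodels (a minimal sketch with made-up data; the last point is deliberately extreme):

```python
import numpy as np
import statsmodels.api as sm

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8, 30.0])  # last observation is an outlier

model = sm.OLS(y, sm.add_constant(x)).fit()
influence = model.get_influence()

std_resid = influence.resid_studentized_internal  # casewise diagnostics
cooks_d = influence.cooks_distance[0]             # influence of each observation

print(np.where(np.abs(std_resid) > 3)[0])  # cases with |standardized residual| > 3
print(np.where(cooks_d > 1)[0])            # cases with Cook's distance > 1
```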
homoscedasticity?
- model errors are equal for every value of the predictor variable
- if you don't add an intercept to your model you introduce systematic structure into your errors
- Look at scatterplot of standardized predicted values vs residuals (roughly round shape is needed)
- we don't want any systematic pattern/strong correlation between predicted values and errors -> it messes with the estimates and the Type I error rate
- checked after the analysis is complete because it’s based on the residuals
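A minimal matplotlib sketch of that check (made-up data; in practice you would standardize the predicted values and residuals first, as JASP does):

```python
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8, 12.1])
model = sm.OLS(y, sm.add_constant(x)).fit()

plt.scatter(model.fittedvalues, model.resid)  # want a roughly round cloud
plt.axhline(0, linestyle="--")
plt.xlabel("predicted values")
plt.ylabel("residuals")
plt.show()
```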
check homoscedasticity in Jasp?
plots tab -> residuals plots -> residuals vs predicted
check normality Jasp?
QQ plot of residuals
calculate regression parameters?
- calculate b1 and b0 (formulas in notebook)
- the slope is determined by b1
- with the slope we can determine how many units our outcome variable increases if we increase our predictor variable by one unit -> that value is b1
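A numpy sketch of those formulas (standard least-squares forms; the data are made up):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# standard least-squares estimates:
# b1 = cov(x, y) / var(x), b0 = mean(y) - b1 * mean(x)
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()

print(b0, b1)
print(np.polyfit(x, y, 1))  # cross-check: returns [b1, b0]
```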
error/residuals?
- the difference between the model predictions (ŷ) and the observed values (y)
- the stronger the correlation between observed and predicted values, the better our model did
multiple regression formula?
Y = b0 + b1*X1 + b2*X2 + … + bn*Xn + e
added assumption for multiple regression?
- multicollinearity: how associated are the predictor variables (we don’t want high associations)
- you can still run a regression if you have multicollinearity, but if the predictor variables are very dependent on each other the statistical method cannot really distinguish between them and their effects
how to assess multicollinearity in Jasp?
- correlations between predictor variables
- in statistics tab in linear regression -> Tolerance and VIF (VIF: max < 10, mean close to 1; Tolerance > 0.2 is good; however, we want VIF and tolerance to be as close to 1 as possible)
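How the two statistics relate (standard definitions, where R_j^2 comes from regressing predictor j on all the other predictors):

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}, \qquad \mathrm{Tolerance}_j = 1 - R_j^2 = \frac{1}{\mathrm{VIF}_j}
```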
model fit?
- The fit of the model can be viewed in terms of the correlation between the predictions and the observed values: if the predictions are perfect, the correlation will be 1
- if we have a single predictor variable then correlation between prediction and observation equals the correlation between predictor variable and outcome variable
explained variance?
- the squared correlation (R^2) between predictions and observations
- one of the central metrics to assess model fit in regression
- R^2 can also be calculated through:
1. SS_M/SS_T which is the proportion of improvement with the predictive model compared to the baseline model
2. 1 - (SS_R/SS_T), with SS_R being the residual sum of squares (the squared distances between the regression line and the observed values)
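Both routes written out (using the standard decomposition of the total sum of squares):

```latex
R^2 = \frac{SS_M}{SS_T} = 1 - \frac{SS_R}{SS_T}, \qquad SS_T = SS_M + SS_R
```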
test model fit?
can be done in different ways:
- F statistic (check notebook for formula)
- t statistic (check notebook for formula)
- attention: signal-to-noise ratio (MS_model/MS_error) = F statistic = t statistic squared (with a single predictor)
- they are all methods to test the model fit and they will all lead to the same conclusion
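Written out (the F = t^2 identity holds for a single-predictor model):

```latex
F = \frac{MS_{model}}{MS_{error}} = \frac{SS_M / df_M}{SS_R / df_R}, \qquad t^2 = F
```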
regression in jasp?
- regression -> classical linear regression
- put outcome variable in dependent variable tab and predictor variable in covariate tab
- Adjusted R^2: penalization due to the complexity of the model
- you are also going to get an ANOVA table where SS_M is the regression row, SS_R is the residuals row, and SS_T is the total row in the sum of squares column -> tells you how much better your alternative model did compared to the null
- coefficients table (the intercept is b0 and the predictor, e.g. adverts, is b1): look at the standardized values -> allows us to say something about the individual performance of our predictors