lecture 21 + 22 (stats) Flashcards

(25 cards)

1
Q

covariance?

A
  • deviations of the two variables multiplied by each other, then summed and divided by N - 1 to get the average combined deviation
  • dependent on the scales of measurement (i.e. affected by the standard deviation of either variable)
  • formula in notebook
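The notebook formula is presumably the standard sample covariance; a minimal sketch with made-up data:

```python
# Sample covariance: average of the multiplied deviations, using N - 1.
def covariance(x, y):
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
print(covariance(x, y))  # 5.0
```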
2
Q

correlation coefficient?

A
  • Pearson’s correlation coefficient (r)
  • standardized measure of the covariance (i.e. NOT affected by the standard deviation of either variable)
  • quantifies the association between two continuous variables
  • can be positive, negative, or 0 (which means there is no correlation)
  • ranges between -1 and 1
  • formula and plot in the notebook
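A sketch of the standardization idea (covariance divided by the product of the standard deviations; data made up):

```python
import statistics

# Pearson's r: covariance divided by the product of the standard deviations,
# which removes the dependence on measurement scales.
def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    return cov / (statistics.stdev(x) * statistics.stdev(y))

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]
print(round(pearson_r(x, y), 3))  # ≈ 0.853
```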
3
Q

calculating r?

A
  1. standardize the data to z-scores (Z): both variables then have the same mean (0) and the same variability (SD = 1)
  2. calculate the covariance of the z-scores, which is the correlation
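The two steps above can be sketched directly (made-up data; the key point is that the covariance of z-scores IS the correlation):

```python
import statistics

# Step 1: standardize a variable to z-scores (mean 0, SD 1).
def zscores(v):
    m, s = statistics.mean(v), statistics.stdev(v)
    return [(x - m) / s for x in v]

# Step 2: the covariance of the z-scores (sum of products / (N - 1))
# is Pearson's r.
def r_from_z(x, y):
    zx, zy = zscores(x), zscores(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]
print(round(r_from_z(x, y), 3))  # ≈ 0.853, same as Pearson's r
```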
4
Q

meaning of Z scores?

A
  • a Z score of 2 means the observation is 2 standard deviations above the mean
  • a Z score of -2 means the observation is 2 standard deviations below the mean
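A quick illustration with made-up data:

```python
import statistics

def z_score(x, data):
    # How many standard deviations x lies above (positive) or
    # below (negative) the mean of the data.
    return (x - statistics.mean(data)) / statistics.stdev(data)

data = [10, 20, 30, 40, 50]  # mean 30, sample SD ≈ 15.81
print(round(z_score(30, data), 2))    # 0.0 -> exactly at the mean
print(round(z_score(61.6, data), 2))  # ≈ 2 -> about 2 SDs above the mean
```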
5
Q

significance of correlation?

A
  • transform r into a t statistic and compare it against the t-distribution (df = N - 2) to get a p value
  • formula in notebook
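The notebook formula is presumably the standard conversion t = r·√(N − 2)/√(1 − r²); a sketch with hypothetical values:

```python
import math

# Convert r to a t statistic; compare against a t-distribution
# with df = N - 2 to obtain the p value.
def r_to_t(r, n):
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

print(round(r_to_t(0.5, 30), 3))  # ≈ 3.055 with df = 28
```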
6
Q

semi partial correlation?

A

examines the unique overlap in variance between variables X and Y: the shared overlap between X and Z is removed, but the shared overlap between Y and Z is not

7
Q

Partial correlation?

A
  • quantifies the association between two variables while taking a third variable into account
  • the higher the correlation between X and Y, the more shared variance they have and the more predictive power X has for Y
  • the partial correlation can increase compared to the regular correlation
  • if Z is a confounding variable (high correlation with both X and Y), controlling for Z makes the partial correlation lower than the regular correlation
  • also turn the partial r into t to assess its significance (df = N - 3)
  • check notebook for Venn diagram and formula
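The notebook formula is presumably the standard first-order partial correlation; a sketch with made-up correlations:

```python
import math

# First-order partial correlation between X and Y, controlling for Z.
def partial_r(r_xy, r_xz, r_yz):
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Z correlates with both X and Y (a confounder), so the partial
# correlation comes out lower than the regular r_xy = 0.6:
print(round(partial_r(0.6, 0.5, 0.5), 3))  # ≈ 0.467
```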
8
Q

correlation in Jasp?

A
  • regression -> correlation
  • put both the dependent and the independent variable in the variables box
  • for a partial correlation, put the variable you want to control for in the partial out box
  • for easier analysis, request a scatterplot and set comparisons to pairwise
9
Q

regression formula?

A

Y = b0 + b1*X + e

  • Y: the outcome variable
  • b0: the intercept (the baseline we predict from)
  • b1: the regression coefficient for our single predictor variable (X)
  • b1 quantifies how strong the association is between the predictor and the outcome variable
  • positive relationship if b1 is positive and vice versa (if b1 = 0 there is no association)
  • b coefficients are influenced by the measurement scales
  • e (residuals): the distances between the predicted values and the observed data points
10
Q

assumptions?

A
  • sensitivity
  • homoscedasticity
  • normality
  • additivity and linearity (checked through matrix scatterplot with predictors and outcome variable)
11
Q

sensitivity?

A
  • outliers: extreme residuals
  • Cook’s distance: how influential is this observation (values larger than 1 are potentially problematic)
  • you can also check Q-Q, residuals plots, casewise diagnostics
12
Q

check outliers in Jasp?

A
  • casewise diagnostics: flags standardized residuals > 3; such cases mean the prediction was far off for that observation
  • check how influential an outlier is via the Cook’s distance > 1 option
  • both are in the statistics tab
13
Q

homoscedasticity?

A
  • model errors are equal for every value of the predictor variable
  • if you don’t add an intercept to your model, you introduce a systematic pattern into your errors
  • look at the scatterplot of standardized predicted values vs. residuals (a roughly round cloud is needed)
  • we don’t want any systematic pattern/strong correlation between predicted value and error -> it messes with the estimates and the Type I error rate
  • checked after the analysis is complete because it’s based on the residuals
14
Q

check homoscedasticity in Jasp?

A

plots tab -> residuals plots -> residuals vs predicted

15
Q

check normality Jasp?

A

QQ plot of residuals

16
Q

calculate regression parameters?

A
  • calculate b1 and b0 (formulas in notebook)
  • the slope is determined by b1
  • the slope tells us how many units the outcome variable increases for a one-unit increase in the predictor variable -> that value is b1
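The notebook formulas are presumably the standard least-squares estimates; a sketch with made-up data:

```python
import statistics

# Least-squares estimates for simple regression:
# b1 = cov(x, y) / var(x),  b0 = mean(y) - b1 * mean(x).
def fit_simple_regression(x, y):
    n = len(x)
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    b1 = cov / statistics.variance(x)
    b0 = mean_y - b1 * mean_x
    return b0, b1

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = fit_simple_regression(x, y)
print(round(b0, 2), round(b1, 2))  # intercept ≈ 0.05, slope ≈ 1.99
```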
17
Q

error/residuals?

A
  • the difference between the model predictions (ŷ) and the observed values (y)
  • the better the correlation between observed and predicted values, the better our model did
18
Q

multiple regression formula?

A

Y = b0 + b1*X1 + b2*X2 + … + bn*Xn + e
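A sketch of a prediction from this equation (illustrative coefficients, not estimated from data):

```python
# Y_hat = b0 + b1*X1 + ... + bn*Xn; the residual e is whatever
# difference remains between this prediction and the observed Y.
def predict(b0, coefs, xs):
    return b0 + sum(b * x for b, x in zip(coefs, xs))

# 1.0 + 2.0*3.0 + 0.5*4.0 = 9.0
print(predict(1.0, [2.0, 0.5], [3.0, 4.0]))  # 9.0
```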

19
Q

added assumption for multiple regression?

A
  • multicollinearity: how associated the predictor variables are (we don’t want high associations)
  • you can still run a regression with multicollinearity, but if the predictor variables are highly dependent on each other, the statistical method cannot really distinguish between them and their effects
20
Q

how to assess multicollinearity in Jasp?

A
  • correlations between predictor variables
  • in the statistics tab of linear regression -> Tolerance and VIF (rules of thumb: max VIF < 10, mean VIF close to 1, Tolerance > 0.2; ideally both VIF and Tolerance are as close to 1 as possible)
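The VIF that JASP reports can be sketched from its definition (hypothetical R² value):

```python
# VIF for predictor j: 1 / (1 - R_j^2), where R_j^2 comes from regressing
# predictor j on all the other predictors; Tolerance is 1 / VIF.
def vif(r2_j):
    return 1 / (1 - r2_j)

r2 = 0.8  # hypothetical: predictor j is well explained by the others
print(round(vif(r2), 2))      # 5.0 -> under the max-VIF < 10 rule of thumb
print(round(1 / vif(r2), 2))  # 0.2 -> right at the Tolerance > 0.2 cutoff
```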
21
Q

model fit?

A
  • The fit of the model can be viewed in terms of the correlation between the predictions and the observed values: if the predictions are perfect, the correlation will be 1
  • if we have a single predictor variable then correlation between prediction and observation equals the correlation between predictor variable and outcome variable
22
Q

explained variance?

A
  • squaring the correlation gives R^2
  • one of the central metrics to assess model fit in regression
  • R^2 can also be calculated as:
    1. SS_M/SS_T, the proportion of improvement of the predictive model over the baseline model
    2. 1 - (SS_R/SS_T), with SS_R being the residual sum of squares (the squared distances between observations and the regression line)
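Both routes to R² can be sketched with made-up observations and model predictions:

```python
# R^2 both ways: SS_M / SS_T and 1 - SS_R / SS_T give the same value
# (using SS_M = SS_T - SS_R).
def r_squared(y, y_hat):
    mean_y = sum(y) / len(y)
    ss_t = sum((yi - mean_y) ** 2 for yi in y)              # total SS
    ss_r = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # residual SS
    ss_m = ss_t - ss_r                                      # model SS
    assert abs(ss_m / ss_t - (1 - ss_r / ss_t)) < 1e-12
    return 1 - ss_r / ss_t

y = [2, 4, 6, 8]
y_hat = [2.5, 3.5, 6.5, 7.5]  # made-up model predictions
print(r_squared(y, y_hat))  # 0.95
```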
23
Q

test model fit?

A

can be done in different ways:

  1. F statistic (check notebook for formula)
  2. t statistic (check notebook for formula)
  • attention: signal-to-noise ratio (MS_model/MS_error) = F statistic = t statistic squared (for a single predictor)
  • they are all methods to test the model fit and will all lead to the same conclusion
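The F = t² relation for a single predictor can be sketched with hypothetical sums of squares:

```python
import math

# F = MS_model / MS_error, with k predictors and N observations;
# for a single predictor, t^2 equals F.
def f_statistic(ss_m, ss_r, n, k=1):
    ms_m = ss_m / k            # model mean square
    ms_r = ss_r / (n - k - 1)  # error mean square
    return ms_m / ms_r

f = f_statistic(ss_m=19.0, ss_r=1.0, n=20)  # hypothetical values
t = math.sqrt(f)
print(round(f, 1), round(t, 2))  # F = 342.0, t ≈ 18.49
```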
24
Q

regression in jasp?

A
  • regression -> classical linear regression
  • put the outcome variable in the dependent variable box and the predictor variable in the covariates box
  • Adjusted R^2: penalizes for the complexity of the model
  • you also get an ANOVA table where, in the sum of squares column, SS_M is "regression", SS_R is "residuals", and SS_T is "total" -> tells you how much better your alternative model did compared to the null model
  • coefficients table (intercept is b0 and the predictor, e.g. adverts, is b1): look at the standardized values -> allows us to say something about the individual contribution of our predictors
25
Q

model metrics in Jasp for multiple regression?

A
  • Models -> add a new model -> remove one of the variables from M1 and put it in M2 (the variable you want to control for goes into the model first)
  • check R squared change: how much R^2 has increased between models
  • AIC and BIC box: model fit metrics that are better at penalizing complex models -> if AIC and BIC decrease between models, the new model adds predictive value
  • these assess whether an individual predictor variable adds something