Parametric Tests and Assumptions Flashcards
(103 cards)
What do parametric tests assess?
What is required to run them?
- Parametric tests look at group means
- Require data to follow a normal distribution
- Can deal with unequal variances across groups (e.g., via a Welch correction)
- Generally are more powerful
- Still produce reliable results with continuous data that is not normally distributed, provided sample size requirements are met (central limit theorem)
If data does not meet parametric assumptions what non parametric tests would you use?
- Correlation tests have non parametric versions: for example, a Spearman's correlation test instead of a Pearson's.
- Non parametric tests assess group MEDIANS rather than means, and they don't require a normal distribution.
What is the loophole with parametric tests when continuous data is not normally distributed (when, according to the assumptions, you should perhaps choose a non parametric test)?
- The loophole is that if sample size requirements are met, the central limit theorem means the sampling distribution of the mean is approximately normal. In these cases a parametric test can still produce reliable results.
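A small numpy sketch of this loophole (the exponential distribution and all numbers are invented for illustration): individual values are heavily skewed, yet the means of repeated samples of 50 pile up around the true mean, which is why tests on means can still behave well.

```python
import numpy as np

rng = np.random.default_rng(0)

# Heavily skewed population (exponential): individual values are
# clearly not normally distributed; the true mean is 2.0.
population = rng.exponential(scale=2.0, size=100_000)

# Take 2,000 samples of n = 50 and record each sample's mean.
# By the central limit theorem, the sampling distribution of the
# mean is approximately normal and centred on the population mean.
sample_means = np.array([rng.choice(population, size=50).mean()
                         for _ in range(2_000)])
```

Plotting `sample_means` as a histogram would show the familiar bell shape even though the raw data is anything but normal.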
What do non parametric tests assess?
How is this different to parametric tests?
- Group MEDIANS
- Don’t require data be normally distributed
- Can handle small sample sizes
Because parametric tests assess group means, they require a larger sample size.
What is one easy question to ask ourselves when figuring out whether to choose parametric or non parametric?
What sample size are we working with?
Non parametric tests can deal with small sample sizes; parametric tests, not so much.
What are the four parametric test assumptions?
- Additivity and linearity
- Normality
- Homogeneity of variance
- Independence of observations
What is this equation?
y(i) = b(0) + b(1)X(1) + e(i)
This is the standard linear model (the equation of a straight line), and we see it when looking at additivity and linearity.
What does the Y, B(0) and B(1) and E(i) stand for in the below?
y(i) = b(0) + b(1)X(1) + e(i)
Y(i) = the ith person's score on the outcome variable
B(0) = the Y intercept: the value of Y when X = 0
B(1) = the regression coefficient for the first predictor: the gradient (slope) of the regression line and the strength of the relationship
E(i) = the difference between the actual and predicted value of Y for the ith person
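A tiny Python sketch with made-up numbers, just to show how the pieces fit together for one person:

```python
# Hypothetical values, for illustration only.
b0 = 2.0      # b(0): the Y intercept, the value of Y when X = 0
b1 = 0.5      # b(1): the regression coefficient (slope) for the predictor

x_i = 10.0    # person i's score on the predictor X(1)
y_i = 8.0     # person i's actual observed score on the outcome

y_hat = b0 + b1 * x_i   # predicted value of Y on the line: 7.0
e_i = y_i - y_hat       # e(i): actual minus predicted, the residual: 1.0
```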
What does the standard linear model equation describe?
Both the direction and the strength of the ASSOCIATION between the X and Y variables. There is always an error term at the end.
What does the E at the end of the standard regression equation represent
The difference between the actual observed data point and the LINE we drew through the data points. That is each data point's (or person's) residual, or error.
In parametric tests are we adding terms together or multiplying? If so, why?
- We ADD terms together, because the predictors do not DEPEND on the values of other variables.
- The data are additive: the predictors and their effects, added together, lead to an outcome that is a linear function of the predictors (x1 + x2).
- Basically, linear and additive data say that x1 and x2 predict Y.
Basically, what does linear and additive allude to?
That x1 and x2 predict y
Why are variables not multiplied in linear equations?
Because we are looking at linear relationships, which involve adding terms together, not multiplying them. Adding the predictors together says that the outcome (the DV) is a linear function of the predictors AND their effects:
y(i) = b(0) + b(1)X(1) + e(i)
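As a minimal sketch (coefficients and scores invented for the example), additivity means the predicted outcome is simply each predictor's contribution summed:

```python
# Hypothetical coefficients and predictor scores.
b0, b1, b2 = 1.0, 2.0, 3.0
x1, x2 = 4.0, 5.0

# Additive model: each predictor's contribution is ADDED,
# never multiplied with another predictor.
y_hat = b0 + b1 * x1 + b2 * x2   # 1 + 8 + 15 = 24.0
```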
How do we deal with assumptions for ANOVA?
- Independence of observations: if violated (e.g., the same participants are measured more than once), use a repeated measures design
- Normality – transform the data or use Kruskal-Wallis
- Homogeneity of variances – test with Levene's test; if violated, use Brown-Forsythe or Welch's F
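A hedged scipy sketch of that decision path on simulated data (the 0.05 cut-off and the Kruskal-Wallis fallback are just one convention; Welch's F would be another option):

```python
import numpy as np
from scipy.stats import levene, f_oneway, kruskal

rng = np.random.default_rng(1)
g1 = rng.normal(0.0, 1.0, 40)   # three simulated groups with
g2 = rng.normal(0.5, 1.0, 40)   # different means but equal
g3 = rng.normal(1.0, 1.0, 40)   # variances

# Levene's test: H0 = equal variances across groups, so a LARGE
# p-value means homogeneity of variance is plausible.
_, p_levene = levene(g1, g2, g3)

if p_levene > 0.05:
    stat, p = f_oneway(g1, g2, g3)   # standard one-way ANOVA
else:
    stat, p = kruskal(g1, g2, g3)    # fall back (or use Welch's F)
```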
How do we deal with assumptions for correlations?
- Normality – Use Spearman correlation
- Linearity: if the relationship is monotonic, use Spearman; otherwise transform
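For instance, with a perfectly monotonic but non-linear relationship (y = x³), Spearman's rho is still exactly 1 while Pearson's r is not. A scipy sketch:

```python
from scipy.stats import pearsonr, spearmanr

# Monotonic but non-linear: y always increases with x, but not in a
# straight line, so linearity is violated while monotonicity holds.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [v ** 3 for v in x]

r, _ = pearsonr(x, y)       # below 1: the relationship is not linear
rho, _ = spearmanr(x, y)    # 1.0: the RANKS agree perfectly
```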
How do we deal with assumptions for regression?
• Continuous outcome (otherwise use nonlinear methods)
• Non-zero variance in predictors
• Independence of observations: violated when there are repeated measures on the same cases
• Linearity – check with partial regression plots, try transforming
• Independent errors: For any pair of observations, the error terms should be
uncorrelated
• Normally-distributed errors: The errors (i.e., residuals) should be random and
normally distributed with a mean of 0
• Homoscedasticity: For each value of the predictors, the variance of the error term
should be constant
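A small numpy/scipy sketch of checking two of these on simulated data, fitting by least squares (Shapiro-Wilk is one common normality check for residuals; the simulated data is invented for the example):

```python
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 100)
y = 3.0 + 0.5 * x + rng.normal(0.0, 1.0, 100)  # linear + normal errors

# Fit y = b0 + b1*x by ordinary least squares.
b1, b0 = np.polyfit(x, y, 1)
residuals = y - (b0 + b1 * x)

# Normally-distributed errors: Shapiro-Wilk on the residuals
# (H0 = normality, so here a LARGE p-value is reassuring).
_, p_norm = shapiro(residuals)

# With an intercept in the model, least squares forces the
# residuals to have mean 0 by construction.
mean_resid = residuals.mean()
```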
How do we deal with assumptions for multiple regression?
Refer to Multiple Regression lecture slides #19-32.
The regression assumptions above, and also multicollinearity – check for it, then delete or combine collinear predictors.
How do we deal with assumptions for moderation?
- One IV must be continuous (if both X and M are categorical, use factorial ANOVA)
- Each IV and Y, and interaction term and Y, should be linear – try transforming
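A numpy sketch of a moderation model on simulated data (all true coefficients invented for the example): the interaction term x*m is just another additive predictor in the design matrix.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 300
x = rng.normal(size=n)   # continuous IV
m = rng.normal(size=n)   # continuous moderator

# Simulate data where the effect of x on y DEPENDS on m
# (true interaction coefficient = 0.8).
y = 1.0 + 0.5 * x + 0.3 * m + 0.8 * (x * m) + rng.normal(scale=0.5, size=n)

# Fit y = b0 + b1*x + b2*m + b3*(x*m) by least squares.
X = np.column_stack([np.ones(n), x, m, x * m])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
b3 = beta[3]   # estimated interaction effect, close to 0.8
```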
Why would the best central tendency measure for your data sometimes be a median, and other times be a mean?
Generally the mean is best, but the median is the preferred measure of central tendency when there are a few extreme scores in the distribution of the data (a single outlier can have a great effect on the mean).
Or, perhaps, when there are some missing values in the data.
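A one-line illustration with made-up scores:

```python
import numpy as np

scores = [4, 5, 5, 6, 6, 7, 95]   # one extreme outlier (95)

mean = np.mean(scores)      # ~18.3: dragged far up by the single outlier
median = np.median(scores)  # 6.0: stays with the bulk of the scores
```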
What does the Gaussian distribution or bell curve mean?
Normal distribution.
What are the four assumptions for parametric tests?
Additivity and linearity
Normality
Homogeneity of variance
Independence of observations
y(i) = b(0) + b(1)X(1) + e(i)
What is this equation telling us? Which parametric test assumption is it associated with?
THE STANDARD LINEAR MODEL for additivity and linearity
Y(i) = the ith person's score on the outcome variable
B(0) = the Y intercept: the value of Y when X = 0
B(1) = the regression coefficient for the first predictor: the gradient (slope) of the regression line and the strength of the relationship
E(i) = the difference between the actual and predicted value of Y for the ith person
With the standard linear model, how many X variables can be added to an equation for a straight line?
As many as you like!
What is Y in the standard linear model equation?
The outcome variable