Week 3 - Parametric test assumptions Flashcards
(44 cards)
What are the features of a parametric test?
- assess group means
- data must have normal distribution (+CLT)
- standard versions assume equal variances (variants such as Welch's t-test allow unequal variances)
- more powerful
What are the features of a non-parametric test?
e.g. rank-based correlation tests (Spearman's rho)
- assess group MEDIANS
- data doesn’t need to be normally distributed
- can handle small sample size
Questions to ask yourself when deciding to use a parametric test or not
- sample size
- best way to measure central tendency (e.g. median or mean?)
What are the parametric test assumptions? (4)
- Additivity and linearity
- Normality (Gaussian distribution/Bell curve)
- Homogeneity of variances
- Independence of observations
Describe the assumption of Additivity and linearity
Involves a standard linear model/ equation (describing a straight line)
What is the Standard linear model equation
Yi = b0 + b1X1i + Ei
Yi = the ith person’s score on the outcome variable
b0 = Y-intercept: the value of Y when X = 0, i.e. the point at which the regression line crosses the y-axis
b1 = regression coefficient for the first predictor (b2 for the second predictor).
X1i = the ith person’s score on the first predictor
- Gradient (slope/ rise over run) of the regression
- Direction/ strength of relationship
Ei= the difference between the actual and predicted value of Y for the ith person
- residual/ error
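A minimal sketch of the equation above, using ordinary least squares for one predictor. The data (hours studied and exam score) and the function name `fit_simple_linear` are made up for illustration; only the standard library is used.

```python
def fit_simple_linear(x, y):
    """Return (b0, b1) for the least-squares line Y = b0 + b1*X."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # b1 (slope) = covariation of X and Y divided by variation in X
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    # b0 (intercept) = predicted Y when X = 0
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Hypothetical data: hours studied (X) and exam score (Y)
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 68]

b0, b1 = fit_simple_linear(hours, score)
predicted = [b0 + b1 * xi for xi in hours]
# Ei = actual Y minus predicted Y for each person
residuals = [yi - pi for yi, pi in zip(score, predicted)]

print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")
print("residuals:", [round(e, 2) for e in residuals])
```

With an intercept in the model, the least-squares residuals sum to zero, which is a quick sanity check on the fit.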
What does it mean for data to be linear and additive?
- X1 and X2 predict Y.
- The outcome is a linear function of the predictors (X1 + X2)
- the effects of the predictors are added together & do not depend on the values of other variables, as they would in a multiplicative (interaction) model
The outcome Y is an additive combination of the effects of X1 and X2. e.g. if both coefficients are positive, then as both X1 and X2 increase, Y increases also
True or false:
The outcome Y is an additive combination of the effects of X1 and X2. e.g. if both coefficients are positive, then as both X1 and X2 increase, Y increases also
true
How can we assess linearity?
- plot observed vs predicted values (symmetrically distributed around diagonal line)
- plot residuals vs predicted values (symmetrically distributed around a horizontal line at zero)
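A rough sketch of the residuals vs predicted check, printed as numbers rather than plotted. Both data sets are invented: one is exactly linear, the other is curved (Y = X squared). The helper names are hypothetical.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residuals_vs_predicted(x, y):
    """Return a list of (predicted, residual) pairs from a straight-line fit."""
    intercept, slope = fit_line(x, y)
    pairs = []
    for xi, yi in zip(x, y):
        pred = intercept + slope * xi
        pairs.append((pred, yi - pred))
    return pairs


x = list(range(1, 8))
linear_y = [2 + 3 * xi for xi in x]   # straight-line relationship
curved_y = [xi ** 2 for xi in x]      # curved relationship

linear_pairs = residuals_vs_predicted(x, linear_y)
curved_pairs = residuals_vs_predicted(x, curved_y)

print("linear residuals:", [round(r, 2) for _, r in linear_pairs])
print("curved residuals:", [round(r, 2) for _, r in curved_pairs])
```

For the curved data the residuals are positive at both ends and negative in the middle. A systematic pattern like this, rather than an even scatter around zero, is what suggests non-linearity.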
How to fix non-linearity?
- apply a nonlinear transformation to the variables
- add another regressor that is a nonlinear function of a predictor (e.g. a polynomial term)
- examine moderators
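A small sketch of the transformation idea, assuming a made-up outcome that depends on the square of X. Fitting a straight line to X leaves a large error. Fitting it to the transformed predictor (X squared) removes the error. The function and variable names are hypothetical.

```python
def sum_squared_error(x, y):
    """Fit a least-squares line of y on x and return the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


x = [1, 2, 3, 4, 5, 6]
y = [3 + 2 * xi ** 2 for xi in x]      # invented curved relationship

x_squared = [xi ** 2 for xi in x]      # nonlinear transformation of the predictor

sse_raw = sum_squared_error(x, y)
sse_transformed = sum_squared_error(x_squared, y)

print(f"SSE using X:   {sse_raw:.2f}")
print(f"SSE using X^2: {sse_transformed:.2f}")
```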
Describe the assumption of Normality
relevant to:
- parameters (sampling distribution)
- residuals/ error terms
- -> confidence intervals around parameter
- -> Null hypothesis significance testing
What is Central Limit Theorem (CLT)?
As the sample size increases (toward infinity), the sampling distribution of the mean approaches a normal distribution.
–> sample means will be approximately normally distributed, so you don’t need to worry too much about the distribution that the samples came from.
–> the sampling distribution is the distribution of means from many samples and re-samples
–> rule of thumb: the sample size should be AT LEAST 30
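A simulation sketch of the CLT, assuming a strongly skewed population (an exponential distribution, chosen only for illustration). It compares the skewness of raw scores with the skewness of the means of many samples of size 30. The seed, the number of samples and the helper name `skewness` are arbitrary choices.

```python
import random
import statistics


def skewness(values):
    """Population skewness: the average cubed z-score."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


rng = random.Random(42)   # fixed seed so the run is repeatable
sample_size = 30
n_samples = 5000

# Raw scores drawn from a positively skewed population (population mean = 1)
raw_scores = [rng.expovariate(1.0) for _ in range(n_samples)]

# Mean of each of many samples of size 30 from the same population
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(sample_size)])
    for _ in range(n_samples)
]

raw_skew = skewness(raw_scores)
means_skew = skewness(sample_means)

print(f"skewness of raw scores:   {raw_skew:.2f}")
print(f"skewness of sample means: {means_skew:.2f}")
print(f"mean of sample means:     {statistics.fmean(sample_means):.3f}")
```

The sample means should be far less skewed than the raw scores and should centre on the population mean, although with samples of 30 some skew typically remains.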
As a rule of thumb, how large must the sample be for the CLT to apply?
At least 30
True or false
According to CLT -
Even if the data is not normal, the sampling distribution of the mean will be approximately normal (given a large enough sample)
True
True or false
Positively skewed data gathers on the left side and scores bunch at the low values with tails pointing to high values
true
True or false
Negatively skewed data gathers on the left side and scores bunch at the low values
false - it gathers on the right (e.g. as you grow conditions get “worse” in life)
scores bunch at the high values with tails pointing to low values
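A quick numeric sketch of skew direction, using invented scores. The first list bunches at low values with a tail toward high values. The second is its mirror image. The `skewness` helper is a hypothetical name for the average cubed z-score.

```python
import statistics


def skewness(values):
    """Population skewness: the average cubed z-score."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


# Invented scores bunched at low values, tail toward high values
positive_skew = [1, 1, 1, 2, 2, 3, 4, 8]
# Mirror image: bunched at high values, tail toward low values
negative_skew = [9 - v for v in positive_skew]

for name, data in [("positive", positive_skew), ("negative", negative_skew)]:
    print(
        f"{name}: skewness = {skewness(data):+.2f}, "
        f"mean = {statistics.fmean(data):.2f}, "
        f"median = {statistics.median(data):.2f}"
    )
```

In these examples the mean is pulled toward the tail: above the median for the positively skewed scores and below it for the negatively skewed ones.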
What is kurtosis?
The degree to which data cluster in either the tails (ends) or the peak (tallest part) of the distribution
- heaviness of tails
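A sketch of excess kurtosis (fourth moment divided by squared variance, minus 3), under which a normal distribution scores 0. Both data sets are invented: one is flat with light tails, the other is mostly central scores with two extreme values. The function name is hypothetical.

```python
import statistics


def excess_kurtosis(values):
    """Fourth central moment / squared variance, minus 3 (normal = 0)."""
    mean = statistics.fmean(values)
    n = len(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 ** 2) - 3


# Flat, evenly spread scores: light tails (negative kurtosis)
flat_scores = list(range(1, 11))
# Mostly central scores with two extreme values: heavy tails (positive kurtosis)
heavy_tailed_scores = [0] * 18 + [-10, 10]

print(f"flat:         {excess_kurtosis(flat_scores):+.2f}")
print(f"heavy-tailed: {excess_kurtosis(heavy_tailed_scores):+.2f}")
```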
Draw the following:
Negative Kurtosis
Positive Kurtosis
Normal distribution
Leptokurtic (heavy tails)
Mesokurtic
Platykurtic (light tails)
draw on paper
What are properties of frequency distributions?
- Skewness
- Kurtosis
Checking the distribution to determine if the assumption of normality is met is important. Which graphical displays are used to test for normality?
Q-Q plots (dots on straight line = normal)
Histograms
What is the name for the software (e.g. JASP) based method for testing for normality?
Shapiro-Wilk test
Describe the Shapiro-Wilk test and what a p value of <0.05 means
- tests whether the data differ from a normal distribution
- p < 0.05 = data differ significantly from a normal distribution, thus normality is violated
In the Shapiro-Wilk test, what does a p value >0.05 mean?
The data do not differ significantly from a normal distribution, thus the normality assumption is not violated
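A sketch of the same decision rule in Python, assuming SciPy is installed (the cards themselves use JASP). `scipy.stats.shapiro` returns the test statistic and the p value. The samples, the seed and the helper name `check_normality` are invented for illustration.

```python
import random

from scipy import stats

rng = random.Random(0)   # fixed seed so the run is repeatable

# Invented samples: one drawn from a normal distribution, one strongly skewed
normal_sample = [rng.gauss(50, 10) for _ in range(200)]
skewed_sample = [rng.expovariate(1.0) for _ in range(200)]


def check_normality(sample, alpha=0.05):
    """Return (W statistic, p value, True if normality is violated at alpha)."""
    statistic, p_value = stats.shapiro(sample)
    return statistic, p_value, bool(p_value < alpha)


for name, sample in [("normal", normal_sample), ("skewed", skewed_sample)]:
    w, p, violated = check_normality(sample)
    verdict = "normality violated" if violated else "normality not violated"
    print(f"{name}: W = {w:.3f}, p = {p:.4f} -> {verdict}")
```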
Describe the assumption of homogeneity of variance
Assumes all groups or data points have equal variances (also called the assumption of equal variances)
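A descriptive sketch of what equal variances looks like in numbers, using two invented groups with the same mean but very different spread. It only compares sample variances. It is not a formal test, and no cut-off for the ratio is implied.

```python
import statistics

# Invented scores for two groups with the same mean but different spread
groups = {
    "group_a": [10, 12, 11, 13, 14],
    "group_b": [5, 15, 10, 20, 0],
}

# Sample variance (n - 1 in the denominator) for each group
variances = {name: statistics.variance(scores) for name, scores in groups.items()}

# Ratio of the largest to the smallest variance: 1 means identical spread
variance_ratio = max(variances.values()) / min(variances.values())

for name, var in variances.items():
    print(f"{name}: variance = {var:.2f}")
print(f"largest / smallest variance = {variance_ratio:.1f}")
```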