Topic 11 - Tests for Relationship Flashcards
L.O.
LO7 Given real multivariate data and a problem, formulate an appropriate hypothesis and perform a range of hypothesis tests.
LO8 Interpret the p-value, conscious of the various pitfalls associated with testing.
Chi-Squared tests (χ^2)
Used for:
Goodness of Fit:
- tests whether the observed frequency distribution of a categorical variable matches an expected theoretical distribution
eg. Do the eye colours of DATA1001 students follow the distribution 45% brown, 27% blue, 28% green?
Independence:
- Examines whether there is a significant association between 2 qualitative variables
eg. Is there an association between a person's eye colour and their parents' eye colour?
χ^2 GoF vs Independence
GoF:
- One variable
- Compare observed data to a theoretical distribution
Independence:
- Two variables
- Examine the relationship between variables within the same population
Chi-Squared test stat.
contribution of each category = (OF - EF)^2 ÷ EF
where OF = Observed Frequency and EF = Expected Frequency
χ^2 = sum of the contributions over all categories
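The formula above can be sketched in a few lines of Python. This is only an illustration: the counts are made up, and the function name chi_squared_statistic is a hypothetical helper, not something from the course material.

```python
def chi_squared_statistic(observed, expected):
    """Sum the per-category contributions (OF - EF)^2 / EF."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for three categories (invented for illustration only).
observed = [50, 30, 20]
expected = [45, 27, 28]

# Contribution of each category, then their sum.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
x2 = chi_squared_statistic(observed, expected)

print(contributions)
print(round(x2, 3))
```

A large contribution from one category shows where the observed counts differ most from what was expected.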
χ^2 GoF HATPC process
H:
H0 = any differences between OF & EF are due to chance alone
H1 = the differences are NOT due to chance alone
A:
- Observations are independent
- Expected frequencies (EF): none are empty and no more than 20% are below 5
T:
contribution per category = (O - E)^2 ÷ E
χ^2 = Σ (O - E)^2 ÷ E
P:
Uses a χ^2 distribution:
DoF = k - 1
k = # of categories
eg. with 6 categories, DoF = 6 - 1 = 5
C:
If p > 0.05, the data is consistent with H0 and H0 is retained
If p < 0.05, H0 is rejected
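A rough sketch of the whole GoF process in Python, using the eye-colour proportions from the earlier card. The observed counts are invented for illustration. The p-value line relies on the fact that for exactly 2 DoF the upper tail of the χ^2 distribution is exp(-x/2), so it would not work for other DoF, where a statistics package would be needed.

```python
import math

# H: hypothesised proportions from the eye-colour example.
proportions = {"brown": 0.45, "blue": 0.27, "green": 0.28}
# Hypothetical observed counts (invented for illustration, n = 200).
observed = {"brown": 100, "blue": 50, "green": 50}

n = sum(observed.values())
expected = {colour: n * p for colour, p in proportions.items()}

# A: no empty expected cells, and no more than 20% of them below 5.
share_small = sum(1 for e in expected.values() if e < 5) / len(expected)
assumption_ok = all(e > 0 for e in expected.values()) and share_small <= 0.20

# T: chi-squared statistic and degrees of freedom (k - 1).
x2 = sum((observed[c] - expected[c]) ** 2 / expected[c] for c in observed)
dof = len(observed) - 1

# P: closed form that is valid ONLY for 2 DoF.
if dof != 2:
    raise ValueError("exp(-x/2) is the upper tail only when DoF = 2")
p_value = math.exp(-x2 / 2)

# C: compare with the 0.05 threshold.
conclusion = "retain H0" if p_value > 0.05 else "reject H0"
print(assumption_ok, round(x2, 3), dof, round(p_value, 4), conclusion)
```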
χ^2 Independence HATPC Example
H:
H0 = severity of withdrawal symptoms IS independent of belief in caffeine consumption
H1 = severity is NOT independent of belief in consumption; there is an association between belief and severity
A:
- Cochran's Rule: no more than 20% of EF are < 5 and no EF is empty
- Observations are independent
T:
DoF = (m-1)(n-1)
where,
m= # categories in variable 1
n = # categories in variable 2
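To make the T step concrete, here is a small Python sketch for a two-way table. The counts are hypothetical (they are not the caffeine study data), and the rule used for expected frequencies, row total × column total ÷ grand total, is the standard one under independence rather than something stated on this card.

```python
# Hypothetical 2 x 3 table of counts (invented for illustration).
# Rows: variable 1 with m = 2 categories; columns: variable 2 with n = 3.
table = [
    [20, 15, 5],
    [10, 15, 15],
]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand_total = sum(row_totals)

# Expected frequency of each cell under independence.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-squared statistic: sum of (O - E)^2 / E over every cell.
x2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(table, expected)
    for o, e in zip(obs_row, exp_row)
)

# DoF = (m - 1)(n - 1)
dof = (len(table) - 1) * (len(table[0]) - 1)

print(expected)
print(round(x2, 3), dof)
```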
P:
The p-value is the chance of observing the χ^2 value, or one more extreme, on a χ^2 distribution with 2 DoF
C:
p > 0.05 = Retain H0, the data is consistent with NO association
p < 0.05 = Reject H0, there IS evidence of an association
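The P and C steps for a 2 DoF test can be sketched as below. The statistic value is hypothetical, both function names are illustrative helpers, and the exp(-x/2) shortcut holds only for 2 DoF.

```python
import math


def p_value_two_dof(x2):
    """Upper-tail probability of a chi-squared distribution with 2 DoF."""
    return math.exp(-x2 / 2)


def decide(p, alpha=0.05):
    """Apply the decision rule at significance level alpha."""
    return "reject H0" if p < alpha else "retain H0"


x2 = 8.33  # hypothetical test statistic, invented for illustration
p = p_value_two_dof(x2)
print(round(p, 4), decide(p))
```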
Mosaic plot with standardised residuals
- The residuals indicate the ‘gaps’ (O-E) for each individual combination
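A short sketch of how such residuals could be computed, assuming the Pearson form (O - E) ÷ sqrt(E). That form is an assumption here, since the card does not say which standardisation the plot uses, and the table of counts is hypothetical.

```python
import math

# Hypothetical 2 x 3 table of observed counts (invented for illustration).
table = [
    [20, 15, 5],
    [10, 15, 15],
]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand_total = sum(row_totals)
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Pearson residual per cell: positive means more observed than expected.
residuals = [
    [(o - e) / math.sqrt(e) for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(table, expected)
]

for row in residuals:
    print([round(r, 2) for r in row])
```

With this form, squaring the residuals and adding them up gives back the χ^2 statistic, so the cells with the largest residuals are the ones driving the result.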
One-sample t-test for the slope: (Regression Test)
[heft] for equation
H:
H0 = There is NO linear trend (slope = 0)
H1 = There is a linear trend (slope ≠ 0)
A:
- Independence of residuals (check context)
- Normality of residuals (QQplot, Shapiro-Wilk)
- Homoscedasticity of residuals (residual plot)
- Linearity of variables (scatterplot, residual plot)
T:
T = (OV - EV) ÷ SE
  = (estimated slope - 0) ÷ SE of the slope
In R: summary() of the fitted model, eg. summary(lm(y ~ x))
C:
p > 0.05 = Retain H0, the data is consistent with a slope of 0
p < 0.05 = Reject H0, the slope is significantly different from 0
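As a rough illustration of where the test statistic comes from, the sketch below computes the least-squares slope, its standard error and T by hand. The data points and the function name slope_t_statistic are hypothetical. The p-value is left out because it needs a t distribution with n - 2 DoF, which R's summary() output reports directly.

```python
import math


def slope_t_statistic(x, y):
    """Return (slope, standard error, T, DoF) for simple linear regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual variance uses n - 2 DoF (two estimated coefficients).
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(rss / (n - 2) / sxx)

    # T = (observed slope - slope expected under H0, i.e. 0) / SE
    t_stat = (slope - 0) / se_slope
    return slope, se_slope, t_stat, n - 2


# Hypothetical data (invented for illustration).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, se_slope, t_stat, dof = slope_t_statistic(x, y)
print(round(slope, 3), round(se_slope, 4), round(t_stat, 2), dof)
```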