Lecture 8: Non-parametric tests Flashcards
What does the chi-square test examine?
Whether there is a relationship between two categorical variables.
What decision path leads to the chi-square test?
Q: What sort of measurement? A: Categorical (in this case counts or frequencies)
Q: How many predictor variables? A: One
Q: What type of predictor variable? A: Categorical
Q: How many levels of the categorical predictor? A: Not relevant
Q: Same or Different participants for each predictor level? A: Different
This leads us to the chi-square test for independence of groups
IV and DV in chi-squared tests - (2)
One categorical DV (because of frequencies)
with one categorical IV with different participants at each predictor level
In chi-square, since we are using categorical variables, we cannot use
the mean or any similar statistic, so we cannot use parametric tests
In a chi-square test, when measuring categorical variables, we are interested in
frequencies (the number of items that fall into each combination of categories)
What does chi-square compare?
Observed frequencies from the data with the frequencies that would be expected if there were no relationship between the two variables.
In a chi-square test, each participant is allocated to one and only one category, such as - (3)
pass or fail,
pregnant or not pregnant,
win, draw or lose
What are the assumptions of the chi-square test? - (3)
The data values are a simple random sample from the population of interest.
There are two categorical (nominal) variables. Don’t use the independence test with continuous variables; the variables that define the category combinations must be categorical. The counts for each combination of the two categorical variables are the numeric data that the test analyses.
For each combination of the levels of the two variables, the expected count should be at least five. When the expected count is below five for any one combination, the test results are not reliable.
Since each participant is allocated to one category in a chi-square test, each individual therefore
contributes to the frequency or count with which a category occurs
With categorical outcomes in chi-square, the null hypothesis is set
up on the basis of expected frequencies for all four variable combinations, based on the idea that the variable of interest has no effect on the frequencies
Example of a scenario using chi-square
We have a list of movie genres; this is our first variable. Our second variable is whether or not the patrons of those genres bought snacks at the theatre. Our idea (or, in statistical terms, our null hypothesis) is that the type of movie and whether or not people bought snacks are unrelated. The owner of the movie theatre wants to estimate how many snacks to buy. If movie type and snack purchases are unrelated, estimating will be simpler than if the movie types impact snack sales.
Is the Chi-square test of independence an appropriate method to evaluate the relationship between movie type and snack purchases? - (3)
We have a simple random sample of 600 people who saw a movie at our theatre. We meet this requirement.
Our variables are the movie type and whether or not snacks were purchased. Both variables are categorical.
The last requirement is an expected count of at least five for each combination of the two variables. To confirm this, we need the total counts for each type of movie and the total counts for whether snacks were bought or not, so this is checked once the totals are known.
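Contingency table in chi-square: calculating the row totals, column totals and grand total - (7)
Column total, snacks: 50 + 125 + 90 + 45 = 310
Column total, no snacks: 75 + 175 + 30 + 10 = 290
Row total, action: 50 + 75 = 125
Row total, second movie type: 125 + 175 = 300
Row total, third movie type: 90 + 30 = 120
Row total, fourth movie type: 45 + 10 = 55
Grand total: 310 + 290 = 600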
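The totals above can be reproduced with a short Python sketch. The layout is an assumption based on the sums listed: one row per movie type, with the columns being "bought snacks" and "did not buy snacks". Only "Action" is named in these notes, so the other row labels are placeholders.

```python
# Observed counts: one entry per movie type, as (snacks, no snacks).
# "Action" is named in the notes; "Type 2" to "Type 4" are placeholder labels.
observed = {
    "Action": (50, 75),
    "Type 2": (125, 175),
    "Type 3": (90, 30),
    "Type 4": (45, 10),
}

# Row totals: how many people saw each movie type.
row_totals = {movie: sum(counts) for movie, counts in observed.items()}

# Column totals: how many people did / did not buy snacks.
column_totals = [sum(column) for column in zip(*observed.values())]

# Grand total: everyone in the sample.
grand_total = sum(column_totals)

print(row_totals)
print(column_totals)
print(grand_total)
```

The grand total should match the sample size of 600 stated in the scenario, which is a quick check that no cell was mistyped.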
Contingency table in chi-square: calculating the expected counts
For example, for action and snacks the expected count is the column total (310) × the row total (125) divided by the grand total (600) = 64.58, which rounds to 65
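As a minimal sketch, the same rule (row total × column total ÷ grand total) can be applied to every cell at once. The table layout is assumed from the totals in these notes: rows are movie types (the first is action), columns are snacks and no snacks.

```python
# Observed counts: rows are movie types, columns are (snacks, no snacks).
observed = [
    [50, 75],    # action
    [125, 175],  # second movie type
    [90, 30],    # third movie type
    [45, 10],    # fourth movie type
]

row_totals = [sum(row) for row in observed]
column_totals = [sum(column) for column in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for a cell = row total * column total / grand total.
expected = [
    [row_total * column_total / grand_total for column_total in column_totals]
    for row_total in row_totals
]

for row in expected:
    print([round(value, 2) for value in row])
```

Every expected count in this example is well above five, so the third assumption of the test is met for the movie data.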
How to calculate the chi-square test statistic? - (4)
- Calculate the difference between the observed and expected count for each Movie-Snacks combination.
- Square that difference.
- Divide by the expected value for the combination.
- Add up these values across all Movie-Snacks combinations. This gives us our test statistic.
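The four steps above can be sketched in Python as follows. The function name `chi_square_statistic` is a hypothetical helper, and the table layout (rows are movie types, columns are snacks and no snacks) is assumed from the totals in these notes. Without rounding the expected counts first, the result is about 65.01; the 65.03 quoted in these notes is close enough that the small gap is probably due to rounding the intermediate values.

```python
def chi_square_statistic(observed):
    """Sum of (observed - expected)^2 / expected over every cell."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, observed_count in enumerate(row):
            expected_count = row_totals[i] * column_totals[j] / grand_total
            difference = observed_count - expected_count   # step 1
            squared = difference ** 2                      # step 2
            statistic += squared / expected_count          # steps 3 and 4
    return statistic


# Movie example: rows are movie types, columns are (snacks, no snacks).
movie_table = [[50, 75], [125, 175], [90, 30], [45, 10]]
movie_statistic = chi_square_statistic(movie_table)
print(round(movie_statistic, 2))
```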
Example of calculating chi-square from the table
For the movie example, the test statistic is 65.03
How to interpret your chi-square test statistic, e.g. a test statistic of 65.03? - (5)
- Set your significance level: α = .05
- Calculate the test statistic: 65.03
- Find your critical value from a chi-square distribution table, based on the df and the significance level
- Degrees of freedom: df = (r - 1) × (c - 1), where r is the number of rows and c is the number of columns
For the movie example: df = (4 - 1) × (2 - 1) = 3, which gives a critical value of 7.815. Compare the test statistic with the critical value.
65.03 > 7.815, so reject the null hypothesis that movie type and snack purchases are independent
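The decision rule can be sketched in Python as below. The function names are hypothetical, and the small lookup table holds standard chi-square critical values at α = .05 for 1 to 4 degrees of freedom, standing in for a printed distribution table.

```python
# Standard chi-square critical values at alpha = .05, keyed by df.
CRITICAL_VALUES_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def degrees_of_freedom(n_rows, n_columns):
    """df = (r - 1) * (c - 1) for an r-by-c contingency table."""
    return (n_rows - 1) * (n_columns - 1)


def reject_independence(statistic, n_rows, n_columns):
    """True if the statistic exceeds the critical value at alpha = .05."""
    df = degrees_of_freedom(n_rows, n_columns)
    return statistic > CRITICAL_VALUES_05[df]


# Movie example: 4 movie types by 2 snack outcomes, statistic 65.03.
print(degrees_of_freedom(4, 2))
print(reject_independence(65.03, 4, 2))
```

A statistic below the critical value would return False, meaning we fail to reject the null hypothesis of independence.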
Example of research question and hypothesis and sig level of chi-square test of independence- (4)
Research question:
Does the area of psychology that a person prefers depend on whether they would select a cat or a dog as a pet?
Hypotheses:
H0: The area of interest in psychology and type of pet preferred are independent of each other.
H1: The area of interest in psychology and type of pet preferred are not independent of each other. That is the primary area of interest in psychology depends on whether you prefer a cat or a dog.
Significance level: α = .05
Does the area of psychology that a person prefers depend on whether they would select a cat or a dog as a pet? - chi-square test of independence
For this chi-square example, we need to check the assumptions below - (2)
Independence
Each item or entity contributes to only one cell of the contingency table.
The expected frequencies should be greater than 5.
In larger contingency tables up to 20% of expected frequencies can be below 5, but there is a loss of statistical power.
Even in larger contingency tables no expected frequencies should be below 1.
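The expected-frequency rules above can be turned into a small check. This is a sketch only: `check_expected_counts` is a hypothetical helper, the first table holds the rounded expected counts derived from the movie example, and the second table is invented purely to show a failing case (it is not the data behind the pet example).

```python
def check_expected_counts(expected):
    """Summarise how a table of expected counts fares against the rules."""
    cells = [value for row in expected for value in row]
    share_below_5 = sum(value < 5 for value in cells) / len(cells)
    minimum = min(cells)
    return {
        "share_below_5": share_below_5,
        "minimum": minimum,
        # At most 20% of cells below 5, and no cell below 1.
        "acceptable": share_below_5 <= 0.20 and minimum >= 1,
    }


# Expected counts from the movie example (rounded to two decimals).
movie_expected = [[64.58, 60.42], [155.0, 145.0], [62.0, 58.0], [28.42, 26.58]]

# Invented expected counts for a 5 x 2 table, for illustration only.
hypothetical_expected = [[0.8, 4.2], [2.1, 8.9], [3.0, 12.0], [1.5, 6.5], [2.6, 5.4]]

print(check_expected_counts(movie_expected))
print(check_expected_counts(hypothetical_expected))
```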
Does the area of psychology that a person prefers depend on whether they would select a cat or a dog as a pet? - chi-square test of independence
For this chi-square example, we need to check the assumption that the expected frequencies should be greater than 5.
What does the output show? - (4)
Here we see that all the expected counts in the cat group and one expected count in the dog group are below 5.
We also have one expected count in the cat group that is below 1.
So SPSS has flagged that 60% of the expected counts fall below 5.
So the assumption of expected frequencies greater than 5 is not met
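If the chi-square assumption that the expected frequencies should be greater than 5 is not satisfied, what should we do? - chi-square test of independence
We should use Fisher’s Exact Test, which calculates an exact probability and so does not rely on the large-sample approximation that small expected frequencies undermine.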
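To show the idea behind Fisher’s Exact Test, here is a minimal sketch for the simplest case, a 2 × 2 table. It is not the procedure SPSS runs for the larger pet-preference table, the function name is hypothetical, and the counts are invented for illustration. With the row and column totals held fixed, it sums the probabilities of every table that is no more likely than the one observed.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided exact p-value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    column1 = a + c
    total = row1 + row2

    def probability(top_left):
        # Hypergeometric probability of a table with this top-left cell,
        # given the fixed row and column totals.
        return comb(row1, top_left) * comb(row2, column1 - top_left) / comb(total, column1)

    observed_probability = probability(a)
    lowest = max(0, column1 - row2)
    highest = min(row1, column1)

    # Add up every table that is as likely as, or less likely than, the observed one.
    return sum(
        probability(x)
        for x in range(lowest, highest + 1)
        if probability(x) <= observed_probability * (1 + 1e-9)
    )


# Invented counts for illustration only.
p_value = fisher_exact_2x2(3, 1, 1, 3)
print(round(p_value, 4))
```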
Does the area of psychology that a person prefers depend on whether they would select a cat or a dog as a pet? - chi-square test of independence
If the assumptions were met (expected frequencies greater than 5), then report - (2)
A chi-square independence test was performed to examine whether there was a relationship between participants' area of study in psychology and their preference for cats or dogs.
The relationship between these variables was not significant, χ²(4, N = 46) = 1.46, p = .834, so we fail to reject H0.
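As a rough check on the reported values, the p-value can be recomputed from the statistic and degrees of freedom. The sketch below uses the closed-form upper-tail probability of the chi-square distribution, which applies only when df is even (as it is here, df = 4). Both function names are hypothetical.

```python
from math import exp, factorial


def chi_square_p_even_df(statistic, df):
    """Upper-tail p-value of the chi-square distribution for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only applies to positive, even df")
    half = statistic / 2
    return exp(-half) * sum(half ** k / factorial(k) for k in range(df // 2))


def report(statistic, df, n):
    """Format the result in the style used in the report above."""
    p = chi_square_p_even_df(statistic, df)
    p_text = f"{p:.3f}".lstrip("0")  # drop the leading zero, e.g. .834
    return f"χ²({df}, N = {n}) = {statistic:.2f}, p = {p_text}"


reported_p = chi_square_p_even_df(1.46, 4)
print(report(1.46, 4, 46))
```

The recomputed p-value agrees with the reported p = .834, which is well above α = .05.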
Are directional hypotheses possible with chi-square?
A. Yes, but only when you have a 2 × 2 design.
B. Yes, but only when there are 12 or more degrees of freedom.
C. Directional hypotheses are never possible with the chi-squared test.
D. Yes, but only when your sample is greater than 200.
Answer: A. A directional hypothesis is only possible when you are comparing two variables with two levels each. In larger designs you cannot test a directional hypothesis with chi-square and have to use loglinear analysis or goodness-of-fit tests.
Examples of situations where a directional chi-square hypothesis is and is not possible - (5)
If we are just comparing pet preferences between males and females, we can make a directional hypothesis (2 × 2: male/female, cats/dogs).
For example: males prefer cats, or females prefer dogs.
However, when we start adding variables to the design it gets complicated.
If we wanted to compare drink preferences at different times of the day for students/lecturers, we couldn’t form a directional hypothesis.
This is because we have 3 main effects and several interactions to consider. We need to use loglinear analyses to do this.