9 - Contingency Tables Flashcards
(16 cards)
What is a contingency table?
A contingency table displays frequencies for combinations of two variables categorized into groups. It organizes outcomes for one variable in rows and the other variable in columns, with the intersection values representing frequencies for each combination.
What is the chi-squared (χ²) distribution?
The chi-squared distribution is a special case of the gamma distribution, widely used in inferential statistics for tests of association, such as tests of independence and goodness of fit. The test statistic follows this distribution only approximately, and the approximation is most reliable for large samples.
What are degrees of freedom in statistics?
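Degrees of freedom refer to the number of independent pieces of information used to calculate a statistic. For example, a sample variance computed from n observations has n − 1 degrees of freedom, and a contingency table with r rows and c columns has (r − 1)(c − 1) degrees of freedom.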
How is the expected frequency calculated in a contingency table?
The expected frequency for any cell in a contingency table is calculated using the formula: (row total × column total) / overall total. Expected frequencies are the theoretical frequencies assuming independence between the variables.
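The formula on this card can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `expected_frequencies` and the counts are hypothetical, and the table is assumed to be a list of rows.

```python
def expected_frequencies(observed):
    """Return expected counts under independence for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # Each cell: (row total x column total) / overall total
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical counts (not from the source): 3 groups (rows) by 2 outcomes (columns).
observed = [[20, 30], [30, 20], [10, 40]]
expected = expected_frequencies(observed)
```

Here every row total is 50, the column totals are 60 and 90, and the overall total is 150, so each row of `expected` is 20 and 30.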
What does the chi-squared test statistic (χ²) represent?
The chi-squared test statistic measures the difference between observed frequencies (O) and expected frequencies (E) in a contingency table, calculated as χ² = Σ((O − E)² / E). It helps determine if there's a significant association between the two variables.
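As a rough sketch of the calculation, the sum of (O − E)² / E over all cells might be coded as follows. The function name `chi_squared_statistic` and the example counts are assumptions made for illustration.

```python
def chi_squared_statistic(observed, expected):
    """Sum (O - E)^2 / E over every cell of two equally shaped tables."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical 2 x 2 table (not from the source) whose expected counts are all 25.
observed = [[30, 20], [20, 30]]
expected = [[25.0, 25.0], [25.0, 25.0]]
statistic = chi_squared_statistic(observed, expected)
```

Each of the four cells differs from its expected count by 5, so each contributes 25 / 25 = 1 and the statistic is 4.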
What is the null hypothesis in the context of contingency tables?
The null hypothesis posits that there is no association between the two variables being studied, implying that any differences in the observed sample proportions are due to chance. Testing this hypothesis helps determine if the variables are independent.
Why might small expected frequencies be a problem in a chi-squared test?
If the expected frequencies are small (less than 5), the chi-squared distribution is a poor approximation to the distribution of the test statistic, so a critical value taken from it is unreliable. This can often be remedied by combining categories to create larger expected frequencies.
What is the significance level in hypothesis testing?
The significance level is the threshold used to determine whether to reject the null hypothesis. Common levels are 1%, 5%, and 10%, indicating the probability of wrongly rejecting the null hypothesis if it is true.
What role do contingency tables play in hypothesis testing?
Contingency tables help in analyzing and testing the null hypothesis that two variables are independent. By examining observed and expected frequencies, researchers can determine if there is a statistically significant relationship between the variables.
When is the chi-squared test considered valid?
The chi-squared test is valid when the observed frequencies (O) used for the test are counts (not percentages or other forms), and the expected frequencies (E) are sufficiently large, typically at least 5 in every cell of the contingency table.
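One way to check the second condition is to list the cells whose expected frequency falls below the threshold. The sketch below assumes the common rule of thumb of 5. The helper name `small_expected_cells` and the numbers are hypothetical.

```python
def small_expected_cells(expected, minimum=5):
    """Return (row, column, value) for every expected count below the minimum."""
    return [
        (i, j, e)
        for i, row in enumerate(expected)
        for j, e in enumerate(row)
        if e < minimum
    ]


# Hypothetical expected counts (not from the source); one cell is too small.
expected = [[12.0, 8.0], [6.0, 4.0]]
problems = small_expected_cells(expected)
```

An empty result means the rule of thumb is satisfied. Here the list contains the single cell in row 1, column 1 (zero-based), with an expected count of 4.0.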
What should be done if expected frequencies are less than 5?
If any expected frequency is less than 5, groups should be combined to form larger expected values. This ensures that the chi-squared test remains valid and meaningful in evaluating associations between variables.
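Combining groups amounts to adding the counts of two rows or two columns before recomputing the expected frequencies. The following is a minimal sketch for merging two columns. The function name `merge_columns` and the counts are illustrative assumptions, and which categories it makes sense to merge depends on the data.

```python
def merge_columns(observed, first, second):
    """Return a new table in which column `second` is added into column `first`."""
    merged = []
    for row in observed:
        # Drop the column being absorbed, then add its count to the target column.
        new_row = [value for j, value in enumerate(row) if j != second]
        target = first if first < second else first - 1
        new_row[target] += row[second]
        merged.append(new_row)
    return merged


# Hypothetical counts (not from the source): the last column is sparse, and its
# expected counts (30 * 5 / 70 and 40 * 5 / 70) are both below 5.
observed = [[18, 10, 2], [22, 15, 3]]
merged = merge_columns(observed, 1, 2)
```

After merging, the table is 2 × 2 and its smallest expected count is 30 × 30 / 70, about 12.9, so the rule of thumb is met.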
What does the term "null hypothesis" refer to in the context of chi-squared tests?
The null hypothesis (H0) states that there is no association between the two variables being examined, meaning any observed differences in frequencies are due to chance rather than a systematic relationship.
How do researchers interpret the result of a chi-squared test?
Researchers interpret the chi-squared test result by comparing the calculated chi-squared statistic to a critical value determined by the degrees of freedom and significance level. If the statistic exceeds the critical value, the null hypothesis is rejected, indicating some association between the variables.
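The decision rule can be sketched as below. Two things here are not stated on the card and are supplied as assumptions: the standard degrees-of-freedom formula for a contingency table, (rows − 1) × (columns − 1), and a short lookup of upper 5% critical values copied from standard chi-squared tables. The function names are hypothetical.

```python
# Upper 5% critical values of the chi-squared distribution (standard table values).
CRITICAL_VALUES_5_PERCENT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def degrees_of_freedom(n_rows, n_cols):
    """Degrees of freedom for a test of independence on an r x c table."""
    return (n_rows - 1) * (n_cols - 1)


def reject_null(statistic, n_rows, n_cols):
    """Return True if the statistic exceeds the 5% critical value."""
    df = degrees_of_freedom(n_rows, n_cols)
    return statistic > CRITICAL_VALUES_5_PERCENT[df]


# A statistic of 4.0 from a 2 x 2 table (1 degree of freedom) exceeds 3.841.
decision = reject_null(4.0, 2, 2)
```

The same statistic from a 2 × 3 table would not lead to rejection, because with 2 degrees of freedom the critical value rises to 5.991.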
What is an example of a practical application for a chi-squared test?
A practical application includes analyzing survey data to see if the preference for public transport versus road improvement is independent of the area of residence. By using a chi-squared test, researchers can determine if preferences differ significantly across neighborhoods.
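A worked version of this example might look like the sketch below. The survey counts are invented for illustration and do not come from any real study. The p-value shortcut is also an assumption specific to this table: with exactly 2 degrees of freedom, the upper-tail probability of the chi-squared distribution is exp(−x / 2), so no statistical library is needed.

```python
import math

# Hypothetical survey counts (illustrative only, not from the source).
# Rows: areas of residence. Columns: prefers public transport, prefers road improvement.
observed = [[40, 60], [55, 45], [70, 30]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

statistic = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand_total  # expected count under independence
        statistic += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)  # (3 - 1) * (2 - 1) = 2
assert df == 2  # the closed form below holds only for 2 degrees of freedom
p_value = math.exp(-statistic / 2)
reject_independence = p_value < 0.05

print(f"chi-squared = {statistic:.2f}, df = {df}, reject H0: {reject_independence}")
```

With these made-up counts the statistic is about 18.18, well above the 5% critical value of 5.991 for 2 degrees of freedom, so the hypothesis that preference is independent of area would be rejected.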
Describe the formula for the test statistic χ².
The test statistic χ² is calculated using the formula: χ² = Σ((O − E)² / E), where O is the observed frequency for a cell and E is the expected frequency for that cell. This formula assesses the degree of difference between observed and expected outcomes.