categorical data tests Flashcards
chi-square goodness of fit test
- “tests whether data come from a specific categorical (multinomial) distribution”
- The idea behind the chi-square goodness-of-fit test is to see if the sample comes from the population with the claimed distribution. Another way of looking at that is to ask if the frequency distribution fits a specific pattern.
- if the calculated χ² statistic is greater than the critical value from the chi-square table, reject H0
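A minimal sketch of how this test could be run, assuming hypothetical counts for three categories and a claimed 25/25/50 split, using scipy.stats.chisquare (not part of the original card):

```python
from scipy.stats import chisquare

observed = [30, 20, 50]              # hypothetical observed counts (n = 100)
null_props = [0.25, 0.25, 0.50]      # proportions claimed under H0
expected = [sum(observed) * p for p in null_props]

stat, p_value = chisquare(f_obs=observed, f_exp=expected)
print(f"chi-square = {stat:.2f}, p = {p_value:.3f}")
# reject H0 at alpha = 0.05 if p < 0.05 (equivalently, if the statistic
# exceeds the chi-square critical value with df = 3 - 1 = 2)
```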
contingency table
- “a two-dimensional cross-tabulation of frequencies of occurrence for two categorical variables”
- most common example: 2x2 table
- chi-square tests of association/independence and homogeneity, McNemar’s test for paired data, kappa statistic all use contingency tables
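A quick illustration of the idea, assuming made-up gender and housing responses and using pandas.crosstab to build the 2x2 table (names and data are illustrative only):

```python
import pandas as pd

# made-up responses from one random sample of students
df = pd.DataFrame({
    "gender":  ["M", "M", "F", "F", "M", "F", "F", "M"],
    "housing": ["on-campus", "off-campus", "on-campus", "on-campus",
                "on-campus", "off-campus", "on-campus", "off-campus"],
})

table = pd.crosstab(df["gender"], df["housing"])
print(table)   # rows = gender, columns = housing status, cells = frequencies
```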
chi-square test of association or independence (between categorical variables)
- “used when a single random sample from a single population is obtained and two categorical variables are measured on each subject”; tests for an association, or lack thereof, between those two categorical variables
- ex: random sample of college students: is there an association between gender and housing status? is one gender more likely to live on campus than the other?
- H0: there is no association between variables X and Y
H1: there is an association between X and Y
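A minimal sketch of the housing example above, with made-up counts, using scipy.stats.chi2_contingency:

```python
import numpy as np
from scipy.stats import chi2_contingency

# rows = gender, columns = housing status (made-up counts)
table = np.array([[60, 40],    # men:   60 on campus, 40 off campus
                  [75, 25]])   # women: 75 on campus, 25 off campus

stat, p_value, df, expected = chi2_contingency(table)
print(f"chi-square = {stat:.2f}, df = {df}, p = {p_value:.3f}")
# a small p-value (< 0.05) suggests an association between gender and housing
```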
chi-square test of homogeneity (of populations)
- “used when samples are obtained from two populations and a single categorical variable is measured”
- are the populations the same across levels of the categorical variable?
- ex: random sample of men, random sample of women: is the proportion of depression the same in the two populations?
- H0: p1=p2 (population proportions in populations 1 & 2 are the same/no difference)
H1: p1≠p2 (population proportions in populations 1 & 2 are not the same/are different)
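A sketch of the depression example with made-up counts; the arithmetic is the same chi-square computation as the independence test (so scipy.stats.chi2_contingency is used again), only the sampling design differs:

```python
import numpy as np
from scipy.stats import chi2_contingency

# one row per population sample: [depressed, not depressed] (made-up counts)
men = [20, 80]
women = [35, 65]

stat, p_value, df, expected = chi2_contingency(np.array([men, women]))
print(f"chi-square = {stat:.2f}, p = {p_value:.3f}")
# a small p-value suggests the proportion of depression differs between populations
```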
McNemar’s Test (for paired data)
- looks for symmetry within a contingency table of categorical data from paired observations (is the probability of an observation being classified in one cell the same as being classified into another)
- ex: cases matched with controls in case-control studies, before and after data in the same individual
- if the calculated χ² is greater than the critical value from the chi-square table, reject H0
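A sketch of a before/after example with made-up paired counts, using statsmodels' mcnemar function (the discordant cells drive the test):

```python
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

# paired before/after data (made-up counts); rows = before, columns = after
table = np.array([[30,  5],    # before yes: 30 stayed yes, 5 switched to no
                  [15, 50]])   # before no:  15 switched to yes, 50 stayed no

result = mcnemar(table, exact=False, correction=True)   # chi-square version
print(f"chi-square = {result.statistic:.2f}, p = {result.pvalue:.3f}")
# only the discordant cells (5 and 15) drive the test;
# reject H0 if the statistic exceeds the critical value (1 df) or p < alpha
```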
kappa statistic
- if an association between two variables is assumed, the kappa statistic can quantify the degree of agreement between them
- “used in reliability studies to quantify the reproducibility of the same variable measured twice”
- “a function of the observed and expected concordance rates”
K > 0.75: excellent reproducibility
0.4 ≤ K ≤ 0.75: good reproducibility
K < 0.4: marginal reproducibility
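A minimal sketch, assuming two hypothetical ratings of the same subjects and using sklearn's cohen_kappa_score:

```python
from sklearn.metrics import cohen_kappa_score

# the same variable measured twice on the same subjects (made-up labels)
rating_1 = ["yes", "yes", "no", "no", "yes", "no", "yes", "no"]
rating_2 = ["yes", "no",  "no", "no", "yes", "no", "yes", "yes"]

kappa = cohen_kappa_score(rating_1, rating_2)
print(f"kappa = {kappa:.2f}")   # 0.50 here: "good" by the cutoffs on this card
```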
Understand the basic structure of a chi-square test statistic
- squared difference between observed and expected counts, divided by the expected count, summed over all categories: χ² = Σ (O − E)² / E
- the expected counts are derived from the NULL hypothesis: expected count = total sample size × proportion specified in the null
- you calculate the expected counts, then compare them with the observed counts to see how well they agree
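A sketch of the calculation described above, with hypothetical counts and null proportions: expected counts come from the null, then the squared differences are summed.

```python
# expected counts from the null, then sum of (observed - expected)^2 / expected
observed   = [30, 20, 50]          # made-up counts, n = 100
null_props = [0.25, 0.25, 0.50]    # percentages specified in the null

n = sum(observed)
expected = [n * p for p in null_props]      # expected = total sample size * null %
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(expected, round(chi_sq, 2))           # [25.0, 25.0, 50.0] 2.0
```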
What do you do once the χ² test statistic is calculated?
- once χ² is calculated, ask how large it is
- compare the calculated χ² with the value from the chi-square table with df = (# of categories - 1) and α = 0.05 (e.g., 2 df for three categories)
- the table value depends on the df and α (the type I error rate)
- then make your decision: reject or fail to reject H0
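A sketch of the decision step, using scipy's chi2.ppf to look up the critical value instead of a printed table (the statistic and category count are carried over from the earlier sketch and are assumptions):

```python
from scipy.stats import chi2

chi_sq = 2.0               # calculated statistic (from the sketch above)
k = 3                      # number of categories (assumed)
df = k - 1                 # degrees of freedom
alpha = 0.05               # type I error rate

critical = chi2.ppf(1 - alpha, df)    # table value, about 5.99 for 2 df
print(f"critical value = {critical:.2f}")
print("reject H0" if chi_sq > critical else "fail to reject H0")
```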