8. Fitting probability models to frequency data Flashcards
What does the chi squared goodness-of-fit test do?
Compares observed frequency counts with the counts expected under a specified probability distribution
Null hypothesis for chi squared test
The data come from a specified probability distribution
Alternate hypothesis for chi squared test
The data do NOT come from a specified probability distribution.
Test statistic for chi squared test
X^2 = Sum over all categories of (Observed - Expected)^2 / Expected
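A minimal sketch of this calculation in R. The counts and null proportions are made up for illustration, and the variable names (observed, null_props, expected, chi_sq) are assumptions, not part of the original cards.

```r
# Hypothetical observed counts in three categories (made up for illustration)
observed <- c(30, 50, 20)

# Proportions specified by the null hypothesis (also made up)
null_props <- c(0.25, 0.50, 0.25)

# Expected counts under the null: proportion times total sample size
expected <- null_props * sum(observed)

# Sum over all categories of (Observed - Expected)^2 / Expected
chi_sq <- sum((observed - expected)^2 / expected)
chi_sq
```

Here the expected counts are 25, 50 and 25, so the statistic is 25/25 + 0/50 + 25/25 = 2.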
Degrees of freedom Definition
The number of degrees of freedom of a test specifies which distribution within a family (e.g., which chi squared distribution) to use as the null distribution
Degrees of freedom for chi squared equation
df = (Number of Categories) - (Number of Parameters estimated from the data) - 1
P value
Probability of getting the observed value of the test statistic, or something even more extreme, if the null hypothesis is true
What chi squared would be expected for perfect match
0, though an exact match is very unlikely
P value of chi squared in R
pchisq(chi_sq, df = degrees_of_freedom, lower.tail = FALSE)
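A short usage sketch of the call above. The statistic and degrees of freedom are hypothetical values chosen for illustration.

```r
chi_sq <- 2   # hypothetical calculated test statistic
df <- 2       # hypothetical degrees of freedom

# lower.tail = FALSE gives the area to the RIGHT of chi_sq,
# i.e. the probability of a value this large or larger under the null
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)
p_value

# Equivalent form: one minus the area to the left
p_value_alt <- 1 - pchisq(chi_sq, df = df)
```

lower.tail = FALSE matters because the default (lower.tail = TRUE) returns the area to the left of the statistic, which is not the P-value.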
P value of chi squared using statistical tables
Table tells what the critical value would be for a given degrees of freedom and given alpha
If you drew a line on the chi squared distribution at the critical value for alpha = 0.05, then if the null hypothesis were true there would be a 5% chance that the calculated value falls to the right of the line and a 95% chance that it falls to the left
Find critical value
If the calculated value is equal to or greater than the critical value, can reject the null hypothesis
Critical value
the value of the test statistic where P = alpha
The point at which you would start to reject the null hypothesis
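As an alternative to a printed table, the critical value can be looked up in R with qchisq. This is a sketch; alpha and df below are example values, not taken from the cards.

```r
alpha <- 0.05   # example significance level
df <- 2         # example degrees of freedom

# Value of the test statistic with area alpha to its right
critical <- qchisq(alpha, df = df, lower.tail = FALSE)
critical

# Check: the P-value at the critical value is alpha itself
p_at_critical <- pchisq(critical, df = df, lower.tail = FALSE)
```

For these example values the critical value is about 5.99.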
test statistic
A number calculated from the data and the null hypothesis that can be compared to a standard distribution to find the P-value of the test
relationship between chi squared and binomial test
The chi squared goodness-of-fit test can be used as an approximation of the binomial test when there are only two categories; it is especially useful when there are LOTS of data points
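A sketch comparing the two tests on the same two-category data. The counts (60 successes out of 100) and the null proportion of 0.5 are hypothetical.

```r
successes <- 60   # hypothetical number of "successes"
n <- 100          # hypothetical number of trials
p0 <- 0.5         # proportion under the null hypothesis

# Exact binomial test
exact <- binom.test(successes, n, p = p0)

# Chi squared goodness-of-fit test on the same two categories
approx <- chisq.test(c(successes, n - successes), p = c(p0, 1 - p0))

exact$p.value
approx$p.value
approx$statistic   # (60 - 50)^2 / 50 + (40 - 50)^2 / 50 = 4
approx$parameter   # df = 2 categories - 0 estimated parameters - 1 = 1
```

The two P-values will generally be close but not identical, since the chi squared test is only an approximation.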
Assumptions of the chi squared test
No more than 20% of categories have EXPECTED < 5
NO categories have EXPECTED less than or equal to 1
The approximation used in the chi squared test doesn’t work when expected values are too small
These rules are about EXPECTED counts, not observed counts
If expected values are JUST over 5, it is probably better to do a binomial test instead
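The two rules above can be checked mechanically. The helper below, check_expected, is a hypothetical function written for illustration (not a built-in R function), and the expected counts passed to it are made up.

```r
# Hypothetical helper: checks the two rules on a vector of EXPECTED counts
check_expected <- function(expected) {
  frac_below_5 <- mean(expected < 5)          # share of categories with expected < 5
  any_at_or_below_1 <- any(expected <= 1)     # any category with expected <= 1?
  list(
    frac_below_5 = frac_below_5,
    any_at_or_below_1 = any_at_or_below_1,
    ok = frac_below_5 <= 0.20 && !any_at_or_below_1
  )
}

# Made-up expected counts that violate both rules
bad <- check_expected(c(12.4, 8.1, 6.3, 4.2, 0.9))

# Made-up expected counts that satisfy both rules
good <- check_expected(c(20, 15, 10, 6, 4.5))
```

If ok is FALSE, the usual remedy is to combine neighboring categories and check again.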
Discrete distribution
probability distribution describing a discrete numerical random variable
Poisson distribution
describes the probability that a certain number of events occur in a block of time or space, when those events happen independently of each other and occur with equal probability at every point in time or space
i.e., how many times something happens per unit of time or space
ex. divide a field into square-meter plots and count the flowers within each
Assumptions of Poisson distribution
Events must happen independently of each other
Events must occur with equal probability at every point in time or space
In what ways could the Poisson distribution be interesting?
To check whether events are occurring independently and with equal probability at every point in time or space
Poisson distribution formula
Pr[X] = (e^(-μ) * μ^X) / X!
X is the number of events; it can be any whole number from 0 to infinity
μ (mu) is the mean number of events per unit of time or space
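A sketch of the formula in R, checked against the built-in dpois function. The mean (mu = 2.5) is a made-up example value.

```r
mu <- 2.5   # hypothetical mean number of events per unit
x <- 0:5    # numbers of events to evaluate

# Formula written out by hand: e^(-mu) * mu^X / X!
manual <- exp(-mu) * mu^x / factorial(x)

# Built-in Poisson probability function (lambda is R's name for the mean)
builtin <- dpois(x, lambda = mu)

all.equal(manual, builtin)
```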
How to account for high values in the Poisson distribution, when X could be very large or infinite
In practice you rarely need the high values individually: calculate the probabilities of the specific smaller values precisely, then get the probability of any greater value as 1 - (sum of the probabilities of the smaller values)
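A sketch of this shortcut in R, using a made-up mean of 2.5 and treating "6 or more events" as the open-ended top category.

```r
mu <- 2.5   # hypothetical mean number of events per unit

# Probabilities of the specific values 0 through 5
p_0_to_5 <- dpois(0:5, lambda = mu)

# Probability of 6 or more events: 1 minus the probability of the smaller values
p_6_or_more <- 1 - sum(p_0_to_5)

# Same quantity from the built-in upper-tail function
p_tail_builtin <- ppois(5, lambda = mu, lower.tail = FALSE)
```

Adding p_6_or_more to the other probabilities makes the categories sum to 1, which is what the chi squared calculation needs.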
Process of testing the fit of a Poisson distribution with chi squared
- Determine the mean number of events in given time/space
ex. for number of goals per team per game in world cup,
x bar = Sum of (# of goals * # of times that number of goals was scored) / (number of games played * 2), because each game has two teams
- Calculate Pr[X] for each value you’re interested in
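- Use Pr[X] to find the expected count for each category (Pr[X] * total number of observations)
- Check whether too many expected counts are less than 5, or any are less than or equal to 1. If so, combine categories
- Calculate (Observed - Expected)^2 / Expected for each category, then sum to get the chi squared value
- Determine number of degrees of freedom (df = (number of categories) - (number of parameters estimated from data) - 1)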
- Identify critical value
- Compare chi squared to the critical value; if chi squared is equal to or GREATER than the critical value, can reject the null hypothesis
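The steps above can be sketched end to end in R. The data are entirely hypothetical (100 sampling units with 0 to 5 events each), and all variable names are assumptions made for illustration.

```r
# Hypothetical data: number of units observed with 0, 1, ..., 5 events
events <- 0:5
observed <- c(20, 30, 25, 15, 7, 3)
n <- sum(observed)

# Mean number of events per unit, estimated from the data
mu_hat <- sum(events * observed) / n

# Pr[X] for 0 to 4 events, plus "5 or more" as 1 minus the rest
probs <- dpois(0:4, lambda = mu_hat)
probs <- c(probs, 1 - sum(probs))

# Expected count for each category
expected <- probs * n

# Assumption check on the EXPECTED counts (combine categories if violated)
frac_below_5 <- mean(expected < 5)
any_at_or_below_1 <- any(expected <= 1)

# Chi squared: sum of the per-category contributions
chi_sq <- sum((observed - expected)^2 / expected)

# df = categories - parameters estimated from the data (here: the mean) - 1
df <- length(observed) - 1 - 1

# Critical value at alpha = 0.05, P-value, and decision
critical <- qchisq(0.05, df = df, lower.tail = FALSE)
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)
reject_null <- chi_sq >= critical
```

One parameter (the mean) is estimated from the data, so with six categories df = 6 - 1 - 1 = 4.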