Statistical distributions + Hypothesis testing Flashcards
(27 cards)
What is a random variable
A variable whose outcome depends on a RANDOM event - The outcome isn’t known until the experiment is carried out.
A variable can take on a range of specific values
What is a sample space
The set of all possible values a random variable can take
What makes a variable discrete
If it can only take on specific numerical values
What does a probability distribution do
Fully describes the probability of any outcome in the sample space
What is a discrete uniform distribution
When all the probabilities of a discrete random variable in a sample space are the same
Probability mass function definition
A function that gives the probability that a discrete random variable is exactly equal to some value. Sometimes it is also known as the discrete probability density function/frequency function
Sum of probabilities for a random variable X
ΣP(X = x) = 1, where the sum is taken over every value x (little x) in the sample space
When can you model a random variable X with a binomial distribution B(n,p), where X is the number of successful trials
- If there are a fixed number of trials n
- There are two possible outcomes (success and failure)
- There is a fixed probability of success (p)
- TRIALS ARE INDEPENDENT OF EACH OTHER (always specify this in assumptions)
P(X=r) = nCr(p^r)(1-p)^(n-r)
where n is the index and p is the parameter
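As an illustration, the formula above can be evaluated directly in Python. This is a minimal sketch: the function name binomial_pmf and the values n = 10, p = 0.4 are chosen for the example and are not part of the card.

```python
from math import comb

def binomial_pmf(r: int, n: int, p: float) -> float:
    """P(X = r) for X ~ B(n, p): nCr * p^r * (1 - p)^(n - r)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)

# Illustrative values: 10 trials, probability of success 0.4
probabilities = [binomial_pmf(r, 10, 0.4) for r in range(11)]

print(round(binomial_pmf(3, 10, 0.4), 4))  # P(X = 3)
print(round(sum(probabilities), 10))       # probabilities over all r sum to 1
```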
What does a cumulative probability function for a random variable X tell us
P(X ≤ x): the sum of the probabilities of every value up to and including the given value of x. For the binomial distribution, these are tabulated for various values of n and p
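A cumulative binomial probability can be sketched as a running sum of the individual probabilities. The function name binomial_cdf and the example values below are assumptions made for illustration.

```python
from math import comb

def binomial_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x) for X ~ B(n, p): add up P(X = r) for r = 0, 1, ..., x."""
    return sum(comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(x + 1))

# Illustrative values: X ~ B(10, 0.4)
print(f"P(X <= 1) = {binomial_cdf(1, 10, 0.4):.4f}")
print(f"P(X <= 10) = {binomial_cdf(10, 10, 0.4):.4f}")  # whole sample space
```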
How to carry out a hypothesis test
- Assume null hypothesis H₀ is true
- Consider how likely it is, under H₀, for the test statistic to take the observed value or a more extreme one
- If this probability is less than a given threshold (the significance level), reject the null hypothesis
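These steps can be sketched for a one-tailed binomial test. The helper one_tailed_test and the data (1 success in 10 trials, H₀: p = 0.4, 5% level) are hypothetical, and the probability used is that of a result at least as extreme as the one observed.

```python
from math import comb

def pmf(r, n, p):
    """P(X = r) for X ~ B(n, p)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)

def one_tailed_test(observed, n, p0, alpha, tail):
    """Return (probability, reject) for a one-tailed binomial test.

    tail="lower" tests H1: p < p0 using P(X <= observed);
    tail="upper" tests H1: p > p0 using P(X >= observed).
    """
    if tail == "lower":
        values = range(0, observed + 1)
    elif tail == "upper":
        values = range(observed, n + 1)
    else:
        raise ValueError("tail must be 'lower' or 'upper'")
    # Assume H0 is true, so X ~ B(n, p0), then find the probability of a
    # result at least as extreme as the observed one.
    probability = sum(pmf(r, n, p0) for r in values)
    # Reject H0 only if that probability is below the significance level.
    return probability, probability < alpha

# Hypothetical data: 1 success in 10 trials, H0: p = 0.4, H1: p < 0.4
probability, reject = one_tailed_test(1, 10, 0.4, 0.05, "lower")
print(f"P(X <= 1) = {probability:.4f}, reject H0: {reject}")
```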
What is a hypothesis (elaborate on key words used in definition)
A statement made about a population parameter (a number that describes something about an entire group/population)
How to test a hypothesis
Taking a sample or carrying out an experiment on the population
What is the test statistic
The result of the experiment carried out on/statistic calculated from the sample
Note that the test statistic is modelled by a DISTRIBUTION, e.g. X~B(10, p), where p isn’t known until we start making assumptions
For a hypothesis test involving the binomial distribution, the test statistic is always the number of successes
Null and alternative hypotheses definitions
Null - Hypothesis assumed to be true
Alternative - Tells you about the parameter if the assumption is shown to be wrong
Two tailed vs one tailed tests
One tailed - H₁:p<… or H₁:p>…
Two tailed - H₁:p ≠…
Critical region definition
Region of the probability distribution such that, if the test statistic falls within it, the null hypothesis is rejected
Formally writing out hypothesis test
A test statistic is modelled as B(10, p), and a hypothesis test at the 5% significance level uses H₀: p = 0.4, H₁: p < 0.4
Assuming null to be true, X has distribution X~B(10,0.4)
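Continuing this example, the critical region and actual significance level can be found numerically. This is a sketch under the card's values (n = 10, p = 0.4, 5% level); the variable names are illustrative, and the code assumes at least one value satisfies the condition, which holds here.

```python
from math import comb

n, p0, alpha = 10, 0.4, 0.05  # from the card: X ~ B(10, 0.4), 5% level

def cdf(x):
    """P(X <= x) under H0, i.e. for X ~ B(n, p0)."""
    return sum(comb(n, r) * p0**r * (1 - p0) ** (n - r) for r in range(x + 1))

# Lower-tailed test (H1: p < 0.4): keep every x with P(X <= x) < alpha,
# then take the largest such x as the critical value.
candidates = [x for x in range(n + 1) if cdf(x) < alpha]
critical_value = max(candidates)
actual_significance = cdf(critical_value)

print(f"Critical region: X <= {critical_value}")
print(f"Actual significance level: {actual_significance:.4f}")
```

With these values the critical region is X ≤ 1, and the actual significance level is about 0.0464, slightly below the 5% target.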
Critical value
First value to fall within critical region
Actual significance level
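Probability of incorrectly rejecting the null hypothesis, i.e. the actual probability of the test statistic falling within the critical region when H₀ is true. A result in the critical region only suggests that there is evidence that the null hypothesis is incorrect; it does not prove it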
Calculating critical region w/o thought
For an upper-tailed test, find the smallest value r for which P(X ≥ r) < significance level
The critical region is then r and everything greater than r (X ≥ r)
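The search for the critical value can be sketched for an upper-tailed test. The function names and the example values (X~B(10, 0.4) at the 5% level) are assumptions for illustration. Note that the critical value r itself lies inside the critical region.

```python
from math import comb

def upper_tail(r, n, p):
    """P(X >= r) for X ~ B(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(r, n + 1))

def upper_critical_value(n, p, alpha):
    """Smallest r with P(X >= r) < alpha, or None if no such r exists."""
    for r in range(n + 1):
        if upper_tail(r, n, p) < alpha:
            return r
    return None

# Illustrative values: X ~ B(10, 0.4), 5% significance level
r = upper_critical_value(10, 0.4, 0.05)
print(f"Critical region: X >= {r}")
print(f"P(X >= {r}) = {upper_tail(r, 10, 0.4):.4f}")
```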
What is a hypothesis test
A hypothesis test uses a sample or an experiment to determine whether or not to reject the null hypothesis.
Critical value/acceptance region
The critical value is the first value to fall inside of the critical region.
The acceptance region is the region where we accept (i.e. do not reject) the null hypothesis
Difference between r and ρ
r is the PMCC (product moment correlation coefficient) for a sample; ρ is the PMCC for a population
Assumptions made in hyp test using pmcc
The assumption that the population has a bivariate normal distribution.