Advanced Analysis and Hypothesis Tests Flashcards
(172 cards)
What is a t-distribution?
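A family of bell-shaped curves similar to the standard Normal distribution but with heavier tails. The shape depends on the degrees of freedom, and the curve approaches the standard Normal as the degrees of freedom increase.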
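To see how the curve depends on the degrees of freedom, the Python sketch below evaluates the t density from its standard closed-form expression and compares one tail value with the standard Normal. The function name `t_pdf` and the chosen degrees of freedom are illustrative assumptions, not part of the original cards.

```python
import math
from statistics import NormalDist


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t-distribution with `df` degrees of freedom."""
    # lgamma avoids overflow of the gamma function for large df.
    log_norm = (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
    )
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


x = 3.0  # a point in the tail
for df in (2, 5, 30, 1000):  # illustrative choices
    print(f"df={df:>4}: t density at {x} = {t_pdf(x, df):.5f}")
print(f"Normal : density at {x} = {NormalDist().pdf(x):.5f}")
```

With few degrees of freedom the tail density is noticeably larger than the Normal value, and the gap shrinks as the degrees of freedom grow.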
What is hypothesis testing?
Using data to “weigh up the evidence” and using that evidence to decide whether to reject a pre-defined statement (the null hypothesis)
What are the five steps of hypothesis testing?
- State the null hypothesis
- Calculate the appropriate test statistic
- Obtain a P value for the test statistic
- Make the decision whether to reject the null hypothesis based on P value
- State the conclusion in terms of the original research question
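The five steps can be sketched in Python for one simple case: a two-sided test of a sample mean against a hypothesised value, using a Normal approximation. Every number below (the hypothesised mean, the observed mean, the standard error and the 0.05 threshold) is a hypothetical value chosen for illustration.

```python
from statistics import NormalDist

# Step 1: state the null hypothesis (hypothetical example: population mean = 120).
hypothesised_mean = 120.0

# Hypothetical sample summary, assumed for illustration only.
observed_mean = 126.0
standard_error = 2.5

# Step 2: calculate the test statistic.
z = (observed_mean - hypothesised_mean) / standard_error

# Step 3: obtain a two-sided P value from the standard Normal distribution.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# Step 4: decide whether to reject the null hypothesis (0.05 is a conventional threshold).
alpha = 0.05
reject_null = p_value < alpha

# Step 5: state the conclusion in terms of the research question.
if reject_null:
    conclusion = "Evidence that the population mean differs from 120."
else:
    conclusion = "Insufficient evidence that the population mean differs from 120."

print(f"z = {z:.2f}, P = {p_value:.4f}: {conclusion}")
```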
What is the null hypothesis?
- A statement about the value of a population parameter or the difference between groups
- Usually the negation of the research hypothesis
- Usually “the effect/association of interest is zero”
What is the alternative hypothesis?
- Opposite of the null hypothesis
- Usually related to the research question
How do we calculate the test statistic?
Test statistic = (observed value - hypothesised value) / standard error
What is the relationship between the test statistic and the null hypothesis?
The larger the test statistic in absolute value (whether positive or negative), the more evidence there is against the null hypothesis. The value of the test statistic is used to decide whether to reject the null hypothesis.
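As a rough illustration of this relationship, the sketch below converts several test statistic values into two-sided P values, assuming the test statistic follows a standard Normal distribution under the null hypothesis. The helper name `two_sided_p` and the listed values are illustrative.

```python
from statistics import NormalDist


def two_sided_p(z: float) -> float:
    """Two-sided P value for a test statistic assumed standard Normal under the null."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


for z in (0.5, -1.0, 1.96, -2.58, 3.29):
    print(f"test statistic = {z:>5}: P = {two_sided_p(z):.4f}")
```

The sign of the test statistic does not matter here, only its distance from zero: values further from zero give smaller P values.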
What is the goal of estimation?
We want to estimate the population parameter based on the sample statistic.
• The sample must therefore be representative of the population
How is estimation different from hypothesis testing?
Hypothesis testing uses the data to ‘weigh up the evidence’ and decide whether or not to reject a pre-specified statement (the null hypothesis). Estimation gives a ‘best estimate’ of the population value along with a range of likely values (a confidence interval).
What is the definition of a population parameter?
A measurable characteristic of the population (e.g. mean = μ, proportion = π, standard deviation = σ). Values obtained from a sample are estimates of the population parameters.
What are sample statistics?
Sample statistics are estimates of results that would have been obtained had the whole population been studied
What are the two different kinds of estimation?
Point estimation and interval estimation
What is a confidence interval?
A range of values within which we are confident, at a stated level such as 95%, that the true population value lies. It quantifies uncertainty and indicates the precision of our sample statistic.
What is a point estimate?
A single value used as the best estimate of a population parameter, for example a sample mean. It is just one value and doesn’t take into account that this value would change from sample to sample.
When does the width of the CI increase?
When:
- the sample size is small
- there is a lot of variability in the data
- the level of confidence (e.g. 99%) increases
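The three factors above can be checked numerically. This is a minimal sketch assuming a Normal-based interval for a mean (estimate ± multiplier × standard error); the function name `ci_width` and the input values are hypothetical.

```python
import math
from statistics import NormalDist


def ci_width(sd: float, n: int, confidence: float = 0.95) -> float:
    """Width of a Normal-based confidence interval for a mean."""
    # Multiplier for the chosen confidence level (about 1.96 for 95%).
    multiplier = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = sd / math.sqrt(n)
    return 2 * multiplier * standard_error


baseline = ci_width(sd=10, n=100, confidence=0.95)
print(f"baseline (sd=10, n=100, 95%): {baseline:.2f}")
print(f"smaller sample (n=25):        {ci_width(sd=10, n=25):.2f}")
print(f"more variability (sd=20):     {ci_width(sd=20, n=100):.2f}")
print(f"higher confidence (99%):      {ci_width(sd=10, n=100, confidence=0.99):.2f}")
```

Each of the three changes produces a wider interval than the baseline.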
When do we use the t-distribution?
When the sample size is small (say, under 30) and the population standard deviation is unknown, so it has to be estimated from the sample
What is the formula for the t-statistic?
t = (x̄ - μ) / (s/√n), where:
- x̄ is the sample mean
- μ is the (hypothesised) population mean
- s is the sample standard deviation
- n is the size of the given sample
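The formula can be computed directly from raw data. The sketch below assumes a one-sample setting; the function name `t_statistic`, the measurements and the hypothesised mean are made up for illustration.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def t_statistic(sample: Sequence[float], mu: float) -> float:
    """One-sample t statistic: (sample mean - mu) / (s / sqrt(n))."""
    n = len(sample)
    x_bar = mean(sample)
    s = stdev(sample)  # sample standard deviation (n - 1 in the denominator)
    return (x_bar - mu) / (s / math.sqrt(n))


# Hypothetical measurements and hypothesised mean, for illustration only.
sample = [5.1, 4.9, 5.6, 5.8, 5.2, 5.4]
t = t_statistic(sample, mu=5.0)
print(f"t = {t:.3f} on {len(sample) - 1} degrees of freedom")
```

The result is compared with a t-distribution on n - 1 degrees of freedom.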
What impacts the width of a CI?
- Precision of the estimate (standard error, SE)
- Level of confidence (multiplier)
What are poor and high precision and how do they relate to the concept of a CI?
• Poor precision (large SE): wide interval
• High precision (small SE): narrow interval
• As sample size increases, the standard error (SE) decreases, which leads to greater precision and narrower intervals
The higher the confidence level, the…
…wider the interval
The narrower the interval, the…
…lower the confidence
Can you use a CI for a proportion?
Yes, approximately. Binomial proportions do not follow a Normal distribution, but:
• If the sample size is greater than 30 and 0.1 < p < 0.9, we can use our standard formula for the confidence interval: p ± 1.96 × SE(p), where SE(p) = √(p(1 - p)/n)
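A minimal sketch of this calculation, assuming the usual standard error of a proportion, SE(p) = √(p(1 - p)/n), and applying the rule of thumb from the card as a guard. The function name `proportion_ci` and the counts are hypothetical.

```python
import math


def proportion_ci(successes: int, n: int, multiplier: float = 1.96):
    """Approximate 95% confidence interval for a proportion: p ± 1.96 × SE(p)."""
    p = successes / n
    # Rule of thumb from the card: n > 30 and 0.1 < p < 0.9.
    if not (n > 30 and 0.1 < p < 0.9):
        raise ValueError("Normal approximation not recommended for this n and p")
    se = math.sqrt(p * (1 - p) / n)  # standard error of a proportion
    return p - multiplier * se, p + multiplier * se


# Hypothetical data: 40 of 100 participants have the characteristic.
lower, upper = proportion_ci(40, 100)
print(f"p = 0.40, 95% CI: {lower:.3f} to {upper:.3f}")
```

Raising an error outside the rule of thumb is a design choice for this sketch; other interval methods exist for small samples or extreme proportions.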
What is the chi-squared test?
The chi-squared test of association (for categorical data) compares two attributes in a sample of data to determine whether there is any relationship between them.
What would be the null hypothesis in the context of using the chi-squared test?
H₀: there is no association between the two attributes under investigation
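As a hedged illustration, the sketch below computes the chi-squared statistic for a 2x2 table by comparing observed counts with the counts expected under no association, without a continuity correction. The function name `chi_squared_2x2` and the counts are hypothetical, and the P value shortcut applies only to 1 degree of freedom.

```python
import math


def chi_squared_2x2(table):
    """Chi-squared statistic and P value (1 degree of freedom) for a 2x2 table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two attributes were not associated.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    # For 1 degree of freedom, P(chi-squared > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: rows = exposed / unexposed, columns = outcome yes / no.
table = [[30, 20], [15, 35]]
statistic, p_value = chi_squared_2x2(table)
print(f"chi-squared = {statistic:.2f}, P = {p_value:.4f}")
```

A small P value is evidence against the null hypothesis of no association between the row and column attributes.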