WEEK 7 Flashcards
(15 cards)
Difference between descriptive statistics and statistical inference
Descriptive statistics - THINK what does my data say?
- describes and summarises the main features of a dataset
- focuses on the data you have, no assumptions about the bigger population
- uses measures of central tendency (mean, median, mode) and variation (standard deviation, range)
- examples include bar charts, histograms and pie charts
Statistical inference - THINK what can I deduce about the bigger picture?
- focuses on drawing conclusions or making predictions about an entire population from a sample
- examples include hypothesis testing (p-values etc), confidence intervals, regression analysis
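A minimal sketch of the descriptive side, using Python's standard `statistics` module. The `scores` list is made-up data for illustration only.

```python
import statistics

# Hypothetical sample of test scores (invented for this example).
scores = [62, 75, 75, 81, 68, 90, 75, 71]

# Central tendency
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Variation
stdev = statistics.stdev(scores)          # sample standard deviation (n - 1 denominator)
data_range = max(scores) - min(scores)

print(f"mean={mean}, median={median}, mode={mode}")
print(f"stdev={stdev:.2f}, range={data_range}")
```

Everything here describes only the eight scores in hand; nothing is claimed about a wider population.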
Point estimation
Using a single value (a “point”) as an estimate for an unknown population parameter.
Example:
You collect a sample of students and find their average test score is 75.
You use 75 as a point estimate for the population mean.
Interval estimation (confidence interval)
Using a range of values (an interval) that is likely to contain the population parameter.
Example:
“I am 95% confident that the true average score is between 72 and 78.”
Hypothesis testing
A formal method to test a claim or assumption about a population parameter.
- Set up a null hypothesis (H₀) and an alternative hypothesis (H₁)
- Use sample data to calculate a test statistic
- Decide to reject or not reject H₀, often using a p-value
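The three steps can be sketched as a two-sided one-sample z-test. This assumes the population standard deviation is known; the function name `one_sample_z_test` and all the numbers (H₀: μ = 70, σ = 15, n = 100) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-sided one-sample z-test. Returns (z, p_value, reject_h0)."""
    # Test statistic: how many standard errors the sample mean is from mu0.
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Reject H0 when the p-value is below the significance level.
    return z, p_value, p_value < alpha


# H0: mu = 70 versus H1: mu != 70 (illustrative values only).
z, p, reject = one_sample_z_test(sample_mean=75, mu0=70, sigma=15, n=100)
print(f"z={z:.2f}, p={p:.5f}, reject H0: {reject}")
```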
3 main types of statistical inference
- point estimation
- interval estimation (confidence interval)
- hypothesis testing
What is the sampling distribution
the probability distribution of a statistic (e.g. the sample mean) obtained by repeatedly drawing samples of the same size from a population
How do you find the mean and the standard error of the sampling distribution of x̄
μx̄ = μ
this means the mean of the sample means is equal to the population mean.
σx̄ = σ/√n
where:
σ = population standard deviation
n = sample size
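A small simulation can check both formulas. This is a sketch under assumed settings: a uniform population on [0, 1] (so μ = 0.5 and σ = √(1/12)), samples of size 25, and 5000 repetitions, all chosen only for illustration.

```python
import random
import statistics
from math import sqrt

random.seed(0)  # fixed seed so the run is repeatable

n = 25              # size of each sample
num_samples = 5000  # number of repeated samples

# Assumed population: uniform on [0, 1].
mu = 0.5
sigma = sqrt(1 / 12)

# Draw many samples and record the mean of each one.
sample_means = [
    statistics.fmean([random.random() for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)  # should be close to mu
observed_se = statistics.stdev(sample_means)    # should be close to sigma / sqrt(n)
theoretical_se = sigma / sqrt(n)

print(f"mean of sample means: {mean_of_means:.4f} (mu = {mu})")
print(f"observed SE: {observed_se:.4f}, sigma/sqrt(n): {theoretical_se:.4f}")
```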
What is the central limit theorem
the sampling distribution of the mean is approximately normal when the sample size is large enough (commonly n ≥ 30), whatever the shape of the population distribution, provided the population has finite variance
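One way to see the theorem at work is to sample from a clearly non-normal population and check that the sample means still behave like a normal distribution. The sketch below assumes an exponential population with rate 1 (mean 1, standard deviation 1), n = 50, and 5000 repetitions; these are illustrative choices.

```python
import random
import statistics
from math import sqrt

random.seed(1)  # fixed seed so the run is repeatable

n = 50
num_samples = 5000
mu = 1.0     # mean of an exponential distribution with rate 1
sigma = 1.0  # its standard deviation

se = sigma / sqrt(n)

# Means of repeated samples from a strongly skewed population.
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

# For a normal distribution, about 95% of values lie within 1.96 SE of the mean.
within = sum(abs(m - mu) <= 1.96 * se for m in sample_means) / num_samples
print(f"share of sample means within 1.96 SE of mu: {within:.3f}")
```

If the share comes out near 0.95, the sample means are behaving roughly normally even though individual observations are skewed.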
What are confidence intervals
a range of values that is likely to contain the true population parameter with a certain level of confidence
How to write confidence interval
typically written as (lower bound, upper bound), with a specified confidence level (e.g., 95%)
Assumptions for confidence intervals?
- the data should be a random sample from the population
- the measured quantity should be normally distributed (or the sample large enough for the central limit theorem to apply)
What is the margin of error
a statistic expressing the amount of random sampling error in the results of a survey
MOE = Z * (σ / √n)
where Z is the critical value for the chosen confidence level (e.g. 1.96 for 95%)
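The formula, and the confidence interval built from it, can be sketched as follows. This assumes σ is known; the helper names `margin_of_error` and `confidence_interval` and the values (mean 75, σ = 15, n = 100) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(sigma, n, confidence=0.95):
    """MOE = Z * (sigma / sqrt(n)), with Z taken from the standard normal."""
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sigma / sqrt(n)


def confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Return (lower bound, upper bound) = sample mean -/+ MOE."""
    moe = margin_of_error(sigma, n, confidence)
    return sample_mean - moe, sample_mean + moe


lower, upper = confidence_interval(sample_mean=75, sigma=15, n=100)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")  # prints 95% CI: (72.06, 77.94)
```

Because n sits under a square root, quadrupling the sample size only halves the margin of error.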
What is significance level
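The probability of rejecting a null hypothesis that is actually true (Type I error); H₀ is rejected when the p-value falls below this threshold
Often represented by α (alpha)
Common values are 0.05 (5%) and 0.01 (1%)
e.g. 0.05 means that, if H₀ is true, there is a 5% chance of rejecting it incorrectly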
How to choose between using a z-test or a t-test
Use a z-test if the population standard deviation is known and the sample size is large (n ≥ 30).
Use a t-test if the population standard deviation is unknown and/or the sample size is small (n < 30).
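The rule of thumb on this card can be written as a tiny decision helper. The function name `choose_test` is hypothetical, and the n ≥ 30 cutoff is the card's convention rather than a hard mathematical boundary.

```python
def choose_test(sigma_known: bool, n: int) -> str:
    """Pick a test following the rule of thumb stated on this card."""
    # z-test: population standard deviation known AND large sample.
    if sigma_known and n >= 30:
        return "z-test"
    # Otherwise (sigma unknown and/or small sample): t-test.
    return "t-test"


print(choose_test(sigma_known=True, n=100))   # z-test
print(choose_test(sigma_known=False, n=100))  # t-test
print(choose_test(sigma_known=True, n=12))    # t-test
```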
What is a p-value
the probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is true