Central Limit Theorem and Hypothesis Testing Flashcards
(24 cards)
t-distribution
a probability distribution used to estimate population parameters when the sample size is small and the population standard deviation is unknown
the t-distribution resembles a normal distribution but has heavier tails. What does this mean?
it’s more prone to producing values far from its mean (allows for more variability and accounts for the uncertainty due to smaller sample sizes)
the shape of the t-distribution depends on the _____, which is dependent on ____
degrees of freedom, sample size
when do you use a t-distribution over a z-distribution?
- when the sample size is small (rule of thumb: less than 30) and the population standard deviation is unknown
- for hypothesis testing to compare means and compute confidence intervals
what are the three types of t-tests and what do they do?
- one-sample: determines if the sample mean is significantly different from a known value
- two-sample: compares the means of two independent groups
- paired: compares the means of two related groups
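A minimal sketch of two of the test statistics listed above, using only the Python standard library. The paired statistic is computed as a one-sample t-score on the pairwise differences. For the two-sample case the sketch uses the Welch (unequal-variance) form, which is an assumption here since the card does not say which variant is meant. The function names and data are hypothetical.

```python
import math
import statistics

def paired_t(before, after):
    """Paired t-statistic: a one-sample t-score on the differences, tested against 0."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))

def welch_t(x, y):
    """Two-sample t-statistic for independent groups (Welch form, assumed here)."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return (statistics.mean(x) - statistics.mean(y)) / se

# Hypothetical measurements
t_paired = paired_t(before=[10, 12, 14, 16], after=[12, 13, 17, 18])
t_welch = welch_t([2, 4, 6, 8], [1, 2, 3, 4])
```

Turning either statistic into a p-value requires the t-distribution's CDF, which the standard library does not provide.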
how do you calculate a t-score (formula)
t = (Xbar - μ)/(s/√n)
Xbar is sample mean
μ is population mean
s is the sample SD
n is sample size
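The formula above can be sketched in a few lines of Python. The function name and the data are hypothetical; `statistics.stdev` is used because it returns the sample SD (n - 1 in the denominator).

```python
import math
import statistics

def t_score(sample, mu):
    """One-sample t-score: (Xbar - mu) / (s / sqrt(n))."""
    n = len(sample)
    xbar = statistics.mean(sample)   # sample mean
    s = statistics.stdev(sample)     # sample SD
    return (xbar - mu) / (s / math.sqrt(n))

# Hypothetical sample tested against a population mean of 3
t = t_score([2, 4, 6, 8], mu=3)
print(round(t, 4))
```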
what is the chi-squared distribution?
the distribution of the sum of the squares of k independent standard normal random variables (variables with a mean of 0 and a standard deviation of 1); k is its degrees of freedom
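As a rough illustration of this definition, the sketch below draws chi-squared values by summing k squared standard normal draws. The seed, k = 3, and the number of draws are arbitrary choices. The average of the draws should land near k, since a chi-squared distribution with k degrees of freedom has mean k.

```python
import random
import statistics

def chi_squared_draw(k, rng):
    """One draw: the sum of k squared standard normal values."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))

rng = random.Random(0)   # fixed seed so the run is repeatable
k = 3
draws = [chi_squared_draw(k, rng) for _ in range(20000)]
mean_of_draws = statistics.mean(draws)   # expected to be close to k
```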
what is the chi-square statistic?
a value used in chi-squared tests to measure how much observed data deviate from what we would expect under a particular hypothesis
formula for chi-squared statistic
χ^2 = ∑ ((Oi - Ei)^2 / Ei)
Oi = observed freq for category i
Ei = expected freq for category i
the sum ∑ is taken over all categories i
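A small sketch of this calculation in Python, with hypothetical observed and expected counts for three categories:

```python
def chi_square_statistic(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts: 60 observations, 20 expected in each category
observed = [18, 22, 20]
expected = [20, 20, 20]
stat = chi_square_statistic(observed, expected)
```

A statistic of 0 means the observed counts match the expected counts exactly; larger values mean larger deviations.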
what are common applications of chi-squared distribution?
- goodness of fit test
- test of independence
- variance estimation
- ANOVA and regression
chi-squared goodness of fit test
determines whether an observed frequency distribution matches an expected frequency distribution
what is the F distribution?
used to compare two variances by assessing the ratio of these variances
used often in ANOVA
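The ratio itself is simple to compute. The sketch below, with a hypothetical function name and made-up data, divides one sample variance by another. Comparing the result against an F critical value is not shown, since the standard library has no F-distribution.

```python
import statistics

def f_ratio(a, b):
    """Ratio of two sample variances (each uses n - 1 in the denominator)."""
    return statistics.variance(a) / statistics.variance(b)

# Hypothetical groups: the first is twice as spread out as the second
f = f_ratio([2, 4, 6, 8], [1, 2, 3, 4])
```

A ratio near 1 suggests similar variances; here the first group's variance is four times the second's.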
confidence interval
provides a range of values, derived from the sample data, that is likely to contain the true population parameter
confidence level
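indicates the degree of certainty that the interval contains the parameter: the proportion of intervals, built the same way from repeated samples, that would contain it (e.g., 95%)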
point estimation
involves calculating a single value (a point estimate) from sample data to estimate an unknown population parameter
margin of error
usually calculated as the critical value (from the z or t distribution) times the standard error of the point estimate
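A minimal sketch of that product, assuming a z critical value taken from the standard library's `NormalDist`. For small samples a t critical value would be the usual choice, but the standard library has no t quantile function, so the z version is shown. The function name and data are hypothetical.

```python
import math
import statistics
from statistics import NormalDist

def margin_of_error_z(sample, confidence=0.95):
    """Critical z value times the standard error of the sample mean."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    return z * se

sample = [2, 4, 6, 8]                 # hypothetical data
moe = margin_of_error_z(sample)
xbar = statistics.mean(sample)
interval = (xbar - moe, xbar + moe)   # confidence interval around the point estimate
```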
Central Limit Theorem
states that the mean of a sufficiently large number of independent, identically distributed random variables with finite variance will be approximately normally distributed
irrespective of the shape of the underlying distribution of the population, by increasing the sample size, sample means and proportions will ….
approximate normal distributions if the sample sizes are sufficiently large
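One way to see this is a small simulation. The sketch below draws sample means from an exponential distribution, which is strongly skewed, and checks that they behave roughly like a normal distribution centered on the population mean. The seed, sample size, and number of repetitions are arbitrary choices.

```python
import math
import random
import statistics

rng = random.Random(42)   # fixed seed so the run is repeatable
n = 50                    # observations per sample
reps = 5000               # number of sample means to collect

def sample_mean(n):
    """Mean of n draws from an exponential distribution (mean 1, SD 1)."""
    return statistics.fmean([rng.expovariate(1.0) for _ in range(n)])

means = [sample_mean(n) for _ in range(reps)]
mean_of_means = statistics.fmean(means)   # expected near the population mean, 1
sd_of_means = statistics.stdev(means)     # expected near 1 / sqrt(n)

# Share of sample means within 1.96 standard errors of the population mean;
# for a normal distribution this share would be about 95%.
se = 1 / math.sqrt(n)
within = sum(abs(m - 1.0) <= 1.96 * se for m in means) / reps
```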
the law of large numbers
as the number of repetitions of a probability experiment increases, the proportion with which a certain outcome is observed gets closer to the probability of the outcome
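A quick simulation of this idea with a fair coin, where the probability of heads is 0.5. The seed and the flip counts are arbitrary choices.

```python
import random

rng = random.Random(1)   # fixed seed so the run is repeatable

def proportion_heads(n_flips):
    """Observed proportion of heads in n_flips simulated fair coin flips."""
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

# The observed proportion tends to settle near 0.5 as the number of flips grows
proportions = {n: proportion_heads(n) for n in (10, 100, 10000, 100000)}
```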
standard error and formula
measures the variability or dispersion of the sample statistic from the population parameter
standard deviation of the sample means
SE = (s/√n)
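A short sketch of the formula, with hypothetical numbers, showing how the standard error shrinks as the sample size grows. Because n sits under a square root, quadrupling n only halves the SE.

```python
import math

def standard_error(s, n):
    """SE = s / sqrt(n), where s is the sample SD and n is the sample size."""
    return s / math.sqrt(n)

se_small = standard_error(s=10, n=25)    # 10 / 5  = 2.0
se_large = standard_error(s=10, n=100)   # 10 / 10 = 1.0
```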
type I error
rejecting null hypothesis when it’s true
type II error
not rejecting the null hypothesis when it is false
how do you reduce probability of making a type I error?
lowering α
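The effect of α can be checked with a simulation in which the null hypothesis is true by construction. The sketch below assumes a two-sided z-test with a known population SD of 1. The seed, sample size, and number of trials are arbitrary choices, and the function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean

def rejects_null(sample, mu0, sigma, alpha):
    """Two-sided z-test with known sigma: True if the null is rejected."""
    z = (fmean(sample) - mu0) / (sigma / math.sqrt(len(sample)))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value < alpha

def type_i_error_rate(alpha, trials=4000, n=20, seed=0):
    """Share of trials that reject a null hypothesis that is actually true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]  # true mean is 0
        if rejects_null(sample, mu0=0.0, sigma=1.0, alpha=alpha):
            rejections += 1
    return rejections / trials

rate_05 = type_i_error_rate(alpha=0.05)   # expected to be near 0.05
rate_01 = type_i_error_rate(alpha=0.01)   # expected to be near 0.01
```

Every rejection here is a type I error, so the rejection rate tracks α.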
reducing probability of type I error usually