Estimation and hypothesis testing Flashcards
two main types of statistical inference
- estimation
- hypothesis testing
define point and interval estimates
point: a single value guess we hope is close to pop value
interval: range we hope contains pop value
define estimator. what symbol denotes this?
a rule for calculating the estimate of a pop parameter based on a sample.
the same estimator applied to a different sample gives a different estimate, so the estimator behaves as a random variable with its own PDF.
denoted by hat (circumflex)
define unbiased estimator
E[µ^] = µ
define precise estimator
want a small Var[µ^]
define consistent estimator
as sample size –> inf,
Var[µ^] –> 0
E[µ^] –> µ
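The consistency conditions above can be checked by simulation. This is a minimal sketch, assuming a normal population with made-up parameters (mu = 5, sigma = 2) and using the sample mean as the estimator; the helper name is hypothetical.

```python
import random
import statistics

random.seed(2)  # fixed seed so the sketch is repeatable

mu, sigma = 5.0, 2.0  # hypothetical population parameters


def mean_and_var_of_sample_mean(n, repeats=1000):
    # Draw `repeats` samples of size n and record the sample mean of each.
    means = [
        statistics.fmean([random.gauss(mu, sigma) for _ in range(n)])
        for _ in range(repeats)
    ]
    # Estimate E[mu hat] and Var[mu hat] from the simulated sample means.
    return statistics.fmean(means), statistics.variance(means)


results = {n: mean_and_var_of_sample_mean(n) for n in (5, 50, 500)}
for n, (centre, var) in results.items():
    print(n, round(centre, 3), round(var, 4))
```

The estimated variance of the sample mean should shrink roughly like sigma^2 / n while its average stays near mu.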
the sample mean, x bar, is often a good point estimate of the population mean, µ. what does X bar denote then?
X bar is the sample mean treated as a random variable; its PDF is the sampling distribution of the sample mean
ie. x bar is one observed value of X bar
eqn for sample mean x bar and X bar
x bar = ∑x / n (in DB)
X bar = ∑X / n
eqn for estimator of unknown population variance, σ^2 and also eqn for sample variance, s^2
𝜎^2 hat = ∑(X - X bar)^2 / (n-1)
= S^2 (the sample variance as a random variable)
s^2 = ∑(x - x bar)^2 / (n-1)
(sample variance, in DB)
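A short numerical sketch of the sample variance formula, using a made-up sample. It also shows the n divisor for comparison; Python's `statistics.variance` uses n-1 and `statistics.pvariance` uses n.

```python
import statistics

x = [4.0, 7.0, 5.0, 8.0, 6.0]  # hypothetical sample, for illustration only
n = len(x)
x_bar = sum(x) / n

# Sum of squared deviations from the sample mean.
ss = sum((xi - x_bar) ** 2 for xi in x)

s2 = ss / (n - 1)  # sample variance s^2, divisor n - 1
s2_n = ss / n      # divisor n, tends to underestimate sigma^2

print(s2, statistics.variance(x))     # both use the n - 1 divisor
print(s2_n, statistics.pvariance(x))  # both use the n divisor
```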
define the standard error (also standard error of the mean). symbolic expression
s / √n
this is the estimate of the standard deviation of X bar, the sampling distribution of the sample mean x bar.
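As a sketch, the standard error for a made-up sample can be computed directly from s and n:

```python
import math
import statistics

x = [4.0, 7.0, 5.0, 8.0, 6.0]  # hypothetical sample
n = len(x)

s = statistics.stdev(x)            # sample standard deviation (n - 1 divisor)
standard_error = s / math.sqrt(n)  # estimated standard deviation of X bar

print(round(standard_error, 4))
```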
Central Limit Theorem
as n becomes large, X bar tends toward a normal distribution regardless of the distribution of X (provided X has finite variance).
good when n>50 even if asymmetric
if X symmetric, n>20
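The theorem can be illustrated by simulation. This is a sketch only, assuming an exponential population (strongly asymmetric, mean 1, standard deviation 1) and n = 50; the population and repeat count are arbitrary choices, not from the cards.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

n = 50          # sample size
repeats = 5000  # number of simulated samples


def sample_mean(size):
    # Mean of `size` draws from an exponential distribution with rate 1.
    return statistics.fmean([random.expovariate(1.0) for _ in range(size)])


means = [sample_mean(n) for _ in range(repeats)]

centre = statistics.fmean(means)  # expected to be near the population mean, 1
spread = statistics.stdev(means)  # expected to be near 1 / sqrt(n)

# If X bar is roughly normal, about 95% of the sample means should lie
# within 1.96 standard errors of the population mean.
se = 1 / n ** 0.5
inside = sum(abs(m - 1.0) <= 1.96 * se for m in means) / repeats

print(round(centre, 3), round(spread, 3), round(inside, 3))
```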
what is the purpose of the Maximum Likelihood Estimator (MLE)? how is it found?
a method used to find expressions for estimators
“likelihood” defined as
L = P(X1=x1) P(X2=x2) P(X3=x3)…
ie. product of PDFs
MLE: suitable estimates are those that MAXIMISE this probability.
common method for maximising L in MLE
maximising lnL is the same as maximising L, may simplify the expression
set the partial derivatives of lnL wrt µ and σ equal to 0 and solve
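A minimal sketch of this method for a normal population, using a made-up sample. Solving the two derivative equations gives µ hat = x bar and σ hat^2 = ∑(x - x bar)^2 / n (divisor n, not n-1); the code checks numerically that moving away from these values lowers lnL.

```python
import math

x = [4.0, 7.0, 5.0, 8.0, 6.0]  # hypothetical sample, X assumed normal
n = len(x)


def log_likelihood(mu, sigma):
    # ln L for independent normal observations: sum of the ln of each PDF value.
    return sum(
        -math.log(sigma * math.sqrt(2 * math.pi))
        - (xi - mu) ** 2 / (2 * sigma ** 2)
        for xi in x
    )


# Closed-form solutions of d(lnL)/d(mu) = 0 and d(lnL)/d(sigma) = 0.
mu_hat = sum(x) / n
sigma_hat = math.sqrt(sum((xi - mu_hat) ** 2 for xi in x) / n)

best = log_likelihood(mu_hat, sigma_hat)

# Nudging either parameter away from the MLE should give a smaller lnL.
nearby = [
    log_likelihood(mu_hat + 0.1, sigma_hat),
    log_likelihood(mu_hat - 0.1, sigma_hat),
    log_likelihood(mu_hat, sigma_hat + 0.1),
    log_likelihood(mu_hat, sigma_hat - 0.1),
]

print(mu_hat, round(sigma_hat, 4), all(v < best for v in nearby))
```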
general method for hypothesis testing
Null Hypothesis (H0) and Alternative Hypothesis
Test statistic: calculated from sample data.
Compare to a sampling distribution assuming H0 is true –> find p-value.
p-value compared to the significance level 𝛼
reject (p<𝛼) or fail to reject (p>𝛼) H0.
what does the p value represent in hypothesis testing?
the probability of observing a result as extreme as or more extreme than the test statistic, IF the null hypothesis is true.
hence, if p>𝛼, the data are consistent w H0: not enough evidence to reject it.
if p<𝛼, data this extreme would be unlikely if H0 were true: reject H0.
how to calculate p-value for a hypothesis test on a LARGE sample?
X bar distribution assumed normal (central limit theorem)
use normalised Z0 (given in DB)
z0 = (x bar - µ0) / (s/√n)
OR use 𝜎/√n in the denominator if 𝜎 is known
where µ0 is the population mean if the null hypothesis is correct.
p-value found by one-tailed or two-tailed test by calculating the appropriate area based on z0.
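A sketch of the large-sample calculation with made-up numbers (x bar = 10.3, µ0 = 10.0, s = 1.2, n = 64, 𝛼 = 0.05), using the standard normal CDF from Python's `statistics.NormalDist` in place of tables:

```python
import math
from statistics import NormalDist

# Hypothetical numbers, chosen only for illustration.
x_bar = 10.3  # sample mean
mu_0 = 10.0   # population mean under H0
s = 1.2       # sample standard deviation
n = 64        # sample size, large enough to treat X bar as normal

z0 = (x_bar - mu_0) / (s / math.sqrt(n))

std_normal = NormalDist()  # mean 0, standard deviation 1

# One-tailed (H1: mu > mu_0): area to the right of z0.
p_upper = 1 - std_normal.cdf(z0)
# Two-tailed (H1: mu != mu_0): area in both tails beyond |z0|.
p_two_tailed = 2 * (1 - std_normal.cdf(abs(z0)))

alpha = 0.05
reject_two_tailed = p_two_tailed < alpha

print(round(z0, 3), round(p_upper, 4), round(p_two_tailed, 4), reject_two_tailed)
```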
hypothesis test small sample, special case normally distributed X, KNOWN population variance
z0 test statistic
same as for large sample
hypothesis test small sample, special case normally distributed X, UNKNOWN population variance
t0 test statistic
must estimate σ using S
S has a PDF but NOT normal
NOTE: t-distribution has n-1 degrees of freedom!
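A sketch of the t0 calculation for a made-up small sample, assuming the usual form t0 = (x bar - µ0) / (s/√n). The Python standard library has no t-distribution, so the sketch stops at the statistic and the degrees of freedom; the p-value or critical value would then come from t tables.

```python
import math
import statistics

x = [4.0, 7.0, 5.0, 8.0, 6.0]  # hypothetical small sample, X assumed normal
mu_0 = 5.0                      # hypothetical population mean under H0

n = len(x)
x_bar = statistics.mean(x)
s = statistics.stdev(x)  # estimate of sigma (n - 1 divisor)

t0 = (x_bar - mu_0) / (s / math.sqrt(n))
dof = n - 1  # degrees of freedom to use in the t tables

print(round(t0, 4), dof)
```

For these numbers the two-tailed 5% critical value with 4 degrees of freedom is 2.776 in standard tables, so a t0 of about 1.41 would not lead to rejecting H0.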
what values does a confidence interval for the population mean contain?
all the values of µ0 that would not be rejected by a two-tailed hypothesis test at the matching significance level (eg. 95% interval with 𝛼 = 0.05).
confidence interval for LARGE sample (from hypothesis testing)
x bar ± z(crit) * s(x bar)
z(critical) from two tailed test
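A sketch of the large-sample interval with made-up numbers, taking s(x bar) = s/√n and getting z(crit) from the inverse normal CDF rather than tables:

```python
import math
from statistics import NormalDist

# Hypothetical numbers, chosen only for illustration.
x_bar, s, n = 10.3, 1.2, 64
confidence = 0.95

# Two-tailed critical value: leaves (1 - confidence) / 2 in each tail.
z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
se = s / math.sqrt(n)  # standard error s(x bar)

lower = x_bar - z_crit * se
upper = x_bar + z_crit * se

print(round(z_crit, 3), round(lower, 3), round(upper, 3))
```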
confidence interval for SMALL sample (from hypothesis testing), normal distribution
x bar ± t(crit) * s(x bar)
confidence interval for SMALL sample (from hypothesis testing), NOT normal distribution
obtain by bootstrapping
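The cards do not say which bootstrap variant is meant. The sketch below assumes the simple percentile bootstrap: resample the data with replacement many times, compute the mean of each resample, and read the interval off the percentiles. The data and the function name are made up for illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical small, skewed sample.
x = [2.1, 3.4, 1.9, 8.7, 2.8, 3.1, 2.5, 6.0, 2.2, 3.9]


def bootstrap_ci(data, n_resamples=10000, confidence=0.95):
    # Percentile bootstrap for the mean: resample with replacement,
    # record each resample mean, then take the middle `confidence` share.
    means = sorted(
        statistics.fmean(random.choices(data, k=len(data)))
        for _ in range(n_resamples)
    )
    tail = (1 - confidence) / 2
    lower = means[int(tail * n_resamples)]
    upper = means[int((1 - tail) * n_resamples) - 1]
    return lower, upper


lower, upper = bootstrap_ci(x)
print(round(lower, 2), round(upper, 2))
```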
what is the chi squared distribution used for in hypothesis testing?
often used for hypothesis tests on variance (for normal distribution)
and for chi-squared goodness of fit test
how many degrees of freedom does the chi squared distribution have?
n-1 (for a test on the variance using a sample of size n)
so the ν in the chi squared tables is n-1
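A sketch of the test statistic for a hypothesis test on variance, assuming the usual form chi^2_0 = (n-1) s^2 / σ0^2 for normally distributed X, with made-up data and a made-up σ0^2. The Python standard library has no chi squared distribution, so the value would then be compared against tables with n-1 degrees of freedom.

```python
import statistics

x = [4.0, 7.0, 5.0, 8.0, 6.0]  # hypothetical sample, X assumed normal
sigma0_sq = 2.0                 # hypothetical population variance under H0

n = len(x)
s_sq = statistics.variance(x)  # sample variance s^2 (n - 1 divisor)

chi_sq0 = (n - 1) * s_sq / sigma0_sq
dof = n - 1  # degrees of freedom to look up in the chi squared tables

print(chi_sq0, dof)
```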