Estimation and hypothesis testing Flashcards

1
Q

two main types of statistical inference

A
  1. estimation
  2. hypothesis testing
2
Q

define point and interval estimates

A

point: a single value that we hope is close to the population value
interval: a range of values that we hope contains the population value

3
Q

define estimator. what symbol denotes this?

A

a rule for calculating the estimate of a population parameter based on a sample.

a different sample with the same estimator gives a different estimate, hence the estimator behaves as a random variable with a PDF.

denoted by a hat (circumflex), e.g. µ^

4
Q

define unbiased estimator

A

E[µ^] = µ

5
Q

define precise estimator

A

want a small Var[µ^]

6
Q

define consistent estimator

A

as sample size → ∞:
Var[µ^] → 0
E[µ^] → µ
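A minimal simulation sketch (assuming Python with NumPy, and an arbitrary true mean of 5) illustrating consistency: across repeated samples the mean of µ^ stays at µ while its variance shrinks as n grows.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = 5.0  # assumed true population mean for the simulation

for n in [10, 100, 1000, 10000]:
    # 2000 independent samples of size n; mu_hat is the sample mean of each
    mu_hat = rng.normal(loc=mu, scale=2.0, size=(2000, n)).mean(axis=1)
    print(n, mu_hat.mean(), mu_hat.var())  # E[mu_hat] -> mu, Var[mu_hat] -> 0
```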

7
Q

the sample mean, x bar, is often a good point estimate of the population mean, µ. what does X bar denote then?

A

X bar is the sampling distribution of the sample mean

ie. the random variable of sample mean

8
Q

eqn for sample mean x bar and X bar

A

x bar = ∑x / n (in data book)
X bar = ∑X / n

9
Q

eqn for estimator of unknown population variance, σ^2 and also eqn for sample variance, s^2

A

𝜎^2 hat = ∑(X - X bar)^2 / (n-1)
= S^2 (the sampling distribution of the sample variance)

s^2 = ∑(x - x bar)^2 / (n-1)
(sample variance, in data book)

10
Q

define the standard error (also standard error of the mean). symbolic expression

A

s / √n
this is the estimate of the standard deviation of X bar, the sampling distribution of the sample mean x bar.
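A short NumPy sketch (hypothetical data) covering this card and the previous one: the sample variance with its n-1 divisor, and the standard error s / √n.

```python
import numpy as np

x = np.array([4.1, 5.3, 4.8, 5.0, 4.6, 5.2])  # hypothetical sample
n = len(x)

s2 = np.var(x, ddof=1)         # sample variance: sum((x - xbar)^2) / (n - 1)
se = np.sqrt(s2) / np.sqrt(n)  # standard error of the mean, s / sqrt(n)
print(x.mean(), s2, se)
```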

11
Q

Central Limit Theorem

A

as n becomes large, X bar tends toward a normal distribution regardless of the distribution of X.

a good approximation when n > 50, even if X is asymmetric
if X is symmetric, n > 20 is enough
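A quick NumPy simulation of the CLT: sample means of a strongly asymmetric (exponential) distribution; the skewness of X bar shrinks towards 0 (normal) as n grows.

```python
import numpy as np

rng = np.random.default_rng(1)

for n in [5, 20, 50, 200]:
    # 5000 sample means, each from a size-n exponential (asymmetric) sample
    means = rng.exponential(scale=1.0, size=(5000, n)).mean(axis=1)
    skew = np.mean((means - means.mean()) ** 3) / means.std() ** 3
    print(n, round(skew, 3))  # skewness of X bar tends towards 0 as n grows
```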

12
Q

what is the purpose of the Maximum Likelihood Estimator (MLE)? how is it found?

A

a method used to find expressions for estimators

“likelihood” defined as
L = P(X1=x1) P(X2=x2) P(X3=x3)…
ie. product of PDFs

MLE: suitable estimates are those that MAXIMISE this probability.

13
Q

common method for maximising L in MLE

A

maximising ln L is the same as maximising L and often simplifies the expression.
set the partial derivatives of ln L with respect to µ and σ equal to 0 and solve.
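A sketch of this recipe done numerically (assuming SciPy) for a normal model: minimise -ln L over µ and σ and compare with the analytic MLEs, x bar and √(∑(x - x bar)^2 / n).

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

x = np.array([4.1, 5.3, 4.8, 5.0, 4.6, 5.2])  # hypothetical sample

def neg_log_likelihood(params):
    mu, sigma = params
    return -np.sum(norm.logpdf(x, loc=mu, scale=sigma))  # -ln L

res = minimize(neg_log_likelihood, x0=[1.0, 1.0],
               bounds=[(None, None), (1e-6, None)])
print(res.x)                                 # numerical MLE of (mu, sigma)
print(x.mean(), np.sqrt(np.var(x, ddof=0)))  # analytic MLE (divisor n, not n-1)
```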

14
Q

general method for hypothesis testing

A

state the Null Hypothesis (H0) and the Alternative Hypothesis.
calculate a test statistic from the sample data.
compare it to its sampling distribution assuming H0 is true → find the p-value.
compare the p-value to the significance level 𝛼.
reject (p < 𝛼) or fail to reject (p > 𝛼) H0.

15
Q

what does the p value represent in hypothesis testing?

A

the probability of observing a result as extreme as or more extreme than the test statistic, IF the null hypothesis is true.

hence, if p > 𝛼, the data is consistent with H0 and there is not enough reason to reject it.
if p < 𝛼, the data is unlikely under H0, so reject it.

16
Q

how to calculate p-value for a hypothesis test on a LARGE sample?

A

the X bar distribution is assumed normal (central limit theorem)
use the normalised Z0 statistic (given in the data book):
z0 = (x bar - µ0) / (s/√n)
OR with 𝜎/√n in the denominator if 𝜎 is known
where µ0 is the population mean if the null hypothesis is correct.

the p-value is found by a one-tailed or two-tailed test, calculating the appropriate tail area based on z0.
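A minimal z-test sketch (assuming SciPy), with a hypothetical sample and a hypothetical null mean µ0 = 5, showing both one-tailed and two-tailed p-values.

```python
import numpy as np
from scipy.stats import norm

x = np.random.default_rng(2).normal(5.3, 1.0, size=60)  # hypothetical large sample
mu0 = 5.0                                                # null-hypothesis mean

z0 = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(len(x)))
p_one_tailed = norm.sf(abs(z0))      # tail area beyond |z0| (one tail)
p_two_tailed = 2 * norm.sf(abs(z0))  # both tails
print(z0, p_one_tailed, p_two_tailed)
```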

17
Q

hypothesis test small sample, special case normally distributed X, KNOWN population variance

A

z0 test statistic
same as for large sample

18
Q

hypothesis test small sample, special case normally distributed X, UNKNOWN population variance

A

t0 test statistic
σ must be estimated using S
S has its own PDF (NOT normal), so t0 follows a t-distribution rather than a normal

NOTE: the t-distribution has n-1 degrees of freedom!
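A small-sample t-test sketch (SciPy, hypothetical data, µ0 = 5); note the n - 1 degrees of freedom, and the built-in one-sample test giving the same answer.

```python
import numpy as np
from scipy import stats

x = np.array([4.1, 5.3, 4.8, 5.0, 4.6, 5.2])  # hypothetical small sample
mu0 = 5.0

t0 = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(len(x)))
p = 2 * stats.t.sf(abs(t0), df=len(x) - 1)  # two-tailed p-value, n-1 dof
print(t0, p)
print(stats.ttest_1samp(x, mu0))            # same test via SciPy
```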

19
Q

what values does a confidence interval for the population mean contain?

A

all the values that would not be rejected using a hypothesis test.

20
Q

confidence interval for LARGE sample (from hypothesis testing)

A

x bar ± z(crit) * s(x bar)
where z(crit) is the critical value from a two-tailed test and s(x bar) = s/√n is the standard error

21
Q

confidence interval for SMALL sample (from hypothesis testing), normal distribution

A

x bar ± t(crit) * s(x bar)
where t(crit) is the critical value from the t-distribution with n-1 degrees of freedom
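A sketch (SciPy, hypothetical data, 95% level) of both intervals above: the large-sample interval uses a critical z value, the small-sample normal case a critical t value with n - 1 degrees of freedom.

```python
import numpy as np
from scipy import stats

x = np.array([4.1, 5.3, 4.8, 5.0, 4.6, 5.2])
n, xbar = len(x), x.mean()
se = x.std(ddof=1) / np.sqrt(n)

z_crit = stats.norm.ppf(0.975)         # two-tailed critical value, alpha = 0.05
t_crit = stats.t.ppf(0.975, df=n - 1)

print(xbar - z_crit * se, xbar + z_crit * se)  # large-sample CI
print(xbar - t_crit * se, xbar + t_crit * se)  # small-sample CI (normal X)
```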

22
Q

confidence interval for SMALL sample (from hypothesis testing), NOT normal distribution

A

obtain by bootstrapping
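A percentile-bootstrap sketch (NumPy, hypothetical skewed data) for a 95% confidence interval on the mean: resample with replacement many times and take percentiles of the resampled means.

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.array([4.1, 5.3, 4.8, 5.0, 4.6, 5.2, 9.7])  # hypothetical skewed sample

# means of 10000 bootstrap resamples (same size as x, drawn with replacement)
boot_means = np.array([rng.choice(x, size=len(x), replace=True).mean()
                       for _ in range(10000)])
print(np.percentile(boot_means, [2.5, 97.5]))  # 95% bootstrap CI
```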

23
Q

what is the chi squared distribution used for in hypothesis testing?

A

often used for hypothesis tests on variance (for normal distribution)
and for chi-squared goodness of fit test

24
Q

how many degrees of freedom does the chi squared distribution have?

A

n-1
so the v in the chi squared tables is n-1

25
Q

what is the chi squared goodness of fit test used for?

A

to test whether a random variable fits a particular probability distribution

26
Q

interpret the test statistic for the chi squared goodness of fit test: X0^2 = ∑ (Oi - Ei)^2 / Ei, summed from i=1 to k

A

Oi: observed count for event i
Ei: expected count for event i
k: number of classes
note: each Ei must be greater than 5 (rare events can be combined)

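A goodness-of-fit sketch (SciPy) for this statistic, testing hypothetical die-roll counts against a fair (uniform) expectation; every Ei is well above 5.

```python
from scipy.stats import chisquare

observed = [18, 22, 16, 25, 19, 20]  # hypothetical counts, k = 6 classes
expected = [20, 20, 20, 20, 20, 20]  # fair-die expectation, all > 5

chi2_0, p = chisquare(f_obs=observed, f_exp=expected)  # sum of (Oi - Ei)^2 / Ei
print(chi2_0, p)  # compare p with the significance level
```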
27
Q

for hypothesis testing of the MEANS of two samples, this course will only consider the following cases:

A

BOTH NORMAL DISTRIBUTIONS
- known variances for both samples
- unknown but equal variances for the two samples

28
Q

what is the test used for hypothesis testing of the means of two samples of unknown but equal variance?

A

pooled t-test
recall the t-distribution is used when the sample(s) have unknown variance.
requires calculation of the pooled estimator of variance Sp^2, which is then used in the test statistic T0

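A pooled t-test sketch (SciPy, hypothetical samples): Sp^2 and T0 computed by hand, then the same test via the built-in routine with equal_var=True.

```python
import numpy as np
from scipy import stats

x1 = np.array([5.1, 4.9, 5.4, 5.0, 5.2])  # hypothetical samples
x2 = np.array([4.6, 4.8, 4.5, 4.9, 4.7])
n1, n2 = len(x1), len(x2)

# pooled estimator of variance Sp^2, then the T0 test statistic
sp2 = ((n1 - 1) * x1.var(ddof=1) + (n2 - 1) * x2.var(ddof=1)) / (n1 + n2 - 2)
t0 = (x1.mean() - x2.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))
print(t0)
print(stats.ttest_ind(x1, x2, equal_var=True))  # same pooled test via SciPy
```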
29
Q

what is the test statistic used for hypothesis testing of the means of two samples with KNOWN variances?

A

Z0 test statistic: same as T0 in the data book, EXCEPT that instead of Sp^2 (the pooled estimator of variance) it uses σ1^2 and σ2^2.
compare to the Z (standard normal) distribution.

30
Q

what does the F distribution describe?

A

the F-distribution is the PDF for the ratio of two random variables that each follow a REDUCED chi squared distribution (X^2/v)

31
Q

how to use the F-distribution to compare the variances of two samples

A

H0: σ1^2 = σ2^2
plug the known S1^2 and S2^2 into the F0 test statistic and evaluate against the F-distribution at the 0.05 significance level.

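An F-test sketch (SciPy, hypothetical samples): F0 is the ratio of the two sample variances, compared with the F-distribution on (n1 - 1, n2 - 1) degrees of freedom.

```python
import numpy as np
from scipy import stats

x1 = np.array([5.1, 4.9, 5.4, 5.0, 5.2, 5.6])  # hypothetical samples
x2 = np.array([4.6, 4.8, 4.5, 4.9, 4.7, 4.8])

f0 = x1.var(ddof=1) / x2.var(ddof=1)  # F0 test statistic
p = 2 * min(stats.f.sf(f0, len(x1) - 1, len(x2) - 1),
            stats.f.cdf(f0, len(x1) - 1, len(x2) - 1))  # two-tailed p-value
print(f0, p)  # reject H0 (equal variances) if p < 0.05
```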
32
Q

what is ANOVA used for?

A

when we want to analyse MORE THAN TWO SAMPLES to see if one particular treatment (one factor being varied at a time) is having an effect, by COMPARING THE MEANS
Null: µ1 = µ2 = µ3 = ...
if H0 is rejected, then in SOME sample the treatment has a significant effect (it does not tell you which one is different)

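A one-way ANOVA sketch (SciPy) across three hypothetical treatment groups; a small p-value rejects H0: µ1 = µ2 = µ3 but does not identify which group differs.

```python
from scipy import stats

group1 = [5.1, 4.9, 5.4, 5.0, 5.2]  # hypothetical treatment groups
group2 = [4.6, 4.8, 4.5, 4.9, 4.7]
group3 = [5.0, 5.1, 4.8, 5.3, 5.2]

f0, p = stats.f_oneway(group1, group2, group3)
print(f0, p)  # if p < alpha, at least one group mean differs
```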
33
Q

what are multiple hypothesis testing methods used for?

A

testing multiple hypotheses at one time in search of a significant one.
each test provides a p-value, hence CORRECTIONS to significance thresholds are required.

34
Q

what is the family-wise error rate (FWER)?

A

if the probability of making a false positive error in a single test is x, the probability of making such an error somewhere in a group (family) of m tests is
FWER = 1 - (1-x)^m
ie. the complement of making NO error in a family of m tests

35
Q

what is the Bonferroni correction?

A

a method of controlling the family-wise error rate in multiple hypothesis testing.
divide the significance threshold by the number of tests, m, to get a new significance threshold:
𝛼' = 𝛼 / m
NOTE: VERY CONSERVATIVE! risk of producing false negatives

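A quick numeric sketch of the last two cards: the family-wise error rate for m tests at a per-test level x, and the Bonferroni-corrected threshold (values are illustrative).

```python
alpha = 0.05  # per-test false positive probability x
m = 20        # number of tests in the family

fwer = 1 - (1 - alpha) ** m   # probability of at least one false positive
alpha_corrected = alpha / m   # Bonferroni threshold alpha' = alpha / m
print(fwer, alpha_corrected)  # roughly 0.64 uncorrected, 0.0025 per test
```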
36
Q

what is the Benjamini-Hochberg correction? define the false discovery rate.

A

a multiple-testing correction that, rather than controlling the family-wise error rate, tolerates some proportion of false positives: the false discovery rate (FDR)
FDR = # false positives / # total positives
eg. if FDR = 0.05, you are allowing up to 5% false discoveries amongst those that are declared significant.

37
Q

You have m hypothesis tests, each with a p-value. Method for Benjamini-Hochberg correction

A

RANK the p-values in ascending order, giving each a rank k.
choose a desired false discovery rate (FDR) 𝛼 (usually 𝛼 = 0.05).
calculate k * 𝛼 / m for each test and check whether its p-value is LESS THAN OR EQUAL to this.
find the LARGEST k for which this is true: that test, and all tests with smaller p-values, are significant.

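A sketch of this procedure in plain NumPy with hypothetical p-values: rank the p-values, compare each with k * 𝛼 / m, and mark significant everything up to the largest rank that passes.

```python
import numpy as np

pvals = np.array([0.001, 0.008, 0.020, 0.041, 0.120, 0.300])  # hypothetical, m = 6
alpha, m = 0.05, len(pvals)

order = np.argsort(pvals)  # ascending p-values get ranks k = 1..m
passed = pvals[order] <= np.arange(1, m + 1) * alpha / m
significant = np.zeros(m, dtype=bool)
if passed.any():
    k_max = np.where(passed)[0].max()      # largest rank satisfying the criterion
    significant[order[:k_max + 1]] = True  # that test and all smaller p-values
print(significant)  # True for the tests declared significant
```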
38
Q

What is the q-value for the Benjamini-Hochberg correction?

A

the threshold value of 𝛼 that would be required for test k to be significant by itself.
recall p <= k * 𝛼 / m
rearranging for 𝛼: q = p * m / k