Interval Estimation Flashcards

1
Q

What are point estimates?

A

Single numbers, computed from sample statistics, that serve as estimates of population parameters (e.g. the sample mean $\bar{X}$ as an estimate of μ).

2
Q

What is a confidence interval?

A

A range of plausible values for a population parameter, computed from the sample. Since there is some variability (uncertainty) around the point estimate, the interval indicates how close our estimate is likely to be to the true value.

3
Q

What is the confidence interval formula for the mean μ when the population is normally (or approximately normally) distributed and the population variance is known (unlikely case)? Write it down.

A

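This comes from the formula:

$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$

$P\left(\bar{X} - z_{\alpha/2}\,\sigma/\sqrt{n} \le \mu \le \bar{X} + z_{\alpha/2}\,\sigma/\sqrt{n}\right) = p$

CI: $\left(\bar{X} - z_{\alpha/2}\,\sigma/\sqrt{n},\ \bar{X} + z_{\alpha/2}\,\sigma/\sqrt{n}\right)$

where p is the confidence level, e.g. 95% (p = 0.95), α = 1 − p, and $z_{\alpha/2}$ is the standard normal critical value with upper-tail area α/2 (1.96 for p = 0.95).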
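A minimal Python sketch of this interval, using only the standard library (`statistics.NormalDist().inv_cdf` gives the z quantile). The function name `z_interval` and the numbers in the example are hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, p=0.95):
    """CI for mu when the population is normal and sigma is known."""
    alpha = 1 - p
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}: about 1.96 for p = 0.95
    margin = z * sigma / sqrt(n)
    return (xbar - margin, xbar + margin)

# Hypothetical numbers: xbar = 50, sigma = 10, n = 25, p = 0.95
lo, hi = z_interval(50.0, 10.0, 25)
print(round(lo, 2), round(hi, 2))   # 46.08 53.92
```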

4
Q

What does a 95% confidence interval for μ mean?

A

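If we repeated the sampling many times and computed an interval from each sample, about 95% of those intervals would contain μ: there is a 95% probability that the (random) interval will contain μ.

It does not mean that, for one particular computed interval, the probability that μ lies in it is 95%; μ is fixed, so that interval either contains it or it does not.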
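The repeated-sampling meaning can be checked with a small simulation sketch (Python standard library; the population parameters, sample size and seed are arbitrary assumptions): draw many samples from a normal population with known μ and σ, build the 95% z-interval from each, and count how often μ is inside. The fraction should come out close to 0.95, not exactly equal to it.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def coverage(mu=10.0, sigma=2.0, n=20, p=0.95, reps=10_000, seed=1):
    """Fraction of simulated z-intervals (sigma known) that contain the true mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - p) / 2)
    margin = z * sigma / sqrt(n)
    hits = 0
    for _ in range(reps):
        # one sample of size n from N(mu, sigma^2), then its interval
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / reps

cov = coverage()
print(cov)   # close to 0.95, but not exactly
```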

5
Q

What is a critical value for a confidence interval?

A

The value of the test statistic that defines the lower and upper bounds of a confidence interval.

(When the population variance is known: unlikely case.)

E.g. for a 95% CI for μ with the normal distribution: α = 1 − 0.95 = 0.05, so α/2 = 0.025. The lower critical value is the z with cumulative probability α/2 = 0.025, i.e. −1.96; the upper critical value is the z with cumulative probability 1 − α/2 = 0.975, i.e. 1.96. Because the normal distribution is symmetric, the critical values for the 95% CI of μ are −1.96 and 1.96.
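The same critical values can be obtained from the inverse CDF of the standard normal. A short sketch using Python's standard-library `statistics.NormalDist`:

```python
from statistics import NormalDist

p = 0.95
alpha = 1 - p                                  # 0.05
lower = NormalDist().inv_cdf(alpha / 2)        # cumulative probability 0.025
upper = NormalDist().inv_cdf(1 - alpha / 2)    # cumulative probability 0.975
print(round(lower, 2), round(upper, 2))        # -1.96 1.96
```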

6
Q

What does the width of the interval depend on?

A

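(When the population variance is known: unlikely case.)

Look at the formula:
$\bar{X} - z_{\alpha/2}\,\sigma/\sqrt{n} \le \mu \le \bar{X} + z_{\alpha/2}\,\sigma/\sqrt{n}$

The width, $2 z_{\alpha/2}\,\sigma/\sqrt{n}$, depends on the population standard deviation σ, the confidence level p (through $z_{\alpha/2}$), and the sample size n.

  • If σ or p increases, the interval gets wider.
  • If n increases, it gets narrower (the width is proportional to 1/√n).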
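A quick numerical check of these three effects, as a Python sketch (the `width` helper and the values of σ, n and p are illustrative assumptions). Because the width is proportional to σ/√n, doubling σ doubles it and quadrupling n halves it.

```python
from math import sqrt
from statistics import NormalDist

def width(sigma, n, p):
    """Width 2 * z_{alpha/2} * sigma / sqrt(n) of the known-variance CI."""
    z = NormalDist().inv_cdf(1 - (1 - p) / 2)
    return 2 * z * sigma / sqrt(n)

base = width(sigma=10, n=25, p=0.95)
print(width(20, 25, 0.95) / base)    # doubling sigma doubles the width (ratio about 2)
print(width(10, 100, 0.95) / base)   # 4x the sample size halves it (ratio about 0.5)
print(width(10, 25, 0.99) > base)    # higher confidence level gives a wider interval
```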
7
Q

When the population is normally (or approximately normally) distributed and both the population mean μ and the population variance σ² are unknown, what formula is used instead of the Z one?

A

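The random variable T has a Student's t-distribution with n − 1 degrees of freedom (DF):

$T = \frac{\bar{X} - \mu}{s/\sqrt{n}}$

The only change from the Z formula is that the population standard deviation σ is replaced by the sample standard deviation s.

This makes sense because $\bar{X}$ is normally distributed, $(n-1)s^2/\sigma^2$ has a chi-squared distribution $\chi^2(n-1)$, and the two are independent; a standard normal divided by the square root of an independent $\chi^2$ variable over its DF has a t-distribution.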

8
Q

What are the characteristics of a Student's t-distribution?

A
  • It is bell-shaped
  • It is symmetric about 0 and has mean 0
  • It has a larger variance (it is more spread out) than the standard normal distribution
  • As n increases, the variance decreases and the distribution approaches the standard normal; IF n > 40, THE NORMAL DISTRIBUTION IS AN ACCEPTABLE APPROXIMATION.
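For reference, the variance of a t-distribution with ν degrees of freedom is ν/(ν − 2) for ν > 2 (a standard result not stated on the card; the standard normal has variance 1). This short Python sketch tabulates it to show the variance shrinking toward 1 as the DF grow; the function name `t_variance` is hypothetical.

```python
def t_variance(df):
    """Variance of Student's t with df degrees of freedom (finite only for df > 2)."""
    if df <= 2:
        raise ValueError("variance is infinite or undefined for df <= 2")
    return df / (df - 2)

for df in (3, 5, 10, 40, 100):
    # always above 1 (the N(0,1) variance), and decreasing toward 1
    print(df, round(t_variance(df), 3))
```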
9
Q

What is the confidence interval formula for the mean μ when the population is normally (or approximately normally) distributed and the population variance is unknown (likely case)? Write it down.

A

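DF = n − 1

α/2 = (1 − p)/2

$c = t_{\alpha/2,\,n-1}$, the t critical value with n − 1 DF and upper-tail area α/2

$P\left(\bar{X} - c\,s/\sqrt{n} \le \mu \le \bar{X} + c\,s/\sqrt{n}\right) = p$

CI: $\left(\bar{X} - c\,s/\sqrt{n},\ \bar{X} + c\,s/\sqrt{n}\right)$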
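A minimal sketch of this interval in Python. The standard library has no t quantile function, so the critical value c is passed in from a t table ($t_{0.025,9} \approx 2.262$ for n = 10, p = 0.95); the sample data and the name `t_interval` are hypothetical.

```python
from math import sqrt
from statistics import fmean, stdev

def t_interval(data, c):
    """CI for mu with sigma unknown: xbar -/+ c * s / sqrt(n).

    c is t_{alpha/2, n-1}, looked up in a t table.
    """
    n = len(data)
    xbar = fmean(data)
    s = stdev(data)              # sample standard deviation (divisor n - 1)
    margin = c * s / sqrt(n)
    return (xbar - margin, xbar + margin)

# Hypothetical sample of n = 10; for p = 0.95 the table gives t_{0.025, 9} = 2.262
sample = [4.8, 5.1, 5.3, 4.9, 5.0, 5.4, 5.2, 4.7, 5.1, 5.0]
lo, hi = t_interval(sample, c=2.262)
print(lo, hi)
```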

10
Q

When can the t-distribution be used if the population is NOT normally (or approximately normally) distributed? (2 cases)

A
  • When n ≥ 30 (large-sample case, Central Limit Theorem rule)
  • When n < 30 but the population has a bell-shaped distribution, e.g. a binomial distribution with probability of success very close to 0.5

E.g.

$\bar{X} = 8.17$, $s^2 = 1.42$, $s = 1.191$; the population is roughly bell-shaped and n = 56, so the t-distribution is a good approximation.

95% CI for μ?

DF = n − 1 = 55
α/2 = (1 − 0.95)/2 = 0.05/2 = 0.025

c = $t_{0.025,55}$. DF = 55 is not in the t table, so either interpolate between the neighbouring entries or, since n > 40, use the normal approximation and take c = $z_{0.025}$ = 1.96 from the z table.

CI: $\left(\bar{X} - c\,s/\sqrt{n},\ \bar{X} + c\,s/\sqrt{n}\right)$
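Finishing the example numerically with the normal approximation (c ≈ 1.96), as a small Python sketch; using $t_{0.025,55}$ instead would give a slightly wider interval.

```python
from math import sqrt
from statistics import NormalDist

xbar, s, n = 8.17, 1.191, 56          # values from the example
c = NormalDist().inv_cdf(0.975)       # normal approximation to t_{0.025, 55}, about 1.96
margin = c * s / sqrt(n)
print(round(xbar - margin, 3), round(xbar + margin, 3))   # 7.858 8.482
```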

11
Q

What is the formula for w, the width of the interval? Write it down.
If s is approximately known in advance, what can it be used for?

A

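$w = 2\,t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}$

Solving for n:

$n = \frac{4\,t_{\alpha/2,\,n-1}^2\,s^2}{w^2}$

If s is approximately known in advance, this formula can be used to work out how large the sample must be to estimate the mean with an interval of a given width w. Because the t critical value itself depends on n, in practice $z_{\alpha/2}$ is often used in its place and n is rounded up.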
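A sketch of the sample-size calculation in Python, assuming the shortcut of using $z_{\alpha/2}$ in place of $t_{\alpha/2,n-1}$ (so the formula does not depend on the unknown n) and rounding up. The function name `required_n` and the target width w = 0.5 are hypothetical; s = 1.191 is reused from the previous card's example.

```python
from math import ceil
from statistics import NormalDist

def required_n(s, w, p=0.95):
    """Smallest integer n with n >= 4 * z^2 * s^2 / w^2 (z used in place of t)."""
    z = NormalDist().inv_cdf(1 - (1 - p) / 2)
    return ceil(4 * z**2 * s**2 / w**2)

print(required_n(s=1.191, w=0.5))   # 88
```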

12
Q

How do you compare the means of two samples?

A

Two samples can be compared by constructing a confidence interval for the difference of their means.

  • X1…Xm is a random i.i.d. sample from a population with mean μ1 and variance $\sigma_1^2$
  • Y1…Yn is a random i.i.d. sample from a population with mean μ2 and variance $\sigma_2^2$
  • The two samples are independent of each other

For μ1 − μ2 we can use the unbiased estimator $\bar{X} - \bar{Y}$, whose variance is

$\mathrm{Var}(\bar{X} - \bar{Y}) = \sigma_1^2/m + \sigma_2^2/n$

The variances add (they are not subtracted) because, by independence, $\mathrm{Var}(\bar{X} - \bar{Y}) = \mathrm{Var}(\bar{X}) + \mathrm{Var}(\bar{Y})$, with $\mathrm{Var}(\bar{X}) = \sigma_1^2/m$ and $\mathrm{Var}(\bar{Y}) = \sigma_2^2/n$.
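The variance formula can be checked by simulation. This Python sketch (standard library only; all parameter values and the seed are arbitrary assumptions) draws many pairs of independent normal samples and compares the empirical variance of $\bar{X} - \bar{Y}$ with $\sigma_1^2/m + \sigma_2^2/n$. If the variances were subtracted, the prediction here would be −0.2, which is impossible for a variance.

```python
import random
from statistics import fmean, pvariance

def simulated_var_diff(mu1, sigma1, m, mu2, sigma2, n, reps=20_000, seed=0):
    """Empirical variance of Xbar - Ybar over many independent sample pairs."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        xbar = fmean(rng.gauss(mu1, sigma1) for _ in range(m))
        ybar = fmean(rng.gauss(mu2, sigma2) for _ in range(n))
        diffs.append(xbar - ybar)
    return pvariance(diffs)

# Hypothetical parameters: sigma1 = 2, m = 10, sigma2 = 3, n = 15
theory = 2**2 / 10 + 3**2 / 15       # sigma1^2/m + sigma2^2/n = 1.0
sim = simulated_var_diff(0, 2, 10, 0, 3, 15)
print(theory, sim)
```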

13
Q

How do you compare the means of two samples when the population variances are unknown (likely case)? (2 cases)

A

1) Both samples are large: m > 30 and n > 30.

Starting from the formula:

$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$

the statistic for the difference is

$Z = \frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{\sqrt{s_1^2/m + s_2^2/n}}$

IN THIS CASE (comparison of two samples), thanks to the Central Limit Theorem, this statistic is approximately standard normal (even if m and n are not > 40).

The CI is:

$\left((\bar{X} - \bar{Y}) - z_{\alpha/2}\sqrt{s_1^2/m + s_2^2/n},\ (\bar{X} - \bar{Y}) + z_{\alpha/2}\sqrt{s_1^2/m + s_2^2/n}\right)$

2) Both populations are normally (or approximately normally) distributed and $\sigma_1^2 = \sigma_2^2 = \sigma^2$, or the two are "reasonably close" (you might check histograms of the data).

Given the variance: $\mathrm{Var}(\bar{X} - \bar{Y}) = \sigma^2/m + \sigma^2/n = \sigma^2\left(\frac{1}{m} + \frac{1}{n}\right)$

An unbiased estimator of the common variance σ² is the pooled estimator $s_p^2$:

$s_p^2 = \frac{(m-1)s_1^2 + (n-1)s_2^2}{m+n-2}$

Then, from:

$T = \frac{\bar{X} - \mu}{s/\sqrt{n}}$

we get

$T = \frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{s_p\sqrt{1/m + 1/n}}$

with m + n − 2 DF.

$c = t_{\alpha/2,\,m+n-2}$

CI: $\left((\bar{X} - \bar{Y}) - c\,s_p\sqrt{1/m + 1/n},\ (\bar{X} - \bar{Y}) + c\,s_p\sqrt{1/m + 1/n}\right)$

Note: for small samples, when $\sigma_1^2$ and $\sigma_2^2$ are very different, there is no easy procedure for a CI on the difference of the means, even if the populations are normally distributed.
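A Python sketch of both cases (standard library only; `statistics.variance` uses the n − 1 divisor). In case 2 the critical value c is passed in from a t table, since the standard library has no t quantiles. The function names and the data are hypothetical, and the example call exercises only the pooled case because the samples are small.

```python
from math import sqrt
from statistics import NormalDist, fmean, variance

def large_sample_diff_ci(x, y, p=0.95):
    """Case 1 (m, n > 30): z interval using the sample variances s1^2, s2^2."""
    m, n = len(x), len(y)
    z = NormalDist().inv_cdf(1 - (1 - p) / 2)
    se = sqrt(variance(x) / m + variance(y) / n)
    d = fmean(x) - fmean(y)
    return (d - z * se, d + z * se)

def pooled_t_diff_ci(x, y, c):
    """Case 2 (normal populations, equal variances): pooled t interval.

    c is t_{alpha/2, m+n-2}, looked up in a t table.
    """
    m, n = len(x), len(y)
    sp2 = ((m - 1) * variance(x) + (n - 1) * variance(y)) / (m + n - 2)
    half = c * sqrt(sp2) * sqrt(1 / m + 1 / n)
    d = fmean(x) - fmean(y)
    return (d - half, d + half)

# Hypothetical small samples: m = 5, n = 6, DF = 9, t_{0.025, 9} = 2.262
x = [12.1, 11.8, 12.5, 12.0, 12.3]
y = [11.2, 11.6, 11.0, 11.4, 11.5, 11.3]
lo, hi = pooled_t_diff_ci(x, y, c=2.262)
print(lo, hi)
```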

14
Q

How do you compare the means of two samples when the population variances are known (unlikely case)?

A

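From this formula:

$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$

we get

$Z = \frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/m + \sigma_2^2/n}}$

which follows a standard normal distribution N(0,1).

The CI is:

$\left((\bar{X} - \bar{Y}) - z_{\alpha/2}\sqrt{\sigma_1^2/m + \sigma_2^2/n},\ (\bar{X} - \bar{Y}) + z_{\alpha/2}\sqrt{\sigma_1^2/m + \sigma_2^2/n}\right)$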
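A minimal Python sketch of the known-variance interval; the function name and all numbers are hypothetical. The last printed value checks whether 0 lies in the interval.

```python
from math import sqrt
from statistics import NormalDist

def known_var_diff_ci(xbar, ybar, var1, m, var2, n, p=0.95):
    """CI for mu1 - mu2 when sigma1^2 and sigma2^2 are known."""
    z = NormalDist().inv_cdf(1 - (1 - p) / 2)
    se = sqrt(var1 / m + var2 / n)
    d = xbar - ybar
    return (d - z * se, d + z * se)

# Hypothetical: xbar = 52, ybar = 50, sigma1^2 = 4, m = 16, sigma2^2 = 9, n = 36
lo, hi = known_var_diff_ci(52, 50, 4, 16, 9, 36)
print(lo, hi, lo <= 0 <= hi)
```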

15
Q

What does it mean when 0 is included in a confidence interval for a parameter (such as the difference between two means)?

A

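It suggests that there is no statistically significant effect or difference at the given confidence level: the data are consistent with a difference of 0 (e.g. μ1 − μ2 = 0). It does not prove that the true difference is 0.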

16
Q

What are paired observations or paired data?

A

They involve collecting two sets of related measurements on the same subjects (or on matched subjects). This reduces the effect of external sources of variation (smaller sampling error), so it is easier to detect a difference between the population means.

E.g. to see improvement in the same individuals before and after something.

The sample of pairs is modeled as (X1,Y1)…(Xn,Yn), where the pairs are assumed to be independent of one another. μd = μ1 − μ2 is the mean of the differences.

The new data set is formed by the n differences $d_i = x_i - y_i$, i.e. (x1 − y1)…(xn − yn). This data set has mean $\bar{d}$ and sample variance $s_d^2$.

From:

$T = \frac{\bar{X} - \mu}{s/\sqrt{n}}$

we get

$T = \frac{\bar{d} - \mu_d}{s_d/\sqrt{n}}$, with n − 1 DF.

CI: $\left(\bar{d} - c\,s_d/\sqrt{n},\ \bar{d} + c\,s_d/\sqrt{n}\right)$

Steps:

  • calculate the differences $d_i = x_i - y_i$
  • calculate the mean of the differences $\bar{d}$
  • calculate the variance $s_d^2 = \frac{1}{n-1}\sum_{i=1}^{n}(d_i - \bar{d})^2$ and the standard deviation $s_d = \sqrt{s_d^2}$
  • find $c = t_{\alpha/2,\,n-1}$ (DF = n − 1)
  • find the CI $\left(\bar{d} - c\,s_d/\sqrt{n},\ \bar{d} + c\,s_d/\sqrt{n}\right)$
  • check whether 0 is included and interpret the result
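The steps above as a Python sketch (standard library only; `statistics.stdev` uses the n − 1 divisor). The function name `paired_t_ci` and the before/after scores are hypothetical, and c = $t_{0.025,5} \approx 2.571$ is taken from a t table for n = 6 pairs and p = 0.95.

```python
from math import sqrt
from statistics import fmean, stdev

def paired_t_ci(before, after, c):
    """CI for mu_d = mu1 - mu2 from paired data; c is t_{alpha/2, n-1}."""
    d = [x - y for x, y in zip(before, after)]   # differences d_i = x_i - y_i
    n = len(d)
    dbar = fmean(d)                              # mean of the differences
    sd = stdev(d)                                # standard deviation, divisor n - 1
    half = c * sd / sqrt(n)
    return (dbar - half, dbar + half)

# Hypothetical before/after scores for n = 6 subjects; t_{0.025, 5} = 2.571
before = [72, 68, 75, 80, 66, 71]
after = [75, 70, 79, 81, 70, 74]
lo, hi = paired_t_ci(before, after, c=2.571)
print(lo, hi, lo <= 0 <= hi)   # is 0 inside the interval?
```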
17
Q

What are the differences, pros and cons of two-sample tests versus paired observations?

A
  • Two-sample tests assume that X1…Xm and Y1…Yn are independent of each other, while the paired test only assumes that the pairs (X1,Y1)…(Xn,Yn) are independent of one another, which is a less risky assumption.
  • Pros/cons:
    If the population variation σ² is large and there is a strong correlation within pairs, the paired experiment is preferred.
    If the population variation σ² is small and the correlation is weak, an independent-samples experiment is preferable.
18
Q

What is the formula to calculate the confidence interval for the population variance σ^2 ?

A

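Assumptions: the population is normally distributed (the Central Limit Theorem does not apply to the sample variance).

For a normal population, $(n-1)s^2/\sigma^2$ has a chi-squared distribution $\chi^2(n-1)$.

Steps:
- The chi-squared distribution $\chi^2(n-1)$ is not symmetric, so the two critical values must be looked up separately: $\chi^2_{\alpha/2,\,n-1}$ (upper-tail area α/2, the larger value) and $\chi^2_{1-\alpha/2,\,n-1}$ (upper-tail area 1 − α/2, the smaller value), with α = 1 − p.
- Compute $s = \sqrt{\frac{\sum x^2 - (\sum x)^2/n}{n-1}}$
- CI for σ²: $\left(\frac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}},\ \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}}\right)$; taking square roots gives the CI for σ: $\left(s\sqrt{\frac{n-1}{\chi^2_{\alpha/2,\,n-1}}},\ s\sqrt{\frac{n-1}{\chi^2_{1-\alpha/2,\,n-1}}}\right)$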
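A Python sketch of the interval for σ² using the shortcut formula for s². The χ² critical values are passed in from a table (for DF = 9: $\chi^2_{0.025,9} \approx 19.023$ and $\chi^2_{0.975,9} \approx 2.700$, as upper-tail areas); the sample and the function name `variance_ci` are hypothetical.

```python
from math import sqrt

def variance_ci(data, chi2_upper, chi2_lower):
    """CI for sigma^2 from a normal sample.

    chi2_upper = chi^2_{alpha/2, n-1}   (upper-tail area alpha/2, the larger value)
    chi2_lower = chi^2_{1-alpha/2, n-1} (upper-tail area 1 - alpha/2, the smaller value)
    """
    n = len(data)
    sum_x = sum(data)
    sum_x2 = sum(x * x for x in data)
    s2 = (sum_x2 - sum_x**2 / n) / (n - 1)   # shortcut formula for s^2
    return ((n - 1) * s2 / chi2_upper, (n - 1) * s2 / chi2_lower)

# Hypothetical sample, n = 10, DF = 9
sample = [4.8, 5.1, 5.3, 4.9, 5.0, 5.4, 5.2, 4.7, 5.1, 5.0]
lo, hi = variance_ci(sample, chi2_upper=19.023, chi2_lower=2.700)
print(lo, hi)               # interval for sigma^2
print(sqrt(lo), sqrt(hi))   # square roots give the interval for sigma
```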

19
Q
A