Flashcards in L6 - Estimating from a Sample: The Sample Mean & Confidence Intervals Deck (13)
1
Why do we need Samples?
More often than not we use samples to infer about the population.
Why do we need samples in the first place?
- Often the only data available;
- Using the whole population is costly and time-consuming, if not impossible;
- It could even be counter-productive (destructive sampling)
2
How precise is sample information?
- We need specialist sampling techniques to make sure the sample is
representative and accurate. Beyond the scope of this module.
- Bad sample --> poll put out by a newspaper --> biased towards people who read newspapers, people who read that particular paper, and their political views
3
What is it called when we get information from a sample to find something out about the population?
- When we get information from a sample to find out something about the population we use what is called an estimator.
- For example, the sample mean X(bar) is an estimator for the population
mean (μ), and the sample variance (s^2) is an estimator for the population
variance (σ^2).
- μ and σ^2 are parameters --> they do not change
4
What is the formula for the estimator?
An estimator is a formula and it defines a variable:
- For example the estimator of μ is:
- X(bar) = (1/n) Σ^n_{i=1} x{i}
An estimate is the numerical value we get from applying that formula
to our sample of data
- For example, if we get X(bar) = 4.22, then 4.22 is our estimate
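A minimal sketch of the estimator/estimate distinction: the function below is the formula (the estimator), and the number it returns for one concrete sample is the estimate. The sample values are made up so that the result matches the 4.22 example above.

```python
def sample_mean(xs):
    """Estimator of mu: X-bar = (1/n) * sum of the x_i."""
    return sum(xs) / len(xs)

sample = [3.9, 4.5, 4.1, 4.38]    # hypothetical data, chosen for illustration
estimate = sample_mean(sample)    # applying the formula gives the estimate
print(round(estimate, 2))         # 4.22
```

A different sample would give a different estimate from the same estimator, which is exactly the point of the next card.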
5
What is the Distribution of the Sample Mean?
- Consider a variable X ~ (μ, σ^2) and assume that:
- We know what σ^2 is
- We don’t know what μ is
- Ex: the height of students in a country; since we cannot measure them
all to compute μ we use a sample of students and calculate X(bar)
- The value we get clearly depends on the sample we used:
- Sample a will give a value of X^a(bar)
- Sample b will give a value of X^b(bar)
- Infinite samples --> infinite possible X(bar) values we could get
- This means that the sample mean (like all estimators) is a variable;
as such it follows a distribution with a mean and a variance.
6
What would the sampling distribution of the sample mean most likely tend towards?
- So the sample mean is an estimator, i.e. a variable: its values are all the
possible values that I would get from infinite samples.
- If we did this, we would find that values close to μ are more likely, and
values far from μ less likely. I.e., more sample means would be closer to
the population mean, and fewer would be further away.
- It has been demonstrated that:
- X(bar) ~ (μ,σ^2/n)
- X(bar) is distributed with mean = μ and variance = σ^2/n where:
- σ^2 is the variance of X (the height of students in the whole population)
- n is the sample size
7
When would a sample distribution be normal?
- The distribution will be normal, i.e. X(bar) ~ N(μ,σ^2/n), if:
- X ~ N, or
- n is large (e.g. n > 30: from the Central Limit Theorem)
- Central Limit Theorem --> when independent variables are added together, their normalised sum tends to be Normal (i.e. it approaches normality as n --> ∞)
So what does all this practically mean?
- If the distribution of X(bar) is centred around μ then on average we are going to "get it right": we are more likely to get values close to μ than far from it
- We write this as E[X(bar)] = μ and say that the sample mean is an unbiased estimator of the population mean
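A sketch of the Central Limit Theorem claim above, with made-up numbers: even when X itself is not normal (here an exponential variable, which is heavily skewed), for n > 30 the sample means behave roughly normally. One simple check is that about 68% of sample means fall within one standard error, sqrt(σ^2/n), of μ, as a normal distribution would predict.

```python
import random

random.seed(1)
mu = 1.0                                  # for Expo(1): mu = 1, sigma^2 = 1
n = 40                                    # n > 30, per the rule of thumb
se = (1.0 / n) ** 0.5                     # standard error: sqrt(sigma^2 / n)
means = [sum(random.expovariate(1.0) for _ in range(n)) / n
         for _ in range(10000)]           # 10,000 sample means
within_1se = sum(abs(m - mu) < se for m in means) / len(means)
print(round(within_1se, 2))               # close to 0.68 for a normal shape
```

The underlying X is skewed, yet the sample means already look close to normal at n = 40.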
8
What is the Property of Consistency with an Unbiased estimator of the population mean?
The variance of the sampling distribution, σ^2/n, decreases with n; as n --> ∞
it tends to 0 and X(bar) --> μ: this is the property of consistency.
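Consistency in numbers, assuming a hypothetical σ^2 = 100: the variance of X(bar) is σ^2/n, so quadrupling to a tenfold sample shrinks the spread of the sample mean by the same factor.

```python
sigma_sq = 100.0                 # assumed population variance (hypothetical)
for n in (10, 100, 1000, 10000):
    print(n, sigma_sq / n)       # variance of X-bar: 10.0, 1.0, 0.1, 0.01
```

As n --> ∞ the variance σ^2/n --> 0, so the distribution of X(bar) collapses onto μ.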
9
What do you need to be careful about with the two types of variance?
Do not confuse the variance of the sample mean (σ^2/n ) with
the sample variance s^2 !!
One is the variance of the various sample means, the other the variance
within our own sample, e.g. the height of the 10 children selected.
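The warning above in a small worked example (all figures made up): s^2 measures the spread *within* one sample, while σ^2/n measures the spread *across* the sample means of repeated samples.

```python
heights = [120, 135, 128, 142, 131, 125, 138, 129, 133, 127]  # 10 children
n = len(heights)
xbar = sum(heights) / n

# Sample variance s^2: spread WITHIN this one sample (n - 1 denominator).
s_sq = sum((x - xbar) ** 2 for x in heights) / (n - 1)
print(round(s_sq, 2))       # 41.73

# Variance of the sample mean: spread ACROSS repeated sample means,
# assuming the population variance sigma^2 is known (hypothetical value).
sigma_sq = 36.0
print(sigma_sq / n)         # sigma^2/n = 3.6
```

The two numbers answer different questions and will generally not coincide.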
10
What is a Summary of the main logical points to calculate the Sample Mean?
- We start from a variable X ~ (μ,σ^2); (eg heights)
- We know what σ^2 is but not what μ is;
- We need to estimate μ, and we do this by using a sample.
- The value X(bar) that we get is an observation from the distribution of the
variable X(bar) which is the variable “sample mean”.
- X(bar) ~N (μ,σ^2/n) if
- X ~ N
- n > 30
11
What is a confidence interval?
- Nature of the problem: for X ~ N(μ,σ^2), we want to find what μ is
- Assume we know σ^2
- We collect a sample and compute X(bar), which is expected to be close to, but not identical to, μ
- What is the uncertainty around our point estimate? We build a
symmetric range with a certain probability (eg 95%) around it.
- This is called a confidence interval with probability (1-α)
- Let’s do this in steps starting from Z~N(0,1)
12
How do you calculate confidence intervals?
- First we choose the size of the confidence interval e.g. 95%
- This leaves two tails summing to a total area α of 5%, i.e. each worth α/2 =
2.5%. This value α is called significance level, and the C.I. has area (1-α).
- So the central area = C.I. = (1-α) = 95%
- The area in each tail: α/2 = 2.5%
- We need to find the two critical points, which we know are +/-1.96 (from the second table of critical values):
- P(z{1} < Z < z{2}) = 0.95
- Now we use P(-1.96 < Z < +1.96) = 0.95 to work out a confidence interval for the mean μ using the sample mean X(bar) :
- If X(bar) ~ N(μ,σ^2/n) then P(-1.96 < (X(bar) - μ)/sqrt(σ^2/n) < +1.96) = 0.95
Now we simply rearrange the inequality so that μ is in the middle:
P(X(bar) - 1.96sqrt(σ^2/n) < μ < X(bar) + 1.96sqrt(σ^2/n)) = 0.95
The two values for our range are then simply calculated as:
- X(bar) - 1.96sqrt(σ^2/n) = x{1}
- X(bar) + 1.96sqrt(σ^2/n) = x{2}
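The steps above can be sketched in code. The inputs (X(bar) = 4.22, σ^2 = 0.25, n = 100) are made up for illustration; only the critical value 1.96 comes from the notes.

```python
xbar, sigma_sq, n = 4.22, 0.25, 100        # hypothetical sample results
z = 1.96                                   # critical value for 1 - alpha = 0.95
half_width = z * (sigma_sq / n) ** 0.5     # 1.96 * sqrt(sigma^2 / n)
x1 = xbar - half_width                     # lower limit of the 95% C.I.
x2 = xbar + half_width                     # upper limit of the 95% C.I.
print(round(x1, 3), round(x2, 3))          # 4.122 4.318
```

So with these numbers we would be 95% confident that μ lies between 4.122 and 4.318.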
13