L6 - Estimating from a Sample: The Sample Mean & Confidence Intervals Flashcards Preview

18ECA005 - Data Analysis II > L6 - Estimating from a Sample: The Sample Mean & Confidence Intervals > Flashcards

1

Why do we need Samples?

More often than not we use samples to infer about the population.

Why do we need samples in the first place?
- Only available data;
- Costly and time consuming to use the population, if not impossible;
- Could be even counter-productive (destructive sampling)

2

How precise is sample information?

- We need specialist sampling techniques to make sure the sample is
representative and accurate. These are beyond the scope of this module.

- Example of a bad sample: a poll run by a newspaper --> biased towards people who read newspapers, and in particular readers of that paper, who tend to share its political views

3

What is it called when we use information from a sample to find something out about the population?

- When we get information from a sample to find out something about the population we use what is called an estimator.
- For example, the sample mean X(bar) is an estimator for the population
mean (μ), and the sample variance (s^2) is an estimator for the population
variance (σ^2).
- μ and σ^2 are parameters --> fixed values that do not change from sample to sample

4

What is the formula for the estimator?

An estimator is a formula and it defines a variable:
- For example the estimator of μ is:
- X(bar) = (1/n) × Σ_{i=1}^{n} x_i

An estimate is the numerical value we get from applying that formula
to our sample of data
- For example, if we get X(bar) = 4.22 then 4.22 is our estimate
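
A minimal Python sketch of the estimator/estimate distinction: the function is the estimator (the formula), and the number it returns for one particular sample is the estimate. The five data values are hypothetical, chosen only so that the estimate matches the 4.22 used above.

```python
def sample_mean(xs):
    """Estimator: the formula X(bar) = (1/n) * sum of x_i."""
    return sum(xs) / len(xs)

# Hypothetical sample of n = 5 observations (illustrative values only).
sample = [4.0, 4.5, 3.9, 4.3, 4.4]

# Estimate: the number obtained by applying the estimator to this sample.
estimate = sample_mean(sample)
print(round(estimate, 2))  # 4.22
```

A different sample would give a different estimate, while the estimator (the function) stays the same.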

5

What is the Distribution of the Sample Mean?

- Consider a variable X ~ (μ, σ^2) and assume that:
- We know what σ^2 is
- We don’t know what μ is

- Ex: the height of students in a country; since we cannot measure them
all to compute μ we use a sample of students and calculate X(bar)
- The value we get clearly depends on the sample we used:
- Sample a will give a value of X^a(bar)
- Sample b will give a value of X^b(bar)
- Infinite samples --> infinite possible X(bar) values we could get

- This means that:
- the sample mean (like all estimators) is a variable;
as such it follows a distribution with a mean and a variance.

6

What would the sampling distribution of the sample mean most likely tend towards?

- So the sample mean is an estimator, i.e. a variable: its values are all the
possible values that I would get from infinite samples.
- If we did this, we would find that values close to μ are more likely, and
values far from μ less likely. I.e., more sample means would be closer to
the population mean, and fewer would be further away.

- It has been demonstrated that:
- X(bar) ~ (μ,σ^2/n)

- X(bar) is distributed with mean = μ and variance = σ^2/n where:
- σ^2 is the variance of X (the height of students in the whole population)
- n is the sample size
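
The result above can be checked with a small simulation. This is only a sketch: the population values (μ = 170, σ = 10), the sample size and the number of repetitions are assumptions made for illustration, and a large finite number of samples stands in for "infinite samples".

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is reproducible

mu, sigma = 170.0, 10.0   # assumed population mean and standard deviation
n = 25                    # sample size
num_samples = 20_000      # finite stand-in for "infinite samples"

# Draw many samples of size n and record the sample mean of each one.
sample_means = [
    statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
var_of_means = statistics.pvariance(sample_means)

print(f"mean of sample means:     {mean_of_means:.2f} (theory: {mu})")
print(f"variance of sample means: {var_of_means:.2f} (theory: {sigma**2 / n})")
```

The two simulated figures should land close to μ and σ^2/n respectively, though not exactly on them, because only a finite number of samples is drawn.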

7

When would a sample distribution be normal?

- The distribution will be normal, i.e. X(bar) ~ N(μ,σ^2/n), if either:
- X ~ N, or
- n is large (e.g. n > 30), from the Central Limit Theorem
- Central Limit Theorem --> when independent, identically distributed variables with finite variance are added together, their normalised sum tends to be Normal (i.e. it approaches normality as n --> ∞)

So what does all this practically mean?
- If the distribution of X(bar) is centred around μ then on average we are going to "get it right": we are more likely to get values close to μ than far from it
- We write this as E[X(bar)] = μ and say that the sample mean is an unbiased estimator of the population mean
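
The Central Limit Theorem and unbiasedness can both be illustrated with a skewed population. In the sketch below the population is assumed to be exponential with mean 1 (an illustrative choice, not from the module). Skewness is used as a rough symmetry check: it is 0 for a Normal distribution.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

def skewness(values):
    """Sample skewness: close to 0 for a symmetric distribution such as the Normal."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)

def simulate_sample_means(n, num_samples=20_000):
    """Sample means of num_samples samples of size n from an exponential population (mean 1)."""
    return [
        statistics.fmean(random.expovariate(1.0) for _ in range(n))
        for _ in range(num_samples)
    ]

means_n1 = simulate_sample_means(1)    # n = 1: just the skewed population itself
means_n40 = simulate_sample_means(40)  # n > 30: sample means are far more symmetric

print(f"skewness, n = 1:  {skewness(means_n1):.2f}")
print(f"skewness, n = 40: {skewness(means_n40):.2f}")
print(f"average of sample means, n = 40: {statistics.fmean(means_n40):.3f} (population mean: 1.0)")
```

The skewness falls sharply as n grows, and the average of the sample means stays close to the population mean, which is what E[X(bar)] = μ says.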

8

What is the Property of Consistency with an Unbiased estimator of the population mean?

The variance of the sampling distribution, σ^2/n, decreases as n increases; as n --> ∞
it tends to 0 and X(bar) --> μ : this is the property of consistency.

9

What do you need to be careful about with the two types of variance?

Do not confuse the variance of the sample mean (σ^2/n ) with
the sample variance s^2 !!

One is the variance of the various sample means, the other the variance
within our own sample, e.g. the height of the 10 children selected.
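
A short sketch of the difference, using made-up numbers: the ten heights and the "known" population variance σ^2 = 36 are assumptions for illustration only.

```python
import statistics

sigma2 = 36.0  # assumed known population variance (illustrative)

# Hypothetical heights (cm) of the 10 children selected; illustrative values only.
heights = [128, 131, 125, 140, 136, 129, 133, 127, 138, 132]
n = len(heights)

s2 = statistics.variance(heights)  # sample variance s^2: spread within this one sample
var_of_mean = sigma2 / n           # variance of the sample mean: spread of X(bar) across samples

print(f"s^2       = {s2:.2f}")
print(f"sigma^2/n = {var_of_mean:.2f}")
```

s^2 estimates σ^2 (how much individual heights vary), whereas σ^2/n describes how much the sample mean itself would vary from one sample to the next, so the second is smaller by a factor of n.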

10

What is a Summary of the main logical points to calculate the Sample Mean?

- We start from a variable X ~ (μ,σ^2); (eg heights)
- We know what σ^2 is but not what μ is;
- We need to estimate μ, and we do this by using a sample.
- The value X(bar) that we get is an observation from the distribution of the
variable X(bar) which is the variable “sample mean”.
- X(bar) ~ N(μ,σ^2/n) if either
- X ~ N, or
- n > 30

11

What is a confidence interval?

- Nature of the problem: for X ~ N(μ,σ^2), we want to find what μ is
- Assume we know σ^2
- We collect a sample and estimate X(bar) which is expected to be close to but not identical to μ
- What is the uncertainty around our point estimate? We build a
symmetric range with a certain probability (eg 95%) around it.
- This is called a confidence interval with probability (1-α)
- Let’s do this in steps starting from Z~N(0,1)

12

How do you calculate confidence intervals?

- First we choose the size of the confidence interval e.g. 95%
- This leaves two tails summing to a total area α of 5%, i.e. each worth α/2 =
2.5%. This value α is called significance level, and the C.I. has area (1-α).
- So yellow area = C.I. = (1-α) = 95%
- Green area in each tail: α/2 = 2.5%.
- We need to find the two critical points, which we know are +/-1.96 (from the second table of critical values):
- P(z{1} < Z < z{2}) = 0.95

- Now we use P(-1.96 < Z < +1.96) = 0.95 to work out a confidence interval for the mean μ using the sample mean X(bar) :
- If X(bar) ~ N(μ,σ^2/n) then P(-1.96 < (X(bar) - μ)/sqrt(σ^2/n) < +1.96) = 0.95

Now we simply rearrange the inequality so that μ is in the middle:
P(X(bar) - 1.96sqrt(σ^2/n) < μ < X(bar) + 1.96sqrt(σ^2/n)) = 0.95

The two values for our range are then simply calculated as:
- X(bar) - 1.96sqrt(σ^2/n) = x{1}
- X(bar) + 1.96sqrt(σ^2/n) = x{2}
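
The calculation above can be sketched in Python. The function name and the input values (X(bar) = 170, σ^2 = 100, n = 25) are hypothetical, and the critical value is taken from the standard library's `NormalDist` rather than from a printed table.

```python
import math
from statistics import NormalDist

def mean_confidence_interval(x_bar, sigma2, n, confidence=0.95):
    """Confidence interval for mu when the population variance sigma2 is known."""
    alpha = 1 - confidence                   # significance level
    z = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, about 1.96 for 95%
    half_width = z * math.sqrt(sigma2 / n)
    return x_bar - half_width, x_bar + half_width

# Hypothetical inputs (not from the module).
x1, x2 = mean_confidence_interval(x_bar=170.0, sigma2=100.0, n=25)
print(f"95% C.I.: ({x1:.2f}, {x2:.2f})")
```

Passing a different `confidence` (e.g. 0.99) changes the critical value and therefore the width of the interval.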

13

What is a Caveat for the Confidence interval interpretation?

- The interval looks like saying “there is a 95% probability that μ lies
between 126,300 and 233,700” but this is technically incorrect since μ
is a fixed value (a parameter, not a variable).
- Technically the meaning is that if we were to calculate the C.I. an infinite
number of times, using an infinite number of samples, 95% of the resulting
intervals would contain the value of μ.
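
This repeated-sampling interpretation can be illustrated with a simulation. It is a sketch under assumed values: μ = 170 and σ = 10 are treated as the true (normally unknown) parameters, and 10,000 samples stand in for "an infinite number of times".

```python
import math
import random
import statistics

random.seed(2)  # fixed seed so the run is reproducible

mu, sigma = 170.0, 10.0  # assumed true parameters (unknown in practice)
n = 25
num_intervals = 10_000   # finite stand-in for "an infinite number of samples"
half_width = 1.96 * math.sqrt(sigma**2 / n)

covered = 0
for _ in range(num_intervals):
    x_bar = statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
    # mu is fixed; it is the interval that changes from sample to sample.
    if x_bar - half_width < mu < x_bar + half_width:
        covered += 1

coverage = covered / num_intervals
print(f"share of intervals containing mu: {coverage:.3f}")
```

The share should come out close to 0.95: the probability statement is about the procedure that generates the intervals, not about μ moving around.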