Flashcards in L6 - Estimating from a Sample: The Sample Mean & Confidence Intervals Deck (13)

1

## Why do we need Samples?

###
More often than not we use samples to infer about the population.

Why do we need samples in the first place?

- Only available data;

- Costly and time consuming to use the population, if not impossible;

- Could be even counter-productive (destructive sampling)

2

## How precise is sample information?

###
- We need specialist sampling techniques to make sure the sample is representative and accurate. These are beyond the scope of this module.

- Example of a bad sample: a poll put out by a newspaper is biased towards people who read newspapers, towards people who read that particular paper, and towards those readers' own political views

3

## What is it called when we use information from a sample to find something out about the population?

###
- When we get information from a sample to find out something about the population we use what is called an estimator.

- For example, the sample mean X(bar) is an estimator for the population mean (μ), and the sample variance (s^2) is an estimator for the population variance (σ^2).

- μ and σ^2 are parameters: they are fixed values of the population and do not change from sample to sample

4

## What is the formula for the estimator?

###
An estimator is a formula and it defines a variable:

- For example the estimator of μ is:

- $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} x_i$

An estimate is the numerical value we get from applying that formula to our sample of data

- For example, if we get X(bar) = 4.22 then 4.22 is our estimate
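The difference between the estimator (the formula) and the estimate (the number) can be sketched in a few lines of Python. The five observations below are made-up values, chosen only so that the result comes out close to the 4.22 used above; `sample_mean` is a hypothetical helper name.

```python
from statistics import variance

def sample_mean(data):
    """Estimator: the formula X(bar) = (1/n) * sum of the x_i."""
    return sum(data) / len(data)

# Hypothetical sample of five observations (made-up numbers).
sample = [4.0, 4.5, 3.9, 4.6, 4.1]

estimate = sample_mean(sample)  # the estimate: one number from one sample
s2 = variance(sample)           # sample variance s^2 (divides by n - 1)

print(estimate, s2)
```

A different sample would give a different estimate from the same estimator.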

5

## What is the Distribution of the Sample Mean?

###
- Consider a variable X ~ (μ, σ^2) and assume that:

- We know what σ^2 is

- We don’t know what μ is

- Example: the height of students in a country; since we cannot measure them all to compute μ, we use a sample of students and calculate X(bar)

- The value we get clearly depends on the sample we used:

- Sample a will give a value of X^a(bar)

- Sample b will give a value of X^b(bar)

- Infinite samples --> infinite possible X(bar) values we could get

- This means that:

- the sample mean (like all estimators) is a random variable; as such it follows a distribution with a mean and a variance.

6

## What would the sampling distribution of the sample mean most likely tend towards?

###
- So the sample mean is an estimator, i.e. a variable: its values are all the possible values that I would get from infinite samples.

- If we did this, we would find that values close to μ are more likely, and values far from μ less likely. I.e., more sample means would be closer to the population mean, and fewer would be further away.

- It has been demonstrated that:

- X(bar) ~ (μ,σ^2/n)

- X(bar) is distributed with mean = μ and variance = σ^2/n where:

- σ^2 is the variance of X (the height of students in the whole population)

- n is the sample size
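A small simulation can make this result concrete. The sketch below assumes a normal population with mean 170 and standard deviation 10 (illustrative values, not from the notes) and uses 20,000 samples of size 25 to stand in for "infinite" samples.

```python
import random
from statistics import fmean, pvariance

random.seed(0)  # fixed seed so the run is repeatable

MU, SIGMA, N = 170.0, 10.0, 25  # assumed population mean, sd and sample size
N_SAMPLES = 20_000              # stands in for "infinite" samples

# Draw many samples and record the sample mean of each one.
sample_means = [
    fmean(random.gauss(MU, SIGMA) for _ in range(N))
    for _ in range(N_SAMPLES)
]

mean_of_means = fmean(sample_means)    # should be close to MU
var_of_means = pvariance(sample_means) # should be close to SIGMA**2 / N = 4

print(mean_of_means, var_of_means)
```

Under these assumptions, the mean of the simulated sample means should sit near μ and their variance near σ^2/n.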

7

## When would the sampling distribution be normal?

###
- The sampling distribution will be normal, i.e. X(bar) ~ N(μ, σ^2/n), if either:

- X ~ N (the variable itself is normally distributed), or

- n is large (e.g. n > 30), by the Central Limit Theorem

- Central Limit Theorem --> when independent random variables are added together, their normalised sum tends to be Normal (i.e. it approaches normality as n --> ∞)

So what does all this practically mean?

- If the distribution of X(bar) is centred around μ then on average we are going to "get it right": we are more likely to get values close to μ than far from it

- We write this as E[X(bar)] = μ and say that the sample mean is an unbiased estimator of the population mean
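The Central Limit Theorem can be illustrated with a strongly skewed variable. The sketch below assumes an exponential distribution with rate 1 (so μ = 1) and a sample size of 50; both are illustrative choices. It compares how lopsided the raw observations are with how lopsided the sample means are.

```python
import random
from statistics import fmean

random.seed(1)  # fixed seed so the run is repeatable

MU = 1.0           # mean of an exponential distribution with rate 1
N = 50             # sample size, assumed "large"
N_SAMPLES = 10_000

# Raw observations: strongly skewed, so well over half fall below the mean.
raw = [random.expovariate(1.0) for _ in range(N_SAMPLES)]
share_raw_below = sum(x < MU for x in raw) / N_SAMPLES

# Sample means: roughly symmetric around MU, as the CLT suggests.
sample_means = [
    fmean(random.expovariate(1.0) for _ in range(N))
    for _ in range(N_SAMPLES)
]
share_means_below = sum(m < MU for m in sample_means) / N_SAMPLES

print(share_raw_below, share_means_below, fmean(sample_means))
```

The share of sample means below μ should be much closer to one half than the share of raw observations, and the average of the sample means should be close to μ, in line with E[X(bar)] = μ.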

8

## What is the Property of Consistency with an Unbiased estimator of the population mean?

###
The variance of the sampling distribution, σ^2/n, decreases as n increases; as n --> ∞ it tends to 0 and X(bar) --> μ: this is the property of consistency.
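As a quick numerical illustration, the sketch below tabulates σ^2/n for growing sample sizes, assuming a known population variance of 100 (an arbitrary value chosen for the example).

```python
SIGMA2 = 100.0  # assumed known population variance (illustrative value)

# Variance of the sample mean for increasing sample sizes.
variances = {n: SIGMA2 / n for n in (10, 100, 1_000, 10_000)}

for n, var in variances.items():
    print(n, var)
```

Each tenfold increase in n cuts the variance of X(bar) by a factor of ten, so the sample means cluster ever more tightly around μ.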

9

## What do you need to be careful about with the two types of variance?

###
Do not confuse the variance of the sample mean (σ^2/n) with the sample variance s^2!

The first is the variance of the sample means across all possible samples; the second is the variance within our own sample, e.g. the height of the 10 children selected.
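The two quantities can be computed side by side. The sketch below assumes a known population variance of 36 and uses ten made-up heights in centimetres; neither comes from the notes.

```python
from statistics import variance

SIGMA2 = 36.0  # assumed known population variance of height (cm^2)

# Hypothetical sample: heights of 10 children, in cm (made-up numbers).
heights = [128, 135, 131, 140, 126, 133, 138, 129, 134, 136]
n = len(heights)

var_of_sample_mean = SIGMA2 / n      # spread of X(bar) across repeated samples
sample_variance = variance(heights)  # spread of heights within this one sample

print(var_of_sample_mean, sample_variance)
```

The first number depends only on σ^2 and n; the second depends on the particular children who happened to be selected.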

10

## What is a summary of the main logical points when estimating μ with the sample mean?

###
- We start from a variable X ~ (μ, σ^2), e.g. heights;

- We know what σ^2 is but not what μ is;

- We need to estimate μ, and we do this by using a sample.

- The value X(bar) that we get is one observation from the distribution of the variable “sample mean”.

- X(bar) ~ N(μ, σ^2/n) if either:

- X ~ N, or

- n is large (e.g. n > 30)

11

## What is a confidence interval?

###
- Nature of the problem: for X ~ N(μ, σ^2), we want to find what μ is

- Assume we know σ^2

- We collect a sample and calculate X(bar), which is expected to be close to, but not identical to, μ

- What is the uncertainty around our point estimate? We build a symmetric range around it with a chosen probability (e.g. 95%).

- This is called a confidence interval with confidence level (1-α)

- Let’s do this in steps starting from Z~N(0,1)
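One way to read the 95% is as a long-run success rate: if we repeated the sampling many times, about 95% of the intervals built this way would contain μ. The sketch below checks this by simulation, assuming a normal population with μ = 170 and σ = 10 (illustrative values; in practice μ is unknown) and using the 95% critical value 1.96.

```python
import random
from math import sqrt
from statistics import fmean

random.seed(2)  # fixed seed so the run is repeatable

MU, SIGMA, N = 170.0, 10.0, 25  # assumed values; in practice MU is unknown
N_SAMPLES = 5_000
half_width = 1.96 * SIGMA / sqrt(N)  # same width for every sample, since sigma is known

covered = 0
for _ in range(N_SAMPLES):
    x_bar = fmean(random.gauss(MU, SIGMA) for _ in range(N))
    # Does the interval around this sample mean contain the true mean?
    if x_bar - half_width < MU < x_bar + half_width:
        covered += 1

coverage = covered / N_SAMPLES
print(coverage)  # expected to be close to 0.95
```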

12

## How do you calculate confidence intervals?

###
- First we choose the confidence level, e.g. 95%

- This leaves two tails with a total area α of 5%, i.e. each worth α/2 = 2.5%. The value α is called the significance level, and the C.I. has area (1-α).

- So yellow area = C.I. = (1-α) = 95%

- Green area in each tail: α/2 = 2.5%.

- We need to find the two critical points, which we know are +/-1.96 (from the second table of critical values):

- $P(z_1 < Z < z_2) = 0.95$, with $z_1 = -1.96$ and $z_2 = +1.96$

- Now we use $P(-1.96 < Z < +1.96) = 0.95$ to work out a confidence interval for the mean μ using the sample mean X(bar):

- If $\bar{X} \sim N(\mu, \sigma^2/n)$ then $P\left(-1.96 < \frac{\bar{X} - \mu}{\sqrt{\sigma^2/n}} < +1.96\right) = 0.95$

Now we simply rearrange the inequality so that μ is in the middle:

$P\left(\bar{X} - 1.96\sqrt{\sigma^2/n} < \mu < \bar{X} + 1.96\sqrt{\sigma^2/n}\right) = 0.95$
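The rearrangement can be spelled out step by step. Starting from the standardised inequality, multiply through by $\sqrt{\sigma^2/n}$, subtract $\bar{X}$ from each part, then multiply by $-1$ (which reverses the inequality signs):

```latex
\begin{align*}
-1.96 &< \frac{\bar{X}-\mu}{\sqrt{\sigma^2/n}} < 1.96 \\
-1.96\sqrt{\sigma^2/n} &< \bar{X}-\mu < 1.96\sqrt{\sigma^2/n} \\
-\bar{X}-1.96\sqrt{\sigma^2/n} &< -\mu < -\bar{X}+1.96\sqrt{\sigma^2/n} \\
\bar{X}-1.96\sqrt{\sigma^2/n} &< \mu < \bar{X}+1.96\sqrt{\sigma^2/n}
\end{align*}
```

The lower limit carries a minus sign and the upper limit a plus sign, and the probability attached to the statement stays at 0.95 throughout.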

The two values for our range are then simply calculated as:

- $x_1 = \bar{X} - 1.96\sqrt{\sigma^2/n}$

- $x_2 = \bar{X} + 1.96\sqrt{\sigma^2/n}$
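The calculation of the two limits can be sketched in Python. This is a minimal illustration, assuming a known population variance of 36 and ten made-up heights; `confidence_interval` is a hypothetical helper name. The critical value is taken from the standard library's `NormalDist` rather than from a printed table.

```python
from math import sqrt
from statistics import NormalDist, fmean

def confidence_interval(sample, sigma2, level=0.95):
    """Confidence interval for the mean when the population variance is known."""
    n = len(sample)
    x_bar = fmean(sample)
    alpha = 1 - level
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when level = 0.95
    half_width = z * sqrt(sigma2 / n)
    return x_bar - half_width, x_bar + half_width

# Hypothetical sample: heights of 10 children, in cm (made-up numbers).
heights = [128, 135, 131, 140, 126, 133, 138, 129, 134, 136]

x1, x2 = confidence_interval(heights, sigma2=36.0)  # sigma2 is an assumed value
print(x1, x2)
```

The interval is centred on the sample mean, and passing a different `level` (for example 0.99) gives a wider interval because the critical value grows.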

13