Week 6 - Central limit theorem Flashcards

1
Q

The Central Limit Theorem is a very powerful statement in statistics. It says that as you take more and more
samples from a random variable, the distribution of the means of those samples (if you completed the lesson titled
“The Mean of Means”, you will recognize this as “the sampling distribution of the sample means”) will approximate
a normal distribution. This is true regardless of the original distribution of the random variable (as a rule of thumb,
when each sample contains 30 or more data points)! In fact, as demonstrated in the video above, even a discrete random
variable with a pretty odd distribution will yield approximately normally distributed means, given enough samples.

A

Central limit theorem

2
Q

Formally, the CLT says:

A

If samples of size n are drawn at random from any population with a finite mean and standard deviation, then
the sampling distribution of the sample means, x̄, approximates a normal distribution as n increases.

3
Q

If you collect many samples from an ordinary random variable, and calculate the mean of each sample, then
the means will be distributed in an approximate bell-curve, and the “mean of means” will be the same as the
mean of the population. The larger the size of the samples you collect, the more closely the distribution of
their means will approximate a normal distribution.
Notes to remember:

A
  • As long as your sample size is 30 or greater, you may assume the distribution of the sample means to be
    approximately normal, meaning that you can use the z-score of a sample mean to calculate the probability that
    the mean of a single sample of size 30 or greater falls in a given range.
  • The mean of the distribution created from many sample means approaches the mean of the population.
    Formally: µ_x̄ = µ
  • The standard deviation of the distribution of the means is found by dividing the standard deviation of the
    population by the square root of the sample size. Formally: σ_x̄ = σ/√n
  • Use the notation x̄ (x-bar) rather than the random variable x to indicate that the random variable you are
    describing is a sample mean.
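
The notes above can be checked with a small simulation. This is only a sketch under assumptions that are not in the original cards: the population is taken to be exponential with mean 1 and standard deviation 1 (a strongly skewed distribution), the function name `simulate_sample_means` is made up for illustration, and the seed is fixed so the run is repeatable.

```python
import random
import statistics

def simulate_sample_means(n, num_samples, seed=0):
    """Draw num_samples samples of size n from an exponential population
    (mean 1, standard deviation 1) and return the list of sample means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(num_samples)
    ]

n = 30
means = simulate_sample_means(n=n, num_samples=20_000)

mean_of_means = statistics.fmean(means)   # should be close to the population mean, 1
sd_of_means = statistics.stdev(means)     # should be close to sigma / sqrt(n)
predicted_sd = 1.0 / n ** 0.5             # sigma / sqrt(n) with sigma = 1

print(f"mean of means: {mean_of_means:.3f} (population mean 1.000)")
print(f"sd of means:   {sd_of_means:.3f} (sigma/sqrt(n) = {predicted_sd:.3f})")
```

With 20,000 samples of size 30, both printed values should land close to the values the notes predict, even though the population itself is far from bell-shaped.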
4
Q

As long as your sample size is 30 or greater, you may assume the distribution of the sample means to be
approximately normal, meaning that you can use the z-score of a sample mean to calculate the probability that
the mean of a single sample of size 30 or greater falls in a given range.

A

r

5
Q

The mean of the distribution created from many sample means approaches the mean of the population.
Formally: µ_x̄ = µ

A

r

6
Q

The standard deviation of the distribution of the means is found by dividing the standard deviation of the
population by the square root of the sample size. Formally: σ_x̄ = σ/√n

A

r

7
Q

Use the notation x̄ (x-bar) rather than the random variable x to indicate that the random variable you are
describing is a sample mean.

A

r

8
Q

z table

A

analyze

9
Q

Mack asked 42 fellow high-school students how much they spent for lunch, on average. According to his research
online, the amount spent for lunch by high school students nationwide has µ = $15, with σ = $9. What is the
probability that the mean of Mack’s random sample will fall within $0.01 of the national average?

A

There are a few important facts to note here:
* Mack’s sample is 42 students. Since 42 ≥ 30, he can safely assume that the distribution of his sample mean is
approximately normal, according to the Central Limit Theorem.
* The range we are considering is $14.99 to $15.01, since that represents $0.01 above and below the mean.
* The mean of the sampling distribution equals the mean of the population, in other words µ_x̄ = µ
* The standard deviation of Mack’s sample mean, σ_x̄, can be calculated as σ_x̄ = σ/√n, where n = 42

Let’s start by finding the standard deviation of the sample mean, σ_x̄:
σ_x̄ = 9/√42 = 9/6.48
σ_x̄ = 1.389

Since the mean of Mack’s sample of 42 students can be assumed to be normally distributed, and since we now know the
standard deviation of the sample mean, 1.389, we can calculate the z-scores of the range using Z = (x̄ − µ_x̄) / σ_x̄:
Z1 = (15.01 − 15.00) / 1.389 = 0.0072 ≈ +0.01
Z2 = (14.99 − 15.00) / 1.389 = −0.0072 ≈ −0.01

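Finally, we look up Z1 and Z2 on the Z-score probability table to get 50.40% and 49.60%, a difference of 0.80%.
Because the z-scores were rounded up to ±0.01 for the table, this overstates the result; the unrounded z-scores of
about ±0.0072 give roughly 0.57%.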
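
The table lookup can be cross-checked numerically. The sketch below assumes Python 3.8 or later (for `statistics.NormalDist`); the variable names are illustrative and not from the original card. It computes the probability twice: once from the unrounded sampling distribution, and once with z rounded to two decimals as a printed z-table requires.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 15.00, 9.00, 42
se = sigma / sqrt(n)  # standard deviation of the sample mean

# Probability from the unrounded sampling distribution of the mean
sampling_dist = NormalDist(mu=mu, sigma=se)
p_exact = sampling_dist.cdf(15.01) - sampling_dist.cdf(14.99)

# Table-style lookup: z rounded to two decimals first
z = round((15.01 - mu) / se, 2)
std_normal = NormalDist()
p_table = std_normal.cdf(z) - std_normal.cdf(-z)

print(f"se = {se:.3f}")
print(f"unrounded z:        P = {p_exact:.4f}")
print(f"z rounded to {z:.2f}: P = {p_table:.4f}")
```

The gap between the two results comes entirely from rounding a very small z-score up to the nearest table entry.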

10
Q

The time it takes a student to complete the mid-term for Algebra II is a bimodal distribution with µ = 1 hr and
σ = 1 hr. During the month of June, Professor Spence administers the test 64 times. What is the probability that the
average mid-term completion time for students during the month of June exceeds 48 minutes?

A

Important facts:
* The sample size is 64, which is more than 30, so the Central Limit Theorem applies
* The mean of the sampling distribution equals the mean of the population, in other words µ_x̄ = µ
* The standard deviation of the sample mean for Professor Spence’s tests, σ_x̄, can be calculated as σ_x̄ = σ/√n,
where n = 64 (the number of tests)
* 48 minutes is the same as 48/60 = 0.8 hrs, so the range we are interested in is x̄ > 0.8 hrs

First calculate the standard deviation of the sample mean, using σ_x̄ = σ/√n:
σ_x̄ = 1/√64
σ_x̄ = 0.125

Since the sample mean is approximately normally distributed, according to the CLT, we can use the standard deviation
of the sample mean to calculate the z-score of the minimum value in the relevant range, 0.80 hrs:
Z = (0.80 − 1) / 0.125 = −1.60
Finally, we use the z-score probability reference above to find the probability of a value greater than a z-score
of −1.60:
P(Z > −1.6) = 0.9452 or 94.52%
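
For "exceeds" questions like this one, the upper-tail probability can be wrapped in a small helper. This is a sketch, not part of the original card: the function name `prob_mean_exceeds` is hypothetical, and it relies on the CLT normal approximation described above.

```python
from math import sqrt
from statistics import NormalDist

def prob_mean_exceeds(threshold, mu, sigma, n):
    """P(sample mean > threshold) under the CLT normal approximation."""
    se = sigma / sqrt(n)  # standard deviation of the sample mean
    return 1.0 - NormalDist(mu, se).cdf(threshold)

# Professor Spence: mu = 1 hr, sigma = 1 hr, n = 64, threshold = 48 minutes
p = prob_mean_exceeds(threshold=48 / 60, mu=1.0, sigma=1.0, n=64)
print(f"P(mean > 0.8 hr) = {p:.4f}")
```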

11
Q

Evan price-checked 123 online auction sellers to record their average asking price for his favorite game. According
to a major national price-checking site, the national average online auction cost for the game is $35.00 with a standard
deviation of $3.00. Evan found the prices to be less than $34.86 on average. How likely is this result?

A

Since the sample size exceeds 30 (123 > 30), we can apply the CLT and treat the distribution of the sample mean as
normal.
The standard deviation of the sample mean is: σ_x̄ = 3/√123 = 3/11.09 = 0.27
The z-score for Evan’s price point of $34.86 is:
Z = (34.86 − 35.00) / 0.27 = −0.14/0.27 = −0.518
Consulting the z-score probability table, we learn that the area under the normal curve to the left of −0.52 is
0.3015, or 30.15%.
The likelihood of a sample of 123 sellers having a mean below $34.86 is approximately 30.15%.
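
The three steps in this answer (standard deviation of the mean, z-score, lower-tail area) can be followed one at a time in code. This is a sketch with illustrative variable names; because it does not round the intermediate values, its result differs slightly from the table-based 30.15%.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 35.00, 3.00, 123
x_bar = 34.86

se = sigma / sqrt(n)         # step 1: standard deviation of the sample mean
z = (x_bar - mu) / se        # step 2: z-score of Evan's average
p = NormalDist().cdf(z)      # step 3: area to the left of z

print(f"se = {se:.4f}, z = {z:.3f}, P(mean < 34.86) = {p:.4f}")
```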

12
Q

What is the Central Limit Theorem? How does the Central Limit Theorem relate other distributions to the normal
distribution?

A

The Central Limit Theorem says that the larger the sample size, the more closely the distribution of the means of
multiple samples will resemble a normal distribution. Since that is true regardless of the original distribution, the
CLT can be used as a bridge between other types of distributions and a normal distribution.

13
Q

The time it takes to drive from Cheyenne WY to Denver CO has a µ of 1 hr and σ of 15 minutes. Over the course of
a month, a hig

A

The mean of the sample means, µ_x̄, is the same as the population mean: 1 hr = 60 mins.
The standard deviation of the sample mean is σ_x̄ = 15 mins / √55 = 15/7.42 = 2.02 min
The 55 trips made by the patrolman exceed the minimum sample size of 30 required to apply the CLT, so we may
assume the sample means to be normally distributed.
The z-score of the patrolman’s average time is: (60 − 60) / 2.02 = 0/2.02 = 0
According to the z-score percentage reference, a z-score of 0 corresponds to .50 or 50%
There is a 50% probability that the patrolman’s mean travel time is greater than 60 mins.

14
Q

Abbi polls 95 high school students for their GPA. According to the school, the GPA of high school students
has a mean of 3.0 and a standard deviation of 0.5. What is the probability that Abbi’s random sample will have a
mean within 0.01 of the population mean?

A

The mean of the sample means for the 95 polled GPA scores is the same as the population mean: 3.0
The standard deviation of the sample mean is σ_x̄ = 0.5/√95 = 0.5/9.75 = 0.05
The 95 sampled GPAs exceed the minimum sample size of 30, so we may apply the CLT.
The z-scores of the minimum and maximum values in the range of interest, 2.99 to 3.01, are:
Z1 = (2.99 − 3.00) / 0.05 = −0.01/0.05 = −0.2
Z2 = (3.01 − 3.00) / 0.05 = 0.01/0.05 = +0.2
Referring to the z-score reference table, the z-scores −0.2 and +0.2 cover a range of approximately 15.86%

15
Q

A recipe website has calculated that the time it takes to cook Sunday dinner has µ of 1 hour with σ of 25 minutes.
Over the course of a month, 172 users report their time spent cooking Sunday dinner. What is the probability that
the average user reports spending less than 45 minutes cooking dinner?

A

The mean of the sample means, µ_x̄, is the same as the population mean: 1 hr = 60 mins.
The standard deviation of the sample mean is σ_x̄ = 25 mins / √172 = 25/13.11 = 1.91 min
The 172 users reporting cooking times exceed the minimum sample size of 30 required to apply the CLT, so we may
assume the sample means to be normally distributed.
The z-score of the average reported cooking time is: (45 − 60) / 1.91 = −15/1.91 = −7.85
According to the z-score percentage reference, a z-score of −7.85 corresponds to essentially 0%.
There is essentially zero probability that 172 users would average only 45 mins.
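
A z-score this extreme falls off the end of a printed table, so it can help to see how small the probability is. The sketch below is an illustration under the same normal approximation; it uses `math.erfc` rather than a plain CDF call, since the complementary error function keeps precision in the far tail. Without the intermediate rounding used above, z comes out near −7.87 rather than −7.85.

```python
from math import erfc, sqrt

mu, sigma, n = 60.0, 25.0, 172   # minutes
se = sigma / sqrt(n)             # standard deviation of the sample mean
z = (45.0 - mu) / se

# Lower-tail area of the standard normal: Phi(z) = 0.5 * erfc(-z / sqrt(2))
p = 0.5 * erfc(-z / sqrt(2))

print(f"z = {z:.2f}")
print(f"P(mean < 45 min) = {p:.3e}")
```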

16
Q

128 randomly-sampled students reported how much they spent on a movie at the theater. If the national
amount spent at the movies has a mean of $15 and standard deviation of $8, what is the probability
that the random sample will give a result within $0.01 of the true value?

A

r

17
Q

The time an American family spends doing dishes in the evening has µ = 60 mins and σ = 60 mins. 58
Americans were polled to find the time they spend doing dishes. What is the probability that their average
time exceeds 60 minutes?

A

r

18
Q

Rachel asked 65 second-year college students how many credits they have taken. According to the colleges,
the average number of credits taken by second-year students is 15, with a standard deviation of 7. How likely is it
that Rachel got less than 17.17 on average?

A

r

19
Q

What do you need in order to apply the Central Limit Theorem to sample means?

A

r

20
Q

117 businesswomen were asked how much they spend for lunch, on average. If the national amount has a
mean of $30 and standard deviation of $9, what is the probability that the random sample will return a
result within $0.01 of the true value?

A

r

21
Q

According to the phone company, the daily average number of calls made by Americans is 30, with a standard
deviation of 10. What is the probability that 117 Americans reported less than 30.92 calls per day, on average?

A

r

22
Q

The time spent by the average technician repairing a laptop is governed by an exponential distribution where
µ and σ are each 60 minutes. In the month of June, a technician repairs 76 laptops. How likely is it that the
average repair time is greater than 77 minutes?

A

r

23
Q

46 teenagers were asked how many .mp3’s they purchase each month. According to .mp3 sales data, the number
purchased has a mean of 15, with a standard deviation of 2. How likely is it that the 46 polled teens averaged within
0.02 of the national average?

A

r

24
Q

44 classrooms were investigated to see how many students they contained. According to school data, the
average number of students per classroom is 35, with a standard deviation of 10. How likely is it that the 44
classrooms averaged fewer than 33.49 students?

A

r

25
Q

100 bags of candy were counted to see how many pieces they contained. According to the company that fills
the bags, the number of candies per bag has a mean of 50 and standard deviation of 10. What is
the probability that the 100 bags will have an average number within 0.02 of the production average?

A

r