Week 2 - Lesson 5.1 The Standard Normal Probability Distribution Flashcards

1
Q

Most high schools have a set amount of time between classes during which students must get to their next class. If
you were to stand at the door of your statistics class and watch the students coming in, think about how the students
would enter. Usually, one or two students enter early, then more students come in, then a large group of students
enter, and finally, the number of students entering decreases again, with one or two students barely making it on
time, or perhaps even coming in late!
Now consider this. Have you ever popped popcorn in a microwave? Think about what happens in terms of the
rate at which the kernels pop. For the first few minutes, nothing happens, and then, after a while, a few kernels
start popping. This rate increases to the point at which you hear most of the kernels popping, and then it gradually
decreases again until just a kernel or two pops.
Here’s something else to think about. Try measuring the height, shoe size, or the width of the hands of the students in
your class. In most situations, you will probably find that there are a couple of students with very low measurements
and a couple with very high measurements, with the majority of students centered on a particular value.

All of these examples show a typical pattern that seems to be a part of many real-life phenomena. In statistics,
because this pattern is so pervasive, it seems fitting to call it normal, or more formally, the normal distribution. The
normal distribution is an extremely important concept, because it occurs so often in the data we collect from the
natural world, as well as in many of the more theoretical ideas that are the foundation of statistics. This chapter
explores the details of the normal distribution.

A

-read

2
Q

Characteristics of a Normal Distribution

A

When graphing the data from each of the examples in the introduction, the distributions from each of these situations
would be mound-shaped and mostly symmetric. A normal distribution is a perfectly symmetric, mound-shaped
distribution. It is commonly referred to as a normal curve, or bell curve.

3
Q

Because so many real data sets closely approximate a normal distribution, we can use the idealized normal curve to
learn a great deal about such data. With a practical data collection, the distribution will never be exactly symmetric,
so just like situations involving probability, a true normal distribution only results from an infinite collection of data.
Also, it is important to note that the normal distribution describes a continuous random variable.

A

-read

4
Q

Due to the exact symmetry of a normal curve, the center of a normal distribution, or a data set that approximates a
normal distribution, is located at the highest point of the distribution, and all the statistical measures of center we
have already studied (the mean, median, and mode) are equal.

It is also important to realize that this center peak divides the data into two equal parts.

A

Center

5
Q

Let’s go back to our popcorn example. The bag advertises a certain time, beyond which you risk burning the popcorn.
From experience, the manufacturers know when most of the popcorn will stop popping, but there is still a chance that
there are those rare kernels that will require more (or less) time to pop than the time advertised by the manufacturer.
The directions usually tell you to stop when the time between popping is a few seconds, but aren’t you tempted to
keep going so you don’t end up with a bag full of un-popped kernels? Because this is a real, and not theoretical,
situation, there will be a time when the popcorn will stop popping and start burning, but there is always a chance, no
matter how small, that one more kernel will pop if you keep the microwave going. In an idealized normal distribution
of a continuous random variable, the distribution continues infinitely in both directions.

A

Spread

6
Q

Because of this infinite spread, the range would not be a useful statistical measure of spread. The most common way
to measure the spread of a normal distribution is with the standard deviation, or the typical distance away from the
mean. Because of the symmetry of a normal distribution, the standard deviation indicates how far away from the
maximum peak the data will be. Here are two normal distributions with the same center (mean):

The first distribution pictured above has a smaller standard deviation, and so more of the data are heavily concentrated
around the mean than in the second distribution. Also, in the first distribution, there are fewer data values at the
extremes than in the second distribution. Because the second distribution has a larger standard deviation, the data
are spread farther from the mean value, with more of the data appearing in the tails.
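The effect of the standard deviation on spread can be checked numerically. The following is a minimal Python sketch, assuming two hypothetical normal distributions that share a mean of 5 and have standard deviations of 1 and 2 (these values are illustrative and not taken from the missing figure). It compares how much of each distribution lies within 1 unit of the mean.

```python
from statistics import NormalDist

def share_within(dist, distance):
    """Fraction of a normal distribution lying within `distance` of its mean."""
    return dist.cdf(dist.mean + distance) - dist.cdf(dist.mean - distance)

# Hypothetical distributions: same center, different spread.
narrow = NormalDist(mu=5, sigma=1)
wide = NormalDist(mu=5, sigma=2)

narrow_share = share_within(narrow, 1.0)
wide_share = share_within(wide, 1.0)

print(f"sigma = 1: {narrow_share:.3f} of the data within 1 unit of the mean")
print(f"sigma = 2: {wide_share:.3f} of the data within 1 unit of the mean")
```

The distribution with the smaller standard deviation should report the larger share, which matches the description of data being more heavily concentrated around the mean.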

A

-read

7
Q

Technology Note: Investigating the Normal Distribution on a TI-83/84 Graphing Calculator

We can graph a normal curve for a probability distribution on the TI-83/84 calculator. To do so, first press [Y=].
To create a normal distribution, we will draw an idealized curve using something called a density function. The
command is called ’normalpdf(’, and it is found by pressing [2nd][DISTR][1]. Enter an X to represent the random
variable, followed by the mean and the standard deviation, all separated by commas. For this example, choose a
mean of 5 and a standard deviation of 1.
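For readers without a calculator, the height of a normal density curve can be computed directly from the usual normal density formula. The sketch below is an illustration in Python, not a description of how the calculator works internally; the function name normal_pdf is hypothetical. It uses the same mean of 5 and standard deviation of 1.

```python
import math

def normal_pdf(x, mean, sd):
    """Height of the normal density curve at x for the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

# Evaluate the curve at the mean and one standard deviation on either side.
peak = normal_pdf(5, 5, 1)
left = normal_pdf(4, 5, 1)
right = normal_pdf(6, 5, 1)

print(f"height at the mean: {peak:.4f}")
print(f"height one SD below and above: {left:.4f}, {right:.4f}")
```

The curve is highest at the mean, and the two heights one standard deviation away are equal, reflecting the symmetry of the distribution.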

A

-read

8
Q

Because of the similar shape of all normal distributions, we can measure the percentage of data that is a certain
distance from the mean no matter what the standard deviation of the data set is. The following graph shows a normal
distribution with µ = 0 and σ = 1. This curve is called a standard normal curve. In this case, the values of x represent
the number of standard deviations away from the mean.

A

The Empirical Rule

9
Q

Notice that vertical lines are drawn at points that are exactly one standard deviation to the left and right of the
mean. We have consistently described standard deviation as a measure of the typical distance away from the mean.
How much of the data is actually within one standard deviation of the mean? To answer this question, think about
the space, or area, under the curve. The entire data set, or 100% of it, is contained under the whole curve. What
percentage would you estimate is between the two lines? To help estimate the answer, we can use a graphing
calculator. Graph a standard normal distribution over an appropriate window.

A

-read

10
Q

Technology Note: Investigating the Normal Distribution on a TI-83/84 Graphing Calculator
We can graph a normal curve for a probability distribution on the TI-83/84 calculator. To do so, first press [Y=].
To create a normal distribution, we will draw an idealized curve using something called a density function. The
command is called ’normalpdf(’, and it is found by pressing [2nd][DISTR][1]. Enter an X to represent the random
variable, followed by the mean and the standard deviation, all separated by commas. For this example, choose a
mean of 5 and a standard deviation of 1.

Adjust your window to match the following settings and press [GRAPH].

Press [2ND][QUIT] to go to the home screen. We can draw a vertical line at the mean to show it is in the center of the
distribution by pressing [2ND][DRAW] and choosing ’Vertical’. Enter the mean, which is 5, and press [ENTER].

Remember that even though the graph appears to touch the x-axis, it is actually just very close to it.
In your Y= Menu, enter the following to graph 3 different normal distributions, each with a different standard
deviation:

This makes it easy to see the change in spread when the standard deviation changes.

A

-read

11
Q

The calculator also gives a very accurate estimate of the area. We can see from the rightmost screenshot above that
approximately 68% of the area is within one standard deviation of the mean. If we venture to 2 standard deviations
away from the mean, how much of the data should we expect to capture? Make the following changes to the
’ShadeNorm(’ command to find out:

Notice from the shading that almost all of the distribution is shaded, and the percentage of data is close to 95%. If
you were to venture to 3 standard deviations from the mean, 99.7%, or virtually all of the data, is captured, which
tells us that very little of the data in a normal distribution is more than 3 standard deviations from the mean.
Notice that the calculator actually makes it look like the entire distribution is shaded because of the limitations of the
screen resolution, but as we have already discovered, there is still some area under the curve further out than that.
These three approximate percentages, 68%, 95%, and 99.7%, are extremely important and are part of what is called
the Empirical Rule.

A

-

12
Q

The Empirical Rule states that the percentages of data in a normal distribution within 1, 2, and 3 standard deviations
of the mean are approximately 68%, 95%, and 99.7%, respectively.
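The three percentages can be reproduced without a calculator. The following Python sketch relies on the standard identity that the area within k standard deviations of the mean of a normal distribution equals erf(k / √2); the function name area_within is hypothetical.

```python
import math

def area_within(k):
    """Area under any normal curve within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {area_within(k):.4f}")
```

The printed values should round to 68%, 95%, and 99.7%, which is why the rule is stated as an approximation.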

A

-read

13
Q

A z-score is a measure of the number of standard deviations a particular data point is away from the mean. For
example, let’s say the mean score on a test for your statistics class was an 82, with a standard deviation of 7 points.
If your score was an 89, it is exactly one standard deviation to the right of the mean; therefore, your z-score would
be 1. If, on the other hand, you scored a 75, your score would be exactly one standard deviation below the mean,
and your z-score would be -1. All values that are below the mean have negative z-scores, while all values that are
above the mean have positive z-scores. A z-score of -2 would represent a value that is exactly 2 standard deviations
below the mean, so in this case, the value would be 82 - 14 = 68.

A

-

14
Q

To calculate a z-score for which the numbers are not so obvious, you take the deviation and divide it by the standard
deviation.

You may recall that deviation is the mean value of the variable subtracted from the observed value, so in symbolic
terms, the z-score would be:

z = (x - µ) / σ

As previously stated, since σ is always positive, z will be positive when x is greater than µ and negative when x is
less than µ. A z-score of zero means that the value of x is equal to the mean. The value of z represents the
number of standard deviations the given value of x is above or below the mean.
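The calculation described above, dividing the deviation by the standard deviation, can be sketched in Python as follows. The function name z_score is hypothetical; the numbers are the test scores from the earlier card (mean 82, standard deviation 7).

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd

print(z_score(89, 82, 7))  # 1.0  (one standard deviation above the mean)
print(z_score(75, 82, 7))  # -1.0 (one standard deviation below the mean)
print(z_score(82, 82, 7))  # 0.0  (equal to the mean)
```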

A

-

15
Q

Example: What is the z-score for an A on the test described above, which has a mean score of 82? (Assume that an
A is a 93.)

A

The z-score can be calculated as follows:

z = (x - µ) / σ = (93 - 82) / 7 = 11 / 7 ≈ 1.57

16
Q

If we know that the test scores from the last example are distributed normally, then a z-score can tell us something
about how our test score relates to the rest of the class. From the Empirical Rule, we know that about 68% of the
students would have scored between a z-score of -1 and 1, or between a 75 and an 89, on the test. If 68% of the
data is between these two values, then that leaves the remaining 32% in the tail areas. Because of symmetry, half of
this, or 16%, would be in each individual tail.
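The split into a middle region and two tails can be checked with a short Python sketch. It assumes, as the passage does, that the test scores are exactly normal with mean 82 and standard deviation 7; the variable names are illustrative.

```python
from statistics import NormalDist

# Assumed model of the test scores: normal, mean 82, standard deviation 7.
scores = NormalDist(mu=82, sigma=7)

middle = scores.cdf(89) - scores.cdf(75)   # between z = -1 and z = 1
lower_tail = scores.cdf(75)                # below z = -1
upper_tail = 1 - scores.cdf(89)            # above z = 1

print(f"between 75 and 89: {middle:.4f}")
print(f"below 75: {lower_tail:.4f}")
print(f"above 89: {upper_tail:.4f}")
```

Each tail comes out slightly under 16% because the 68% figure in the Empirical Rule is itself rounded.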

A

-

17
Q

Example: On a nationwide math test, the mean was 65 and the standard deviation was 10. If Robert scored 81, what
was his z-score?

A

z = (x - µ) / σ = (81 - 65) / 10 = 1.6

18
Q

Example: On a college entrance exam, the mean was 70, and the standard deviation was 8. If Helen’s z-score was
1.5, what was her exam score?

A

x = µ + zσ = 70 + (1.5)(8) = 82
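Going from a z-score back to a raw score reverses the z-score formula: multiply by the standard deviation and add the mean. A minimal Python sketch, with the hypothetical function name raw_score, applied to the exam in this card:

```python
def raw_score(z, mean, sd):
    """Convert a z-score back to the original scale: x = mean + z * sd."""
    return mean + z * sd

helen = raw_score(1.5, 70, 8)
print(helen)  # 82.0

# Round trip: converting the raw score back should recover the z-score.
print((helen - 70) / 8)  # 1.5
```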

19
Q

The best way to determine if a data set approximates a normal distribution is to look at a visual representation.
Histograms and box plots can be useful indicators of normality, but they are not always definitive. It is often easier
to tell if a data set is not normal from these plots.

A

Assessing normality

20
Q

If a data set is skewed right, it means that the right tail is significantly longer than the left. Similarly, skewed
left means the left tail has more weight than the right. A bimodal distribution, on the other hand, has two modes,
or peaks. For instance, with a histogram of the heights of American 30-year-old adults, you will see a bimodal
distribution, with one mode for males and one mode for females.
There is a plot we can use to determine if a distribution is normal called a normal probability plot or normal quantile
plot. To make this plot by hand, first order your data from smallest to largest. Then, determine the quantile of each
data point. Finally, using a table of standard normal probabilities, determine the closest z-score for each quantile.
Plot these z-scores against the actual data values. To make a normal probability plot using your calculator, enter
your data into a list, then use the last type of graph in the STAT PLOT menu, as shown below:

If the data set is normal, then this plot will be perfectly linear. The closer to being linear the normal probability plot
is, the more closely the data set approximates a normal distribution.
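The by-hand procedure above can be sketched in Python. Two assumptions are made here that the text does not specify: the quantile of the i-th ordered value out of n is taken as (i - 0.5) / n (other conventions exist), and linearity is summarized with a Pearson correlation coefficient. The sample data and function names are hypothetical.

```python
from statistics import NormalDist

def normal_plot_points(data):
    """Pair each ordered data value with the z-score of its quantile."""
    ordered = sorted(data)
    n = len(ordered)
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    points = []
    for i, value in enumerate(ordered, start=1):
        quantile = (i - 0.5) / n  # assumed plotting position
        points.append((standard_normal.inv_cdf(quantile), value))
    return points

def correlation(points):
    """Pearson correlation of the (z, value) pairs; near 1 means nearly linear."""
    n = len(points)
    mean_z = sum(z for z, _ in points) / n
    mean_v = sum(v for _, v in points) / n
    cov = sum((z - mean_z) * (v - mean_v) for z, v in points)
    var_z = sum((z - mean_z) ** 2 for z, _ in points)
    var_v = sum((v - mean_v) ** 2 for _, v in points)
    return cov / (var_z * var_v) ** 0.5

# Hypothetical sample, roughly mound-shaped and symmetric.
sample = [61, 64, 66, 67, 68, 69, 70, 71, 72, 74, 77]
points = normal_plot_points(sample)
r = correlation(points)

for z, value in points:
    print(f"{value:>3}  z = {z:+.2f}")
print(f"correlation: {r:.4f}")
```

A correlation close to 1 suggests the plotted points fall near a straight line. It is a rough numerical summary and does not replace looking at the plot itself.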
Look below at the histogram and the normal probability plot for the same data.

A

-

21
Q

The histogram is fairly symmetric and mound-shaped and appears to display the characteristics of a normal distribution. When the z-scores of the quantiles of the data are plotted against the actual data values, the normal probability
plot appears strongly linear, indicating that the data set closely approximates a normal distribution. The following
example will allow you to see how a normal probability plot is made in more detail.

A

-

22
Q

Example: The following data set tracked high school seniors’ involvement in traffic accidents. The participants were
asked the following question: “During the last 12 months, how many accidents have you had while you were driving
(whether or not you were responsible)?”

A

Figure: Percentage of high school seniors who said they were involved in no traffic accidents. Source: Sourcebook
of Criminal Justice Statistics: http://www.albany.edu/sourcebook/pdf/t352.pdf
Here is a histogram and a box plot of this data:

The histogram appears to show a roughly mound-shaped and symmetric distribution. The box plot does not appear
to be significantly skewed, but the various sections of the plot also do not appear to be overly symmetric, either. In
the following chart, the data has been reordered from smallest to largest, the quantiles have been determined, and
the closest corresponding z-scores have been found using a table of standard normal probabilities.

23
Q

Figure: Table of quantiles and corresponding z-scores for senior no-accident data.
Here is a plot of the percentages versus the z-scores of their quantiles, or the normal probability plot:

Remember that you can simplify this process by simply entering the percentages into a list such as L1 in your calculator and
selecting the normal probability plot option (the last type of plot) in STAT PLOT.
While not perfectly linear, this plot does have a strong linear pattern, and we would, therefore, conclude that the
distribution is reasonably normal.

A

-

24
Q

A normal distribution is a perfectly symmetric, mound-shaped distribution that appears in many practical and real
data sets. It is an especially important foundation for making conclusions, or inferences, about data. A standard
normal distribution is a normal distribution for which the mean is 0 and the standard deviation is 1.

A

-

25
Q

A z-score is a measure of the number of standard deviations a particular data value is away from the mean. The
formula for calculating a z-score is:

A

z = (x - µ) / σ, where x is the data value, µ is the mean, and σ is the standard deviation.

26
Q

z-scores are useful for comparing two distributions with different centers and/or spreads. When you convert an entire
distribution to z-scores, you are actually changing it to a standardized distribution. z-scores can be calculated for
data, even if the underlying population does not follow a normal distribution.
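As an illustration of comparing scores from distributions with different centers and spreads, the sketch below reuses two examples from earlier cards: a score of 81 on the nationwide math test (mean 65, standard deviation 10) and a score of 93 on the statistics test (mean 82, standard deviation 7). Treating them as two results to be compared is a hypothetical scenario added for this example.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

math_z = z_score(81, 65, 10)    # nationwide math test
stats_z = z_score(93, 82, 7)    # statistics class test

print(f"math test z-score: {math_z:.2f}")
print(f"statistics test z-score: {stats_z:.2f}")

# The larger z-score is the stronger result relative to its own distribution.
stronger = "math" if math_z > stats_z else "statistics"
print(f"stronger relative result: {stronger}")
```

Although 93 is the higher raw score, the 81 sits slightly farther above its own mean in standard deviation units, which is the kind of comparison z-scores make possible.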

A

-

27
Q

The Empirical Rule is the name given to the observation that approximately 68% of a normally distributed data set
is within 1 standard deviation of the mean, about 95% is within 2 standard deviations of the mean, and about 99.7%
is within 3 standard deviations of the mean. Some refer to this as the 68-95-99.7 Rule.

A

-

28
Q

You should learn to recognize the normality of a distribution by examining the shape and symmetry of its visual
display. A normal probability plot, or normal quantile plot, is a useful tool to help check the normality of a
distribution. This graph is a plot of the z-scores of the data as quantiles against the actual data values. If a distribution
is normal, this plot will be linear.

A

-