AP Stat Ch 2 Flashcards

0
Q

Strategy for exploring data:

A

Always plot data: make a graph (histogram or stemplot)
Look for overall pattern (shape, center, spread) and for outliers
Calculate a numerical summary to briefly describe center and spread
Sometimes the overall pattern of a large number of observations is so regular that we can describe it by a smooth curve. The curve is a mathematical model for the distribution. It is an idealized description that gives an overall pattern of the data but ignores minor irregularities as well as any outliers.

1
Q

Empirical Rule

A

In a normal distribution, approximately 68% of the data fall within 1 standard deviation of the mean, 95% within 2 standard deviations, and 99.7% within 3 standard deviations.
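The three percentages can be checked numerically. This is a minimal sketch using `NormalDist` from Python's standard `statistics` module; the helper name `share_within` is made up for illustration.

```python
from statistics import NormalDist

# Standard normal model: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} sd: {share_within(k):.4f}")
```

The printed proportions are close to 0.68, 0.95, and 0.997, which is why the rule is stated with "approximately".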

2
Q

How the value of the standard deviation affects the bell curve

A

The larger the standard deviation, the wider and flatter the curve is.

A bell curve with s = 1 is much taller and narrower than one with s = 3.
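One way to see the "taller and narrower" claim is to compare the peak heights of the two curves. A small sketch, assuming both curves are centered at 0 (the card does not give a mean):

```python
from statistics import NormalDist

# Two normal curves with the same (assumed) mean and different spreads.
narrow = NormalDist(mu=0, sigma=1)
wide = NormalDist(mu=0, sigma=3)

# A normal curve is highest at its mean, so compare the heights there.
peak_narrow = narrow.pdf(0)
peak_wide = wide.pdf(0)
print(round(peak_narrow, 3), round(peak_wide, 3))  # 0.399 0.133
```

Because the total area under each curve must equal 1, tripling the standard deviation cuts the peak height to one third.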

3
Q

Percentile

A

One way to describe performance, or location in a distribution, is to use percentiles.

The pth percentile of a distribution is the value with p percent of the observations less than it.

4
Q

Example of percentile:

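Jenny’s score is the 22nd from the bottom (4th highest) out of 25 scores. What percentile is her score?

A

There are 21 values less than Jenny’s: 21/25 = 0.84, so her score is at the 84th percentile.

(n - 1) / total, where n is the score’s position counting up from the lowest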

5
Q
Another example of percentile:
Katie got the highest score in the class on the exam. There are 25 people in the class. What percentile is her score?
A

24 scores are less than Katie’s.

24/25 = 0.96, so her score is at the 96th percentile.
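Both percentile examples follow the same counting rule, sketched here in Python. The list of 25 scores is invented for illustration (the cards do not give the actual scores), and `percentile_of` is a hypothetical helper name.

```python
def percentile_of(score, scores):
    """Percent of observations strictly less than `score`."""
    below = sum(1 for s in scores if s < score)
    return 100 * below / len(scores)

# Hypothetical class of 25 distinct scores: 61, 62, ..., 85.
scores = list(range(61, 86))

jenny = sorted(scores)[21]  # 22nd from the bottom, so 21 scores are lower
katie = max(scores)         # highest score, so 24 scores are lower

print(percentile_of(jenny, scores))  # 84.0
print(percentile_of(katie, scores))  # 96.0
```

With this definition nobody can be at the 100th percentile, since a score is never less than itself.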

6
Q

Relative cumulative frequency

A

Instead of wanting to know which percent of the data falls into a particular class, we often want to know which percent falls below a certain value. To make this possible, we will compute the relative cumulative frequency for each class, which is the sum of the relative frequency of that class and all the classes below it.

Add up the relative frequencies of the classes at or below a value; this sum is the percent of the data that falls at or below that value.

7
Q

Example of relative cumulative frequency:
Say that 2 presidents were inaugurated at ages 40-44, 7 at ages 45-49, and 13 at ages 50-54, …
44 total presidents
What’s the RCF for the 40-44, 45-49, and 50-54 classes?

A

40-44: 2/44 presidents = 4.5%, so rel f = 4.5%. RCF = 4.5% too.
45-49: 7/44 presidents, so rel f = 15.9%. RCF = (2 + 7)/44 = 20.5% (adding the rounded percents, 4.5 + 15.9, gives 20.4, so work from the counts). 20.5% of the presidents were 49 or younger when inaugurated.
50-54: 13/44, so rel f = 29.5%. RCF = (2 + 7 + 13)/44 = 50%. 50% of the presidents were 54 or younger when inaugurated.
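The running-total arithmetic can be sketched in a few lines of Python. Only the three classes listed on the card are used; the remaining classes are elided in the source, so they are left out here and only the stated total of 44 is assumed.

```python
# Counts for the three age classes given on the card; the other classes
# are not listed in the source, so only the stated total (44) is used.
counts = {"40-44": 2, "45-49": 7, "50-54": 13}
total = 44

running = 0
rcf = {}
for age_class, count in counts.items():
    running += count                       # cumulative count so far
    rcf[age_class] = 100 * running / total
    print(f"{age_class}: rel f = {100 * count / total:.1f}%, "
          f"RCF = {rcf[age_class]:.1f}%")
```

Accumulating the counts and dividing once avoids the rounding drift that comes from adding already-rounded percentages.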

8
Q

Ogive

A

The graph of a relative cumulative frequency distribution is called an ogive.

9
Q

How to graph an ogive

A
Label and scale your axes and title your graph.
Plot a point corresponding to the RCF of each class interval at the left endpoint of the next class interval. For example, plot a point at 4.5% above the age value 45 to indicate that 4.5% of presidents were inaugurated before they were 45 years old.
Begin your ogive with a height of 0% at the left endpoint of the lowest class interval.
The last point should be at a height of 100%.
The y-axis is the RCF and the x-axis is the variable; for the presidents it is age.
4.5% of presidents are in 40-44, so plot a point at (45, 4.5%).
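The plotting rule (each RCF goes above the left endpoint of the next class) can be sketched by computing the ogive points directly. This uses only the three classes the cards give, so the list stops at 50% rather than reaching 100%; the class width of 5 is read off the intervals.

```python
# (left endpoint, count) for the classes listed on the cards; later
# classes are not given in the source, so the ogive here is incomplete.
classes = [(40, 2), (45, 7), (50, 13)]
width = 5
total = 44

# Start at 0% at the left endpoint of the lowest class.
points = [(classes[0][0], 0.0)]
running = 0
for left, count in classes:
    running += count
    # The RCF of a class is plotted at the left endpoint of the NEXT class.
    points.append((left + width, round(100 * running / total, 1)))

print(points)  # [(40, 0.0), (45, 4.5), (50, 20.5), (55, 50.0)]
```

Connecting these points with line segments gives the ogive.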
10
Q

Standardizing

A

Another way to describe position is to tell how many standard deviations above or below the mean an observation is. Converting scores from original values to standard deviation units is known as standardizing.

11
Q

Why do we standardize?

A

So that we can compare values from different, approximately normal distributions.
Every data set has its own set of values. For example, heights of people might range from 18 inches to 8 feet, and weights can range from 1 pound to 500 pounds. Such different scales make the data hard to compare, so we standardize: the values are rescaled to have a mean of zero and a standard deviation of one.

12
Q

Standardized score/ z score

A

Z score tells us how many standard deviations an observation is from the mean, and in which direction.

13
Q

How to calculate z score

A

z = (x - x bar) / s

(Value - mean) divided by the standard deviation.

14
Q

Example of z score:
Jenny got an 86 on the stat test. Mean is 80 and s = 6.07.
She got an 82 on the physics test. Mean is 76, s = 4.
On which test did she perform better relative to her class?

A
z_stat = (86 - 80) / 6.07 = 0.99
z_phys = (82 - 76) / 4 = 1.5
Her z score is higher on physics, so she did better relative to her class on the physics test.
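The comparison can be reproduced with a one-line z score function. A minimal sketch; the function and variable names are chosen here for illustration.

```python
def z_score(value, mean, sd):
    """Number of standard deviations `value` lies above (+) or below (-) the mean."""
    return (value - mean) / sd

z_stat = z_score(86, 80, 6.07)
z_phys = z_score(82, 76, 4)

print(round(z_stat, 2), round(z_phys, 2))  # 0.99 1.5
print("physics" if z_phys > z_stat else "stat")  # physics
```

Note the parentheses around the subtraction: the difference from the mean is computed first, then divided by the standard deviation.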
15
Q

How does standardization affect the distribution?

A

When we subtract the mean of the data from every data value, the mean becomes zero.
When we divide each of these shifted values by s, the standard deviation is divided by s as well. Since the standard deviation was s to start with, the new standard deviation is 1.

Shape: stays the same
Center: the mean is now zero
Spread: s is now one
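A quick numerical check of these three effects, using a small made-up data set (the values are hypothetical, not from the cards):

```python
from statistics import mean, stdev

data = [72, 75, 80, 86, 91]  # hypothetical scores for illustration

x_bar, s = mean(data), stdev(data)
z = [(x - x_bar) / s for x in data]  # standardize every value

# Center and spread after standardizing (up to floating-point error).
print(abs(mean(z)) < 1e-9)       # mean is 0
print(abs(stdev(z) - 1) < 1e-9)  # standard deviation is 1

# Shape: the ordering of the observations is unchanged.
print(sorted(z) == z)
```

Standardizing is a shift followed by a rescale, so it moves and stretches the distribution without changing its shape.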

16
Q

Density curve

A

A curve that is always on or above the x axis and has an area of exactly one underneath it. A density curve describes the overall pattern of a distribution. The area under the curve and above any range of values is the proportion of all observations that fall in that interval.
Density curves, like distributions, come in many shapes: normal, skewed left, or skewed right. Outliers are not described by density curves. A density curve is an approximation that is easy to use and accurate enough for practical use.
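The "area equals proportion" idea can be illustrated numerically. The sketch below uses the standard normal curve as one example of a density curve and approximates areas with the trapezoid rule; `area_under` is a hypothetical helper, and the range -6 to 6 stands in for the whole x axis because the curve is essentially zero beyond it.

```python
from statistics import NormalDist

def area_under(pdf, a, b, steps=10_000):
    """Trapezoid-rule approximation of the area under `pdf` between a and b."""
    h = (b - a) / steps
    total = 0.5 * (pdf(a) + pdf(b))
    for i in range(1, steps):
        total += pdf(a + i * h)
    return total * h

curve = NormalDist(0, 1).pdf  # one example of a density curve

whole = area_under(curve, -6, 6)   # total area, approximately 1
middle = area_under(curve, -1, 1)  # proportion of observations between -1 and 1
print(round(whole, 4), round(middle, 4))
```

The second area is the proportion of observations that fall between -1 and 1, roughly 0.68 for this curve.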

17
Q

Advantage of density curve over histogram

A

Doesn’t depend on our choice of classes

18
Q

Mean and median of a density curve

A

Mean: the balance point. This is hard to locate by eye, so calculate it mathematically and then locate it on the curve.
Median: the equal-areas point, which divides the area under the curve in half.

The median and mean are the same for a symmetric density curve; both lie at the center of the curve. The mean of a skewed curve is pulled away from the median in the direction of the long tail.

19
Q

When to use x bar and S vs sigma and mu

A

Use English letters (x bar, s) for statistics: measures computed from a data set.
Use Greek letters (mu, sigma) for parameters: measures of an idealized distribution.

20
Q

Normal distributions

A

Density curves that are symmetric, single peaked, and bell shaped are called normal curves, and they describe normal distributions.
A normal curve shows how often values occur (as relative frequencies) in a symmetric graph.
It has the following properties:
The graph is highest at the mean
Mean = median = mode
The data are symmetric about the mean

21
Q

Inflection points

A

Points on a curve at which the curvature changes direction. On a normal curve they lie one standard deviation on either side of the mean.

22
Q

What type of data is normally not normal

A

Income. Skewed right

23
Q

Example using empirical rule:
A battery has a mean life span of 50 hours with s = 3. Life spans are normally distributed.
A. What percent of batteries last at least 44 hours?
B. If we have 1500 batteries, how many are within one standard deviation of the mean?

A

A. 44 hours is 2 standard deviations below the mean (-2s).
95% of the data are within 2 s of the mean: 47.5% between -2s and mu, plus 50% greater than mu. So 97.5%.
B. 68% are within 1 s of the mean: .68 * 1500 = 1020 batteries.
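The empirical-rule answers are approximations; they can be compared with the exact normal areas. A sketch using `NormalDist` from Python's standard library, treating the card's 50 and 3 as the mean and standard deviation of the model:

```python
from statistics import NormalDist

battery = NormalDist(mu=50, sigma=3)  # life span model from the card

# A. Proportion lasting at least 44 hours (area to the right of 44).
at_least_44 = 1 - battery.cdf(44)

# B. Proportion within one standard deviation of the mean (47 to 53 hours).
within_one_sd = battery.cdf(53) - battery.cdf(47)

print(f"{at_least_44:.4f}")         # 0.9772
print(round(1500 * within_one_sd))  # 1024
```

The exact values (about 97.7% and about 1024 batteries) differ slightly from 97.5% and 1020 because 68 and 95 are rounded percentages.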

24
Q

Standard normal model

A

The normal distribution N(0,1) with mean 0 and standard deviation 1 is called the standard normal model. Remember that standardizing won’t change the shape: if the data were not normal before, they will not be normal after standardizing.

25
Q

Shorthand for normal dist.

A

N (mean, standard deviation)

26
Q

Standard normal z table

A

Table of areas under the standard normal curve. It gives the area under the curve to the left of z, which is also the percentile of z.

27
Q

How to use the z table

A

First convert data to z scores
Then look up the z score in the z table. If the z score is -2.15, look down the left column for the first two digits, -2.1, and then across the top row for the third digit, .05. The table gives the area .0158. That means 1.58% of z scores are less than -2.15.
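The row-and-column lookup can be mimicked by rebuilding one row of the table. This sketch assumes the usual table layout (rows give z to one decimal place, columns give the hundredths digit) and uses the standard library's `NormalDist` in place of a printed table.

```python
from statistics import NormalDist

std_normal = NormalDist()  # defaults to mean 0, standard deviation 1

row = -2.1  # left column of the table: first two digits of z
# Columns 0..9 are the hundredths digit; for a negative row the
# hundredths move z further below zero (-2.10, -2.11, ..., -2.19).
table_row = {col: round(std_normal.cdf(row - col / 100), 4) for col in range(10)}

print(table_row[5])  # area to the left of z = -2.15 -> 0.0158
```

Each entry is the area under the standard normal curve to the left of the corresponding z.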

28
Q

Example of analyzing z scores:
If a student scored a 720 on SAT math, and the mean is 500 and the standard deviation is 100, what is his z score and what does it mean according to the z table?

A

Z = (720-500)/100=2.2

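This means that the score is 2.2 standard deviations ABOVE the mean. The z table gives an area of .9861 to the left of z = 2.20, so the score is at about the 98.6th percentile.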
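The same calculation as a short Python sketch, with the z table replaced by `NormalDist().cdf` from the standard library (the variable names are chosen here for illustration):

```python
from statistics import NormalDist

score, mean_sat, sd_sat = 720, 500, 100

z = (score - mean_sat) / sd_sat    # standardize the score
area_left = NormalDist().cdf(z)    # what the z table would report

print(z)                    # 2.2
print(round(area_left, 4))  # 0.9861
```

So roughly 98.6% of scores fall below 720 under this normal model.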

29
Q

Example of z table:

What z score represents the first quartile in a normal model? Find the cut point for the 25th percentile.

A

Look in the body of the z table for the area closest to .25 (Q1); it corresponds to z = -.67. Then use the z formula to get the corresponding value in the original distribution.
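Working backward from an area to a z score is the inverse of the table lookup. In the sketch below, `inv_cdf` from Python's standard library plays the role of reading the table in reverse; the N(500, 100) distribution used to convert back to a raw score is an assumption for illustration, since the card does not name a distribution.

```python
from statistics import NormalDist

# z score with 25% of the area to its left (the first quartile).
z_q1 = NormalDist().inv_cdf(0.25)
print(round(z_q1, 2))  # -0.67

# Undo the z formula: value = mean + z * sd.
# The mean and sd here are hypothetical, chosen only for illustration.
mean, sd = 500, 100
cut_point = mean + z_q1 * sd
print(round(cut_point, 1))
```

Solving z = (x - mean) / sd for x gives x = mean + z * sd, which is the step "use the z formula" refers to.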

30
Q

Normal probability plot

A

Used to assess normality. If the plot is roughly linear, then the data set is roughly normal.
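A normal probability plot pairs each ordered observation with the z score expected at that position, so "how linear is it" can be summarized by the correlation between the two. This is a rough sketch, not a standard test: both data sets are made up, and the plotting position (i - 0.5) / n is one common convention among several.

```python
from statistics import NormalDist, mean

def normal_scores(n):
    """Expected z scores for n ordered observations, using (i - 0.5) / n."""
    return [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

def correlation(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical data sets: one roughly symmetric, one skewed right.
symmetric = sorted([44, 46, 47, 48, 49, 50, 50, 51, 52, 53, 54, 56])
skewed = sorted([1, 1, 2, 2, 3, 3, 4, 5, 8, 13, 21, 40])

r_symmetric = correlation(normal_scores(len(symmetric)), symmetric)
r_skewed = correlation(normal_scores(len(skewed)), skewed)
print(round(r_symmetric, 3), round(r_skewed, 3))
```

The symmetric data give a correlation near 1 (a nearly straight plot), while the right-skewed data give a clearly lower value (a curved plot).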