OpenIntro 4 Flashcards

(32 cards)

1
Q

Percentiles

A

Percentile is the percentage of observations that fall below a given data point. Graphically, percentile is the area below the probability distribution curve to the left of that observation.

2
Q

Z scores

A

Z = (observation − mean) / SD. The Z score of an observation is the number of standard deviations it falls above or below the mean. Z scores are defined for distributions of any shape, but only when the distribution is normal can we use Z scores to calculate percentiles. (Observations that are more than 2 SD away from the mean are usually considered unusual.)

3
Q

Calculating the Z score

A

Z = (observation − mean) / SD

If the Z score is negative, the observation is below the mean.

If the Z score is positive, the observation is above the mean.
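
A minimal R sketch of the formula above. The helper name z_score is hypothetical, the values 1800, 1500, and 300 are borrowed from the pnorm() example on the "Calculating percentiles" card, and 1200 is an illustrative value chosen to give a negative Z score.

```r
# Hypothetical helper: standardize an observation.
z_score <- function(x, mean, sd) {
  (x - mean) / sd
}

z_above <- z_score(1800, mean = 1500, sd = 300)  # 1: one SD above the mean
z_below <- z_score(1200, mean = 1500, sd = 300)  # -1: one SD below the mean
print(c(z_above, z_below))
```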

4
Q

pnorm() of a Z score to get the percentile

A

round(pnorm(1.50), digits=4)
output: 0.9332, i.e. 93.32% of possums fall below this Z score

pnorm() returns the area under the curve up to the Z score (the lower tail)

1 - pnorm() returns the area from the Z score onward (the upper tail)
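
As a small sketch, both tails can be computed for the same Z score of 1.50. The lower.tail argument is a standard pnorm() option; the variable names are only for illustration.

```r
z <- 1.50
lower <- pnorm(z)                          # area to the left of z
upper <- 1 - pnorm(z)                      # area to the right of z
upper_alt <- pnorm(z, lower.tail = FALSE)  # same upper tail, computed directly
print(round(c(lower, upper, upper_alt), 4))  # 0.9332 0.0668 0.0668
```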

5
Q

Find the share of head lengths between 2 Z scores

A

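Subtract the lower-tail areas of the 2 Z scores from each other: pnorm(larger Z) − pnorm(smaller Z). Subtracting the Z scores themselves only gives a distance in SD units, not a proportion.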
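
A short sketch of the area between two Z scores, computed as a difference of lower-tail areas. The Z scores −1 and 1 are illustrative values, not taken from the possum example.

```r
z_low <- -1   # illustrative lower Z score
z_high <- 1   # illustrative upper Z score
between <- pnorm(z_high) - pnorm(z_low)  # area between the two Z scores
print(round(between, 4))  # 0.6827
```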

6
Q

Solving equation for percentile

A
7
Q

Calculating percentiles

A

There are many ways to compute percentiles (areas under the curve).

In R:
> pnorm(1800, mean = 1500, sd = 300)
output: 0.8413447

8
Q

At a Heinz ketchup factory, the amounts that go into bottles of ketchup
are supposed to be normally distributed with:

mean 36 oz.

standard deviation 0.11 oz.

What percent of bottles have less than
35.8 ounces of ketchup?

A

pnorm(35.8, mean = 36, sd = 0.11)
output: 0.0345, so about 3.45% of bottles
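
The same answer can be reached by standardizing first. This sketch assumes only the card's own numbers (mean 36, SD 0.11, cutoff 35.8); the variable names are illustrative.

```r
mu <- 36
sigma <- 0.11
x <- 35.8

z <- (x - mu) / sigma                        # about -1.82 SD below the mean
p_direct <- pnorm(x, mean = mu, sd = sigma)  # using the raw value
p_via_z <- pnorm(z)                          # using the Z score
print(round(c(z, p_direct, p_via_z), 4))
```

Both routes give the same lower-tail area because pnorm() with mean and sd arguments standardizes internally.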

9
Q

qnorm()

A
10
Q

Bernoulli random variable

A

A random variable with only 2 possible outcomes, usually labeled success (1) and failure (0)

11
Q

Geometric distribution

If p represents the probability of success, (1 − p) the probability
of failure, and n the number of independent trials, then
P(first success on the nth trial) = (1 − p)^(n − 1) × p

A

Geometric distribution needs:

  1. independence: outcomes of trials don’t affect each other
  2. identical: the probability of success is the same for each trial

Example from the book (defect probability p = 0.02):

On average, how many transistors would you expect to be produced before the first with a defect? What is the standard deviation?

Mean: 1 / 0.02 = 50 transistors until the first defective one

SD: sqrt((1 − 0.02) / 0.02^2) ≈ 49.5
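
A hedged R sketch of the transistor example, assuming p = 0.02 as in the calculation above. Note that R's dgeom() is parameterized by the number of failures before the first success, so trial n corresponds to n − 1 failures. The trial number 10 is an illustrative choice.

```r
p <- 0.02                      # defect probability from the example
mu <- 1 / p                    # expected number of trials up to the first defect
sigma <- sqrt((1 - p) / p^2)   # standard deviation of that count

n <- 10                                 # illustrative trial number
p_first_on_n <- dgeom(n - 1, prob = p)  # dgeom() takes failures before the success
by_hand <- (1 - p)^(n - 1) * p          # the card's formula
print(c(mu, sigma, p_first_on_n, by_hand))
```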

12
Q

Rolling a 6 (geometric distribution)

A
13
Q

Suppose we randomly select four individuals to participate in this
experiment. What is the probability that exactly 1 of them will
refuse to administer the shock?

The binomial distribution describes the probability of having
exactly k successes in n independent Bernoulli trials with
probability of success p.

A

Let’s call these people Allen (A), Brittany (B), Caroline (C), and
Damian (D). Exactly one of them refuses in each of four scenarios:
only A refuses, only B refuses, only C refuses, or only D refuses.
With a refusal probability of 0.35, each scenario has probability
0.35 × 0.65^3 = 0.0961.

The probability of exactly 1 of 4 people refusing to administer
the shock is the sum of these probabilities:
0.0961 + 0.0961 + 0.0961 + 0.0961 = 4 × 0.0961 = 0.3844
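
The sum can be checked in R. This sketch assumes the refusal probability of 0.35 used on the deck's geometric expected-value card. Because R keeps full precision rather than rounding each term to 0.0961 first, the result is about 0.3845 instead of 0.3844.

```r
p_refuse <- 0.35                             # refusal probability (from the deck)
one_scenario <- p_refuse * (1 - p_refuse)^3  # one person refuses, three do not
by_hand <- choose(4, 1) * one_scenario       # four equally likely scenarios
with_dbinom <- dbinom(1, size = 4, prob = p_refuse)
print(round(c(one_scenario, by_hand, with_dbinom), 4))
```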

14
Q

When is it a binomial distribution?

A
15
Q

What part does pnorm() return?

A

The area under the curve up to the input (the lower tail)
Use 1 - pnorm() to get the area after the input (the upper tail)

16
Q

68-95-99.7 rule

17
Q

Further on Z scores

A

The z-score is positive if the value lies above the mean, and negative if it lies below the mean

A z-score describes the position of a raw score in terms of its distance from the mean, when measured in standard deviation units

It tells you how far above or below the population mean your raw score is, in standard deviation units.

18
Q

qnorm() and pnorm()

A

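The qnorm() function is simply the inverse of pnorm(): qnorm() takes a percentile and returns a Z score, while pnorm() takes a Z score and returns a percentile.

For example:

> qnorm(.10)   # input: percentile
[1] -1.281552

> pnorm(-1.28)   # input: Z score
[1] 0.1002726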
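
A brief sketch of the round trip, plus an upper-tail cutoff of the kind asked for by "the warmest 5%". The cutoff here is on the Z scale only; converting it to a temperature would need a mean and SD that this deck does not give.

```r
z10 <- qnorm(0.10)   # Z score at the 10th percentile
back <- pnorm(z10)   # recovers 0.10

cutoff_top5 <- qnorm(0.95)                   # Z score separating the top 5%
cutoff_alt <- qnorm(0.05, lower.tail = FALSE)  # same cutoff via the upper tail
print(c(z10, back, cutoff_top5, cutoff_alt))
```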

19
Q

Find the cutoff

  • the warmest 5 %
20
Q

The 4 conditions of the binomial distribution

A

1: The number of observations n is fixed.
2: Each observation is independent.
3: Each observation represents one of two outcomes (“success” or “failure”).
4: The probability of “success” p is the same for each observation.

21
Q

The National Vaccine Information Center estimates that 90% of Americans have
had chickenpox by the time they reach adulthood.

(b) Calculate the probability that exactly 97 out of 100 randomly sampled American adults had chickenpox during childhood.
(c) What is the probability that exactly 3 out of a new sample of 100 American adults have not had chickenpox in their childhood?

A

(b)
> dbinom(97, 100, .90)
[1] 0.005891602

(c)
> dbinom(3, 100, .10)
[1] 0.005891602
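
The two answers match because "97 of 100 had chickenpox" and "3 of 100 did not" describe the same event. A minimal check, with illustrative variable names:

```r
p_had <- dbinom(97, size = 100, prob = 0.90)  # 97 successes at p = 0.90
p_not <- dbinom(3, size = 100, prob = 0.10)   # 3 successes at p = 0.10
print(all.equal(p_had, p_not))  # TRUE
```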

22
Q

Geometric distribution - expected value

A

How many people is Dr. Smith expected to test before finding the first one that refuses to administer the shock?

The expected value, or the mean, of a geometric distribution is defined as 1/p:

µ = 1/p = 1/0.35 = 2.86

She is expected to test 2.86 people before finding the first one that refuses to administer the shock.
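
As a rough sanity check, the mean can be approximated by simulation. This is only a sketch: the seed and the number of draws are arbitrary choices, and rgeom() returns the number of failures before the first success, so 1 is added to count the trial on which the refusal occurs.

```r
set.seed(1)   # arbitrary seed, only for reproducibility
p <- 0.35     # probability of refusal from the card

trials <- rgeom(100000, prob = p) + 1  # trials up to and including the first refusal
sim_mean <- mean(trials)               # simulated average
theory <- 1 / p                        # theoretical mean, about 2.86
print(c(sim_mean, theory))
```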

23
Q

Choose function

A

choose(n, k) counts the number of ways to arrange k successes in n trials.

Note: You can also use R for these calculations:
> choose(9, 2)

output: 36
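
For comparison, the same count follows from the factorial formula n! / (k! (n − k)!). A small sketch with the card's values:

```r
n <- 9
k <- 2
by_formula <- factorial(n) / (factorial(k) * factorial(n - k))  # n! / (k! (n - k)!)
builtin <- choose(n, k)
print(c(by_formula, builtin))  # 36 36
```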

24
Q

A 2012 Gallup survey suggests that 26.2% of Americans are obese. Among a random sample of 10 Americans, what is the probability that exactly 8 are obese?

(binomial distribution)

A

With n = 10 and k = 8: choose(10, 8) × (0.262)^8 × (0.738)^2 = 0.0005
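
The hand calculation can be reproduced in R, either term by term or with dbinom(). Variable names are illustrative.

```r
by_hand <- choose(10, 8) * 0.262^8 * 0.738^2        # binomial formula
with_dbinom <- dbinom(8, size = 10, prob = 0.262)   # same probability via dbinom()
print(signif(c(by_hand, with_dbinom), 3))
```

Both values are roughly 0.0005, matching the card.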

25
Q

Expected value: A 2012 Gallup survey suggests that 26.2% of Americans are obese. Among a random sample of 100 Americans, how many would you expect to be obese?

A

Easy enough, 100 × 0.262 = 26.2. Or more formally, µ = np = 100 × 0.262 = 26.2. But this doesn’t mean that in every random sample of 100 people exactly 26.2 will be obese. In fact, that’s not even possible. In some samples this value will be less, and in others more. How much would we expect this value to vary?
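
The closing question is usually answered with the standard deviation of a binomial count, sqrt(np(1 − p)). That formula is not stated on this card, so treat this as a supplementary sketch rather than the card's own answer.

```r
n <- 100
p <- 0.262
mu <- n * p                     # expected count: 26.2
sigma <- sqrt(n * p * (1 - p))  # standard deviation of the count, about 4.4
print(c(mu, sigma))
```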
26
Q

Distributions of number of successes

A

How large is large enough?

**The sample size is considered large enough if the expected number of successes and failures are both at least 10.**

np ≥ 10 and n(1 − p) ≥ 10

(a) n = 100, p = 0.95
**(b) n = 25, p = 0.45 (correct answer)**
(c) n = 150, p = 0.05
(d) n = 500, p = 0.015
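
The condition can be checked for all four options at once. The helper name large_enough is hypothetical; the (n, p) pairs are the ones listed on the card.

```r
# Hypothetical helper: success-failure condition for the normal approximation.
large_enough <- function(n, p) {
  (n * p >= 10) && (n * (1 - p) >= 10)
}

# Each option is a pair c(n, p) from the card.
options <- list(a = c(100, 0.95), b = c(25, 0.45),
                c = c(150, 0.05), d = c(500, 0.015))
result <- sapply(options, function(o) large_enough(o[1], o[2]))
print(result)
```

Only (b) passes: np = 11.25 and n(1 − p) = 13.75. Option (a) expects just 5 failures, and (c) and (d) each expect just 7.5 successes.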
27
Q

Normal approximation to the binomial (What is the probability that the average Facebook user with 245 friends has 70 or more friends who would be considered power users?)
28
Q

What is the probability that the average Facebook user with 245 friends has 70 or more friends who would be considered power users?
29
Q

Negative binomial distribution

A

The following four conditions are useful for identifying a negative binomial case:

1: The trials are independent.
2: Each trial outcome can be classified as a success or failure.
3: The probability of success (p) is the same for each trial.
4: The last trial must be a success.

Note that the first three conditions are common to the binomial distribution.
30
Q

What is the probability that at most 3 out of 10 randomly sampled American adults have not had chickenpox? (binomial distribution)

A

> choose(10,0) * 0.1^0 * 0.9^10 + choose(10,1) * 0.1^1 * 0.9^9 + choose(10,2) * 0.1^2 * 0.9^8 + choose(10,3) * 0.1^3 * 0.9^7
[1] 0.9872048
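
The same "at most 3" probability can be obtained more compactly. This sketch sums dbinom() over 0 to 3 and compares it with the cumulative function pbinom().

```r
terms <- dbinom(0:3, size = 10, prob = 0.1)  # P(X = 0), ..., P(X = 3)
by_sum <- sum(terms)
by_cdf <- pbinom(3, size = 10, prob = 0.1)   # P(X <= 3) directly
print(c(by_sum, by_cdf))
```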
31
Q

Binomial vs. negative binomial
32
Q

On 70% of days, a hospital admits at least one heart attack patient. On 30% of the days, no heart attack patients are admitted. Identify each case below as a binomial or negative binomial case, and compute the probability.

(a) What is the probability the hospital will admit a heart attack patient on exactly three days this week?
(b) What is the probability the second day with a heart attack patient will be the fourth day of the week?
(c) What is the probability the fifth day of next month will be the first day with a heart attack patient?

A

In each part, p = 0.7.

(a) The number of days is fixed, so this is binomial. The parameters are k = 3 and n = 7: 0.097.
(b) The last "success" (admitting a heart attack patient) is fixed to the last day, so we should apply the negative binomial distribution. The parameters are k = 2, n = 4: 0.132.
(c) This problem is negative binomial with k = 1 and n = 5: 0.006. Note that the negative binomial case when k = 1 is the same as using the geometric distribution.
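
A hedged R sketch of the three parts. R's dnbinom() and dgeom() count failures before the target success rather than total trials, so the card's n is converted to n − k failures. The hand formula choose(n − 1, k − 1) × p^k × (1 − p)^(n − k) is the usual negative binomial expression, added here as a cross-check; it is not written out on the card.

```r
p <- 0.7

a <- dbinom(3, size = 7, prob = p)        # (a) binomial: k = 3 of n = 7 days
b <- dnbinom(4 - 2, size = 2, prob = p)   # (b) 2 failures before the 2nd success
b_by_hand <- choose(4 - 1, 2 - 1) * p^2 * (1 - p)^(4 - 2)
c_geom <- dgeom(5 - 1, prob = p)          # (c) 4 failures before the 1st success
print(round(c(a, b, c_geom), 3))  # 0.097 0.132 0.006
```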