Real World Examples Flashcards
Using Statistics Everyday (16 cards)
You go to an ice cream shop. The probability of choosing chocolate is 0.3, and vanilla is 0.5. What’s the chance someone picks either chocolate or vanilla?
The Addition Rule of Probability (Mutually Exclusive)
P(chocolate or vanilla) = P(chocolate) + P(vanilla) = 0.3 + 0.5 = 0.8
A student is selected at random. 60% of students take math, 40% take science, and 20% take both. Apply the addition rule. Would it be mutually exclusive or not?
The Addition Rule of Probability (Not Mutually Exclusive): addition with overlap. The events are not mutually exclusive because 20% of students take both subjects.
P(Math or Science) = P(Math) + P(Science) − P(Both) = 0.6 + 0.4 − 0.2 = 0.8
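The two addition-rule cards above can be checked with one small helper. This is a minimal sketch; the function name `prob_a_or_b` is made up for illustration, and the mutually exclusive case is simply the general rule with an overlap of zero.

```python
def prob_a_or_b(p_a, p_b, p_both=0.0):
    """Addition rule: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_both

# Ice cream card: chocolate and vanilla cannot both be chosen, so the overlap is 0.
ice_cream = prob_a_or_b(0.3, 0.5)

# Student card: 20% take both subjects, so the overlap is subtracted once.
students = prob_a_or_b(0.6, 0.4, 0.2)

print(round(ice_cream, 2), round(students, 2))  # 0.8 0.8
```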
Inspectors for a hospital chain with multiple locations randomly select some of their locations for a cleanliness check of their operating rooms. The inspectors check every operating room in the hospitals that were chosen. What type of sample is this?
Cluster Sample
Mr. Thompson runs his own printing and bookbinding business. He suspects that the machine isn’t putting enough glue into the book spines and decides to inspect his most recent order of 70 textbooks to test his theory. He numbers them 01-70 and, using the random digit table printed below, selects a simple random sample of 5 books to check.
Which books are in the sample?
85063 55810 10470 08029 30025
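06, 35, 58, 10, 47 (reading two digits at a time: 85 is outside 01-70 and the second 10 is a repeat, so both are skipped)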
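The reading procedure for the random digit table can be sketched in a few lines. The helper name `sample_from_digit_table` is hypothetical; it assumes labels are read two digits at a time, left to right, ignoring the spaces between groups.

```python
def sample_from_digit_table(digits, population_size, sample_size):
    """Read two-digit labels left to right; skip out-of-range labels and repeats."""
    stream = "".join(ch for ch in digits if ch.isdigit())
    chosen = []
    for i in range(0, len(stream) - 1, 2):
        label = int(stream[i:i + 2])
        # Keep the label only if it names a real book and has not been picked yet.
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
        if len(chosen) == sample_size:
            break
    return chosen

table_row = "85063 55810 10470 08029 30025"
books = sample_from_digit_table(table_row, population_size=70, sample_size=5)
print([f"{b:02d}" for b in books])  # ['06', '35', '58', '10', '47']
```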
What would be the CDF of this: an appliance manufacturer investigates failure times for the heating element within its toasters. They want to determine the time by which specific proportions of heating elements will fail so they can set the warranty period. Heating element failure times follow a normal distribution with a mean of 1000 hours and standard deviation of 300 hours.
In this context, use the normal CDF to find:
P(X ≤ t), where:
- X = failure time (in hours), μ = 1000 (mean), σ = 300
For example:
- P(X ≤ 800) = CDF at 800 → % of elements failing before 800 hours
- Use CDF to find the warranty cutoff time that covers, say, 90% of the elements → Find t such that: P(X ≤ t) = 0.9
Useful formula for standardization: Z = (X - μ) / σ
So the CDF gives the proportion of failures up to a certain time.
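As a quick check of the CDF idea, here is a minimal sketch using `NormalDist` from Python's standard `statistics` module. The variable names are illustrative; the mean and standard deviation come from the card.

```python
from statistics import NormalDist

# Failure times: normal with mean 1000 hours and standard deviation 300 hours.
failure_time = NormalDist(mu=1000, sigma=300)

# P(X <= 800): proportion of heating elements that fail by 800 hours.
p_fail_by_800 = failure_time.cdf(800)

# Same probability via standardization: Z = (X - mu) / sigma.
z_800 = (800 - 1000) / 300
p_via_z = NormalDist().cdf(z_800)

print(round(p_fail_by_800, 4), round(p_via_z, 4))
```

Both routes give roughly 0.25, meaning about a quarter of the elements would fail by 800 hours.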
What would be the inverse CDF of this: an appliance manufacturer investigates failure times for the heating element within its toasters. They want to determine the time by which specific proportions of heating elements will fail so they can set the warranty period. Heating element failure times follow a normal distribution with a mean of 1000 hours and standard deviation of 300 hours.
The inverse CDF gives the corresponding failure time for each cumulative probability. Use the inverse CDF to estimate the time by which 5% of the heating elements will fail, the times between which the middle 95% of all heating elements will fail, or the time at which only 5% of the heating elements remain.
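The three inverse-CDF questions on this card can be sketched with `NormalDist.inv_cdf` from the standard library. The variable names are illustrative, and the "between which 95% fail" interval is assumed here to mean the central 95% (2.5% in each tail).

```python
from statistics import NormalDist

failure_time = NormalDist(mu=1000, sigma=300)

# Time by which 5% of heating elements have failed.
t_5_percent_failed = failure_time.inv_cdf(0.05)

# Times bounding the central 95% of failures (assumption: 2.5% in each tail).
middle_95 = (failure_time.inv_cdf(0.025), failure_time.inv_cdf(0.975))

# Time at which only 5% of elements remain, i.e. 95% have already failed.
t_5_percent_remain = failure_time.inv_cdf(0.95)

print(round(t_5_percent_failed), [round(t) for t in middle_95], round(t_5_percent_remain))
```

With these inputs the results land near 507 hours, 412 to 1588 hours, and 1493 hours respectively.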
If X is binomial with n = 10,000 and p = 0.5 (we flip a fair coin 10,000 times), what is the probability that we observe p-hat between 0.49 and 0.51?
95%
The expected value of p-hat is 0.5. The standard deviation of p-hat = sqrt(p(1-p)/n) = sqrt(0.5 × 0.5/10,000) = 0.005. Thus, the z-score of 0.49 is (0.49 - 0.5)/0.005 = -2, and the z-score of 0.51 is (0.51 - 0.5)/0.005 = 2. Our question therefore simplifies to asking for the probability that a z-score is between -2 and 2, which is about 95% according to the empirical rule.
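The arithmetic above can be reproduced with the normal approximation. This is a sketch, not an exact binomial calculation; variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, p = 10_000, 0.5

# Standard deviation of p-hat: sqrt(p(1-p)/n).
se = sqrt(p * (1 - p) / n)

# z-scores for the two bounds on p-hat.
z_low = (0.49 - p) / se
z_high = (0.51 - p) / se

# Probability that a standard normal value falls between the two z-scores.
std_normal = NormalDist()
prob = std_normal.cdf(z_high) - std_normal.cdf(z_low)

print(round(se, 4), round(z_low, 2), round(z_high, 2), round(prob, 4))
```

The normal approximation gives roughly 0.954, which the empirical rule rounds to 95%.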
Why is this study flawed? The CEO of a major bank wanted to assess how many of the bank’s customers are satisfied, so he decided to conduct a small survey.
At the time of the survey, the bank had several million customers across 300 branches of varying sizes, and he picked 1 customer at each branch.
Every branch is treated the same, since exactly one customer is picked at each. However, the probability that an individual customer gets picked depends on the size of the branch: in a branch with 100 customers it is 1/100, while in a branch with 1,000 customers it is 1/1000. Customers of small branches are therefore overrepresented.
Would you use CDF or Inverse CDF?
“What’s the probability that a light bulb burns out before 800 hours?” Failure times are normally distributed with mean μ = 1000 and standard deviation σ = 300.
CDF: P(X ≤ x)
P(X ≤ 800) ⇒ CDF at x = 800
This tells you how much of the distribution is to the left of 800.
Would you use CDF or Inverse CDF? “What number of hours corresponds to the 90th percentile of light bulb life?”
Inverse CDF: x = InvCDF(p)
You’re given p = 0.90, and you want to find the x-value such that P(X ≤ x) = 0.90.
📍 In a normal distribution with mean 1000 and std dev 300, the inverse CDF at 0.90 gives you ≈ 1384 hours.
Imagine the PDF (Probability Density Function) as a mountain range (a smooth curve). Take the area under that mountain from the far left up to some point x. What does that area tell you, and how does it relate to the CDF?
That total area is CDF(x), and it answers the question: what’s the total probability of getting a value less than or equal to x?
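To make the "area under the mountain" picture concrete, the sketch below adds up thin trapezoids under a normal PDF and compares the total with the CDF. The distribution (mean 1000, standard deviation 300) is borrowed from the light bulb cards as an assumption, the helper `area_under_pdf` is hypothetical, and the "far left" is approximated by starting 8 standard deviations below the mean.

```python
from statistics import NormalDist

def area_under_pdf(dist, x, steps=20_000):
    """Trapezoid-rule area under dist.pdf from far in the left tail up to x."""
    lower = dist.mean - 8 * dist.stdev  # stand-in for "the far left"
    width = (x - lower) / steps
    heights = [dist.pdf(lower + i * width) for i in range(steps + 1)]
    # Trapezoid rule: end points count half, interior points count fully.
    return width * (sum(heights) - 0.5 * (heights[0] + heights[-1]))

dist = NormalDist(mu=1000, sigma=300)
area = area_under_pdf(dist, 800)
cdf_value = dist.cdf(800)

print(round(area, 4), round(cdf_value, 4))
```

The two printed numbers should agree to several decimal places, which is the point of the card: the CDF at x is the accumulated area under the PDF up to x.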
We want to test our results for this question: Does country 1 have a higher percentage of ice cream lovers than country 2? We found that 35 out of 70 people like ice cream in country 1 and 10 out of 30 like ice cream in country 2. What test do we want to use? Why?
A two-proportion z-test, because we’re comparing proportions from two different groups: in this case, two countries and the share of people in each one who like ice cream.
What would be a pooled proportion in this scenario? Does country 1 have a higher percentage of ice cream lovers than country 2? We found that 35 out of 70 people like ice cream in country 1 and 10 out of 30 like ice cream in country 2.
A pooled proportion is like saying: “Let’s combine both countries’ data to get one best guess for the true percentage of people who like ice cream.”
pooled proportion = (x₁ + x₂) / (n₁ + n₂)
(35 + 10) / (70 + 30) = 45 / 100 = 0.45
We use the pooled proportion to compute the standard error in the z formula.
We are asking: does country 1 have a higher percentage of ice cream lovers than country 2? We found that 35 out of 70 people like ice cream in country 1 and 10 out of 30 like ice cream in country 2. Our null hypothesis is p1 = p2 and our alternative is p1 > p2. Why did we fail to reject our null?
The observed z is 1.54. The critical z value (one-sided, α = 0.05) is 1.645. Because 1.54 < 1.645, we are not far enough into the “rejection zone”. The difference between countries is not extreme enough to reject the idea that they’re the same. Even though the difference in proportions was 0.17 (17 percentage points), the z-score was only 1.54, which is not big enough to be considered rare or surprising under the null.
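The decision on this card can be sketched as a comparison against the one-sided critical value. The function name `one_sided_z_decision` is hypothetical, and α = 0.05 is assumed because it corresponds to the 1.645 critical value quoted above.

```python
from math import sqrt
from statistics import NormalDist

def one_sided_z_decision(z_observed, alpha=0.05):
    """Right-tailed z-test: return (critical value, p-value, reject null?)."""
    std_normal = NormalDist()
    z_critical = std_normal.inv_cdf(1 - alpha)
    p_value = 1 - std_normal.cdf(z_observed)
    return z_critical, p_value, z_observed > z_critical

# Observed z computed from the card's counts using the pooled proportion.
x1, n1, x2, n2 = 35, 70, 10, 30
pooled = (x1 + x2) / (n1 + n2)
se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
z_observed = (x1 / n1 - x2 / n2) / se

z_critical, p_value, reject = one_sided_z_decision(z_observed)
print(round(z_critical, 3), round(p_value, 3), reject)
```

The p-value comes out a little above 0.05, which matches the card's conclusion: fail to reject the null.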
We are asking: does country 1 have a higher percentage of ice cream lovers than country 2? We found that 35 out of 70 people like ice cream in country 1 and 10 out of 30 like ice cream in country 2. Our null hypothesis is p1 = p2 and our alternative is p1 > p2. How would you calculate z observed?
Calculate the sample proportions for each country: p̂₁ = 35 / 70 = 0.50 and p̂₂ = 10 / 30 = 0.3333
Compute the difference between sample proportions: p̂₁ - p̂₂ = 0.50 - 0.3333 = 0.1667 (or ~0.17)
Calculate the pooled proportion: p̂ = (35 + 10) / (70 + 30) = 0.45
Compute the standard error: SE = sqrt(p̂(1 - p̂)(1/n₁ + 1/n₂)) = sqrt(0.45 × 0.55 × (1/70 + 1/30)) = 0.1086
Plug values into the z formula: z observed = (0.50 - 0.3333) / 0.1086 = 1.54
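The steps above can be traced in code, one intermediate value per step. This is a minimal sketch; the function name `two_proportion_z` and the dictionary keys are made up for illustration.

```python
from math import sqrt

def two_proportion_z(x1, n1, x2, n2):
    """Return each intermediate value of the pooled two-proportion z statistic."""
    p1_hat = x1 / n1                      # sample proportion, group 1
    p2_hat = x2 / n2                      # sample proportion, group 2
    pooled = (x1 + x2) / (n1 + n2)        # pooled proportion under the null
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))  # standard error
    z = (p1_hat - p2_hat) / se            # observed z
    return {"p1_hat": p1_hat, "p2_hat": p2_hat, "pooled": pooled, "se": se, "z": z}

steps = two_proportion_z(35, 70, 10, 30)
for name, value in steps.items():
    print(f"{name}: {value:.4f}")
```

Carrying full precision through every step gives a z of about 1.535, which rounds to the 1.54 shown on the card.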
The circumference of a circle is 36pi. Contained in that circle is a smaller circle with area 16pi. A point is selected at random from inside the larger circle. What is the probability that the point also lies in the smaller circle?
P = area of smaller circle / area of larger circle
First find the radius of the larger circle using C = 2pi*r: 36pi = 2pi*r, so r = 18 and the larger area is pi*18^2 = 324pi.
Then take the ratio of the two areas: P = 16pi / 324pi = 4/81 ≈ 0.049
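The circle calculation can be checked with exact fractions. In this sketch the areas are stored as multiples of pi (an illustrative convention, since pi cancels in the ratio), so no floating-point value of pi is needed.

```python
from fractions import Fraction

# Larger circle: C = 2*pi*r = 36*pi, so r = 36 / 2 = 18.
large_radius = Fraction(36, 2)
large_area_over_pi = large_radius ** 2   # area = pi * r^2, stored without the pi

# Smaller circle: area given directly as 16*pi.
small_area_over_pi = Fraction(16)

# Probability = smaller area / larger area; the factor of pi cancels.
probability = small_area_over_pi / large_area_over_pi

print(probability, round(float(probability), 4))  # 4/81 0.0494
```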