Quiz 2 Flashcards
(62 cards)
Briefly describe two possible sources of confusion about the “average”
Two possible sources of confusion are not knowing whether the reported average is the mean or the median, and not having enough information about how the average was computed
Determine whether the statement makes sense (or is clearly true) or does not make sense (or is clearly false)
- A survey found that the mean salary for professional soccer players is much higher than the median salary
The statement makes sense, because it is likely that a few players have very high salaries, which are outliers and will pull the mean to a higher value than the median
Determine whether the following statement makes sense (is clearly true) or does not make sense (is clearly false). Explain.
- A survey question asks respondents the number of car crashes they have been involved in during the past ten years.
- The sample of the results has modes of 0 and 1.
This statement makes sense because the mode is the most frequent value in a data set, and there may be more than one mode.
Listed below are measurements of the “head injury criterion” for seven small cars tested in crashes by a traffic safety organization.
- Higher numbers are associated with a higher risk of injury.
- Find the mean, median, and mode of the listed numbers.
- Can you draw any conclusion about the risk of head injury in small cars versus larger cars?
512 542 468 379 489 478 509
The mean is 482.4
- The median is 489
- There is no mode
- None, because the data are all for small cars. They do not by themselves tell us anything about a comparison with larger cars.
The mean is the sum of all values divided by the total number of values.
The median is the middle value in the sorted data set (or halfway between the two middle values if the number of values is even)
The mode is the most common value (or group of values) in a data set. There is no mode if no value occurs more than once.
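The three definitions above can be checked against the head injury data with a minimal Python sketch. The variable names are illustrative, and the "no mode when nothing repeats" rule follows the convention stated on this card rather than any library default.

```python
from collections import Counter
from statistics import mean, median

# Head injury criterion values from the card above
hic = [512, 542, 468, 379, 489, 478, 509]

hic_mean = round(mean(hic), 1)   # sum of all values / number of values
hic_median = median(hic)         # middle value of the sorted data

# Card convention: there is no mode if no value occurs more than once.
counts = Counter(hic)
top = max(counts.values())
hic_modes = [v for v, c in counts.items() if c == top] if top > 1 else []

print(hic_mean, hic_median, hic_modes)
```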
Cans of soda vary slightly in weight.
- Given below are the measured weights of seven cans, in pounds.
- Find the mean and median of these weights.
- Which, if any, of these weights would be considered the outlier?
- What are the mean and median weights if the outlier is excluded?
0.8161 0.8194 0.8166 0.8172 0.7906 0.8142 0.8123
The mean is 0.81234
- The median is 0.8161
- The outlier is 0.7906
- The mean without the outlier is 0.81597
- The median without the outlier is 0.81635
An outlier in a data set is a value that is much higher or much lower than almost all other values
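As a rough illustration of how one outlier moves the mean more than the median, the sketch below recomputes both for the soda can weights with and without 0.7906. The outlier is picked by inspection, as on the card, not by a formal rule; the variable names are illustrative.

```python
from statistics import mean, median

# Can weights in pounds, from the card above
weights = [0.8161, 0.8194, 0.8166, 0.8172, 0.7906, 0.8142, 0.8123]
outlier = 0.7906  # chosen by inspection, as on the card
trimmed = [w for w in weights if w != outlier]

mean_all, median_all = round(mean(weights), 5), round(median(weights), 5)
mean_trim, median_trim = round(mean(trimmed), 5), round(median(trimmed), 5)

print(mean_all, median_all)    # with the outlier
print(mean_trim, median_trim)  # without the outlier
```

Dropping the outlier shifts the mean by about 0.0036 pounds but the median by only about 0.0003 pounds.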
Listed below are amounts (in millions of dollars) collected from parking meters by a security company in a certain city.
- A larger data set was used to convict 5 members of the company of grand larceny.
- Find the mean and median for each of the two samples and then compare the two sets of results.
- Do the limited data listed here show evidence of stealing by the security company’s employees?
Security company: 1.5 1.8 1.5 1.8 1.7 1.4 1.1 1.4 1.4 1.6
Other companies: 1.7 2.2 1.6 1.8 1.6 1.9 2.2 1.8 1.8 2.3
The mean for the security company is $1.52 million and the mean for the other companies is $1.89 million.
- The median for the security company is $1.5 million and the median for the other companies is $1.8 million.
- The mean and the median for the security company are both lower than the mean and the median for the collections performed by other companies.
- Since the security company appears to have collected lower revenue than the other companies, there is some evidence of stealing by the security company’s employees.
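The comparison above can be reproduced with a short sketch, assuming the two lists are typed in exactly as given on the card. The flag `lower_on_both` is an illustrative name; it only records that both measures of center are lower for the security company and is not a statistical test.

```python
from statistics import mean, median

# Collections in millions of dollars, from the card above
security = [1.5, 1.8, 1.5, 1.8, 1.7, 1.4, 1.1, 1.4, 1.4, 1.6]
others = [1.7, 2.2, 1.6, 1.8, 1.6, 1.9, 2.2, 1.8, 1.8, 2.3]

security_mean, security_median = round(mean(security), 2), round(median(security), 2)
others_mean, others_median = round(mean(others), 2), round(median(others), 2)

# True when the security company is lower on both the mean and the median
lower_on_both = security_mean < others_mean and security_median < others_median

print(security_mean, security_median, others_mean, others_median, lower_on_both)
```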
Distinguish between a uniform distribution and a distribution with one or more modes.
- What do we call a distribution with one, two, and three modes?
All data values in a uniform distribution have the same frequency, whereas a distribution with one or more modes has one or more values that occur most frequently.
- A distribution with one mode is called unimodal, a distribution with two modes is called bimodal, and a distribution with three modes is called trimodal.
What is the difference between symmetry and skewness?
A distribution has skewness when it is lopsided, with values that are more spread out on either the right side or the left side.
- A distribution has symmetry when the left half is a mirror image of the right half.
Decide whether the following statement makes sense (or is clearly true) or does not make sense (or is clearly false).
- Explain your reasoning.
- The distribution of grades was left skewed, but the mean, median, and mode were all the same.
This does not make sense because the mean and median should lie somewhere to the left of the mode for most left skewed distributions.
In a recent year, the 870 players in a certain sports league had salaries with the characteristics below.
- The mean was $3,152,075.
- The median was $1,525,000.
- The salaries ranged from a low of $503,000 to a high of $28,000,000.
- Describe the shape of the distribution of salaries. Is the distribution symmetric? Is it left skewed? Is it right skewed?
- The distribution is right skewed.
- About how many players had salaries of $1,525,000 or higher?
- About 435 players had salaries of $1,525,000 or higher.
For the distribution described below, complete parts a and b below.
The annual incomes of all those in a statistics class, including the instructor.
- How many modes are expected for the distribution?
- The distribution is probably unimodal.
- Is the distribution expected to be symmetric, left skewed, or right skewed?
- The distribution is probably right skewed.
Consider two grocery stores at which the mean time in line is the same but the variation is different.
- At which store would you expect the customers to have more complaints about the waiting time?
- Explain.
The customers would have more complaints about the waiting time at the store that has more variation because some customers would have longer waits and might think they are being treated unequally.
Decide whether the following statement makes sense (or is clearly true) or does not make sense (or is clearly false).
- Explain your reasoning.
The standard deviation for the heights of a group of 5-year-old children is smaller than the standard deviation for the heights of a group of children who range in age from 3 to 15.
The statement makes sense because the range of data for the heights of a group of 5-year-old children is smaller than the range of data for the heights of a group of children who range in age from 3 to 15.
The celebrities with the top eight net worths (in millions of dollars) in a certain country in a recent year are shown in the table.
- Find the range and standard deviation of these data.
- If you considered all celebrities rather than just this group, would you expect these measures of variation in net worth to be larger, smaller, or the same?
Celebrity 1 = 5300
Celebrity 2 = 3800
Celebrity 3 = 3200
Celebrity 4 = 1900
Celebrity 5 = 1100
Celebrity 6 = 800
Celebrity 7 = 800
Celebrity 8 = 650
The range = 4650 million dollars
- The standard deviation = 1723.5 million dollars
- If you considered all celebrities rather than just this group, these measures of variation in net worth would be larger.
The range of a set of data values is the difference between its highest and lowest data values.
Range = highest value (max) - lowest value (min)
The standard deviation is a measure of how widely data values are spread around the mean of a data set.
$s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$, where $x$ is each data value, $\bar{x}$ is the mean, and $n$ is the number of values.
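The range and sample standard deviation formulas can be sketched in Python as follows, using the celebrity net worths from this card. The helper `sample_std` is a hypothetical name written out to mirror the formula (dividing by n - 1); it is compared against `statistics.stdev`, which uses the same definition.

```python
from math import sqrt
from statistics import stdev

def sample_std(values):
    """Sample standard deviation: sqrt(sum((x - mean)^2) / (n - 1))."""
    n = len(values)
    m = sum(values) / n
    return sqrt(sum((x - m) ** 2 for x in values) / (n - 1))

# Net worths in millions of dollars, from the card above
net_worths = [5300, 3800, 3200, 1900, 1100, 800, 800, 650]

data_range = max(net_worths) - min(net_worths)  # highest value - lowest value
s = round(sample_std(net_worths), 1)

print(data_range, s, round(stdev(net_worths), 1))
```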
Ayurveda is a traditional medical system commonly used in India.
- Listed are the lead concentrations measured in different Ayurveda medicines (manufactured in the United States).
- Find the range and standard deviation of these data.
- Given that lead is considered a poison when it enters your bloodstream, what does the variation tell you about the safety of these traditional medicines?
2.9 6.3 5.1 5.9 20.9 7.6 11.8 20.8 11.1 17.3
The range is 18.
- The standard deviation is 6.62
- Lead is considered a poison when it enters your bloodstream. What does the variation in the data set tell you about the safety of these traditional medicines?
- The large variation suggests that you should be careful using these medicines, especially those with high lead values.
Listed are measurements of blood alcohol concentration (BAC) of drivers who were involved in fatal crashes and then given jail sentences.
- Find the range and standard deviation of the data.
- Briefly comment on what the results mean in this case.
0.25 0.17 0.17 0.15 0.13 0.24 0.32 0.24 0.14 0.15 0.12 0.15
The range = 0.2
- The standard deviation = 0.062
- While some drunk drivers who caused fatalities had very high BAC levels, others were fairly close to the legal intoxication limit of 0.08. Although the sample is small, this suggests that even lower BAC levels could be dangerous.
One of the authors with too much time on his hands weighed each chocolate candy in a bag of 466 plain chocolate candies.
- One of the chocolate candies weighed 0.776 gram and it was heavier than 24 of the other chocolate candies. What is the percentile of this particular value?
- It is in the 5th percentile.
- One of the chocolate candies weighed 0.876 gram and it was heavier than 328 of the other chocolate candies. What is the percentile of this particular value?
- It is in the 70th percentile.
- One of the chocolate candies weighed 0.856 gram and it was heavier than 221 of the other chocolate candies. What is the percentile of this particular value?
- It is in the 47th percentile.
Percentile of data value = (number of values less than this data value / total number of values in the data set) x 100.
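A minimal sketch of the percentile formula, applied to the three candies in the bag of 466. The function name `percentile_of` is hypothetical, and rounding to the nearest whole number is an assumption that happens to match the answers on this card.

```python
def percentile_of(values_below: int, total: int) -> float:
    """Percentile = (values less than this value / total values) * 100."""
    return values_below / total * 100

total_candies = 466
# Number of candies lighter than each of the three candies on the card
candy_percentiles = [round(percentile_of(below, total_candies))
                     for below in (24, 328, 221)]

print(candy_percentiles)
```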
A data set consists of the 80 ages of women at the time that they won an award in the category of best actress.
- One of the actresses was 40 years of age, and she was older than 57 of the other actresses at the time that they won awards. What is the percentile of the age of 40?
- 71st percentile.
- One of the actresses was 54 years of age, and she was older than 72 of the other actresses at the time that they won awards. What is the percentile of the age of 54?
- 90th percentile.
- One of the actresses was 60 years of age, and she was older than 73 of the other actresses at the time that they won awards. What is the percentile of the age of 60?
- 91st percentile.
The following four sets of seven numbers all have a mean of 9.
(9,9,9,9,9,9,9), (7,7,9,9,9,11,11), (7,7,7,9,11,11,11), (4,4,4,9,14,14,14)
- Construct a histogram for set (9,9,9,9,9,9,9). Choose the correct graph.
- Answer is A.
- Construct a histogram for set (7,7,9,9,9,11,11). Choose the correct graph.
- Answer is C.
- Construct a histogram for set (7,7,7,9,11,11,11). Choose the correct graph.
- Answer is D.
- Construct a histogram for set (4,4,4,9,14,14,14). Choose the correct graph.
- Answer is D.
- Give the five number summary and draw a box plot for each set. Give the five number summary for (9,9,9,9,9,9,9).
- Low value = 9.
- Lower quartile = 9.
- Median = 9.
- Upper quartile = 9.
- High value = 9.
- Answer for boxplot is C.
- Give the five number summary for set (7,7,9,9,9,11,11).
- Low value = 7.
- Lower quartile = 7.
- Median = 9.
- Upper quartile = 11.
- High value = 11.
- Answer for boxplot is D.
- Give the five number summary for set (7,7,7,9,11,11,11).
- Low value = 7.
- Lower quartile = 7.
- Median = 9.
- Upper quartile = 11.
- High value = 11.
- Boxplot answer is A.
- Give the five number summary for set (4,4,4,9,14,14,14).
- Low value = 4.
- Lower quartile = 4.
- Median = 9.
- Upper quartile = 14.
- High value = 14.
- Boxplot answer is A.
- Compute the standard deviation for each set. Compute the standard deviation for set (9,9,9,9,9,9,9).
- S = 0.0
- Compute the standard deviation for set (7,7,9,9,9,11,11).
- S = 1.6
- Compute the standard deviation for set (7,7,7,9,11,11,11).
- S = 2.0
- Compute the standard deviation for set (4,4,4,9,14,14,14).
- S = 5.0
- Based on your results, briefly explain how the standard deviation provides a useful single number summary of the variation in these data sets.
- The standard deviation is a measure of how widely data values are spread around the mean of a data set.
- Note that in the first data set, the difference between the highest and lowest values is zero and the standard deviation is 0.0, and in the last data set, the difference between the highest and lowest values is higher than in the other data sets and the standard deviation is the highest.
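The five number summaries and standard deviations for the four sets can be reproduced with the sketch below. It assumes one common textbook quartile convention (the lower and upper quartiles are the medians of the lower and upper halves, leaving out the middle value when the count is odd); other conventions exist and may give different quartiles. The function name `five_number_summary` is illustrative.

```python
from statistics import median, stdev

def five_number_summary(values):
    """Return (low, lower quartile, median, upper quartile, high).

    Assumed convention: quartiles are medians of the lower and upper
    halves, excluding the middle value when the count is odd.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[half + 1:] if len(data) % 2 else data[half:]
    return (data[0], median(lower), median(data), median(upper), data[-1])

sets = [
    (9, 9, 9, 9, 9, 9, 9),
    (7, 7, 9, 9, 9, 11, 11),
    (7, 7, 7, 9, 11, 11, 11),
    (4, 4, 4, 9, 14, 14, 14),
]

summaries = [five_number_summary(s) for s in sets]
std_devs = [round(stdev(s), 1) for s in sets]  # sample standard deviation

for s, summary, sd in zip(sets, summaries, std_devs):
    print(s, summary, sd)
```

All four sets share a mean of 9, so the growing standard deviation reflects only how far the values sit from that mean.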
A report claims that the returns for the investment portfolios with a single stock have a standard deviation of 0.57, while the returns for portfolios with 32 stocks have a standard deviation of 0.322.
- Explain how the standard deviation measures the risk in these two types of portfolios.
A lower standard deviation means more certainty in the return and less risk.
- Hence, the returns for portfolios with 32 stocks have less risk than the ones with a single stock.
When referring to a “normal” distribution, does the word normal have the same meaning as it does in ordinary usage?
- Explain.
The word normal has a special meaning in statistics.
- It refers to a specific category of distributions that are symmetric and bell shaped with a single peak.
- The peak corresponds to the mean, median, and mode of such a distribution.
Determine whether the statement makes sense (or is clearly true) or does not make sense (or is clearly false).
- The explanation is more important than the answer.
Among a sample of 1044 adult women, pulse rates are normally distributed with a mean of 75.5 beats per minute, but 80% of the women have pulse rates greater than 75.5 beats per minute.
The statement does not make sense.
- For a normal distribution, only half of the women should have pulse rates above the mean.
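The "half above the mean" property can be checked with Python's `statistics.NormalDist`. The card gives only the mean of 75.5 beats per minute, so the standard deviation of 12 used here is a made-up value for illustration; the result does not depend on it.

```python
from statistics import NormalDist

# Mean from the card; sigma = 12.0 is a hypothetical value, not from the card
pulse = NormalDist(mu=75.5, sigma=12.0)

# Fraction of the distribution lying above the mean
share_above_mean = 1 - pulse.cdf(75.5)

print(share_above_mean)
```

Because a normal distribution is symmetric about its mean, this fraction is one half for any choice of standard deviation, which is why a claim of 80% above the mean cannot hold.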
Consider the following three distributions.
- Which distribution is not normal?
- B.
- Of the two normal distributions, which has the larger standard deviation?
- C.
Determine whether the following data set is likely to be normally distributed.
- Explain the reasoning.
The amounts of rainfall (in inches) on each day of a year in New York.
The given data set is not likely to be normally distributed.
- There will be many days with 0 inches of rain and very few days with large amounts of rain.
- There will be a peak in the distribution at the extreme left.