Quiz 3 Flashcards
(55 cards)
What is a correlation?
- Give three examples of pairs of variables that are correlated
- A correlation exists between two variables when higher values of one variable consistently go with higher or lower values of another variable.
- Amount of smoking and lung cancer, height and weight of people, price of a good and demand for the good.
The scatterplot showed all the data points following a nearly straight diagonal line, but only a weak correlation between the two variables being plotted.
The statement does not make sense.
- The data points following a nearly straight diagonal line would indicate a very strong correlation between the two variables.
The IQ scores and hat sizes of randomly selected adults.
The variables are not correlated.
The heights and weights of 50 randomly selected males between the ages of 10 and 21.
Positive correlation because taller males tend to weigh more.
Listed in the following table are altitudes (in thousands of feet) and outside air temperatures (in degrees Fahrenheit) recorded during a flight between two cities.
Altitude (thousands of feet): 2, 8, 14, 23, 27, 31, 33
Temperature (°F): 58, 39, 26, -4, -32, -39, -56
- Construct a scatterplot:
- Graph answer is B; nearly diagonal straight line trending downward.
- Strong negative correlation.
- r ≈ -1 (the computed value is about -0.99).
- For these data, as the aircraft gains altitude, the outside temperature drops in a strong and consistent pattern.
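To check the estimate of r, here is a minimal sketch that computes the correlation coefficient from the altitude and temperature values on this card. The function name `pearson_r` is just an illustrative choice, and the formula is the standard Pearson correlation.

```python
import math

# Altitude (thousands of feet) and outside temperature (degrees Fahrenheit)
# copied from the flight card.
altitude = [2, 8, 14, 23, 27, 31, 33]
temperature = [58, 39, 26, -4, -32, -39, -56]

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(altitude, temperature)
print(round(r, 2))  # about -0.99, a strong negative correlation
```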
Briefly explain how an outlier can make it appear that there is correlation when there is none.
- Also briefly explain how an outlier can make it appear that there is no correlation when there is one.
- Under what circumstances is it reasonable to ignore outliers when studying correlations?
- Which outlier would make it appear that there is a correlation when there is none?
- An outlier far separated from the rest of the data points.
- Which outlier would make it appear that there is no correlation when there is one?
- An outlier located in a place opposite where the correlation would predict.
- Under what circumstances is it reasonable to ignore outliers?
- When there is good reason to suspect that they represent errors in the data.
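Both outlier effects can be seen with small made-up data sets. The points below are hypothetical and chosen only to make the arithmetic easy to follow. `pearson_r` is an illustrative helper that applies the standard correlation formula.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: a 3-by-3 grid of points, which has no correlation.
grid_x = [1, 2, 3, 1, 2, 3, 1, 2, 3]
grid_y = [1, 1, 1, 2, 2, 2, 3, 3, 3]
r_grid = pearson_r(grid_x, grid_y)

# One point far from the rest makes a strong correlation appear.
r_grid_outlier = pearson_r(grid_x + [10], grid_y + [10])

# Hypothetical data: nine points exactly on the line y = x.
line_x = list(range(1, 10))
line_y = list(range(1, 10))
r_line = pearson_r(line_x, line_y)

# One point placed opposite to the trend hides the correlation.
r_line_outlier = pearson_r(line_x + [10], line_y + [-8])

print(round(r_grid, 2), round(r_grid_outlier, 2))  # 0.0 without, about 0.91 with
print(round(r_line, 2), round(r_line_outlier, 2))  # 1.0 without, about 0.01 with
```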
I created a scatterplot of CEO salaries and corporate revenue for 10 companies and found a negative correlation, but when I left out a data point for a company whose CEO took no salary, there was no correlation for the remaining data.
The statement makes sense.
- A CEO taking no salary is an outlier, and an outlier can make a correlation appear where there otherwise is none.
In one state, the number of unregistered handguns steadily increased over the past several years, and the crime rate increased as well.
- There is a positive correlation between the number of unregistered handguns and the crime rate.
- The correlation is most likely due to a direct cause.
- Many crimes are committed with handguns that are not registered.
It has been found that as the number of traffic lights increases, the number of car crashes also increases.
- There is a positive correlation between the number of traffic lights and the number of car crashes.
- The correlation is most likely due to a common underlying cause, such as the general increase in the number of cars and traffic.
It has been found that as gas prices increase, the distances vehicles are driven tend to get shorter.
- There is a negative correlation between gas prices and the distances vehicles are driven.
- The correlation is most likely due to a direct cause.
- As gas prices increase significantly, people can’t afford to drive as much, so they cut costs by driving less.
The figure shows the birth and death rates for different countries, measured in births and deaths per 1000 people.
- The correlation coefficient is r ≈ 0.8, which indicates a strong positive correlation.
- The points toward the left correspond to relatively wealthy countries, which have low birth rates and low death rates.
- The points toward the right correspond to relatively low income countries, which tend to have high birth rates and high death rates.
- Wealthier countries have a negative correlation, so higher birth rates are associated with lower death rates.
- Lower income countries have a positive correlation, so higher birth rates are associated with higher death rates.
What is a best fit line?
- How is a best fit line useful?
- It is a line that lies closer to the data points than any other possible line.
- It is useful to make predictions within the bounds of the data points.
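As a sketch of how a best fit line is found and used, the code below fits a line to the altitude and temperature values from the flight card in this set. It assumes "closer than any other line" is meant in the least-squares sense, which the card does not state. The altitude of 20 thousand feet is an illustrative choice that lies within the bounds of the data.

```python
# Altitude (thousands of feet) and temperature (degrees Fahrenheit)
# from the flight card.
altitude = [2, 8, 14, 23, 27, 31, 33]
temperature = [58, 39, 26, -4, -32, -39, -56]

def best_fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = best_fit_line(altitude, temperature)

# Predict only inside the observed range (2 to 33 thousand feet).
predicted = intercept + slope * 20
print(round(slope, 2), round(intercept, 2), round(predicted, 1))
```

The slope comes out near -3.6, meaning roughly 3.6 degrees cooler for each additional thousand feet.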
What does the square of the correlation coefficient tell us about a best fit line?
It tells us the proportion of the variation that is accounted for by the best fit line.
- For example, if r² = 0.9, or 90%, then 90% of the variability is accounted for by the best fit line, but 10% is not.
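The "proportion of variation accounted for" can be checked directly. This sketch reuses the flight card's altitude and temperature values and assumes a least-squares best fit line. It computes 1 minus the ratio of leftover variation to total variation, then compares the result with the square of r.

```python
import math

altitude = [2, 8, 14, 23, 27, 31, 33]
temperature = [58, 39, 26, -4, -32, -39, -56]

n = len(altitude)
mean_x = sum(altitude) / n
mean_y = sum(temperature) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(altitude, temperature))
sxx = sum((x - mean_x) ** 2 for x in altitude)
syy = sum((y - mean_y) ** 2 for y in temperature)  # total variation in temperature

# Least-squares line and the correlation coefficient.
slope = sxy / sxx
intercept = mean_y - slope * mean_x
r = sxy / math.sqrt(sxx * syy)

# Variation left over after using the line to predict each temperature.
leftover = sum((y - (intercept + slope * x)) ** 2
               for x, y in zip(altitude, temperature))

explained_fraction = 1 - leftover / syy
print(round(explained_fraction, 2), round(r ** 2, 2))  # both about 0.98
```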
I used a best fit line for data showing the ages and arm lengths of hundreds of thousands of boys of various ages to predict the mean arm length of 12 year old boys.
The statement makes sense.
- Assuming the data were collected in a reasonable way and all ages were sampled, a scatterplot for thousands of boys should produce a best fit line that makes reasonable predictions of mean arm lengths at different ages.
Using sample data on footprint lengths and heights from men, the equation of the best fit line is obtained, and it is used to find that a man with a footprint length of 40 inches is predicted to have a height of 152 inches, or 12 feet, 8 inches.
The statement does not make sense since a prediction is being made regarding a value that is beyond the bounds of the data points.
Researchers conducted animal experiments to study smoking and lung cancer because it would have been unethical to conduct these experiments on humans.
The statement makes sense.
- Researchers cannot randomly assign people to treatment and control groups and ask subjects in the treatment group to smoke.
Drinking greater amounts of alcohol slows a person’s reaction time.
The causal connection is valid.
- Alcohol is a depressant to the central nervous system, which leads to slower reaction time.
Several things besides smoking have been shown to be probabilistic causal factors in lung cancer.
- For example, exposure to asbestos and exposure to radon gas, both of which are found in many homes, can cause lung cancer.
- Suppose that you meet a person who lives in a home that has a high radon level and insulation that contains asbestos.
- The person tells you, “I smoke, too, because I figure I’m doomed to lung cancer anyway.”
- What would you say in response? Explain.
This person may or may not be doomed to lung cancer, but smoking will only increase the risk of getting lung cancer.
A study reported in Nature claims that women who give birth later in life tend to live longer.
- Of the 78 women who were at least 100 years old at the time of the study, 19% had given birth after their 40th birthday.
- Of the 54 women who were 73 years old at the time of the study, only 5.5% had given birth after their 40th birthday.
- A researcher stated that “if your reproductive system is aging slowly enough that you can have a child in your 40s, it probably bodes well for the fact that the rest of you is aging slowly too.”
- Was this an observational study or an experiment?
- Does the study suggest that later child bearing causes longer lifetimes or that later child bearing reflects an underlying cause?
- This was an observational study.
- The study suggests that later child bearing reflects an underlying cause.
- There are other possible explanations for the findings.
- For example, it’s also possible that the younger women lived during a time when having babies after age 40 was less likely (by choice).
- It is still possible for them to live to be 100 years old.
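It can help to turn the percentages in this card back into head counts. The short calculation below uses only the group sizes and percentages quoted in the card. The rounding to whole women is an assumption, since the card gives percentages only.

```python
# Group sizes and percentages quoted in the card.
centenarians = 78
centenarian_share = 0.19   # gave birth after their 40th birthday
younger_group = 54
younger_share = 0.055      # gave birth after their 40th birthday

late_births_centenarians = round(centenarians * centenarian_share)
late_births_younger = round(younger_group * younger_share)

print(late_births_centenarians, late_births_younger)  # about 15 and about 3 women
```

The comparison therefore rests on roughly 15 women in one group and 3 in the other, which is one more reason to be cautious about causal conclusions.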
Those who favor gun control often point to a positive correlation between the availability of handguns and murder rates to support their position that gun control would save lives.
- Does this correlation, by itself, indicate that handgun availability causes a higher murder rate?
- Suggest some other factors that might support or weaken this conclusion.
Availability is not itself a cause.
- Social, economic, or personal conditions cause individuals to use the available handguns.
Distinguish between a distribution of sample means and a distribution of sample proportions.
A distribution of sample means results when the means of all possible samples of a given size are found, and a distribution of sample proportions results when the corresponding proportions are found.
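A tiny made-up population makes the two distributions concrete. The population of five values and the "even number" trait below are hypothetical, chosen so that every possible sample of size 2 can be listed.

```python
from itertools import combinations

# Hypothetical population of five values.
population = [1, 2, 3, 4, 5]
sample_size = 2

# Every possible sample of the given size (without replacement).
samples = list(combinations(population, sample_size))

# Distribution of sample means: one mean per possible sample.
sample_means = [sum(s) / sample_size for s in samples]

# Distribution of sample proportions: share of even values in each sample.
sample_proportions = [sum(1 for v in s if v % 2 == 0) / sample_size
                      for s in samples]

mean_of_means = sum(sample_means) / len(sample_means)
mean_of_proportions = sum(sample_proportions) / len(sample_proportions)

print(len(samples))         # 10 possible samples
print(mean_of_means)        # matches the population mean, 3.0
print(mean_of_proportions)  # matches the population proportion of evens, 0.4
```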
What is a sample mean? What is a sample proportion? Summarize the notation used for these statistics.
- The mean of a particular sample drawn from a population.
- A fraction (or percentage) with which some variable occurs in a sample.
- Notation: x̄ denotes a sample mean and μ the population mean; p̂ denotes a sample proportion and p the population proportion.
I selected three different samples of size n = 20 drawn from the 1250 students at my school, and with these I constructed the sampling distribution.
The statement does not make sense.
- A sampling distribution is the distribution of a statistic over all possible samples of a particular size, and the number of such samples is far more than three.
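To see how far three samples fall short, the number of possible samples can be counted. This sketch assumes samples are drawn without replacement and that order does not matter, so the count is "1250 choose 20".

```python
import math

students = 1250
sample_size = 20

# Number of distinct samples of 20 students from 1250
# (without replacement, order ignored).
possible_samples = math.comb(students, sample_size)

print(possible_samples > 3)        # True
print(len(str(possible_samples)))  # number of digits in the count
```

The count has dozens of digits, so a true sampling distribution could never be built from three samples.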
Although a company randomly surveys only a few thousand households out of the millions that own cars, it has a good chance of getting an accurate estimate of the proportion of the population with a sports car.
The statement makes sense.
- The sample size is large enough for the distribution of sample proportions to be nearly normal, so individual sample proportions should be clustered around the actual population proportion.
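How tightly the sample proportions cluster can be estimated with the standard deviation of the distribution of sample proportions, sqrt(p(1 - p) / n). The values below are hypothetical: the card gives neither the true proportion nor the exact sample size, so p = 0.05 and n = 3000 are illustrative assumptions only.

```python
import math

# Hypothetical values, not taken from the card.
true_proportion = 0.05   # assumed share of households with a sports car
sample_size = 3000       # assumed "few thousand" households surveyed

# Standard deviation of the distribution of sample proportions.
spread = math.sqrt(true_proportion * (1 - true_proportion) / sample_size)

print(round(spread, 4))  # about 0.004, or 0.4 percentage points
```

Under these assumptions, a typical sample proportion lands within a fraction of a percentage point of the population proportion, even though the population contains millions of households.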