Quiz 3 Flashcards

(55 cards)

1
Q

What is a correlation?
- Give three examples of pairs of variables that are correlated

A
  1. A correlation exists between two variables when higher values of one variable consistently go with higher or lower values of another variable.
  2. Amount of smoking and lung cancer, height and weight of people, price of a good and demand for the good.
2
Q

The scatterplot showed all the data points following a nearly straight diagonal line, but only a weak correlation between the two variables being plotted.

A

The statement does not make sense.
- The data points following a nearly straight diagonal line would indicate a very strong correlation between the two variables.

3
Q

The IQ scores and hat sizes of randomly selected adults.

A

The variables are not correlated.

4
Q

The heights and weights of 50 randomly selected males between the ages of 10 and 21.

A

Positive correlation because taller males tend to weigh more.

5
Q

Listed in the following table are altitudes (in thousands of feet) and outside air temperatures (in degrees Fahrenheit) recorded during a flight between two cities.
Altitude (thousands of feet): 2, 8, 14, 23, 27, 31, 33
Temperature (°F): 58, 39, 26, -4, -32, -39, -56

A
  1. Construct a scatterplot:
    - Graph answer is B; nearly diagonal straight line trending downward.
  2. Strong negative correlation.
  3. r ≈ -0.99 (approximately -1)
  4. For the data, it seems that as the aircraft gains altitude, the outside temperature appears to drop in a strong and consistent pattern.
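A minimal sketch of how the correlation coefficient for this table could be checked, assuming plain Python and the standard Pearson formula. The function and variable names are illustrative and not part of the original card.

```python
import math

# Altitude (thousands of feet) and outside air temperature (degrees F) from the card.
altitude = [2, 8, 14, 23, 27, 31, 33]
temperature = [58, 39, 26, -4, -32, -39, -56]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-deviations and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(altitude, temperature)
print(round(r, 2))
```

The result is about -0.99, which is consistent with the card's rounded value of -1 and with a strong negative correlation.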
6
Q

Briefly explain how an outlier can make it appear that there is correlation when there is none.
- Also briefly explain how an outlier can make it appear that there is no correlation when there is one.
- Under what circumstances is it reasonable to ignore outliers when studying correlations?

A
  1. Which outlier would make it appear that there is a correlation when there is none?
    - An outlier far separated from the rest of the data points.
  2. Which outlier would make it appear that there is no correlation when there is one?
    - An outlier located in a place opposite where the correlation would predict.
  3. Under what circumstances is it reasonable to ignore outliers?
    - When there is good reason to suspect that they represent errors in the data.
7
Q

I created a scatterplot of CEO salaries and corporate revenue for 10 companies and found a negative correlation, but when I left out a data point for a company whose CEO took no salary, there was no correlation for the remaining data.

A

The statement makes sense.
- A CEO taking no salary is an outlier, and an outlier can make a correlation appear where there otherwise is none.

8
Q

In one state, the number of unregistered handguns steadily increased over the past several years, and the crime rate increased as well.

A
  1. There is a positive correlation between the number of unregistered handguns and the crime rate.
  2. The correlation is most likely due to a direct cause.
    - Many crimes are committed with handguns that are not registered.
9
Q

It has been found that as the number of traffic lights increases, the number of car crashes also increases.

A
  1. There is a positive correlation between the number of traffic lights and the number of car crashes.
  2. The correlation is most likely due to a common underlying cause, such as the general increase in the number of cars and traffic.
10
Q

It has been found that as gas prices increase, the distances vehicles are driven tend to get shorter.

A
  1. There is a negative correlation between gas prices and the distances vehicles are driven.
  2. The correlation is most likely due to a direct cause.
    - As gas prices increase significantly, people can’t afford to drive as much, so they cut costs by driving less.
11
Q

The figure shows the birth and death rates for different countries, measured in births and deaths per 1000 people.

A
  1. The correlation coefficient is approximately r = 0.8, which indicates a strong positive correlation.
  2. The points toward the left correspond to relatively wealthy countries, which have low birth rates and low death rates.
    - The points toward the right correspond to relatively low income countries, which tend to have high birth rates and high death rates.
  3. Wealthier countries have a negative correlation, so higher birth rates are associated with lower death rates.
    - Lower income countries have a positive correlation, so higher birth rates are associated with higher death rates.
12
Q

What is a best fit line?
- How is a best fit line useful?

A
  1. It is a line that lies closer to the data points than any other possible line.
  2. It is useful to make predictions within the bounds of the data points.
13
Q

What does the square of the correlation coefficient tell us about a best fit line?

A

It tells us the proportion of the variation that is accounted for by the best fit line.
- For example, if r^2 = 0.9, or 90%, then 90% of the variability is accounted for by the best fit line, but 10% is not.

14
Q

I used a best fit line for data showing the ages and arm lengths of hundreds of thousands of boys of various ages to predict the mean arm length of 12 year old boys.

A

The statement makes sense.
- Assuming the data were collected in a reasonable way and all ages were sampled, a scatterplot for thousands of boys should produce a best fit line that makes reasonable predictions of mean arm lengths at different ages.

15
Q

Using sample data on footprint lengths and heights from men, the equation of the best fit line is obtained, and it is used to find that a man with a footprint length of 40 inches is predicted to have a height of 152 inches, or 12 feet, 8 inches.

A

The statement does not make sense since a prediction is being made regarding a value that is beyond the bounds of the data points.

16
Q

Researchers conducted animal experiments to study smoking and lung cancer because it would have been unethical to conduct these experiments on humans.

A

The statement makes sense.
- Researchers cannot randomly assign people to treatment and control groups and ask subjects in the treatment group to smoke.

17
Q

Drinking greater amounts of alcohol slows a person’s reaction time.

A

The causal connection is valid.
- Alcohol is a depressant to the central nervous system, which leads to slower reaction time.

18
Q

Several things besides smoking have been shown to be probabilistic causal factors in lung cancer.
- For example, exposure to asbestos and exposure to radon gas, both of which are found in many homes, can cause lung cancer.
- Suppose that you meet a person who lives in a home that has a high radon level and insulation that contains asbestos.
- The person tells you, “I smoke, too, because I figure I’m doomed to lung cancer anyway.”
- What would you say in response? Explain.

A

This person may or may not be doomed to lung cancer, but smoking will only increase the risk of getting lung cancer.

19
Q

A study reported in Nature claims that women who give birth later in life tend to live longer.
- Of the 78 women who were at least 100 years old at the time of the study, 19% had given birth after their 40th birthday.
- Of the 54 women who were 73 years old at the time of the study, only 5.5% had given birth after their 40th birthday.
- A researcher stated that “if your reproductive system is aging slowly enough that you can have a child in your 40s, it probably bodes well for the fact that the rest of you is aging slowly too.”
- Was this an observational study or an experiment?
- Does the study suggest that later child bearing causes longer lifetimes or that later child bearing reflects an underlying cause?

A
  1. This was an observational study.
  2. The study suggests that later child bearing reflects an underlying cause.
  3. There are other possible explanations for the findings.
    - For example, it’s also possible that the younger women lived during a time when having babies after age 40 was less likely (by choice).
    - It is still possible for them to live to be 100 years old.
20
Q

Those who favor gun control often point to a positive correlation between the availability of handguns and murder rates to support their position that gun control would save lives.
- Does this correlation, by itself, indicate that handgun availability causes a higher murder rate?
- Suggest some other factors that might support or weaken this conclusion.

A

Availability is not itself a cause.
- Social, economic, or personal conditions cause individuals to use the available handguns.

21
Q

Distinguish between a distribution of sample means and a distribution of sample proportions.

A

A distribution of sample means results when the means of all possible samples of a given size are found, and a distribution of sample proportions results when the corresponding proportions are found.

22
Q

What is a sample mean? What is a sample proportion? Summarize the notation used for these statistics.

A
  1. The mean of a particular sample drawn from a population.
  2. A fraction (or percentage) with which some variable occurs in a sample.
  3. Notation: x bar denotes a sample mean and μ a population mean; p hat denotes a sample proportion and p a population proportion.
23
Q

I selected three different samples of size n = 20 drawn from the 1250 students at my school, and with these I constructed the sampling distribution.

A

The statement does not make sense.
- A sampling distribution is the distribution of a statistic (such as the mean) computed from all possible samples of a particular size, which is far more than three samples.
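To get a feel for how many samples "all possible samples" means here, the following sketch counts the distinct samples of 20 students that can be drawn from 1250. It assumes sampling without replacement where order does not matter; the variable names are illustrative.

```python
import math

population_size = 1250  # students at the school (from the card)
sample_size = 20        # n for each sample (from the card)

# Number of distinct samples of 20 students, assuming sampling without
# replacement and ignoring the order in which students are chosen.
possible_samples = math.comb(population_size, sample_size)

# The count is astronomically larger than the three samples in the statement.
print(possible_samples > 3)
```

Three samples are a vanishingly small fraction of this count, so they cannot stand in for the full sampling distribution.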

24
Q

Although a company randomly surveys only a few thousand households out of the millions that own cars, they have a good chance of getting an accurate estimate of the proportion of the population with a sports car.

A

The statement makes sense.
- The sample size is large enough for the distribution of sample proportions to be nearly normal, so individual sample proportions should be clustered around the actual population proportion.

25
Q

A population consists of a batch of 25,470 aspirin tablets, and it includes 1,020 that are defective because they do not meet specifications.
- A random sample of n = 220 of the tablets is obtained and tested, with the result that 15 of them are defective.

A
  1. The population proportion of defective aspirin tablets is p = 0.04.
  2. The sample proportion of defective aspirin tablets is p hat = 0.068.
  3. Random samples collected with sound sampling methods vary, and the sample proportions differ from the population proportion because of sampling error.
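The two proportions on this card are simple ratios. A short sketch of the arithmetic, with illustrative variable names that are not part of the original card:

```python
# Counts taken from the card.
population_total = 25_470
population_defective = 1_020
sample_total = 220
sample_defective = 15

# Population proportion p and sample proportion p hat.
p = population_defective / population_total
p_hat = sample_defective / sample_total

# Difference between the sample and population proportions (sampling error).
sampling_error = p_hat - p

print(round(p, 2), round(p_hat, 3))
```

The sample proportion is larger than the population proportion here, which is the kind of sampling error the card describes.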
26
Q

Suppose you have measured the mean in a sample drawn from a much larger population.
- What value should you use as your estimate of the population mean?

A

The sample mean.

27
Q

Once you have constructed the 95% confidence interval around your sample mean, how do you interpret its possible relationship to the population mean?

A

There is 95% confidence that the confidence interval limits actually contain the true value of the population mean.

28
Q

Based on our sample, the 95% confidence interval for the mean amount of television watched by adults in a nation is 2.8 to 3.8 hours per day.
- Therefore, there is a 95% chance that the actual mean for the population is 3.3 hours.

A

The statement does not make sense.
- The center of a confidence interval is not necessarily the population mean.

29
Q

Here is a typical statement made by the media: “Based on a recent study, pennies weigh an average of 2.5 grams with a margin of error of 0.006 gram.”
- What important and relevant piece of information is omitted from that statement?
- Is it okay to use the word “average”?

A

The media often omit reference to the confidence level, which is typically 95%.
- The word “mean” should be used instead of the word “average.”

30
Q

Assume that you want to construct a 95% confidence interval estimate of a population mean.
- Find an estimate of the minimum sample size needed to obtain the specified margin of error for the 95% confidence interval.
- Use the given sample standard deviation as an estimate of the population standard deviation.
- Margin of error, E = 4.2 minutes; sample standard deviation, s = 51.5 minutes

A

The required sample size is 578.
- The formula is n = (1.96 × s / E)^2: multiply 1.96 by the sample standard deviation, divide by the margin of error, square the result, and round up to the next whole number.

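A minimal sketch of the sample-size calculation for a mean, assuming the 1.96 multiplier stated on the card and rounding up to the next whole number. The function name is hypothetical.

```python
import math


def sample_size_for_mean(std_dev, margin, z=1.96):
    """Minimum n so a 95% interval for a mean has the given margin of error.

    Uses n = (z * s / E)^2, rounded up, with the sample standard deviation
    standing in for the population standard deviation.
    """
    return math.ceil((z * std_dev / margin) ** 2)


# Values from this card: s = 51.5 minutes, E = 4.2 minutes.
print(sample_size_for_mean(51.5, 4.2))
# Values from the following card: s = 62.1 grams, E = 8.6 grams.
print(sample_size_for_mean(62.1, 8.6))
```

The two calls reproduce the answers 578 and 201 given on this card and the next one.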
31
Q

Assume that you want to construct a 95% confidence interval estimate of a population mean.
- Find an estimate of the minimum sample size needed to obtain the specified margin of error for the 95% confidence interval.
- Use the given sample standard deviation as an estimate of the population standard deviation.
- Margin of error, E = 8.6 grams; sample standard deviation, s = 62.1 grams

A

The required sample size is 201.

32
Q

Suppose you conducted an opinion poll and measured the proportion of your sample that held a particular view.
- What value should you use as your estimate of the population proportion?

A

The sample proportion.

33
Q

Once you have constructed the 95% confidence interval around your sample proportion, what does this tell you about the estimated value of the population proportion?

A

We have 95% confidence that the confidence interval limits actually contain the true value of the population proportion.

34
Q

Our survey found that 61% of voters approve of a particular policy of the President, with a margin of error (for 95% confidence) of 3 percentage points.
- Therefore, there is only a 5% chance that the proportion of approval among all voters differs from 61%.

A

The statement does not make sense.
- The confidence level is not the probability that the interval contains the population parameter.

35
Q

A newspaper publishes an article stating that, based on survey results, 75% of local residents oppose an increase in the sales tax, with a margin of error of 6 percentage points.
- We can therefore express the confidence interval as 0.69 < p < 0.81.

A

The statement makes sense.
- The margin of error was added to and subtracted from the sample proportion to construct the confidence interval.

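The interval on this card comes from subtracting and adding the margin of error. A small sketch, using a hypothetical helper name:

```python
def confidence_interval(p_hat, margin):
    """Return (lower, upper) limits: sample proportion minus/plus the margin."""
    return (p_hat - margin, p_hat + margin)


# Values from the card: 75% opposed, margin of error of 6 percentage points.
low, high = confidence_interval(0.75, 0.06)
print(round(low, 2), round(high, 2))
```

Rounded to two decimals, the limits are 0.69 and 0.81, matching 0.69 < p < 0.81.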
36
Q

A newspaper provided a “snapshot” illustrating poll results from 1910 professionals who interview job applicants.
- The illustration showed that 26% of them said the biggest interview turnoff is that the applicant did not make an effort to learn about the job or the company.
- The margin of error was given as ±3 percentage points.
- What important feature of the poll was omitted?

A

The confidence level.

37
Q

Assume that you want to construct a 95% confidence interval to estimate a population proportion.
- Estimate the minimum sample size needed to obtain the margin of error E = 0.017 for the 95% confidence interval.

A

The minimum sample size is 3461.
- The formula is n = 1 / E^2: divide 1 by the square of the margin of error and round up to the next whole number.

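A minimal sketch of the rule used on this card, n = 1 / E^2 rounded up, which is the approximation these cards apply for a 95% confidence interval for a proportion. The function name is hypothetical.

```python
import math


def sample_size_for_proportion(margin):
    """Minimum n for a 95% interval for a proportion, using n = 1 / E^2."""
    return math.ceil(1 / margin ** 2)


print(sample_size_for_proportion(0.017))  # this card
print(sample_size_for_proportion(0.235))  # E from the next card
print(sample_size_for_proportion(0.08))   # E from the presidential poll card
```

The three calls reproduce the answers 3461, 19, and 157 that appear on this card and the two that follow.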
38
Q

Assume that you want to construct a 95% confidence interval to estimate a population proportion.
- Estimate the minimum sample size needed to obtain the margin of error E = 0.235 for the 95% confidence interval.

A

The minimum sample size is 19.

39
Q

A poll finds that 56% of the population approves of the job that the President is doing; the poll has a margin of error of 8% (assuming a 95% degree of confidence).

A
  1. Find the 95% confidence interval.
    - 0.48 < p < 0.64
  2. The minimum sample size was 157.
40
Q

What is a hypothesis in statistics?
- What is meant by a hypothesis test in statistics?

A
  1. A hypothesis is a claim about a population parameter (such as a population proportion, p, or a population mean, μ) or some other characteristic of a population.
  2. A hypothesis test is a standard procedure for testing a claim about the value of a population parameter.
41
Q

What are two possible outcomes of a hypothesis test, and what do they mean?
- Can such a test have an outcome of accepting the null hypothesis?

A
  1. What are the two possible outcomes of a hypothesis test?
    - Rejecting the null hypothesis, in which case there is evidence in support of the alternative hypothesis.
    - Not rejecting the null hypothesis, in which case there is not enough evidence to support the alternative hypothesis.
  2. Can such a test have an outcome of accepting the null hypothesis?
    - No. The null hypothesis is the starting assumption. The hypothesis test may not give us reason to reject this starting assumption, but it cannot by itself give us reason to conclude that the starting assumption is true.
42
Q

In interpreting a P value of 0.44, a researcher states that the results are statistically significant because the P value is less than 0.5, indicating that the results are not likely to occur by chance.

A

The statement does not make sense.
- A P value of 0.44 corresponds to results that are likely to occur by chance.

43
Q

In a testing method of sex selection, 50 couples are given a treatment designed to increase the likelihood of a female, and each couple has one baby. Assume that none of the babies in parts a through c are intersex.
- If the 50 babies include exactly 44 females, would you consider this result statistically significant or would you attribute it to random fluctuations?
- If the 50 babies include exactly 28 females, would you consider this result statistically significant or would you attribute it to random fluctuations?

A
  1. The result is statistically significant.
    - The result is unlikely to be the result of random fluctuations.
  2. The result is not statistically significant.
    - The result could be the result of random fluctuations.
  3. The P value is low, so it corresponds to the outcome from part a.
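One way to check these judgments numerically is an exact binomial tail probability. The sketch below assumes a null hypothesis in which each baby is female with probability 0.5 and a 0.05 significance level; neither assumption is stated on the card, and the function name is hypothetical.

```python
import math


def upper_tail_probability(successes, trials, p=0.5):
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        math.comb(trials, k) * p ** k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Chance of at least 44 (or at least 28) females among 50 babies
# if the treatment had no effect (assumed p = 0.5).
p_44 = upper_tail_probability(44, 50)
p_28 = upper_tail_probability(28, 50)

# Compare each tail probability to the assumed 0.05 significance level.
print(p_44 < 0.05, p_28 < 0.05)
```

Under these assumptions, 44 females falls far below the 0.05 threshold while 28 females does not, in line with the card's answers.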
44
Q

A business manager claims that less than 85% of their employees are college graduates.

A

The two possible outcomes of the hypothesis test are:
- Reject the null hypothesis that the proportion of employees is equal to 0.85, in which case there is evidence in support of the claim that the proportion of employees is less than 0.85.
- Do not reject the null hypothesis, in which case there is not enough evidence to support the claim that the proportion of employees is less than 0.85.

45
Q

A human resources manager claims that greater than 5% of new hires fail their background checks.

A
  1. Which of the following is the hypothesis test to be conducted?
    - H0: p = 0.05 and Ha: p > 0.05
  2. The hypothesis test will be right tailed.
  3. The possible outcomes are:
    - Reject the null hypothesis that the proportion of employees is equal to 0.05, in which case there is evidence in support of the claim that the proportion of employees is greater than 0.05.
    - Do not reject the null hypothesis, in which case there is not enough evidence to support the claim that the proportion of employees is greater than 0.05.
46
Q

Briefly describe what each of the variables n, x bar, s, sigma, and z represents in hypothesis tests of a claim made about a population mean.

A
  1. The variable n represents the sample size.
  2. The variable x bar represents the sample mean.
  3. The variable s represents the sample standard deviation.
  4. The variable sigma represents the population standard deviation.
  5. The variable z represents the standard score for the sample mean.
47
Q

In hypothesis tests, if the significance level is 0.01, then the P value is also 0.01.

A

The statement does not make sense.
- The significance level and the P value represent different components of the hypothesis test, and are generally not the same.

48
Q

In testing a claim about a population mean, if the standard score for a sample mean is z = 0, then there is not sufficient sample evidence to support the alternative hypothesis.

A

The statement makes sense.
- A standard score of 0 represents the peak of the sampling distribution, so it is a likely outcome if the null hypothesis is true.

49
Q

It is always wise to select a significance level of zero to minimize the chance of making an error.

A

The statement does not make sense.
- With a significance level of zero, the test must always fail to reject the null hypothesis, and the test will never reveal anything about the population of interest.

50
Q

H0: The lottery is fair. Ha: The lottery is biased.

A
  1. Identify the type I error.
    - Reject the claim that the lottery is fair when the lottery is fair.
  2. Identify the type II error.
    - Fail to reject the claim that the lottery is fair when the lottery is biased.
- Type I error = Rejecting the null hypothesis when it is true.
- Type II error = Failing to reject the null hypothesis when it is false.
51
Q

H0: The car is new. Ha: The car is old.

A
  1. Type I error: Reject the claim that the car is new when the car is new.
  2. Type II error: Fail to reject the claim that the car is new when the car is old.
52
Q

What do n, p, p hat, and P value represent?

A
  1. The symbol n represents the sample size.
  2. The symbol p represents the proportion in a population.
  3. The symbol p hat represents the proportion in a sample.
  4. The term P value represents the probability of getting a sample proportion that is at least as extreme as the sample proportion actually observed, given that the null hypothesis is true.
53
Q

What do we mean by critical values for significance in a hypothesis test for the population proportion?
- How does this compare to the critical values for statistical significance for a population mean?
- How do we use this for making decisions about the hypothesis test?

A
  1. The critical values are the standard scores required for statistical significance at a given level.
  2. The critical values are the same as those used with population means.
  3. The level of statistical significance of the data is assessed by comparing the standard score to the critical values.
54
Q

In a test of the claim that a majority of Americans believe that human activity is the major cause of global warming, the null hypothesis is that p = 0.5 and the alternative hypothesis is p > 0.5.

A

The statement makes sense.
- The null hypothesis is the starting assumption and the alternative hypothesis is the claim that needs to be supported by evidence.

55
Q

The area to the right of the standard score z = 1.0 is 0.1587, so the P value in a two tailed test is 0.1587.

A

The statement does not make sense.
- The P value is equal to twice the area in the tail past the standard score, z.
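The tail area and the two tailed P value can be checked with the standard normal distribution. This sketch uses the complementary error function from Python's math module; the function name is hypothetical.

```python
import math


def upper_tail_area(z):
    """Area under the standard normal curve to the right of z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


right_area = upper_tail_area(1.0)
# A two tailed test counts both tails, so the P value is twice the tail area.
two_tailed_p = 2 * right_area

print(round(right_area, 4), round(two_tailed_p, 4))
```

The right tail area rounds to 0.1587, and doubling it gives a two tailed P value of about 0.3173 rather than 0.1587.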