Chpt 11 - Chi-Square Tests Flashcards

1
Q

What test can be used to check a claim that a die is unfair?

A

Chi-square goodness-of-fit test

2
Q

What is the chi-square distribution?

A

A family of right-skewed curves whose shape depends on the degrees of freedom

Starts at 0 on the horizontal axis and extends indefinitely to the right, approaching, but never touching, the horizontal axis

3
Q

What is the total area under the chi squared curve?

A

Equal to 1

4
Q

What are the effects of degrees of freedom on a chi-squared curve?

A

The larger the degrees of freedom, the more the χ² curve looks like a normal curve

5
Q

What is the value of χ²α?

A

The χ² value that has an area of α to its right under the chi-square curve

6
Q

Which table do we use to find the χ²α value?

A

Table VII

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What is the formula to determine the expected frequency for a chi-squared test?

A

E = np

E is the expected frequency
n is the sample size
p is the probability specified by Ho
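
The formula E = np can be applied to every category at once. The sketch below is a minimal illustration in Python; the function name `expected_frequencies` and the three-category example (n = 60 with probabilities 0.5, 0.3, 0.2) are made up for demonstration and do not come from the cards.

```python
def expected_frequencies(n, probabilities):
    """Return E = n * p for each category probability p specified by Ho."""
    return [n * p for p in probabilities]


# Hypothetical example: 60 observations, three categories under Ho.
expected = expected_frequencies(60, [0.5, 0.3, 0.2])
print(expected)
```

Because the probabilities under Ho add up to 1, the expected frequencies always add up to the sample size n.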

8
Q

What is the test statistic for a chi-squared test?

A

χ² = Σ (O - E)² / E

O is observed frequency

E is expected frequency
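
As a rough sketch of how this sum is computed, the Python function below adds up (O - E)² / E over all categories. The name `chi_square_statistic` and the sample counts are illustrative assumptions, not data from the cards.

```python
def chi_square_statistic(observed, expected):
    """Sum (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts: three categories, each with expected frequency 20.
statistic = chi_square_statistic([25, 20, 15], [20, 20, 20])
print(statistic)
```

A category whose observed count equals its expected count contributes 0, so only the mismatches push the statistic up.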

9
Q

What are the steps to a goodness-of-fit test?

A
  1. Set up the hypotheses
  2. Check the assumptions
  3. Decide significance level and find critical value
  4. Calculate the test statistic
  5. Compare the test statistic with critical value
  6. Interpret the result in the context of the question
10
Q

What are the assumptions that must be checked for a chi-square goodness-of-fit test?

A

All expected frequencies are at least 1

At most 20% of the expected frequencies are less than 5

Simple random sample
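
The two numeric conditions can be checked mechanically once the expected frequencies are known. This is a minimal Python sketch; the function name `expected_counts_ok` and the sample lists are assumptions for illustration. The simple-random-sample condition depends on how the data were collected, so code cannot verify it.

```python
def expected_counts_ok(expected):
    """Check the two expected-frequency conditions for a chi-square test."""
    # Condition 1: every expected frequency is at least 1.
    all_at_least_1 = all(e >= 1 for e in expected)
    # Condition 2: at most 20% of the expected frequencies are below 5.
    share_below_5 = sum(1 for e in expected if e < 5) / len(expected)
    return all_at_least_1 and share_below_5 <= 0.20


print(expected_counts_ok([200] * 6))          # six equal expected counts
print(expected_counts_ok([4, 4, 10, 10, 10]))  # 40% of counts are below 5
```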

11
Q

What are the basics for the hypotheses for a chi-square goodness-of-fit test?

A

Ho: The variable has the specified distribution

Ha: The variable does not have the specified distribution

12
Q

How do we determine degrees of freedom for a chi-squared goodness-of-fit test?

A

c-1

the number of categories - 1

13
Q

If we use the P-value approach in a chi-square goodness-of-fit test, how do we decide whether to reject Ho?

A

If:

p-value ≤ α -> we reject Ho

p-value > α -> we DO NOT reject Ho

14
Q

What is the rejection region of a chi-squared test and how do we determine if we reject Ho?

A

The critical value is χ²α with df = c - 1

The rejection region is the area to the right of the critical value

If the test statistic is larger than the χ²α value, we reject Ho

If the test statistic is smaller than the χ²α value, we DO NOT reject Ho

15
Q

A six sided die is claimed to be unfair so we rolled the die 1200 times and observed the results. We are using a significance level of 5%.

Set up the hypotheses

A

Ho: The distribution of the outcome of rolling this die is P(X=x) = 1/6, x = 1, 2, 3, 4, 5, 6

Ha: The distribution is not the one shown above

16
Q

A six sided die is claimed to be unfair so we rolled the die 1200 times and observed the results. We are using a significance level of 5%.

Determine the significance level and critical value

A

α = 5% = 0.05

df = 6 categories (one for each side of the die) - 1 = 5

χ²α = 11.070
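
The critical value normally comes from the table, but it can also be approximated numerically as a cross-check. The sketch below assumes a whole-number df, uses the closed-form right-tail area of the chi-square distribution for integer df, and finds the critical value by bisection. The names `chi2_sf` and `chi2_critical` are illustrative choices and not part of any library.

```python
import math


def chi2_sf(x, df):
    """Right-tail area P(chi-square > x) for a positive integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum of (x/2)^k / k! for k = 0 .. df/2 - 1
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite sum of x^(r - 1/2) / (1*3*...*(2r-1))
    z = math.sqrt(x)
    term, total = z, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(z / math.sqrt(2)) + math.sqrt(2 / math.pi) * math.exp(-x / 2) * total


def chi2_critical(alpha, df, lo=0.0, hi=200.0):
    """Find x with right-tail area alpha by bisection (the area shrinks as x grows)."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_sf(mid, df) > alpha:
            lo = mid  # area still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2


critical_value = chi2_critical(0.05, 5)
print(round(critical_value, 3))
```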

17
Q

A six sided die is claimed to be unfair so we rolled the die 1200 times and observed the results. We are using a significance level of 5%.

What are the expected outcomes?

A

E = np = 1200 x 1/6 = 200

Each side (1, 2, 3, 4, 5, 6) has the same expected frequency because Ho assigns every side the same probability, 1/6

18
Q

A six sided die is claimed to be unfair so we rolled the die 1200 times and observed the results. We are using a significance level of 5%.

Check the assumptions

A

simple random sample ✓

all expected frequencies are at least 1 ✓

at most 20% of the expected frequencies are less than 5 ✓

(all expected frequencies are 200)

19
Q

A six sided die is claimed to be unfair so we rolled the die 1200 times and observed the results. We are using a significance level of 5%.

If the observed frequency of 3 was 183, what is the statistic for this line?

How do we determine the test statistic?

A

(O-E )squared/E

(183-200) squared/200 = 1.445

The test statistic is the sum of this value over all categories (the six sides of the die)
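
The arithmetic for this single category can be reproduced in a few lines of Python. The variable names are illustrative; the numbers (1200 rolls, observed count 183 for side 3) come from the card.

```python
n = 1200
expected = n / 6          # E = np with p = 1/6, giving 200
observed_3 = 183          # observed frequency of side 3

contribution = (observed_3 - expected) ** 2 / expected
print(round(contribution, 3))  # 1.445
```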

20
Q

A six sided die is claimed to be unfair so we rolled the die 1200 times and observed the results. We are using a significance level of 5%.

Compare:

Test statistic 11.38
Critical value 11.070

A

The test statistic value is greater than the critical value, so we reject Ho

21
Q

A six sided die is claimed to be unfair so we rolled the die 1200 times and observed the results. We are using a significance level of 5%.

Interpret

The test statistic value is greater than the critical value, so we reject Ho

A

At the 5% significance level, the data provides sufficient evidence that the die is unfair

22
Q

A six sided die is claimed to be unfair so we rolled the die 1200 times and observed the results. We are using a significance level of 5%.

Compare:

P(χ² < 11.38) = 0.9557

A

The probability given by the software is the area to the left of the test statistic. For a chi-square goodness-of-fit test, the p-value is the area to the right, so:

p-value = 1 - 0.9557 = 0.0443

α = 0.05

α > p value so we reject Ho
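
The left-tail and right-tail areas can be cross-checked without software output. The sketch below relies on the closed-form right-tail area that holds specifically for df = 5; the function name `chi2_sf_df5` is an illustrative assumption, and the formula does not apply to other degrees of freedom.

```python
import math


def chi2_sf_df5(x):
    """Right-tail area P(chi-square > x) for df = 5 only."""
    z = math.sqrt(x)
    tail = math.erfc(z / math.sqrt(2))
    correction = math.sqrt(2 / math.pi) * math.exp(-x / 2) * (z + z ** 3 / 3)
    return tail + correction


alpha = 0.05
p_value = chi2_sf_df5(11.38)   # area to the right of the test statistic
left_area = 1 - p_value        # the area that the software reported

print(round(left_area, 4), round(p_value, 4))
print("reject Ho" if p_value <= alpha else "do not reject Ho")
```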

23
Q

The proportions of blood types O, A, B, and AB in the general population are known to be 46%, 42%, 9%, and 3%, respectively. A research team investigating a small isolated community of 200 people in Canada obtained the following frequencies of blood type. Test whether the proportions in this community differ significantly from those in the general population at the 1% significance level.

Set up the hypotheses

A

Ho: The distribution of blood type is

P(O) = 0.46
P(A) = 0.42
P(B) = 0.09
P(AB) = 0.03

Ha: The distribution is not the one shown above

24
Q

The proportions of blood types O, A, B, and AB in the general population are known to be 46%, 42%, 9%, and 3%, respectively. A research team investigating a small isolated community of 200 people in Canada obtained the following frequencies of blood type. Test whether the proportions in this community differ significantly from those in the general population at the 1% significance level.

Check assumptions

A

simple random sample ✓

all expected frequencies are at least 1 ✓

at most 20% of the expected frequencies are less than 5 ✓

(the smallest expected frequency is 3% of 200 = 6)
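
A short Python sketch of this check, using the sample size and the proportions stated in the question. The variable names are illustrative, and the simple-random-sample condition still has to be judged from how the data were collected.

```python
n = 200
null_probabilities = {"O": 0.46, "A": 0.42, "B": 0.09, "AB": 0.03}

# Expected frequency for each blood type under Ho: E = np
expected = {blood_type: n * p for blood_type, p in null_probabilities.items()}

all_at_least_1 = all(e >= 1 for e in expected.values())
share_below_5 = sum(e < 5 for e in expected.values()) / len(expected)

print(expected)
print(all_at_least_1, share_below_5 <= 0.20)
```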

25
Q

The proportions of blood types O, A, B, and AB in the general population are known to be 46%, 42%, 9%, and 3%, respectively. A research team investigating a small isolated community of 200 people in Canada obtained the following frequencies of blood type. Test whether the proportions in this community differ significantly from those in the general population at the 1% significance level.

Find critical value

A

α = 1% = 0.01

degrees of freedom = number of blood types - 1 = 4-1 = 3

Critical value is 11.345

26
Q

The proportions of blood types O, A, B, and AB in the general population are known to be 46%, 42%, 9%, and 3%, respectively. A research team investigating a small isolated community of 200 people in Canada obtained the following frequencies of blood type. Test whether the proportions in this community differ significantly from those in the general population at the 1% significance level.

If the number of people with A blood was actually 76 in the community, what is the calculation for this line?

How do we use this information to determine the test statistic?

A

Expected: np = 200*0.42 = 84

(O-E)squared/E

= (76-84) squared / 84

= 0.7619

Test statistic is the sum of this equation for each category

27
Q

The proportions of blood types O, A, B, and AB in the general population are known to be 46%, 42%, 9%, and 3%, respectively. A research team investigating a small isolated community of 200 people in Canada obtained the following frequencies of blood type. Test whether the proportions in this community differ significantly from those in the general population at the 1% significance level.

Compare:

Test statistic 3.7559
critical value 11.345

A

The test statistic is less than the critical value, so we DO NOT reject Ho

28
Q

The proportions of blood types O, A, B, and AB in the general population are known to be 46%, 42%, 9%, and 3%, respectively. A research team investigating a small isolated community of 200 people in Canada obtained the following frequencies of blood type. Test whether the proportions in this community differ significantly from those in the general population at the 1% significance level.

Interpret:

The test statistic is less than the critical value, so we DO NOT reject Ho

A

At the 1% significance level, the data does not provide sufficient evidence that the proportions in this community differ from those in the general population

29
Q

The proportions of blood types O, A, B, and AB in the general population are known to be 46%, 42%, 9%, and 3%, respectively. A research team investigating a small isolated community of 200 people in Canada obtained the following frequencies of blood type. Test whether the proportions in this community differ significantly from those in the general population at the 1% significance level.

Compare:

P(χ² < 3.7558) = 0.7109

A

The value given is the area to the left, and we need the area to the right, so:

1-0.7109 = 0.2891

α = 0.01

α < p value so we DO NOT reject Ho

30
Q

What are the 6 steps of a chi-square independence test?

A
  1. Set up the hypotheses
  2. Check the assumptions
  3. Decide significance level and find critical value
  4. Calculate the test statistic
  5. Compare the test statistic with critical value
  6. Interpret the result in the context of the question
31
Q

What are the assumptions for a chi-square independence test?

A

All expected frequencies are at least 1

At most 20% of the expected frequencies are less than 5

Simple random sample

32
Q

What are the basics for the hypotheses for a chi-square independence test?

A

Ho: the two variables are not associated (independent)

Ha: the two variables are associated (not independent)

33
Q

How are expected frequencies determined for chi-square independence tests?

A

E = RC/n

E - expected frequencies
R - row total
C - column total
n - sample size

34
Q

How is a test statistic calculated for a chi-independence test?

A

X2 = Σ(O-E)squared/E

O is observed frequency
E is expected frequency

Degrees of freedom = (r-1)(c-1)
r - number of rows (categories of the row variable)
c - number of columns (categories of the column variable)

As a reminder:

E = RC/n

E - expected frequencies
R - row total
C - column total
n - sample size
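
The pieces above can be combined in one pass over a two-way table of observed counts. The Python sketch below is illustrative only: the function name `chi_square_independence` and the 2 x 2 table are made up, and the table passed in must not include the Total row or column.

```python
def chi_square_independence(table):
    """Return (test statistic, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # E = RC/n
            statistic += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)  # (r - 1)(c - 1)
    return statistic, df


# Hypothetical 2 x 2 table of observed counts.
statistic, df = chi_square_independence([[10, 20], [30, 40]])
print(statistic, df)
```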

35
Q

We are interested in a population with a regular doctor by age group and gender in Canada. We randomly selected 17 890 Canadians. At the 5% significance level, we want to test the claim that gender and age are associated.

Set up the hypotheses

A

Ho: gender and age are not associated (independent)

Ha: gender and age are associated (not independent)

36
Q

We are interested in a population with a regular doctor by age group and gender in Canada. We randomly selected 17 890 Canadians. At the 5% significance level, we want to test the claim that gender and age are associated.

Check the assumptions

A

To do this you would have to set up the table to determine expected frequencies first, and when we do, all are over 5, so:

simple random sample ✓

all expected frequencies are at least 1 ✓

at most 20% of the expected frequencies are less than 5 ✓

37
Q

We are interested in a population with a regular doctor by age group and gender in Canada. We randomly selected 17 890 Canadians. At the 5% significance level, we want to test the claim that gender and age are associated. We are going to group the ages into 20-34, 35-44, and 45-64.

Determine the critical value

A

α = 5% = 0.05

df = (r-1)(c-1) = (2-1)(3-1) = 1*2 = 2

Critical value is 5.991
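
For df = 2 specifically, the area to the right of x under the chi-square curve is exp(-x/2), so the critical value can be solved for directly as -2 ln(α). The snippet below is a quick cross-check of the table value; the shortcut applies only to df = 2.

```python
import math

alpha = 0.05
# Solve exp(-x / 2) = alpha for x (valid for df = 2 only).
critical_value = -2 * math.log(alpha)
print(round(critical_value, 3))  # 5.991
```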

38
Q

We are interested in a population with a regular doctor by age group and gender in Canada. We randomly selected 17 890 Canadians. At the 5% significance level, we want to test the claim that gender and age are associated. We are going to group the ages into 20-34, 35-44, and 45-64.

Determine the statistic for this line:

         20-34    Total
Women     2768     9445
Total     5166    17890

How do we use this to determine the test statistic for the chi independence test?

A

Expected
= RC/n
= (9445*5166)/17890
= 2727.382

(O-E)squared/E
= (2768-2727.382)squared/2727.382
= 0.6049

For the test statistic, we sum this value over every cell of the table (each combination of gender and age group)
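
The calculation for this one cell can be reproduced as follows. The variable names are illustrative; the counts are the ones shown in the table excerpt.

```python
row_total = 9445      # Women
column_total = 5166   # Ages 20-34
n = 17890             # Sample size
observed = 2768       # Women aged 20-34

expected = row_total * column_total / n            # E = RC/n
contribution = (observed - expected) ** 2 / expected

print(round(expected, 3), round(contribution, 4))
```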

39
Q

We are interested in a population with a regular doctor by age group and gender in Canada. We randomly selected 17 890 Canadians. At the 5% significance level, we want to test the claim that gender and age are associated. We are going to group the ages into 20-34, 35-44, and 45-64.

Compare

Test statistic is 8.4567
critical value is 5.991

A

Test statistic is larger than the critical value, so we reject Ho

40
Q

We are interested in a population with a regular doctor by age group and gender in Canada. We randomly selected 17 890 Canadians. At the 5% significance level, we want to test the claim that gender and age are associated. We are going to group the ages into 20-34, 35-44, and 45-64.

Interpret:

Test statistic is larger than the critical value, so we reject Ho

A

At the 5% significance level, the data provides sufficient evidence that the two variables of gender and age are associated

41
Q

We are interested in a population with a regular doctor by age group and gender in Canada. We randomly selected 17 890 Canadians. At the 5% significance level, we want to test the claim that gender and age are associated. We are going to group the ages into 20-34, 35-44, and 45-64.

Compare

P(χ² < 8.4567) = 0.9854, with df = 2

A

The value given is the area to the left, and we need the area to the right, so:

1-0.9854 = 0.0146

α = 0.05

α > p value so we reject Ho
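
With df = 2 the right-tail area has the simple form exp(-x/2), so this p-value can be checked by hand or in a couple of lines of Python. This shortcut is specific to df = 2, and the variable names are illustrative.

```python
import math

alpha = 0.05
test_statistic = 8.4567

p_value = math.exp(-test_statistic / 2)  # area to the right, df = 2 only
left_area = 1 - p_value                  # area to the left of the test statistic

print(round(left_area, 4), round(p_value, 4))
print("reject Ho" if p_value <= alpha else "do not reject Ho")
```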