Chapter 12 - One-Way ANOVA Flashcards

1
Q

What does ANOVA stand for?

A

ANalysis Of VAriance

2
Q

We want to determine if changing the cotton content changes the mean tensile strength. We set up 5 different levels of cotton content.

In this experiment, what is the response variable?

A

Tensile strength

It is the variable to be measured in the experiment

3
Q

We want to determine if changing the cotton content changes the mean tensile strength. We set up 5 different levels of cotton content.

In this experiment, what is the factor?

A

Cotton content

It is the variable that is changed in the study

4
Q

We want to determine if changing the cotton content changes the mean tensile strength. We set up 5 different levels of cotton content.

What type of test can be used to determine this?

A

ANOVA test

5
Q

What is the response variable?

A

The dependent variable.

It is the variable of interest to be measured in the experiment

6
Q

What are factors?

A

The variables whose effect on the response variable is studied in the experiment

7
Q

What are the factor levels?

A

The values of a factor in the experiment

For cotton content, you may have 20% and 25% to check

8
Q

What is a one-way ANOVA test?

A

An ANOVA test used when only one factor is changed in the experiment.

9
Q

We want to determine if changing the cotton content changes the mean tensile strength. We set up 5 different levels of cotton content (A, B, C, D, E). We also have 3 different colour dyes we are going to use (1, 2, 3).

What are the treatments in this experiment?

A

The possible factor level combinations in the experiment.

For this example we have:
A1, B1, C1, D1, E1
A2, B2, C2, D2, E2
A3, B3, C3, D3, E3
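
A minimal sketch of how the 15 treatments arise as factor level combinations, assuming the level labels from this card (the variable names are illustrative only):

```python
from itertools import product

cotton_levels = ["A", "B", "C", "D", "E"]  # 5 levels of cotton content
dye_levels = [1, 2, 3]                     # 3 colour dyes

# Each treatment is one (cotton level, dye) combination.
treatments = [f"{cotton}{dye}" for dye, cotton in product(dye_levels, cotton_levels)]

print(len(treatments))  # 15
print(treatments[:5])   # ['A1', 'B1', 'C1', 'D1', 'E1']
```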

10
Q

How do we determine the overall mean of an ANOVA test?

A

x̄ = sum of all sample data/sum of all sample sizes

11
Q

How do we determine the sample mean for each sample?

A

x̄1 = sum of all data in sample 1/sample size of sample 1

12
Q

Which statistic measures the total variation of the data from all samples in an ANOVA test?

A

Total Sum of Squares (SST)

13
Q

What does the total sum of squares (SST) measure?

A

The total variation of the data from all samples

14
Q

What does the SST consist of?

A

Total variation (SST) = variation between groups (SSTR) + variation within groups (SSE)

15
Q

Which statistic measures the variation between different samples of an ANOVA test?

A

Sum of Squares of TReatment (SSTR) is the between group variation

16
Q

Which statistic measures the variation within samples of an ANOVA test?

A

Sum of Squares of Error (SSE) is the within group variation

17
Q

What is the equation for the SST?

A

Sum of Squares Total

SST = Σ(xi - x̄)²

OR

SST = Σxi² - (Σxi)²/n

With degrees of freedom = n-1
n = total number of observations across all samples
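
As a rough check that the two SST formulas agree, here is a small sketch using the two six-observation samples that appear in the worked cards of this deck (the variable names are illustrative):

```python
sample_1 = [1, 2, 3, 3, 4, 5]
sample_2 = [6, 7, 8, 8, 9, 10]
data = sample_1 + sample_2
n = len(data)
grand_mean = sum(data) / n  # overall mean of all 12 observations

# Definition: sum of squared deviations from the overall mean.
sst_definition = sum((x - grand_mean) ** 2 for x in data)

# Shortcut: sum of squared values minus (sum of values)^2 / n.
sst_shortcut = sum(x * x for x in data) - sum(data) ** 2 / n

print(sst_definition, sst_shortcut)  # 95.0 95.0
```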

18
Q

What is the equation for the SSTR?

A

Sum of Squares of TReatment

SSTR = Σni(x̄i - x̄)²

The index i identifies the sample (treatment group)

So ni is the sample size of the ith sample

x̄i is the sample mean of the ith sample

x̄ is the overall mean

degrees of freedom = k-1
k - number of factor levels (samples)
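
The SSTR formula can be sketched as a short function. The name `sstr` and the list-of-lists input are assumptions made for illustration; the data are the two samples from the worked cards in this deck.

```python
def sstr(groups):
    """Between-group sum of squares: sum of n_i * (group mean - overall mean)^2."""
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    return sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)

groups = [[1, 2, 3, 3, 4, 5], [6, 7, 8, 8, 9, 10]]
print(sstr(groups))     # 75.0
print(len(groups) - 1)  # degrees of freedom k-1 = 1
```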

19
Q

What is the equation for the SSE?

A

Sum of Squares of Error

SSE = Σ over all samples Σ over each data point in a given sample (xij - x̄i)²

xij is the jth data point in the ith sample

x̄i is the sample mean of the ith sample

OR

SSE = SST - SSTR

degrees of freedom = n-k
n - total number of observations
k - number of factor levels (samples)
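
A hedged sketch of the SSE calculation, again on the two samples from the worked cards. The function name `sse` is illustrative, and the SST and SSTR values in the shortcut are the ones worked out elsewhere in this deck.

```python
def sse(groups):
    """Within-group sum of squares: squared deviations from each group's own mean."""
    total = 0.0
    for g in groups:
        group_mean = sum(g) / len(g)
        total += sum((x - group_mean) ** 2 for x in g)
    return total

groups = [[1, 2, 3, 3, 4, 5], [6, 7, 8, 8, 9, 10]]
direct = sse(groups)
shortcut = 95 - 75  # SST - SSTR, using the deck's values for this data

print(direct, shortcut)  # 20.0 20
```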

20
Q

We have the following numbers for a one-way ANOVA test:

x̄ = 5.5, SST = 95, SSTR = 3, SSE = 92

What does this tell us?

A

The variation between groups (SSTR) is small and the variation within groups (SSE) is large.

Therefore it is hard to tell whether the variation is due to a difference between the population means or to the variation within the samples.

This means we are unable to conclude that the population means are different.

21
Q

We have the following numbers for a one-way ANOVA test:

x̄ = 5.5, SST = 95, SSTR = 87, SSE = 8

What does this tell us?

A

The variation between groups (SSTR) is large relative to the variation within groups (SSE).

This means most of the variation is due to the difference between the population means.

Therefore, we may claim the population means are NOT identical.

22
Q

You have two samples

Sample 1: 1, 2, 3, 3, 4, 5; n=6

Sample 2: 6, 7, 8, 8, 9, 10; n=6

What is the overall mean?

A

x̄ = sum of values/total n

= (1+2+3+3+4+5+6+7+8+8+9+10)/12
= 66/12
= 5.5

23
Q

You have two samples

Sample 1: 1, 2, 3, 3, 4, 5; n=6

Sample 2: 6, 7, 8, 8, 9, 10; n=6

What are each of the sample means?

A

x̄ = sum of values/n

x̄1 = (1+2+3+3+4+5)/6 = 3

x̄2 = (6+7+8+8+9+10)/6 = 8

24
Q

You have two samples

Sample 1: 1, 2, 3, 3, 4, 5; x̄ = 3; n=6

Sample 2: 6, 7, 8, 8, 9, 10; x̄ = 8; n=6

Total: x̄=5.5, n=12

What is the total variation of the whole data (SST)?

A

SST = Σ(xi-x̄) squared

= (1-5.5)² + (2-5.5)² + (3-5.5)² + (3-5.5)² + (4-5.5)² + (5-5.5)² + (6-5.5)² + (7-5.5)² + (8-5.5)² + (8-5.5)² + (9-5.5)² + (10-5.5)²

= 95

25
Q

You have two samples

Sample 1: 1, 2, 3, 3, 4, 5; x̄ = 3; n=6

Sample 2: 6, 7, 8, 8, 9, 10; x̄ = 8; n=6

Total: x̄=5.5, n=12

What is the sum of squares treatment (SSTR)?

A

SSTR = Σni(x̄i - x̄)²

= 6(3-5.5)² + 6(8-5.5)²

= 75

26
Q

You have two samples

Sample 1: 1, 2, 3, 3, 4, 5; x̄ = 3; n=6

Sample 2: 6, 7, 8, 8, 9, 10; x̄ = 8; n=6

Total: x̄=5.5, n=12

What is the sum of squares of errors (SSE)?

A

SSE = Σ over all samples Σ over each data point in a given sample (xij - x̄i)²

Sample 1: Σ(x1j - x̄1)²
= (1-3)² + (2-3)² + (3-3)² + (3-3)² + (4-3)² + (5-3)²
= 10

Sample 2: Σ(x2j - x̄2)²
= (6-8)² + (7-8)² + (8-8)² + (8-8)² + (9-8)² + (10-8)²
= 10

SSE = sum over all samples = 10 + 10 = 20

27
Q

What are the assumptions for an ANOVA test?

A

Normally distributed - each population is normally distributed with its own mean μi

σ is the same for ALL populations

So we assume all observations in the population satisfy the following:
xij = μi + εij

28
Q

Since the k populations are assumed to be normally distributed with the same standard deviation, what would the only difference between the k populations be?

A

The means, IF the treatments impact the response variable

29
Q

An assumption we make for ANOVA test is that

xij = μi + εij

What does this mean?

A

The jth observation from the ith population, xij, is equal to the sum of the population mean μi and the measurement error εij

The measurement error εij describes the error in the measurement, or the part of this measurement that cannot be explained by the mean μi (the ith treatment)

The measurement errors εij are independent and normally distributed with a mean of 0 and a standard deviation of σ

30
Q

When performing an ANOVA test, what are the basic hypotheses that are set up?

A

Ho: μ1 = μ2 = μ3 = … = μk
(so treatments do NOT impact the response variable)

Ha: At least ONE of the means is different from the others

31
Q

If the null hypothesis in an ANOVA test is true, what do we know about the variance between groups?

A

It cannot be large relative to the variance within groups

32
Q

If the null hypothesis in an ANOVA test is not true, what do we know about the variance within and between groups?

A

When the null hypothesis is not true, the variance within groups (SSE) should be relatively small in relation to the total variance (SST)

33
Q

How can we remember which value is for the variation between groups vs within groups and the total?

A

SST - the T actually means total

SSTR - the TReatment value is going to be large if the treatments work, because the treatments push the group means apart, which is variation BETWEEN groups

SSE - the error value is the natural variation WITHIN each population, so it is the within group variation

34
Q

What is the test statistic of an ANOVA test?

A

Fo = MSTR/MSE = [SSTR/(k-1)] / [SSE/(n-k)]

SSTR - variation between groups
SSE - variation within groups
k - number of factor levels
n - total number of observations
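
The test statistic can be sketched as a small helper. The name `f_statistic` is hypothetical; the example values SSTR = 75 and SSE = 20 with k = 2 and n = 12 come from the two-sample cards in this deck.

```python
def f_statistic(sstr, sse, k, n):
    """Return (MSTR, MSE, Fo) for k factor levels and n total observations."""
    if k < 2 or n <= k:
        raise ValueError("need at least 2 levels and more observations than levels")
    mstr = sstr / (k - 1)  # mean square treatment
    mse = sse / (n - k)    # mean square error
    return mstr, mse, mstr / mse

print(f_statistic(75, 20, k=2, n=12))  # (75.0, 2.0, 37.5)
```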

35
Q

What are the degrees of freedom for the SST, SSTR, and SSE and what is their relationship?

A

SST = n-1
SSTR = k-1
SSE = n-k

n - total number of observations
k - number of factor levels

df SST = df SSTR + df SSE

36
Q

Describe the F-distribution used in ANOVA tests

A

A density curve that is unimodal, right skewed, and determined by two degrees of freedom: df1 comes from the numerator of the test statistic (k-1) and df2 comes from the denominator (n-k)

Test statistic for a reminder:

Fo = MSTR/MSE = [SSTR/(k-1)] / [SSE/(n-k)]

37
Q

If the MSTR is much larger than the MSE, what does the data indicate?

A

This would result in an Fo value that is large

It indicates that the treatment has had an effect on the mean response

38
Q

What is the MSTR?

What does it denote?

A

Mean Square TReatment

SSTR/(k-1)

39
Q

What is the MSE?

What does it denote?

A

Mean Square Error

SSE/(n-k)

40
Q

How is the p-value written?

What does it mean when the p-value is small?

A

p-value = P(F k-1, n-k > Fo)

When the p-value is small, the value of Fo is “extremely large” for an F-distribution with k-1 and n-k degrees of freedom, and therefore we tend to reject Ho
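
One possible way to approximate this upper-tail probability without a statistics library is numerical integration. The sketch below is not a standard routine: the function name `f_upper_tail` is hypothetical, and it assumes Fo > 0 and more than 2 denominator degrees of freedom. It uses the identity P(F > f) = I_x(df2/2, df1/2) with x = df2/(df2 + df1·f), where I_x is the regularized incomplete beta function. The example value Fo = 37.5 on (1, 10) degrees of freedom is what SSTR = 75 and SSE = 20 from the two-sample cards give.

```python
import math

def f_upper_tail(f_obs, df1, df2, steps=2000):
    """Approximate P(F > f_obs) for an F-distribution with (df1, df2) degrees of freedom.

    Integrates the Beta(df2/2, df1/2) density from 0 to x = df2 / (df2 + df1 * f_obs)
    with Simpson's rule. Assumes f_obs > 0 and df2 > 2 so the integrand stays finite.
    """
    if f_obs <= 0 or df2 <= 2:
        raise ValueError("this sketch assumes f_obs > 0 and df2 > 2")
    a, b = df2 / 2.0, df1 / 2.0
    x = df2 / (df2 + df1 * f_obs)
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

    def density(t):
        if t == 0.0:
            return 0.0  # t**(a-1) vanishes at 0 because a > 1
        return math.exp((a - 1) * math.log(t) + (b - 1) * math.log(1 - t) - log_beta)

    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    h = x / steps
    total = density(0.0) + density(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    return total * h / 3

p_value = f_upper_tail(37.5, 1, 10)
print(p_value < 0.05)  # True, so Ho would be rejected at the 5% level
```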

41
Q

What are the steps to perform an ANOVA test?

A
  1. Set up the hypotheses
  2. Check assumptions
  3. Decide the significance level and find the critical value in the F table
  4. Calculate the test statistic
    (So you have to find the SST, SSTR, SSE, MSTR, and MSE, all to find Fo)
  5. Compare the test statistic with the critical value (or the p-value with α)
  6. Interpret the results in context
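
The calculation and comparison steps above can be sketched end to end as follows. The function name `one_way_anova` and the dictionary it returns are illustrative choices, and the critical value 4.96 for F 1, 10, 0.05 is assumed to have been read from an F table. Setting up hypotheses, checking assumptions, and interpreting in context are still done by hand.

```python
def one_way_anova(groups, f_critical):
    """Compute the one-way ANOVA quantities from raw data and compare Fo to a critical value.

    groups: one list of observations per factor level.
    f_critical: table value for the chosen alpha with (k-1, n-k) degrees of freedom.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]

    sstr = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    mstr = sstr / (k - 1)
    mse = sse / (n - k)
    f_obs = mstr / mse
    return {
        "SST": sstr + sse, "SSTR": sstr, "SSE": sse,
        "MSTR": mstr, "MSE": mse, "F": f_obs,
        "reject_H0": f_obs > f_critical,
    }

result = one_way_anova([[1, 2, 3, 3, 4, 5], [6, 7, 8, 8, 9, 10]], f_critical=4.96)
print(result["F"], result["reject_H0"])  # 37.5 True
```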
42
Q

The cotton content that will be used to make cloth for shirts varies from 10-40% by weight. To investigate the effect of this factor on tensile strength, an experimenter chooses 5 levels (A, B, C, D, E). 5 shirts at each level are randomly chosen and tested. The 25 tensile strengths are measured in random order. We want to determine if the cotton content changes the mean strength using a 5% significance level.

Set up the hypotheses

A

Ho: μ1 = μ2 = μ3 = μ4 = μ5

Ha: not all the means are equal

43
Q

The cotton content that will be used to make cloth for shirts varies from 10-40% by weight. To investigate the effect of this factor on tensile strength, an experimenter chooses 5 levels (A, B, C, D, E). 5 shirts at each level are randomly chosen and tested. The 25 tensile strengths are measured in random order. We want to determine if the cotton content changes the mean strength using a 5% significance level.

What assumptions are made?

A

Normally distributed - each population is normally distributed with its own mean μi

σ is the same for ALL populations

So we assume all observations in the population satisfy the following:
xij = μi + εij

44
Q

The cotton content that will be used to make cloth for shirts varies from 10-40% by weight. To investigate the effect of this factor on tensile strength, an experimenter chooses 5 levels (A, B, C, D, E). 5 shirts at each level are randomly chosen and tested. The 25 tensile strengths are measured in random order. We want to determine if the cotton content changes the mean strength using a 5% significance level.

What is the significance level and critical value?

A

α = 5% = 0.05

Degrees of freedom:

Numerator = k-1 = 5-1 = 4

Denominator = n-k = 25-5 = 20

F 4, 20, 0.05 = 2.87

45
Q

The cotton content that will be used to make cloth for shirts varies from 10-40% by weight. To investigate the effect of this factor on tensile strength, an experimenter chooses 5 levels (A, B, C, D, E). 5 shirts at each level are randomly chosen and tested. The 25 tensile strengths are measured in random order. We want to determine if the cotton content changes the mean strength using a 5% significance level.

Treatment   n   x̄i     si
A           5   9.8    3.35
B           5   15.4   3.13
C           5   17.6   2.07
D           5   21.6   2.61
E           5   10.8   2.86

What is the overall mean?

A

x̄ = (Σnix̄i)/(Σni)

x̄ = ( (5×9.8)+(5×15.4)+(5×17.6)+(5×21.6)+(5×10.8) ) / (5+5+5+5+5)

x̄ = 15.04
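
A short sketch of the weighted overall mean, using the summary table from this card (the dictionary layout is an assumption made for illustration):

```python
# (sample size, sample mean) for each cotton level, taken from the table above
summary = {
    "A": (5, 9.8),
    "B": (5, 15.4),
    "C": (5, 17.6),
    "D": (5, 21.6),
    "E": (5, 10.8),
}

total_n = sum(n for n, _ in summary.values())
grand_mean = sum(n * mean for n, mean in summary.values()) / total_n

print(total_n, round(grand_mean, 2))  # 25 15.04
```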

46
Q

The cotton content that will be used to make cloth for shirts varies from 10-40% by weight. To investigate the effect of this factor on tensile strength, an experimenter chooses 5 levels (A, B, C, D, E). 5 shirts at each level are randomly chosen and tested. The 25 tensile strengths are measured in random order. We want to determine if the cotton content changes the mean strength using a 5% significance level.

Treatment   n   x̄i     si
A           5   9.8    3.35
B           5   15.4   3.13
C           5   17.6   2.07
D           5   21.6   2.61
E           5   10.8   2.86

x̄ = 15.04

What is the SSTR? What are its degrees of freedom?

A

SSTR = Σni(x̄i - x̄)²

= 5(9.8-15.04)² + 5(15.4-15.04)² + 5(17.6-15.04)² + 5(21.6-15.04)² + 5(10.8-15.04)²

= 475.76

Degrees of freedom = k-1 = 5-1 = 4
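
The same SSTR arithmetic as a sketch working only from the summary statistics on this card (the list layout is illustrative):

```python
# (level, sample size, sample mean) from the summary table
summary = [("A", 5, 9.8), ("B", 5, 15.4), ("C", 5, 17.6), ("D", 5, 21.6), ("E", 5, 10.8)]

total_n = sum(n for _, n, _ in summary)
grand_mean = sum(n * mean for _, n, mean in summary) / total_n

# Each level contributes n_i * (level mean - overall mean)^2.
sstr = sum(n * (mean - grand_mean) ** 2 for _, n, mean in summary)
df_treatment = len(summary) - 1

print(round(sstr, 2), df_treatment)  # 475.76 4
```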

47
Q

The cotton content that will be used to make cloth for shirts varies from 10-40% by weight. To investigate the effect of this factor on tensile strength, an experimenter chooses 5 levels (A, B, C, D, E). 5 shirts at each level are randomly chosen and tested. The 25 tensile strengths are measured in random order. We want to determine if the cotton content changes the mean strength using a 5% significance level.

Treatment   n   x̄i     si
A           5   9.8    3.35
B           5   15.4   3.13
C           5   17.6   2.07
D           5   21.6   2.61
E           5   10.8   2.86

x̄ = 15.04

What is the SSE? What are its degrees of freedom?

A

SSE = Σ(ni-1)si²

= 4(3.35)² + 4(3.13)² + 4(2.07)² + 4(2.61)² + 4(2.86)²

= 161.2

Degrees of freedom = n-k = 25-5 = 20
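
When only the sample standard deviations are available, SSE can be rebuilt from them because (ni-1)si² is the sum of squared deviations within sample i. A sketch using the table values (the unrounded result is 161.184, which the card rounds to 161.2):

```python
# (level, sample size, sample standard deviation) from the summary table
summary = [("A", 5, 3.35), ("B", 5, 3.13), ("C", 5, 2.07), ("D", 5, 2.61), ("E", 5, 2.86)]

# (n_i - 1) * s_i^2 recovers the within-sample sum of squares for each level.
sse = sum((n - 1) * s ** 2 for _, n, s in summary)
df_error = sum(n for _, n, _ in summary) - len(summary)

print(round(sse, 2), df_error)  # 161.18 20
```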

48
Q

The cotton content that will be used to make cloth for shirts varies from 10-40% by weight. To investigate the effect of this factor on tensile strength, an experimenter chooses 5 levels (A, B, C, D, E). 5 shirts at each level are randomly chosen and tested. The 25 tensile strengths are measured in random order. We want to determine if the cotton content changes the mean strength using a 5% significance level.

We know:

SSTR: 475.76
SSE: 161.2

What is the SST? What are its degrees of freedom?

A

SST = SSTR + SSE

= 475.76 + 161.2

= 636.96

Degrees of freedom = n-1 = 25-1 = 24

49
Q

The cotton content that will be used to make cloth for shirts varies from 10-40% by weight. To investigate the effect of this factor on tensile strength, an experimenter chooses 5 levels (A, B, C, D, E). 5 shirts at each level are randomly chosen and tested. The 25 tensile strengths are measured in random order. We want to determine if the cotton content changes the mean strength using a 5% significance level.

We know:

SSTR: 475.76, df = 4
SSE: 161.2, df = 20
SST: 636.96, df = 24

What is our test statistic?

A

Fo = MSTR/MSE = [SSTR/(k-1)] / [SSE/(n-k)]

= (475.76/4) / (161.2/20)

= 118.94/8.06

= 14.76

50
Q

The cotton content that will be used to make cloth for shirts varies from 10-40% by weight. To investigate the effect of this factor on tensile strength, an experimenter chooses 5 levels (A, B, C, D, E). 5 shirts at each level are randomly chosen and tested. The 25 tensile strengths are measured in random order. We want to determine if the cotton content changes the mean strength using a 5% significance level.

Compare:

Source      df   SS       MS       F
Treatment   4    475.76   118.94   14.76
Error       20   161.2    8.06
Total       24   636.96

F4, 20, 0.05 = 2.87

A

Our test statistic (14.76) is larger than our Fα value (2.87), so we reject Ho
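
The ANOVA table and the comparison can be reproduced with a short sketch. The inputs are the SSTR and SSE values from this example and the table critical value 2.87; the printing layout is only illustrative.

```python
sstr, sse = 475.76, 161.2  # sums of squares from the cotton example
k, n = 5, 25               # factor levels and total observations
f_critical = 2.87          # F 4, 20, 0.05 from the F table

df_tr, df_err = k - 1, n - k
mstr = sstr / df_tr
mse = sse / df_err
f_obs = mstr / mse

rows = [
    ("Treatment", df_tr, sstr, mstr, f_obs),
    ("Error", df_err, sse, mse, None),
    ("Total", df_tr + df_err, sstr + sse, None, None),
]
print(f"{'Source':<10}{'df':>4}{'SS':>10}{'MS':>10}{'F':>8}")
for source, df, ss, ms, f in rows:
    ms_text = f"{ms:.2f}" if ms is not None else ""
    f_text = f"{f:.2f}" if f is not None else ""
    print(f"{source:<10}{df:>4}{ss:>10.2f}{ms_text:>10}{f_text:>8}")

print("Reject Ho" if f_obs > f_critical else "Fail to reject Ho")
```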

51
Q

The cotton content that will be used to make cloth for shirts varies from 10-40% by weight. To investigate the effect of this factor on tensile strength, an experimenter chooses 5 levels (A, B, C, D, E). 5 shirts at each level are randomly chosen and tested. The 25 tensile strengths are measured in random order. We want to determine if the cotton content changes the mean strength using a 5% significance level.

Compare:

P(F4, 20 > 14.76) = 0.0000091

A

α = 0.05

α > p-value, so we reject Ho

52
Q

The cotton content that will be used to make cloth for shirts varies from 10-40% by weight. To investigate the effect of this factor on tensile strength, an experimenter chooses 5 levels (A, B, C, D, E). 5 shirts at each level are randomly chosen and tested. The 25 tensile strengths are measured in random order. We want to determine if the cotton content changes the mean strength using a 5% significance level.

Interpret:
Our test statistic is larger than our Fα value, so we reject Ho

A

At the 5% significance level, the data provide sufficient evidence to conclude that changing the cotton content changes the mean tensile strength.