Flashcards in Lecture 6 Deck (31):

1

## What are the ANOVA assumptions?

###
1. Normality

2. Homogeneity of variance

3. Independence of observations

4. DV measured on an interval or ratio scale

5. X (IV) & Y (DV) are linearly related

2

## Explain the normality assumption of ANOVA.

###
- For any value of X (the IV), the Y scores (the raw scores) are approximately normally distributed.

- In other words, the raw scores are normally distributed within each group. Check by doing a frequency distribution of the raw scores for each group.

3

## What is the effect of violation of the normality assumption of ANOVA on type I and type II errors?

###
Type I error:

- Non-normality has only a slight effect on type I error,

- even for very skewed or kurtotic (peaked) distributions.

e.g. compare nominal alpha (what we set alpha at; equals the type I error rate when all assumptions are met) vs. actual alpha (the type I error rate if one or more assumptions are violated).

4

## In really non-normal populations, when nominal alpha = .05, actual alpha = .055 or .06. If nominal alpha ~ actual alpha, what do we say?

###
We say F is robust to violations of the assumptions.

Therefore F is robust with respect to the normality assumption.

5

## What are the reasons that F is robust with respect to the normality assumption?

###
The sampling distribution of the mean will be normally distributed if:

a) The raw scores are normally distributed in the population, or

b) the raw scores in the population are skewed, but the sampling distribution of the mean approaches a normal distribution as n increases (n ≥ 30 or so); this is the central limit theorem.
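Point (b) can be sketched with a small simulation (a minimal illustration, assuming an Exponential population as the "skewed" example; none of these names come from the lecture):

```python
import random
import statistics

# Central limit theorem sketch: draw repeated samples of size n from a very
# skewed population (Exponential with mean 1) and look at the sample means.
random.seed(0)

def sample_means(n, reps=2000):
    """Means of `reps` random samples of size n from an Exponential(1) population."""
    return [statistics.mean(random.expovariate(1.0) for _ in range(n))
            for _ in range(reps)]

means_n30 = sample_means(30)
# The population mean is 1, so the sample means cluster tightly around 1,
# and their distribution is far more symmetric than the raw scores.
print(round(statistics.mean(means_n30), 2))
```

Plotting a histogram of `means_n30` next to a histogram of raw draws would show the contrast directly.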

6

## Define standard error of the mean:

### The standard deviation of the sampling distribution of the mean.

7

## When would you use a non-parametric test? Why?

### When the population is very skewed. Because non-parametric tests are distribution free, which means they don't have the normality assumption.

8

## What effect does lack of normality have on power?

###
- Only a slight effect (a few 100ths).

- Lack of normality due to platykurtosis (a flattened distribution) does affect power, especially if n is small.

9

## How does one check for normality?

###
- Check via frequency distributions

- If there is a big violation of normality with a small n --> conduct a non-parametric test, i.e. the distribution doesn't matter.

10

## What are some examples of non-parametric tests?

###
Chi square

Mann whitney

Wilcoxon

Kruskal-Wallis

Friedman

11

## Describe the Homogeneity (homoscedasticity) of variance assumption

###
Error variance (i.e. within-group variance) is unaffected by the treatment (the IV).

Other names for error variance: MSerror, MSwithin, S/A, error due to chance, variability due to chance, etc.

i.e. σ²1 = σ²2 = σ²3 etc

In other words, for every value of x, the variance of y is the same.

12

## Illustration of heteroscedasticity

###
A plot with scores on the y-axis and the independent variable (group) on the x-axis, each group's scores clustered above its group; under heteroscedasticity the spread differs visibly from group to group.

13

## Under what circumstances is F robust for unequal variances?

### If n's are equal or approximately equal.

14

## When is heterogeneity of variance an issue?

###
Only an issue if:

- n's are sharply unequal and a test shows that the variances are sharply unequal.

15

## What is meant by approximately equal n?

### largest n/smallest n < 1.5

16

## What is meant by approximately equal σ²? (variance)

###
Largest variance/smallest variance ≤ 3

If the ratio is greater than 3, we have sharply unequal σ².

i.e. if Fmax > 3, then the variances are sharply unequal.
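The Fmax rule of thumb is easy to sketch in code (a minimal illustration; the function name and example variances are my own, not from the lecture):

```python
def fmax_ratio(variances):
    """Fmax: largest group variance divided by smallest group variance."""
    return max(variances) / min(variances)

# Example: three group variances (made-up numbers)
variances = [4.0, 6.0, 15.0]
ratio = fmax_ratio(variances)
print(ratio)        # 3.75
print(ratio > 3)    # True -> variances are sharply unequal
```

Combined with the n rule from the previous card (largest n / smallest n < 1.5), this tells you whether heterogeneity is actually a concern.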

17

## When is heterogeneity an issue for type I error? (case 1)

###
Case 1: If the largest variance is associated with the group with smallest n

F is liberal. i.e. actual alpha is going to be greater than nominal alpha.

i.e. falsely reject H0 too often

Solution: adjust nominal alpha downwards, e.g. to .025, so that actual alpha is approximately .05.

18

## When is heterogeneity an issue for type I error? (case 2)

###
Case 2: If largest variance is associated with the group with largest n

F is conservative.

i.e. actual alpha is less than nominal alpha.

So people usually don't make an adjustment.

19

## Explain the independence of observations assumption of ANOVA

###
- Observations within each group are independent of one another.

- Usually satisfied if unrelated subjects are run individually and alone.

*REALLY IMPORTANT*

20

## Why is the independence of observations assumption of ANOVA so important?

### Because even small violations have a substantial effect on both alpha and power.

21

## How is dependence measured?

### Intraclass correlation

22

## Explain the DV is measured on an interval or ratio scale assumption of ANOVA

###
Check definitions against actual DV used.

If DV is nominal or ordinal, conduct a different type of statistical test. e.g. Chi square test

23

## Explain the X (IV) & Y (DV) are linearly related assumption of ANOVA

###
i.e. a subject's score is composed of 3 parts:

1. general effect (grand mean)

2. an effect that is unique and constant within a given treatment.

3. An effect that is unpredictable (random error & individual differences).

24

## Give the linear model of the fifth assumption for ANOVA

###
Yij = μ + αj + eij

where μ = grand mean

αj = treatment effect for the jth group

and eij = random error for the ith subject in the jth group

so:

general effect + treatment effect + error
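The model can be sketched as a small simulation (illustrative values only; the grand mean, treatment effects, and error SD below are assumptions, not from the lecture):

```python
import random

# ANOVA linear model: Y_ij = mu + alpha_j + e_ij
mu = 50.0                     # grand mean (general effect)
alphas = [-2.0, 0.0, 2.0]     # treatment effect alpha_j for each group (sum to 0)

random.seed(1)
scores = []
for alpha_j in alphas:
    # Each subject's score = grand mean + group's treatment effect + random error
    group = [mu + alpha_j + random.gauss(0, 3) for _ in range(5)]
    scores.append(group)

print(scores[0])   # five scores from the first group, centered near 48
```

Averaging many scores within a group recovers approximately μ + αj, since the random errors average out.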

25

## Define an outlier

###
a data point which is very different from the rest of the data.

Outliers can have a dramatic effect on results

26

## When removing outliers, what must be done?

### You must explain why they were removed, and this information must be shared.

27

## What causes outliers?

###
1. Human error (eg data entry)

2. Instrumentation

3. Subjects significantly different from the rest of the sample --> perhaps from a different population.

Therefore need to detect and remove outliers.

28

## How do you detect outliers for small samples?

###
- The largest possible z score of a data set is bounded by: (n-1)/√n

eg. for n=10, largest possible z score is 2.846, therefore for small samples, scrutinize any data point greater than or equal to z=2.5
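The bound and the worked example can be checked directly (a minimal sketch; the function name is my own):

```python
import math

def max_possible_z(n):
    """Largest z score any single data point can have in a sample of size n."""
    return (n - 1) / math.sqrt(n)

print(round(max_possible_z(10), 3))   # 2.846
```

This is why z = 2.5 is a sensible scrutiny threshold for small samples: even an extreme point cannot exceed the bound, which is only 2.846 at n = 10.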

29

## How do you detect outliers for large, normally distributed samples?

###
- Approximately 99.7% of scores are within three standard deviations of the mean.

Therefore z scores > 3 should be scrutinized.

Note: if n > 100, you will get some z scores > 3 by chance.

A criterion of z > 3 is also reasonable for non-normal distributions, but it could be extended to z > 4.
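The z > 3 rule can be applied with a few lines of code (the data are made up for illustration: 30 typical scores plus one extreme one):

```python
import statistics

# Flag potential outliers in a large sample using the z > 3 rule of thumb.
data = [50] * 10 + [52] * 10 + [48] * 10 + [120]

mean = statistics.mean(data)
sd = statistics.stdev(data)

# Scrutinize any point more than 3 standard deviations from the mean
flagged = [x for x in data if abs(x - mean) / sd > 3]
print(flagged)   # [120]
```

Note that this only flags points for scrutiny; whether to remove them still requires the justification described in the earlier card.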

30

## What happens when subjects are run after analyses?

### Tends to increase variability and therefore decrease the probability of finding significance. Also, if N is very large, you tend to get statistical significance no matter what, even when there is no practical significance.

31