Flashcards in DT Deck (27)

1

##
Ungrouped data is?

What type of analyses?

What to consider?

###
One or more continuous independent variables (e.g. height)

• Use correlations, regressions

• Need to check for normality/skewness

2

##
Grouped data

What type of analyses?

What to consider? (2 things)

###
One or more categorical independent variables

t-test, ANOVA

(1) Normality is assumed if sample sizes within cells are 20 or greater (central limit theorem)

BUT sample sizes within cells tend to be small

(2) Outliers can exert too much influence

3

## What is the central limit theorem?

### The central limit theorem states that if you have a population with mean μ and standard deviation σ and take sufficiently large random samples from the population with replacement, then the distribution of the sample means will be approximately normal.
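
A small simulation can make the theorem concrete. This is only a sketch under assumed settings: the exponential population, the sample size of 30, and the 2,000 repetitions are arbitrary choices, not values from the course.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# A clearly non-normal population: exponential with mean 1 (strong positive skew).
population = [random.expovariate(1.0) for _ in range(10_000)]


def sample_means(pop, n, n_samples):
    """Draw n_samples random samples of size n (with replacement); return their means."""
    return [statistics.fmean(random.choices(pop, k=n)) for _ in range(n_samples)]


means = sample_means(population, n=30, n_samples=2_000)

pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

# The sample means should centre on the population mean, with a spread
# close to sigma / sqrt(n), even though the population itself is skewed.
print("population mean:     ", round(pop_mean, 3))
print("mean of sample means:", round(statistics.fmean(means), 3))
print("SD of sample means:  ", round(statistics.stdev(means), 3))
print("sigma / sqrt(n):     ", round(pop_sd / 30 ** 0.5, 3))
```

A histogram of `means` would look far more bell-shaped than a histogram of `population`.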

4

## What techniques can be used for assessing normality?

###
1) Normal probability plots

2) Normality tests

3) Histograms

5

## What are the normality tests discussed in class?

###
1) Tests for skewness

2) Shapiro-Wilk

3) Kolmogorov-Smirnov

6

## What is the test for skewness and how is it interpreted?

###
skewness statistic

OVER

standard error of skewness

-> compared to a critical z value that varies based on sample size

-> positive = positively skewed, neg = negatively skewed (if outside of critical range)
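
The ratio above can be sketched in a few lines. The formulas used here are the common adjusted sample skewness and its standard error (the ones typically reported by statistics packages); that they match the class materials exactly is an assumption, and the critical z value is left for you to look up for your sample size.

```python
import math
import statistics


def skewness_z(values):
    """Return (skewness statistic, standard error of skewness, z ratio)."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
    # Adjusted sample skewness.
    skew = n / ((n - 1) * (n - 2)) * sum(((x - mean) / sd) ** 3 for x in values)
    # Standard error of skewness depends only on the sample size.
    se = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    return skew, se, skew / se


# Hypothetical scores with one large value pulling the tail to the right.
scores = [1, 2, 2, 3, 3, 3, 4, 4, 5, 12]
skew, se, z = skewness_z(scores)
print("skewness:", round(skew, 3), "SE:", round(se, 3), "z:", round(z, 3))
```

A positive `z` beyond the critical value points to positive skew; a negative one beyond it points to negative skew.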

7

## When is the Kolmogorov-Smirnov test valid? When is it not?

###
Valid when testing whether a set of observations comes from a completely specified continuous distribution

Meaning: if one or more parameters must be estimated from the sample, then the standard tables of critical values are not valid
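
To illustrate what "completely specified" means, here is a rough sketch of the Kolmogorov-Smirnov statistic D (the largest gap between the empirical and theoretical cumulative distributions). The mean of 100 and SD of 15 are assumed to be known in advance, not estimated from the sample; the sample values are made up.

```python
from statistics import NormalDist


def ks_statistic(values, dist):
    """Largest vertical distance between the empirical CDF and dist's CDF."""
    xs = sorted(values)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = dist.cdf(x)
        # The empirical CDF jumps from (i - 1) / n to i / n at x; check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


# Distribution fixed BEFORE looking at the data (hypothetical norms).
specified = NormalDist(mu=100, sigma=15)
sample = [88, 93, 97, 101, 104, 108, 112, 119, 125, 131]
print("D =", round(ks_statistic(sample, specified), 3))
```

If `mu` and `sigma` were instead computed from `sample`, D would be compared against the wrong tables, which is the situation the card warns about.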

8

## What types of probability plots are used to assess normality?

###
Q-Q Plot

P-P Plot

Detrended normal probability plots

9

##
What is a Q-Q plot?

What are Q-Q plots better for?

###
A Q-Q plot plots the actual values (quantiles) of the variable against the theoretical values expected under a normal distribution.

Q-Q plot is better at finding deviations in the tails (Q has a tail)

10

## What are detrended normal probability plots?

### Deviations from the diagonal are plotted meaning that the positive linear trend is eliminated
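
A minimal sketch of the idea: for each sorted value, take the gap between its observed z-score and the z-score expected under normality, and plot that gap. The plotting position `(i - 0.5) / n` and the choice to work in z-score units are assumptions; statistics packages may differ in the details.

```python
from statistics import NormalDist, fmean, stdev


def detrended_points(values):
    """Return (observed z, deviation from expected normal z) for each sorted value."""
    ordered = sorted(values)
    n = len(ordered)
    mean, sd = fmean(ordered), stdev(ordered)
    points = []
    for i, x in enumerate(ordered, start=1):
        expected_z = NormalDist().inv_cdf((i - 0.5) / n)  # where a normal value would sit
        observed_z = (x - mean) / sd
        points.append((observed_z, observed_z - expected_z))
    return points


for observed, deviation in detrended_points([2, 3, 3, 4, 5, 5, 6, 9, 14]):
    print(round(observed, 2), round(deviation, 2))
```

For normal data the deviations scatter randomly around a horizontal line at zero instead of following a diagonal.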

11

## What are the 4 steps for producing a normal probability plot?

###
1. arranging the data from smallest to largest.

2. determining the percentile of each data value.

3. determining the corresponding z-scores from these percentiles based on the normal distribution.

4. plotting each z-score against its corresponding data value.
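
The four steps above can be sketched as follows. The percentile formula `(i - 0.5) / n` is one common convention and is an assumption here; other plotting positions exist.

```python
from statistics import NormalDist


def normal_probability_points(values):
    """Return (z-score, data value) pairs ready to be plotted."""
    ordered = sorted(values)                                     # step 1: smallest to largest
    n = len(ordered)
    percentiles = [(i - 0.5) / n for i in range(1, n + 1)]       # step 2: percentile of each value
    z_scores = [NormalDist().inv_cdf(p) for p in percentiles]    # step 3: matching normal z-scores
    return list(zip(z_scores, ordered))                          # step 4: pair z with data value


for z, value in normal_probability_points([12, 7, 9, 15, 10, 8, 11]):
    print(round(z, 2), value)
```

If the data are roughly normal, the printed pairs fall close to a straight line when plotted.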

12

##
What is a P-P plot?

What are P-P plots better for?

###
A P-P plot plots the cumulative proportion of the observed values against the corresponding areas under the normal curve (cumulative distribution function) for those values.

P-P plot is better at finding deviations from normality in the center of the distribution
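
As a rough sketch, P-P coordinates can be computed like this. Fitting the normal curve with the sample mean and SD, and using `(i - 0.5) / n` for the observed cumulative proportion, are assumptions made for illustration.

```python
from statistics import NormalDist, fmean, stdev


def pp_points(values):
    """Return (expected cumulative probability, observed cumulative proportion) pairs."""
    ordered = sorted(values)
    n = len(ordered)
    fitted = NormalDist(fmean(ordered), stdev(ordered))     # normal curve matched to the sample
    observed = [(i - 0.5) / n for i in range(1, n + 1)]     # empirical cumulative proportions
    expected = [fitted.cdf(x) for x in ordered]             # area under the normal curve
    return list(zip(expected, observed))


for expected, observed in pp_points([2, 3, 3, 4, 5, 5, 6, 9, 14]):
    print(round(expected, 2), round(observed, 2))
```

Both axes run from 0 to 1, so the tails are squeezed into the corners; that is one way to see why deviations in the center stand out more than deviations in the tails.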

13

## Which plot tends to be preferred in research situations?

### Q-Q (over P-P)

14

##
What would need to be done if there is a large number of subjects that are contributing to the skew?

What would you consider if there is only a small number of subjects contributing to it?

###
Large number of subjects: mathematical transformation

Small number of subjects: Winsorizing

15

## What is Winsorizing?

###
a method for minimizing the influence of outliers by

(1) assigning the outlier a lower weight

OR

(2) changing the outlier value so that it is closer to the other values in the set

16

##
What do the Kolmogorov–Smirnov test and Shapiro–Wilk test do?

How are they interpreted?

What is a potential concern?

###
they compare the scores in the sample to a normally distributed set of scores with the same mean and standard deviation

If the test is non-significant (p > .05) it tells us that the distribution of the sample is not significantly different from a normal distribution

Larger sample sizes increase the chance of getting significant results from small deviations from normality that may not be important

17

## How do you interpret QQ or PP plots?

###
if a line sags consistently below or rises consistently above, the problem is kurtosis

if S shape the issue is skewness

18

## What side is the hump on in a positive skew?

### left hand side (remember turn to the right and it makes a P for positive)

19

## What side is the hump on in a negative skew?

### Right hand side

20

## What does positive kurtosis vs negative kurtosis look like?

###
positive kurtosis is tall and skinny (pointier than normal)

completely negative is a bowl shape (partly negative would be flatter than normal)

21

## What are the 3 main types of transformations? What can be done with all of them?

###
1. Square root

2. Logarithmic

3. Inverse

They can all be reflected

22

##
When would you use a reflection in a transformation? Why?

What to consider?

###
If you need to normalize a negative skew

-> a reflection turns it into a positive skew and then the positive skew transformations can be applied

-> must keep in mind during interpretation that the variable was reflected, i.e. high scores became low scores (or reflect it back afterwards; not clear to me)
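
A minimal sketch of reflecting and then transforming. Subtracting every score from (largest score + 1) is a commonly described way to reflect, so that the smallest reflected score is 1; the scores themselves are made up.

```python
import math


def reflect(values):
    """Reflect scores so a negative skew becomes a positive skew."""
    constant = max(values) + 1  # keeps every reflected score at 1 or above
    return [constant - x for x in values]


scores = [2, 9, 14, 16, 17, 18, 18, 19, 19, 20]  # hump on the right: negative skew
reflected = reflect(scores)                        # hump now on the left: positive skew
transformed = [math.sqrt(x) for x in reflected]    # apply a positive-skew transformation

print(reflected)
print([round(x, 2) for x in transformed])
```

After reflection the person with the highest original score has the lowest transformed score, which is why the direction of any relationship has to be read in reverse.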

23

## What are Tabachnick and Fidell's suggestions for Moderate, Substantial, and Severe positive skewness?

###
Moderate: square root

Substantial: logarithmic

Severe: inverse
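
To see how the three transformations get progressively stronger, the sketch below applies each to the same made-up, positively skewed scores and reports a simple skewness figure. The data and the choice of skewness formula (mean cubed z-score) are assumptions for illustration only.

```python
import math
import statistics


def skewness(values):
    """Simple skewness: the mean of the cubed z-scores."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((x - mean) / sd) ** 3 for x in values])


scores = [1, 1, 2, 2, 2, 3, 3, 4, 6, 9, 15, 30]  # all positive, long right tail

transforms = {
    "none": lambda x: x,
    "square root": math.sqrt,      # moderate positive skew
    "logarithmic": math.log10,     # substantial positive skew
    "inverse": lambda x: 1 / x,    # severe positive skew
}

for name, transform in transforms.items():
    print(name, round(skewness([transform(x) for x in scores]), 2))
```

Logarithmic and inverse transformations need scores above zero, and the inverse also reverses the order of the scores, so a constant may have to be added or the result reflected in practice.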

24

## What is the Box-Cox Power Transformation?

###
A procedure for identifying the best exponent to use in a transformation in order to get the best normal shape

-> Lambda = the power that each value is raised to

-> Lambda is searched over the range -5 to +5, and the value that gives the most normal shape is used
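
A rough sketch of how such a search could work: try each lambda on a grid from -5 to +5 and keep the one with the highest Box-Cox log-likelihood. The grid step of 0.1 and the function names are assumptions; real software usually optimizes more precisely.

```python
import math
import statistics


def box_cox(values, lam):
    """Box-Cox transform: (x**lam - 1) / lam, or ln(x) when lam is 0. Needs x > 0."""
    if abs(lam) < 1e-12:
        return [math.log(x) for x in values]
    return [(x ** lam - 1) / lam for x in values]


def log_likelihood(values, lam):
    """Profile log-likelihood of lam (higher means closer to normal)."""
    n = len(values)
    transformed = box_cox(values, lam)
    return (-n / 2 * math.log(statistics.pvariance(transformed))
            + (lam - 1) * sum(math.log(x) for x in values))


def best_lambda(values, low=-5.0, high=5.0, step=0.1):
    """Grid search for the lambda with the highest log-likelihood."""
    n_steps = round((high - low) / step)
    candidates = [round(low + i * step, 10) for i in range(n_steps + 1)]
    return max(candidates, key=lambda lam: log_likelihood(values, lam))


scores = [1, 1, 2, 2, 2, 3, 3, 4, 6, 9, 15, 30]  # hypothetical positively skewed data
lam = best_lambda(scores)
print("best lambda:", lam)
print([round(x, 2) for x in box_cox(scores, lam)])
```

A lambda of 0.5 corresponds to a square root, 0 to a logarithm, and -1 to an inverse, so the familiar transformations are special cases.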

25

## What is Templeton's two stage approach to data transformation?

###
It is a procedure for transforming continuous variables to normal

Step 1: Rank the data

Step 2: match the ranks with the corresponding values from a normal distribution
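
The two steps might look like this in code. Converting ranks to percentiles with `(rank - 0.5) / n`, giving tied scores their mean rank, and using the original mean and SD for the target normal distribution are all assumptions made for this sketch.

```python
from statistics import NormalDist, fmean, stdev


def two_step_normalize(values):
    """Step 1: rank the data. Step 2: map each rank to a normal-distribution value."""
    n = len(values)
    ordered = sorted(values)

    def mean_rank(x):
        # Tied scores share the average of the ranks they occupy.
        first = ordered.index(x) + 1
        return first + (ordered.count(x) - 1) / 2

    # Step 1: ranks expressed as percentiles strictly between 0 and 1.
    percentiles = [(mean_rank(x) - 0.5) / n for x in values]
    # Step 2: look up the normal value sitting at each percentile.
    target = NormalDist(fmean(values), stdev(values))
    return [target.inv_cdf(p) for p in percentiles]


scores = [1, 2, 3, 50, 4]
print([round(x, 2) for x in two_step_normalize(scores)])
```

The order of the scores is preserved, but the extreme value of 50 is pulled in to where the largest of five normal values would be expected.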

26

## What does excluding cases listwise mean? pairwise?

###
Listwise: Only subjects that have data for all the selected variables are included in the analysis (i.e. if you are looking at 4 variables and a participant only had data for 3 and has missing data for the other, this participant would be excluded from the analysis entirely - excluded for all 4 variables being looked at)

Pairwise: Variables are evaluated individually, therefore any subject with data for that variable will be included in the analysis for that variable -> this means you could have an unequal number of participants included in the analysis of each of the variables.
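
The difference shows up in how many participants are counted. In this sketch the records and variable names are hypothetical, and `None` marks a missing value.

```python
records = [
    {"age": 25, "height": 170, "weight": 65, "score": 80},
    {"age": 31, "height": None, "weight": 72, "score": 75},
    {"age": 28, "height": 165, "weight": 58, "score": 90},
    {"age": 40, "height": 180, "weight": 85, "score": None},
    {"age": 35, "height": 175, "weight": 77, "score": 60},
]
variables = ["age", "height", "weight", "score"]


def listwise(rows, names):
    """Keep only participants with data on every selected variable."""
    return [row for row in rows if all(row[name] is not None for name in names)]


def pairwise_counts(rows, names):
    """Count, per variable, every participant who has data on that variable."""
    return {name: sum(row[name] is not None for row in rows) for name in names}


print("listwise n:", len(listwise(records, variables)))
print("pairwise n per variable:", pairwise_counts(records, variables))
```

Listwise exclusion drops two participants from every analysis here, while pairwise exclusion keeps them wherever they have data, giving unequal n across variables.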

27