Ch. 13 Flashcards
(33 cards)
What to do when the assumptions are not true?
- Ignore the violations
- Transform the data
- Use nonparametric method
- Use permutation test
When is data not likely to be normal, graphically?
(pg. 371)
When distribution is strongly skewed or strongly bimodal, or has outliers
Normal Quantile Plot
- What is it?
Compares each observation in the sample w/ its quantile expected from the standard normal distribution
If it is normally distributed, points should roughly be in a straight line.
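A minimal sketch of the same comparison done numerically with scipy (the data and variable names here are made-up assumptions, not from the card):

```python
# Hypothetical sketch: a normal quantile (Q-Q) plot computed with scipy.
# Each observation is paired with the quantile expected under the standard
# normal distribution; roughly linear points suggest normality.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=10, scale=2, size=50)  # made-up, normally distributed data

# probplot returns the theoretical quantiles, the ordered sample, and a
# least-squares line fit (slope, intercept, r); pass plot= to draw it.
(theoretical_q, ordered_sample), (slope, intercept, r) = stats.probplot(sample)
print(f"correlation of points with a straight line: r = {r:.3f}")
```

An r close to 1 is what "roughly in a straight line" looks like numerically.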
Shapiro-Wilk test
What is it?
Shapiro-Wilk test evaluates goodness of fit of a normal distribution to a set of data randomly sampled from population (null hypothesis is that it is normal)
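A minimal scipy sketch of the test (samples are made up; a small P-value is evidence against the null hypothesis of normality):

```python
# Hypothetical sketch: Shapiro-Wilk goodness-of-fit test via scipy.
# Null hypothesis: the sample was drawn from a normal distribution.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
normal_sample = rng.normal(size=40)        # made-up normal data
skewed_sample = rng.exponential(size=40)   # made-up, strongly right-skewed

w_norm, p_norm = stats.shapiro(normal_sample)
w_skew, p_skew = stats.shapiro(skewed_sample)
print(f"normal data: P = {p_norm:.3f}")   # typically not rejected
print(f"skewed data: P = {p_skew:.4f}")   # typically rejected
```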
Recommended methods of evaluating assumption of normality?
- Can do a statistical test (ex. Shapiro-Wilk test), but this can give a false sense of security
- IDEALLY should use graphical methods & common sense to evaluate (frequency distribution histograms, normal quantile plots)
Robust
Def’n?
A statistical procedure is robust if the answer it gives is not sensitive to violations of the assumptions of the method.
Normal Approximation - Ignoring violations
- For what types of tests?
- Threshold of “ignorance”?
- For tests that use the mean (robustness due to the central limit theorem)
- Ignorance threshold rule of thumb is n > 50(ish)
- Also need to consider the shape of the distributions; skew needs to be similar, and no outliers. See pg. 376
When can the assumption of equal standard deviations be ignored?
- If n > 30 for each group, and n is similar for both groups (approximately), then unequal SDs can be ignored even w/ greater than 3x difference
- Cannot be ignored when n is not approximately equal between groups
Data transformation def’n
Data transformation changes each measurement by the same mathematical formula
Purpose of a transform?
- To attempt to make SD more similar and to improve the fit of the normal distribution to the data.
NOTE: This transform will affect all the data AND the hypotheses equally; i.e. everything gets transformed the same way. Also, can’t just do ln[s] for example; have to re-calculate the SD from the transformed values, starting with the mean of all the logs.
Examples of possible transformations?
(indicate top 3)
- Log transform
- Arcsine transform
- Square-root transform
- Square transform
- Antilog transform
- Reciprocal transform
Log transform - what does it do?
Converts each data point to its logarithm
Ex. Y = ln[X]
What to do if you want to try log transform but data has zero?
Try Y’ = ln[Y + 1]
Log transform - When is it useful?
- When measurements are ratios or products of variables
- When frequency distribution is right skewed
- When group that has the larger mean (when comparing 2 groups) also has the larger SD
- When data spans several orders of magnitude
See pg. 378 for details
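A minimal numpy sketch of the transform, including the ln[Y + 1] trick for data containing zero (the data values are made up):

```python
# Hypothetical sketch of the log transform Y' = ln[Y].
import numpy as np

data = np.array([0.0, 1.0, 10.0, 100.0, 1000.0])  # spans several orders of magnitude

# np.log(0) would give -inf, so use the ln[Y + 1] variant from the card.
log_data = np.log(data + 1)
print(log_data)
```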
Arcsine transformation
- What does it look like?
- What is it needed for?
- How does it fix things?
- p’ = arcsin[sqrt(p)]
- Used for proportions
- Makes it closer to a normal distribution and also makes SD’s similar
Note: convert percentages into decimal proportions first before applying
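A minimal numpy sketch, with the percentage-to-proportion conversion done first (the percentages are made up):

```python
# Hypothetical sketch of the arcsine transformation p' = arcsin[sqrt(p)].
import numpy as np

percentages = np.array([0.0, 25.0, 50.0, 100.0])  # made-up percentage data
proportions = percentages / 100.0                  # convert to proportions FIRST

p_prime = np.arcsin(np.sqrt(proportions))
print(p_prime)  # transformed values range from 0 to pi/2
```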
Square-root transformation
- What does it look like?
- What is it needed for?
- How does it fix things?
- Y’ = sqrt(Y + 1/2)
- (sometimes sqrt(Y) or sqrt(Y + 1) is used)
- Used for count data
- Effect similar to log; makes SDs similar for comparisons where the group w/ the larger mean also has the higher SD
- If its effect is the same as the log transform’s, use either one
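A minimal numpy sketch of the Y + 1/2 version alongside the plain version (the counts are made up):

```python
# Hypothetical sketch of the square-root transform Y' = sqrt(Y + 1/2) for counts.
import numpy as np

counts = np.array([0, 3, 8, 24])  # made-up count data

sqrt_half = np.sqrt(counts + 0.5)   # the Y + 1/2 version from the card
sqrt_plain = np.sqrt(counts)        # the Y + 0 alternative
print(sqrt_half)
print(sqrt_plain)
```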
Square transformation
- Transform?
- When to use?
- Y’ = Y^2
- When frequency distribution is skewed left
- Only usable if all Y have same sign
- If all negative, try multiplying all by -1 first
Antilog transform
- Transform?
- When to use?
- Y’ = e ^Y
- Use when square transform doesn’t work on left-skewed data
Reciprocal transform
- Transform?
- When to use?
- Y’ = 1/Y
- When data is skewed right
- Only usable if all Y have same sign
- If all negative, try multiplying all by -1 first
Calculating CI with transform?
Use the transformed values, then once range is calculated, it is best to convert BACK into original scale by back-transform (i.e. invert the transformation).
See pg. 382
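A minimal sketch of the whole workflow for a log transform: compute the CI on the transformed scale, then back-transform the endpoints with the inverse (antilog). Data and names are assumptions:

```python
# Hypothetical sketch: 95% CI for the mean computed on the log scale,
# then back-transformed to the original scale.
import numpy as np
from scipy import stats

data = np.array([1.2, 3.5, 2.0, 8.1, 4.4, 12.9, 2.7, 6.3])  # made-up, right-skewed

log_data = np.log(data)                      # transform first
n = len(log_data)
mean, se = log_data.mean(), stats.sem(log_data)
t_crit = stats.t.ppf(0.975, df=n - 1)

lo, hi = mean - t_crit * se, mean + t_crit * se   # CI on the log scale
lo_back, hi_back = np.exp(lo), np.exp(hi)          # back-transform (invert ln)
print(f"CI on original scale: ({lo_back:.2f}, {hi_back:.2f})")
```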
Valid transforms . . .
- Require same transform applied to each individual
- Have 1-to-1 correspondence to OG values
- Have monotonic relationship w/ OG values (large values stay larger)
Nonparametric methods - def’n
Nonparametric methods make fewer assumptions than standard parametric methods do about the distribution of variables
- Achieves this by ranking the data
- AKA “distribution-free” methods
Ranking Data points
- Rank from smallest to largest
- Tied values each receive the average of the ranks they would have occupied (the “midrank”); the next-largest value then gets the next unused rank
- ex. 5, 6, 6, 8
- 5 gets rank 1; the two 6’s would occupy ranks 2 and 3, so each gets the midrank (2 + 3)/2 = 2.5
- 8 gets the next unused rank, i.e. 8 is ranked 4
See pg. 391
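The midrank rule can be sketched with scipy, using the same example values as the card:

```python
# Hypothetical sketch: midranks via scipy.stats.rankdata.
# Tied values share the average of the ranks they would occupy.
from scipy.stats import rankdata

values = [5, 6, 6, 8]        # the example from the card
ranks = rankdata(values)     # default method='average' gives midranks
print(ranks)                 # [1.  2.5 2.5 4. ]
```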
Sign-test
- What is it?
- What is its parametric equivalent(s)?
- Compares the median of a sample to a constant specified in the null hypothesis. Makes NO assumptions about distribution of measurement in population
- Equivalents are one-sample or paired t-tests
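A minimal sketch of a sign test built from a binomial test, since scipy has no dedicated sign-test function; the data and null median are made-up assumptions:

```python
# Hypothetical sketch: sign test for H0 "median = 10" via a binomial test.
# Under H0, each observation falls above the median with probability 1/2.
import numpy as np
from scipy import stats

data = np.array([12.1, 14.3, 9.2, 15.8, 13.0, 11.7, 16.4, 10.9])  # made-up
median_0 = 10.0

above = np.sum(data > median_0)     # count of values above the null median
n = np.sum(data != median_0)        # values tied with the median are dropped

result = stats.binomtest(int(above), n=int(n), p=0.5)
print(f"{above} of {n} above the null median; P = {result.pvalue:.4f}")
```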