EDS Flashcards
Single group t test assumptions?
Normal distribution of data
Paired t test assumptions?
Normal distribution of differences between paired datapoints
Unpaired t test assumptions?
Normality of residuals and roughly equal variance
Independent ANOVA assumptions?
Normality of residuals and roughly equal variance
Repeated measures ANOVA assumptions?
Normality of residuals and sphericity (violating sphericity is not fatal: corrections such as Greenhouse-Geisser can be applied)
Pearson correlation assumptions?
At least one scale/ratio variable that is normally distributed; the other can be categorical with 2 categories (the point-biserial case)
Linear relationship
If normality fails, don't transform; use Spearman's instead (performs almost as well, and works for ordinal as well as continuous data)
Regression assumptions?
Dependent: continuous, predictors: cont or binary
Linear relationship, with non-zero variance
Homoscedasticity (check the residuals plot)
Independence of residuals (Durbin-Watson)
Random + normally distributed residuals
No entanglement
Sample size / datapoints at least 10x predictors tested
No interactions among predictors beyond those specified in the model, and predictors shouldn't correlate too highly (multicollinearity; this doesn't strictly violate an assumption)
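The independence-of-residuals check above uses the Durbin-Watson statistic. A minimal sketch of that calculation (the residual values here are made up for illustration):

```python
# Hypothetical residuals from a fitted regression (illustrative values only).
residuals = [0.5, -0.3, 0.2, -0.1, 0.4, -0.6, 0.3, -0.2]

def durbin_watson(e):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2), ranging from 0 to 4."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(x ** 2 for x in e)
    return num / den

dw = durbin_watson(residuals)
# Values near 2 suggest independent residuals; near 0 suggests positive
# autocorrelation, near 4 suggests negative autocorrelation.
```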
Assumptions for contingency table?
Data independently measured (the same person must not appear in two cells)
If 2x2: all E > 5, use chi-square or linear-by-linear; any E < 5, use Fisher's
If larger than 2x2: >80% of E > 5, use chi-square or linear-by-linear; >20% of E < 5, use Fisher's
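The expected-count rule above can be sketched in Python; E for each cell is (row total × column total) / grand total. The observed counts are invented for illustration:

```python
# Hypothetical 2x2 table of observed counts (rows = groups, cols = outcomes).
observed = [[12, 8],
            [5, 15]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand = sum(row_totals)

# Expected count per cell: E = (row total * column total) / grand total
expected = [[r * c / grand for c in col_totals] for r in row_totals]

# 2x2 rule: all E > 5 -> chi-square (or linear-by-linear); any E < 5 -> Fisher's
all_E_over_5 = all(e > 5 for row in expected for e in row)
test_choice = "chi-square" if all_E_over_5 else "fisher"
```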
Parametric vs non-parametric meaning?
Parametric tests assume the data (or residuals) follow a particular distribution, usually normal, and estimate its parameters; non-parametric tests make no such distributional assumption (often rank-based)
Standard deviation versus standard error?
SD = sqrt of variance (remember variance is SS/df); SE = SD / sqrt(n)
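The SS → variance → SD → SE chain above, sketched on a small made-up sample:

```python
import math

# Small illustrative sample (values are made up).
data = [4.0, 6.0, 5.0, 7.0, 3.0]
n = len(data)
mean = sum(data) / n

ss = sum((x - mean) ** 2 for x in data)   # sum of squares
variance = ss / (n - 1)                   # variance = SS / df
sd = math.sqrt(variance)                  # SD = sqrt(variance)
se = sd / math.sqrt(n)                    # SE = SD / sqrt(n)
```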
Positively skewed distribution versus negatively skewed distribution?
Positively skewed = right tail is longer, mass of distribution concentrated on the left (i.e. fewer extreme high values; mean usually higher than the median; try a log or sqrt transform)
Negatively skewed = left tail is longer, mass of distribution concentrated on the right (i.e. fewer extreme low values; mean usually less than the median; try a square transform)
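A sketch of the "log transform fixes positive skew" idea: exponentiating a symmetric set of values produces a positively skewed sample, and logging it restores symmetry. Data and the skewness formula used here (mean cubed standardised deviation) are illustrative:

```python
import math

# Symmetric values exponentiated -> positively skewed data (illustrative).
z = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]
data = [math.exp(v) for v in z]

def skew(xs):
    """Sample skewness: mean cubed deviation over SD cubed."""
    n = len(xs)
    m = sum(xs) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / n)
    return sum(((x - m) / sd) ** 3 for x in xs) / n

before = skew(data)                         # positive: long right tail
after = skew([math.log(x) for x in data])   # ~0: z was symmetric
```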
Underfitting vs overfitting in polynomial regression?
Overfitting: R2 automatically increases as you add polynomial terms, but the model starts to fit random variability (wouldn't generalise to other samples)
Underfitting: not the best fit; lower R2
Why are multicolinearity & automatic variable selection problematic for multiple linear regression?
Multicollinearity: increases the SE of the b coefficients (makes them less stable) and limits the size of R2
Automatic variable selection: stepwise methods have multiple comparison issues, inflate type I error rate
When can you use these formulas to calculate CI?
95% CI = ± 2.0 x standard error
99% CI = ±2.6 x standard error
99.9% CI = ±3.3 x standard error
If there is a normal sampling distribution (or the normal approximation to the binomial holds, for proportions): this can be assumed if the quantitative data are normally distributed or if n > 50
How to calculate CI (when normal sampling distribution / normal approximation for the binomial)
95% CI = ± 2.0 x standard error
99% CI = ±2.6 x standard error
99.9% CI = ±3.3 x standard error
*note these are effectively 2-tailed
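The CI arithmetic above, sketched on a made-up sample using the card's multipliers (2.0 and 2.6 are the rounded z values; a real n > 50 sample would better justify the normal approximation):

```python
import math

# Illustrative sample; kept short just to show the arithmetic.
data = [5.1, 4.9, 5.3, 5.0, 4.8, 5.2]
n = len(data)
mean = sum(data) / n
sd = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
se = sd / math.sqrt(n)

# CI = mean +/- multiplier * SE (two-tailed)
ci95 = (mean - 2.0 * se, mean + 2.0 * se)
ci99 = (mean - 2.6 * se, mean + 2.6 * se)
```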
What is variance?
SS / df
SEM and SE for proportion?
SEM = SD / sqrt(n)
SE proportion = sqrt(p(1-p)/n)
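The SE-of-a-proportion formula, sketched with invented counts (30 successes out of 100 trials) and the card's 2.0 multiplier for a 95% CI:

```python
import math

# Illustrative counts: 30 successes out of 100 trials.
successes, n = 30, 100
p = successes / n

se_prop = math.sqrt(p * (1 - p) / n)   # SE = sqrt(p(1-p)/n)
ci95 = (p - 2.0 * se_prop, p + 2.0 * se_prop)
```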
How to calculate R2?
SSM / SST
Use the corrected total value, unless non-linear regression
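The R2 = SSM/SST calculation sketched end-to-end for a simple linear fit, using the corrected (mean-centred) total SS; the x/y values are invented:

```python
# Illustrative data with a roughly linear trend.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Least-squares slope and intercept for a simple linear model.
slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
intercept = my - slope * mx
pred = [intercept + slope * xi for xi in x]

sst = sum((yi - my) ** 2 for yi in y)                 # corrected total SS
ssr = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))  # residual SS
ssm = sst - ssr                                       # model SS (SSM + SSR = SST)
r2 = ssm / sst                                        # R^2 = SSM / SST
```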
Skew vs kurtosis?
Skew describes a positive/negative tail; |skew| > 1 is problematic
Kurtosis is how light/heavy the tails are; can also be thought of as the sharpness of the peak on distribution curve
How to calculate SSM / SST / SSR?
SSM + SSR = SST
How to calculate F ratio?
ANOVA: MS model (between group) / MS error (within group), where MS = SS/df
Non-linear regression: MS model (difference) / MS error (alt) (note: MS total would be null hypothesis)
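The ANOVA F ratio from the card (MS model / MS error, with MS = SS/df), sketched on three invented groups:

```python
# Illustrative one-way ANOVA with three hypothetical groups.
groups = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [9.0, 10.0, 11.0]]

all_vals = [v for g in groups for v in g]
grand_mean = sum(all_vals) / len(all_vals)

# Between-group (model) SS and within-group (error) SS.
ss_model = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
ss_error = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)

df_model = len(groups) - 1               # k - 1
df_error = len(all_vals) - len(groups)   # N - k

ms_model = ss_model / df_model           # MS = SS / df
ms_error = ss_error / df_error
f_ratio = ms_model / ms_error            # F = MS model / MS error
```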
How to calculate mean square of the difference when analysing non-linear regression fits? (e.g. constrained vs unconstrained)
treat constrained as null and unconstrained as alt
SS diff = SS null - SS alt
df diff = df null - df alt
use to calculate the mean square
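The constrained-vs-unconstrained comparison above, sketched with invented SS and df values (the F ratio is MS diff / MS alt, per the earlier card):

```python
# Hypothetical SS and df from fitting a constrained (null) and an
# unconstrained (alternative) model to the same data.
ss_null, df_null = 120.0, 18   # constrained fit
ss_alt, df_alt = 90.0, 16      # unconstrained fit

ss_diff = ss_null - ss_alt     # SS diff = SS null - SS alt
df_diff = df_null - df_alt     # df diff = df null - df alt

ms_diff = ss_diff / df_diff    # mean square of the difference
ms_alt = ss_alt / df_alt       # mean square of the alternative (error term)
f_ratio = ms_diff / ms_alt     # F = MS diff / MS alt
```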