EDS Flashcards

1
Q

Single group t test assumptions?

A

Normal distribution of data

2
Q

Paired t test assumptions?

A

Normal distribution of differences between paired datapoints

3
Q

Unpaired t test assumptions?

A

Normality of residuals and roughly equal variance

4
Q

Independent ANOVA assumptions?

A

Normality of residuals and roughly equal variance

5
Q

Repeated measures ANOVA assumptions?

A

Normality of residuals and sphericity (failing sphericity is not a major problem; a correction can be applied)

6
Q

Pearson correlation assumptions?

A

At least one scale/ratio variable that is normally distributed; the other can be categorical with 2 categories

Linear relationship

Don't transform if it fails normality; use Spearman's instead (it performs almost as well and can be used for ordinal as well as continuous data)

7
Q

Regression assumptions?

A

Dependent variable: continuous; predictors: continuous or binary
Linear relationship, with non-zero variance
Homoscedasticity (check the residuals plot)
Independence of residuals (Durbin-Watson)
Random and normally distributed residuals
No entanglement
Sample size: at least 10x as many datapoints as predictors tested
No interactions among predictors beyond those specified in the model, and predictors should not correlate too highly (multicollinearity; this does not strictly violate the assumptions)
(diagnostic sketch below)
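
A minimal Python sketch of how some of these checks could be run with statsmodels; the data and variable names are invented for illustration and are not part of the deck.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                      # two continuous predictors (invented)
y = 1.0 + 0.5 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(scale=0.5, size=100)

Xc = sm.add_constant(X)                            # add the intercept column
fit = sm.OLS(y, Xc).fit()

print("Durbin-Watson (independence of residuals):", durbin_watson(fit.resid))
print("VIF per predictor (multicollinearity):",
      [variance_inflation_factor(Xc, i) for i in range(1, Xc.shape[1])])
# Homoscedasticity: plot standardised residuals against fitted values and look for fanning
# Normality of residuals: Q-Q plot or a Shapiro-Wilk test on fit.resid
```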

8
Q

Assumptions for contingency table?

A

Data measured independently (the same person must not appear in two cells)

If 2x2: all expected counts >5, use chi-square or linear-by-linear; any expected count <5, use Fisher's exact test
If larger than 2x2: >80% of expected counts >5, use chi-square or linear-by-linear; >20% of expected counts <5, use Fisher's exact test
(SciPy sketch below)
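
A small SciPy sketch of the 2x2 decision rule above; the counts are invented.

```python
import numpy as np
from scipy.stats import chi2_contingency, fisher_exact

table = np.array([[12, 8],
                  [5, 15]])                     # invented 2x2 counts

chi2, p, dof, expected = chi2_contingency(table)
if (expected < 5).any():                        # any expected count < 5 -> Fisher's exact
    odds_ratio, p = fisher_exact(table)
    print("Fisher's exact p =", p)
else:
    print("Chi-square p =", p)
```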

9
Q

Parametric vs non-parametric meaning?

A

Parametric tests assume the data come from a distribution described by parameters (usually the normal distribution); non-parametric tests make no such distributional assumption and typically work on ranks

12
Q

Standard deviation versus standard error?

A
SD = sqrt of variance (remember variance is SS/df)
SE = SD / sqrt(n)
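
As a quick worked sketch (invented numbers):

```python
import numpy as np

x = np.array([4.1, 5.0, 6.2, 5.5, 4.8])   # invented sample
sd = x.std(ddof=1)                         # SD = sqrt(SS / df), with df = n - 1
se = sd / np.sqrt(len(x))                  # SE of the mean = SD / sqrt(n)
```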
13
Q

Positively skewed distribution versus negatively skewed distribution?

A

Positively skewed = right tail is longer, mass of the distribution concentrated on the left (relatively few very high values; the mean is usually higher than the median; try a log or square-root transform)

Negatively skewed = left tail is longer, mass of the distribution concentrated on the right (relatively few very low values; the mean is usually less than the median; try a square transform)

14
Q

Underfitting vs overfitting in polynomial regression?

A

Overfitting: R2 automatically increases as you add polynomial terms, but you start to fit random variability (the model would not generalise to other samples)

Underfitting: too few terms; not the best fit, lower R2

15
Q

Why are multicollinearity & automatic variable selection problematic for multiple linear regression?

A

Multicollinearity: increases the SE of the b coefficients (makes them less stable) and reduces the significance of R2

Automatic variable selection: stepwise methods have multiple-comparison issues and inflate the type I error rate

17
Q

When can you use these formulas to calculate CI?
95% CI = ± 2.0 x standard error
99% CI = ±2.6 x standard error
99.9% CI = ±3.3 x standard error

A

If the sampling distribution is normal (or the normal approximation to the binomial holds, for proportions): this can be assumed if the quantitative data are normally distributed or if there are >50 in the sample

18
Q

How to calculate a CI (when there is a normal sampling distribution / normal approximation for the binomial)?

A

95% CI = ± 2.0 x standard error
99% CI = ±2.6 x standard error
99.9% CI = ±3.3 x standard error

*note these are effectively 2-tailed
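
A small numerical sketch of this rule, assuming a normal sampling distribution; the data are invented.

```python
import numpy as np

x = np.random.default_rng(1).normal(loc=10, scale=2, size=60)   # invented sample, n > 50
mean = x.mean()
se = x.std(ddof=1) / np.sqrt(len(x))

ci_95  = (mean - 2.0 * se, mean + 2.0 * se)
ci_99  = (mean - 2.6 * se, mean + 2.6 * se)
ci_999 = (mean - 3.3 * se, mean + 3.3 * se)
```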

19
Q

What is variance?

A

SS / df

20
Q

SEM and SE for proportion?

A

SEM = SD / sqrt(n)

SE (proportion) = sqrt( p(1-p) / n )
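
For example (invented proportion and sample size):

```python
import numpy as np

p, n = 0.3, 200                      # invented observed proportion and sample size
se_prop = np.sqrt(p * (1 - p) / n)   # SE of a proportion
```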

21
Q

How to calculate R2?

A

SSM / SST

Use the corrected total value, unless non-linear regression

22
Q

Skew vs kurtosis?

A

Skew describes whether the tail is positive or negative; absolute values >1 are problematic

Kurtosis is how light/heavy the tails are; can also be thought of as the sharpness of the peak on distribution curve

23
Q

How to calculate SSM / SST / SSR?

A

SSM + SSR = SST

24
Q

How to calculate F ratio?

A

ANOVA: MS model (between group) / MS error (within group), where MS = SS/df

Non-linear regression: MS model (difference) / MS error (alt) (note: MS total would be null hypothesis)
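
A hand-worked sketch of the ANOVA version of this formula; the sums of squares and degrees of freedom are invented.

```python
ss_model, df_model = 120.0, 2     # between-group SS and df (invented)
ss_error, df_error = 300.0, 27    # within-group SS and df (invented)

ms_model = ss_model / df_model    # MS = SS / df
ms_error = ss_error / df_error
f_ratio = ms_model / ms_error     # compare against F(df_model, df_error)
```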

25
Q

How to calculate mean square of the difference when analysing non-linear regression fits? (e.g. constrained vs unconstrained)

A

treat constrained as null and unconstrained as alt

SS diff = SS null - SS alt
df diff = df null - df alt

use to calculate the mean square

26
Q

How to calculate contingency expected frequency?

A

row total x column total / overall total
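
For example (invented counts):

```python
row_total, col_total, grand_total = 30, 40, 100
expected = row_total * col_total / grand_total   # expected frequency for that cell = 12.0
```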

27
Q

How to calculate odds ratio?

A

odds of disease with characteristic (A) / odds of disease without characteristic (B)

A = n (disease + ch) / n (no disease + ch)
B = n (disease + no ch) / n (no disease + no ch)
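
A worked sketch with an invented 2x2 table:

```python
#                  disease   no disease
# characteristic      20         30
# no characteristic   10         40      (all counts invented)

odds_with    = 20 / 30                     # A: odds of disease with the characteristic
odds_without = 10 / 40                     # B: odds of disease without it
odds_ratio   = odds_with / odds_without    # = 2.67
```
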
28
Q

How to increase power?

A

Increase the effect size
Reduce random variation
Increase the sample size

29
Q

Formula for SS of an interaction?

A

SS(AxB) = SSM - SSA - SSB

30
Q

Calculate % variability due to interaction?

A

SS(interaction) / SS(corrected total)

31
Q

Calculate % variability due to model?

A

SS (corrected model) / SS (corrected total)

this is the R2 formula!

32
Q

How to test?

1) a single mean
2) if 2 means/medians are equal

A

1) One-sample t-test if normally distributed; if not, the sign test for the median or the Wilcoxon signed-rank test
2) Unpaired t-test if normally distributed with equal variance; if not, Mann-Whitney U (SciPy calls sketched below)
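
Illustrative SciPy calls for the two situations above; the data are invented.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = rng.normal(10, 2, 25)   # invented sample
y = rng.normal(11, 2, 25)   # invented second group

# 1) a single mean against a hypothesised value
stats.ttest_1samp(x, popmean=10)   # if normally distributed
stats.wilcoxon(x - 10)             # signed-rank alternative

# 2) two independent groups
stats.ttest_ind(x, y)              # if normal with equal variance
stats.mannwhitneyu(x, y)           # non-parametric alternative
```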

33
Q

How to test differences between pairs with two groups?

A

normal distribution: paired t-test

not normal: Wilcoxon signed rank test

34
Q

How to test the means/medians are equal when there are more than 2 groups?

A

Normal distribution and equal variance: independent one-way ANOVA (or factorial ANOVA if there is more than one variable)

Not normal / no equal variance: Kruskal-Wallis

35
Q

How to test the differences between paired data when there are >2 groups?

A

Normal distribution and sphericity: Repeated measures ANOVA (one-way or factorial)

Not normal/not equal variance: Friedman’s test

36
Q

How to test sig of a single proportion?

A

z test for proportion

37
Q

How to test if 2 proportions are equal when the data is paired? (i.e. two groups and data is paired)

A

McNemar’s test

38
Q

How to test if 2 proportions are equal? (independent measures)

A

If the expected count is >5 in each cell, chi-square

If <5 expected in any cell, Fisher's exact test

39
Q

How to test if >2 proportions are equal? (independent measures)

A

If >5 expected in at least 80% of cells, chi-square

If <5 expected in 20% or more of cells, Fisher's exact test (or combine categories logically)

40
Q

How to test if >2 groups of proportions show trend across categories?

A

Linear-by-linear association test (Chi square test for trend)

41
Q

Relationship/association (not dependence) between two variables - how to test?

A

Association and both normally distributed (or one categorical): Pearson

Association and one/both not normal: Spearman’s

42
Q

Regression: what standardised residuals do you expect?

A

No more than 5% should lie beyond +/-2

SR >3 highly unlikely/concerning

43
Q

What is Cook’s / leverage?

A

Cook’s = influence on overall model (combines leverage + SR), concern if >1

Leverage = influence of point on predicted value
concern if > 2(k+1)/n where k=predictors and n = datapoints

44
Q

Why may influential points not be outliers?

A

The SR may be fine (not an outlier), but if DFFit/DFBeta is still high the point should be removed

45
Q

What to do if changing the model causes the effect of a certain variable to change dramatically?

A

Say coefficient is unstable and rerun the regression without that variable

Note: the SE of a coefficient gives an idea of its instability

46
Q

What is multicollinearity? Why is it bad?

A

Strong correlation among predictors (or a weaker correlation between one predictor and several others combined); variance is shared between predictors

Increasing collinearity increases the SE of the coefficients and therefore limits the significance of R2

Collinearity also means the order in which variables are entered in hierarchical entry may affect the model

47
Q

How to diagnose multicollinearity?

A

VIF >10, or average VIF substantially >1

(also tolerance <0.1)

It is difficult to know what to remove or keep; some collinearity is expected, since with zero collinearity you could just run several simple linear regressions

Note: standardised b coefficients account for collinearity; correlation coefficients do not!

48
Q

How to do hierarchical multiple regression?

A

Add predictors based on theoretical importance

Be selective, as useful models don't have too many predictors (better to have fewer predictors that explain more variability)

Rerun the analysis removing non-significant predictors and report each removal step

Use R2 and the p value to judge significant improvements in model fit

49
Q

When to use Welch’s correction?

A

An ANOVA with UNEQUAL group sizes and deviations from equality of variance

(if group sizes are equal, ANOVA is quite robust to such deviations anyway)

50
Q

post hocs - which to use when?

A

Tukey: equal group sizes, good type I/type II trade-off
Bonferroni: conservative (controls type I, higher type II); use in any situation where high confidence is wanted

Dunnett's: all groups vs a control

Gabriel's: slightly different group sizes
Hochberg's GT2: very different group sizes

Games-Howell: any doubt about equality of variance
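
A sketch of one of these post hocs (Tukey's HSD) with statsmodels; the group labels and values are invented.

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(5)
values = np.concatenate([rng.normal(10, 2, 20),
                         rng.normal(12, 2, 20),
                         rng.normal(11, 2, 20)])   # three equal-sized groups (invented)
groups = ["A"] * 20 + ["B"] * 20 + ["C"] * 20

print(pairwise_tukeyhsd(values, groups, alpha=0.05))   # all pairwise comparisons
```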

51
Q

why perform planned contrasts?

A

They are more powerful when you want to test a specific hypothesis

If the sum of the products of the contrast weights (d1 x d2 x d3) = 0, the contrasts are orthogonal and statistically independent; this controls type I error, so no multiple-comparison corrections are needed

Helmert and difference contrasts are standard planned contrasts that are orthogonal

52
Q

Why use polynomial contrasts?

A

A type of orthogonal contrast that tests for trends when the groups have a logical order

53
Q

Why run 2-way ANOVA?

A

More powerful than running separate ANOVAs, as more variability can be explained by the two factors and by their interaction

54
Q

How to interpret interaction plots?

A

Horizontal lines: the x-axis factor has no effect
Overlapping lines: the separate-lines factor has no effect
Parallel lines: no significant interaction

If there is a significant interaction, don't emphasise the main effects

Note: standard contrasts/post hocs are only available on main effects, so they are most useful when there is no interaction

Main effects analyses can be conducted if there are >3 levels

55
Q

when performing simple effects analysis, which correction do you apply?

A
Sidak for independent measures

Bonferroni for repeated measures

56
Q

if fail mauchly’s sphericity test?

A

Look at the Greenhouse-Geisser epsilon: if <0.75, use the Greenhouse-Geisser correction; if >0.75, use the Huynh-Feldt correction

(note: GG epsilon x df = the corrected df)

57
Q

how do you fit a non-linear regression?

A

Base the model on theory and guess starting values for the parameters

The software then adjusts the curve iteratively to minimise SS(residual)

Check different starting parameters to make sure a global minimum has been found rather than a local minimum

58
Q

how to decide on best model for non linear regression?

A

Don't assume the best model just because of a high R2

Check that the SEs of the parameters are not too big!

If it makes biological sense to use constraints, keep them in even if the fit is worse

To assess fit, compare SS(residual) between models

59
Q

How to compare SSR in nonlinear regression?

A

SS(alt)=SSR

SS(null)-SS(alt)=SS(dif)

60
Q

F ratio in nonlinear regression?

A

Calculate SS and df for the difference, and SS and df for the alternative
(null - alt = difference)

SS(dif) / df(dif) = MS(difference)

F ratio = MS(difference) / MS(alt), i.e. MS(model) / MS(error)
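
A sketch of this comparison using scipy.optimize.curve_fit; the exponential model, the constraint, and the data are invented for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import f

def model_alt(x, top, k):             # unconstrained (alternative): top and k both fitted
    return top * (1 - np.exp(-k * x))

def model_null(x, k):                 # constrained (null): top fixed at 1.0
    return 1.0 * (1 - np.exp(-k * x))

rng = np.random.default_rng(2)
x = np.linspace(0, 5, 30)
y = 1.2 * (1 - np.exp(-0.8 * x)) + rng.normal(scale=0.05, size=x.size)   # invented data

p_alt, _ = curve_fit(model_alt, x, y, p0=[1.0, 1.0])
p_null, _ = curve_fit(model_null, x, y, p0=[1.0])

ss_alt  = np.sum((y - model_alt(x, *p_alt)) ** 2)    # SS(residual) of the alternative fit
ss_null = np.sum((y - model_null(x, *p_null)) ** 2)  # SS(residual) of the null fit
df_alt, df_null = x.size - 2, x.size - 1

ss_dif, df_dif = ss_null - ss_alt, df_null - df_alt
f_ratio = (ss_dif / df_dif) / (ss_alt / df_alt)      # MS(difference) / MS(alt)
p_value = f.sf(f_ratio, df_dif, df_alt)
```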

61
Q

What does bootstrapping provide?

A

Robust SEs/CIs without making assumptions about the population distribution

B = number of bootstrap resamples
N = sample size

A form of resampling with replacement: it doesn't throw away information about the size of differences; it builds a frequency distribution of bootstrapped means, and the central 95% of that distribution gives the 95% CI
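
A minimal percentile-bootstrap sketch for a 95% CI of the mean; the data are invented.

```python
import numpy as np

rng = np.random.default_rng(3)
data = rng.exponential(scale=2.0, size=50)   # invented, skewed sample (N = 50)

B = 10_000                                   # number of bootstrap resamples
boot_means = np.array([rng.choice(data, size=data.size, replace=True).mean()
                       for _ in range(B)])   # resampling with replacement

ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])   # central 95% -> 95% CI
```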

62
Q

Which summary statistics are most appropriate when?

A

Mean: assumes a normal distribution (like the SD), more sensitive to outliers (normal data: report mean + SD)

Median: less sensitive to outliers, equal to the mean when the data are normally distributed (not normal: report median + IQR)

If bimodal distribution, neither statistic is appropriate!

63
Q

What do different error bars show?

A

+/-SD = no information on significant differences; not affected by sample size

+/-CI = if they don't overlap, the groups are likely to differ significantly at the stated confidence (if they do overlap, no conclusions)

+/-SE = if they overlap, the means do not differ at p<0.05 (no overlap = no conclusions, as there is only a ~68% chance that the population mean lies within the upper and lower range of the SEM)

64
Q

What is the effect of increasing sample size?

A

The SEM decreases (and the sampling distribution looks more normal), and therefore the CIs become narrower (but the effect size won't change)