EDS Flashcards
Single group t test assumptions?
Normal distribution of data
Paired t test assumptions?
Normal distribution of differences between paired datapoints
Unpaired t test assumptions?
Normality of residuals and roughly equal variance
Independent ANOVA assumptions?
Normality of residuals and roughly equal variance
Repeated measures ANOVA assumptions?
Normality of residuals and sphericity (violating sphericity is not fatal: corrections such as Greenhouse-Geisser can be applied)
Pearson correlation assumptions?
At least one scale/ratio variable that is normally distributed; the other can be categorical with 2 categories (the point-biserial case)
Linear relationship
If normality fails, don't transform; use Spearman's instead (performs almost as well, and works for ordinal as well as continuous data)
Regression assumptions?
Dependent: continuous, predictors: cont or binary
Linear relationship, with non-zero variance
Homoscedasticity (check the residuals plot)
Independence of residuals (Durbin-Watson)
Random + normally distributed residuals
No entanglement
Sample size / datapoints at least 10x predictors tested
No interactions among predictors beyond those specified in the model, and predictors shouldn't correlate too highly (multicollinearity; this doesn't strictly violate an assumption)
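The independence-of-residuals check above uses the Durbin-Watson statistic. A minimal sketch of that calculation (the residual values here are made up for illustration):

```python
# Hypothetical residuals from a fitted regression (illustrative values only).
residuals = [0.5, -0.3, 0.2, -0.1, 0.4, -0.6, 0.3, -0.2]

def durbin_watson(e):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2), ranging from 0 to 4."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(x ** 2 for x in e)
    return num / den

dw = durbin_watson(residuals)
# Values near 2 suggest independent residuals; near 0 suggests positive
# autocorrelation, near 4 suggests negative autocorrelation.
```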
Assumptions for contingency table?
Data independently measured (the same person must not appear in two cells)
If 2x2: all E > 5, use chi-square or linear-by-linear; any E < 5, use Fisher's
If larger than 2x2: >80% of E > 5, use chi-square or linear-by-linear; >20% of E < 5, use Fisher's
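The expected-count rule above can be sketched in Python; E for each cell is (row total × column total) / grand total. The observed counts are invented for illustration:

```python
# Hypothetical 2x2 table of observed counts (rows = groups, cols = outcomes).
observed = [[12, 8],
            [5, 15]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand = sum(row_totals)

# Expected count per cell: E = (row total * column total) / grand total
expected = [[r * c / grand for c in col_totals] for r in row_totals]

# 2x2 rule: all E > 5 -> chi-square (or linear-by-linear); any E < 5 -> Fisher's
all_E_over_5 = all(e > 5 for row in expected for e in row)
test_choice = "chi-square" if all_E_over_5 else "fisher"
```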
Parametric vs non-parametric meaning?
Parametric tests assume the data (or residuals) follow a particular distribution, usually normal, and estimate its parameters; non-parametric tests make no such distributional assumption (often rank-based)
Standard deviation versus standard error?
SD = sqrt of variance (remember variance is SS/df); SE = SD / sqrt(n)
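The SS → variance → SD → SE chain above, sketched on a small made-up sample:

```python
import math

# Small illustrative sample (values are made up).
data = [4.0, 6.0, 5.0, 7.0, 3.0]
n = len(data)
mean = sum(data) / n

ss = sum((x - mean) ** 2 for x in data)   # sum of squares
variance = ss / (n - 1)                   # variance = SS / df
sd = math.sqrt(variance)                  # SD = sqrt(variance)
se = sd / math.sqrt(n)                    # SE = SD / sqrt(n)
```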
Positively skewed distribution versus negatively skewed distribution?
Positively skewed = right tail is longer, mass of distribution concentrated on the left (i.e. fewer extreme high values; mean usually higher than the median; try a log or sqrt transform)
Negatively skewed = left tail is longer, mass of distribution concentrated on the right (i.e. fewer extreme low values; mean usually less than the median; try a square transform)
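A sketch of the "log transform fixes positive skew" idea: exponentiating a symmetric set of values produces a positively skewed sample, and logging it restores symmetry. Data and the skewness formula used here (mean cubed standardised deviation) are illustrative:

```python
import math

# Symmetric values exponentiated -> positively skewed data (illustrative).
z = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]
data = [math.exp(v) for v in z]

def skew(xs):
    """Sample skewness: mean cubed deviation over SD cubed."""
    n = len(xs)
    m = sum(xs) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / n)
    return sum(((x - m) / sd) ** 3 for x in xs) / n

before = skew(data)                         # positive: long right tail
after = skew([math.log(x) for x in data])   # ~0: z was symmetric
```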
Underfitting vs overfitting in polynomial regression?
Overfitting: R2 automatically increases as you add polynomial terms, but the model starts to fit random variability (wouldn't generalise to other samples)
Underfitting: not the best fit; lower R2
Why are multicolinearity & automatic variable selection problematic for multiple linear regression?
Multicollinearity: increases the SE of the b coefficients (makes them less stable) and limits the size of R2
Automatic variable selection: stepwise methods have multiple comparison issues, inflate type I error rate
When can you use these formulas to calculate CI?
95% CI = ± 2.0 x standard error
99% CI = ±2.6 x standard error
99.9% CI = ±3.3 x standard error
If there is a normal sampling distribution (or the normal approximation to the binomial holds, for proportions): this can be assumed if the quantitative data are normally distributed or if n > 50
How to calculate CI (when normal sampling distribution / normal approximation for the binomial)
95% CI = ± 2.0 x standard error
99% CI = ±2.6 x standard error
99.9% CI = ±3.3 x standard error
*note these are effectively 2-tailed
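The CI arithmetic above, sketched on a made-up sample using the card's multipliers (2.0 and 2.6 are the rounded z values; a real n > 50 sample would better justify the normal approximation):

```python
import math

# Illustrative sample; kept short just to show the arithmetic.
data = [5.1, 4.9, 5.3, 5.0, 4.8, 5.2]
n = len(data)
mean = sum(data) / n
sd = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
se = sd / math.sqrt(n)

# CI = mean +/- multiplier * SE (two-tailed)
ci95 = (mean - 2.0 * se, mean + 2.0 * se)
ci99 = (mean - 2.6 * se, mean + 2.6 * se)
```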
What is variance?
SS / df
SEM and SE for proportion?
SEM = SD / sqrt(n)
SE proportion = sqrt(p(1-p)/n)
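The SE-of-a-proportion formula, sketched with invented counts (30 successes out of 100 trials) and the card's 2.0 multiplier for a 95% CI:

```python
import math

# Illustrative counts: 30 successes out of 100 trials.
successes, n = 30, 100
p = successes / n

se_prop = math.sqrt(p * (1 - p) / n)   # SE = sqrt(p(1-p)/n)
ci95 = (p - 2.0 * se_prop, p + 2.0 * se_prop)
```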
How to calculate R2?
SSM / SST
Use the corrected total value, unless non-linear regression
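The R2 = SSM/SST calculation sketched end-to-end for a simple linear fit, using the corrected (mean-centred) total SS; the x/y values are invented:

```python
# Illustrative data with a roughly linear trend.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Least-squares slope and intercept for a simple linear model.
slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
intercept = my - slope * mx
pred = [intercept + slope * xi for xi in x]

sst = sum((yi - my) ** 2 for yi in y)                 # corrected total SS
ssr = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))  # residual SS
ssm = sst - ssr                                       # model SS (SSM + SSR = SST)
r2 = ssm / sst                                        # R^2 = SSM / SST
```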
Skew vs kurtosis?
Skew describes a positive/negative tail; |skew| > 1 is problematic
Kurtosis is how light/heavy the tails are; can also be thought of as the sharpness of the peak on distribution curve
How to calculate SSM / SST / SSR?
SSM + SSR = SST
How to calculate F ratio?
ANOVA: MS model (between group) / MS error (within group), where MS = SS/df
Non-linear regression: MS model (difference) / MS error (alt) (note: MS total would be null hypothesis)
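The ANOVA F ratio from the card (MS model / MS error, with MS = SS/df), sketched on three invented groups:

```python
# Illustrative one-way ANOVA with three hypothetical groups.
groups = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [9.0, 10.0, 11.0]]

all_vals = [v for g in groups for v in g]
grand_mean = sum(all_vals) / len(all_vals)

# Between-group (model) SS and within-group (error) SS.
ss_model = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
ss_error = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)

df_model = len(groups) - 1               # k - 1
df_error = len(all_vals) - len(groups)   # N - k

ms_model = ss_model / df_model           # MS = SS / df
ms_error = ss_error / df_error
f_ratio = ms_model / ms_error            # F = MS model / MS error
```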
How to calculate mean square of the difference when analysing non-linear regression fits? (e.g. constrained vs unconstrained)
treat constrained as null and unconstrained as alt
SS diff = SS null - SS alt
df diff = df null - df alt
use to calculate the mean square
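The constrained-vs-unconstrained comparison above, sketched with invented SS and df values (the F ratio is MS diff / MS alt, per the earlier card):

```python
# Hypothetical SS and df from fitting a constrained (null) and an
# unconstrained (alternative) model to the same data.
ss_null, df_null = 120.0, 18   # constrained fit
ss_alt, df_alt = 90.0, 16      # unconstrained fit

ss_diff = ss_null - ss_alt     # SS diff = SS null - SS alt
df_diff = df_null - df_alt     # df diff = df null - df alt

ms_diff = ss_diff / df_diff    # mean square of the difference
ms_alt = ss_alt / df_alt       # mean square of the alternative (error term)
f_ratio = ms_diff / ms_alt     # F = MS diff / MS alt
```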