analysing experimental studies Flashcards

1
Q

how do we know if an interaction is significant?

A

look at the p-value of the interaction term: the interaction is significant if its p-value is smaller than our alpha level

2
Q

how do we know if one variable is significant?

A

F-statistics tell us the overall significance of a model, so if we include only one variable in the model, the F-test tells us whether the variance explained in our outcome by that one variable is significant.

3
Q

incremental F-test

A

a regular F-test is an incremental test of our model against a null/'empty' model (just the intercept, where all slope βs = 0)

the incremental F-test evaluates the statistical significance of the improvement in variance explained in an outcome when further predictors are added - it is based on the difference in residual sums of squares between the two models
- the model with more predictors is called model 1 or the 'full model'
- the model with fewer predictors is called model 0 or the 'restricted model'

4
Q

incremental F-test equation

A

F (dfR - dfF, dfF) = [ (SSRr - SSRf) / (dfR - dfF) ] / ( SSRf / dfF )

where:
dfR = df of restricted model
dfF = df of full model
SSRr = residual sums of squares of restricted model
SSRf = residual sums of squares of full model
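The equation above can be sketched in Python (the numbers in the example are made up for illustration):

```python
def incremental_f(ssr_r, ssr_f, df_r, df_f):
    """Incremental F-test statistic comparing a restricted model
    (residual SS = ssr_r, residual df = df_r) against a full model
    (residual SS = ssr_f, residual df = df_f)."""
    return ((ssr_r - ssr_f) / (df_r - df_f)) / (ssr_f / df_f)

# hypothetical example: restricted model SSR = 100 on 10 df,
# full model SSR = 80 on 8 df
f_stat = incremental_f(100.0, 80.0, 10, 8)  # (20/2) / (80/8) = 1.0
```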

5
Q

anova() function in R

A

used to perform an incremental F-test

provides the following results:
- residual df for both models
- SSresidual for both models
- the difference in their dfs
- the difference in their SSresiduals
- the incremental F-statistic
- the p-value for the test (needs to be smaller than our alpha level to be significant)

6
Q

Nested vs non-nested models

A

nested = the predictors in one model are a subset of the predictors in the other (models must also be computed on the same data)
- can use incremental F-test

non-nested = there are unique variables in both of the models so there is no way of making them equivalent
- have to use AIC or BIC (smaller/more negative values indicate better fitting models)

7
Q

AIC and BIC

A

both contain a parsimony correction - meaning they penalise models for being too complex. BIC is harsher in this aspect

AIC = n * ln( SSresidual / n ) + 2k
BIC = n * ln( SSresidual / n) + ( k* ln(n) )
- where ln = natural log function

AIC and BIC only make sense when used for model comparisons.
For BIC, a difference of 10 can be used as a rule of thumb to suggest that one model is better than another (there is no equivalent rule for AIC) - we want the model with the smaller AIC or BIC value
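These formulas translate directly into a short Python sketch (the n, SSresidual, and k values used below are arbitrary examples):

```python
import math

def aic(n, ss_res, k):
    # AIC = n * ln(SSresidual / n) + 2k
    return n * math.log(ss_res / n) + 2 * k

def bic(n, ss_res, k):
    # BIC = n * ln(SSresidual / n) + k * ln(n)
    return n * math.log(ss_res / n) + k * math.log(n)

# for the same fit, a model with more parameters (k) scores worse,
# and BIC penalises extra parameters more harshly than AIC once ln(n) > 2
```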

8
Q

why do we need constraints/a reference group?

A

we want a model that represents our data/observations, but all we 'know' is what group an observation belongs to (µi = β0 + βi). this creates a problem because we don't want to estimate too many parameters (with β0 we would have one more parameter to estimate than there are group means)
- constraints fix this, e.g. by making β0 equal to one of the group means

9
Q

what is effects coding?

A

also called sum-to-zero coding
we compare groups to the grand mean (the mean of all observations)

we apply the constraint Σβj = 0, i.e. all β values must sum to 0

typically used in experimental settings where there isn't always an obvious reference group

10
Q

how does the dummy coding constraint help?

A

example = 3 treatment groups

before dummy coding:
µA = β0 + βA
µB = β0 + βB
µC = β0 + βC
this is 4 βs for 3 group means

how dummy coding fixes this (group A becomes the reference, so µA = β0):
µA = β0
µB = β0 + βB
µC = β0 + βC
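A minimal sketch of how the dummy-coded model reproduces the three group means (the β values here are made up purely for illustration):

```python
# hypothetical coefficients: beta0 is the mean of reference group A
beta0, beta_b, beta_c = 10.0, 2.0, -1.0

# dummy variables (dB, dC): the reference group A is all zeros
dummies = {"A": (0, 0), "B": (1, 0), "C": (0, 1)}

def group_mean(group):
    d_b, d_c = dummies[group]
    return beta0 + d_b * beta_b + d_c * beta_c

# mu_A = beta0, mu_B = beta0 + beta_b, mu_C = beta0 + beta_c
# -> 3 parameters for 3 group means
```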

11
Q

how does the effects coding constraint help?

A

example = 3 treatment groups

before effects coding:
µA = µ + βA
µB = µ + βB
µC = µ + βC
where µ = grand mean
- this is still 4 things to estimate for 3 group means

how effects coding fixes this (sum to 0, so βC = -(βA + βB)):
µA = β0 + βA
µB = β0 + βB
µC = β0 - (βA + βB)
where β0 = µ, the grand mean

12
Q

effects coding results - general interpretation

A

β0 = µ = grand mean (this acts as the reference)
- the mean of the k group means
- e.g. (µA + µB + µC) / 3 = µ

βj = difference between the coded group and the grand mean
- βj = µj - µ

13
Q

steps in effects coding:

A
  1. create k-1 variables
  2. for all observations in the focal group, assign 1
  3. for all observations in the reference group, assign -1 (must sum to 0)
  4. for all other groups, assign 0
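The four steps above can be sketched as a small Python function (the group and level names are illustrative):

```python
def effects_code(observations, levels):
    """Build k-1 sum-to-zero coded variables.
    `levels` is the ordered list of group labels; the last one
    is treated as the reference group (coded -1 everywhere)."""
    focal, reference = levels[:-1], levels[-1]
    coded = []
    for obs in observations:
        row = []
        for f in focal:
            if obs == f:
                row.append(1)        # step 2: focal group -> 1
            elif obs == reference:
                row.append(-1)       # step 3: reference group -> -1
            else:
                row.append(0)        # step 4: all other groups -> 0
        coded.append(row)
    return coded

# with one observation per group, each coded variable sums to 0
codes = effects_code(["A", "B", "C"], ["A", "B", "C"])
```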
14
Q

manual constraints testing:

A

allow us to test a wide variety of constraints, so long as they can be written:
- as a linear combination of population means
- with associated weights (coefficients) that sum to zero

manual constraints 'chunk' groups together and compare them to other chunks, testing whether the chunk means are significantly different

15
Q

rules for assigning weight constraints:

A
  1. weights range between -1 and 1
  2. the group(s) in one chunk get positive weights and the other chunk gets negative weights
  3. the sum of the weights in a comparison must be 0
  4. if a group is not involved in a comparison, its weight is 0
  5. the weight assigned to each group in a chunk = 1/(number of groups in that chunk), e.g. if there are two groups in the positive chunk they will both be given a weighting of 1/2
  6. restrict yourself to running k-1 comparisons
  7. each contrast can only compare 2 chunks of variance
  8. once a group is singled out, it cannot enter the other contrasts
  9. check whether the contrasts are orthogonal
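A sketch of rules 2-5 with four hypothetical groups: contrast 1 compares the chunk {A, B} against the chunk {C, D}, and contrast 2 compares A against B.

```python
groups = ["A", "B", "C", "D"]

# rule 5: each group's weight = 1 / (number of groups in its chunk)
contrast1 = [0.5, 0.5, -0.5, -0.5]   # {A, B} vs {C, D}
contrast2 = [1, -1, 0, 0]            # A vs B; C and D uninvolved -> 0 (rule 4)

# rule 3: each contrast's weights must sum to 0
assert sum(contrast1) == 0
assert sum(contrast2) == 0
```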
16
Q

orthogonal contrasts

A

test independent sources of variance
- we like manual contrasts to be orthogonal to avoid 'double dipping' into groups

for any pair of orthogonal comparisons, the sum of the products of the weights will be 0
e.g. multiply the contrast 1 and contrast 2 weights for each group, then add up the results; the total should be 0
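The product-of-weights check can be written directly (the contrasts below are hypothetical: {A, B} vs {C, D}, A vs B, and A vs C):

```python
def are_orthogonal(c1, c2):
    # multiply the weights group by group, then sum; 0 means orthogonal
    return sum(w1 * w2 for w1, w2 in zip(c1, c2)) == 0

contrast1 = [0.5, 0.5, -0.5, -0.5]  # chunk {A, B} vs chunk {C, D}
contrast2 = [1, -1, 0, 0]           # A vs B
contrast3 = [1, 0, -1, 0]           # A vs C - reuses A, so not orthogonal to contrast2
```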

17
Q

non-orthogonal constraints

A

test non-independent sources of variation - presents some statistical challenges when making inferences

18
Q

what are emmeans?

A

estimated marginal means = predicted means from the models
they are used to test model constraints (in R using the contrast function)

19
Q

interpreting manual constraints results

A

the estimate is the difference between the group means in each chunk
e.g. add up the group means in each chunk and then do chunk 1 - chunk 2
we can then use p-values or critical values to determine if there is a significant difference

20
Q

factors vs conditions

A

conditions = part of our experimental design (what we manipulate)

factors = what our conditions become when we put our results into a data set. the levels of a factor are the number of ways we vary/manipulate the condition

21
Q

one way analysis

A

in a one way design we only have one condition that is manipulated

22
Q

main effects

A

test the overall/average effect of a condition (F-test)

23
Q

contrasts

A

tests differences between group means (based on coding schemes and associated β coefficients)

24
Q

simple contrasts/effects

A

the effect of one level of a condition across levels of another condition
e.g. the difference in emmeans for treatment A between hospitals 1 and 2

25
Q

pairwise comparisons

A

compare all levels of a given predictor with each other
- e.g. compare every level of treatment with every level of hospital

this creates the statistical issue of multiple comparisons (it increases our chances of making a type 1 error)

26
Q

multiple tests and type 1 error equations

A

P(type 1 error) = alpha level
P(not making a type 1 error) = 1 - alpha level

where m = number of tests done:
P(not making a type 1 error in m tests) = (1 - alpha) to the power of m
P(at least one type 1 error in m tests) = 1 - (1 - alpha) to the power of m = the family-wise error rate
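These probabilities can be checked numerically (the alpha and m values below are arbitrary examples):

```python
def familywise_error_rate(alpha, m):
    # P(at least one type 1 error across m independent tests)
    return 1 - (1 - alpha) ** m

# with alpha = .05, ten tests already push the family-wise
# error rate above 40%
fwer = familywise_error_rate(0.05, 10)
```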

27
Q

corrections for multiple test errors

A

to fix the issue, we either have to make our alpha level more conservative or adjust our p-values

28
Q

Bonferroni correction

A

considered a conservative adjustment
- treats individual tests within a family as if they're independent
- equation: adjusted alpha = alpha/m, or equivalently adjusted p = p*m

29
Q

sidak correction

A

similar to Bonferroni but slightly less conservative
- equation: adjusted alpha = 1 - (1 - alpha) to the power of 1/m
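Both adjusted alpha levels in one sketch (example alpha and m values; the Šidák threshold comes out slightly less strict than Bonferroni's):

```python
def bonferroni_alpha(alpha, m):
    # split the alpha level evenly across the m tests
    return alpha / m

def sidak_alpha(alpha, m):
    # 1 - (1 - alpha)^(1/m)
    return 1 - (1 - alpha) ** (1 / m)
```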

30
Q

scheffe correction

A

makes broader adjustments
- calculates the p-value from the F-distribution
- makes the critical value of F larger for a fixed alpha level, dependent on the number of tests

31
Q

Tukey’s honest significant differences (HSD) correction

A

less conservative
- compares all pairwise group means
- each difference is divided by its standard error
- this produces a q statistic that is compared against a studentised range distribution

32
Q

effects interactions

A

categorical*categorical interactions with effects coding can also be interpreted as the difference in simple effects

33
Q

assumption violation: model misspecification

A

the model is not correctly specified
- detected by observing violations of linearity/normality
- solved by including the missing terms

34
Q

assumption violation: non-linear transformations

A

when data are skewed, we can transform/convert them to a different scale to make them more normal and interpretation easier

35
Q

assumption violation: generalised linear model

A

used when data are not normal or continuous and transforming them would create issues

36
Q

assumption violation: bootstrapping inference

A

can help make more reliable inferences even if assumptions are violated

37
Q

bootstrapping

A

= the process of resampling with replacement from the original data to generate multiple resamples of the same n as the original data

this means some resamples may contain the same participant's data multiple times

38
Q

bootstrap distribution

A
  • start with the original sample of size n
  • take k resamples of size n (with replacement) and calculate your statistic on each one
  • as k gets bigger, the distribution of the resampled statistics begins to approximate a sampling distribution
  • more resamples = a smoother, more normal-looking distribution
39
Q

bootstrap SE

A

SE = sd of bootstrap distribution
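The whole procedure in a minimal Python sketch (the sample data are made up; in practice you would bootstrap your model statistic):

```python
import random
import statistics

def bootstrap_se(data, k=2000, stat=statistics.mean, seed=1):
    """Resample with replacement k times, compute the statistic on
    each resample, and return the sd of the bootstrap distribution."""
    rng = random.Random(seed)
    boot_stats = [stat(rng.choices(data, k=len(data))) for _ in range(k)]
    return statistics.stdev(boot_stats)

# hypothetical sample: the bootstrap SE of the mean should sit near
# the classical SE = sd / sqrt(n)
sample = [2, 4, 4, 5, 7, 9, 10, 12, 13, 15]
se = bootstrap_se(sample)
```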