topics Flashcards

1
Q

distribution for CI

A

t-distribution
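
a minimal R sketch of a t-based CI (example data made up):

  x <- rnorm(30, mean = 5, sd = 2)   # sample
  t.test(x)$conf.int                 # 95% CI from t.test
  # by hand: mean +/- t-quantile * standard error
  mean(x) + qt(c(0.025, 0.975), df = length(x) - 1) * sd(x) / sqrt(length(x))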

2
Q

distribution for minimal sample size

A

z-distribution

3
Q

2E and E

A
  • 2E = full range of CI
  • E = half of CI range = margin of error
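
a minimal R sketch of the minimal-sample-size computation behind these two cards (sigma and E are made-up values):

  sigma <- 2; E <- 0.5        # known sd and desired margin of error
  z <- qnorm(0.975)           # z-quantile for a 95% CI
  ceiling((z * sigma / E)^2)  # smallest n with half-width at most E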
4
Q

power

A
  • probability of correct decision
  • higher sample sizes yield higher power
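
an illustration with the built-in power.t.test():

  # same effect and sd, larger n => higher power
  power.t.test(n = 20, delta = 0.5, sd = 1, type = "one.sample")$power
  power.t.test(n = 80, delta = 0.5, sd = 1, type = "one.sample")$power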
5
Q

influence of sample size

A

the same deviation from H0 with more data yields a lower p-value

6
Q

bootstrap CI

A
  • resample from the original dataset
  • increasing the number of bootstrap samples B reduces the simulation variation
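
a minimal R sketch of a basic bootstrap CI for the mean (made-up data):

  x <- rnorm(50, mean = 5)                         # original dataset
  Tstar <- replicate(1000, mean(sample(x, replace = TRUE)))
  2 * mean(x) - quantile(Tstar, c(0.975, 0.025))   # basic bootstrap 95% CI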
7
Q

bootstrap test

A
  • sample from H0 distribution
  • compare t-value of original data to surrogate T* values
  • p-value is determined by proportion of T*-values exceeding the t-value of the data
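
a minimal R sketch, assuming H0 specifies an Exp(1) population and T = max:

  x <- rexp(40)                                   # data
  t_obs <- max(x)                                 # t-value of the data
  Tstar <- replicate(1000, max(rexp(length(x))))  # surrogate T* under H0
  pl <- mean(Tstar < t_obs); pr <- mean(Tstar > t_obs)
  2 * min(pl, pr)                                 # two-sided p-value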
8
Q

sign test: test statistic

A
  • number of observations above m0 (observations equal to m0 are dropped)
  • a binomial test with p = 0.5 is done on this count
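
a minimal R sketch (data and m0 are made up):

  x <- c(2.1, 3.4, 1.8, 4.2, 2.9, 3.7, 1.5, 4.8)
  m0 <- 2.5
  binom.test(sum(x > m0), sum(x != m0), p = 0.5)  # sign test via binomial test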
9
Q

wilcoxon signed rank test

A
  • requires a symmetric population
  • one sample or (differences between) matched pairs (wilcox.test() with 1 argument)
  • loses a lot of information but is very robust
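
in R (made-up data):

  x <- rnorm(20, mean = 0.3)
  wilcox.test(x)            # H0: population symmetric around 0
  wilcox.test(x, mu = 0.5)  # H0: symmetric around m0 = 0.5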
10
Q

paired sample permutation test:

  1. what is permuted
  2. what is the logic behind this
A
  • permute the original (x,y) labels within pairs
  • under H0 of no difference between the distributions of X and Y within pairs, permuting the labels should not change the distribution of T
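
a minimal R sketch: swap x and y within randomly chosen pairs (made-up data):

  x <- rnorm(20); y <- x + rnorm(20, sd = 0.5)  # paired data
  t_obs <- mean(x - y)
  Tstar <- replicate(1000, {
    swap <- sample(c(TRUE, FALSE), length(x), replace = TRUE)
    mean(ifelse(swap, y - x, x - y))            # labels permuted within pairs
  })
  pl <- mean(Tstar < t_obs); pr <- mean(Tstar > t_obs)
  2 * min(pl, pr)                               # two-sided p-value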
11
Q

how to test dependence in two paired samples

A
  • pearson’s correlation test
  • spearman’s rank correlation test
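
in R (made-up data):

  x <- rnorm(30); y <- x + rnorm(30)
  cor.test(x, y)                       # pearson (default)
  cor.test(x, y, method = "spearman")  # rank-based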
12
Q

two paired samples tests

A
  1. sign test
  2. wilcoxon signed rank test
    - uses wilcox.test() with 1 argument
  3. permutation test
  4. t.test(x,y,paired=TRUE)
13
Q

two independent samples tests

A
  1. mann-whitney test
  2. kolmogorov-smirnov test
  3. t.test(x, y)
14
Q

mann whitney test

A
  • based on ranks
  • uses wilcox.test() with 2 arguments
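
in R (made-up data):

  x <- rnorm(15); y <- rnorm(15, mean = 0.5)
  wilcox.test(x, y)  # two arguments => mann-whitney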
15
Q

kolmogorov-smirnov test

A
  • tests if two distributions are the same
  • based on differences between the empirical distributions
  • T = maximum vertical distance between the summed (cumulative) histograms, i.e. the empirical CDFs
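
in R (made-up data):

  x <- rnorm(50); y <- rexp(50)
  ks.test(x, y)  # T = max vertical distance between the two empirical CDFs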
16
Q

one-way ANOVA

A
  • N·I experimental units (I groups of N units)
  • with I = 2 it reduces to the two-sample t-test
  • always right sided
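
a minimal R sketch with I = 3 made-up groups:

  y <- c(rnorm(10, 0), rnorm(10, 1), rnorm(10, 0.5))
  g <- factor(rep(1:3, each = 10))
  anova(lm(y ~ g))  # right-sided F test; with I = 2 this matches the t-test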
17
Q

SSa and RSS

A
  • SSa: variation explained by the factor
  • RSS: residual variation not explained by the factor in the model
18
Q

kruskal wallis test

A
  • nonparametric anova
  • based on ranks
  • distribution of W under H0 is approximately chi-squared(I-1)
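
in R (made-up data):

  y <- c(rnorm(10, 0), rnorm(10, 1), rnorm(10, 2))
  g <- factor(rep(1:3, each = 10))
  kruskal.test(y, g)  # W is compared to chi-squared(I - 1)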
19
Q

independent samples permutation test

  1. what is permuted
  2. what is the logic behind this
A
  • one-way ANOVA:
    1. group labels are permuted
    2. permutation of groups should not affect group means if there is no effect
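
a minimal R sketch (two made-up groups, difference in means as test statistic):

  y <- c(rnorm(12, 0), rnorm(12, 0.8)); g <- rep(1:2, each = 12)
  t_obs <- mean(y[g == 1]) - mean(y[g == 2])
  Tstar <- replicate(1000, {
    gs <- sample(g)                      # permuted group labels
    mean(y[gs == 1]) - mean(y[gs == 2])
  })
  pl <- mean(Tstar < t_obs); pr <- mean(Tstar > t_obs)
  2 * min(pl, pr)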
20
Q

two way ANOVA

A
  • N·I·J experimental units
  • main and interaction effects are tested
  • I + J + 1 linear restrictions: treatment and sum parametrizations
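
a minimal R sketch (made-up balanced design):

  d <- expand.grid(f1 = factor(1:2), f2 = factor(1:3), rep = 1:5)
  d$y <- rnorm(nrow(d))
  anova(lm(y ~ f1 * f2, data = d))  # tests f1, f2, and the interaction f1:f2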
21
Q

F statistic

A
  • always right sided
  • explained variance/unexplained variance
22
Q

interaction plot

A

interaction shows up as nonparallel curves
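
in R (made-up data):

  d <- expand.grid(f1 = factor(1:2), f2 = factor(1:3), rep = 1:5)
  d$y <- rnorm(nrow(d))
  with(d, interaction.plot(f1, f2, y))  # nonparallel curves suggest interaction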

23
Q

testing interaction

A
  1. model that includes interaction –> only significance of interaction effect is relevant
  2. model without interaction –> additive model. check for presence of main effect
24
Q

block designs in 2way anova

A
  1. randomized block design
    –> block = variable not of interest
    –> don’t look at the significance of the block variable in the output
  2. repeated measures
    –> block = ID
    –> exchangeable case: errors within a single unit are exchangeable, meaning that ordering is irrelevant
    –> lack of exchangeability makes the block design invalid
  3. friedman test
    –> nonparametric for 2 designs above
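
a minimal R sketch of the friedman test above (made-up complete block data):

  d <- data.frame(y = rnorm(24),
                  treatment = factor(rep(1:3, times = 8)),
                  block = factor(rep(1:8, each = 3)))
  friedman.test(y ~ treatment | block, data = d)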
25
Q

block designs for random effects

A
  1. crossover design
    –> 2 outcomes per experimental unit (paired samples)
    –> apply treatment in opposite orders between conditions
    –> treatment, learning, and sequence effects
  2. split plot design
    –> 2 treatment factors (independent samples)
    –> subplot and whole plot
  • to get p-values, anova(reduced model, full model)
  • (1|f) for random effect block
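
a minimal sketch of the anova(reduced, full) recipe, assuming the lme4 package and a hypothetical data frame d with response y, treatment factor f, and unit id:

  library(lme4)
  full    <- lmer(y ~ f + (1 | id), data = d, REML = FALSE)
  reduced <- lmer(y ~ 1 + (1 | id), data = d, REML = FALSE)
  anova(reduced, full)  # p-value for the fixed effect f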
26
Q

unbalanced design

A
  • order of variables in the model matters
  • variable of interest goes last
  • otherwise, p-values are unreliable
27
Q

difference RBD and split-plot design

A

RBD:
- 1 level of blocks
- fixed effects

SPD:
- 2 levels of (randomized) blocks (whole and subplots)
- mixed effects

28
Q

fixed and mixed designs

A

fixed
1. one way ANOVA
2. two way ANOVA
3. randomized block design
4. repeated measures block design

mixed
1. crossover design (paired)
2. split-plot design (independent)

29
Q

contingency tables

A
  • count of units in cross categories
  • test statistic: difference between expected and observed counts
  • always right sided (1 - pchisq())
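
in R (made-up 2x2 counts):

  tab <- matrix(c(20, 15, 10, 25), nrow = 2)
  out <- chisq.test(tab)
  out$expected                                   # expected counts under H0
  1 - pchisq(out$statistic, df = out$parameter)  # right-sided p-value by hand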
30
Q

fisher’s exact test

A
  • for 2x2 tables
  • odds ratio is used
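
in R (made-up counts):

  tab <- matrix(c(8, 2, 1, 5), nrow = 2)
  fisher.test(tab)  # reports the odds ratio and an exact p-value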
31
Q

simple linear regression

A

comparable to pearson’s correlation test
- will give exactly the same t-score and p-value
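
an illustration in R (made-up data):

  x <- rnorm(30); y <- 0.5 * x + rnorm(30)
  summary(lm(y ~ x))$coefficients  # slope row: same t and p as below
  cor.test(x, y)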

32
Q

multiple linear regression

A
  • multiple explanatory variables
  • to find the best parameters, we minimize the sum of squared differences (SSE)
33
Q

global model fit

A
  • sigma hat squared: residual standard error
  • R^2: proportion of explained variance compared to base model Y = B0 + e
  • F-statistic and overall p-value
  • all of these are found at the bottom of the output
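
where to find them in R, using the built-in cars data:

  s <- summary(lm(dist ~ speed, data = cars))
  s$sigma       # residual standard error (sigma hat)
  s$r.squared   # R^2
  s$fstatistic  # overall F statistic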
34
Q

coefficients in multiple linear regression

A
  • not all variables have explanatory power
  • we need to find the relevant ones by testing for individual coefficients
  • these are found in the individual rows of the output
35
Q

step up and step down method

A

step down: remove the variable with the highest nonsignificant p-value

step up: add the significant variable that yields the maximum increase in R^2
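
a minimal R sketch of one step-down iteration, using the built-in mtcars data:

  fit <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
  summary(fit)$coefficients          # inspect the p-values
  fit2 <- update(fit, . ~ . - disp)  # e.g. drop disp if it has the highest
                                     # nonsignificant p-value, then repeat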

36
Q

preferred linear model has

A
  1. fewest variables
  2. highest R^2 (or only slight decrease)
  3. interpretability
37
Q

confidence interval

A

for the population mean of Ynew

38
Q

prediction interval

A
  • for an individual observation of Ynew
  • wider than the CI because the individual error term is also taken into account
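
both intervals in R, using the built-in cars data:

  fit <- lm(dist ~ speed, data = cars)
  new <- data.frame(speed = 21)
  predict(fit, new, interval = "confidence")  # CI for the mean of Ynew
  predict(fit, new, interval = "prediction")  # wider PI for an individual Ynew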
39
Q

model assumption linear regression

A
  1. linearity of the relationship
  2. normality of the errors
40
Q

outliers

A

extremely low or high observation on the response variable

41
Q

leverage point

A

extremely low or high observation on the explanatory variable

42
Q

effect of leverage point

A
  • can be studied by testing model fit with and without the leverage point
  • if parameters change drastically by deleting this point, it’s called an influence point
  • cook’s distance quantifies the influence of an observation on the predictions; values > 1 flag influence points
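
in R, using the built-in cars data:

  fit <- lm(dist ~ speed, data = cars)
  cooks.distance(fit)             # influence of each observation
  which(cooks.distance(fit) > 1)  # candidate influence points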
43
Q

mean shift outlier model

A
  • dummy vector with all 0s but 1 at outlier index
  • include as variable in the model
  • if variable is significant, the outlier is significant
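
a minimal R sketch, using the built-in cars data and the largest residual as the suspected outlier:

  fit0 <- lm(dist ~ speed, data = cars)
  i <- which.max(abs(residuals(fit0)))        # index of the suspected outlier
  u <- rep(0, nrow(cars)); u[i] <- 1          # dummy: all 0s, 1 at the outlier
  summary(lm(dist ~ speed + u, data = cars))  # check the significance of u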
44
Q

collinearity

A
  • linear relations between explanatory variables, meaning they explain the same variation
  • straight line in scatterplot
  • reflected in large variances and large CIs –> unreliable estimates
45
Q

how to investigate collinearity

A
  1. pairwise linear correlations
  2. VIF (variance inflation factor); VIF > 5 is a concern
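
in R, assuming the car package for vif():

  library(car)
  fit <- lm(mpg ~ disp + hp + wt, data = mtcars)
  vif(fit)  # values > 5 are cause for concern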
46
Q

ANCOVA

A
  • extends ANOVA by including one or more variables that are expected to influence the dependent variable, but are not of primary interest
  • adjusts the DV for the covariates by holding them constant
  • variable not of interest is continuous (unlike RBD)
  • the only relevant p-value is for the variable of interest
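
a minimal R sketch (made-up data; the covariate x goes before the factor of interest f):

  d <- data.frame(y = rnorm(30), f = factor(rep(1:3, 10)), x = rnorm(30))
  anova(lm(y ~ x + f, data = d))  # only the p-value for f is relevant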
47
Q

summary() parameter estimates

A

gives coefficient estimates as difference between ai and a1

48
Q

anova()

A

gives the p-values and F-statistics for the factors in the model

49
Q

interaction between relevant factor and irrelevant variable (ANCOVA)

A
  • H0: B1 = … = BI (equal slopes in all groups)
  • parallel lines = no interaction
  • modeled with B_i instead of gamma
  • look at the interaction p-value in the output; the other values should be calculated separately
50
Q

order of factors

A
  1. does not matter in balanced ANOVA
  2. matters in unbalanced ANOVA
  3. matters in ANCOVA (always)
  4. matters in logistic regression (always)
51
Q

family wise error rate

A
  • probability of making a Type I error (false positive) when multiple comparisons are being tested
  • to provide FWER < 0.05, we use the bonferroni correction (alpha_ind = 0.05/m)
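
in R (made-up p-values):

  p <- c(0.012, 0.030, 0.200, 0.001)
  p.adjust(p, method = "bonferroni")  # equivalent to testing at 0.05 / m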
52
Q

multiple testing arises when

A
  1. there are many parameters of interest
  2. investigating all pairwise differences within a set of effects in ANOVA
53
Q

simultaneous testing

A
  • usually everything is compared to B1 or a1. this is not simultaneous testing
  • tukey etc. show adjusted p-values for simultaneous testing of all Bs
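
tukey’s simultaneous comparisons in R (made-up data):

  y <- c(rnorm(10, 0), rnorm(10, 1), rnorm(10, 2))
  g <- factor(rep(1:3, each = 10))
  TukeyHSD(aov(y ~ g))  # adjusted p-values for all pairwise differences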
54
Q

logistic regression

A
  • binary outcome
  • linear model for the log odds
  • probability of success
55
Q

log odds

A
  • log odds = log(p(success)/p(failure)) = linear model
  • odds = e^model
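
in R, using the built-in mtcars data (am is a 0/1 outcome):

  fit <- glm(am ~ hp + wt, data = mtcars, family = binomial)
  coef(fit)       # linear model for the log odds
  exp(coef(fit))  # multiplicative effects on the odds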
56
Q

a change delta in the linear predictor

A

multiplies the odds by e^delta

57
Q

linear predictor

A

the additive model (the linear combination of coefficients) on the log-odds scale

58
Q

odds

A

e^(linear predictor)

59
Q

p(y=1)

A

1/(1 + e^-(linear predictor))

60
Q

poisson regression: lambda

A
  • if Y ~ poisson(lambda), then E(Y) = var(Y) = lambda
  • the larger the lambda parameter, the larger the values of Y on average, and the larger the spread in the values of Y
  • for very large values of lambda, the poisson distribution is approximately normal
61
Q

lambda is modelled as

A
  • log(lambda) = model
  • lambda = e^model
  • QQplot is not useful here
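
a minimal R sketch (made-up data):

  y <- rpois(50, lambda = 3); x <- rnorm(50)
  fit <- glm(y ~ x, family = poisson)
  exp(coef(fit))  # lambda = e^model, so coefficients act multiplicatively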
62
Q

survival analysis

A
  • analysis of lifetimes
  • survival function: probability of surviving beyond time t
63
Q

hazard function

A
  • rate of dying within a short interval
  • how likely the event is to happen at a particular moment in time
64
Q

censoring

A
  • incomplete observation of a survival time
  • event indicator di = (Ti <= Ci); di = 0 means censored, i.e. the event has not happened yet
65
Q

Kaplan-Meier estimator of the survival function

A
  • only categorical IVs
  • survival probabilities for specific times
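
in R, assuming the survival package and its built-in aml data:

  library(survival)
  fit <- survfit(Surv(time, status) ~ x, data = aml)  # x is a group factor
  summary(fit)  # survival probabilities at the observed event times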
66
Q

Nelson Aalen estimator of the cumulative hazard function

A
  • step function that increases only at the times where events occur
67
Q

log rank test

A
  • tests whether 2+ survival curves are identical
  • can only deal with grouped data
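
in R (survival package, aml data again):

  library(survival)
  survdiff(Surv(time, status) ~ x, data = aml)  # log rank test between groups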
68
Q

proportional hazards model

A
  • unlike the KM model, it can take many kinds of predictors
  • main feature: coefficients can be estimated by maximizing the partial likelihood
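
in R, assuming the survival package and its built-in lung data:

  library(survival)
  fit <- coxph(Surv(time, status) ~ age + sex, data = lung)
  summary(fit)  # coefficients estimated by maximizing the partial likelihood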
69
Q

treatment parametrization

A
  • 1 group is a reference group
  • the ai are expressed as the difference ai - a1
  • can be set with ‘contrasts’ command
70
Q

sum parametrization

A
  • ai are expressed as deviations from the mean
  • combined ai average is 0
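
both parametrizations in R (made-up data):

  g <- factor(rep(1:3, each = 4)); y <- rnorm(12)
  coef(lm(y ~ g))                                   # treatment: ai - a1
  coef(lm(y ~ g, contrasts = list(g = contr.sum)))  # sum: deviations from the mean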