Stats Sem 2 Flashcards

Question 1

Q

Cross sectional studies: aka…

Answer

A

Cross sectional analysis
Transversal studies
Prevalence studies

Question 2

Q

Cross sectional studies

Answer

A

Observations
Descriptive
Collects data from a population at one specific time
Groups determined by existing differences
Can use to develop a hypothesis
- → need to use other research designs to test hypothesis

Question 3

Q

CSS Advantages

Answer

A

“snapshot” in time
can draw on inferences from existing relationships or differences
can use large numbers of subjects
relatively inexpensive
can generate
- odds ratio
- absolute risk
- relative risk
- prevalence
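
A minimal sketch of how the measures listed above could be read off a 2x2 table. The counts and the function name are invented for illustration; in a cross-sectional design the "risks" are strictly prevalences within each exposure group.

```python
def two_by_two_measures(a, b, c, d):
    """Summarise a 2x2 table (hypothetical helper, not from the course).

    a = exposed with outcome      b = exposed without outcome
    c = unexposed with outcome    d = unexposed without outcome
    """
    n = a + b + c + d
    risk_exposed = a / (a + b)        # absolute risk in the exposed group
    risk_unexposed = c / (c + d)      # absolute risk in the unexposed group
    return {
        "prevalence": (a + c) / n,    # proportion of everyone with the outcome
        "risk_exposed": risk_exposed,
        "risk_unexposed": risk_unexposed,
        "relative_risk": risk_exposed / risk_unexposed,
        "odds_ratio": (a * d) / (b * c),
    }


# Made-up counts: 30/100 exposed and 10/100 unexposed have the outcome.
measures = two_by_two_measures(30, 70, 10, 90)
for name, value in measures.items():
    print(name, round(value, 2))
```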

Question 4

Q

CSS Disadvantages

Answer

A

static results
does not randomly sample
cannot establish cause and effect

Question 5

Q

Pearson’s product moment correlation

Answer

A

Measures strength of linear relationship between 2 variables

→ r = 0 suggests no linear relationship (equivalent to slope B = 0 in simple linear regression)

R² = % variability explained by the model
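
As a rough illustration, the correlation coefficient (written r here) and its square can be computed by hand. The data and the function name are made up for this sketch.

```python
import math


def pearson_r(x, y):
    """Pearson's product moment correlation for two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares about the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Invented example data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
r_squared = r ** 2  # share of variability in y explained by a straight line in x
print(round(r, 3), round(r_squared, 3))
```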

Question 6

Q

Regression modelling

Answer

A

investigates whether an association exists between variables
measures strength and direction of an association
studies the form of relations
Regression = explained variation
Residual = unexplained variation

Question 7

Q

Regression - continuous outcome

Answer

A

Use linear or non-linear regression

Question 8

Q

Regression - categorical outcome

Answer

A

Use logistic regression

Question 9

Q

Linear regression considerations

Answer

A

outcome variable must be continuous
independent variables can be categorical or continuous

Question 10

Q

Null hypothesis of linear regression

Answer

A

B=0, No relationship

Question 11

Q

Assumptions of linear regression

Answer

A

relationship b/w DV and IVs is linear
observations are independent and randomly selected
homogeneity of variance
residuals are independent and normally distributed
effects are additive
absence of outliers and multicollinearity

Question 12

Q

Multicollinearity

Answer

A

IVs that are correlated with other IVs

→ regression models may not give valid estimates of individual predictors

Question 13

Q

Descriptives of normal outcome

Answer

A

skewness
kurtosis (sharpness of peak)
mean = median

→ check histogram, box-whisker and QQ plots for normality

Question 14

Q

Tests of normality

Answer

A

compare shape of sample to shape of normal curve
Kolmogorov-Smirnov used for large samples
Shapiro-Wilk used for small samples
p > 0.05 suggests normal distribution

Question 15

Q

Variance inflation factor (VIF)

Answer

A

Measure of how much variance of the estimated regression coefficient is “inflated” by existing IV correlation

VIF = 1 → no correlation among predictors

VIF > 4 → warrants further investigation

VIF > 10 → sign of serious multicollinearity
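
A small sketch of the usual formula, VIF = 1 / (1 - R²), where R² comes from regressing one predictor on all the other predictors. The function names and the wording of the flags are assumptions for illustration; the cut-offs are the ones on this card.

```python
def vif_from_r_squared(r_squared):
    """VIF for one predictor, given the R² from regressing it on the other IVs."""
    return 1.0 / (1.0 - r_squared)


def flag_vif(vif):
    """Apply the rule-of-thumb thresholds from the card (hypothetical helper)."""
    if vif > 10:
        return "serious multicollinearity"
    if vif > 4:
        return "investigate further"
    return "no concern flagged"


# R² = 0 means the predictor is uncorrelated with the others, so VIF = 1.
for r2 in (0.0, 0.8, 0.95):
    vif = vif_from_r_squared(r2)
    print(r2, round(vif, 2), flag_vif(vif))
```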

Question 16

Q

Homoscedasticity

Answer

A

= constant variance
plot of residuals scattered randomly around 0
statistical tests
- → p > 0.05 supports constant variance assumption

Question 17

Q

Data transformations

Question 18

Q

Plots to explore assumptions

Question 19

Q

Flow diagram of fitting a regression model

Question 20

Q

If data is not suitable for transformation or sample is small

Answer

A

spearman rank-correlation coefficients
quantile regression

Question 21

Q

Multiple regression model

Answer

A

Association of all IVs with DV

Question 22

Q

Cohort studies

Answer

A

population identified by a common link
researchers can follow the cohort across time to see what happens
- → natural history of a condition
cohort can be divided at onset to compare experiences, compare outcome of interest
- → considers causative/predictive factors
followed until event occurs, compare characteristics of those with event vs others
- → identify those most likely to develop outcome

Question 23

Q

Obtaining data on exposure

Answer

A

personal interviews
questionnaire
review of records
medical examination or special test
environmental survey

Question 24

Q

Exposure classification

Answer

A

Exposed or non-exposed
Degree of exposure

Question 25

Q

Comparison types

Answer

A

Internal comparison → one cohort sub-classified

External comparison → 2+ cohorts compared

Comparison with general population rates

Question 26

Q

Cohort study follow-up

Answer

A

mailed questionnaire
telephone calls
personal interviews
periodic medical examination
reviewing records
surveillance of death records

Question 27

Q

Loss to follow-up

Answer

A

death
change of address
migration
change of occupation

→ drawback of cohort studies

Question 28

Q

Data analysis of cohort studies

Answer

A

calculation of incidence rates
estimation of risk
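
A minimal sketch of the two calculations, assuming the common definitions: risk (cumulative incidence) = new cases / people at risk at the start, and incidence rate = new cases / person-time of follow-up. All numbers and names below are invented.

```python
def cumulative_incidence(new_cases, at_risk_at_start):
    """Risk: proportion of the initially at-risk cohort that develops the outcome."""
    return new_cases / at_risk_at_start


def incidence_rate(new_cases, person_years):
    """Rate: new cases per unit of person-time actually observed."""
    return new_cases / person_years


# Made-up cohort: 1000 people, 20 new cases, 4800 person-years of follow-up
risk = cumulative_incidence(20, 1000)
rate_per_1000_py = incidence_rate(20, 4800) * 1000

print(round(risk, 3), "risk over the follow-up period")
print(round(rate_per_1000_py, 2), "cases per 1000 person-years")
```

Person-time is smaller than 1000 people multiplied by the study length here because participants lost to follow-up stop contributing time.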

Question 29

Q

Cohort study strengths

Answer

A

incidence rate and risks
establish cause and effect
good when exposure is rare
minimises selection and information bias

Question 30

Q

Cohort study weaknesses

Answer

A

loss to follow-up
often requires a large sample
ineffective for rare diseases
long time to complete
expensive
ethical issues

Question 31

Q

Associations b/w categorical outcomes and categorical variables

Answer

A

Chi-Square test → if expected frequency ≥5 in each cell
Fisher’s exact test → if one or more expected frequency <5

→ Both test if there is a significant relationship
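
The expected-frequency rule above can be checked with a short sketch. Expected count for a cell = row total × column total / grand total; the helper names and tables are made up for illustration.

```python
def expected_counts(table):
    """Expected cell frequencies under independence for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def suggested_test(table):
    """Pick a test using the rule on this card (hypothetical helper)."""
    expected = expected_counts(table)
    if all(e >= 5 for row in expected for e in row):
        return "chi-square"
    return "fisher exact"


large_table = [[20, 30], [30, 20]]   # every expected count is 25
small_table = [[1, 9], [8, 2]]       # expected counts of 4.5 appear

print(suggested_test(large_table))
print(suggested_test(small_table))
```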

Question 32

Q

Associations b/w categorical outcomes and continuous variables

Answer

A

Two groups → T-test
>2 groups → ANOVA

Question 33

Q

Odds ratio

Answer

A

Measure of strength of an association

OR = 1 → exposure does not affect odds of outcome

OR > 1 → higher odds (positive relationship)

OR < 1 → lower odds (negative relationship)

eg. OR = 1.3 → 30% greater odds of outcome
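
The reading rule above can be sketched as a small helper. The function name and the output wording are assumptions for illustration only.

```python
def interpret_odds_ratio(or_value):
    """Return (direction, percent change in odds) for an odds ratio."""
    percent_change = (or_value - 1.0) * 100.0
    if or_value > 1:
        direction = "higher odds"
    elif or_value < 1:
        direction = "lower odds"
    else:
        direction = "no difference in odds"
    return direction, percent_change


for value in (1.3, 1.0, 0.75):
    direction, change = interpret_odds_ratio(value)
    print(value, direction, round(change, 1))
```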

Question 34

Q

Confidence interval of OR

Answer

A

If CI contains 1, the relationship is unlikely to be statistically significant
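
One common way to get an approximate 95% CI is the log (Woolf) method: ln(OR) ± 1.96 × sqrt(1/a + 1/b + 1/c + 1/d). The card does not say which method the course uses, so treat this as an assumed approach with invented counts.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with an approximate CI from a 2x2 table (log/Woolf method).

    a, b = exposed with / without outcome; c, d = unexposed with / without.
    """
    or_value = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_value) - z * se_log_or)
    upper = math.exp(math.log(or_value) + z * se_log_or)
    return or_value, lower, upper


def ci_contains_one(lower, upper):
    """True when the interval includes the null value OR = 1."""
    return lower <= 1.0 <= upper


# Larger made-up sample: interval excludes 1
or_big, lo_big, hi_big = odds_ratio_ci(30, 70, 10, 90)
# Small made-up sample: wide interval that includes 1
or_small, lo_small, hi_small = odds_ratio_ci(5, 5, 4, 6)

print(round(or_big, 2), ci_contains_one(lo_big, hi_big))
print(round(or_small, 2), ci_contains_one(lo_small, hi_small))
```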

Question 35

Q

Logistic regression

Answer

A

outcome variable = categorical
IVs = continuous and/or categorical
no distributional assumptions
predicts which possible events will happen given info in IVs

Question 36

Q

Logistic regression assumptions

Answer

A

enough responses in every category
linearity in the logit
absence of multicollinearity and outliers
independence of residuals

Question 37

Q

Logistic regression models

Answer

A

Binary logistic regression
- dichotomous outcome (eg. yes/no)
Multinomial logistic regression
- polychotomous outcome (eg. home/clinic/hospital)
- need to choose one category as a reference (eg. home vs clinic, home vs hospital)
- essentially building multiple binary LR models
Ordinal logistic regression
- ordered outcome (eg. low/mod/high)

Question 38

Q

Omnibus tests

Answer

A

Tests whether explained variance is significantly greater than unexplained variance

Question 39

Q

Goodness of fit

Answer

A

whether estimated logistic regression model fits sample data
Hosmer and Lemeshow test
p > 0.05 suggests a good fit

Question 40

Q

p values

Answer

A

Test of normality
- Kolmogorov-Smirnov
- Shapiro-Wilk
- p > 0.05 suggests normal distribution
Test of homoscedasticity
- p > 0.05 supports constant variance assumption
Goodness of fit
- Hosmer & Lemeshow
- p > 0.05 suggests a good fit

Question 41

Q

Predictive classification

Answer

A

percentage cases correct = sensitivity
percentage non-cases correct = specificity
overall percentage = overall correct classification
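
A minimal sketch of these three percentages from a classification table. The counts and the function name are invented for illustration.

```python
def classification_summary(tp, fn, tn, fp):
    """Sensitivity, specificity and overall accuracy from predicted vs observed counts.

    tp = cases predicted as cases          fn = cases predicted as non-cases
    tn = non-cases predicted as non-cases  fp = non-cases predicted as cases
    """
    return {
        "sensitivity": tp / (tp + fn),             # % of cases correct
        "specificity": tn / (tn + fp),             # % of non-cases correct
        "overall": (tp + tn) / (tp + fn + tn + fp),
    }


# Made-up table: 50 cases (45 correct), 100 non-cases (80 correct)
summary = classification_summary(tp=45, fn=5, tn=80, fp=20)
for name, value in summary.items():
    print(name, round(value * 100, 1))
```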

Question 42

Q

Predictive accuracy

Answer

A

ROC curve
area = % predictive accuracy
0.5 = no predictive power
1 = perfect predictive power
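
The area under the ROC curve can also be read as the probability that a randomly chosen case gets a higher predicted score than a randomly chosen non-case (ties count as half). The sketch below uses that pairwise definition with invented scores; the function name is hypothetical.

```python
def auc(case_scores, noncase_scores):
    """Area under the ROC curve via pairwise comparison of predicted scores."""
    wins = 0.0
    for case in case_scores:
        for noncase in noncase_scores:
            if case > noncase:
                wins += 1.0      # case correctly ranked above non-case
            elif case == noncase:
                wins += 0.5      # tie counts as half
    return wins / (len(case_scores) * len(noncase_scores))


perfect = auc([0.9, 0.8], [0.1, 0.2])    # every case outranks every non-case
no_power = auc([0.5, 0.5], [0.5, 0.5])   # scores carry no information
partial = auc([0.9, 0.4], [0.5, 0.1])    # 3 of 4 pairs ranked correctly

print(perfect, no_power, partial)
```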

Question 43

Q

Historical controls

Answer

A

Compare new treatment on new patients with records of previous results

Question 44

Q

Non-randomised concurrent control

Answer

A

two groups receive different treatment at roughly the same time
could end up not comparable
potential subconscious allocation bias
no control of potential confounding factors

Question 45

Q

Quasi-randomised

Answer

A

allocation not truly random
- eg. by day of enrolment
subject to confounding variables
unexpected factors might affect statistical tests

Question 46

Q

Purpose of random allocation

Answer

A

gold standard evidence
eliminates bias in treatment assignment
equal distribution of covariates
facilitates blinding
allows researchers to make causal inferences

Question 47

Q

Disadvantages of random allocation

Answer

A

expensive
time-consuming
ethical considerations
Hawthorne effect

Question 48

Q

Types of RCTs

Answer

A

parallel
crossover
factorial
- assigns 2 active interventions and controls to 4 groups
- eg. drug A&B, drug A + placebo, etc

Question 49

Q

Sources of bias in RCTs

Answer

A

inadequate generation of randomised sequence
inadequate concealment of allocation
inadequate blinding
excluding participants or significant attrition
analysing participants in the wrong group
selective reporting of outcome

Question 50

Q

RCT ethical considerations

Answer

A

social and clinical value
scientific validity
fair subject selection
favourable risk-benefit ratio
independent review
informed consent
respect for potential and enrolled subjects

Question 51

Q

Intention to treat

Answer

A

compares original treatment groups irrespective of whether patients adhered to the treatment
external validity → evaluates effectiveness in routine practice

Question 52

Q

Per protocol

Answer

A

only includes patients who complete treatment
compromises internal validity

Question 53

Q

Repeated measure data analysis

Answer

A

RMANOVA
Longitudinal data analysis

Question 54

Q

RMANOVA

Answer

A

Repeated Measures ANOVA
complete case analysis
assumes everyone is measured at the same time and equally spaced intervals
restrictive assumptions about correlation structure
only provides p values, not parameter estimates
cannot handle time-dependent covariates

Question 55

Q

Longitudinal data analysis

Answer

A

assesses changes in a response variable over time
measures temporal patterns of response to treatment
identifies factors that influence changes
includes time-varying predictors in the model
investigates causality
better handling of missing data
mixed effect model
marginal model

Question 56

Q

Mixed effect models

Answer

A

subject specific
compares individual changes over time
studies natural history

Question 57

Q

Marginal models

Answer

A

population average
compares populations over time
evaluates interventions or informs public policy

Question 58

Q

Generalised Estimating Equations (GEEs)

Answer

A

extension of linear regression
models clustered or correlated data
offers robust estimates of standard errors (SEs)
- allows for clustering of observations
estimates regression coefficients and their SEs
can deal with normal and non-normal data
aims to investigate differences in population averaged responses
need to specify correlation structure
- takes into account within-subject correlation of observations

Question 59

Q

GEE correlation structures

Answer

A

exchangeable
autoregressive
unstructured
independent

Question 60

Q

Assumptions of GEEs

Answer

A

assumptions of multiple linear regression
responses are from a known family of distributions (eg. normal, exponential) with a specified mean and variance, where variance is a function of the mean
the mean is a linear function of the predictors
correlation structure is specified
any missing data is at random

Question 61

Q

Measuring variation in GEEs

Answer

A

one-way ANOVA measures between group variation and within group variation
(explained variance and unexplained variance)

Question 62

Q

Unexplained variation in longitudinal data analysis

Answer

A

unexplained variation is divided into components, making the ultimate error variance smaller
reduced unexplained variability, resulting in better estimates of the effects

Question 63

Q

Mixed effects modelling

Answer

A

Similar principles to GEE
