Stats Sem 2 Flashcards

1
Q

Cross sectional studies: aka…

A
  • Cross sectional analysis
  • Transversal studies
  • Prevalence studies
2
Q

Cross sectional studies

A
  • Observations
  • Descriptive
  • Collects data from a population at one specific time
  • Groups determined by existing differences
  • Can use to develop a hypothesis
    • → need to use other research designs to test hypothesis
3
Q

CSS Advantages

A
  • “snapshot” in time
  • can draw inferences from existing relationships or differences
  • can use large numbers of subjects
  • relatively inexpensive
  • can generate
    • odds ratio
    • absolute risk
    • relative risk
    • prevalence
4
Q

CSS Disadvantages

A
  • static results
  • does not randomly sample
  • cannot establish cause and effect
5
Q

Pearson’s product moment correlation

A

Measures strength of linear relationship between 2 variables

→ r = 0 (equivalently, slope B = 0) suggests no linear relationship

R² = % of variability explained by the model
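A minimal sketch of how r and R² could be computed by hand. The function name and the data are hypothetical, chosen only for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation for two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares about the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data (not from the course material)
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]

r = pearson_r(hours, score)
r_squared = r ** 2  # share of variability explained in a simple linear model
```

With this made-up data r is about 0.77, so R² is 0.6: roughly 60% of the variability in score is explained by hours.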

6
Q

Regression modelling

A
  • investigates whether an association exists between variables
  • measures strength and direction of an association
  • studies the form of relations
  • Regression = explained variation
  • Residual = unexplained variation
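The split into explained and unexplained variation can be sketched for a one-predictor least-squares line. This is an illustrative sketch only; the helper names and the data are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares intercept and slope for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    return intercept, slope

def variation_split(x, y):
    """Return (regression SS, residual SS) for the fitted line."""
    intercept, slope = fit_line(x, y)
    mean_y = sum(y) / len(y)
    fitted = [intercept + slope * a for a in x]
    ss_regression = sum((f - mean_y) ** 2 for f in fitted)       # explained
    ss_residual = sum((b - f) ** 2 for b, f in zip(y, fitted))   # unexplained
    return ss_regression, ss_residual

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
ss_regression, ss_residual = variation_split(x, y)
r_squared = ss_regression / (ss_regression + ss_residual)
```

Here the regression sum of squares is 3.6 and the residual sum of squares is 2.4, giving R² = 0.6.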
7
Q

Regression - continuous outcome

A

Use linear or non-linear regression

8
Q

Regression - categorical outcome

A

Use logistic regression

9
Q

Linear regression considerations

A
  • outcome variable must be continuous
  • independent variables can be categorical or continuous
10
Q

Null hypothesis of linear regression

A

B=0, No relationship

11
Q

Assumptions of linear regression

A
  • relationship b/w DV and IVs is linear
  • observations are independent and randomly selected
  • homogeneity of variance
  • residuals are independent and normally distributed
  • effects are additive
  • absence of outliers and multicollinearity
12
Q

Multicollinearity

A

IVs that are correlated with other IVs

→ regression models may not give valid estimates of individual predictors

13
Q

Descriptives of normal outcome

A
  • skewness
  • kurtosis (sharpness of peak)
  • mean = median

→ check histogram, box-whisker and QQ plots for normality
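A rough sketch of these descriptives using moment-based formulas. The function name and data are hypothetical, and it assumes the "excess" kurtosis convention (normal distribution = 0); some software reports kurtosis with normal = 3 instead.

```python
import statistics

def shape_descriptives(values):
    """Mean, median, moment-based skewness and excess kurtosis."""
    n = len(values)
    mean = statistics.mean(values)
    # Central moments (population form, dividing by n)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "mean": mean,
        "median": statistics.median(values),
        "skewness": m3 / m2 ** 1.5,            # 0 for a symmetric sample
        "excess_kurtosis": m4 / m2 ** 2 - 3,   # 0 for a normal distribution
    }

# Hypothetical symmetric sample
summary = shape_descriptives([1, 2, 3, 4, 5])
```

For this symmetric sample the mean equals the median and the skewness is 0.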

14
Q

Tests of normality

A
  • compare shape of sample to shape of normal curve
  • Kolmogorov-Smirnov used for large samples
  • Shapiro-Wilk used for small samples
  • p > 0.05 suggests normal distribution
15
Q

Variance inflation factor (VIF)

A

Measure of how much variance of the estimated regression coefficient is “inflated” by existing IV correlation

VIF = 1 → no correlation among predictors

VIF > 4 → warrants further investigation

VIF > 10 → sign of serious multicollinearity
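A small sketch of the calculation, assuming the usual definition VIF = 1 / (1 − R²), where R² comes from regressing one IV on the other IVs. With only two IVs that R² is simply their squared correlation, which is the case shown here. Names and data are hypothetical.

```python
def squared_correlation(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF for either IV when the model has exactly two IVs."""
    return 1 / (1 - squared_correlation(x1, x2))

# Hypothetical predictors
x1 = [1, 2, 3, 4, 5]
x2 = [2, 4, 5, 4, 5]
vif = vif_two_predictors(x1, x2)
```

These two made-up predictors share R² = 0.6, so VIF = 2.5, which is below both thresholds above.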

16
Q

Homoscedasticity

A
  • = constant variance
  • plot of residuals scattered randomly around 0
  • statistical tests
    • → p > 0.05 supports constant variance assumption
17
Q

Data transformations

A
18
Q

Plots to explore assumptions

A
19
Q

Flow diagram of fitting a regression model

A
20
Q

If data is not suitable for transformation or sample is small

A
  • spearman rank-correlation coefficients
  • quantile regression
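A sketch of Spearman's rank correlation, computed here as the Pearson correlation of the ranks, with tied values given their average rank. Function names and data are hypothetical.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Pearson correlation applied to the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sxx = sum((a - mean_x) ** 2 for a in rx)
    syy = sum((b - mean_y) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: monotonic but clearly non-linear
dose = [10, 20, 30, 40, 50]
response = [1, 8, 27, 64, 125]
rho = spearman_rho(dose, response)
```

Because only the ordering matters, rho is 1 for this curved but strictly increasing relationship.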
21
Q

Multiple regression model

A

Association of all IVs with DV

22
Q

Cohort studies

A
  • population identified by a common link
  • researchers can follow the cohort across time to see what happens
    • → natural history of a condition
  • cohort can be divided at onset to compare experiences, compare outcome of interest
    • → considers causative/predictive factors
  • followed until event occurs, compare characteristics of those with event vs others
    • → identify those most likely to develop outcome
23
Q

Obtaining data on exposure

A
  • personal interviews
  • questionnaire
  • review of records
  • medical examination or special test
  • environmental survey
24
Q

Exposure classification

A
  • Exposed or non-exposed
  • Degree of exposure
25
Q

Comparison types

A

Internal comparison → one cohort sub-classified

External comparison → 2+ cohorts compared

Comparison with general population rates

26
Q

Cohort study follow-up

A
  • mailed questionnaire
  • telephone calls
  • personal interviews
  • periodic medical examination
  • reviewing records
  • surveillance of death records
27
Q

Loss to follow-up

A
  • death
  • change of address
  • migration
  • change of occupation

→ drawback of cohort studies

28
Q

Data analysis of cohort studies

A
  • calculation of incidence rates
  • estimation of risk
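A sketch of the basic cohort calculations, assuming the standard definitions: risk (cumulative incidence) = events / people at risk, incidence rate = events / person-time, and relative risk = risk in exposed / risk in unexposed. All counts are hypothetical.

```python
def risk(events, people_at_risk):
    """Cumulative incidence over the follow-up period."""
    return events / people_at_risk

def incidence_rate(events, person_years):
    """Events per person-year of follow-up."""
    return events / person_years

def relative_risk(events_exp, n_exp, events_unexp, n_unexp):
    """Risk in the exposed group divided by risk in the unexposed group."""
    return risk(events_exp, n_exp) / risk(events_unexp, n_unexp)

# Hypothetical cohort: 100 exposed and 100 unexposed people
risk_exposed = risk(30, 100)
risk_unexposed = risk(10, 100)
rr = relative_risk(30, 100, 10, 100)
risk_difference = risk_exposed - risk_unexposed
rate_exposed = incidence_rate(30, 500)  # assumes 500 person-years observed
```

In this made-up cohort the exposed group has about three times the risk of the unexposed group.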
29
Q

Cohort study strengths

A
  • incidence rate and risks
  • establish cause and effect
  • good when exposure is rare
  • minimises selection and information bias
30
Q

Cohort study weaknesses

A
  • loss to follow-up
  • often requires a large sample
  • ineffective for rare diseases
  • long time to complete
  • expensive
  • ethical issues
31
Q

Associations b/w categorical outcomes and categorical variables

A
  • Chi-Square test → if expected frequency ≥5 in each cell
  • Fisher’s exact test → if one or more expected frequency <5

→ Both test if there is a significant relationship
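A sketch of the expected-frequency check behind this choice, assuming the usual formula expected = row total × column total / grand total. The function names and tables are hypothetical.

```python
def expected_counts(table):
    """Expected cell counts under independence for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_counts(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))

def suggested_test(table):
    """Apply the rule of thumb: every expected count must be at least 5."""
    smallest = min(min(row) for row in expected_counts(table))
    return "chi-square" if smallest >= 5 else "fisher exact"

# Hypothetical 2 x 2 tables (rows = groups, columns = outcome yes/no)
large_table = [[10, 20], [30, 40]]
small_table = [[1, 4], [6, 3]]

choice_large = suggested_test(large_table)
choice_small = suggested_test(small_table)
statistic = chi_square_statistic(large_table)
```

The smaller table has expected counts of 2.5 in its first row, so the rule points to Fisher's exact test.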

32
Q

Associations b/w categorical outcomes and continuous variables

A
  • Two groups → T-test
  • >2 groups → ANOVA
33
Q

Odds ratio

A

Measure of strength of an association

OR = 1 → exposure does not affect odds of outcome

OR > 1 → higher odds (positive relationship)

OR < 1 → lower odds (negative relationship)

eg. OR = 1.3 → 30% greater odds of outcome
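A sketch of the calculation from a 2 × 2 table, assuming the standard cross-product form OR = (a × d) / (b × c). The counts are hypothetical and were picked so that the result matches the 1.3 example.

```python
def odds_ratio(a, b, c, d):
    """
    a = exposed with outcome,    b = exposed without outcome
    c = unexposed with outcome,  d = unexposed without outcome
    """
    odds_exposed = a / b
    odds_unexposed = c / d
    return odds_exposed / odds_unexposed

# Hypothetical counts
or_value = odds_ratio(a=26, b=74, c=20, d=74)
percent_change_in_odds = (or_value - 1) * 100
```

The odds are 26/74 in the exposed group and 20/74 in the unexposed group, so OR = 1.3, i.e. 30% greater odds of the outcome.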

34
Q

Confidence interval of OR

A

If the CI contains 1, the relationship is unlikely to be statistically significant
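A sketch of one common way to get an approximate 95% CI for an OR: the log (Woolf) method, where SE of ln(OR) = sqrt(1/a + 1/b + 1/c + 1/d). The card does not say which method the course uses, so this choice is an assumption, and the counts are hypothetical.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with an approximate CI via the log (Woolf) method."""
    or_value = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_value) - z * se_log_or)
    upper = math.exp(math.log(or_value) + z * se_log_or)
    return or_value, lower, upper

# Hypothetical 2 x 2 counts: a, b = exposed with/without outcome,
# c, d = unexposed with/without outcome
or_value, lower, upper = odds_ratio_ci(26, 74, 20, 74)
ci_contains_one = lower <= 1 <= upper
```

For these counts OR = 1.3 with an interval of roughly 0.67 to 2.53. The interval contains 1, so these data alone do not support a relationship.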

35
Q

Logistic regression

A
  • outcome variable = categorical
  • IVs = continuous and/or categorical
  • no distributional assumptions
  • predicts which possible events will happen given info in IVs
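A sketch of how a fitted binary logistic model turns IV values into a predicted probability, assuming one IV and the standard logistic form p = 1 / (1 + e^−(b0 + b1·x)). The coefficients are hypothetical, not estimated from data.

```python
import math

def predicted_probability(x, intercept, slope):
    """Probability of the outcome for IV value x under a logistic model."""
    log_odds = intercept + slope * x  # the model is linear on the logit scale
    return 1 / (1 + math.exp(-log_odds))

# Hypothetical coefficients
intercept, slope = -2.0, 0.5

p_at_4 = predicted_probability(4, intercept, slope)    # log-odds = 0
p_at_10 = predicted_probability(10, intercept, slope)  # log-odds = 3
odds_ratio_per_unit = math.exp(slope)  # OR for a one-unit increase in x
```

A log-odds of 0 corresponds to a probability of 0.5, and exponentiating the slope gives the odds ratio per unit increase in the IV.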
36
Q

Logistic regression assumptions

A
  • enough responses in every category
  • linearity in the logit
  • absence of multicollinearity and outliers
  • independence of residuals
37
Q

Logistic regression models

A
  • Binary logistic regression
    • dichotomous outcome (eg. yes/no)
  • Multinomial logistic regression
    • polychotomous outcome (eg. home/clinic/hospital)
    • need to choose one category as a reference (eg. home vs clinic, home vs hospital)
    • essentially building multiple binary LR models
  • Ordinal logistic regression
    • ordered outcome (eg. low/mod/high)
38
Q

Omnibus tests

A

Tests whether explained variance is significantly greater than unexplained variance

39
Q

Goodness of fit

A
  • whether estimated logistic regression model fits sample data
  • Hosmer and Lemeshow test
  • p > 0.05 suggests a good fit
40
Q

p values

A
  • Test of normality
    • Kolmogorov-Smirnov
    • Shapiro-Wilk
    • p > 0.05 suggests normal distribution
  • Test of homoscedasticity
    • p > 0.05 supports constant variance assumption
  • Goodness of fit
    • Hosmer & Lemeshow
    • p > 0.05 suggests a good fit
41
Q

Predictive classification

A
  • percentage cases correct = sensitivity
  • percentage non-cases correct = specificity
  • overall percentage = overall correct classification
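A sketch of the three percentages from a classification table. The function name and the counts are hypothetical.

```python
def classification_summary(true_pos, false_neg, true_neg, false_pos):
    """Sensitivity, specificity and overall correct classification."""
    cases = true_pos + false_neg
    non_cases = true_neg + false_pos
    return {
        "sensitivity": true_pos / cases,       # cases predicted correctly
        "specificity": true_neg / non_cases,   # non-cases predicted correctly
        "overall": (true_pos + true_neg) / (cases + non_cases),
    }

# Hypothetical counts: 50 cases and 50 non-cases
summary = classification_summary(true_pos=40, false_neg=10,
                                 true_neg=45, false_pos=5)
```

Here 40 of 50 cases and 45 of 50 non-cases are classified correctly, giving 80% sensitivity, 90% specificity and 85% overall.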
42
Q

Predictive accuracy

A
  • ROC curve
  • area = % predictive accuracy
  • 0.5 = no predictive power
  • 1 = perfect predictive power
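A sketch of the area under the ROC curve, using its interpretation as the probability that a randomly chosen case gets a higher predicted score than a randomly chosen non-case (ties count as half). The scores are hypothetical.

```python
def area_under_roc(case_scores, non_case_scores):
    """AUC by comparing every case score with every non-case score."""
    wins = 0.0
    for case in case_scores:
        for non_case in non_case_scores:
            if case > non_case:
                wins += 1.0
            elif case == non_case:
                wins += 0.5
    return wins / (len(case_scores) * len(non_case_scores))

# Hypothetical predicted probabilities from a model
cases = [0.9, 0.8, 0.4]
non_cases = [0.7, 0.3, 0.2]

auc = area_under_roc(cases, non_cases)
auc_perfect = area_under_roc([0.9, 0.8], [0.2, 0.1])
auc_useless = area_under_roc([0.5, 0.5], [0.5, 0.5])
```

Eight of the nine case/non-case pairs are ordered correctly, so the area is about 0.89. Complete separation gives 1, and identical scores give 0.5.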
43
Q

Historical controls

A

Compare new treatment on new patients with records of previous results

44
Q

Non-randomised concurrent control

A
  • two groups receive different treatment at roughly the same time
  • could end up not comparable
  • potential subconscious allocation bias
  • no control of potential confounding factors
45
Q

Quasi-randomised

A
  • allocation not truly random
    • eg. by day of enrolment
  • subject to confounding variables
  • unexpected factors might affect statistical tests
46
Q

Purpose of random allocation

A
  • gold standard evidence
  • eliminates bias in treatment assignment
  • equal distribution of covariates
  • facilitates blinding
  • allows researchers to make causal inferences
47
Q

Disadvantages of random allocation

A
  • expensive
  • time-consuming
  • ethical considerations
  • Hawthorne effect
48
Q

Types of RCTs

A
  • parallel
  • crossover
  • factorial
    • assigns 2 active interventions and controls to 4 groups
    • eg. drug A&B, drug A + placebo, etc
49
Q

Sources of bias in RCTs

A
  • inadequate generation of randomised sequence
  • inadequate concealment of allocation
  • inadequate blinding
  • excluding participants or significant attrition
  • analysing participants in the wrong group
  • selective reporting of outcome
50
Q

RCT ethical considerations

A
  • social and clinical value
  • scientific validity
  • fair subject selection
  • favourable risk-benefit ratio
  • independent review
  • informed consent
  • respect for potential and enrolled subjects
51
Q

Intention to treat

A
  • compares original treatment groups irrespective of whether patients adhered to the treatment
  • external validity → evaluates effectiveness in routine practice
52
Q

Per protocol

A
  • only includes patients who complete treatment
  • compromises internal validity
53
Q

Repeated measure data analysis

A
  • RMANOVA
  • Longitudinal data analysis
54
Q

RMANOVA

A
  • Repeated Measures ANOVA
  • complete case analysis
  • assumes everyone is measured at the same time and equally spaced intervals
  • restrictive assumptions about correlation structure
  • only provides p values, not parameter estimates
  • cannot handle time-dependent covariates
55
Q

Longitudinal data analysis

A
  • assesses changes in a response variable over time
  • measures temporal patterns of response to treatment
  • identifies factors that influence changes
  • includes time-varying predictors in the model
  • investigates causality
  • better handling of missing data
  • mixed effect model
  • marginal model
56
Q

Mixed effect models

A
  • subject specific
  • compares individual changes over time
  • studies natural history
57
Q

Marginal models

A
  • population average
  • compares populations over time
  • evaluates interventions or informs public policy
58
Q

Generalised Estimating Equations (GEEs)

A
  • extension of linear regression
  • models clustered or correlated data
  • offers robust estimates of standard errors (SEs)
    • allows for clustering of observations
  • estimates regression coefficients and their SEs
  • can deal with normal and non-normal data
  • aims to investigate differences in population averaged responses
  • need to specify correlation structure
    • takes into account within-subject correlation of observations
59
Q

GEE correlation structures

A
  • exchangeable
  • autoregressive
  • unstructured
  • independent
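A sketch of what three of these working correlation matrices look like for n repeated measurements on one subject, assuming the usual definitions: independent = identity matrix, exchangeable = the same correlation rho between every pair, autoregressive (AR(1)) = rho raised to the gap between time points. An unstructured matrix has a separately estimated correlation for each pair, so it has no fixed pattern to generate. The function names and rho value are hypothetical.

```python
def independent(n):
    """No within-subject correlation."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def exchangeable(n, rho):
    """Every pair of measurements has the same correlation rho."""
    return [[1.0 if i == j else rho for j in range(n)] for i in range(n)]

def autoregressive(n, rho):
    """Correlation decays as measurements get further apart in time."""
    return [[rho ** abs(i - j) for j in range(n)] for i in range(n)]

# Three time points with a hypothetical rho of 0.5
exch = exchangeable(3, 0.5)
ar1 = autoregressive(3, 0.5)
ind = independent(3)
```

With rho = 0.5, the first and third measurements correlate at 0.5 under the exchangeable structure but only 0.25 under the autoregressive one.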
60
Q

Assumptions of GEEs

A
  • assumptions of multiple linear regression
  • responses are from a known family of distributions (eg. normal, exponential) with a specified mean and variance, where variance is a function of the mean
  • the mean is a linear function of the predictors
  • correlation structure is specified
  • any missing data is at random
61
Q

Measuring variation in GEEs

A
  • one-way ANOVA measures between group variation and within group variation
  • (explained variance and unexplained variance)
62
Q

Unexplained variation in longitudinal data analysis

A
  • unexplained variation is divided into components, making the ultimate error variance smaller
  • reduced unexplained variability, resulting in better estimates of the effects
63
Q

Mixed effects modelling

A

Similar principles to GEE