Statistics Flashcards

1
Q

Formula for standard error of the mean (SEM)?

A

SEM = SD / √n

SD = standard deviation
n = sample size

SEM gets smaller as sample size (n) increases
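
A minimal Python sketch of the SEM formula above. The function name and the sample values are made up for illustration, and the sample standard deviation (n - 1 denominator) is assumed.

```python
import math
import statistics


def standard_error_of_mean(values):
    """Return SD / sqrt(n), using the sample standard deviation."""
    n = len(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator), an assumption
    return sd / math.sqrt(n)


# Hypothetical data: the larger sample repeats the same values four times,
# so the spread is similar but n is four times larger.
small_sample = [4.0, 6.0, 8.0, 10.0]
large_sample = small_sample * 4

print(standard_error_of_mean(small_sample))
print(standard_error_of_mean(large_sample))  # smaller than the value above
```

The second value is smaller because n has grown while the spread of the data is roughly unchanged.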

2
Q

Definition of power of a study?

A

Power = 1 - the probability of type II error
The probability that a statistically significant difference will be detected when a true difference exists
Probability of (correctly) rejecting the null hypothesis when it is false OR
Probability of confirming the alternative hypothesis when the alternative hypothesis is true
Power can be increased by increasing the sample size

3
Q

Examples of observational studies

A

Cohort study
Case-control study
Cross-sectional study
Case series

4
Q

Study designs ranked by the level of evidence they provide (highest first)

A
Systematic reviews  
RCTs
Cohort studies
Case-control studies
Cross-sectional studies
Case series
5
Q

Prospective cohort study

A

Sample recruited from the population in the present, relevant predictors are measured, and the cohort is followed over time to measure outcomes
Usual outcome measure is relative risk

Pro: more control over what is measured and how; can measure confounders
Con: expensive; wait until outcome occurs; rare outcome = need more participants

6
Q

Retrospective cohort study

A

Cohort assembled after an outcome has occurred using stored data

Pro: cheaper, faster
Con: data quality limited

7
Q

Case-control study

A

Start off with people with the disease and ask about exposure
Usual outcome measure is odds ratio

Pro: efficient for rare diseases and outbreaks
Con: hard to find matched controls; recall bias; confounding

8
Q

Cross-sectional study

A

Random sample of a population at a point in time.

Descriptive: prevalence of a disease or exposure

Analytic: examine relationship between different things e.g. obesity and arthritis

Can provide evidence of association but not about causality (hard to determine what came first)

9
Q

Best study design for an intervention question?

A

Best primary study: RCT

Highest level of evidence: systematic review of RCTs

10
Q

Best study design for question of harm or prognosis?

A

Prospective cohort study
Individual prospective cohort study
Retrospective cohort study
Case-control study

11
Q

Best study type for questions of diagnostic test accuracy

A

Cross-sectional analytic study where both tests (the test being assessed and the reference standard) are performed on all study participants

12
Q

Best primary study type for prevalence of disease?

A

Cross-sectional descriptive study

Burden of disease

13
Q

Best primary study type for incidence of disease?

A

Cohort study

Specified period of time; looks at cause of disease

14
Q

Relative risk

A

The risk of something occurring relative to the chance of it occurring under different circumstances

= (incidence in exposed)/(incidence in unexposed)
i.e. use division

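RR <1: exposure or treatment reduces the risk of the outcome (beneficial if the outcome is harmful)
RR >1: exposure or treatment increases the risk of the outcome (harmful if the outcome is harmful)
RR = 1: no difference in risk between the groups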

Used in RCTs and cohort studies - need to know incidence
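
A short Python sketch of the relative risk calculation above. The function name and the counts are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def relative_risk(exposed_events, exposed_total, unexposed_events, unexposed_total):
    """Relative risk = incidence in exposed / incidence in unexposed."""
    incidence_exposed = exposed_events / exposed_total
    incidence_unexposed = unexposed_events / unexposed_total
    return incidence_exposed / incidence_unexposed


# Hypothetical trial: 10 of 100 treated patients and 20 of 100 untreated
# patients have the outcome, so the incidences are 0.10 and 0.20.
rr = relative_risk(10, 100, 20, 100)
print(rr)  # 0.5, i.e. the risk in the treated group is half that of the untreated group
```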

15
Q

Absolute Risk Reduction

A

= (incidence of disease in exposed) - (incidence of disease in unexposed)
i.e. Use subtraction

Check the sign: a negative result means the exposure has decreased the risk, a positive result means it has increased the risk

16
Q

Number needed to treat

A

Number of people that need to be treated in order to prevent one negative outcome

NNT = 1 / (absolute risk reduction), i.e. 1 / (risk difference)
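
A hedged Python sketch linking absolute risk reduction to NNT. The function names and risks are hypothetical. Two assumptions are made here that the card does not state: the reduction is taken as control risk minus treated risk so that a beneficial treatment gives a positive value, and the NNT is rounded up to a whole person.

```python
import math


def absolute_risk_reduction(control_risk, treated_risk):
    """Risk difference, taken here as control risk minus treated risk (assumption)."""
    return control_risk - treated_risk


def number_needed_to_treat(control_risk, treated_risk):
    """NNT = 1 / ARR, rounded up to a whole person (a common convention, assumed here)."""
    arr = absolute_risk_reduction(control_risk, treated_risk)
    if arr <= 0:
        raise ValueError("Treatment does not reduce risk, so NNT is undefined")
    # round() guards against floating-point noise such as 19.999999999999993
    return math.ceil(round(1 / arr, 9))


# Hypothetical risks: 20% of controls and 15% of treated patients have the outcome.
print(number_needed_to_treat(0.20, 0.15))  # 20
```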

17
Q

Odds Ratio

A

= (odds of exposure to the risk factor of interest in the cases) / (odds of exposure to the risk factor of interest in controls)
Used in case control studies

OR 0.6 = the odds of the outcome are 40% lower in the exposed group than in the control group

OR 1.5 = the odds of the outcome are increased by 50% (this approximates the change in risk only when the outcome is rare)
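
A small Python sketch of the odds ratio from a case-control 2x2 table, following the definition above. The function name and the counts are hypothetical.

```python
def odds_ratio(cases_exposed, cases_unexposed, controls_exposed, controls_unexposed):
    """Odds of exposure in cases divided by odds of exposure in controls."""
    odds_in_cases = cases_exposed / cases_unexposed
    odds_in_controls = controls_exposed / controls_unexposed
    return odds_in_cases / odds_in_controls


# Hypothetical study: 30 of 100 cases and 15 of 100 controls were exposed.
result = odds_ratio(cases_exposed=30, cases_unexposed=70,
                    controls_exposed=15, controls_unexposed=85)
print(round(result, 2))  # about 2.43, i.e. higher odds of exposure among cases
```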

18
Q

P value

A

Probability of obtaining results at least as extreme as those observed if there were no actual effect (i.e. if the null hypothesis were true)

If p<0.05, the probability of getting such results by chance alone is less than 5% (i.e. statistically significant)

19
Q

Confidence intervals

A

Provides us with a range within which we would expect the true effect to lie

Wide CI = poor precision
Narrow CI = good precision

If using RR: a CI that includes 1 means the result is not statistically significant

20
Q

Random error

A

Chance
Gives results either side of the true answer with the mean of all results being close to the true answer

Narrow confidence interval = less random error

21
Q

Systematic error

A

Bias

Results differ in one direction from the truth

22
Q

Internal validity

A

How likely it is that the results are correct for the sample of participants being studied.

Selection bias impacts the internal validity of a study

23
Q

External validity

A

How likely it is that the results will hold true for other settings
= generalisability of the study

24
Q

State 2 principles of a confounder

A
  1. has to be associated with both the risk factor of interest and the outcome
  2. does not lie on the causal pathway between the risk factor and the outcome (i.e. it is not an intervening variable)

Biases the results

25
Effect modification
Where the risk factor or intervention acts differently in one group compared to another
E.g. the effect of UV exposure on the risk of melanoma differs by skin type
26
Loss to follow up
Losses before randomisation: affect the generalisability of the study
Losses after randomisation: relate to risk of bias
27
Intention to treat analysis
Means that we analyse people in the groups that they were originally randomised to, regardless of what actually happens during the study
Pro: preserves the effect of randomisation
Con: dilutes power
28
Composite endpoints
Rather than looking at several outcomes separately, a study combines them into one composite measure that is used as the outcome

Why are they used?
- smaller sample size required to show effect
- allows assessment of ‘net’ effect of intervention

Why does it matter?
- Outcomes of high clinical importance can be grouped with those of minor importance
- Can overestimate the benefit of the intervention
29
What is a funnel plot?
Special graph produced to assess likelihood of publication bias; must have >10 studies
Point estimate of the effect (e.g. RR or OR) plotted against a measure of the study’s size or precision
True value down centre
- smaller studies = larger scatter
- larger studies = closer to the true value
30
Sensitivity
Proportion of those WITH the disease who have a positive test (i.e. true positive)
Sensitivity = TP / (TP + FN)
SnNout: when a highly sensitive test (Sn) is negative (N), the disease is ruled out (out)
If you want to avoid false negatives choose a test with high sensitivity (negative result in a sensitive test = confident patient doesn’t have disease)
31
Specificity
The proportion of those WITHOUT the disease who have a negative test (i.e. true negative)
Specificity = TN / (TN + FP)
SpPin: when a highly specific test (Sp) is positive (P), the disease is ruled in (in)
If you want to avoid false positives choose a test with high specificity
32
Positive predictive value
Probability of disease in those who test positive
= TP / (TP + FP)
Higher prevalence = higher PPV, lower NPV
Lower prevalence = lower PPV, higher NPV
33
Negative predictive value
Probability of no disease in those who test negative
= TN / (TN + FN)
Higher prevalence = higher PPV, lower NPV
Lower prevalence = lower PPV, higher NPV
NPV / PPV depend upon the prevalence of the characteristic in a given population
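
The four test-accuracy formulas (sensitivity, specificity, PPV, NPV) can be sketched in Python as below. The function name and both 2x2 tables are hypothetical. The tables use the same sensitivity and specificity at different prevalences to illustrate how PPV and NPV shift.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Compute test-accuracy measures from the counts of a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Hypothetical populations, both with sensitivity 0.9 and specificity 0.8.
high_prevalence = diagnostic_metrics(tp=90, fp=20, fn=10, tn=80)    # 100 of 200 diseased
low_prevalence = diagnostic_metrics(tp=9, fp=198, fn=1, tn=792)     # 10 of 1000 diseased

for name, metrics in (("high", high_prevalence), ("low", low_prevalence)):
    print(name, {key: round(value, 3) for key, value in metrics.items()})
```

In this sketch the PPV falls and the NPV rises when the prevalence drops, while sensitivity and specificity stay the same.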
34
Positive likelihood ratio
= (probability of a +ve test in those with the disease) / (probability of a +ve test in those without the disease)
i.e. sensitivity / (1 - specificity)
Larger PLR = greater likelihood of disease
PLR > 10 will be useful in ruling in disease
PLR = 1 indicates a useless test
35
Negative likelihood ratio
= (probability of a -ve test in those with the disease) / (probability of a -ve test in those without the disease)
i.e. (1 - sensitivity) / specificity
Smaller NLR = lower likelihood of disease
NLR < 0.1 will be useful in ruling out disease
NLR = 1 indicates a useless test
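
A brief Python sketch of both likelihood ratios, computed from sensitivity and specificity as defined above. The function names and the example values (0.9 and 0.8) are hypothetical.

```python
def positive_likelihood_ratio(sensitivity, specificity):
    """PLR = sensitivity / (1 - specificity)."""
    return sensitivity / (1 - specificity)


def negative_likelihood_ratio(sensitivity, specificity):
    """NLR = (1 - sensitivity) / specificity."""
    return (1 - sensitivity) / specificity


# Hypothetical test with sensitivity 0.9 and specificity 0.8.
plr = positive_likelihood_ratio(0.9, 0.8)
nlr = negative_likelihood_ratio(0.9, 0.8)
print(round(plr, 3), round(nlr, 3))  # 4.5 0.125
```

Neither value crosses the thresholds on these cards (PLR > 10, NLR < 0.1), so this hypothetical test would be of limited use for ruling disease in or out on its own.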
36
Bias in screening
Lead time bias - apparent longer survival in screen-detected cases, as they are identified at an earlier point in the disease
Length time bias - slowly progressive disease is more likely to be picked up by screening
37
Level of evidence
Ia - evidence from meta-analysis of RCTs
Ib - evidence from at least one RCT
IIa - evidence from at least one well designed controlled trial that is not randomised
IIb - evidence from at least one well designed experimental trial
III - evidence from case, correlation and comparative studies
IV - evidence from a panel of experts

Grade A - based on evidence from at least 1 RCT
Grade B - based on evidence from non-RCT
Grade C - based on evidence from a panel of experts
38
Post test probability
Pre-test probability = prevalence
Pre-test odds = (pre-test probability) / (1 - pre-test probability)
Post-test odds = (pre-test odds) x (likelihood ratio)
Post-test odds after a +ve test = (pre-test odds) x PLR
Post-test probability = (post-test odds) / (post-test odds + 1)
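
The odds-based steps on this card can be sketched in Python as follows. The function name and the example inputs (pre-test probability 0.10, likelihood ratio 4.5) are hypothetical.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert probability to odds, apply the likelihood ratio, convert back."""
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (post_test_odds + 1)


# Hypothetical case: prevalence of 10% and a positive test with PLR = 4.5.
probability = post_test_probability(0.10, 4.5)
print(round(probability, 3))  # 0.333
```

Multiplying the prevalence directly by the likelihood ratio would give 0.45 here, which is why the conversion through odds matters.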
39
Best estimate of prevalence?
Prevalence = incidence x duration
E.g. disease has annual incidence of 15 cases per 100,000. Mean survival after diagnosis is 5 yrs.
Prevalence = (15 per 100,000) x 5 = 75 per 100,000
40
Type 1 error
Rejecting the null hypothesis when it is in fact true
OR
Accepting the alternative hypothesis when it is in fact false (i.e. a false positive result)
Significance level (alpha) = probability of a type 1 error when the null hypothesis is true
41
p value
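Probability of obtaining a result at least as extreme as the one observed if the null hypothesis is true
i.e. the probability of finding a difference this large by chance when there is none
Significance is conventionally set at p < 0.05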
42
Type 2 Error
Probability of a type 2 error = 1 - power
Accepting the null hypothesis when it is false
Observing no difference when there is one
A false negative result
Rejecting an alternative hypothesis when it is true
43
Power
= 1 - probability of Type 2 error
Likelihood of finding an effect when it is present
i.e. likelihood of avoiding false negatives
44
Main modifiers of power
1. Size of effect - more difficult to detect small effects
2. Sample size - larger sample size = easier to detect effect
3. Desired significance - i.e. p<0.001 will yield fewer positive findings than p<0.05
4. Standard deviation - more variable data = harder to detect effect
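
A rough Python sketch of how the four modifiers above change power. It assumes a two-sided comparison of two group means with equal group sizes and a known, equal standard deviation, uses a normal approximation, and ignores the small chance of rejecting in the wrong direction. The function name and all numbers are hypothetical.

```python
import math
from statistics import NormalDist


def approximate_power(effect_size, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test (normal approximation)."""
    standard_normal = NormalDist()
    critical_value = standard_normal.inv_cdf(1 - alpha / 2)
    # Standard error of the difference between two group means
    standard_error = sd * math.sqrt(2 / n_per_group)
    return standard_normal.cdf(effect_size / standard_error - critical_value)


print(round(approximate_power(0.5, 1.0, 10), 2))               # small sample
print(round(approximate_power(0.5, 1.0, 100), 2))              # larger sample: higher power
print(round(approximate_power(0.2, 1.0, 100), 2))              # smaller effect: lower power
print(round(approximate_power(0.5, 1.0, 100, alpha=0.001), 2)) # stricter alpha: lower power
print(round(approximate_power(0.5, 2.0, 100), 2))              # larger SD: lower power
```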
45
Multivariate analysis
Used to assess, and adjust for, confounding by other factors
46
ROC curve
Y axis: true positive rate (sensitivity)
X axis: false positive rate (1 - specificity)
A test with good performance swoops into the top left corner
A test close to the diagonal line is no better than chance at discriminating between those with and those without the disease
47
Student’s T-test
Parametric (normally distributed) | Paired or unpaired
48
Pearson’s product moment coefficient
Parametric | Correlation of 2 variables
49
Mann-Whitney U Test
Non-parametric | Unpaired data
50
Wilcoxon signed rank test
Non-parametric | Compares 2 sets of observations on a single sample
51
Chi squared test
Non-parametric | Used to compare proportions or percentages
52
Spearman, Kendall Rank
Non-parametric | Correlation
53
Paired vs unpaired data
Paired data: obtained from a single group of patients e.g. measurement before and after an intervention
Unpaired data: 2 different groups of patients e.g. comparing response to different interventions in 2 groups
54
Hazard Ratio
Similar to relative risk but used when risk is not constant in time.
Typically used when analysing survival over time
Reduction in risk of death or progression
HR of 0.84 = 16% reduction in risk