Statistics Flashcards

1
Q

What is bimodal distribution?

A

2 peaks in data (two modes)

2
Q

What is standard deviation?

A

Shows the spread of data around the mean
+/- 1SD 68.2%
+/- 2SD 95.4%
+/- 3SD 99.7%
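
The percentages above can be checked with a minimal Python sketch, assuming the data are exactly normally distributed. The function name `normal_coverage` is made up for illustration; it uses the standard error function to give the proportion of values within k standard deviations of the mean.

```python
import math

def normal_coverage(k: float) -> float:
    """Proportion of a normal distribution lying within +/- k SD of the mean."""
    return math.erf(k / math.sqrt(2))

# Print the coverage for 1, 2 and 3 standard deviations as percentages.
for k in (1, 2, 3):
    print(f"+/- {k} SD: {normal_coverage(k) * 100:.1f}%")
```

The results are roughly 68%, 95% and 99.7%, matching the figures on the card to within rounding.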

3
Q

What does a large standard deviation mean?

A

Greater spread of data away from the mean

4
Q

What are confidence intervals?

A

Ranges within which the true population value is likely to lie
ie we only have the mean of a sample, so we are estimating the true mean of the population

If the CIs of two groups do not overlap = significant

5
Q

What does 95% CI mean

A

We are 95% sure the true mean lies within that range.
If the CI for a difference crosses 0, the result is not significant at the 5% level, ie no effect of the intervention cannot be ruled out

6
Q

What will a larger study do to the CI?

A

Narrow it

7
Q

What does it mean if CI includes 1?

A

No statistically significant difference (for a ratio such as RR or OR, 1 = no effect)

8
Q

What do the key components of a forest plot mean?

A

diamond= combined estimate of all studies, stat sig if it does not cross the line of no effect
greatest impact= most positive/negative

left= intervention is better, right= intervention is worse
line of no effect (0 for differences, 1 for ratios)- if a CI crosses this, no evidence intervention works
size of square= size of sample
line through square= confidence interval

9
Q

What is the null hypothesis?

A

Intervention has no impact on outcome, any difference found is due to chance

10
Q

What is a p value?

A

Probability of seeing a difference at least as large as the one observed if the null hypothesis were true (ie by chance alone)

11
Q

What is a significant p value?

A

<0.05 = less than a 1 in 20 chance of seeing this change if the null hypothesis were true. Treatment probably did cause outcome.
<0.01- highly significant
<0.001- very highly significant

12
Q

What are parametric tests used for?

A

Normally distributed data

13
Q

Which parametric test can be used for >2 samples?

A

ANOVA, to see if means come from same population

14
Q

Which parametric test is used for 2 samples

A

T/Student’s T, test that the samples come from a population with the same mean

15
Q

Which parametric test is used for 1 sample?

A

One-sample Student's T- compares the sample mean with a known value
(chi squared is for categorical data- compares improvement with two treatments, gives p value)

16
Q

What is paired data?

A

Data from the same individuals, ie the same people before and after treatment

17
Q

What are non parametric tests used for?

A

Data that is not normally distributed; alternatively, data can sometimes be transformed into a normal distribution so that parametric tests can be used

18
Q

What is the Mann Whitney U test used for?

A

Non parametric, compares medians (ranks) between 2 unpaired groups and gives p value to see if significant

19
Q

Which of these are parametric?
a) Kruskal Wallis
b) Friedman
c) Wilcoxon Signed Rank
d) ANOVA

A

d) ANOVA- all others are non parametric tests

20
Q

What is risk?

A

Probability an event will happen
eg 1 in 100 are sick, 1/100= 0.01

21
Q

What is risk ratio?

A

risk in treated versus untreated group
>1= higher risk if exposed
<1= lower risk if exposed
if CI includes 1= not stat sig

22
Q

What is odds?

A

Number of times event happens / number of times event does not happen
used in case control studies
ie 1 in 100 are sick. 1/99= 0.0101
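
The difference between risk and odds for the "1 in 100 are sick" example can be sketched in Python. The function names `risk` and `odds` are made up for illustration, and the sketch assumes simple counts of events out of a total.

```python
def risk(events: int, total: int) -> float:
    """Risk = events / everyone observed."""
    return events / total

def odds(events: int, total: int) -> float:
    """Odds = events / non-events."""
    return events / (total - events)

print(round(risk(1, 100), 4))  # 0.01
print(round(odds(1, 100), 4))  # 0.0101
```

For rare events the two values are almost the same; they diverge as the event becomes more common.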

23
Q

What is odds ratio?

A

Odds of exposure in case v control
1= no difference
>1= increased if exposed
<1= decreased if exposed
eg OR = 2.64 = odds of disease are 2.64x higher if exposed

24
Q

What is ARR?

A

difference in event rate in intervention v control
100/NNT (when ARR is a percentage)
eg 80% improve in intervention, 60% improve in control. 80-60= 20%

25
What is NNT
100/ARR (ARR as a percentage)- how many need to be treated for one person to benefit
26
What is RRR
Proportion by which intervention reduces event rate
eg 40% in placebo and 20% in intervention= 50% RRR
27
What is NNH?
100 / (% with nausea in intervention - % with nausea in control)
eg 100 / (6-1) = 100/5 = 20, so 1 in 20 treated will get nausea
28
What is correlation?
the strength of linear relationship between two variables
29
What is the correlation coefficient? (r)
Strength of the linear relationship between two variables
r= 1 (positive- directly related, as one increases so does other)
r= -1 (negative- inversely related, as one increases the other decreases)
r= 0 (no line, random points)
Parametric- Pearson's
Non-parametric- Kendall's/Spearman's
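
As a rough illustration of how r is calculated, here is a minimal Python sketch of Pearson's coefficient. The function name `pearson_r` and the data are invented for the example, and the sketch assumes two equal-length lists with some spread in each.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by the spread of each variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Invented example data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
print(round(r, 2), round(r ** 2, 2))  # r and r squared
```

Squaring r gives r squared, the share of variation in one variable explained by the other in a simple linear relationship.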
30
How do we interpret r as a value?
Correlation coefficient
0-0.2= meaningless
0.4-0.6= reasonable
0.6-0.8= high
0.8-1= suspiciously high
31
What tests can we use to assess correlation coefficient?
Pearsons= normal Spearmans= not normal | spearmans= special
32
What is r squared?
How much of the variation in one variable is explained by the other
closer to 1= higher correlation
33
What is regression?
How one variable changes with another eg blood glucose and HbA1c
We can use one to predict the other using a graph
slope of line= regression coefficient
univariate- 1 dependent (influenced by something) and 1 independent
multivariate- 1 dependent and 2 or more independent
34
What is regression constant?
Where line crosses vertical axis
35
What is regression equation?
y= a (constant) + b (coefficient) x
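
The constant a and coefficient b can be estimated from data by least squares. This is a minimal Python sketch; the function name `fit_line` and the data are invented, and it assumes a single independent variable.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # regression coefficient (slope)
    a = mean_y - b * mean_x  # regression constant (where line crosses vertical axis)
    return a, b

# Invented data lying exactly on y = 1 + 2x
a, b = fit_line([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
predicted = a + b * 6  # use the equation to predict y when x = 6
print(a, b, predicted)
```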
36
When do we use logistic regression?
To look at outcome in 1 of 2 groups (has disease/has not)
37
When do we use poisson regression?
To model counts or rates of events, eg number of events in a set time period
38
when do we use cox regression?
Survival analysis- time until a certain event eg death/discharge
39
What is Kaplan Meier?
Calculates new survival rate after each event | MeIer=survIve
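
The "new survival rate after each event" idea can be sketched in Python. This is a simplified illustration, not a full implementation: the function name `kaplan_meier` and the follow-up data are invented, and censored subjects are assumed to leave the at-risk group without counting as events.

```python
def kaplan_meier(times, events):
    """times: follow-up per subject; events: True if event seen, False if censored.
    Returns a list of (time, survival probability) at each event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group everyone who shares this follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += int(data[i][1])
            leaving += 1
            i += 1
        if deaths:
            # Multiply by the proportion of those at risk who survived this time point.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Invented data: 5 subjects, two of them censored
curve = kaplan_meier([2, 3, 3, 5, 8], [True, True, False, True, False])
for t, s in curve:
    print(t, round(s, 2))
```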
40
What is log rank test?
Compares survival between groups | log = long term
41
What is cox regression?
Explore relationship between event and variable eg death and smoking/BMI
1= same (exposure and control)
2= double risk if exposed
42
What is sensitivity?
a/(a+c) pick up rate of a test
43
What is specificity?
d/(d+b) how likely a person without disease tests negative
44
What is PPV?
a/ (a+b) likelihood someone who tests positive has disease
45
What is NPV?
d/ (d+c) Likelihood someone who tests negative does not have disease
46
What is likelihood ratio?
likelihood test result would be expected in someone with v someone without disease
LR+ = sensitivity / (1-specificity)
LR= 2: a +ve result is twice as likely in someone with the disease as in someone without it
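
The formulas on the last five cards can be put together in one Python sketch. It assumes the 2x2 layout implied by those formulas (a = true positives, b = false positives, c = false negatives, d = true negatives); the function name `diagnostic_metrics` and the counts are invented.

```python
def diagnostic_metrics(a: int, b: int, c: int, d: int) -> dict:
    """a = true +ve, b = false +ve, c = false -ve, d = true -ve (assumed layout)."""
    sensitivity = a / (a + c)
    specificity = d / (d + b)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": a / (a + b),
        "npv": d / (d + c),
        "lr_positive": sensitivity / (1 - specificity),
    }

# Invented counts: 100 people with disease, 100 without
m = diagnostic_metrics(a=90, b=20, c=10, d=80)
for name, value in m.items():
    print(name, round(value, 3))
```

Note that PPV and NPV depend on how common the disease is in the tested group, whereas sensitivity and specificity do not.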
47
What is Kappa?
How accurately a test can be repeated (ordinal data eg CIN1,2,3), ie checking the same sample in two different labs
0= due to chance
0.5= good
0.7= very good
1= perfect
48
What is Bonferroni?
Multiple testing adjustment
More tests give an increased chance of a type 1 error (at p= 0.05, each test has a 5% chance of error)
Corrected by dividing the significance threshold by the number of tests
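
Why more tests raise the chance of a type 1 error can be shown with a short Python sketch. It assumes the tests are independent, which is a simplification; the function names are invented for illustration.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across n independent tests."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Adjusted per-test threshold: divide alpha by the number of tests."""
    return alpha / n_tests

print(round(familywise_error_rate(0.05, 10), 3))  # chance of any type 1 error in 10 tests
print(bonferroni_alpha(0.05, 10))                 # stricter threshold for each test
```

With 10 tests at 0.05 the chance of at least one false positive is about 40%, which is why each individual test is held to a stricter threshold.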
49
What are tailed tests?
2 tailed= reject null hypothesis if test is better or worse
1 tailed= reject null hypothesis only if test is better
50
What is incidence?
Number of new cases over time
eg 15/1000 x 100 = 1.5%
51
What is Prevalence?
Existing cases at a point in time
52
Power
Probability that a study can detect a statistically significant difference if one exists
eg if expect 100% cure rate, does not need so many people; if expect 1% cure rate, needs a lot more people
Probability a type 2 error will not be made (>0.8= adequate)
- 80% likely to find a significant difference
- increases with sample size
53
What is Type 1 error?
REJECT TRUE null hypothesis
false positive
reduced by Bonferroni correction
54
What is Type 2 error?
ACCEPT FALSE null hypothesis
false negative
ie if sample too small or variance too big
55
What is service evaluation?
designed to define/judge current care what standard does this service achieve?
56
What is Quality?
patient experience clinical effectiveness patient safety
57
What is quality framework?
1) Clarify quality 2) Measure and publish results 3) Reward 4) Leadership 5) Innovate 6) Safety
58
Plan Do Study Act
introduce and test potential quality improvements refine prior to wholesale implementation
59
Model for Improvement
decide on measurable QIs and test/refine prior to implementation
60
Performance benchmarking
drive quality improvement by increasing awareness of local/national targets find and share best practice eg KPIs
61
Healthcare failure modes and effect analysis
Identify how a process may fail and assess impact of this
62
Process mapping
map patient journey to identify QI opportunities
63
Statistical process control
measure and control process quality against predefined parameters
ensure operating at full potential
64
Root Cause Analysis
identify causes after an event has occurred physical, human or latent fishbone cause and effect model
65
What is Evidence Based Medicine?
Making a clinical decision based on:
- research
- clinical expertise
- patient preference
66
What is internal validity?
To what extent does the study measure what it set out to (how well do the methods answer the research question)
67
What is external validity?
To what extent can results be generalised to wider population / real life setting
68
What is efficacy?
Impact under trial conditions
69
What is effectiveness?
Impact under ordinary setting
70
What is PICO?
Patient/problem Intervention Comparison Outcome
71
What is Journal Impact factor?
frequency of citations
72
What is a confounder?
triangular relationship with exposure and outcome
associated with, but not a consequence of, exposure and outcome
eg city living, stress and heart disease
+ve= confounder shows an association when there isn't one
-ve= confounder masks an association when there is one
73
What is Selection Bias?
issues in recruitment or allocation
74
What is performance bias?
influenced by researcher or participant
detection- reduce by blinding
attrition- selective drop out
reporting-
75
What is an observational descriptive study?
looking at what is observed in a population
76
What is an observational analytic study?
looking at similarities and differences between groups
77
What is an experimental study?
intervene in some way and compare outcome to control
78
What is a longitudinal study?
more than one point in time assess something over days/months/years
79
What is a cross sectional study?
single point in time
80
What is a parallel study?
Looking at two interventions at the same time
81
What is a prospective study?
Present and future collect data as you go along
82
What is a retrospective study?
present and past collect data that already exists
83
What is an ecological study?
Population/community level data (not individuals)
84
What is an explanatory study?
takes place in an ideal setting with homogenous subjects
85
What is a pragmatic study?
takes place in real life eg ward/clinic more effectiveness/real life difficult to blind and limit drop outs
86
Cohort
Observational study
Group exposed to risk factor v not exposed
prospective- attrition bias
retrospective- use existing study but add on another outcome
inception- recruited early on in disease process before outcome established | co-hort, pro-spective
87
Case- control
Retrospective
Look at those who have outcome and those who do not and ask about exposure
quick and cheap
recall bias
Nested- take population from cohort study and ask about previous exposures
case cohort- control group is from initial at risk population | have to be cases (already have disease) so must be retrospective
88
Austin Bradford Hill
9 considerations of association versus causation
- strength (strong/large)
- consistency (replicated in other studies)
- specificity (specific disease)
- temporality (exposure precedes disease)
- biological gradient (more exposure = higher risk)
- plausibility (can we explain causation with science)
- coherence (consistent with natural history)
- experimental evidence (other studies)
- analogy (have we seen similar relationships) | austin= association
89
Rothman and Greenland
Sufficient cause- minimal conditions and events that inevitably cause disease
Component cause- acts with others to cause disease (eg genetic/environmental factors)
Rothman's pie
90
Cross-sectional
prevalence of exposure and outcome at single point in time establish associations, not cause and effects
91
Uncontrolled
All participants get the same treatment
92
Controlled
two treatments and compare outcome
93
RCT
random allocation into groups
reduces selection bias
equally distributes confounders
measures efficacy
allows for meta analysis
94
Crossover trial
receive one treatment then switch to another treatment, check which made better outcome eg if lack of subjects
95
n of 1 trial
single subject repeated experimental analysis minimal generalisability
96
Factorial study
assess impact of >1 intervention
eg group intervention a- then intervention c or d
intervention b- then intervention c or d
97
Phase 0
human microdosing give small doses and assess bioavailability/half life
98
Phase 1
small group of healthy people dosage range/ side effects
99
Phase 2
People with that illness look at effectiveness/safety profile
100
Phase 3
large groups of people effectiveness, dose range, duration, side effects, new treatment v previous treatments Good results= marketing authorisation
101
Phase 4
post-marketing surveillance benefits and side effects in different populations new safety concerns
102
Random sampling
all have equal chance
103
systematic/quasi random sampling
every nth person
104
stratified sampling
based on characteristics eg ethnicity
105
cluster sampling
population put into similar representative clusters, some clusters used
106
convenience sampling
whoever appears
107
snowball sampling
one patient tells their friends
108
Bias of sampling
admission rate- only those who attend healthcare are picked up
diagnostic purity- comorbidities excluded
membership bias- those in a club/group
historical control- subjects chosen over time as definitions change
109
Response bias
Those who volunteer to take part may not reflect population
110
Matching
Demographic age/gender/ethnicity Lifestyle smoking Disease comorbidities Treatment factors distributing confounders between groups
111
Types of Randomisation
simple- by subject
block- each block given a group of same numbers
stratified- like block but distributing characteristics
112
Minimisation
random allocation impacted by those already allocated to keep groups similar
113
Blinding
reduces observation bias
single- researcher/participant
double- both
triple- both + analyst
114
Nocebo effect
Negative effects of a dummy pill
115
Concealed allocation
reduces selection bias
116
Ascertainment bias
researcher not blinded so changes the way questions are asked | asc= ask
117
Response bias
participant not blinded so responds differently
118
Hawthorne effect
Participant changes behaviour as aware they're in a study
119
Recall bias
Selective remembering eg case-control
120
What are endpoints?
Clinical- mortality/morbidity/survival
Surrogate- predicts a clinical benefit eg LDL
Composite- many clinical events, met if one of these occurs
Secondary- other characteristics measured to help describe treatment
121
Validity
Face- does test measure what it's supposed to Content- test measure variables that it should eg exercise ability as a surrogate for CVD
122
Criterion
concurrent- current test measures in the same way a good test would predictive- current test predicts what it is supposed to
123
variable
an entity that can take on value eg gender
124
attribute
eg male/female
125
parameter
numeric quantity that characterises population eg mean or standard deviation
126
Accuracy
how close measurement is to the true value
127
Precision
how close repeats of the test are | precision=repeats
128
Incidence
New cases over a time period
129
Mortality
Rate= deaths in time period/population size
ratio= rates of study v general population (lower= better)
130
Morbidity Rate
number of new cases/size of at risk population
131
Point prevalence
proportion of population with disease at a point in time number with disease/number in population
132
Period prevalence
point prevalence (number with disease/number in population) in a set time period
133
Types of Data
Nominal- colours
ordinal- mild, moderate, severe
interval- temp (no true 0)
ratio- scale with a true 0
134
Probability distribution
likelihood of each value of a random variable
ie heads/tails= 0.5, 2 heads= 0.25
discrete- only whole numbers like above
continuous- any numbers
135
Binomial distribution
two possible outcomes in a fixed number of runs, each run is independent
eg toss a coin 5 times
Bernoulli= only one turn | binomial + bernoulli
136
Poisson distribution
repeat runs of a random variable with two outcomes, not fixed number of turns
eg if 5 births/day on average, what is the likelihood there will be 6 tomorrow
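
The coin-toss and births examples from the binomial and Poisson cards can be worked through with a short Python sketch. The function names are invented for illustration; the formulas are the standard probability mass functions.

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent runs."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k events when the average is `mean` per period."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

# Exactly 2 heads in 5 tosses of a fair coin
print(binomial_pmf(2, 5, 0.5))
# Exactly 6 births tomorrow if the average is 5 per day
print(round(poisson_pmf(6, 5), 3))
```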
137
Normal distribution
symmetrical around mean
138
Modal
unimodal= one peak in data
bimodal= two peaks
139
Variance
dispersion around the mean
140
standard deviation
degree of data spread around mean/precision
square root of variance
large SD= larger spread
141
Effect size
(mean of experiment - mean of control) / SD
larger= greater impact
142
Coefficient of variation
compare spread of data between two studies using different values
143
Coefficient of skewness
symmetry of data
+ve- tail extends to right
-ve- tail extends to left
0= symmetrical | nEg=lEft
144
Coefficient of kurtosis
peakedness of data
145
Standard error of mean
SD of the sample means
95% CI= mean +/- 1.96 SE
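
A 95% confidence interval built from the standard error can be sketched in Python. The sketch assumes the standard error is estimated as the sample SD divided by the square root of the sample size, and uses the 1.96 multiplier from the card (a normal approximation that is less reliable for small samples). The function name `mean_ci95` and the data are invented.

```python
import math
import statistics

def mean_ci95(sample):
    """Return (lower, upper) of an approximate 95% CI for the mean."""
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))  # estimated standard error
    return mean - 1.96 * se, mean + 1.96 * se

# Invented sample
lower, upper = mean_ci95([4, 6, 8, 10, 12])
print(round(lower, 2), round(upper, 2))
```

Because the standard error shrinks as the sample grows, a larger study gives a narrower interval.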
146
Confidence Interval
range in which we are 95% sure population result lies, based on sample result
shown by error bars
147
Per protocol analysis
Only include those with full compliance to trial protocol
explanatory approach
148
Intention to Treat analysis
Include all subjects, whether or not they complied
Reflects real life
pragmatic approach
149
Imputation
substitute missing data so data can be analysed
150
Control event rate
C / (C+D)
151
Experimental event rate
A / (A+B)
152
ARR
CER - EER
-ve= increase
eg 0.8 in control, 0.4 in intervention: ARR= 0.4, ie 40 fewer cases per 100 treated
153
Relative risk
EER / CER
ratio of risk of outcome
1= same
>1= increased risk if exposed
<1= reduced risk
2= double risk
154
RRR
(CER - EER) / CER
155
NNT
1/ARR
lower= better
number of people you need to treat for one good outcome
156
Odds Ratio
(a/b) / (c/d)
how likely outcomes are between the groups
1= no effect
>1= more likely if exposed
<1= less likely if exposed
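
The 2x2-table formulas from the preceding cards (CER, EER, ARR, relative risk, RRR, NNT and odds ratio) can be combined in one Python sketch. It assumes A and B are events and non-events in the experimental group and C and D are events and non-events in the control group, as implied by the EER and CER cards. The function name `two_by_two_measures` and the counts are invented.

```python
import math

def two_by_two_measures(A: int, B: int, C: int, D: int) -> dict:
    """A/B = events/non-events in experimental group, C/D = same in control (assumed)."""
    eer = A / (A + B)
    cer = C / (C + D)
    arr = cer - eer
    return {
        "EER": eer,
        "CER": cer,
        "ARR": arr,
        "RR": eer / cer,
        "RRR": arr / cer,
        # NNT is rounded up to a whole person; only defined here when treatment helps.
        "NNT": math.ceil(1 / arr) if arr > 0 else None,
        "OR": (A / B) / (C / D),
    }

# Invented counts matching the 0.8 control v 0.4 intervention example
result = two_by_two_measures(A=40, B=60, C=80, D=20)
for name, value in result.items():
    print(name, value)
```

The sketch does not guard against empty cells, so a zero in B, C or D would need separate handling.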
157
NNH
Number of people needed to be exposed for one bad outcome
158
Risk benefit ratio
NNH (round down to whole number) : NNT (round up to whole number)
159
Null hypothesis
Assume any difference is due to chance
ie no relationship between exposure and outcome, any difference between groups is due to chance
if alpha= 0.05 and the null hypothesis is true, results this extreme occur 5% of the time
160
P value
probability of results at least as extreme as those observed if the null hypothesis is true (ie due to chance)
lower= less likely
<0.05= stat sig
161
Tailed tests
1= 1 direction of interest (greater than or less than- only looking at one way)
2= 2 directions of interest (greater than and less than- accept may be either way)
162
1 sample- categorical test
chi squared
Fisher's exact if small
163
1 sample- non-normal test
Sign
Wilcoxon Signed Rank | sign= singular
164
1 sample- normal test
Student's T
165
2 samples- unpaired
Chi squared
Fisher's exact (small)
166
2 samples- paired
McNemar's
167
2 sample- non normal and unpaired
Mann Whitney U | unpaired=u
168
2 sample- non normal and paired
Wilcoxon
169
2 samples- normal
Student's T
170
>2 samples, categorical
unpaired- chi squared
paired- McNemar's
171
>2 samples non normal
unpaired- Kruskal-Wallis
paired- Friedman's
172
>2 samples normal
ANOVA one way (unpaired)
ANOVA repeated measures (paired)
173
Categorical data test
paired- McNemar's
unpaired + large- chi squared
unpaired + small- Fisher's exact | mcf
174
What do parametric/non parametric tests do?
Non-parametric compare medians
Parametric compare means
175
What does paired data mean?
Same individuals at different time points
unpaired= different subjects
176
Fragility Index
number of people who would need a different outcome for trial to become non significant
smaller trial- more fragile
closer to 0= more fragile
177
Equivalence study
show equivalence between two drugs - new rx as effective as established one
178
Non-inferiority study
new drug is no worse than established drug
179
Class effect
similar outcomes, therapeutic and adverse effects of two or more drugs
180
Serial testing
If one test is +ve we do another to confirm ie HIV/syphilis
181
Parallel testing
many tests run at the same time to increase sensitivity
182
What is a consort checklist?
A checklist used to increase quality of RCT reports
183
Hazard ratio
1= equal hazard rate
>1= experiment has higher hazard rate
<1= experiment has lower hazard rate
use Cox regression
184
What is grounded theory?
qualitative study
do not start with a theory, theory is developed from data collection
185
What is a phenomenological study?
qualitative, looking into the meaning of a lived experience
186
What is an ethnographic study?
learning from a group to interpret something
187
What is a historical study?
anticipate future events by learning from the past
188
purposive sampling
select those with knowledge
189
quota sampling
select those with characteristics
190
homogeneity
studies have similar results ie 0% heterogeneity
191
heterogeneity
variation in results between studies
25% low
50% moderate
75% high
fixed effects model= no heterogeneity
test with- forest plot/Cochran's Q/I²
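
The I² percentage can be derived from Cochran's Q, as in this minimal Python sketch. It assumes the commonly used formula I² = (Q - df) / Q with df = number of studies - 1, floored at 0; the function name `i_squared` and the example values are invented.

```python
def i_squared(q: float, n_studies: int) -> float:
    """Percentage of variation between studies not explained by chance."""
    df = n_studies - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100

print(i_squared(20, 6))  # Q well above df: high heterogeneity
print(i_squared(4, 6))   # Q below df: treated as 0% (homogeneous)
```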
192
How to test for publication bias
funnel plot/galbraith's
193
Hierarchy of evidence
1) Meta-analysis/Systematic Review/RCT
2) Systematic review of case control/cohort
3) Case control/cohort
4) Case report/series
5) Expert opinion
194
What does R² = 0.64 mean?
The independent variables (like age and comorbidities) explain 64% of the variation in the dependent variable (here, hospital stay length)
195
concealed allocation and bias
selection bias
196
double blinding and bias
measurement bias