Stats Flashcards

1
Q

Likelihood ratio of positive test result

A

sensitivity / (1-specificity)
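A minimal Python sketch of this formula (the function name and example values are my own, illustrative only):

```python
def positive_likelihood_ratio(sensitivity: float, specificity: float) -> float:
    """LR+ = sensitivity / (1 - specificity)."""
    return sensitivity / (1 - specificity)

# Illustrative values: a test with 90% sensitivity and 80% specificity
print(positive_likelihood_ratio(0.90, 0.80))  # ≈ 4.5: a positive result is 4.5x more likely in disease
```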

2
Q

Median

A

middle item in a data set which has been arranged in numerical order

3
Q

mode

A

most frequent item in a data set

4
Q

mean

A

add all items in data set together and divide by the number of items
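Cards 2-4 can be checked with the standard library's statistics module; a quick sketch with made-up data:

```python
import statistics

data = [2, 3, 3, 5, 7, 8, 9]  # made-up data set

print(statistics.mean(data))    # ≈ 5.29: sum of all items divided by the number of items
print(statistics.median(data))  # 5: middle item once the data are in numerical order
print(statistics.mode(data))    # 3: most frequent item
```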

5
Q

Relative risk reduction

A

ARR / CER

ARR: absolute risk reduction (the difference between the two rates in control and treatment group)
CER: event rate in the control group
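A hedged Python sketch of how ARR and RRR follow from the two event rates (the rates below are invented for illustration):

```python
def absolute_risk_reduction(cer: float, eer: float) -> float:
    """ARR = CER - EER (control event rate minus experimental event rate)."""
    return cer - eer

def relative_risk_reduction(cer: float, eer: float) -> float:
    """RRR = ARR / CER."""
    return absolute_risk_reduction(cer, eer) / cer

# Invented rates: 20% of controls vs 15% of treated patients have the event
cer, eer = 0.20, 0.15
print(absolute_risk_reduction(cer, eer))  # ≈ 0.05
print(relative_risk_reduction(cer, eer))  # ≈ 0.25, ie a 25% relative risk reduction
```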

6
Q

What are funnel plots primarily used for?

A

Assess for potential publication bias in meta-analyses
Graph the size of the effects found in individual studies against a measure of the study’s precision or size

7
Q

Chi-squared test (4)

A

Used to assess differences in categorical variables
Non-parametric test
Assumes that the sample is large
Compares the observed frequencies against those that would have been expected if there was no difference and then produces a value which can be used to assess if the difference is significant (p<0.05)
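If SciPy is available, the test can be run on a contingency table as sketched below; the 2x2 counts are invented purely for illustration:

```python
from scipy.stats import chi2_contingency  # assumes SciPy is installed

# Invented 2x2 table: rows = treatment/control, columns = improved/not improved
observed = [[30, 20],
            [18, 32]]

chi2, p, dof, expected = chi2_contingency(observed)
print(chi2, p)   # test statistic and p-value (significant if p < 0.05)
print(expected)  # frequencies expected if there were no difference
```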

8
Q

Pearson’s correlation coefficient

A
  • Measures linear correlation between 2 variables
  • the sign of the correlation coefficient tells us the direction of the linear relationship: negative then trend line slopes down, positive then trend line slopes up
  • the size/magnitude of the correlation coefficient tells us the strength of a linear relationship: >0.90 = strong, 0.65-0.9 = moderate, <0.65 = weak
  • parametric test
  • if the data is non-parametric or if both variables are not ratio variables then Spearman’s should be used
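Both coefficients can be computed with SciPy, as in this brief sketch (SciPy assumed installed; the paired data are invented):

```python
from scipy.stats import pearsonr, spearmanr  # assumes SciPy is installed

# Invented paired measurements
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.4]

r, p = pearsonr(x, y)        # parametric: linear correlation
rho, p_s = spearmanr(x, y)   # non-parametric, rank-based alternative
print(r, rho)
```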
9
Q

The 3 types of t-test

A
  • one sample t-test
  • independent t-test
  • paired t-test
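With SciPy the three tests map onto three functions; a sketch with invented data:

```python
from scipy.stats import ttest_1samp, ttest_ind, ttest_rel  # assumes SciPy is installed

sample  = [5.1, 4.9, 5.6, 5.2, 4.8, 5.4]
group_a = [5.1, 4.9, 5.6, 5.2, 4.8, 5.4]
group_b = [4.4, 4.7, 4.5, 4.9, 4.3, 4.6]
before  = [120, 132, 128, 125, 140, 135]
after   = [118, 130, 125, 124, 137, 131]

print(ttest_1samp(sample, popmean=5.0))  # one sample vs hypothesised population mean
print(ttest_ind(group_a, group_b))       # independent groups
print(ttest_rel(before, after))          # paired (matched/dependent) measurements
```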
10
Q

one sample t-test

A
  • used to see if there is a difference between a sample mean and the hypothesised population mean
11
Q

independent t-test

A
  • used when you want to compare means from independent groups
12
Q

paired t-test

A
  • used when comparing the means of two groups that are considered to be paired (matched, or dependent)
13
Q

ANOVA

A
  • statistical test to demonstrate statistically significant differences between the means of several groups
  • similar to a student’s t-test apart from that ANOVA allows the comparison of more than just 2 means
  • assumes that the variable is normally distributed
  • works by comparing the variance of the means
  • distinguishes between within group variance and between group variance
  • the null hypothesis assumes that all the group means are the same, in which case the within group variance and the between group variance estimate the same quantity
  • the test is based on the ratio of these two variances, known as the F statistic
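A one-way ANOVA comparing more than two means can be sketched with SciPy's f_oneway (the group data are invented):

```python
from scipy.stats import f_oneway  # assumes SciPy is installed

# Invented measurements from three groups
group_1 = [23, 25, 27, 22, 26]
group_2 = [30, 31, 29, 33, 28]
group_3 = [24, 26, 25, 27, 23]

f_stat, p_value = f_oneway(group_1, group_2, group_3)
print(f_stat, p_value)  # F = between-group variance / within-group variance
```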
14
Q

Relative risk

A

RR = EER / CER

EER: treatment group risk
CER: control group risk

15
Q

NNT - number needed to treat

A
  • used in assessing the effectiveness of a healthcare intervention
  • represents the average number of patients who need to be treated to prevent one additional bad outcome or produce one additional good outcome
16
Q

RISK

A
  • a proportion
  • probability with which an outcome will occur
  • usually expressed as a decimal between 0-1
  • often expressed as a number of individuals per 1000
  • if risk is 0.1, in a sample of 100 people, the number of events observed will on average be 10
17
Q

ODDS

A
  • odds is a ratio
  • the ratio of the probability that a particular event will occur to the probability that it will not occur
  • can be any number 0-infinity
  • commonly expressed as a ratio of 2 integers, eg odds of 0.01 would be 1:100
18
Q

absolute risk

A

basic risk
in many studies it will just be the incidence rate
in experiments, will be the number of events in that group divided by the number of people in the group

19
Q

risk difference / absolute risk reduction

A

the difference between the absolute risk of an event in the intervention group and the absolute risk in the control group

20
Q

relative risk

A

the ratio of risk in the intervention group to the risk in the control group

1 = estimated effects are the same for both interventions

used in cohort, cross-sectional and randomised control trials

21
Q

Positive predictive value (PPV)

A

the probability that subjects with a positive screening test truly have the disease

PPV = true positives / (true positives + false positives)

22
Q

sensitivity

A

how well a test can identify true positives from all actual positives

sensitivity = number of true positives / (true positives + false negatives)

23
Q

specificity

A

how accurately a test identifies those without a condition/disease

specificity = number of true negatives / (true negatives + false positives)
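The screening-test metrics on cards 21-25 all come from the same 2x2 counts; a minimal Python sketch with invented numbers:

```python
# Invented 2x2 screening-test counts
tp, fp, fn, tn = 90, 40, 10, 860

sensitivity = tp / (tp + fn)  # true positives picked up among all who have the disease
specificity = tn / (tn + fp)  # true negatives picked up among all who do not
ppv = tp / (tp + fp)          # probability of disease given a positive test
npv = tn / (tn + fn)          # probability of no disease given a negative test

print(sensitivity, specificity, ppv, npv)  # 0.9, ≈0.96, ≈0.69, ≈0.99
```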

24
Q

accuracy

A

how close measurements are to ‘true values’

25
negative predictive value (NPV)
the likelihood that subjects with a negative screening test truly do not have the disease
NPV = true negatives / (true negatives + false negatives)
26
how to calculate NNT
NNT = 1 / (CER-EER) or NNT = 1 / absolute risk reduction
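A quick worked sketch of the formula in Python (the event rates are invented):

```python
def number_needed_to_treat(cer: float, eer: float) -> float:
    """NNT = 1 / absolute risk reduction = 1 / (CER - EER)."""
    return 1 / (cer - eer)

# Invented example: event rate falls from 20% (control) to 15% (treatment)
print(number_needed_to_treat(0.20, 0.15))  # ≈ 20 patients treated per additional good outcome
```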
27
arithmetic mean
adding up all the values and dividing by the number of values
28
harmonic mean
calculated by dividing the number of observations by the sum of the reciprocals of the values. used when there is a time factor involved, eg speed
29
generalised mean / power mean
involves raising each value to a specified power, averaging these and then taking the matching root of that average
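A sketch of both means in Python; statistics.harmonic_mean is in the standard library, while the power_mean helper below is my own illustration:

```python
import statistics

speeds = [40, 60]  # invented example with a time factor, eg speeds over equal distances

print(statistics.harmonic_mean(speeds))  # 48.0: n divided by the sum of the reciprocals

def power_mean(values, p):
    """Generalised mean: raise each value to the power p, average, then take the p-th root."""
    return (sum(v ** p for v in values) / len(values)) ** (1 / p)

print(power_mean(speeds, 2))   # quadratic mean ≈ 50.99
print(power_mean(speeds, -1))  # p = -1 recovers the harmonic mean, 48.0
```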
30
range
difference between largest and smallest values
31
interquartile range
aka the mid spread; the difference between the 3rd and 1st quartiles
32
ratio / continuous data
like interval data but with a true zero point, eg temperature on the Kelvin scale
33
interval data
measurement where the difference between 2 values is meaningful eg temperature, pH
34
ordinal data
observed values can be put into set categories which themselves can be ordered eg social class
35
nominal data
observed values can be put into set categories which have no particular order or hierarchy. you can count but not order or measure nominal data eg birthplace, eye colour
36
quantitative data
numeric values can be further classified into discrete and continuous types
37
qualitative data
not numerical, usually names AKA categorical or nominal variables
38
endemic
consistent presence and/or usual prevalence of a disease in a population within a geographical area
39
epidemic
refers to an increase, often sudden, in the number of cases of a disease above what is normally expected in that population in that area
40
pandemic
an epidemic that has spread over several countries or continents, usually affecting a large number of people
41
standard error of the mean
standard deviation / square root (number of patients)
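A one-line sketch of the formula (SD and n invented):

```python
import math

def standard_error_of_mean(sd: float, n: int) -> float:
    """SEM = standard deviation / square root of the number of patients."""
    return sd / math.sqrt(n)

print(standard_error_of_mean(12.0, 36))  # 2.0 for an invented SD of 12 and 36 patients
```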
42
GRADE system
Grading of Recommendations Assessment, Development and Evaluation. rates the quality of evidence in systematic reviews and guidelines; classified as high, moderate, low or very low
43
internal validity
the confidence that we can place in the cause and effect relationship in a study. the confidence that we have that the change in the independent variable caused the observed change in the dependent variable (rather than due to poor control of extraneous variables)
44
external validity
the degree to which the conclusions in the study would hold for other persons in other places and at other times ie. its ability to generalise
45
face validity
the general impression of a test - whether it appears to test what it is meant to
46
content validity
the extent to which a test or measure assesses the full content of a subject or area.
47
criterion validity
concerns the comparison of tests, eg you may wish to compare a new test to an old, accepted method to see if it works as well. the correlation coefficient is used to test such comparisons
48
criterion validity (concurrent)
the predictor and criterion data are collected at or about the same time
49
criterion validity (predictive)
the predictor scores are collected first, and criterion data are collected at a later point; used to see whether the test predicts future outcomes
50
construct validity
the extent to which a test measures the construct it aims to
51
construct validity (convergent)
has convergent validity if it has a high correlation with another test that measures the same construct
52
construct validity (divergent)
demonstrated through a low correlation with a test that measures a different construct
53
cost effectiveness analysis (CEA)
compares a number of interventions by relating costs to a single clinical measure of effectiveness. cost effectiveness ratio = total cost / units of effectiveness. combines costs and effects - usually reported as an incremental cost-effectiveness ratio (ICER)
54
cost benefit analysis (CBA)
technique in which all the costs and benefits of an intervention are measured in terms of money. used to establish which of the alternatives has the greatest net benefit. requires that all the consequences of an intervention, such as life years saved, symptom relief etc, are allocated a monetary value
55
cost-utility analysis (CUA)
special form of CEA in which health benefits / outcomes are measured in broader, more generic ways enabling comparisons between treatments for different diseases and conditions
56
cost minimisation analysis (CMA)
economic evaluation in which the consequences of competing interventions are the same and in which only inputs (costs) are taken into consideration. the aim is to decide the least costly way of achieving the same outcome
57
test-retest reliability
assesses the stability of a measure over time by administering the same test to the same individual on two different occasions
58
split-half reliability
assesses the internal consistency of a test by dividing it into 2 halves and comparing the results of each half. ensures consistency within the test items, but does not address stability of the tool over time
59
parallel-forms reliability
involves administering two equivalent forms of a test to the same group and comparing results. valuable for avoiding practice effects
60
internal consistency reliability
measures how consistently items within a test measure the same construct, often using statistical methods like Cronbach's alpha. does not assess stability over time
61
inter-rater reliability
assesses the consistency of scores when different raters or observers administer the test. critical in situations where multiple clinicians assess the same patient, but not relevant to determining whether a tool yields stable results for the same individual across repeated administrations
62
forest plot weighting
indicates the influence an individual study has on the pooled result. generally, the bigger the sample size and the narrower the confidence interval, the higher the weight - shown by a larger box
63
heterogeneity in forest plots
refers to variability between studies and can affect the ability to combine the data of the individual studies
64
clinical heterogeneity
variability caused by differences in clinical variables, eg patient population, interventions etc. clinicians determine clinical heterogeneity - it is subjective
65
statistical heterogeneity
the variability in effect estimates between the studies; can be quantified by various statistics. forest plots only present statistical heterogeneity
66
denominator for simple variance
n-1
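Python's statistics module makes the denominator explicit: variance() divides by n-1 (sample), pvariance() by n (population). A tiny sketch with invented data:

```python
import statistics

data = [4, 8, 6, 5, 3]  # invented sample of n = 5 values

print(statistics.variance(data))   # 3.7: sample variance, divides by n-1 (= 4)
print(statistics.pvariance(data))  # 2.96: population variance, divides by n (= 5)
```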
67
Berkson's bias
occurs when the selection of participants for a study is influenced by their likelihood of seeking healthcare. such participants may not be representative of the general population, which can lead to an overestimation of the association between diseases
68
observer bias
aka information or measurement bias. systematic differences in the way data is collected for different groups. could be due to the observer's knowledge about a participant's exposure status influencing how they measure outcome variables
69
hawthorne effect
changes in behaviour that occur when individuals know they are being observed
70
verification bias
aka referral or test review bias. happens when subjects with positive results are more likely to have their test results confirmed than those with negative results. may influence the accuracy of diagnostic tests and overall study results, but it isn't an example of selection bias since it doesn't affect who gets selected into a study
71
detection bias
arises from differential methods of detection amongst groups, leading to an apparent difference in outcome rates between these groups. often seen in studies where one group receives more frequent screening or follow-up than another group, thereby increasing the chances of detecting the disease earlier or more frequently. it doesn't pertain to selection into a study, which is what defines selection bias
72
ethnography
qualitative research that seeks to understand and describe the culture or social phenomena from the perspective of the subject group. researchers immerse themselves in the setting, observing and participating in daily activities, to gain a deep understanding of behaviours/beliefs/experiences in that particular cultural context
73
bracketing
method used in qualitative research to mitigate the potential deleterious effects of preconceptions that may taint the research process. involves identifying and holding in abeyance preconceived beliefs and opinions about the phenomenon under study
74
grounded theory
research methodology that involves the collection and analysis of data with the aim of developing theories grounded in real-world observations. seeks to explain phenomena by generating new theories
75
phenomenology
aims to explore how individuals perceive their experiences; concerned with understanding human behaviour from the individual's own subjective viewpoint
76
ROC curve
Receiver Operating Characteristic. illustrates the diagnostic ability of a binary classifier system as its discrimination threshold is varied. plotted as sensitivity vs 1-specificity. helps in evaluating the performance of diagnostic tests and making informed decisions about cut-off points to maximise sensitivity and specificity
77
how to calculate standard error of the mean
SEM = standard deviation / square root (number of patients)
78
alpha level
the probability of rejecting a null hypothesis when it is true. it represents the threshold at which we decide to reject the null hypothesis; commonly set at 0.05
79
Type I errors
the null hypothesis is rejected when it is true aka false positive
80
Type II error
the null hypothesis is accepted when it is false aka false negative
81
P-values
the probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is true. a high p-value indicates a high chance that an observed difference is due to chance, and vice versa. if the p-value is less than the pre-decided cut-off, then you reject the null hypothesis
82
Randomisation
method used in the design phase of a study to reduce confounding factors
83
Cumulative incidence
The average risk of getting a disease over a certain period of time. CI = the number of newly detected cases that develop during follow up / the number of disease free subjects available at the start of follow up
84
incidence rate
IR = I / PT. I: number of new cases in the cohort. PT: person-time - total time disease-free individuals in the cohort are observed over the study period
85
prevalence
prevalence = incidence x duration of condition
86
point prevalence
number of cases in a defined population / number of people in a defined population at the same time
87
period prevalence
= number of identified cases during a specified period of time / total number of people in that population
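A sketch pulling together the incidence and prevalence formulas on cards 83-87 (all counts invented):

```python
# Invented cohort numbers
new_cases = 30
disease_free_at_start = 1000
person_years = 4800.0

cumulative_incidence = new_cases / disease_free_at_start  # 0.03 over the follow-up period
incidence_rate = new_cases / person_years                 # 0.00625 cases per person-year

average_duration_years = 2.0
prevalence = incidence_rate * average_duration_years      # 0.0125 (steady-state approximation)

cases_now, population_now = 50, 4000
point_prevalence = cases_now / population_now             # 0.0125 at a single point in time

print(cumulative_incidence, incidence_rate, prevalence, point_prevalence)
```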
88
area under the curve
the higher the AUC, the better the overall performance of the test (the higher the accuracy)
89
SQUIRE
Standards for Quality Improvement Reporting Excellence. a 19-item checklist to ensure all aspects of QI are thoroughly and transparently conveyed
90
MOOSE
Meta-analysis of Observational Studies in Epidemiology for reporting meta-analyses of observational studies
91
STARD
Standards for Reporting of Diagnostic Accuracy Studies; for reporting studies of diagnostic accuracy
92
PRISMA
Preferred reporting items for systematic reviews and meta-analyses
93
CONSORT
Consolidated Standards of Reporting Trials; guidelines for reporting RCTs
94
PICO system
P - patient, I - intervention, C - comparison, O - outcome
95
Cochrane Library
collection of 6 databases: CDSR, DARE, CENTRAL, CMR, HTA, NHS EED
96
Embase
European database; broader range than Medline
97
PsychINFO
database of abstracts of literature in the field of psychology, produced by the American Psychological Association
98
CINAHL
Cumulative Index to Nursing and Allied Health Literature; references to journal articles from hundreds of nursing journals from UK, USA and other countries
99
OpenGrey
dedicated to grey literature outside of traditional channels
100
Boolean Logic
AND, OR, NOT can be used to combine search terms; must be entered in uppercase letters
101
drug trial phases
1 - small number of healthy people: safety, side effects and dose range
2 - larger group (100-300): effectiveness and further safety
3 - large groups (1000-3000): effectiveness, side effects, comparison to commonly used treatments or placebos
4 - after a licence has been granted: eg safety in pregnancy, finding other potential uses for the drug
102
How many lie within +/-1SD
68.2%
103
How many lie within +/-2SD
95.4%
104
How many lie within +/-3SD
99.7%
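These percentages can be reproduced from the standard normal distribution using only the standard library; a short sketch:

```python
from math import erf, sqrt

def within_k_sd(k: float) -> float:
    """Probability that a normally distributed value lies within k SD of the mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(k, round(within_k_sd(k) * 100, 1), "%")  # 68.3%, 95.4%, 99.7%
```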
105
What is the Kappa statistic
aka Cohen's kappa coefficient. gives a quantitative measure of the magnitude of agreement between observers. can be any value between -1 and 1. 0: observed agreement no better than chance; 1: complete agreement; -1: complete disagreement
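A minimal sketch of the kappa calculation for two raters classifying the same cases as positive or negative (the agreement table is invented):

```python
# Invented 2x2 agreement table between rater A and rater B
both_pos, a_pos_b_neg = 20, 5
a_neg_b_pos, both_neg = 10, 65

total = both_pos + a_pos_b_neg + a_neg_b_pos + both_neg
observed_agreement = (both_pos + both_neg) / total

# Agreement expected by chance, from each rater's marginal totals
a_pos = (both_pos + a_pos_b_neg) / total
b_pos = (both_pos + a_neg_b_pos) / total
expected_agreement = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)

kappa = (observed_agreement - expected_agreement) / (1 - expected_agreement)
print(kappa)  # ≈ 0.63 for these made-up counts
```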
106
primary evidence
aka empirical research; sources that contain original data and analysis from research studies
107
secondary evidence
sources that interpret and analyse primary sources. these sources are one or more steps removed from the event
108
how to calculate odds ratio
OR = (a/b) / (c/d)
a: exposure yes, outcome yes
b: exposure yes, outcome no
c: exposure no, outcome yes
d: exposure no, outcome no
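The same formula as a short Python sketch (the 2x2 counts are invented):

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """OR = (a/b) / (c/d): odds of the outcome in the exposed vs the unexposed."""
    return (a / b) / (c / d)

# Invented counts: a = exposed with outcome, b = exposed without, c = unexposed with, d = unexposed without
print(odds_ratio(30, 70, 10, 90))  # ≈ 3.86
```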
109
Fixed effect model
used to measure the impact of variables that vary over time
110
voluntary sampling
made up of people who self-select, eg when invited to participate in a poll. the sample is chosen by the participants and not the survey administrator
111
Convenience sampling
made up of people who are easy to reach, eg approached at a hospital cafe
112
Snowball sampling
one case identifies another of its kind. often done in marginalised groups, eg IVDU or sex workers
113
Quota sampling
the population is divided into groups and then elements are selected. done to ensure that the sample reflects the characteristics of the population, eg proportionate representation of males and females
114
Types of random / probability sampling
Simple random sampling
Systematic sampling
Cluster sampling
Stratified sampling
Multistage sampling
115
Types of non-random / non-probability sampling
Voluntary sampling
Convenience sampling
Snowball sampling
Quota sampling
116
Simple random sampling
a sample in which every member of the population has an equal chance of being chosen, eg each member of the population is given a unique ID number and then randomly selected - often via a number generator
117
Systematic sampling
every nth member of the population gets selected for the sample. easier than simple random sampling, but more prone to bias if there is a pattern in the population that is consistent with the sampling frequency
118
Cluster sampling
Involves dividing a population into separate groups (clusters), and a random sample of clusters is then selected and each element included in the final sample
119
Stratified sampling
An entire population is first divided into groups (strata) and then a random sample is taken from each. this ensures you can obtain equal numbers of individuals from each group, eg male and female
120
Multi-stage sampling
more complex method of sampling that involves several steps; two or more sampling methods are combined. allows you to narrow down a large population
121
likelihood ratio for negative test result
(1-sensitivity)/specificity
122
Delphi method
method for achieving convergence of opinion concerning real-world knowledge solicited from experts within certain topic area
123
Background questions
general questions about conditions/illnesses/pathophysiology etc
124
Foreground questions
About issues of care - query specialised and distinct knowledge needed for specific and relevant clinical decision-making
125
Box and whisker plot - interquartile range
'mid spread' the difference between the 3rd and 1st quartiles
126
line in the box on box and whisker plots
median - Q2
127
left skewed
the tail/whisker extends further to the left of the box and whisker plot; negative skewness
128
right skewed
the tail/whisker extends further to the right of the box and whisker plot; positive skewness
129
how to calculate post-test odds
pre-test odds x likelihood ratio
130
how to calculate post test probability
post-test odds / (1 + post-test odds)
131
loss to follow up bias
when follow up cases are lost continuously - lost cases may have something in common resulting in an unrepresentative sample
132
disease spectrum bias / case-mix bias
when a treatment is studied in more severe forms of a disease; such results may then not apply to mild forms of the disease
133
sampling bias
the subjects are not representative of the population - may be due to volunteer bias
134
participation bias / non-response bias
those who participate may have shared characteristics resulting in an unrepresentative sample
135
incidence-prevalence bias (survival bias, Neyman bias)
occurs in case-control studies and is attributed to selective survival among the prevalent cases (ie. mild, clinically resolved or fatal cases excluded from the case group)
136
exclusion bias
occurs when certain patients are excluded for example if they are considered ineligible
137
publication or dissemination bias
many studies may not be published. may be due to the fact that papers with positive results and large sample sizes are more likely to get published
138
citation bias
highly cited articles are easier to find and so have a higher chance of being included in a given study
139
berkson's bias aka admission rate bias
a type of selection bias. can arise when the sample is taken not from the general population but from a subpopulation, eg when cases and controls are both sampled from a hospital rather than from the community
140
detection bias
when exposure can influence diagnosis, eg women on the OCP have more frequent smears and so are more likely to have cervical cancer diagnosed
141
recall bias
in retrospective studies where participants are asked to remember their past exposure to risk factors, it is likely that cases will have thought more about what factors in their past may have caused a disease than controls will have; therefore controls are less likely to remember an exposure
142
lead time bias
lead time is the period between early detection of disease and the time of its usual clinical presentation. the lead time must be subtracted from the overall survival time of screened patients to avoid lead time bias. otherwise early detection merely increases the duration of the patients' awareness of their disease without reducing their morbidity or mortality
143
interviewer/observer bias
the interviewer's or observer's knowledge about the hypothesis in question, and about disease and/or exposure status, can affect the collection and recording of data
144
verification and work-up bias
the results of a diagnostic test affect whether the gold standard procedure is used to verify the test result. more likely to occur when a preliminary diagnostic test is negative, because many gold standard tests can be invasive, expensive and carry a higher risk
145
hawthorne effect
when participants alter their usual behaviour due to their awareness that they are being studied
146
ecological fallacy
when conclusions about individuals are based only on analyses of group data
147
expectation bias (pygmalion effect)
only a problem in non-blinded trials; observers may subconsciously measure or report data in a way that favours the expected study outcome
148
late-look bias
gathering information at an inappropriate time eg studying a fatal disease many years later when some of the patients may have died already
149
tests to check that distribution is normally distributed
the Kolmogorov-Smirnov test, Jarque-Bera test, Shapiro-Wilk test, P-P plot, Q-Q plot
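Several of these are available directly in SciPy; a hedged sketch (assumes SciPy and NumPy are installed, data simulated):

```python
import numpy as np
from scipy.stats import shapiro, kstest, jarque_bera  # assumes SciPy is installed

rng = np.random.default_rng(0)
sample = rng.normal(loc=0, scale=1, size=200)  # simulated normally distributed data

print(shapiro(sample))         # Shapiro-Wilk: a low p-value would suggest non-normality
print(kstest(sample, "norm"))  # Kolmogorov-Smirnov against a standard normal
print(jarque_bera(sample))     # Jarque-Bera, based on skewness and kurtosis
```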
150
purposive sampling
participants selected on purpose because the researcher already knows that they have characteristics of interest
151
triangulation
compares the results from either 2 or more different methods of data collection, or 2 or more data sources
152
respondent validation / aka member checking
includes techniques in which the investigator's account is compared with those of the research subjects to establish the level of correspondence between the two sets
153
bracketing
methodological device of phenomenological inquiry that requires deliberately putting aside one's own beliefs about the phenomenon under investigation, or what one already knows about the subject, prior to and throughout the phenomenological investigation
154
reflexivity
sensitivity to the ways in which the researcher and the research process have shaped the collected data, including the role of prior assumptions and experience, which can influence even the most avowedly inductive inquiries
155
content analysis
interviews (individual and group) are transcribed to produce texts that can be used to generate coding categories and test theories. can involve enumerating procedures such as counting word frequencies, sometimes aided by computer software
156
constant comparison
based on grounded theory. allows researchers to identify the themes that are important in a systematic way, providing an audit trail as they proceed. used by the researcher to develop concepts from the data by coding and analysing at the same time
157
calculate pre-test odds
pre test probability / (1 - pre test probability)
158
calculate post test odds
pre test odds x (likelihood ratio positive result)
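Cards 121, 129-130 and 157-158 chain together; a worked Python sketch (the pre-test probability, sensitivity and specificity are invented):

```python
# Invented test characteristics and pre-test probability
sensitivity, specificity = 0.90, 0.80
pre_test_probability = 0.20

lr_positive = sensitivity / (1 - specificity)                      # ≈ 4.5
pre_test_odds = pre_test_probability / (1 - pre_test_probability)  # 0.25
post_test_odds = pre_test_odds * lr_positive                       # ≈ 1.125
post_test_probability = post_test_odds / (1 + post_test_odds)      # ≈ 0.53

print(post_test_probability)
```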