STATS/CA Flashcards
(48 cards)
what does association mean in statistics
two variables are associated when the value (or presence) of one is related to the value (or presence) of the other
what is a spurious association
a relationship that appears to exist but is misleading because it is due to the influence of a confounder
indirect and direct association
Indirect: the variables are associated only through the impact of a confounder
Direct: the variables are associated because one directly affects the other
what is a L'Abbé plot
a graph of the event rate in the experimental (intervention) group against the event rate in the control group, used as an aid to exploring the heterogeneity of effect estimates within a meta-analysis
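what is shown above and below the line of equality (y = x) on the graph
above: event rate higher in the experimental group (y-axis), favouring experimental if the event is a beneficial outcome
below: event rate higher in the control group (x-axis), favouring control if the event is a beneficial outcome (for an adverse event such as death, the interpretation is reversed)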
what are the two sets of criteria used to assess causation
Susser criteria
Bradford Hill
what are the key terms for both?
Susser (TAD):
Time order
Association
Direction
Bradford Hill (SSTCC):
Strength
Specificity
Temporality
Consistency
Coherence
what do direction and coherence refer to
the association is plausible
and consistent with existing knowledge/theory
Temporality and time order: the cause precedes the effect
what is bias?
systematic error (in design, conduct or analysis) that leads to incorrect conclusions
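what is a confounding factor
a variable associated with both the independent variable (exposure) and the dependent variable (outcome), which distorts the apparent relationship between them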
ecological bias (ecological fallacy)
conclusions about individuals drawn from group-level (population) data
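lead time bias
survival appears longer in screened patients simply because the disease is detected earlier; the lead time is the interval between early (screen) detection and the point at which the disease would otherwise have been diagnosed, and it is added to survival even if the time of death is unchanged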
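A worked example helps show why lead time inflates survival. The sketch assumes a single hypothetical patient who dies at age 70 whatever happens; the ages and the helper name `survival_after_diagnosis` are made up for illustration.

```python
def survival_after_diagnosis(age_at_diagnosis, age_at_death):
    """Survival time measured from diagnosis, in years."""
    return age_at_death - age_at_diagnosis


# Hypothetical patient: death occurs at age 70 whether or not they are screened
age_at_death = 70
age_clinical_diagnosis = 65  # diagnosed when symptoms appear
age_screen_detected = 62     # detected earlier by screening

without_screening = survival_after_diagnosis(age_clinical_diagnosis, age_at_death)
with_screening = survival_after_diagnosis(age_screen_detected, age_at_death)
lead_time = age_clinical_diagnosis - age_screen_detected

print(without_screening, with_screening, lead_time)  # 5 8 3
```

Survival "improves" from 5 to 8 years, but the whole gain is the 3-year lead time; the patient does not live any longer.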
what is the Pygmalion effect
researchers subconsciously measure or record data in a way that favours the expected outcome
late look bias
information gathered at an inappropriate time
e.g. studying a fatal disease after many patients have already died, so only survivors are included
phases of a trial
Phase I: healthy volunteers / pharmacology
Phase II: larger group / efficacy
Phase III: even larger group / confirmatory
Phase IV: after licensing / post-marketing
how can you reduce confounding (at the design stage)
randomisation
restriction
matching
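As one illustration of randomisation, the sketch below uses permuted block randomisation, a common variant chosen here as an example (not necessarily the scheme the card has in mind). It keeps the two arms balanced as participants are enrolled. The labels "A"/"B", the function name and the default block size of 4 are assumptions.

```python
import random


def block_randomise(n_participants, block_size=4, seed=None):
    """Allocate participants to 'A' (intervention) or 'B' (control)
    using permuted blocks, so group sizes stay balanced (1:1)."""
    if block_size % 2 != 0:
        raise ValueError("block_size must be even for 1:1 allocation")
    rng = random.Random(seed)
    allocations = []
    while len(allocations) < n_participants:
        # Each block holds equal numbers of A and B in a random order
        block = ["A"] * (block_size // 2) + ["B"] * (block_size // 2)
        rng.shuffle(block)
        allocations.extend(block)
    return allocations[:n_participants]


print(block_randomise(10, seed=1))
```

Because allocation is random, known and unknown confounders should, on average, be distributed evenly between the groups.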
what can be done at analysis stage to reduce confounding
stratification
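Stratification can be sketched with the Mantel-Haenszel pooled odds ratio, one standard way (assumed here, not named on the card) of combining stratum-specific 2x2 tables. Each stratum is a tuple `(a, b, c, d)` = exposed cases, exposed non-cases, unexposed cases, unexposed non-cases; the counts are invented so that the exposure has no effect within either stratum.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a single 2x2 table."""
    return (a * d) / (b * c)


def crude_odds_ratio(strata):
    """Collapse all strata into one 2x2 table (ignores the confounder)."""
    a, b, c, d = (sum(s[i] for s in strata) for i in range(4))
    return odds_ratio(a, b, c, d)


def mantel_haenszel_odds_ratio(strata):
    """Pooled odds ratio adjusted for the stratifying variable."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Made-up data: strata defined by a hypothetical confounder (present / absent)
strata = [(40, 10, 20, 5), (5, 20, 10, 40)]
crude = crude_odds_ratio(strata)
mh = mantel_haenszel_odds_ratio(strata)
print([odds_ratio(*s) for s in strata])  # [1.0, 1.0]
print(crude)                             # 2.25
print(mh)                                # 1.0
```

The crude odds ratio suggests an association, but once the analysis is stratified by the confounder the association disappears.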
what is regression?
a method to predict the value of a dependent variable from one or more independent variables
e.g. linear or non-linear; positive (+) or negative (-) regression
strength of the relationship: weak, moderate, strong
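A minimal sketch of simple linear regression by ordinary least squares, fitting y = intercept + slope * x. The function name and the toy data (which lie exactly on y = 1 + 2x) are hypothetical; the sign of the slope gives positive or negative regression.

```python
def fit_linear_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


x = [1, 2, 3, 4, 5]   # independent variable
y = [3, 5, 7, 9, 11]  # dependent variable, here exactly y = 1 + 2x
intercept, slope = fit_linear_regression(x, y)
print(intercept, slope)        # 1.0 2.0
print(intercept + slope * 6)   # predicted y at x = 6: 13.0
```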
which correlation coefficient can be used for two variables
Pearson's
for parametric data
what if the data are non-parametric
then you'd use Spearman's rank correlation
r_s for the sample, ρ (rho) for the population
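The difference between the two coefficients can be sketched directly: Spearman's r_s is Pearson's r computed on the ranks of the data, with tied values given their average rank. The function names and data are illustrative; libraries such as SciPy provide tested versions of both.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient (linear association)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of 1-based positions i..j
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman_rs(x, y):
    """Spearman's rank correlation: Pearson's r on the ranks."""
    return pearson_r(ranks(x), ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 100]  # monotonic but not linear
print(round(pearson_r(x, y), 3), spearman_rs(x, y))
```

Because the relationship is perfectly monotonic but not linear, Spearman's r_s is exactly 1 while Pearson's r is lower.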
parametric vs non parametric
The basic idea behind parametric methods is that a fixed set of parameters determines the probability model (an idea also used in machine learning). Parametric methods are those for which we know a priori that the population is normally distributed or, if not, for which the sampling distribution of the statistic (e.g. the mean) can be approximated by a normal distribution by invoking the Central Limit Theorem.
The parameters of the normal distribution are:
Mean
Standard Deviation
Non-parametric methods are statistical techniques that do not rely on specific assumptions about the underlying distribution of the population being studied. These methods are often referred to as “distribution-free” methods because they make no assumptions about the shape of the distribution.
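A small sketch of the contrast, using only Python's standard `statistics` module: the parametric summary assumes normality and reduces the data to its two parameters (mean and SD), while the non-parametric summary uses the median and interquartile range, which make no distributional assumption. The data values, including the outlier, are made up.

```python
import statistics

data = [4.1, 4.8, 5.0, 5.2, 5.5, 5.9, 6.3, 12.0]  # hypothetical values with an outlier

# Parametric summary: assume a normal distribution described by mean and SD
normal = statistics.NormalDist.from_samples(data)
print(f"mean={normal.mean:.2f}, sd={normal.stdev:.2f}")

# Non-parametric summary: median and interquartile range (no shape assumed)
q1, median, q3 = statistics.quantiles(data, n=4)
print(f"median={median:.2f}, IQR={q3 - q1:.2f}")
```

The outlier pulls the mean above the median, one reason skewed data are often summarised and analysed non-parametrically.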
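what is discrete data
data that can take only certain values, e.g. number of exacerbations within 1 year
vs continuous data, which can take any value within a range, e.g. weight
interval or ratio data
ratio: has a true zero, so ratios between values are meaningful, e.g. weight and height (used to calculate BMI)
interval: the difference between two values is meaningful, but there is no true zero, e.g. temperature in °C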
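A quick arithmetic sketch of why ratios only make sense for ratio data. Temperature in °C is used as the interval example and Kelvin (which has a true zero) as the comparison; BMI is computed as weight (kg) divided by height (m) squared. The helper names and values are illustrative.

```python
def bmi(weight_kg, height_m):
    """Body mass index = weight (kg) / height (m) squared."""
    return weight_kg / height_m ** 2


def celsius_to_kelvin(c):
    """Convert °C (interval scale) to K (ratio scale, true zero)."""
    return c + 273.15


# Ratio data: true zero, so "twice as heavy" is meaningful
print(80 / 40)   # 2.0

# Interval data: differences are meaningful, ratios are not
print(20 - 10)   # 10: a meaningful 10 degree difference
print(20 / 10)   # 2.0, but 20 °C is not "twice as hot" as 10 °C
print(celsius_to_kelvin(20) / celsius_to_kelvin(10))  # about 1.035 on the Kelvin scale

print(round(bmi(70, 1.75), 1))  # 22.9
```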
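Berkson's bias
a selection bias in hospital-based studies
patients are selected from hospital
in a case-control study, using hospital patients as controls distorts the association because they differ from the general population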