Week 7 - Selection Bias, sampling methods and information bias Flashcards

1
Q

What is random error?

A

Random error is error introduced solely by chance and is
inherent in the sampling process

2
Q

What is systematic error?

A

Also called bias.
Systematic error is introduced by human actions or decisions in the design and conduct of a study

3
Q

What is the sample vs. true population?

A

We rarely measure the true population value (mean,
%, etc.); instead we calculate an estimate of it from a
representative sample

4
Q

How can we decrease the random error in epidemiological studies?

A
  • Random error (chance) decreases as the sample size
    increases
  • It falls to zero if the total population is included
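The effect of sample size on random error can be illustrated with a small simulation. This is only a sketch: the population of blood-pressure values, the sample sizes, and the helper name `spread_of_sample_means` are assumptions made for illustration and are not part of the original card.

```python
import random
import statistics

def spread_of_sample_means(population, sample_size, repeats=500, seed=0):
    """Draw many random samples and return the spread (SD) of their means."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.sample(population, sample_size))
        for _ in range(repeats)
    ]
    return statistics.pstdev(means)

# Hypothetical source population: 5000 systolic blood pressure values.
rng = random.Random(42)
population = [rng.gauss(120, 15) for _ in range(5000)]

small = spread_of_sample_means(population, 25)
large = spread_of_sample_means(population, 400)
# Sampling everyone: every "sample" has the same mean, so the spread vanishes.
full = spread_of_sample_means(population, len(population), repeats=5)

print(f"n=25: {small:.2f}  n=400: {large:.2f}  whole population: {full:.2f}")
```

The spread of the sample means is a direct picture of random error: it shrinks as the sample grows and disappears when the whole population is measured.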
5
Q

What is a confidence interval of sample estimates?

A
  • A confidence interval indicates the level of uncertainty
    around the estimated measure
  • Most studies report the 95% confidence interval (95% CI)
  • The 95% CI is a range within which we can be 95%
    confident that the true population measure lies;
    the larger the sample size, the narrower the 95% CI
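As a worked illustration, a 95% CI for a mean can be approximated as the estimate plus or minus 1.96 standard errors. The normal approximation, the multiplier 1.96, the function name `ci95_mean`, and the example values are assumptions for this sketch; the card itself does not give a formula.

```python
import math
import statistics

def ci95_mean(values):
    """Approximate 95% CI for a mean: estimate +/- 1.96 * standard error."""
    n = len(values)
    mean = statistics.fmean(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    margin = 1.96 * standard_error
    return mean - margin, mean + margin

# Hypothetical measurements; the large sample repeats the same values 20 times
# so that both samples share the same mean (120) but differ in size.
small_sample = [100, 110, 120, 130, 140]
large_sample = small_sample * 20

low_s, high_s = ci95_mean(small_sample)
low_l, high_l = ci95_mean(large_sample)

print(f"n={len(small_sample)}: {low_s:.1f} to {high_s:.1f}")
print(f"n={len(large_sample)}: {low_l:.1f} to {high_l:.1f}")
```

Both intervals are centred on the same estimate, but the interval from the larger sample is much narrower.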
6
Q

How can we lower systematic error?

A
  • Systematic error (bias) is not influenced by sample size, so increasing the sample size does not reduce it
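A short simulation can make this concrete. It assumes a hypothetical faulty scale that reads 2 kg too heavy (`SCALE_OFFSET`); the weights, the offset, and the function name are invented for illustration only.

```python
import random
import statistics

rng = random.Random(1)
# Hypothetical true body weights (kg) of the source population.
true_weights = [rng.gauss(70, 10) for _ in range(10000)]
SCALE_OFFSET = 2.0  # assumed systematic measurement error of the scale

def estimation_error(n):
    """Error of the estimated mean when n people are weighed on the faulty scale."""
    sample = rng.sample(true_weights, n)
    measured = [w + SCALE_OFFSET for w in sample]
    return statistics.fmean(measured) - statistics.fmean(true_weights)

error_small = estimation_error(50)
error_full = estimation_error(len(true_weights))

print(f"n=50: {error_small:+.2f} kg   whole population: {error_full:+.2f} kg")
```

Even when the whole population is measured, so that random error is gone, the estimate is still off by the full 2 kg.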
7
Q

What is selection bias?

A
  • Selection bias is systematic error resulting from the fact
    that the participants included in the study are not
    representative of the population from where they were
    selected (source population)
  • Selection bias leads to a biased sample, which will almost
    always give rise to biased estimates
  • The sampling method of choice plays a major role in the
    representativeness of the sample
8
Q

What is a representative sample?

A
9
Q

What is a non-representative sample?

A
10
Q

What are the three sampling methods?

A
  1. Probability (random) sampling: sample selected by
    probabilistic methods; involves random selection,
    allowing you to make strong statistical inferences about
    the whole group
  2. Systematic sampling: sample selected according to some
    simple, systematic rule
  3. Non-probability sampling: sample selected by easily
    employed (convenient) methods; involves non-random selection
    based on convenience or other criteria, allowing you to
    collect data easily.
11
Q

Sampling methods summary

A
12
Q

What is simple random sampling?

A
  • Often referred to simply as ‘random sampling’
  • The most straight-forward of all random sampling methods
  • All individuals in the sampling frame have the same
    probability of being selected independently of all others
  • It is mainly used in quantitative research.
  • Given a large sample size, random sampling helps ensure that the
    chosen individuals are representative of the source
    population in terms of:
    – Demography (e.g. age, sex, ethnicity)
    – Other important factors (e.g. clinical history, current disease status,
    lifestyle factors, etc.)
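A minimal sketch of simple random sampling, assuming the sampling frame is available as a list of participant IDs. The ID format, the sample size, and the function name `simple_random_sample` are hypothetical.

```python
import random

def simple_random_sample(sampling_frame, n, seed=None):
    """Select n individuals so that each has the same chance of being chosen."""
    rng = random.Random(seed)
    # random.Random.sample draws without replacement.
    return rng.sample(sampling_frame, n)

# Hypothetical sampling frame of 1000 participant IDs.
frame = [f"ID{i:04d}" for i in range(1, 1001)]
sample = simple_random_sample(frame, 50, seed=7)

print(sample[:5])
```

The seed is only there to make the draw repeatable; in practice the key requirement is a complete sampling frame.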
13
Q

What are the advantages and disadvantages of Simple Random Sampling?

A

Advantages
  • Ensures a representative sample from the source population
    – Provided that the sample size is large enough
  • Less costly and less time-consuming than other, more
    sophisticated sampling methods
  • Ideal for quantitative studies and hypothesis testing
Disadvantages
  • If the sampling frame is too large and/or the population is
    geographically diverse, it may be impractical to perform
  • If a large sample is required, simple random sampling may be
    time-consuming and costly

14
Q

What is Stratified Random Sampling?

A
  • Same principles as simple random sampling but
    within strata (subgroups) of the population
    – in terms of key demographic characteristics
  • The size of the random sample should be proportional
    to the specific stratum size in the population
15
Q

An example of stratified random sampling.

A
  • The company has 800 female employees and
    200 male employees.
  • You need a sample of 100
  • You sort the population into two strata based
    on gender.
  • You want to ensure that the sample reflects
    the gender balance of the company so you use
    random sampling on each group, selecting 80
    women and 20 men, which gives you a
    representative sample of 100 people.
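The company example can be sketched in code as proportional allocation followed by a simple random draw within each stratum. The employee IDs and the function name `stratified_sample` are made up for illustration. Note that rounding the per-stratum sizes may not add up exactly to the target in other settings.

```python
import random

def stratified_sample(strata, total_n, seed=None):
    """Draw a random sample from each stratum, sized in proportion to the stratum."""
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Proportional allocation: stratum share of the population times total_n.
        stratum_n = round(total_n * len(members) / population_size)
        sample[name] = rng.sample(members, stratum_n)
    return sample

# Hypothetical employee lists matching the example: 800 women, 200 men.
employees = {
    "female": [f"F{i}" for i in range(800)],
    "male": [f"M{i}" for i in range(200)],
}
chosen = stratified_sample(employees, 100, seed=3)

print({name: len(people) for name, people in chosen.items()})
# {'female': 80, 'male': 20}
```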
16
Q

What is the procedure for Stratified Random Sampling?

A
17
Q

What are the advantages and disadvantages of Stratified Random Sampling?

A

Advantages
  • It allows you to draw more precise conclusions by ensuring
    that every subgroup is properly represented in the sample
  • Enables the comparison of population subgroups
Disadvantages
  • More time-consuming than simple random sampling
  • Higher complexity might give rise to errors (e.g.
    stratification not conducted properly)

18
Q

What is cluster sampling?

A
  • Based on the hierarchical structure of natural clusters
    (groups) of individuals within the population
    – Natural clusters may be hospitals, schools, streets, city
    districts, etc.
  • Involves taking a random sample of these natural clusters,
    and then selecting all individuals in the selected clusters
  • The sampling frame is a list of all clusters.
  • If it is practically possible, you might include every
    individual from each sampled cluster. If the clusters
    themselves are large, you can also sample individuals from
    within each cluster using one of the techniques above
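A minimal sketch of one-stage cluster sampling, where the sampling frame is the list of clusters and everyone in a selected cluster is included. The schools, pupil IDs, and the function name `cluster_sample` are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters, then include every individual in them."""
    rng = random.Random(seed)
    # The sampling frame is the list of cluster names, not of individuals.
    chosen_names = rng.sample(sorted(clusters), n_clusters)
    individuals = [person for name in chosen_names for person in clusters[name]]
    return chosen_names, individuals

# Hypothetical population: 10 schools with 30 pupils each.
schools = {
    f"school_{s}": [f"s{s}_pupil{p}" for p in range(30)]
    for s in range(10)
}
chosen_schools, pupils = cluster_sample(schools, 3, seed=11)

print(chosen_schools, len(pupils))
```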
19
Q

What is the cluster sampling procedure?

A
20
Q

What are the advantages and disadvantages of cluster sampling?

A

Advantages
  • Good for dealing with large and dispersed populations
  • Less costly and less time-consuming
Disadvantages
  • Substantial differences between clusters can cause errors
  • It’s difficult to guarantee that the sampled clusters are
    really representative of the whole population
  • Representativeness may be compromised if
    – Too few clusters are selected and/or
    – Clusters are too specific and/or
    – Clusters contain too few individuals

21
Q

What is multi-stage sampling?

A
  • Utilizes the hierarchical structure of natural clusters (groups)
    of individuals within the population
    – Similarly to cluster sampling
  • After randomly selecting clusters, there is a random
    selection of individuals within the cluster
  • May involve several random sampling stages:
    – Stage 1: Random selection of large clusters, e.g. schools
    – Stage 2: Random selection of smaller clusters within large clusters,
    e.g. classes
    – Stage 3: Random selection of individuals within smaller clusters
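The three stages can be sketched as nested random draws. The school and class structure, the numbers chosen at each stage, and the function name `multi_stage_sample` are assumptions for illustration.

```python
import random

def multi_stage_sample(schools, n_schools, n_classes, n_pupils, seed=None):
    """Three-stage sample: schools, then classes within them, then pupils."""
    rng = random.Random(seed)
    selected = []
    for school in rng.sample(sorted(schools), n_schools):               # stage 1
        classes = schools[school]
        for class_name in rng.sample(sorted(classes), n_classes):       # stage 2
            selected.extend(rng.sample(classes[class_name], n_pupils))  # stage 3
    return selected

# Hypothetical population: 8 schools, 6 classes per school, 25 pupils per class.
schools = {
    f"school{s}": {
        f"class{c}": [f"school{s}-class{c}-pupil{p}" for p in range(25)]
        for c in range(6)
    }
    for s in range(8)
}
sample = multi_stage_sample(schools, n_schools=4, n_classes=2, n_pupils=5, seed=5)

print(len(sample), sample[:3])
```

Unlike one-stage cluster sampling, only a random subset of individuals is taken from each selected cluster.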
22
Q

What are the advantages and disadvantages of Multi-stage Sampling?

A

Advantages
  • Multi-stage sampling may improve sample representativeness
    (compared to simple random sampling)
    – Especially if the population is geographically diverse and/or
    the sample is too small
  • Less costly and less time-consuming (depending on the number
    of stages, however)
Disadvantages
  • The representativeness of the sample may be compromised if
    – Too few clusters are selected and/or
    – Clusters are too specific and/or
    – Clusters contain too few individuals

23
Q

What is Systematic Sampling?

A
  • Sample selected according to some simple, systematic rule,
    but not randomly
  • Sample may end up being equivalent to a simple random
    sample, provided there was no biasing pattern in the system
    of selection
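A minimal sketch of one common systematic rule, taking every k-th individual from an ordered list. The rule itself, the fixed starting position, and the function name `systematic_sample` are assumptions, since the card does not specify a particular system. The second example shows how a pattern in the list can bias the result.

```python
def systematic_sample(sampling_frame, n, start=0):
    """Take every k-th individual from an ordered list, beginning at `start`."""
    step = len(sampling_frame) // n  # sampling interval k
    return sampling_frame[start::step][:n]

# Hypothetical patient list numbered 1 to 100: every 10th patient is chosen.
patients = list(range(1, 101))
every_tenth = systematic_sample(patients, 10)
print(every_tenth)  # [1, 11, 21, 31, 41, 51, 61, 71, 81, 91]

# A biasing pattern: the list alternates M, F, M, F, ... and the interval
# is even, so only one sex is ever selected.
alternating = ["M", "F"] * 50
biased = systematic_sample(alternating, 10)
print(set(biased))  # {'M'}
```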
24
Q

What is the Systematic Sampling procedure?

25
What are the advantages and disadvantages of Systematic Sampling?
Advantages
  • An acceptable, more convenient alternative approach if for some reason random sampling is not possible
  • Faster and possibly also cheaper
Disadvantages
  • The representativeness of the sample may be compromised if the chosen system selects individuals in a non-random fashion
26
What is Proportional Quota Sampling?
  • Same principle as stratified random sampling
    – The sample is selected in a weighted manner based on predefined strata (distinct population subgroups)
  • Instead of being filled by random sampling, the strata are filled by non-random sampling (systematic or other)
    – For example, if a total sample size of 1000 is required and the population consists of 40% women and 60% men, then (non-random) sampling continues until these percentages are obtained and the overall sample quota is met
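The 40% / 60% example can be sketched as filling quotas from people in the order they happen to arrive. The arrival stream, the quota labels, and the function name `quota_sample` are hypothetical.

```python
from itertools import cycle, islice

def quota_sample(arrivals, quotas):
    """Accept people in arrival order (non-random) until every quota is full."""
    remaining = dict(quotas)
    sample = []
    for person_id, group in arrivals:
        if remaining.get(group, 0) > 0:
            sample.append((person_id, group))
            remaining[group] -= 1
        if not any(remaining.values()):
            break  # all quotas met
    return sample

# Target of 1000 people: 40% women and 60% men.
quotas = {"women": 400, "men": 600}
# Hypothetical arrival stream: 3000 people, alternating women and men.
arrivals = list(enumerate(islice(cycle(["women", "men"]), 3000)))
sample = quota_sample(arrivals, quotas)

counts = {g: sum(1 for _, grp in sample if grp == g) for g in quotas}
print(counts)  # {'women': 400, 'men': 600}
```

The strata proportions match the population, but who ends up inside each stratum depends entirely on arrival order, which is where selection bias can enter.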
27
What are the advantages and disadvantages of Proportional Quota Sampling?
Advantages
  • An acceptable, more convenient alternative approach if for some reason stratified random sampling is not possible
  • Compared to simple systematic sampling, it can preserve the original population structure, as it uses predefined population strata
Disadvantages
  • The representativeness of the sample may be compromised, as individuals are selected in a non-random fashion
28
What is convenience sampling?
  • Convenience sampling is the most frequent example of non-probability sampling
  • Individuals are selected in a non-random fashion, solely based on convenience (i.e. they are easy to access)
29
What is the Convenience Sampling procedure?
30
What are the advantages and disadvantages of Convenience Sampling?
Advantages
  • Cheap, fast and convenient
Disadvantages
  • The representativeness of the sample is very likely to be compromised, as individuals are selected in a non-random fashion
31
How do you know which sampling method to choose?
  • Depends on:
    – The aim of the study
    – The nature of the source population
    – The sample size
    – Other practical issues (e.g. financial resources, time availability, etc.)
  • When no financial or time constraints exist:
    – It is always strongly advised to use probability (random) sampling techniques in order to minimize selection bias
    – Stratified random sampling is the ideal method if the sample is small
  • When non-random sampling techniques have been used:
    – The representativeness of the sample is always questionable
    – Assume that selection bias is operating to some extent
32
How does sampling method affect descriptive research?
In descriptive research (e.g. investigating the prevalence of a disease in a population), it is extremely important to have a sample that is as representative as possible, as selection bias will greatly influence the findings
33
How does sampling method affect analytic research?
In analytic research (e.g. investigating exposure-outcome associations), minor deviations from a perfectly representative sample may be acceptable, as minor selection bias may not affect the findings to a large extent
34
Which sampling method is not preferred?
Convenience Sampling
35
What are the 2 types of Systematic Error (bias)?
  1. Selection bias: systematic error arising from mistakes made during the selection of the study sample.
  2. Information bias: systematic error arising from mistakes made during the measurement of key study variables (exposure and outcome).
36
What is information bias?
  • Information bias arises from wrong or inaccurate assessment of either the exposure or the outcome variables
  • Such mistakes may arise from the researchers’ part (unintentionally) or from the participants’ part (unintentionally or intentionally)
  • There is also instrument bias (fault of the instrument), which falls under the researchers’ part
37
What is assessor bias?
  • Wrong/inaccurate diagnosis due to a clinical error
  • May occur when researchers are not “blinded” to exposure or outcome status of participants
  • Wrong/inaccurate measurements due to a faulty instrument/machine
  • Wrong/inaccurate measurements due to poor training of the assessor
  • Mistakes during recording of the data and transferring data from paper form into electronic form
38
How can information bias arise from participant action / misinterpreting?
  • Wrong/inaccurate answers from participants due to misinterpretation of a question
  • Wrong/inaccurate answers from participants due to a sensitive issue relating to the question
  • Wrong/inaccurate answers from participants due to poor recall (recall bias)
  • Wrong/inaccurate answers given by participants intentionally
  • Overall, information bias arising from participant actions is called response bias
39
What are the 6 types of information bias?
  1. Recall bias
  2. Interviewer bias
  3. Observer bias
  4. Hawthorne effect
  5. Surveillance bias
  6. Misclassification bias
40
What is recall bias?
Participants with a particular outcome or exposure may remember events more clearly or amplify their recollections. Recall bias is very common in case-control studies, where the difference arises more from under-reporting of exposures in the control group than from over-reporting in the case group
41
What is interviewer bias?
A researcher’s knowledge may influence the structure of questions and the manner of presentation, which may influence responders. This can occur in any study design (especially if interviewers are not blinded to exposures)
42
What is observer bias?
Researchers may have preconceived expectations of what they should find in an examination (especially if they are not blinded to exposures or medical history)
43
What is the Hawthorne effect?
Participants act differently if they know they are being watched.
44
What is Surveillance bias?
The group with the known exposure or outcome may be followed more closely or longer than the comparison group (researcher’s bias).
45
What is Misclassification bias?
Errors are made in classifying either disease or exposure status (instrument).
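A small worked example can show how misclassified exposure status distorts an association. It assumes a hypothetical case-control study and an exposure measure that is equally imperfect in cases and controls (80% sensitivity, 90% specificity). All counts, the error rates, and the function names are invented for illustration, and the odds ratio is used here only as a convenient measure of association.

```python
def odds_ratio(cases_exp, cases_unexp, controls_exp, controls_unexp):
    """Odds ratio from the four cells of a 2x2 table."""
    return (cases_exp * controls_unexp) / (cases_unexp * controls_exp)

def misclassify(exposed, unexposed, sensitivity, specificity):
    """Expected observed counts when exposure is measured imperfectly."""
    observed_exposed = exposed * sensitivity + unexposed * (1 - specificity)
    observed_unexposed = exposed * (1 - sensitivity) + unexposed * specificity
    return observed_exposed, observed_unexposed

# Hypothetical true table: 60/40 exposed/unexposed cases, 40/60 controls.
true_or = odds_ratio(60, 40, 40, 60)

# Same measurement error applied to both groups (assumed values).
cases_obs = misclassify(60, 40, sensitivity=0.8, specificity=0.9)
controls_obs = misclassify(40, 60, sensitivity=0.8, specificity=0.9)
observed_or = odds_ratio(cases_obs[0], cases_obs[1], controls_obs[0], controls_obs[1])

print(f"true OR = {true_or:.2f}, observed OR = {observed_or:.2f}")
```

Under these assumptions the observed odds ratio is pulled towards 1, so the association looks weaker than it really is.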
46
What are the two types of errors?
  1. Systematic error:
    a. Information error
    b. Selection error
  2. Random error
47
How can you minimize bias?
  • Be purposeful in the study design to minimize the chance of bias
    – Example: use more than one control group
  • Define, a priori, who is a case and what constitutes exposure so that there is no overlap
    – Define categories within groups clearly (age groups, aggregates of person-years)
  • Set up strict guidelines for data collection
    – Train observers or interviewers to obtain data in the same fashion
    – It is preferable to use more than one observer or interviewer, but not so many that they cannot be trained in an identical manner
    – Optimize the questionnaire
48
How does information bias affect study results?
  1. Fundamental principle of research: if you want to investigate any association between two factors, first make sure you measure these two factors accurately!
  2. Information bias can be introduced in the assessment of both the main exposure and the main outcome, so the association between them is likely to be distorted
  3. Information bias arising from participant actions is much more common than information bias arising from researcher actions
  4. Information bias mainly affects studies that rely on self-reports (i.e. questionnaire-based data collection)
    – In outcome assessment (measurement), in studies where self-reported disease status is used, there is usually double-checking (confirmation) with the personal GP of the participant or through medical records
    – Similarly, when assessing exposures (diet, physical activity, smoking, educational attainment, etc.), the most valid and reliable instruments have to be used
  5. If a study relies solely on self-reports, then it should be assumed that information bias (measurement error) is operating to some extent
  6. The presence of information bias always compromises the validity of the study results, and in such a case findings have to be interpreted with great caution
49
What should all assessment tools have?
  1. Validity
  2. Reliability
50
What is validity?
The extent to which an assessment tool (e.g. questionnaire, instrument, etc.) measures accurately what it is intended to measure
51
What is criterion validity?
Criterion validity is the most common type of validity used in medical research. In such a case, the results from the assessment tool of interest are compared with those of an established (known as gold standard) assessment tool
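One common way to summarise such a comparison for a yes/no tool is sensitivity and specificity against the gold standard. These two measures are not named on the card and are used here as an assumed illustration; the data and the function name are hypothetical.

```python
def sensitivity_specificity(gold_standard, new_tool):
    """Compare a binary tool with a gold standard (1 = positive, 0 = negative)."""
    pairs = list(zip(gold_standard, new_tool))
    true_pos = sum(1 for g, t in pairs if g == 1 and t == 1)
    false_neg = sum(1 for g, t in pairs if g == 1 and t == 0)
    true_neg = sum(1 for g, t in pairs if g == 0 and t == 0)
    false_pos = sum(1 for g, t in pairs if g == 0 and t == 1)
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity

# Hypothetical results for 10 participants.
gold = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
tool = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

sens, spec = sensitivity_specificity(gold, tool)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```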
52
What is reliability?
The overall consistency of a measure, as regards producing the same results when administered under the same conditions in the same group of people. Also known as reproducibility or repeatability
53
What are the two main types of reliability?
  1. Inter-observer reliability: the degree of agreement between the results when two or more researchers (observers) administer the assessment tool on the same people under the same conditions
  2. Intra-observer reliability: the agreement between results when the assessment tool is used by the same researcher (observer) on two or more occasions (under the same conditions and in the same test population)
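Agreement between two observers can be quantified in several ways. The sketch below computes simple percent agreement and Cohen's kappa, which corrects for agreement expected by chance. Kappa is not mentioned on the card and is included here as an assumed example; the ratings and function names are hypothetical.

```python
def percent_agreement(rater_a, rater_b):
    """Share of subjects on whom the two raters give the same result."""
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Agreement beyond chance: (observed - expected) / (1 - expected)."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    categories = set(rater_a) | set(rater_b)
    # Chance agreement: product of each rater's marginal proportions.
    expected = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    return (observed - expected) / (1 - expected)

# Hypothetical diagnoses (1 = disease, 0 = no disease) for the same 10 people.
observer_1 = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
observer_2 = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]

agreement = percent_agreement(observer_1, observer_2)
kappa = cohens_kappa(observer_1, observer_2)
print(f"agreement = {agreement:.2f}, kappa = {kappa:.2f}")
```

The same functions could be applied to one observer's ratings on two occasions to look at intra-observer reliability.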
54
What does it mean for a study to be internally valid?
If a determination is made that the findings of a study were not due to any one of the three sources of error (usually taken to be random error, bias, and confounding), then the study is considered internally valid. In other words, the conclusions reached are likely to be correct for the circumstances of that particular study.
55
What is external validity?
External validity is the extent to which the findings of a study can be generalized to other circumstances. A study being internally valid does not necessarily mean that its findings are externally valid
56
NB!
DO NOT COMPROMISE INTERNAL VALIDITY IN THE GOAL OF GENERALISATION