Sampling Error and Bias Flashcards by Ellen Masters

Why does increasing sample size reduce standard error?

The law of large numbers: as the sample grows, extreme values have less influence on the average (their effect is diluted), so sample means vary less from one sample to the next.
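
As a minimal sketch of the point above, assuming the usual formula for the standard error of a mean (sample standard deviation divided by the square root of the sample size). The function name and the SD of 10 are illustrative, not from the card.

```python
import math

def standard_error(sample_sd: float, n: int) -> float:
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return sample_sd / math.sqrt(n)

# Same spread (SD = 10, an illustrative value), growing sample size.
for n in (25, 100, 400):
    print(n, standard_error(10.0, n))
```

Under this formula, quadrupling the sample size halves the standard error.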

What are the 2 ways to increase the power of a study?

Increase sample size

Reduce variability - sample from a more homogeneous population

What are type I and type II errors?

Type I is where you wrongly reject the null hypothesis, concluding that a difference exists when in reality it doesn’t.

Type II is where you wrongly accept (fail to reject) the null hypothesis, assuming no difference exists when in reality it does.
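
The two definitions can be laid out as a small decision table. This is only an illustrative sketch: the function name is hypothetical, and in a real study the truth about the null hypothesis is not known.

```python
def decision_outcome(null_is_true: bool, null_rejected: bool) -> str:
    """Label a test decision given the (normally unknown) truth."""
    if null_is_true and null_rejected:
        return "type I error"       # difference claimed, none exists
    if not null_is_true and not null_rejected:
        return "type II error"      # real difference missed
    return "correct decision"

# Walk through all four combinations of truth and decision.
for truth in (True, False):
    for rejected in (True, False):
        print(truth, rejected, decision_outcome(truth, rejected))
```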

What is random error and how is it measured?

The natural variation that arises from drawing a random sample. It is measured by the standard error.

How can you reduce the effect of random error?

Increasing sample size

What are the types of systematic error (bias)?

Measurement bias, sampling (selection) bias and reporting bias

How can sampling/selection bias occur?

The sample drawn is not representative of the population:

Undercoverage (e.g. online surveys underrepresent the elderly)
Sampling frame error (the sampling frame includes people who would never be involved)
Non-response bias (the survey doesn’t account for non-response)

The baseline characteristics of the 2 groups to be compared are not equal
e.g. the experimental group is chosen while the controls are healthy volunteers (voluntary response bias)

How can measurement bias occur?

Variation in measurements
Different data collectors might vary in method
Instruments not correctly calibrated
Performance bias (e.g. cases more likely to have a knowledge of the disease and symptoms + better previous medical records)
Detection bias (e.g. investigators paying more attention to symptoms of those known to be in case/experimental group)

How can reporting bias occur?

Citation bias (not citing papers that contradict your argument)
Publication bias (not reporting non-significant results)
Language bias (only reporting English studies)

What are the types of sampling scheme?

Simple random sampling
Systematic sampling
Cluster sampling
Stratified sampling

Describe the steps of simple random sampling

Define and identify the survey population
Define the sampling frame (all units in a list)
Number each unit
Determine the sample size
Randomly draw units until the sample size is reached (usually with a random number generator)
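
The steps above can be sketched with Python's standard `random` module. The frame of 500 numbered units, the sample size of 20 and the function name are illustrative assumptions.

```python
import random

def simple_random_sample(frame, sample_size, seed=None):
    """Draw sample_size distinct units, each equally likely to be chosen."""
    rng = random.Random(seed)          # seeded only to make the draw repeatable
    return rng.sample(list(frame), sample_size)

# Sampling frame: every unit listed and numbered (hypothetical labels).
frame = [f"unit-{i}" for i in range(1, 501)]
sample = simple_random_sample(frame, 20, seed=42)
print(sample)
```

`random.sample` draws without replacement, so no unit can appear twice.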

What are the advantages of simple random sampling?

Statistically the optimal method (each unit has an equal likelihood of being chosen)
Sampling error can easily be calculated
Simple to do

What are the disadvantages of simple random sampling?

Creating a sample frame can be difficult (not always detailed records of population)
Can have logistical challenges if random units chosen are far from each other
Minorities can easily be missed out

What is the difference between sampling with and without replacement?

Sampling without replacement means that the probability of being chosen changes after each unit is drawn, so units do not have an equal probability of being sampled at every draw. However, sampling with replacement often makes no sense, e.g. you don’t want the same person to fill out the questionnaire twice.
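
A small sketch of the difference, using a made-up population of five people. The helper name is hypothetical. It shows how the chance of any one remaining unit being picked shifts from draw to draw when units are not replaced.

```python
import random
from fractions import Fraction

def chances_without_replacement(population_size, draws):
    """Chance that any one remaining unit is picked at each successive draw."""
    if draws > population_size:
        raise ValueError("cannot draw more units than the population holds")
    return [Fraction(1, population_size - k) for k in range(draws)]

people = ["Ann", "Ben", "Cat", "Dev", "Eve"]   # illustrative population
rng = random.Random(0)

without_repl = rng.sample(people, 3)      # nobody can be picked twice
with_repl = rng.choices(people, k=3)      # the same person may repeat

print(chances_without_replacement(len(people), 3))
print(without_repl, with_repl)
```

With replacement the chance would stay at 1/5 on every draw, at the cost of possible repeats.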

What are the steps of systematic sampling?

Define and identify the sampling population
Create the sample frame (e.g. population of 10,182)
Arrange the units in a sequence (e.g. alphabetically by surname)
Determine sample size needed (e.g. 320)
Divide the sampling frame total by the sample size to get the sampling interval (e.g. 10,182/320 ≈ 32)
Choose a random starting point (between 1 and 32)
Draw units at regular intervals defined in step 5 (every 32nd unit after the first was chosen randomly)
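
The steps above can be sketched as follows, using the card's example figures (a frame of 10,182 and a target sample of 320). The function name is an assumption, and the frame is represented only by position numbers.

```python
import random

def systematic_sample(frame_size, sample_size, seed=None):
    """Return the 1-based positions selected by systematic sampling."""
    interval = round(frame_size / sample_size)          # sampling interval
    start = random.Random(seed).randint(1, interval)    # random starting point
    return list(range(start, frame_size + 1, interval))

positions = systematic_sample(10_182, 320, seed=1)
print(positions[:5], len(positions))
```

Because the interval is rounded to a whole number, the number of units drawn lands close to the target rather than exactly on it.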

What are the advantages of systematic sampling?

Ensures representativity
Simple to do
Sampling error easy to determine

What are the disadvantages of systematic sampling?

Creating sample frame can be difficult
If the ordered sampling frame contains a pattern, units or subgroups of units can end up with different probabilities of being chosen (e.g. if the sample frame alternates male/female and the sampling interval is even, the sample will include only 1 gender)

Why would you use cluster sampling?

Because random sampling can be logistically challenging, and it can be more practical to cluster the population and sample from representative clusters, e.g. schools/community centres

What are the steps of cluster sampling?

List the potential clusters
Create a cumulative list of all the units in all the clusters
Calculate the systematic sampling interval (by dividing cumulative total population by number of clusters wanted)
Choose random number at which to start (between 1 and sampling interval)
Select the unit at each sampling interval; the cluster that unit is in is the cluster chosen
Continue until the required number of clusters is reached
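
A rough sketch of the steps above. The cluster names and sizes are invented, the function name is hypothetical, and the interval is rounded down to a whole number for simplicity.

```python
import random
from bisect import bisect_left
from itertools import accumulate

def choose_clusters(cluster_sizes, n_clusters, start=None, seed=None):
    """Pick clusters by stepping through the cumulative list of units."""
    names = list(cluster_sizes)
    cumulative = list(accumulate(cluster_sizes[name] for name in names))
    interval = cumulative[-1] // n_clusters        # sampling interval
    if start is None:
        start = random.Random(seed).randint(1, interval)
    if not 1 <= start <= interval:
        raise ValueError("start must lie between 1 and the sampling interval")
    chosen = []
    for k in range(n_clusters):
        unit = start + k * interval                # position in the cumulative list
        chosen.append(names[bisect_left(cumulative, unit)])
    return chosen

# Hypothetical clusters (e.g. schools) and their numbers of units.
sizes = {"A": 100, "B": 200, "C": 150, "D": 250, "E": 100, "F": 200}
print(choose_clusters(sizes, 4, start=120))
```

In this sketch a cluster holding more units than the sampling interval can be hit more than once, and larger clusters are more likely to be chosen than smaller ones.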

What is the issue with variability in cluster sampling?

There’s a higher covariance inside clusters, meaning units within a cluster are likely to be more similar to one another than to units outside the cluster (e.g. kids from the same school are likely to be from the same socio-economic group). This gives a high intra-class correlation coefficient, so each extra unit from the same cluster adds less new information. The result is a higher variance in the estimate and therefore a larger sampling error. This can be counteracted by increasing the sample size, but that can be inefficient.
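
One common way to put a number on this is the design effect, 1 + (m - 1) x ICC, where m is the average cluster size. This formula is not on the card and is brought in here as an assumption. The function names and example values are illustrative.

```python
def design_effect(cluster_size: float, icc: float) -> float:
    """Variance inflation relative to simple random sampling."""
    return 1 + (cluster_size - 1) * icc

def effective_sample_size(n: int, cluster_size: float, icc: float) -> float:
    """Size of the simple random sample that would give the same precision."""
    return n / design_effect(cluster_size, icc)

# 1,000 children sampled in schools of 20, with an assumed ICC of 0.05.
print(design_effect(20, 0.05))
print(effective_sample_size(1000, 20, 0.05))
```

Even a small intra-class correlation shrinks the effective sample size noticeably when clusters are large.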

What are the advantages of cluster sampling?

More practical when dealing with a dispersed population
Can be the only way to sample, if you don’t have a sampling frame

What are the disadvantages of cluster sampling?

Covariance problem (less variability between units within a cluster than between units in different clusters). This increases the sampling error and standard error, so a larger sample size is needed.
Fewer clusters are logistically easier but give more sampling error and a lower sample size
Given the way the clusters are chosen, it is important that each cluster is the same size so that none is more likely to be chosen

Why might you choose a stratified sampling scheme?

If your population includes minorities at low frequency that your study requires to be represented

How does stratified sampling work?

The sampling frame is divided into homogeneous subgroups (strata) and then the units are chosen from them using random sampling

In stratified sampling, how is it ensured that the same representation of each sub-group in the main population is maintained in the sample?

Usually using probability proportional to size: calculate a sampling fraction by dividing the sample size by the population size (e.g. 22%), then take that percentage of each subgroup into the sample. This can leave a small subgroup with fewer units than the sample size calculations require. You can then either increase the percentage for all subgroups (which might make the total sample size too high) or sample disproportionately to size, taking fewer units from the biggest stratum and more from the smallest (the smaller group is then overrepresented, but the effect can be corrected afterwards).
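
A minimal sketch of the proportional approach described above. The strata names, their sizes and the total sample of 500 are made up for illustration, and the function names are hypothetical.

```python
import random

def proportional_allocation(strata_sizes, sample_size):
    """Give each stratum the same sampling fraction as the whole population."""
    total = sum(strata_sizes.values())
    return {name: round(sample_size * size / total)
            for name, size in strata_sizes.items()}

def stratified_sample(strata, sample_size, seed=None):
    """Draw a simple random sample of the allocated size within each stratum."""
    rng = random.Random(seed)
    sizes = {name: len(units) for name, units in strata.items()}
    allocation = proportional_allocation(sizes, sample_size)
    return {name: rng.sample(units, allocation[name])
            for name, units in strata.items()}

# Hypothetical population of 10,000 with two low-frequency minorities.
strata_sizes = {"majority": 9000, "minority_a": 800, "minority_b": 200}
strata = {name: [f"{name}-{i}" for i in range(size)]
          for name, size in strata_sizes.items()}

allocation = proportional_allocation(strata_sizes, 500)
sample = stratified_sample(strata, 500, seed=7)
print(allocation)
```

Here the smallest stratum receives only 10 units, which illustrates why a disproportionate allocation is sometimes needed.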

What are the advantages of stratified sampling?

Representation of minority groups
If units within strata are more homogeneous than the overall population, it can give better precision (focus on each stratum, then synthesise the results afterwards)
Can have strata within strata, but this can increase the sample size needed

What are the disadvantages of stratified sampling?

Can be very difficult to classify strata (not everyone fits clearly into one)
Hard to measure standard error
Sample sizes at the individual stratum level may be low, meaning high random error and potentially a loss of precision

What should you keep in mind when choosing a sampling scheme?

Population to be studied (size/geographical distribution)
Availability of a sample frame (is there a list of all units?)
Level of precision required
Resources available

Why do you need to calculate sample size?

Too small:
May miss a significant effect (type II error)
Estimates of effect too imprecise
Unethical - puts patients at risk for no scientific end

Too big:
Costly, wastes resources, takes too long to complete the research
Unethical - more patients than necessary are given an inferior treatment

What do you need to consider when calculating sample size?

The calculation gives an approximation of sample size (50s, 100s, not 53 and 112)
Most calculations assume simple random sampling
Different calculations are needed depending on the study
They assume random variation, so they won't be appropriate if there's systematic bias
They assume very large populations

What does a sample size calculation for a survey require?

1. Confidence level (z-score)
2. Precision you want (e.g. 10% = 0.1)
3. Proportion
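
These three inputs are commonly combined as n = z² × p(1 − p) / d², where d is the precision. The card does not give the formula, so it is stated here as an assumption. It also assumes simple random sampling from a very large population. The function name is illustrative.

```python
import math

def survey_sample_size(z: float, precision: float, proportion: float) -> int:
    """Sample size for estimating a proportion to within +/- precision."""
    n = (z ** 2) * proportion * (1 - proportion) / precision ** 2
    return math.ceil(n)          # round up: you cannot survey part of a person

# 95% confidence (z = 1.96) and an assumed proportion of 0.5.
print(survey_sample_size(1.96, 0.10, 0.5))   # precision of 10%
print(survey_sample_size(1.96, 0.05, 0.5))   # precision of 5%
```

Halving the precision margin roughly quadruples the required sample size, because precision enters the formula squared.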

What do you do if you have no estimates for proportion, and why?

Assume 0.5 (50%), because this gives the largest required sample size, so the sample will be big enough whatever the true proportion turns out to be
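
A quick check of why 0.5 is the safe choice, assuming the usual survey formula in which the required sample size is proportional to p(1 − p). The helper name is hypothetical.

```python
def variance_term(p: float) -> float:
    """The p(1 - p) factor that drives the required sample size."""
    return p * (1 - p)

# Candidate proportions from 0.1 to 0.9.
candidates = [i / 10 for i in range(1, 10)]
worst_case = max(candidates, key=variance_term)
print(worst_case, variance_term(worst_case))
```

Any proportion further from 0.5, in either direction, gives a smaller p(1 − p) and so a smaller required sample.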

What is the most appropriate measure to manipulate if you need a smaller sample size? (e.g. if low prevalence)

Precision

What is the power of a study?

The ability to detect a difference between 2 groups when one really exists (the probability of avoiding a type II error)

What can increase the power of a study?

Increasing sample size and reducing variation (sampling from a more homogeneous population)
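
Both effects can be seen in a rough power calculation. This sketch assumes a two-sided comparison of two group means with equal group sizes and a known common standard deviation, and uses a normal approximation that ignores the negligible opposite tail. The function name and all numbers are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(effect: float, sd: float, n_per_group: int,
                 alpha: float = 0.05) -> float:
    """Approximate power to detect a difference in means of size effect."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # significance threshold
    se_diff = sd * sqrt(2 / n_per_group)           # SE of the difference in means
    return NormalDist().cdf(abs(effect) / se_diff - z_crit)

print(approx_power(effect=5, sd=10, n_per_group=50))
print(approx_power(effect=5, sd=10, n_per_group=100))   # larger sample
print(approx_power(effect=5, sd=5, n_per_group=50))     # less variable population
```

Both the larger sample and the lower standard deviation shrink the standard error of the difference, which is what raises the power.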

Why is it important to have sufficient power?

It reduces the risk of a type II error (missing a significant effect)

What do you need to calculate sample size of a comparative study?

1. Threshold for a significant result (the significance level, which limits type I errors)
2. The power of the study (usually 80-90%)
3. The baseline level of the measure of interest (usually in the control group)
4. The minimum effect size you are aiming to detect (e.g. a clinically significant difference)
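
As a hedged sketch of how the four inputs fit together, here is one widely used normal-approximation formula for comparing two proportions with equal group sizes. The card does not name a formula, so this choice is an assumption, as are the function name and the example values.

```python
import math
from statistics import NormalDist

def sample_size_two_proportions(baseline: float, expected: float,
                                alpha: float = 0.05, power: float = 0.80) -> int:
    """Units needed per group to detect baseline vs expected proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # significance threshold
    z_beta = NormalDist().inv_cdf(power)            # desired power
    variance = baseline * (1 - baseline) + expected * (1 - expected)
    n = (z_alpha + z_beta) ** 2 * variance / (baseline - expected) ** 2
    return math.ceil(n)

# Baseline of 20% in controls, aiming to detect a change to 30%.
print(sample_size_two_proportions(0.20, 0.30))
print(sample_size_two_proportions(0.20, 0.30, power=0.90))
```

Asking for higher power, or trying to detect a smaller effect, pushes the required sample size up.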
