Sampling Error and Bias Flashcards Preview


Flashcards in Sampling Error and Bias Deck (37)

Why does increasing sample size reduce standard error?

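The law of large numbers: as the sample grows, extreme values have less influence on the sample mean (their effect is diluted), so the mean varies less from sample to sample. Standard error = standard deviation / √n, so it shrinks as n increases.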
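A minimal sketch of this effect, assuming a made-up population (mean 50, SD 10) that is not from the deck. It estimates the standard error as s / sqrt(n) for samples of increasing size; the printed values should fall as n grows.

```python
import math
import random
import statistics

def standard_error(sample):
    """Estimated standard error of the mean: sample SD / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

random.seed(1)  # fixed seed so the run is repeatable
# Hypothetical population, invented for illustration only.
population = [random.gauss(50, 10) for _ in range(100_000)]

for n in (25, 100, 400):
    sample = random.sample(population, n)  # simple random sample of size n
    print(n, round(standard_error(sample), 2))
```

Quadrupling the sample size roughly halves the standard error, because n sits under a square root.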


What are the 2 ways to increase power of a study?

Increase sample size
Reduce variability - sample from a more homogeneous population


What are type I and type II errors?

Type I is where you wrongly reject the null hypothesis - thinking a difference exists when it doesn't in reality.

Type II is where you wrongly fail to reject (accept) the null hypothesis - concluding no difference exists when it does in reality.


What is random error and how is it measured?

The natural (chance) variation between random samples drawn from the same population. Measured by standard error.


How can you reduce the effect of random error?

Increasing sample size


What are the types of systematic error (bias)?

Measurement bias, sampling (selection) bias and reporting bias


How can sampling/selection bias occur?

Sample drawn not representative of the population
-undercoverage e.g. online surveys underrepresent elderly
-sample frame error (when the sample frame includes people that would never be involved)
-non-response bias (survey doesn't account for non-response)

Baseline characteristics of the 2 groups to be compared are not equal
-e.g. the experimental group is chosen but the controls are healthy volunteers (voluntary response bias)


How can measurement bias occur?

-Variation in measurements
-Different data collectors might vary in method
-Instruments not correctly calibrated
-Performance bias (e.g. cases more likely to have a knowledge of the disease and symptoms + better previous medical records)
-Detection bias (e.g. investigators paying more attention to symptoms of those known to be in case/experimental group)


How can reporting bias occur?

-Citation bias (not citing papers that contradict your argument)
-Publication bias (not reporting non-significant results)
-Language bias (only reporting English studies)


What are types of sampling scheme?

-Simple random sampling
-Systematic sampling
-Cluster sampling
-Stratified sampling


Describe the steps of simple random sampling

1. Define and identify the survey population
2. Define the sampling frame (all units in a list)
3. Number each unit
4. Determine the sample size
5. Randomly draw units until the sample size is reached (usually with a random number generator)
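The steps above can be sketched in a few lines, assuming the sampling frame is already available as a list. The function name and the `household_…` labels are hypothetical, made up for illustration.

```python
import random

def simple_random_sample(sampling_frame, sample_size, seed=None):
    """Draw a simple random sample without replacement from a listed frame."""
    rng = random.Random(seed)  # the random number generator from step 5
    # Step 3: number each unit 1..N.
    numbered = dict(enumerate(sampling_frame, start=1))
    # Step 5: draw unit numbers at random until the sample size is reached.
    drawn_numbers = rng.sample(sorted(numbered), sample_size)
    return [numbered[i] for i in drawn_numbers]

# Hypothetical frame of 500 households.
frame = [f"household_{i}" for i in range(1, 501)]
sample = simple_random_sample(frame, 20, seed=42)
print(len(sample), sample[:3])
```

`random.Random.sample` draws without replacement, so no unit can appear twice.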


What are the advantages of simple random sampling?

-Statistically the optimal method (each unit has an equal likelihood of being chosen)
-Sampling error can easily be calculated
-Simple to do


What are the disadvantages of simple random sampling?

-Creating a sample frame can be difficult (not always detailed records of population)
-Can have logistical challenges if random units chosen are far from each other
-Minorities can easily be missed out


What is the difference between sampling with replacement or without?

Sampling without replacement means that the probability of being chosen changes after each unit is drawn (the remaining pool shrinks), so the draws do not all have equal probability. However, sampling with replacement often makes no sense - e.g. you don't want the same person to fill out the questionnaire twice.
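A small illustration of the difference, using a hypothetical population of 10 numbered people. `draw_probabilities` is a made-up helper showing how the chance of picking any one remaining unit rises with each draw when units are not put back.

```python
import random
from fractions import Fraction

def draw_probabilities(population_size, draws):
    """Chance of picking a given remaining unit at each successive draw,
    when sampling without replacement."""
    return [Fraction(1, population_size - k) for k in range(draws)]

print(draw_probabilities(10, 3))  # [Fraction(1, 10), Fraction(1, 9), Fraction(1, 8)]

rng = random.Random(0)
people = list(range(1, 11))
with_replacement = rng.choices(people, k=5)    # the same person may appear twice
without_replacement = rng.sample(people, k=5)  # each person appears at most once
print(with_replacement, without_replacement)
```

With replacement, every draw has probability 1/10 for each person; without it, the probabilities drift, although the change is tiny when the population is large relative to the sample.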


What are the steps of systematic sampling?

1. Define and identify the sampling population
2. Create the sample frame (e.g. population of 10,182)
3. Arrange the units in a sequence (e.g. alphabetically by surname)
4. Determine sample size needed (e.g. 320)
5. Divide total sampling frame by sample size (e.g. 10,182/320 ≈ 32)
6. Choose a random starting point (between 1 and 32)
7. Draw units at regular intervals defined in step 5 (every 32nd unit after the first was chosen randomly)
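A rough sketch of these steps, assuming the frame is already an ordered list. The function name is hypothetical, and rounding the interval to the nearest whole number (as in the example above) means the final sample can be slightly smaller or larger than the target.

```python
import random

def systematic_sample(ordered_frame, sample_size, seed=None):
    """Take every k-th unit from an ordered frame after a random start."""
    rng = random.Random(seed)
    # Step 5: sampling interval = frame size / sample size, rounded.
    interval = max(1, round(len(ordered_frame) / sample_size))
    # Step 6: random starting position between 1 and the interval.
    start = rng.randint(1, interval)
    # Step 7: draw the unit at the start and every interval-th unit after it.
    positions = range(start, len(ordered_frame) + 1, interval)
    return [ordered_frame[p - 1] for p in positions]

# Frame of 10,182 numbered units, target sample of 320 (interval 32).
frame = list(range(1, 10_183))
sample = systematic_sample(frame, 320, seed=3)
print(len(sample), sample[:3])
```

Here the sample comes out at 318 or 319 units depending on the starting point, which is why the interval is often rounded down when the target size is a hard minimum.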


What are the advantages of systematic sampling?

-Ensures representativity
-Simple to do
-Sampling error easy to determine


What are the disadvantages of systematic sampling?

-Creating sample frame can be difficult
-If there is a periodic pattern in the ordered sampling frame, units or subgroups can end up with different probabilities of being chosen (e.g. if the sample frame alternated male/female and the sampling interval was even, the sample would include only 1 gender)


Why would you use cluster sampling?

Because random sampling can be logistically challenging, and it can be more practical to cluster the population and sample from representative clusters, e.g. schools/community centres


What are the steps of cluster sampling?

1. List the potential clusters
2. Create a cumulative list of all the units in all the clusters
3. Calculate the systematic sampling interval (by dividing cumulative total population by number of clusters wanted)
4. Choose random number at which to start (between 1 and sampling interval)
5. Choose each unit at the sampling interval and the cluster that unit is in is the cluster chosen
6. Continue until the right number of clusters
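The cumulative-list procedure above can be sketched as follows. The function name and school sizes are hypothetical. Note that with this scheme a cluster larger than the sampling interval can be selected more than once.

```python
import random
from bisect import bisect_left
from itertools import accumulate

def select_clusters(cluster_sizes, n_clusters, seed=None):
    """Pick clusters by systematic sampling along a cumulative list of units."""
    rng = random.Random(seed)
    names = list(cluster_sizes)                                   # step 1
    cumulative = list(accumulate(cluster_sizes[n] for n in names))  # step 2
    interval = cumulative[-1] // n_clusters                       # step 3
    start = rng.randint(1, interval)                              # step 4
    chosen = []
    for k in range(n_clusters):                                   # steps 5-6
        unit = start + k * interval
        # The chosen cluster is the first one whose cumulative total
        # reaches this unit number.
        chosen.append(names[bisect_left(cumulative, unit)])
    return chosen

# Hypothetical schools and pupil counts.
schools = {"School A": 120, "School B": 300, "School C": 80,
           "School D": 450, "School E": 50}
print(select_clusters(schools, 3, seed=5))
```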


What is the issue with variability in cluster sampling?

There's a higher covariance inside clusters, meaning units within a cluster are likely to be more similar to one another than to units outside the cluster (e.g. kids from the same school are likely to be from the same socio-economic group). This gives a high intra-class correlation coefficient, which increases the variance of the overall estimate and therefore the sampling error. You can counteract this by increasing the sample size, but that can be inefficient.
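One common way to quantify this is the design effect, 1 + (m - 1) × ICC, where m is the number of units sampled per cluster. The deck does not name this formula, so treat it as a supplementary sketch; the figures below are invented and assume equal-sized clusters.

```python
def design_effect(cluster_size, icc):
    """Variance inflation from clustering: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc

def effective_sample_size(total_n, cluster_size, icc):
    """Size of a simple random sample that would give the same precision."""
    return total_n / design_effect(cluster_size, icc)

# Hypothetical survey: 30 clusters of 20 units each, ICC of 0.05.
print(round(design_effect(20, 0.05), 2))               # 1.95
print(round(effective_sample_size(600, 20, 0.05), 1))  # 307.7
```

In this example 600 clustered units carry roughly the information of 308 independently sampled ones, which is why the required sample size goes up.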


What are the advantages of cluster sampling?

-More practical when dealing with a dispersed population
-Can be the only way to sample, if you don't have a sampling frame


What are the disadvantages of cluster sampling?

-Covariance problem (less variability between units within a cluster than between units in different clusters). This increases the standard error, so a larger sample size is needed.
-Fewer clusters are logistically easier but give more sampling error and a lower sample size
-Given the way the clusters are chosen it is important each cluster is the same size so that none are more likely to be chosen


Why might you choose a stratified sampling scheme?

If your population includes minorities at low frequency that your study requires to be represented


How does stratified sampling work?

The sampling frame is divided into homogeneous subgroups (strata) and then the units are chosen from them using random sampling


In stratified sampling, how is it ensured that the same representation of each sub-group in the main population is maintained in the sample?

Usually using probability proportional to size (calculate a sample fraction by dividing sample size by population; the sample fraction is a %, e.g. 22%, so take 22% of each subgroup into the sample).
This can mean that the number of units from a small subgroup is less than what is required from sample size calculations, so you can then either increase the % for all (might lead to too high a total sample size) or sample disproportionate to size by removing some from the biggest stratum and using more from the smallest stratum (leads to the smaller group being overrepresented, but the effect can be corrected afterwards, e.g. by weighting)
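A minimal sketch of the proportionate approach, using the 22% sample fraction from the example. The strata names and sizes are hypothetical, and each stratum is sampled by simple random sampling.

```python
import random

def proportional_stratified_sample(strata, sampling_fraction, seed=None):
    """Apply the same sampling fraction within every stratum."""
    rng = random.Random(seed)
    sample = {}
    for name, units in strata.items():
        k = round(len(units) * sampling_fraction)  # units to take from this stratum
        sample[name] = rng.sample(units, k)        # simple random sample within it
    return sample

# Hypothetical strata holding 1000, 300 and 50 numbered units.
strata = {
    "urban": list(range(1000)),
    "rural": list(range(1000, 1300)),
    "nomadic": list(range(1300, 1350)),
}
sample = proportional_stratified_sample(strata, 0.22, seed=7)
print({name: len(units) for name, units in sample.items()})
# {'urban': 220, 'rural': 66, 'nomadic': 11}
```

The smallest stratum contributes only 11 units here, which illustrates why disproportionate sampling is sometimes needed.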


What are the advantages of stratified sampling?

-Representation of minority groups
-If variability within strata is lower than in the overall population (each stratum is more homogeneous), can give better precision (focus on each stratum, then synthesise results after)
-Can have strata within strata but this can increase sample size needed


What are the disadvantages of stratified sampling?

-Can be very difficult to classify strata (not everyone fits to 1 clearly)
-Hard to measure standard error
-Sample sizes at the individual stratum level may be low, meaning high random error and potentially a loss of precision


What should you keep in mind when choosing a sampling scheme?

-Population to be studied (size/geographical distribution)
-Availability of sample frame (is there a list of all units?)
-Level of precision required
-Resources available


Why do you need to calculate sample size?

Too small:
-May miss a significant effect (type 2 error)
-Estimates of effect too imprecise
-Unethical - put patients at risk for no scientific end

Too big:
-Costly, wasting resources, takes too long to complete research
-Unethical - give patients inferior treatment


What do you need to consider when calculating sample size?

-Gives an approximation of sample size (50s, 100s not 53 and 112)
-Most assume simple random sampling
-Different calculations depending on study
-Assume random variation, won't be appropriate if there's systematic bias
-Assumes very large populations


What does a sample size calculation for a survey require?

1. Confidence level (z-score)
2. Precision you want (e.g. 10% = 0.1)
3. Proportion
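These three inputs fit the usual formula for estimating a proportion, n = z² × p(1 - p) / d². The deck does not spell the formula out, so this is a sketch under that assumption, for simple random sampling from a very large population.

```python
import math
from statistics import NormalDist

def survey_sample_size(confidence, precision, proportion=0.5):
    """n = z^2 * p * (1 - p) / d^2, rounded up to a whole unit."""
    # Two-sided z-score for the chosen confidence level (about 1.96 for 95%).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * proportion * (1 - proportion) / precision ** 2)

print(survey_sample_size(0.95, 0.10))  # 97
print(survey_sample_size(0.95, 0.05))  # 385
```

Halving the margin of error from 10% to 5% roughly quadruples the required sample, because precision is squared in the denominator.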


What do you do if you have no estimates for proportion, and why?

Assume 0.5 (50%) because this will give the largest sample size
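A quick check of why 0.5 is the conservative choice, assuming the required sample size is proportional to p(1 - p), as in the standard proportion formula. `variance_term` is a made-up helper name.

```python
def variance_term(p):
    """The p(1 - p) factor that drives the required sample size."""
    return p * (1 - p)

candidates = [i / 10 for i in range(1, 10)]  # 0.1, 0.2, ..., 0.9
worst_case = max(candidates, key=variance_term)
print(worst_case, variance_term(worst_case))  # 0.5 0.25
```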


What is the most appropriate measure to manipulate if you need a smaller sample size? (e.g. if low prevalence)



What is the power of a study?

The probability of detecting a difference between 2 groups when one really exists (1 minus the type 2 error rate)


What can increase the power of a study?

Increasing sample size and reducing variation (sampling from a more homogeneous population)


Why is it important to have sufficient power?

Avoids type 2 error (missing a significant effect)


What do you need to calculate sample size of a comparative study?

1. Threshold for a significant result (significance level, e.g. 0.05; limits type 1 errors)
2. The power of a study (usually 80-90%)
3. The baseline level measure of interest (usually of control)
4. The minimum effect size you are aiming to detect (e.g. clinical significance)
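As a sketch of how these four inputs combine, the function below uses one common normal-approximation formula for comparing two proportions with equal group sizes. The deck does not say which formula it has in mind, so both the formula choice and the example numbers are assumptions.

```python
import math
from statistics import NormalDist

def sample_size_two_proportions(alpha, power, baseline, minimum_difference):
    """Units needed per group to detect a difference between two proportions."""
    p1 = baseline                        # input 3: baseline (control) level
    p2 = baseline + minimum_difference   # input 4: smallest effect worth detecting
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # input 1: two-sided threshold
    z_beta = NormalDist().inv_cdf(power)           # input 2: desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

# Hypothetical trial: 30% baseline, detect a rise to 40%, alpha 0.05, power 80%.
print(sample_size_two_proportions(0.05, 0.80, 0.30, 0.10))  # 354
```

Raising the power to 90% or shrinking the minimum difference both push the required group size up.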