# Feldmand. Module 5 - Sample size, power & tests Flashcards

Effect of an inadequate Sample Size

- If a study has an inadequate sample size, then a result with a null finding (no statistically significant association detected) is uninformative
- A true lack of association is difficult or impossible to distinguish from a true association that cannot be detected statistically because of inadequate power

Type I error

[2x2 table]

– α is the false-positive error rate, the probability of making a Type I error (α is the level of p-value at which you reject H0, often 0.05)

– Even if H0 is true, in repeated samples, we will reject H0 a proportion α of the time (we detect a difference when one doesn’t exist)

Example (a false positive: H0 is actually true, but it is rejected):

H0: you’re not pregnant (rejected)

H1: YOU’RE PREGNANT (accepted)

Type II error

– β is the false-negative error rate, the probability of making a Type II error

– Traditionally, β = 0.20. Thus, there is a 20% chance of failing to reject H0 when the alternative is true.

Example (a false negative: H1 is actually true, but H0 is not rejected):

H0: YOU’RE NOT PREGNANT (accepted)

H1: you’re pregnant

Power

- The power of a test (1–β) is the probability of rejecting H0 when HA is true, i.e., detecting a difference when one really exists!
- We want power to be as large as possible

Trade-off Between α and β

- Both types of error should ideally be minimized
- However, for a fixed sample size, a decrease in one type of error is achieved at the expense of an increase in the other
- For a given α and magnitude of effect (RR or OR), β can be reduced only by increasing the sample size
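
The last point can be illustrated numerically. The sketch below is a minimal example, assuming a two-sided z-test comparing two independent proportions with equal group sizes and a normal approximation; the cards do not specify a test. The function name `approx_power` and the example values (baseline risk 0.10, RR = 2) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(p0, rr, n_per_group, alpha=0.05):
    """Approximate power (1 - beta) of a two-sided two-proportion z-test.

    p0: baseline risk in the unexposed group (assumed value)
    rr: expected relative risk, so the exposed risk is p0 * rr
    """
    p1 = p0 * rr
    p_bar = (p0 + p1) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    numerator = (abs(p1 - p0) * sqrt(n_per_group)
                 - z_alpha * sqrt(2 * p_bar * (1 - p_bar)))
    denominator = sqrt(p0 * (1 - p0) + p1 * (1 - p1))
    return NormalDist().cdf(numerator / denominator)

# With alpha and the effect size held fixed, beta shrinks as n grows.
for n in (100, 200, 400):
    power = approx_power(p0=0.10, rr=2.0, n_per_group=n)
    print(f"n per group = {n}: power = {power:.2f}, beta = {1 - power:.2f}")
```

Under these assumptions, roughly 200 subjects per group are needed to bring β down to the traditional 0.20.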

Sample Size Considerations

• Sample size calculation should be done at the start of the study to ensure adequate power. The calculation differs depending on whether a categorical or a continuous variable is to be estimated.

The factors to take into consideration include:

- accuracy required
- sampling method to be used
- size of the smallest subgroup
- actual variability of the variable of interest in the population

• To calculate the sample size, you need to know:

– Desired values for the probabilities of α and β

– Baseline (nondiseased or nonexposed) exposure or outcome rates

– Expected magnitude of effect (RR or OR):

  • Often based on previous studies or reports

  • The minimum effect the investigator considers worth detecting
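
As a rough sketch of how these three inputs combine, the following uses a common normal-approximation formula for comparing two proportions with equal group sizes. The choice of formula, the function name `n_per_group`, and the example values are assumptions for illustration and are not taken from the cards. The effect is treated as a relative risk; an odds ratio would first need converting to an expected proportion.

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_per_group(p0, rr, alpha=0.05, beta=0.20):
    """Approximate subjects per group to detect a relative risk `rr`.

    p0: baseline outcome rate in the unexposed group
    alpha, beta: desired Type I and Type II error probabilities
    """
    p1 = p0 * rr                      # expected rate in the exposed group
    p_bar = (p0 + p1) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided
    z_beta = NormalDist().inv_cdf(1 - beta)
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p0 * (1 - p0) + p1 * (1 - p1))) ** 2
    return ceil(numerator / (p1 - p0) ** 2)

# A smaller effect worth detecting demands a larger study.
print(n_per_group(p0=0.10, rr=2.0))
print(n_per_group(p0=0.10, rr=1.5))
```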

What is precision (d)?

Precision is the maximum distance, in either direction, between the sample estimate and the true population proportion that the investigator considers acceptable

Sample size formula for Estimation of level of disease occurrence

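n = zα^2 * [ P(1-P) / d^2 ]

(zα = z value for the chosen confidence level, P = expected proportion with disease, d = precision)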

If the sample size n is greater than 10% of the total population size, apply the finite population correction:

new n = 1 / [1/n* + 1/N]

n* = n obtained above, N = population size

[assuming a 95% CI (z=1.96)]
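
A minimal sketch of this calculation, including the finite population correction. The function name `n_for_prevalence` and the example values (expected prevalence 0.5, precision 0.05, population 1000) are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def n_for_prevalence(p, d, confidence=0.95, population=None):
    """Sample size to estimate a proportion p to within +/- d."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96 for 95%
    n = z ** 2 * p * (1 - p) / d ** 2
    # Apply the correction only when n exceeds 10% of the population.
    if population is not None and n > 0.10 * population:
        n = 1 / (1 / n + 1 / population)
    return ceil(n)

print(n_for_prevalence(0.5, 0.05))                    # no correction
print(n_for_prevalence(0.5, 0.05, population=1000))   # corrected
```

Rounding up is a design choice here, so that the requested precision is not undershot.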

Sample size formula to detect disease

a: Finite populations:

n = [ 1 - ( 1 - β)^(1/d) ] [ (N - d/2) + 1/2 ]

(β = confidence level as a proportion, i.e., the probability of observing at least one diseased if prevalence is d/N; note that β here is not the Type II error rate; N = population size; n = sample size; d = number of diseased)

b: Infinite populations (> 1000):

n = [ log(1 - β) ] / [ log( 1 - (d / N) )]

(n = sample size, β = level of confidence, d = number of diseased, N = population size)
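
Both detection formulas can be sketched as below. The function names and the example values (5% prevalence, 95% confidence) are hypothetical; `conf` plays the role of β in the formulas.

```python
from math import ceil, log

def n_detect_finite(N, d, conf=0.95):
    """Sample size to find at least one of d diseased in a population of N."""
    return ceil((1 - (1 - conf) ** (1 / d)) * (N - d / 2 + 0.5))

def n_detect_infinite(d, N, conf=0.95):
    """Large-population version, using only the prevalence d / N."""
    return ceil(log(1 - conf) / log(1 - d / N))

print(n_detect_finite(N=500, d=25))       # 5% prevalence, finite population
print(n_detect_infinite(d=100, N=2000))   # 5% prevalence, large population
```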

Sample size calculation for: Probability of not detecting disease

In the case of importation of animals, it may be necessary to quantify the probability of failing to detect any positives in a sample. The formula assumes an infinite population and a given prevalence (prev).

p = ( 1 - prev )^n

(p = probability of failure to detect positives; n = sample size, prev=prevalence)
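
A short sketch of the formula, with hypothetical example values (1% prevalence, samples of 100 and 300 animals):

```python
def prob_missing_disease(prev, n):
    """Probability that a sample of n contains no positives."""
    return (1 - prev) ** n

for n in (100, 300):
    p = prob_missing_disease(prev=0.01, n=n)
    print(f"n = {n}: probability of detecting no positives = {p:.3f}")
```

At 1% prevalence, a sample of 100 still misses the disease more than a third of the time.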

Sample size for estimation of continuous-type outcome variable

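n = [ (zα + zβ) * sd / L ]^2

zα = 1.96 if α = 0.05 (two-sided); zβ = 1.28 if power = 0.90; zβ = 0 if power = 0.50, which reduces the formula to the confidence-interval case; sd = standard deviation of the variable of interest; L = how accurate the estimate is supposed to be, expressed in units of the parameter of interest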
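
A minimal sketch of the continuous-outcome calculation. It assumes zβ is entered as a positive upper quantile (1.28 for power 0.90), so the two z values add; this is equivalent to subtracting a negative lower quantile. The function name `n_continuous` and the example values (sd = 10, L = 2) are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def n_continuous(sd, L, alpha=0.05, power=0.90):
    """Sample size to estimate a continuous variable to within L units."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)            # about 1.28 for power = 0.90
    return ceil(((z_alpha + z_beta) * sd / L) ** 2)

print(n_continuous(sd=10, L=2, power=0.90))
print(n_continuous(sd=10, L=2, power=0.50))   # z_beta = 0
```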

Test Accuracy

- How good is the test at identifying individuals with and without the disease?
- The sensitivity of the test is the likelihood of a positive test among those with the disease (How good is the test at identifying individuals WITH the disease?)
- The specificity of the test is the likelihood of a negative test among those without the disease (How good is the test at identifying individuals WITHOUT the disease?)
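
These two definitions can be sketched from the four cells of a 2x2 table of test result against true disease status. The function name and the counts below are made up for illustration.

```python
def sensitivity_specificity(tp, fn, fp, tn):
    """Compute sensitivity and specificity from 2x2 table counts.

    tp: diseased, test positive      fn: diseased, test negative
    fp: non-diseased, test positive  tn: non-diseased, test negative
    """
    sensitivity = tp / (tp + fn)   # positives among those WITH disease
    specificity = tn / (tn + fp)   # negatives among those WITHOUT disease
    return sensitivity, specificity

sens, spec = sensitivity_specificity(tp=90, fn=10, fp=50, tn=850)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```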

Test Accuracy

• To calculate the sensitivity and specificity, we must know the truth in the population from another source, a gold standard

– The gold standard may be an established test already in use, or the result of a more definitive (and often more invasive) test

• For a given test, sensitivity and specificity are inversely related: moving the cutoff for a positive result raises one at the expense of the other

Predictive Value

• Predictive value positive (PVP)

– If the test results are positive for this patient, what is the probability that this patient has the disease?

• Predictive value negative (PVN)

– If the test result is negative, what is the probability that this patient does not have disease?

How do test characteristics vary?

• Sensitivity and specificity are characteristics of the test itself and do not vary with disease prevalence

• Predictive value, however, is affected by:

– Prevalence of disease:

  • Low prevalence of disease results in low predictive value positive

  • Test results must be interpreted in the context of the disease prevalence in the population

  • It is most productive and efficient to use the test in “high prevalence” populations, e.g., high-risk groups

– Specificity of test: the higher the specificity, the higher the predictive value positive
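
The dependence on prevalence and specificity can be sketched with Bayes' rule. The function name `predictive_values` and the example values (sensitivity 0.90, specificity 0.95 or 0.99) are hypothetical.

```python
def predictive_values(sens, spec, prev):
    """Return (PVP, PVN) for a test applied at a given prevalence."""
    tp = sens * prev                # true positives, as population fractions
    fn = (1 - sens) * prev          # false negatives
    fp = (1 - spec) * (1 - prev)    # false positives
    tn = spec * (1 - prev)          # true negatives
    pvp = tp / (tp + fp)
    pvn = tn / (tn + fn)
    return pvp, pvn

# Same test, different prevalence and specificity.
for prev, spec in ((0.01, 0.95), (0.30, 0.95), (0.01, 0.99)):
    pvp, pvn = predictive_values(sens=0.90, spec=spec, prev=prev)
    print(f"prev = {prev:.2f}, spec = {spec:.2f}: PVP = {pvp:.2f}, PVN = {pvn:.3f}")
```

With these values, PVP is low at 1% prevalence and rises when either the prevalence or the specificity is higher, while sensitivity stays fixed throughout.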