# Feldmand. Module 2 – Data Types & Applied Statistics Flashcards

1
Q

Independent & Dependent Variables

A

• Independent variables
– The characteristic being observed or measured that is hypothesized to influence an event or manifestation, e.g., risk factors

• Dependent variables
– A variable whose value depends on the effect of other variable(s): a manifestation or outcome whose variation we seek to explain or account for by the influence of independent variables, e.g., disease outcome

2
Q

Continuous vs. Discrete Data

A

• Continuous: Quantitative with potentially infinite number of values along continuum. Can be measured to as many decimal places as measuring instrument allows. E.g., Weight, height

• Discrete:
– Count – quantitative data that can be arranged into discrete, naturally occurring or arbitrarily selected groups or sets of values, e.g., pulse rate
– Categorical
-> Nominal: qualitative, named categories; the order of the categories is irrelevant to statistical analyses, e.g., gender, reproductive status
-> Ordinal: qualitative, ordered categories, e.g., disease staging in cancer, education level

3
Q

Descriptive vs. Inferential statistics

A

• Descriptive statistics
– Communicate results without attempting to generalize
– Important first step in epidemiologic studies

• Inferential statistics
– Used to infer the likelihood that the observed results can be generalized to other samples of individuals

4
Q

Measures of central tendency

A

• Mean
– The average, determined by adding all values and dividing by the total number of subjects

• Mode
– The most common value in the data

• Median
– The value in the dataset for which half of the subjects are smaller and half are larger
– List the data in ascending order
– Find the median location as (n + 1) / 2
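
The three measures above can be sketched with Python's standard `statistics` module. The pulse-rate values are hypothetical, chosen only for illustration, and the manual step simply applies the (n + 1) / 2 location rule to an odd-sized sample.

```python
import statistics

# Hypothetical pulse rates (beats per minute) for seven subjects.
values = [72, 68, 75, 68, 80, 71, 68]

mean = statistics.mean(values)      # sum of all values / number of subjects
mode = statistics.mode(values)      # most common value
median = statistics.median(values)  # middle value of the sorted data

# Median location rule: position (n + 1) / 2 in the ascending list.
ordered = sorted(values)
location = (len(ordered) + 1) / 2            # 1-based position; 4.0 here
median_by_location = ordered[int(location) - 1]

print(f"mean={mean:.2f} mode={mode} median={median}")
```

When n is even, the location falls halfway between two positions, and the median is taken as the average of the two middle values (this is what `statistics.median` does).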
5
Q

Measures of Dispersion (Variation)

A

• Need to be able to measure the extent to which individual values differ from mean:

• Range: The difference between the highest and lowest values
• Variance: Average squared deviation of each value from the mean
– Variance = Σ(individual value – mean value)^2 / (n - 1)
– Because variance is reported in squared units, take the square root of the variance and report the standard deviation
• Standard deviation (SD): Average measure of how individual values differ from the mean
– The smaller the SD, the less each score varies from the mean
– The larger the spread of scores, the larger the SD
– SD = √[ Σ(individual value – mean value)^2 / (n - 1) ]

• When reporting estimates of central tendency, report measure of dispersion, e.g., mean ± SD
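
As a minimal sketch of the formulas on this card, the following computes the range, sample variance (n - 1 denominator), and SD step by step. The data values are hypothetical; `statistics.variance` and `statistics.stdev` are used only as a cross-check of the manual calculation.

```python
import math
import statistics

# Hypothetical measurements for five subjects.
values = [4, 8, 6, 5, 7]
n = len(values)

mean = sum(values) / n
value_range = max(values) - min(values)          # highest - lowest

# Variance: sum of squared deviations from the mean, divided by (n - 1).
squared_deviations = [(x - mean) ** 2 for x in values]
variance = sum(squared_deviations) / (n - 1)

# SD: square root of the variance, back in the original units.
sd = math.sqrt(variance)

# Cross-check against the standard library's sample variance and SD.
matches_stdlib = (
    math.isclose(variance, statistics.variance(values))
    and math.isclose(sd, statistics.stdev(values))
)

print(f"range={value_range}, variance={variance}, mean ± SD = {mean:.1f} ± {sd:.2f}")
```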

6
Q

Inference & Assessing the Role of Chance

A
• A principal assumption underlying use of measures of disease frequency is that we can make inferences to the population based on a sample
• Because of random variation from sample to sample, the observed results will, at least in part, reflect the play of chance
7
Q

How can we quantify the degree to which chance variability may account for the results observed in any individual study?

A

– By performing an appropriate test of statistical significance and determining the p-value

8
Q

How to determine likelihood that sampling variability (chance) explains the observed results?

A

Hypothesis Testing
• Performing a test of statistical significance to determine likelihood that sampling variability (chance) explains the observed results

• Make explicit statement of hypothesis to be tested:
– Null hypothesis (H0): Always the hypothesis of no difference. The assertion that there is no association between exposure and disease, e.g., RR = 1, OR = 1
– Alternative hypothesis (H1 or HA): The assertion that there is some association between exposure and disease, e.g., RR ≠ 1, OR ≠ 1

9
Q

The Appropriate Test of Statistical Significance

A

• Will vary by study design, data type and situation
• Generates a test statistic that is a function of:
– The difference between observed values in the study and expected values if null hypothesis were true, and
– The variability in the sample
• Will lead to a probability statement (p-value)

10
Q

p-value

A
• Probability that an effect at least as extreme as that observed in a particular study could have occurred by chance alone, given H0 is true
• The larger the absolute value of the test statistic, the lower the p-value
• Convention in medical research: when p ≤ 0.05, the association between exposure and disease is considered statistically significant; i.e., if H0 is true, there is no more than a 5% (1 in 20) probability of observing results at least as extreme as those observed due solely to chance
• If p > 0.05, then chance cannot be excluded as a likely explanation
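
One way to make the definition of a p-value concrete is an exact permutation test. This technique is not named on the cards and is used here only as an illustration: if H0 is true, group labels are interchangeable, so the p-value is the share of all possible relabelings that give a difference in means at least as extreme as the one observed. The data and group names below are hypothetical.

```python
from itertools import combinations
from statistics import mean

# Hypothetical outcome values for two small groups.
exposed = [12, 15, 14, 16]
unexposed = [9, 11, 10, 13]

observed_diff = abs(mean(exposed) - mean(unexposed))
pooled = exposed + unexposed
n_exposed = len(exposed)

extreme = 0   # relabelings at least as extreme as the observed result
total = 0     # all possible relabelings
for idx in combinations(range(len(pooled)), n_exposed):
    group_a = [pooled[i] for i in idx]
    group_b = [pooled[i] for i in range(len(pooled)) if i not in idx]
    diff = abs(mean(group_a) - mean(group_b))
    if diff >= observed_diff - 1e-12:   # small tolerance for float comparison
        extreme += 1
    total += 1

p_value = extreme / total
print(f"observed difference={observed_diff}, p={p_value:.4f}")
```

With these made-up values, 4 of the 70 possible relabelings are at least as extreme as the observed difference, so p is about 0.057, and by the p ≤ 0.05 convention chance cannot be excluded as a likely explanation.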
11
Q

t Test

A

• Parametric test for differences between means of independent samples
– Continuous data
- H0: mean1 = mean2
- HA: mean1 ≠ mean2
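
A minimal sketch of the test statistic for this card, assuming the pooled-variance (Student's) form of the two-sample t test, which additionally assumes equal variances in the two groups. The blood-pressure values are hypothetical. The standard library has no t distribution, so instead of a p-value the sketch compares |t| with 2.306, the two-sided 5% critical value for 8 degrees of freedom from a standard t table.

```python
import math
from statistics import mean, variance

# Hypothetical systolic blood pressures (mmHg) in two independent samples.
group_1 = [120, 125, 130, 135, 140]
group_2 = [110, 115, 120, 125, 130]
n1, n2 = len(group_1), len(group_2)

# Pooled sample variance (assumes both groups share a common variance).
pooled_var = ((n1 - 1) * variance(group_1) + (n2 - 1) * variance(group_2)) / (n1 + n2 - 2)

# Standard error of the difference between the two means.
std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

# Test statistic: difference in means relative to its variability.
t_stat = (mean(group_1) - mean(group_2)) / std_error
df = n1 + n2 - 2

CRITICAL_T_DF8 = 2.306  # two-sided alpha = 0.05, df = 8 (standard t table)
reject_h0 = abs(t_stat) >= CRITICAL_T_DF8

print(f"t={t_stat:.3f}, df={df}, reject H0 at 0.05: {reject_h0}")
```

Here the difference in means is 10 mmHg with a standard error of 5, giving t = 2.0 on 8 degrees of freedom, which falls short of the critical value, so H0 (mean1 = mean2) is not rejected at the 0.05 level.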

12
Q

Chi-square test

A

• Test whether observed differences in proportions between study groups are statistically significant
– I.e., whether there is an association between exposure and outcome
– Categorical data

H0: proportions are equal; no association
HA: proportions are different; there is an association
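
The comparison of observed and expected counts can be sketched for a 2x2 table as follows. The counts are hypothetical, and the sketch uses the Pearson chi-square statistic without a continuity correction. For 1 degree of freedom, the p-value can be obtained from the standard library because a chi-square variable with 1 df is a squared standard normal, so P(X ≥ x) = erfc(√(x / 2)).

```python
import math

# Hypothetical 2x2 table: rows = exposure, columns = outcome.
observed = [
    [30, 70],   # exposed:   diseased, not diseased
    [15, 85],   # unexposed: diseased, not diseased
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Pearson chi-square: sum of (observed - expected)^2 / expected over all cells,
# where expected counts are those implied by H0 (no association).
chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        chi2 += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)   # 1 for a 2x2 table

# Upper-tail probability; this shortcut is valid only when df == 1.
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"chi2={chi2:.3f}, df={df}, p={p_value:.4f}")
```

With these made-up counts (30% diseased among the exposed versus 15% among the unexposed), the statistic is about 6.45 and p falls between 0.01 and 0.025, so H0 of equal proportions would be rejected at the 0.05 level.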