BIO Statistics Flashcards

(77 cards)

1
Q

Central limit theorem

A

The sampling distribution of the mean of independent, random draws from any population (with finite variance) will be normal, or nearly so, if the size of the sample is large enough.
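
A minimal simulation sketch of this card, assuming a deliberately skewed population (exponential, mean 1, SD 1) and hypothetical choices of 30 values per sample and 2000 samples. The helper name `sample_means` is illustrative only.

```python
import random
import statistics

def sample_means(draw, sample_size, n_samples, seed=0):
    """Draw n_samples samples of sample_size values each and return their means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.fmean(draw(rng) for _ in range(sample_size))
        for _ in range(n_samples)
    ]

# Skewed population: exponential with mean 1 and SD 1 (clearly not normal).
means = sample_means(lambda rng: rng.expovariate(1.0), sample_size=30, n_samples=2000)

print(f"mean of sample means: {statistics.fmean(means):.3f}")  # expected near 1
print(f"SD of sample means:   {statistics.stdev(means):.3f}")  # expected near 1/sqrt(30)
```

Even though the individual values are strongly skewed, the sample means should cluster symmetrically around the population mean.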

2
Q

Gaussian curve: area between μ and 1 SD, 1 SD and 2 SD, 2 SD and 3 SD, and beyond 3 SD

A

μ to 1 SD: 34.1%
1 SD to 2 SD: 13.6%
2 SD to 3 SD: 2.1%
Past 3 SD: 0.1%
(each figure is for one side of the mean)
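
These percentages can be checked with the standard normal cumulative distribution. The sketch below uses only `math.erf`; the function names are illustrative.

```python
import math

def normal_cdf(z: float) -> float:
    """Cumulative probability of a standard normal variable up to z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def area_between(z_low: float, z_high: float) -> float:
    """Area under the standard normal curve between two z values."""
    return normal_cdf(z_high) - normal_cdf(z_low)

bands = {
    "mean to 1 SD": area_between(0, 1),
    "1 SD to 2 SD": area_between(1, 2),
    "2 SD to 3 SD": area_between(2, 3),
    "past 3 SD": 1.0 - normal_cdf(3),
}
for label, area in bands.items():
    print(f"{label}: {area * 100:.1f}%")
```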

3
Q

Parametric statistics (definition)

A

A class of statistical procedures that rely on assumptions about the shape of the distribution in the population (usually assumed normal) and about the form or parameters (μ, SD) of the assumed distribution.

4
Q

Non parametric statistics (definition)

A

A class of statistical procedures NOT relying on assumptions about the shape or form of the probability distribution from which the data are drawn.

5
Q

Descriptive statistics include

A

Mean, median, mode, range, variance, SD, SE

6
Q

Range

A

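Difference between the largest and smallest sample values.

Only a crude indicator of the data set’s dispersion: it depends on the two extreme values alone, so it is sensitive to outliers and says little about how the rest of the data are spread.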

7
Q

Variance

A

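Average of the squared distance of each value from the mean.

Squaring lets negative deviations (values below the mean) count the same as positive ones, so the variance itself is never negative.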

8
Q

Standard deviation

A

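Tells you how tightly the sample values are clustered around the mean.

Tight cluster = low SD.

The 68/95/99.7% interpretation applies only under a normal distribution.

Describes the scatter of the data; the precision of the calculated mean is shown by the standard error (SE), not the SD.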

9
Q

Standard error

A

Estimate of how far the sample mean is likely to be from the population mean: SE = SD / sqrt(n).

Gets smaller as sample size increases, since the mean of a larger sample is likely to be closer to the population mean.

10
Q

Confidence interval (definition)

A

An estimated range that is likely to contain the true population mean. Takes into account the size of the sample and the scatter of the measurements.
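
A rough sketch of a 95% confidence interval for a mean, assuming the large-sample normal approximation (critical value near 1.96). For small samples a t critical value would be more appropriate. The data and the function name `mean_confidence_interval` are hypothetical.

```python
import math
import statistics

def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the mean of a sample."""
    n = len(sample)
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return mean - z * se, mean + z * se

# Hypothetical measurements, for illustration only.
low, high = mean_confidence_interval([4.8, 5.1, 5.0, 4.9, 5.2])
print(f"95% CI: {low:.3f} to {high:.3f}")
```

More scatter widens the interval; a larger sample narrows it.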

11
Q

What constitutes reliable data?

A

Precise, accurate, repeatable, reproducible.

12
Q

Random error

A

Caused by inherently unpredictable fluctuations in the readings of the measurement apparatus or in the experimenter’s interpretation of the instrument reading.

Can occur in any direction.

13
Q

Systematic error

A

Result of bad science. Predictable, one direction. Caused by imperfect calibration of instruments, imperfect methods.

14
Q

Alpha

A

Significance level. The p-value threshold below which H0 will be rejected; equivalently, the accepted probability of a Type I error.

0.05 or 0.01 are the conventional choices.

15
Q

Type 1 error

A

Incorrect rejection of a true H0 (false positive).

Saying the experiment worked when it didn’t.

16
Q

Type II error

A

Incorrectly retaining a false H0 (false negative).

H0 is actually false, but you fail to reject it. Usually a consequence of low statistical power.

17
Q

Z Test definition

A

Any statistical test for which the distribution of the test statistic under H0 can be approximated by a normal distribution; as a rule of thumb, n > 30.

Assumes the population and sample are normally distributed and that the population SD is known.

18
Q

What does the value of Z mean in a z test?

A

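Z is the number of standard errors by which the experimental mean differs from the population mean. It is not itself a probability: the p value derived from Z is the chance of a result at least this extreme, given that H0 is true. A large |Z| means a smaller chance that the result arose under H0.

A Z score of 2.5 means that the sample mean is 2.5 standard errors (SD / sqrt(n)) away from the population mean.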
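
A small worked sketch of a one-sample z test, with hypothetical numbers chosen so that Z comes out at 2.5. The function name `z_test` and the data are assumptions for illustration.

```python
import math
from statistics import NormalDist

def z_test(sample_mean, pop_mean, pop_sd, n):
    """Return (z, two-tailed p value) for a one-sample z test."""
    se = pop_sd / math.sqrt(n)           # standard error of the mean
    z = (sample_mean - pop_mean) / se    # distance from pop_mean in standard errors
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# Hypothetical: population mean 100, SD 12; a sample of 36 averages 105.
z, p = z_test(sample_mean=105, pop_mean=100, pop_sd=12, n=36)
print(f"z = {z:.2f}, two-tailed p = {p:.4f}")
```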

19
Q

T test is used when (general)

A

You have a normal distribution in the population and the sample, and have n < 30 (a small sample), typically with the population SD unknown.

20
Q

P value– what do large and small p mean

A

Large p indicates weak evidence against H0. Fail to reject H0 (this is not the same as proving H0 true).

Small p indicates strong evidence against H0. Reject.

21
Q

One tailed t test

A

Tests whether the experimental mean is significantly greater than the population mean, or significantly less than it, but not both.

The direction has to be assumed in advance; making that assumption about the data makes the test less robust.

22
Q

Two tailed t test.

A

Tests whether the exp. mean is significantly different from the pop mean in either direction (greater than or less than).

More conservative because alpha is split, leaving a smaller rejection area on each side of the distribution (2.5% on each at alpha = 0.05).

23
Q

Paired t test

A

The observed data are from the same subject, twins, or otherwise matched subjects, and are drawn from a population with a normal distribution.

24
Q

Unpaired t test

A

Observed data are from two independent, random samples from a population with a normal distribution.

25
ANOVA
Compares 3 or more means. Uses sums of squares to split the variance into between-group and within-group parts. ANOVA tells you whether any of the means differ from each other, taking scatter and variability into consideration.
26
One way ANOVA
One measurement variable and one nominal variable are explored. All the groups are independent, and only one thing is being measured in each group. The data are assumed to be normally distributed within each group.
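
A minimal sketch of the one-way ANOVA F statistic computed from sums of squares. The three groups are hypothetical, and turning F into a p value (which needs the F distribution) is left out.

```python
from statistics import fmean

def one_way_anova_f(groups):
    """F = (between-group mean square) / (within-group mean square)."""
    all_values = [x for g in groups for x in g]
    grand_mean = fmean(all_values)
    # Between groups: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within groups: scatter of each value around its own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical data: three independent groups, one measurement each.
f_stat = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f"F = {f_stat:.2f}")
```

A large F means the group means differ by more than the within-group scatter would suggest.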
27
Two way ANOVA
1 measurement variable and 2 nominal variables. There are two factors being measured within each group that affect the outcome. Ex: how 3 different drugs affect subjects, both men and women. Drug response and gender are the two factors.
28
Post hoc tests
A follow-up to the ANOVA, used when ANOVA rejects H0. Tests which group means differ significantly, correcting for multiple comparisons.
29
Mann Whitney U test
For independent measures with 2 groups. It's the non-parametric counterpart of the two sample (unpaired) t test. Ranks all measurements together by value, then separates the groups and calculates a U from each sample set. The lowest U is compared to the table: if Uexp is at or below the critical U, reject H0.
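
A sketch of how the two U values can be computed by hand, assuming ascending ranks with tied values sharing their average rank. The data and function names are hypothetical, and the critical value still has to come from a table.

```python
def average_ranks(values):
    """Rank values from lowest (rank 1) upward; ties share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b); the smaller one is compared to the table."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return u_a, u_b

u_a, u_b = mann_whitney_u([1, 4, 6], [2, 3, 5])
print(f"U_a = {u_a}, U_b = {u_b}, test statistic = {min(u_a, u_b)}")
```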
30
Correlation
The extent to which two variables have a linear relationship with each other.
31
Pearson correlation Coefficient
How well the variables correlate: r measures the strength and direction of the linear relationship between X and y, and so how reliably knowing X predicts y. Ranges from -1 to +1.
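
A short sketch of the Pearson coefficient computed from deviations about the means. The paired data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

r = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"r = {r:.3f}, r^2 = {r * r:.3f}")
```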
32
Linear regression
Adjusts the values of the slope and intercept to find the line that best predicts y from X based on the data. Assumes that the relationship is linear. It may not be.
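
A minimal least-squares sketch for the best-fit line, using the closed-form slope and intercept. The data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares: return (slope, intercept) of the best-fit line."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x  # the line passes through (mean_x, mean_y)
    return slope, intercept

slope, intercept = fit_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"y = {slope:.2f} * x + {intercept:.2f}")
```

Plotting the residuals is a common way to check whether the linearity assumption is reasonable.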
33
Categorical data
No meaningful mean, median, or normal distribution (only counts, and a most common category). Dead or alive, diabetes or no diabetes. May be inherent in the data or made from continuous data. May be more meaningful clinically.
34
Chi square- what it is used for
It is the appropriate statistic for measuring relationships between categorical data in a contingency table. Compares experimental outcomes to expected outcomes to see if there is a significant difference.
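
A sketch of the chi-square statistic for a contingency table, where expected counts come from the row and column totals. The 2x2 counts are hypothetical.

```python
def chi_square(observed):
    """Chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count if rows and columns were unrelated.
            expected = row_totals[i] * col_totals[j] / total
            stat += (obs - expected) ** 2 / expected
    return stat

# Hypothetical 2x2 table: rows = treated / untreated, columns = improved / not.
chi2 = chi_square([[20, 10], [10, 20]])
print(f"chi-square = {chi2:.3f}")
```

For a 2x2 table (1 degree of freedom) the critical value at alpha = 0.05 is 3.841, so a statistic above that would lead to rejecting H0.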
35
Assumptions made by a chi square test
Data are frequency data (counts). Adequate sample size. Measures are independent of each other (a patient only goes in one box).
36
When to use a Chi Square (check list)
Categorical data. Not normally distributed. No assumption that data will be normal.
37
Experimental research design includes these 3 things
Independent variables are manipulated, extraneous factors are controlled, and subjects are randomly assigned to groups.
38
Run-in experiment
Precedes the randomized controlled trial. A period of time where subjects are put on the control regimen to see if they will continue with the study and comply. If not, they are removed before the real study starts.
39
Healthy user bias
Sample is healthier, or more medically fluent, than the average population.
40
Berkson's bias
Sample selected from an impaired or diseased group, like hospital patients. Clearly doesn't reflect the regular population
41
Exclusion bias
Excluding subjects based on potential extraneous factors. Excluding reduces generalizability.
42
Selection bias
Bias in placing sample subjects into treatment or control arms. (Hand picking). Leads to non-equivalent groups, which builds inherent biases.
43
Investigator bias
Where the investigators are aware of which subjects are in each group and this influences how they work with the subject or record results
44
Hawthorne effect
Subjects will change their behavior in a study, affecting internal and external validity. Usually done to gain the approval of, or to please, the investigators.
45
Incidence (def)
The number of new cases of disease arising during a given period of time. Also "absolute risk". (Number of new cases during the period)/(number of people at risk at the start of the period)
46
Relative risk
Incidence in exposed population/incidence in unexposed population.
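
A worked sketch of relative risk from hypothetical cohort counts. The function name and numbers are illustrative only.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Incidence in the exposed group divided by incidence in the unexposed group."""
    incidence_exposed = exposed_cases / exposed_total
    incidence_unexposed = unexposed_cases / unexposed_total
    return incidence_exposed / incidence_unexposed

# Hypothetical: 30 of 100 exposed people and 10 of 100 unexposed people develop disease.
rr = relative_risk(30, 100, 10, 100)
print(f"relative risk = {rr:.2f}")
```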
47
Cohort study
A cohort of people who have something in common when they are first assembled is observed to see what happens to them. Not random: the cohort subjects have a relationship. Goal: to study predictor variables and associated outcomes.
48
Case-control studies
Looking backward to compare people with and without a condition-- trying to determine risk factors for disease or outcome. Good for long latency, or rare disease.
49
Recall bias
People may not remember the exposure or details about it, and it may not be in the medical record.
50
Equation for Variance
SUM [(each value - sample mean)^2] / (N-1)
51
Equation for Standard Deviation
Square root of the variance. SQRT( SUM [(each value - sample mean)^2] / (N-1) )
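
A short sketch of the sample variance and standard deviation with the N-1 denominator, cross-checked against Python's `statistics` module. The data are hypothetical.

```python
import math
import statistics

def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by N - 1."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / (len(values) - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]
var = sample_variance(data)
sd = math.sqrt(var)
print(f"variance = {var:.3f}, SD = {sd:.3f}")
print(f"statistics module: {statistics.variance(data):.3f}, {statistics.stdev(data):.3f}")
```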
52
Grubbs' test
For outliers. G = |suspected outlier - sample mean| / SD
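
A sketch of the outlier statistic for the value farthest from the mean, using the sample SD. The data are hypothetical, and the critical value (which depends on N and alpha) still has to be looked up.

```python
import statistics

def grubbs_statistic(values):
    """Return (suspect value, |suspect - mean| / SD) for the most extreme value."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    suspect = max(values, key=lambda x: abs(x - mean))
    return suspect, abs(suspect - mean) / sd

suspect, g = grubbs_statistic([10, 11, 12, 11, 10, 30])
print(f"suspect value = {suspect}, statistic = {g:.3f}")
```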
53
Effect on required N: increased variability
Increased N
54
Effect on required N: greater differences between groups
Lower N required
55
Effect on required N: smaller alpha
Increase N
56
Effect on required N: decrease Power
Decrease N
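
The four effects above can be seen in one commonly used approximation for comparing two means (two-sided test, equal group sizes, normal approximation). This formula is an assumption brought in for illustration, not something stated on the cards, and the inputs are hypothetical.

```python
from statistics import NormalDist

def n_per_group(sd, difference, alpha=0.05, power=0.80):
    """Approximate subjects per group: 2 * ((z_alpha/2 + z_beta) * sd / difference)^2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return 2 * ((z_alpha + z_beta) * sd / difference) ** 2

baseline = n_per_group(sd=1.0, difference=0.5)
print(f"baseline:           {baseline:.1f}")
print(f"more variability:   {n_per_group(sd=2.0, difference=0.5):.1f}")
print(f"bigger difference:  {n_per_group(sd=1.0, difference=1.0):.1f}")
print(f"smaller alpha:      {n_per_group(sd=1.0, difference=0.5, alpha=0.01):.1f}")
print(f"lower power:        {n_per_group(sd=1.0, difference=0.5, power=0.70):.1f}")
```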
57
R^2 correlation
The correlation coefficient r runs from -1 to +1; 0 means no correlation. R^2 (r squared) therefore runs from 0 to 1.
58
Odds ratio- values?
OR < 1: decreased odds that the exposure is associated with the case. OR = 1: no association. OR > 1: increased odds that the exposure is associated with the case.
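
A worked sketch of the odds ratio from a hypothetical 2x2 table. The argument names are illustrative.

```python
def odds_ratio(exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases):
    """Odds of being a case among the exposed divided by the odds among the unexposed."""
    odds_exposed = exposed_cases / exposed_noncases
    odds_unexposed = unexposed_cases / unexposed_noncases
    return odds_exposed / odds_unexposed

# Hypothetical counts: 30 cases / 70 non-cases exposed, 10 cases / 90 non-cases unexposed.
or_value = odds_ratio(30, 70, 10, 90)
print(f"odds ratio = {or_value:.2f}")
```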
59
Risk factor definition
Characteristic or factor that increases a person's risk of disease. Can be inherited, environmental, socioeconomic, behavioral.
60
Chemical agents
Workplace exposure to chemicals, etc
61
Physical agents
Radioactivity in your state, noise, vibration
62
Biologic agents
Infectious agents (like bacteria, virus), allergens
63
Psychosocial agents
Stress, trauma/ptsd, depression
64
Mechanical agents
Repetitive motion jobs/hobbies (typing), heavy lifting
65
Lifestyle risk factors
Drugs, alcohol, unsafe sex, sun exposure
66
Framingham calculator
Risk assessment tool for 10 year risk of having a heart attack based on risk factors.
67
Absolute risk
The probability of an event in a population under study. Same as incidence.
68
Attributable Risk
Absolute risk, or incidence, of a disease in exposed persons, minus the absolute risk from non-exposed persons. Risk attributed to an exposure
69
Relative risk
Compares the probability of an event occurring in the exposed group vs the non-exposed group.
70
Relative Risk Reduction RRR
By how much the treatment reduced the risk of disease outcomes, relative to the control group who did not receive treatment.
71
Absolute Risk Reduction ARR
The most useful. Shows the difference in risk between treated and non treated groups (risk in controls minus risk in treated). Its reciprocal is the NNT.
72
NNT
Number needed to treat. The number of patients you need to treat for one additional patient to benefit from the intervention. NNT = 1/ARR with ARR as a proportion (or 100/ARR when ARR is a percentage).
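
A worked sketch tying ARR, RRR and NNT together, using hypothetical event risks of 20% in controls and 15% in treated patients.

```python
def risk_reduction(control_risk, treated_risk):
    """Return (ARR, RRR, NNT) from two risks expressed as proportions."""
    arr = control_risk - treated_risk  # absolute risk reduction
    rrr = arr / control_risk           # reduction relative to the control risk
    nnt = 1 / arr                      # number needed to treat
    return arr, rrr, nnt

arr, rrr, nnt = risk_reduction(control_risk=0.20, treated_risk=0.15)
print(f"ARR = {arr:.1%}, RRR = {rrr:.0%}, NNT = {nnt:.0f}")
```

The same 25% relative reduction would give a much larger NNT if the baseline risk were small, which is why ARR and NNT are often the more clinically informative figures.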
73
High sensitivity of a diagnostic test
Sensitivity is the probability of testing positive given the patient has the disease. High sensitivity means a low false negative rate; in a typical test this comes at the cost of a higher false positive rate.
74
High specificity of a diagnostic test
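Specificity is the probability of testing negative given the patient does not have the disease. High specificity means a low false positive rate; in a typical test this comes at the cost of a higher false negative rate.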
75
Prevalence of a diagnostic test
The proportion of people possessing a clinical condition or outcome at a given point in time. The probability of disease before test result is known.
76
Positive predictive value of a diagnostic test
Probability of having disease, given a positive test result.
77
Negative predictive value of a diagnostic test
Probability of not having a disease given a negative test result.
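
The last five cards can all be read off one 2x2 table of test results against true disease status. A minimal sketch with hypothetical counts:

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Summary measures from true/false positives and negatives."""
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),    # positive test, given disease
        "specificity": tn / (tn + fp),    # negative test, given no disease
        "prevalence": (tp + fn) / total,  # proportion with disease
        "ppv": tp / (tp + fp),            # disease, given positive test
        "npv": tn / (tn + fn),            # no disease, given negative test
    }

# Hypothetical: 1000 people, 100 with disease.
metrics = diagnostic_metrics(tp=90, fp=40, fn=10, tn=860)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With the same sensitivity and specificity, a lower prevalence would pull the positive predictive value down.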