Exam 1: Biostatistics Flashcards

(99 cards)

1
Q

Descriptive Statistics

  1. Involves
  2. Purpose
A

Involves: Collecting, Presenting, and Characterizing Data

Purpose: Describe Data

2
Q

Inferential Statistics

  1. Involves
  2. Purpose
A

Involves: Estimation and Hypothesis Testing

Purpose: Make decisions about population characteristics

***Allows us to describe a population based on a sample***

3
Q

Variable

A

A symbol of an event, act, characteristic, trait, or attribute that can be measured and to which we can assign some values

4
Q

Categorical Variable

A

Consists of some numeric or character codes that represent either:

  1. The presence or absence of something that is of research interest
  2. The relative weight or rank of the thing that is of research interest
5
Q

Quantitative Variable

A

Variable that holds the numerical result of some measurement

6
Q

Process

A

Series of actions or operations that transforms inputs to outputs; generates output over time

7
Q

Characteristics of Variables: Nominal Scale

A

Simplest level of measurement - categories without order

8
Q

Characteristics of Variables: Ordinal Scale

A

Nominal variables with an inherent order among the categories

9
Q

Characteristics of Variables: Interval Scale

A

Measurable difference, interval, or distance between observations

10
Q

Characteristics of Variables: Ratio

A

Same as interval but with an absolute reference point (such as “0”)

11
Q

Data Presentation: Qualitative Data

A

Summary Table –> Either a Bar Graph or Pie Chart

12
Q

Data Presentation: Quantitative Data

A

Dot Plot, Stem and Leaf Display,

or

Frequency Distribution –> Histogram

13
Q

Class

A

One of the categories into which qualitative data can be classified

14
Q

Class Frequency

A

Number of observations in the data set falling into a particular class

15
Q

Class Relative Frequency

A

Class frequency divided by the total number of observations in the data set

16
Q

Class Percentage

A

The class relative frequency multiplied by 100
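
As a rough illustration of class frequency, class relative frequency, and class percentage, here is a minimal Python sketch. The blood-type data and the function name `class_summary` are made up for this example.

```python
from collections import Counter

def class_summary(observations):
    """Return {class: (frequency, relative frequency, percentage)}."""
    counts = Counter(observations)
    total = len(observations)
    return {
        label: (freq, freq / total, 100 * freq / total)
        for label, freq in counts.items()
    }

# Hypothetical qualitative data, invented for illustration only.
blood_types = ["A", "O", "O", "B", "A", "O", "AB", "O"]
summary = class_summary(blood_types)
```

Each class maps to its count, its share of all observations, and that share expressed as a percentage.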

17
Q

Bar Graph

A

Classes (Bars) have heights equivalent to class frequency, class relative frequency, or class percentage

(Unlike a Histogram, which shows only class frequency or class relative frequency and has touching bars)

18
Q

Pie Chart

A

Classes are in slices proportional to the class relative frequency

19
Q

Central Tendency

A

Tendency to cluster/center about certain numerical values

20
Q

Variability

A

Spread of the data

21
Q

What symbols represent Sample/Population Mean and Size?

A

Sample mean: x̄ (x-bar, written with a lowercase x); Population mean: μ; Sample size: n; Population size: N

22
Q

Which is used for both quantitative and qualitative data: mean, median, or mode?

Which is not affected by extreme values?

A

Mode

Median and Mode

23
Q

Summary of Mean, Median, and Mode

A
24
Q

Variance and Standard Deviation

A
  1. Measures of dispersion ***More reliable than Range***
  2. Most common measures
  3. Consider how data are distributed (unlike Range)
  4. Show variation about mean
25
What does **Normal Distribution** mean?
Mean = Median = Mode
26
What does the **mean** equal in the standard normal curve and what is the first standard deviation?
**Mean = 0** First SD is +/- 1
27
**Standard Notation** (Sample vs. Population) 1. Mean 2. Standard Deviation 3. Variance 4. Size
28
When do you use **n-1** vs. **n** in the denominator of the Variance Formula?
**n-1** = Sample Variance **n** = Population Variance
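
A small sketch of the two denominators, using Python's standard `statistics` module. The data values are hypothetical.

```python
import statistics

# Hypothetical measurements, invented for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]

sample_var = statistics.variance(data)       # divides by n - 1
population_var = statistics.pvariance(data)  # divides by n

sample_sd = statistics.stdev(data)           # square root of sample variance
population_sd = statistics.pstdev(data)      # square root of population variance
```

The sample variance is slightly larger than the population variance for the same numbers because its denominator is smaller.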
29
Shape of Curve: Mean vs. Median 1. Left-Skewed
Left-Skewed Mean \< Median
30
Shape of Curve: Mean vs. Median 1. Right-Skewed
Right-Skewed Mean \> Median
31
**The Empirical Rule** 1. Applies to 2. What percentage of the measurements lie within 1, 2, and 3 SDs of the mean? What are their **Z-scores**?
**Applies to:** Data sets that are **mound-shaped** and **symmetric** (i.e. **Normal Distributions**) Approximately **68%** of measurements lie within **one** SD of the mean (x̄ - s to x̄ + s), **z-score between -1 and 1** Approximately **95%** of measurements lie within **two** SDs of the mean (x̄ - 2s to x̄ + 2s), **z-score between -2 and 2** Approximately **99.7%** of measurements lie within **three** SDs of the mean (x̄ - 3s to x̄ + 3s), **z-score between -3 and 3**
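
The Empirical Rule percentages can be checked against a standard normal curve. This is only a sketch using Python's `statistics.NormalDist`; the helper name `share_within` is an assumption made for this example.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

# Shares for 1, 2, and 3 standard deviations, rounded to 4 decimals.
shares = {k: round(share_within(k), 4) for k in (1, 2, 3)}
```

The rounded shares come out close to the 68%, 95%, and 99.7% quoted on the card.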
32
If you scored in the 58th percentile, what percentage of test takers scored **lower**/**higher** than you?
**Lower:** 58% **Higher:** 42%
33
Numerical Measures of Relative Standing: **Z-Scores** 1. Describes... 2. Measures...
Describes the relative location of a measurement compared to the rest of the data Measures the **number of standard deviations** away from the mean a data value is located
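
A z-score is the distance from the mean divided by the standard deviation. A minimal sketch, with a hypothetical function name and made-up test-score numbers:

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd

# Hypothetical scores on a scale with mean 100 and SD 15.
above = z_score(130, 100, 15)  # two SDs above the mean
below = z_score(85, 100, 15)   # one SD below the mean
```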
34
What is the **Frequentist** definition of **Probability?**
If an experiment is repeated **n times** under identical conditions and if the event **A** occurs **m** times, then as **n** grows, the ratio of m/n approaches a fixed limit called the probability of **A** P(A) = m/n **"Law of Large Numbers"**
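
The frequentist idea can be sketched with a simulated fair coin. The seed, the trial counts, and the function name are assumptions chosen for illustration, and a simulation only approximates the limit.

```python
import random

def relative_frequency(n_trials, seed=0):
    """Proportion m/n of heads in n simulated tosses of a fair coin."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(n_trials))
    return heads / n_trials

few_tosses = relative_frequency(10)
many_tosses = relative_frequency(100_000)
```

With only a handful of tosses the ratio can sit far from 0.5, while with many tosses it settles near 0.5.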
35
**Probability** Equation
Frequency of times an outcome occurs **divided by** the total number of possible outcomes (symbolized as *p*)
36
**Random Event**
Any event where the outcomes observed involve uncertainty or the outcome can vary (predicted by **Probability**)
37
When is probability unnecessary to calculate?
For a **fixed event**
38
**An Event** (Two Definitions)
1. An occurrence due to nature 2. A collection of one or more outcomes of an experiment
39
**Simple vs Compound Probabilities**
Simple = Single occurrence Compound = Result of operations -Define relationships between or combination of event occurrences
40
What are the **three operations** that can be used to create **compound events**?
1. Intersection 2. Union 3. Complement
41
**Intersection**
The intersection is defined as **"both A and B"** Represented by **A ∩ B**
42
**Union**
Union is defined as **"either A or B or both A and B"** Represented by **A ∪ B**
43
**Complement**
Defined as **"Not A"** Denoted by **A^c or -A**
44
**The Additive Law: _Special_ Rule of Addition**
Two events A and B that **cannot** occur simultaneously are said to be **mutually exclusive or disjoint** e.g. The probability of a newborn weighing under 2000 grams is 0.025 and over is 0.043 \*\*\*simply would **add the probabilities of the individual events**\*\*\*
45
**The Additive Law: _General_ Rule of Addition**
This is used when there is a **common region**; must **_subtract out common region_**
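
Both addition rules fit one formula, P(A or B) = P(A) + P(B) - P(A and B), where the common region is zero for mutually exclusive events. A minimal sketch: the function name and the playing-card example are assumptions added here, while 0.025 and 0.043 are the newborn-weight figures from the card above.

```python
from fractions import Fraction

def p_union(p_a, p_b, p_a_and_b=0):
    """General rule of addition; leave p_a_and_b at 0 for disjoint events."""
    return p_a + p_b - p_a_and_b

# Special rule: mutually exclusive events, so nothing is subtracted.
p_either_weight_group = p_union(0.025, 0.043)

# General rule (hypothetical card example): hearts and face cards overlap
# in 3 cards, so the common region is subtracted once.
p_heart_or_face = p_union(Fraction(13, 52), Fraction(12, 52), Fraction(3, 52))
```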
46
Two-Way Table Example
47
Probabilities Example
48
**Independent Events**
Two **unrelated** events \*\*\*When expressing the joint probability of independent events, the conditional form of the general rule of **multiplication** is _not needed_; the special rule, P(A and B) = P(A) x P(B), applies
49
**The Special Rule of Multiplication**
e.g. tossing a coin Second toss has nothing to do with the first
50
Questions about **Mutual Exclusiveness...** 1. If events are **mutually exclusive...** 2. If events are **_not_** mutually exclusive...
Use **"or"** and the **additive rule** 1. ME: add them all up 2. Not ME: Subtract out **common region**
51
Questions about **independence...** 1. Independent events... 2. Not independent...
Use **"and"** and the **multiplication rule** 1. Multiply them all together 2. P (A and B) = P(A|B) x P(B) P (B and A) = P(B|A) x P(A)
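
A short sketch of the two multiplication rules using exact fractions. The function names and the coin and card examples are assumptions for illustration.

```python
from fractions import Fraction

def p_both_independent(p_a, p_b):
    """Special rule: P(A and B) = P(A) x P(B)."""
    return p_a * p_b

def p_both_dependent(p_a_given_b, p_b):
    """General rule: P(A and B) = P(A|B) x P(B)."""
    return p_a_given_b * p_b

# Two heads in two coin tosses (independent events).
two_heads = p_both_independent(Fraction(1, 2), Fraction(1, 2))

# Two aces drawn without replacement (the second draw depends on the first).
two_aces = p_both_dependent(Fraction(3, 51), Fraction(4, 52))
```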
52
**Bayes' Theorem** 1. When is it used 2. P(A) vs P(B|A) 3. Importance
1. When **multiplicative** events are **not independent** 2. P(A) = prior probability (**_known before_** calculation) P(B|A) = posterior probability (only **_known after_** calculation) 3. Helps investigators determine the other pertinent probability when only one is known
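
One common use of Bayes' Theorem is updating a disease probability after a positive test. The sketch below assumes a hypothetical prevalence, sensitivity, and false positive rate; none of these numbers come from the cards.

```python
def probability_after_positive(prior, sensitivity, false_positive_rate):
    """P(disease | positive test) via Bayes' Theorem."""
    true_positives = sensitivity * prior
    false_positives = false_positive_rate * (1 - prior)
    return true_positives / (true_positives + false_positives)

# Hypothetical values: 1% prevalence, 95% sensitivity, 5% false positives.
updated = probability_after_positive(0.01, 0.95, 0.05)
```

Even with an accurate test, the updated probability stays modest here because the disease is rare in this made-up population.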
53
How do you figure out the **population mean?**
You can use a sample and will be **very close**
54
**Unbiased vs. Biased Estimates**
**Unbiased:** if the sampling distribution of a sample statistic has a mean **equal to** the population parameter that the statistic is intended to estimate **Biased:** if the mean of the sampling distribution is **not equal to** the parameter
55
**Central Limit Theorem**
As sample size gets **large enough**, the **sampling distribution** becomes _almost normal_ \*\*\***Justifies Inferential Statistics**\*\*\*
56
Confidence Interval for a Population Mean: Normal (z) Statistic 1. Finds what?
Finds the **range** over which the population parameter **MIGHT** be found \*\*\*A **range of plausible values** for the **population parameter**\*\*\*
57
What does a **95% Confidence Level** indicate?
In the long run, 95% of our confidence intervals will contain **μ** (the **population mean**) and 5% will not
58
What are **2 conditions** required for a Valid Confidence Interval for μ?
1. A **Random Sample** is selected from the target population 2. The sample size **n** is **LARGE** - Due to the **Central Limit Theorem** this condition guarantees that the sampling distribution of x̄ is approximately normal. Also, for large n, s will be a good estimator of σ (the population standard deviation)
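
A large-sample confidence interval for the mean can be sketched as x̄ ± z · s / √n. The function name and the data below are hypothetical, and the sketch assumes the two conditions above hold.

```python
import math
import statistics
from statistics import NormalDist

def z_confidence_interval(sample, confidence=0.95):
    """Large-sample (z) confidence interval for a population mean."""
    n = len(sample)
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample SD stands in for sigma
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * s / math.sqrt(n)
    return mean - half_width, mean + half_width

# Hypothetical sample of 49 measurements, invented for illustration only.
sample = [i % 7 + 10 for i in range(49)]
low, high = z_confidence_interval(sample)
```

The half-width, (high - low) / 2, is the quantity that shrinks as n grows.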
59
**Student's t-statistic**
Has a sampling distribution very much like that of the **z-statistic** (mound shaped, symmetric, with mean 0) \*\*\*Primary difference is that t-statistic is more variable than z-statistic\*\*\*
60
**Degrees of Freedom (df)**
Actual amount of **variability** in the sampling distribution of **t** depends on the **sample size, n** T-statistic has **(n-1)** degrees of freedom
61
What happens as **Degrees of Freedom (df)** go _down_?
The t-distribution **flattens out**
62
**Sampling Error**
A way of expressing the **reliability** associated with a confidence interval for the population mean, μ **Sampling Error (SE)** is equal to **half-width** of the **confidence interval**
63
What is a **Hypothesis**?
A statement about the **numerical value** of a _population parameter_
64
**Null Hypothesis (H0)**
The hypothesis that will be accepted unless the data provide convincing evidence that it is false. This usually represents the **"status quo"** or some claim about the population parameter that the researcher wants to test
65
**Alternative Hypothesis (Ha)**
The hypothesis that will be accepted only if the data provide convincing evidence of its truth. This usually represents the values of a population parameter for which **the researcher _wants to gather evidence to support_** \*\*\***Opposite** of the null hypothesis\*\*\*
66
When do we use **Hypothesis Testing**?
1. **_Observational Studies_** - Find the **"true" population parameter** (e.g. what is the prevalence of AIDS in some community) **\*\*\*1 sample\*\*\*** 2. **_Clinical Trials_** - Compare Group 1 to Group 2 or - Compare Baseline state to post-intervention state **\*\*\*2 sample tests - _Independent Samples_\*\*\***
67
**Test Statistic**
A sample statistic, computed from information provided in the sample, that the researcher uses to **_decide between_** the null and alternative hypotheses
68
**Type I Error**
Occurs if the researcher rejects the null hypothesis in favor of the alternative when, in fact, the **null hypothesis is _true_**. The probability of committing a Type I error is denoted by **α (alpha)** \*\*\*The level of α is usually small and is referred to as the **level of significance** of the test\*\*\*
69
**Rejection Region**
The set of possible values of the test statistic for which the researcher will reject **H0** in favor of **Ha**
70
**Type II Error**
Occurs if the researcher **accepts the null hypothesis** when, in fact, it is **false**. Probability of committing a Type II error is denoted by **β (beta)**
71
How do you identify the null hypothesis?
It will always have an **equality sign**
72
What is a **p-value**?
The **observed significance level** for a specific statistical test is the probability (assuming H0 is true) of observing a value of the test statistic that is at least as contradictory to the null hypothesis, and supportive of the alternative hypothesis, as the actual one computed from the sample data **\*\*\*Used to make _rejection decision_\*\*\***
73
What does a p-value \>= α mean?
**DO NOT** reject H0
74
What does a p-value \< α mean?
**REJECT** H0
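
The decision rule can be sketched with a two-sided, one-sample z-test. The function name and every number below are hypothetical, and the sketch assumes a large sample with a known standard deviation.

```python
import math
from statistics import NormalDist

def two_sided_z_p_value(sample_mean, null_mean, sd, n):
    """Two-sided p-value for a one-sample z-test."""
    z = (sample_mean - null_mean) / (sd / math.sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))

alpha = 0.05  # significance level chosen before looking at the data
p_value = two_sided_z_p_value(sample_mean=103, null_mean=100, sd=15, n=100)
reject_null = p_value < alpha
```

Here the test statistic is z = 2, so the p-value falls just under 0.05 and H0 is rejected at that level.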
75
Where is the **Confidence** in hypothesis testing?
Confidence is in the **testing process, _NOT_** in the particular result of a single test
76
Strength of Correlation
Reflects how consistently scores for each factor change
77
Regression Line
The **best fitting straight line** to a set of data points. A best fitting line is the line that minimizes the total distance between the line and the data points
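
One standard way to find a best fitting line is ordinary least squares, which minimizes the sum of squared vertical distances. That specific criterion is an assumption here, since the card does not name one; the function name and data are also hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical points that lie exactly on y = 2x + 1.
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
```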
78
Numerical Measure of Correlation: **Pearson Correlation Coefficient**
**The Pearson (product moment) correlation coefficient (r)-** used to measure the _direction_ and _strength_ of the linear relationship of two factors in which the data for both factors are measured on an interval or ratio scale of measurement Numerator --\> **Covariance** (extent to which X and Y vary _together_) Denominator --\> extent to which X and Y vary _independently_ or _separately_
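
The numerator and denominator of r can be sketched directly. The function name and data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: co-variation over separate variation."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: how X and Y vary together.
    together = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: how X and Y vary separately.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return together / math.sqrt(ss_x * ss_y)

perfect_positive = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
perfect_negative = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])
```

The sign of r gives the direction of the relationship, and its distance from 0 gives the strength.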
79
Regression Analysis
Statistical procedure used to determine the equation of a regression line to a set of data points and to determine the **extent to which the regression equation can be used to predict values of one variable**, given known values of a second factor in a population - One quantitative dependent variable - One or more quantitative or qualitative (**binary**) variables
80
Regression Analysis --\> Logistic Regression --\>
Regression Analysis (**Quant**itative DV) Logistic Regression (**Qual**itative DV) -Yes or no, Male or Female
81
What do rows and columns represent in a data table?
Rows = **Cases** Columns = **Variables**
82
What type of data do proportions summarize?
**Nominal** and **Ordinal** (i.e. **Qual**itative data)
83
How are rates different from proportions?
They are similar to proportions EXCEPT a **multiplier (e.g. 100, etc.) is used** \*\*\*They have a **time reference** - are computed over a known/given period of time\*\*\*
84
Vital Statistics Rates
Also known as **demographic measures** \*\*\*Describe the **health status of a population**\*\*\* e.g. Mortality Rates (Crude, Specific) and Morbidity Rates
85
Crude Mortality Rate
Number of **all deaths** in a given geography over a given year **divided by** the total population of the geography during the same year
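
A rate is a proportion scaled by a multiplier over a stated period. In this sketch the function name, the per-1,000 multiplier, and the death and population counts are all hypothetical.

```python
def rate(events, population, per=1000):
    """Events per `per` people over the period the counts cover."""
    return events / population * per

# Hypothetical year: 850 deaths in a population of 100,000.
crude_mortality = rate(850, 100_000)
```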
86
**Specific Mortality**
Relates to **specific populations** within the geographic region
87
What is Morbidity Rate also known as?
**Prevalence** or **Prevalence Rate**
88
Incidence
The number of **new cases** that have occurred during a given interval of time **divided by** the total population at risk
89
What are **Adjusting Rates** used for?
To make a **fair** comparison between different populations and to **avoid _Confounding_**
90
Examples of Confounding Factors
Age composition, Gender composition, Race/ethnic composition of a population
91
Absolute Risk Reduction (ARR)
The **reduction in risk** (by the experiment) compared with the **baseline risk**
92
Number Needed to Treat (NNT)
The number needed to treat in order to **prevent _one_ event**
93
What is the reciprocal of the absolute value of NNT?
Absolute Risk Increase or the **Number Needed to Harm**
94
Relative Risk Reduction (RRR)
The amount of risk reduction relative to the baseline risk
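
ARR, RRR, and NNT all follow from two risks. A minimal sketch with hypothetical risks of 20% without treatment and 15% with treatment; the function name is also an assumption.

```python
def risk_measures(baseline_risk, treated_risk):
    """Return (ARR, RRR, NNT) from baseline and treated event risks."""
    arr = baseline_risk - treated_risk  # absolute risk reduction
    rrr = arr / baseline_risk           # reduction relative to baseline
    nnt = 1 / arr                       # patients treated per event prevented
    return arr, rrr, nnt

arr, rrr, nnt = risk_measures(0.20, 0.15)
```

With these made-up risks, treating about 20 people prevents one event.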
95
Relative Risk What types of studies is it mainly used in?
The ratio of the **incidence of a disease** **in people who are exposed** to a risk to the incidence **in people without exposure to the risk** Mainly used in **cohort studies** (Prospective)
96
Odds Ratio What type of study is this used in?
The odds that a person with the disease is exposed to a potential cause for the disease relative to the odds that a person without the disease is exposed to the potential cause Mainly used in a **case/control study** (Retrospective)
97
What does a RR or OR \<1, \>1, or =1 mean?
\< 1 = **Protective** exposure \> 1 = **Risky** exposure = 1 = **No effect**
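
RR and OR can both be read off a two-way table. In this sketch the cell labels a, b, c, d and the counts are hypothetical: a and b are exposed people with and without disease, c and d are unexposed people with and without disease.

```python
def relative_risk(a, b, c, d):
    """Incidence in the exposed divided by incidence in the unexposed."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """Cross-product ratio of the two-way table."""
    return (a * d) / (b * c)

# Hypothetical counts, invented for illustration only.
rr = relative_risk(30, 70, 10, 90)
odds = odds_ratio(30, 70, 10, 90)
```

Both values are above 1 here, which the card reads as a risky exposure.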
98
Inference (on RR and OR)
Inference is possible using the **normal distribution** RR and OR distributions do not follow a theoretical probability distribution, but the distributions of the **natural log** of RR and OR **do** follow the **normal distribution** \*\*\*Need to **transform** to generate inferential statistics\*\*\*
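
The log transformation can be sketched as a confidence interval for RR: build the interval on ln(RR), then exponentiate back. The standard error formula used is the usual large-sample one for ln(RR); the function name, cell labels, and counts are hypothetical.

```python
import math
from statistics import NormalDist

def rr_confidence_interval(a, b, c, d, confidence=0.95):
    """CI for relative risk, computed on the natural-log scale.

    a, b = exposed with/without disease; c, d = unexposed with/without.
    """
    rr = (a / (a + b)) / (c / (c + d))
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    log_rr = math.log(rr)
    return math.exp(log_rr - z * se_log_rr), math.exp(log_rr + z * se_log_rr)

# Hypothetical counts, invented for illustration only.
rr_low, rr_high = rr_confidence_interval(30, 70, 10, 90)
```

If the interval excludes 1, the data are inconsistent with "no effect" at the chosen level.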
99
When can you **reject** the **null hypothesis**?
When the p-value involves **less error** than you were willing to commit (the **significance level, α**) p-value of **0.03** significance level of **0.05** **\*\*\*\*Can reject** the null hypothesis in this case