# Lecture 8 and 9 (Correlation, Regression, CIs) Flashcards

1
Q

The dependent variable is on what axis?

A

Y-axis

2
Q

The independent variable is on what axis?

A

X-axis

3
Q

When should you use a correlation analysis?

A
• examine relationship between variables
• estimate strength of association between variables
• when independent and dependent variables cannot be clearly distinguished
• when regression requirements not met
4
Q

A correlation coefficient of 0 means:

A
• there is no linear association between the two variables
5
Q

A regression is:

A
• how well data fit a line
• r-value close to 0 = no linear correlation
• r-value closer to 1 or -1 = strong correlation
• r-squared tells you the proportion of variation in Y that is explained by variation in X.
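
To make the r and r-squared values above concrete, here is a minimal Python sketch that computes Pearson's r by hand. The function name `pearson_r` and the sample data are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares around the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data that rise almost perfectly in a straight line
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(xs, ys)
r_squared = r ** 2  # share of the variation in Y explained by X
print(f"r = {r:.3f}, r-squared = {r_squared:.3f}")
```

With data this close to a line, r lands near 1; reversing the order of `ys` would push it toward -1.
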
6
Q

When should you use regression analysis?

A
• look for a trend in data between variables
• more than one X (independent) variable = multiple regression
• predict a dependent variable
• curve fitting (pharmacokinetics)
• calibration and laboratory assays
• detect patterns in microarray data
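
As a sketch of the "predict a dependent variable" and calibration uses listed above, the following fits a least-squares line and predicts Y for a new X. The helper `fit_line` and the calibration numbers are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical calibration data: known concentrations vs. instrument readings
concentrations = [0.0, 1.0, 2.0, 3.0, 4.0]
readings = [1.0, 3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(concentrations, readings)
predicted = intercept + slope * 5.0  # predicted reading at concentration 5.0
print(slope, intercept, predicted)
```
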
7
Q

Regression r-value close to 0:

A

no association

8
Q

Regression r-value close to 1:

A

strong association

9
Q

Regression r-squared value tells you:

A
• the proportion of variation in Y that is explained by variation in X.
10
Q

Parametric test characteristics:

A
• assume variables are normally distributed with equal variances
• dependent on mean and variance
• susceptible to outliers
• requires continuous variables
11
Q

Non-parametric test characteristics:

A
• based on ranks
• make no assumptions about the distribution, variance, or mean of the data
12
Q

You can transform non-linear data to linear data by:

A
• taking logs
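
A small sketch of why taking logs helps, assuming the data grow exponentially (the values below are invented): the raw values rise by larger and larger steps, while their logs rise by a constant step, i.e. along a straight line.

```python
import math

xs = [0, 1, 2, 3, 4]
# Hypothetical exponential data: y = 2 * e^(0.5 * x)
ys = [2.0 * math.exp(0.5 * x) for x in xs]
log_ys = [math.log(y) for y in ys]

# Step between consecutive points: growing for raw data, constant for logs
steps_raw = [b - a for a, b in zip(ys, ys[1:])]
steps_log = [b - a for a, b in zip(log_ys, log_ys[1:])]
print(steps_raw)
print(steps_log)
```
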
13
Q

Three ways you can control for outliers:

A
1. using a non-parametric test
2. dropping the outlier(s)
3. log transformation
14
Q

Multivariate regression:

A
• more than one X (independent) variable
• accounts for interactions between variables by adding product terms (multiplying variables together)
15
Q

Stepwise regression:

A
• finds the top contributing variable, then the second, then the third, etc. until a point of diminishing returns is reached.
• i.e., finds the group of variables that has the largest collective r-squared value.
16
Q

Multiple logistic regression:

A
• a multivariate analysis
• useful when outcome is dichotomous
• provides a direct estimate of the ODDS RATIO for each independent variable
17
Q

When the distribution of your data is not normal, what type of test should you use?

A

non-parametric

18
Q

If you are analyzing more than one type of independent (X) variable, what type of analysis should you use?

A

multivariate regression

19
Q

Principal Component Analysis (PCA):

A
• takes many variables and reduces them to a smaller set of uncorrelated components (linear combinations of the original variables)
• gives you groups of variables that best explain variation
20
Q

Risk factors are modifiable through:

A

primary prevention

21
Q

Prognostic factors are modifiable through:

A

secondary prevention

22
Q

Common prognosis endpoints:

A
1. case fatality (patients with disease who die of it)
2. disease-specific mortality (people per 10,000 who are dying of specific disease)
3. response
4. remission
5. recurrence
23
Q

Equipoise:

A
• a genuine lack of consensus in the medical community about a treatment or prognosis, i.e. about how best to treat.
• allows for RCTs
24
Q

Kaplan-Meier Analysis:

A
• most widely used survival analysis:
• a graph of time to event
• every horizontal segment is a time period
• every vertical drop is an event (e.g. death); dropouts are censored and do not cause a drop
• the larger the sample size, the smoother the curve
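
The curve described above can be sketched with the product-limit estimator. This is a minimal illustration, not a full survival package: the function name `kaplan_meier` and the five patients are hypothetical. Censored patients are removed from the at-risk count without lowering the curve, which is the standard product-limit convention.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each time where at least one event occurs.

    times[i]  : follow-up time for patient i
    events[i] : True if the event (e.g. death) occurred, False if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients whose follow-up ends at the same time t
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            leaving += 1
            i += 1
        if deaths > 0:
            # Vertical drop: multiply by the fraction surviving past t
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Events and censored patients both leave the risk set
        n_at_risk -= leaving
    return curve

# Hypothetical follow-up in months; False marks a censored patient
times = [1, 2, 3, 4, 5]
events = [True, False, True, True, False]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3))
```
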
25
Q

Kaplan-Meier analysis truncation:

A

when a patient enters the study after it has already started

26
Q

Kaplan-Meier analysis censoring:

A

when a patient drops out of a study after it has started

27
Q

Can a Kaplan-Meier analysis handle covariates?

A

No.

• use a Cox regression for this
28
Q

Cox regression:

A
• multivariate survival analysis
• can control for other factors
• calculates hazard ratio (interpreted similarly to relative risk)
29
Q

Equipoise allows for:

A
• randomized controlled trials to occur
• equipoise = uncertainty in the medical community
30
Q

Variance =

A

measure of the spread/dispersion of values around the mean.

31
Q

Standard deviation =

A

√v; (v = variance)

• does not systematically decrease as sample size increases (the SEM does)
32
Q

Standard error of the mean (SEM) =

A

SD/ √n
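
A short sketch tying variance, SD, and SEM together, using Python's standard `statistics` module. The sample values and the helper `sem_for` are made up for illustration.

```python
import math
import statistics

sample = [4.0, 5.0, 6.0, 7.0, 8.0]  # hypothetical measurements

variance = statistics.variance(sample)  # sample variance (divides by n - 1)
sd = math.sqrt(variance)                # standard deviation = sqrt(variance)
sem = sd / math.sqrt(len(sample))       # standard error of the mean = SD / sqrt(n)

def sem_for(sd, n):
    """SEM for a given SD and sample size n."""
    return sd / math.sqrt(n)

print(variance, sd, sem)
# For a fixed SD, the SEM shrinks as n grows
print(sem_for(sd, 25), sem_for(sd, 100))
```
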

33
Q

Central limit theorem posits:

A
• the larger the sample size, the closer the distribution of sample means is to a normal distribution centered on the population mean
• i.e. larger samples give a narrower confidence interval
34
Q

Interquartile range:

A
• IQR contains 50% of the observations
• (from the 25th - 75th percentile)
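
As an illustration of the 25th to 75th percentile range, here is a minimal sketch using `statistics.quantiles` from the standard library (Python 3.8+). The data are invented, and different software packages use slightly different percentile conventions, so exact quartile values can vary.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]  # hypothetical observations

# n=4 returns the three cut points: 25th, 50th, and 75th percentiles
q1, median, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(q1, median, q3, iqr)
```
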
35
Q

Confidence intervals describe:

A
• the uncertainty that surrounds a particular estimate
• the larger the sample size, the narrower the CI = MORE PRECISE STUDY
36
Q

Equation for 95% CI:

A

95% CI = mean ± 1.96 × (SD/√n)

• SD = standard deviation
• n = sample size
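
The equation above can be sketched in a few lines of Python. The function name `ci95` and the sample are hypothetical. The 1.96 multiplier is the normal approximation used on the card; for small samples a t-based multiplier is often used instead.

```python
import math
import statistics

def ci95(sample):
    """95% CI for the mean: mean +/- 1.96 * (SD / sqrt(n))."""
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation
    margin = 1.96 * sd / math.sqrt(len(sample))
    return mean - margin, mean + margin

sample = [98, 102, 101, 99, 100, 103, 97, 100]  # hypothetical readings
low, high = ci95(sample)
print(f"95% CI: {low:.2f} to {high:.2f}")

# Same values repeated four times: larger n, narrower interval
low_big, high_big = ci95(sample * 4)
print(f"95% CI with 4x the sample: {low_big:.2f} to {high_big:.2f}")
```
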
37
Q

For a correlation to be statistically significant, the confidence interval cannot contain:

A

0

0 = no correlation

38
Q

For a relative risk, hazard ratio, or odds ratio to be statistically significant, the confidence interval cannot contain:

A

1
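
As a rough illustration of checking a confidence interval against the "no effect" value, the sketch below computes an odds ratio from a 2x2 table with an approximate 95% CI on the log scale (a common textbook method, assumed here rather than taken from the lecture). The function names and counts are hypothetical.

```python
import math

def odds_ratio_ci95(a, b, c, d):
    """Odds ratio and approximate 95% CI from a 2x2 table.

    a, b = exposed with / without the outcome
    c, d = unexposed with / without the outcome
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    high = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
    return odds_ratio, low, high

def excludes_null(low, high, null_value=1.0):
    """True if the interval does not contain the 'no effect' value."""
    return not (low <= null_value <= high)

# Hypothetical counts
odds_ratio, low, high = odds_ratio_ci95(30, 70, 10, 90)
print(f"OR = {odds_ratio:.2f}, 95% CI {low:.2f} to {high:.2f}")
print("significant:", excludes_null(low, high))  # ratios: null value is 1
# For a correlation coefficient the null value is 0 instead
print("significant:", excludes_null(-0.1, 0.4, null_value=0.0))
```
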