exam Flashcards
(26 cards)
What is an association?
Two variables have an association if changes in the values of one variable coincide with a pattern of change in the other variable.
What is a positive association?
Higher values of one variable tend to occur with higher values of the other. Example: as a person’s height increases, their weight tends to increase.
What is a negative association?
Higher values of one variable tend to occur with lower values of the other. Example: as water temperature increases, dissolved oxygen concentration decreases.
What data is needed for studying associations?
One sample of randomly chosen individuals from the population of interest, measuring two variables of interest (X and Y).
What does each row in the data for studying associations represent?
Each row (case) contains the measurements for X and Y made on a single individual.
What is correlation analysis?
A descriptive method to determine if an association is positive or negative and if it is strong or weak.
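A minimal Python sketch of the usual descriptive measure behind correlation analysis, the Pearson correlation coefficient r: its sign gives the direction of the association and its magnitude (0 to 1) gives the strength. The function names, the sample heights and weights, and the 0.7 cutoff for calling an association "strong" are illustrative assumptions, not fixed rules.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient r for paired measurements xs and ys."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two paired sequences of length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: sum of cross-products of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: square root of the product of each variable's sum of squared deviations
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def describe(r, strong_cutoff=0.7):
    """Label direction and strength; the 0.7 cutoff is an illustrative assumption."""
    direction = "positive" if r > 0 else "negative" if r < 0 else "none"
    strength = "strong" if abs(r) >= strong_cutoff else "weak"
    return direction, strength


# Hypothetical heights (cm) and weights (kg) for five individuals
heights = [150, 160, 170, 180, 190]
weights = [52, 58, 66, 71, 80]
r = pearson_r(heights, weights)
print(round(r, 3), describe(r))
```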
What is regression analysis?
A predictive method to determine the direction and strength of an association and to predict values of Y from given values of X.
What does correlation indicate?
Correlation indicates a statistical pattern of co-variation between two variables.
What is the difference between correlation and causation?
Correlation does not imply causation; correlation is an observation while causation is an inference.
What are the criteria for inferring cause-effect from associations?
- The association is documented at multiple sites by multiple studies
- The association remains when the effects of potential lurking variables are removed
- A plausible cause-effect mechanism explains the co-variation
What is a lurking variable?
A variable that is not included in the analysis but affects the relationship between the studied variables.
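One common way to check whether an association survives once a lurking variable is accounted for is a partial correlation: regress X on the lurking variable Z, regress Y on Z, and correlate the two sets of residuals. The sketch below assumes a tiny, deliberately constructed data set in which Z drives both X and Y; the data and helper names are hypothetical.

```python
import math


def mean(v):
    return sum(v) / len(v)


def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def residuals(ys, zs):
    """Residuals of a simple least-squares regression of ys on zs."""
    mz, my = mean(zs), mean(ys)
    slope = (sum((z - mz) * (y - my) for z, y in zip(zs, ys))
             / sum((z - mz) ** 2 for z in zs))
    intercept = my - slope * mz
    return [y - (intercept + slope * z) for z, y in zip(zs, ys)]


# Hypothetical constructed data: z is the lurking variable; x and y each depend on z
# plus "noise" terms chosen to be unrelated to each other and to z.
z = [1.0, 2.0, 3.0, 4.0]
x = [zi + e for zi, e in zip(z, [0.5, -0.5, -0.5, 0.5])]
y = [2 * zi + e for zi, e in zip(z, [0.5, -1.5, 1.5, -0.5])]

raw_r = pearson_r(x, y)                                  # association ignoring z
partial_r = pearson_r(residuals(x, z), residuals(y, z))  # association with z's effect removed
print(round(raw_r, 3), round(partial_r, 3))
```

Here X and Y look strongly associated (r of about 0.82), but the association vanishes once Z is removed, which is the pattern a lurking variable produces.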
What is an example of a plausible cause-effect mechanism?
Contraception increases time between births, allowing infants to nurse longer and delay exposure to waterborne diseases.
Is statistical correlation sufficient to conclude a cause-effect relationship?
No, statistical correlation alone is not sufficient to conclude a cause-effect relationship.
What is the purpose of regression analysis?
To predict values of Y based on data for X and to define associations between variables.
What are the uses of regression analysis?
- To predict future values of Y based on historical data
- To inform policies or regulations based on statistical associations
What are the assumptions of linear regression?
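- Linearity: the mean of Y changes linearly with X
- Statistical independence of the residuals
- Homoscedasticity: constant variance of the residuals
- Normality of the residuals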
What does homoscedasticity refer to?
Constant variance of the residuals across all values of the predictor(s).
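Homoscedasticity is usually judged from a plot of residuals against fitted values. As a rough numeric stand-in, the sketch below fits the least-squares line, sorts cases by X, and divides the residual variance in the upper half of X by that in the lower half; a ratio far from 1 suggests the spread changes with X. This split-half ratio is an informal illustration only (formal tests such as Breusch-Pagan exist), and the data and function names are hypothetical.

```python
import statistics


def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for simple linear regression."""
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b1 * x_bar, b1


def spread_ratio(xs, ys):
    """Residual variance for the upper half of X divided by that for the lower half.

    Informal check only: values near 1 are consistent with constant variance,
    values far from 1 suggest heteroscedasticity. Needs at least 4 cases.
    """
    b0, b1 = fit_line(xs, ys)
    pairs = sorted(zip(xs, ys))                  # order cases by X
    resid = [y - (b0 + b1 * x) for x, y in pairs]
    half = len(resid) // 2                       # with odd n, the middle case is skipped
    return statistics.variance(resid[-half:]) / statistics.variance(resid[:half])


# Hypothetical data: y = 2x plus deviations of constant size vs. growing size
x = [1, 2, 3, 4, 5, 6, 7, 8]
y_even = [2 * xi + e for xi, e in zip(x, [0.5, -0.5] * 4)]
y_fan = [2 * xi + e for xi, e in zip(x, [0.1, -0.1, 0.2, -0.2, 1.0, -1.0, 2.0, -2.0])]
print(spread_ratio(x, y_even))  # close to 1
print(spread_ratio(x, y_fan))   # much larger than 1
```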
What happens if the assumptions of linear regression are violated?
Predictions, confidence intervals, and insights might be inefficient, biased, or misleading.
What is the least-squares regression line?
A line fit through the middle of data points in a scatterplot that minimizes the sum of squared vertical differences between observed and predicted values.
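To make "minimizes the sum of squared vertical differences" concrete, the sketch below computes the least-squares coefficients and then shows that nudging either the intercept or the slope away from them increases the sum of squared errors (SSE). The helper names and data are hypothetical.

```python
def least_squares(xs, ys):
    """Return (b0, b1) for the least-squares line y = b0 + b1 * x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b1 * x_bar, b1


def sse(xs, ys, b0, b1):
    """Sum of squared vertical differences between observed y and predicted b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical paired measurements
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = least_squares(xs, ys)
best = sse(xs, ys, b0, b1)
print(f"least-squares SSE:  {best:.4f}")
# Any other line, here one with a shifted intercept or slope, fits worse
print(f"intercept + 0.1:    {sse(xs, ys, b0 + 0.1, b1):.4f}")
print(f"slope + 0.1:        {sse(xs, ys, b0, b1 + 0.1):.4f}")
```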
What is the purpose of the least-squares regression line?
To predict Y-values based on X-values.
How is the slope of the least-squares regression line calculated?
Using the formula: \( b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \)
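A direct Python translation of the slope formula, assuming x and y are paired sequences of equal length. The function name is hypothetical; it raises an error when all x values are equal, because the denominator is then zero.

```python
import statistics


def ls_slope(xs, ys):
    """Least-squares slope b1 = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)."""
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    if denominator == 0:
        raise ValueError("slope is undefined when all x values are equal")
    return numerator / denominator


# Hypothetical data lying exactly on y = 1 + 2x
print(ls_slope([1, 2, 3, 4, 5], [3, 5, 7, 9, 11]))
```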
What is the Y-intercept in the least-squares regression line?
The predicted value of Y when X = 0, calculated using the formula: \( b_0 = \bar{y} - b_1 \bar{x} \)
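Combining the slope and intercept formulas gives the full least-squares line, which can then be used for prediction. This sketch (hypothetical names and data) also illustrates a consequence of the intercept formula: because the intercept equals \( \bar{y} - b_1 \bar{x} \), the fitted line always passes through the point \( (\bar{x}, \bar{y}) \).

```python
import statistics


def fit_least_squares(xs, ys):
    """Return (b0, b1): slope from the least-squares formula, then intercept ybar - b1 * xbar."""
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    b0 = y_bar - b1 * x_bar
    return b0, b1


def predict(b0, b1, x):
    """Predicted Y on the least-squares line at a given X."""
    return b0 + b1 * x


# Hypothetical data lying exactly on y = 1 + 2x
b0, b1 = fit_least_squares([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
print(b0, b1, predict(b0, b1, 10))

# With scattered data, the line still passes through (xbar, ybar)
xs, ys = [1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1]
nb0, nb1 = fit_least_squares(xs, ys)
print(predict(nb0, nb1, statistics.fmean(xs)), statistics.fmean(ys))
```

On Python 3.10 or later, `statistics.linear_regression(xs, ys)` returns the same ordinary-least-squares slope and intercept, which makes a convenient cross-check.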
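What is the Durbin-Watson test used for?
To test for autocorrelation in the residuals (correlation between successive residuals, typically in time-ordered data), which violates the independence assumption.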
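The Durbin-Watson statistic is \( d = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2} \), computed on the residuals \( e_t \) in their time (or collection) order. It ranges from 0 to 4: values near 2 suggest no first-order autocorrelation, values well below 2 suggest positive autocorrelation, and values well above 2 suggest negative autocorrelation. The sketch below computes only the statistic, not a p-value; in practice, `dwtest()` from R's lmtest package or `durbin_watson` from Python's statsmodels would be the usual tools. The residual sequences are hypothetical.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic d for residuals listed in time (collection) order."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    # Numerator: squared differences between each residual and the one before it
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    # Denominator: sum of squared residuals
    den = sum(e ** 2 for e in residuals)
    return num / den


# Hypothetical residual sequences
runs = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]         # neighbours tend to share a sign
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # neighbours tend to flip sign
print(durbin_watson(runs))         # well below 2: positive autocorrelation
print(durbin_watson(alternating))  # well above 2: negative autocorrelation
```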
What is the formula for linear regression in R?
lm(response ~ predictor, data = dataset)