Correlation Flashcards
(14 cards)
What does Pearson’s correlation coefficient (r) measure?
The strength and direction of the linear relationship between two variables.
Range:
−1 (perfect negative) to
+1 (perfect positive).
Interpretation:
r = 0: No linear relationship.
r = 0.868 (e.g., LLL vs. height): Strong positive relationship.
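As a quick illustration, r can be computed from its definition, r = cov(X, Y) / (sd(X) · sd(Y)), with plain Python. The data below are made-up illustration values, not from the cards:

```python
def pearson_r(x, y):
    """Pearson correlation from raw sums of squares (no libraries)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# A perfectly linear relationship gives r = +1 (or -1 if decreasing):
r_up = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])    # 1.0
r_down = pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2])  # -1.0
```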
How is covariance different from correlation?
Covariance: Measures direction but not strength (units-dependent).
Correlation: Standardized covariance (unitless, −1≤r≤1).
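The unit-dependence is easy to see numerically: rescaling X (say, metres to centimetres) multiplies the covariance by 100 but leaves the correlation unchanged. A minimal sketch with made-up data:

```python
def mean(v):
    return sum(v) / len(v)

def cov(x, y):
    # Sample covariance (divides by n - 1); units are units(x) * units(y)
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def corr(x, y):
    # Standardized covariance: unitless, always in [-1, 1]
    return cov(x, y) / (cov(x, x) * cov(y, y)) ** 0.5

x_m  = [1.0, 2.0, 3.0, 4.0]        # illustrative values in metres
x_cm = [100 * v for v in x_m]      # the same values in centimetres
y    = [2.0, 3.0, 5.0, 7.0]

# cov changes with the units of x; corr does not:
# cov(x_cm, y) == 100 * cov(x_m, y), while corr(x_cm, y) == corr(x_m, y)
```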
How do you test if a correlation is significant?
H0:ρ=0 (no correlation).
H1:ρ≠0
t = r × sqrt(n−2) / sqrt(1−r^2), with n−2 degrees of freedom
In R: cor.test(x, y)
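Outside R, the same t statistic is easy to compute by hand; a sketch using the cards' r = 0.868 with a hypothetical sample size of n = 20 (the p-value would then come from a t distribution with n − 2 = 18 df):

```python
import math

def t_stat(r, n):
    # t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

t = t_stat(0.868, 20)   # ~7.42, far beyond the ~2.10 two-sided 5% cutoff at df = 18
```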
Does correlation imply causation?
No! Possible explanations:
Chance (spurious correlation).
Confounding (a third variable influences both X and Y).
True causation.
What is the simple linear regression equation?
Yi=β0+β1Xi+ϵi
What is the least squares criterion?
Minimizes the sum of squared residuals:
SSRes = ∑(yi−y^i)^2
Residuals: Vertical distances from data points to the regression line.
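The least-squares solution for simple regression has a closed form: β^1 = ∑(xi−x̄)(yi−ȳ) / ∑(xi−x̄)^2 and β^0 = ȳ − β^1·x̄. A minimal sketch with made-up data that lie exactly on y = 1 + 2x:

```python
def fit_ols(x, y):
    """Closed-form least squares for y = b0 + b1*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1

b0, b1 = fit_ols([1, 2, 3, 4], [3, 5, 7, 9])   # recovers b0 = 1, b1 = 2
```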
How is model variance (σe^2) estimated?
σ^e^2 = SSres / (n−2) = 1/(n−2) ∑(yi−y^i)^2
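Given fitted coefficients (here a hypothetical line y^ = 2x, not from the cards), the estimate is just the residual sum of squares over n − 2:

```python
def sigma2_hat(x, y, b0, b1):
    # SSres / (n - 2): two degrees of freedom are lost estimating b0 and b1
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    return sum(r * r for r in residuals) / (len(x) - 2)

# Made-up data scattered around y = 2x (residuals +-0.1, +-0.2):
s2 = sigma2_hat([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8], b0=0.0, b1=2.0)
# SSres = 0.01 + 0.01 + 0.04 + 0.04 = 0.10, so s2 = 0.10 / 2 = 0.05
```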
What does R-squared (R^2) measure?
Proportion of variance in Y explained by X:
R^2 = SSmodel / SStotal = 1 − SSres / SStotal
Example: R^2=0.75 → 75% of variation in Y is explained by X
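A sketch computing R^2 directly from observed and fitted values (made-up numbers, not from the cards):

```python
def r_squared(y, y_hat):
    # R^2 = 1 - SSres / SStotal
    my = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - my) ** 2 for a in y)
    return 1 - ss_res / ss_tot

r2 = r_squared([1, 3, 4, 6], [1.2, 2.8, 4.2, 5.8])   # ~0.988
```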
When is the intercept (β0) meaningless?
When X=0 is outside the data range (e.g., negative folate levels).
How do you predict values using regression?
Plug X into the fitted equation: Y^= β^0+β^1X
What are the assumptions of linear regression?
Linearity: Relationship between X and Y is linear.
Independence: Residuals are uncorrelated.
Homoscedasticity: Constant residual variance.
Normality: Residuals ∼ N(0, σ^2).
How do you check regression assumptions in R?
plot(model) # Check:
1. Residuals vs. Fitted (linearity).
2. Q-Q Plot (normality).
3. Scale-Location (homoscedasticity).
What is the F-test in regression used for?
Tests if the model explains significant variance:
H0: All slopes = 0.
H1 : At least one slope ≠0
R output: F-statistic and p-value.
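For simple regression (p = 1 slope), the F statistic is the model mean square over the residual mean square, and equals t^2 for the slope. A sketch with hypothetical sums of squares (not from the cards):

```python
def f_stat(ss_model, ss_res, n, p=1):
    # F = (SSmodel / p) / (SSres / (n - p - 1))
    return (ss_model / p) / (ss_res / (n - p - 1))

# Hypothetical decomposition: SStotal = 13.0 = 12.84 (model) + 0.16 (residual)
f = f_stat(ss_model=12.84, ss_res=0.16, n=4)   # 12.84 / (0.16 / 2) = 160.5
```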
Why avoid extrapolation in regression?
Predictions outside the observed X range may be invalid (e.g., SST = 150°C).