Correlation Flashcards
What is correlation in statistics?
Correlation quantifies the extent of association between two continuous variables.
How is correlation different from regression?
Correlation measures the strength of a relationship between two variables, while regression explains one variable in terms of another with an equation.
What is Pearson’s correlation coefficient (𝑟)?
A measure of linear correlation between two variables, ranging from -1 (perfect negative) to +1 (perfect positive).
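Written out, the coefficient can be expressed as follows. This is the standard sample formula; the symbols 𝑥ᵢ, 𝑦ᵢ and the bars for sample means are notation assumed here, not taken from the cards.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}
```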
What does a correlation coefficient of 0 mean?
It means there is no linear relationship between the two variables.
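A small sketch of why "no linear relationship" is not the same as "no relationship". The data are hypothetical and chosen for illustration: 𝑦 is completely determined by 𝑥, yet 𝑟 comes out as zero because the relationship is symmetric rather than linear.

```r
# Hypothetical data: y depends perfectly on x, but not linearly
x <- -5:5
y <- x^2

# Pearson's r only measures the linear component of the association
r <- cor(x, y)
print(r)
```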
What is a perfect correlation?
A perfect correlation occurs when all data points fall exactly on a straight line, with 𝑟 = ±1.
What does concurvity refer to?
It describes a non-linear association between two continuous variables.
In R, how can you compute the correlation coefficient between two variables, LLL and TotalHeight?
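# Deviations of each variable from its mean
diffx <- hgt$LLL - mean(hgt$LLL)
diffy <- hgt$TotalHeight - mean(hgt$TotalHeight)
# Sum of cross-products divided by the square root of the product of the sums of squares
r <- sum(diffx * diffy) / sqrt(sum(diffx^2) * sum(diffy^2))
print(r)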
What is a simpler way to compute correlation in R?
cor(x = hgt$TotalHeight, y = hgt$LLL)
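To check that the manual calculation and `cor()` agree, the following sketch runs both on the same data. Because the `hgt` data frame is not included here, it uses R's built-in `women` data set (height and weight) as a stand-in; that substitution is an assumption made purely for illustration.

```r
# Stand-in data: built-in `women` data set, used in place of `hgt`
x <- women$height
y <- women$weight

# Manual calculation from deviations about the mean
diffx <- x - mean(x)
diffy <- y - mean(y)
r_manual <- sum(diffx * diffy) / sqrt(sum(diffx^2) * sum(diffy^2))

# Built-in calculation
r_builtin <- cor(x, y)

print(c(manual = r_manual, builtin = r_builtin))
print(all.equal(r_manual, r_builtin))
```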
What is covariance?
The numerator in the correlation formula 𝑟 = cov(𝑥, 𝑦) / (𝑠𝑥 𝑠𝑦). It measures how two variables vary together and is expressed in the product of their units.
How is correlation different from covariance?
Correlation standardizes covariance to a range of -1 to +1, making it comparable across different units.
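A rough illustration of the unit dependence, using the built-in `women` data set as stand-in data (an assumption; it is not the data set used elsewhere in these cards). Converting inches to centimetres and pounds to kilograms changes the covariance but leaves the correlation unchanged.

```r
# Stand-in data: heights in inches, weights in pounds
x_in <- women$height
y_lb <- women$weight

# The same measurements in metric units
x_cm <- x_in * 2.54
y_kg <- y_lb * 0.45359237

# Covariance changes with the units of measurement
cov_imperial <- cov(x_in, y_lb)
cov_metric <- cov(x_cm, y_kg)

# Correlation does not
cor_imperial <- cor(x_in, y_lb)
cor_metric <- cor(x_cm, y_kg)

# Correlation is covariance divided by the product of the standard deviations
r_from_cov <- cov_imperial / (sd(x_in) * sd(y_lb))

print(c(cov_imperial = cov_imperial, cov_metric = cov_metric))
print(c(cor_imperial = cor_imperial, cor_metric = cor_metric, r_from_cov = r_from_cov))
```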
What are the null (𝐻0) and alternative (𝐻1) hypotheses for testing correlation?
𝐻0: 𝜌 = 0 (there is no association between the variables)
𝐻1: 𝜌 ≠ 0 (there is an association)
What test statistic is used to test correlation?
A 𝑡-statistic, 𝑡 = 𝑟√(𝑛−2) / √(1−𝑟²), compared against a 𝑡-distribution with 𝑛−2 degrees of freedom.
How do you compute a two-tailed 𝑝-value for correlation in R?
2 * pt(q = abs(t_stat), df = n - 2, lower.tail = FALSE)
What function in R performs a correlation test?
cor.test(x = hgt$TotalHeight, y = hgt$LLL)
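As a sketch of how the pieces fit together, the 𝑡-statistic and two-tailed 𝑝-value can be computed by hand and compared with what `cor.test()` reports. The built-in `women` data set stands in for `hgt` here, which is an assumption made only so the example runs on its own.

```r
# Stand-in data in place of `hgt`
x <- women$height
y <- women$weight
n <- length(x)

# Test statistic: t = r * sqrt(n - 2) / sqrt(1 - r^2)
r <- cor(x, y)
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Two-tailed p-value; abs() keeps it valid when r is negative
p_manual <- 2 * pt(q = abs(t_stat), df = n - 2, lower.tail = FALSE)

# The same test via cor.test()
res <- cor.test(x = x, y = y)

print(c(t_manual = t_stat, t_cor_test = unname(res$statistic)))
print(c(p_manual = p_manual, p_cor_test = res$p.value))
```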
What is the alternative hypothesis in a correlation test?
𝐻1: 𝜌 ≠ 0 (the true correlation is not zero, meaning an association exists).
How do we interpret a very small 𝑝-value in a correlation test?
It suggests strong evidence against the null hypothesis, meaning there is likely an association between the variables.
What does the confidence interval in a correlation test represent?
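It gives a range of plausible values for the true population correlation coefficient (𝜌) at the stated confidence level (95% by default in cor.test). If the interval excludes 0, the data are inconsistent with 𝜌 = 0 at that level.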
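For Pearson's correlation, `cor.test()` bases its interval on Fisher's 𝑧 transformation. The sketch below reproduces a 95% interval by hand under that approach, using the built-in `women` data set as stand-in data (an assumption for illustration).

```r
# Stand-in data in place of `hgt`
x <- women$height
y <- women$weight
n <- length(x)
r <- cor(x, y)

# Fisher's z transformation and its approximate standard error
z <- atanh(r)
se <- 1 / sqrt(n - 3)

# 95% interval on the z scale, transformed back to the r scale
ci_manual <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

# Interval reported by cor.test() (conf.level = 0.95 by default)
ci_test <- cor.test(x, y)$conf.int

print(ci_manual)
print(as.numeric(ci_test))
```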
Why is correlation not the same as causation?
Correlation only shows an association, but a causal link requires further evidence, such as controlled experiments.
What is a “spurious” or “nonsense” correlation?
A correlation between two variables that occurs due to chance or a hidden third variable rather than a causal relationship.
What are three possible explanations for a correlation?
Chance (random coincidence)
A third variable affecting both
Genuine causation
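The second explanation can be illustrated with a small simulation. Everything here is hypothetical: `z` plays the hidden third variable, and `x` and `y` each depend on `z` but not on each other. They still end up strongly correlated, and the correlation largely disappears once `z` is accounted for.

```r
set.seed(1)
n <- 1000

z <- rnorm(n)                  # hidden third variable
x <- z + rnorm(n, sd = 0.5)    # x depends on z only
y <- z + rnorm(n, sd = 0.5)    # y depends on z only

# Raw correlation between x and y is strong
r_xy <- cor(x, y)

# Remove the part of each variable explained by z, then correlate what is left
r_partial <- cor(resid(lm(x ~ z)), resid(lm(y ~ z)))

print(c(r_xy = r_xy, r_partial = r_partial))
```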
What is Anscombe’s quartet?
A set of four datasets with nearly identical summary statistics (including the correlation coefficient) that look very different when plotted, illustrating the limitations of correlation.
What is the correlation coefficient (𝑟) for each pair in Anscombe’s quartet?
𝑟 ≈ 0.816 for all four pairs, despite vastly different data patterns.
What does Anscombe’s quartet demonstrate about correlation?
That correlation alone does not capture the true nature of relationships between variables; visualization is essential.
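The quartet ships with R as the built-in `anscombe` data frame (columns `x1` to `x4` and `y1` to `y4`), so the shared correlation can be checked directly. A minimal sketch:

```r
# Correlation for each of the four x/y pairs in the built-in data set
r_values <- sapply(1:4, function(i) {
  cor(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]])
})

print(round(r_values, 3))
```

Plotting each pair, for example with `plot(anscombe$x2, anscombe$y2)`, shows how different the four patterns are despite the matching coefficients.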
What does the datasauRus dataset illustrate?
That datasets with very different structures can have nearly identical summary statistics, including correlation, emphasizing the need for visualization.