7 - Correlation and Regression Flashcards
(12 cards)
What is correlation?
Correlation is a statistical measure that describes the strength and direction of the relationship between two variables. It is often visualized using scatter graphs.
Define bivariate data.
Bivariate data consists of pairs of values for two random variables, allowing for analysis of the relationship between these variables.
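What is Pearson’s Product Moment Correlation Coefficient (PMCC)?
PMCC measures the strength and direction of the linear relationship between two variables; it is written 𝑟 for a sample and 𝜌 for a population. It ranges from -1 (perfect negative linear correlation) to 1 (perfect positive linear correlation), with 0 indicating no linear correlation (a non-linear relationship may still exist).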
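As a rough illustration, the sample coefficient can be computed from the sums of squares and products about the means. The sketch below is a minimal version using made-up data; the function name `pmcc` and both data sets are assumptions chosen for this example.

```python
import math

def pmcc(xs, ys):
    """Sample Pearson product moment correlation coefficient r."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return s_xy / math.sqrt(s_xx * s_yy)

# Made-up illustrative data (not from the flashcards)
xs = [1, 2, 3, 4, 5]
perfect_line = [2 * x + 1 for x in xs]   # exactly linear, positive slope
parabola = [(x - 3) ** 2 for x in xs]    # clear relationship, but not linear

r_line = pmcc(xs, perfect_line)
r_parabola = pmcc(xs, parabola)
```

Here `r_line` comes out as 1 because the points lie exactly on a rising straight line, while `r_parabola` comes out as 0 even though the variables are clearly related, which is why a value of 0 should be read as "no linear correlation".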
Explain the distinction between correlation and causation.
Correlation indicates a relationship between two variables, but it does not imply that one variable causes a change in the other. A causal relationship exists only if changes in one directly result in changes to the other.
What does Spearman’s Rank Correlation measure?
Spearman’s Rank Correlation measures the strength and direction of the monotonic relationship between two variables by correlating the ranks of the data rather than the raw values, making it a non-parametric alternative to PMCC.
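One way to see the idea is to rank each data set and then apply the Pearson formula to the ranks. The sketch below assumes tied values share the mean of their ranks (a common convention); the helper names and the data are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values from 1 (smallest); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

def spearman(xs, ys):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

# Made-up data: y always increases with x, but along a curve
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]

r_s = spearman(xs, ys)   # based on ranks
r = pearson(xs, ys)      # based on raw values
```

Because the ranks of `xs` and `ys` agree exactly, `r_s` is 1, whereas `r` on the raw values is below 1 since the points do not lie on a straight line.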
Describe the ‘line of best fit’ in linear regression.
The line of best fit, or least squares regression line, minimizes the sum of the squares of the vertical distances (residuals) between the observed values and the line itself. The equation takes the form 𝑦 = 𝑎 + 𝑏𝑥, with 𝑎 as the y-intercept and 𝑏 as the slope.
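A minimal sketch of how 𝑎 and 𝑏 can be found for the regression of y on x, using the standard results b = S_xy / S_xx and a = ȳ - b·x̄. The function name and the data points are assumptions made up for illustration.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least squares regression line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b = s_xy / s_xx            # slope
    a = mean_y - b * mean_x    # intercept: the line passes through (mean_x, mean_y)
    return a, b

def sum_squared_residuals(a, b, xs, ys):
    """Sum of squared vertical distances between the points and y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Made-up illustrative data
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]

a, b = least_squares_line(xs, ys)
best = sum_squared_residuals(a, b, xs, ys)
# Nudging the slope or intercept away from the fitted values should not reduce the total
nudged_slope = sum_squared_residuals(a, b + 0.1, xs, ys)
nudged_intercept = sum_squared_residuals(a + 0.1, b, xs, ys)
```

For this data the fitted line is approximately y = 0.09 + 1.97x, and both nudged lines give a larger sum of squared residuals than the fitted one.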
What are the types of predictions in linear regression?
Predictions are classified as interpolation (within the range of the data) and extrapolation (outside the range). Interpolation is generally more reliable than extrapolation.
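The distinction can be sketched as a simple range check on the x value being predicted. The coefficients and data range below are hypothetical, chosen only to illustrate the classification.

```python
def predict(a, b, x, x_min, x_max):
    """Return (prediction, kind) for the line y = a + b*x.

    kind is "interpolation" if x lies within [x_min, x_max], the range of
    x values used to fit the line, and "extrapolation" otherwise.
    """
    kind = "interpolation" if x_min <= x <= x_max else "extrapolation"
    return a + b * x, kind

# Hypothetical fitted line y = 0.09 + 1.97x from data with x between 1 and 5
a, b = 0.09, 1.97
x_min, x_max = 1, 5

inside = predict(a, b, 3, x_min, x_max)     # x = 3 is within the data range
outside = predict(a, b, 10, x_min, x_max)   # x = 10 is beyond the data range
```

The arithmetic is the same in both cases; only the second prediction relies on the assumption that the linear trend continues outside the observed range.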
What is the Least Squares Regression Line?
The Least Squares Regression Line is the line that best fits a set of data points in a scatter plot, minimizing the sum of the squares of the vertical distances (residuals) between the points and the line.
What is an interpolation prediction?
Interpolation predictions are estimates made within the range of the data used to derive the regression equation; they are generally considered reliable.
What does extrapolation mean in the context of predictions?
Extrapolation means making predictions for values outside the range of the data used to calculate the regression line. Such predictions can be unreliable because the observed trend may not continue beyond that range.
What is a spurious correlation?
A spurious correlation occurs when two variables appear to be related but have no causal link: either both are influenced by a third variable, or the association is coincidental.
Why is visual representation (scatter plots) important in correlation analysis?
Scatter plots give visual insight into the relationship between variables, making it easier to assess trends, clusters, and potential outliers.