SESSION 9 - CORRELATION AND SIMPLE LINEAR REGRESSION Flashcards
(32 cards)
Are X and Y associated? What do you do first?
Scatter plots and Correlation
Want to predict Y from X? What do you do first?
Simple Linear Regression
What do scatterplots do?
What do you look out for?
They show the relationship between two continuous variables.
Is the association linear?
Are there outliers?
What is correlation?
Correlation is a statistical measure that expresses the extent to which two variables are linearly related (meaning they change together at a constant rate).
What is a simple linear regression?
Simple linear regression is used to estimate the relationship between two quantitative variables. You can use simple linear regression when you want to know:
How strong the relationship is between two variables (e.g., the relationship between rainfall and soil erosion).
The value of the dependent variable at a certain value of the independent variable (e.g., the amount of soil erosion at a certain level of rainfall)
What are regression models?
Regression models describe the relationship between variables by fitting a line to the observed data. Linear regression models use a straight line, while logistic and nonlinear regression models use a curved line. Regression allows you to estimate how a dependent variable changes as the independent variable(s) change.
What is the regression coefficient?
How much we expect Y to change when X increases by one unit.
What is the E in the equation?
E is the error term: the difference between an observed value of y and the value the regression line predicts, i.e., the variation in y that the line does not explain.
What is Y?
Y is the predicted value of the dependent variable (y) for any given value of the independent variable (x).
What is B0?
B0 is the intercept, the predicted value of y when x is 0.
What is B1?
B1 is the regression coefficient (the slope): how much we expect y to change when x increases by one unit.
What is X?
X is the independent variable (the variable we expect to influence y).
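The cards above refer to "the equation" without showing it. Assuming they mean the standard simple linear regression model, written here with the cards' own symbols, it would read:

```latex
Y = B_0 + B_1 X + E
```

Dropping the error term gives the fitted line used for prediction, \hat{Y} = B_0 + B_1 X.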
What is the Correlation Coefficient (r)?
The sample correlation coefficient (r) is a measure of the closeness of association of the points in a scatter plot to a linear regression line based on those points.
Possible values of the correlation coefficient range from -1 to +1, with -1 indicating a perfectly linear negative, i.e., inverse, correlation (sloping downward) and +1 indicating a perfectly linear positive correlation (sloping upward).
A correlation coefficient close to 0 suggests little, if any, correlation.
What is Pearson Product-Moment Correlation?
The Pearson product-moment correlation coefficient (or Pearson correlation coefficient, for short) is a measure of the strength of a linear association between two variables and is denoted by r. Basically, a Pearson product-moment correlation attempts to draw a line of best fit through the data of two variables, and the Pearson correlation coefficient, r, indicates how far these data points are from this line of best fit (i.e., how well the data points fit the line).
How can we determine the strength of association based on the Pearson correlation coefficient?
The stronger the association between the two variables, the closer the Pearson correlation coefficient, r, will be to +1 (for a positive relationship) or -1 (for a negative relationship). A value of exactly +1 or -1 means that all your data points lie on the line of best fit, with no variation away from it. Values of r strictly between -1 and +1 (for example, r = 0.8 or r = -0.4) indicate that there is variation around the line of best fit. The closer r is to 0, the greater the variation around the line of best fit.
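As a minimal sketch of how r behaves, the function below computes the sample Pearson coefficient from sums of deviations. The function name `pearson_r` and the data are made up for illustration and are not from the course material.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data, chosen only to illustrate the three cases.
x = [1, 2, 3, 4, 5]
r_perfect = pearson_r(x, [2, 4, 6, 8, 10])   # every point on a rising line
r_inverse = pearson_r(x, [10, 8, 6, 4, 2])   # every point on a falling line
r_scatter = pearson_r(x, [3, 1, 4, 2, 5])    # points scattered around a line
print(r_perfect, r_inverse, r_scatter)
```

The first two data sets sit exactly on a line, so r lands at the ends of its range; the third has variation around the line, so r falls in between.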
What is the line of best fit?
It is also known as the regression line.
Line of best fit refers to a line through a scatter plot of data points that best expresses the relationship between those points. Statisticians typically use the least squares method (sometimes known as ordinary least squares, or OLS) to arrive at the geometric equation for the line, either through manual calculations or by using software.
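The least squares calculation mentioned above can be sketched in a few lines. This is an illustrative version using the textbook closed-form solution for one predictor; the names `fit_line` and `predict` and the rainfall/erosion numbers are assumptions, not part of the original cards.

```python
def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line y = b0 + b1 * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("cannot fit a line when all x values are equal")
    b1 = sxy / sxx               # slope (regression coefficient)
    b0 = mean_y - b1 * mean_x    # intercept: the line passes through the means
    return b0, b1


def predict(b0, b1, x):
    """Predicted y for a given x on the fitted line."""
    return b0 + b1 * x


# Hypothetical measurements, for illustration only.
rainfall = [1, 2, 3, 4, 5]
erosion = [2, 4, 5, 4, 5]
b0, b1 = fit_line(rainfall, erosion)
print(b0, b1, predict(b0, b1, 6))
```

Here `b1` answers "how much does y change per unit of x" and `predict` answers "what is y at a given x", the two uses of simple linear regression listed earlier.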
What is Spearman’s rank correlation?
Spearman’s rank-order correlation is the nonparametric version of the Pearson product-moment correlation. Spearman’s correlation coefficient (ρ, also written r_s) measures the strength and direction of association between two ranked variables.
What are the assumptions of Spearman’s rank-order correlation?
You need two variables that are each measured on an ordinal, interval or ratio scale.
Although you would normally hope to use a Pearson product-moment correlation on interval or ratio data, the Spearman correlation can be used when the assumptions of the Pearson correlation are markedly violated. However, Spearman’s correlation determines the strength and direction of the monotonic relationship between your two variables rather than the strength and direction of the linear relationship between your two variables, which is what Pearson’s correlation determines.
What is a monotonic relationship?
A monotonic relationship is a relationship that does one of the following: (1) as the value of one variable increases, so does the value of the other variable; or (2) as the value of one variable increases, the other variable value decreases.
What if Spearman’s ρ is greater than Pearson’s r?
If ρ > r, the relationship is likely monotonic but not linear.
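To see why ρ can exceed r, here is a small sketch that computes Spearman's ρ as the Pearson correlation of the ranks (the usual definition) and applies both measures to a curved but strictly increasing relationship. The helper names and the cubic data are made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the ranks of the data."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Hypothetical data: y always increases with x, but along a curve.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(rho, r)
```

Because y never decreases as x increases, the ranks agree perfectly, while the curvature pulls Pearson's r below that.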
Correlation:
Pearson’s r – value range
r = -1: perfect negative association
r < 0: negative association
r ≈ 0: no association
r > 0: positive association
r = 1: perfect positive association
What is Pearson’s correlation coefficient not?
It is not the slope.
What is the slope?
The slope is the regression coefficient (B1).
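One way to check that r and the slope are different quantities is to change the units of y: the slope changes with the units, while r does not. The sketch below uses a hypothetical helper `slope_and_r` and made-up data.

```python
import math


def slope_and_r(xs, ys):
    """Return (least-squares slope, Pearson r) for the same data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx                   # depends on the units of x and y
    r = sxy / math.sqrt(sxx * syy)      # unit-free, always between -1 and +1
    return slope, r


# Hypothetical data; y_scaled is the same y expressed in a unit ten times smaller.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
y_scaled = [10 * v for v in y]

slope, r = slope_and_r(x, y)
slope_scaled, r_scaled = slope_and_r(x, y_scaled)
print(slope, r)
print(slope_scaled, r_scaled)
```

Both share the same numerator, which is why they always have the same sign, but only the slope tells you how many units y changes per unit of x.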
If there are large outliers, what should be done?
Consider using Spearman’s rank correlation.
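A rough illustration of why ranks help: in the made-up data below, a single extreme point pushes Pearson's r close to 1, while Spearman's ρ moves much less because the outlier only counts as "the largest rank". The helper names are hypothetical, and the simple ranking used here assumes there are no tied values.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simple_ranks(values):
    """Ranks from 1 to n. Assumption: all values are distinct (no ties)."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman_rho(xs, ys):
    """Spearman's rho as Pearson's r on the ranks (no-ties case)."""
    return pearson_r(simple_ranks(xs), simple_ranks(ys))


# Hypothetical data with a moderate association.
x = [1, 2, 3, 4, 5]
y = [3, 1, 4, 2, 5]

# The same data plus one extreme point far from the rest.
x_out = x + [20]
y_out = y + [40]

r_before = pearson_r(x, y)
r_after = pearson_r(x_out, y_out)
rho_before = spearman_rho(x, y)
rho_after = spearman_rho(x_out, y_out)
print(r_before, r_after)
print(rho_before, rho_after)
```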