Unit 13a: Regression I Simple Linear Regression Flashcards
The correlation coefficient…
A. Will always fall between 0 and 1
B. Compares the mean of a sample to a population
C. Measures how strongly related two variables are
D. Can only be calculated for 2 continuous (i.e., ratio) variables
C. Measures how strongly related two variables are
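The correlation coefficient on this card can be made concrete with a short computation: it is the covariance of the two variables divided by the product of their standard deviations. Below is a minimal sketch in Python (the function name `pearson_r` and the sample data are illustrative, not from the source):

```python
import math

def pearson_r(x, y):
    # Pearson correlation: covariance of x and y divided by
    # the product of their standard deviations.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy)

# A perfectly linear positive relation gives r = 1.0
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
```

Note that r falls between -1 and +1 (not 0 and 1, as distractor A suggests): a negative r means the variables move in opposite directions.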
Purpose of Simple (bivariate) Regression
How relations are used to predict outcomes: the stronger the correlation, the more accurate the prediction.
* The correlated variables can indicate the values of each other using a simple or bivariate regression.
* Standard error of the estimate can help us determine the accuracy of a prediction
A strong correlation means the prediction will be accurate or weak?
accurate
correlation
If two variables co-vary, they have a relation (correlation)
* Regression extends the correlation to make a prediction of one variable from another.
* The accuracy of the prediction depends on the strength of the relation (correlation)
* More information shared between the variables (a higher, stronger correlation) means less error
* Not to be confused with one variable causing another.
* We cannot tell which variable causes which
* Or if a third variable accounts for relation (confounder)
Simple/Bivariate Regression has how many variables
2
- 1 outcome variable
- 1 predictor variable
Key Elements of Linear Regression
- F-test: the Omnibus Test
- Is there any association?
- Regression Equation
- Beta coefficient
- The "slope" of the function
- This is the important element we want because it has meaningful interpretation
Regression Equation
y = mx + b
* In regression notation: y = β₀ + β₁x
* Introduce an index i for each participant or observation (xᵢ, yᵢ)
yᵢ = β₀ + β₁xᵢ
* We allow an error in the equation for each observation i
yᵢ = β₀ + β₁xᵢ + eᵢ
The regression equation tells us that yʹ (the predicted value of y) is a function of a value for the intercept (b), a value for the slope (m), and a value for the predictor variable (x).
Regression Equation parts
Regression Equation
yᵢ = β₀ + β₁xᵢ + eᵢ
* yᵢ: value of the outcome (or response, or dependent) variable for the ith observation
* xᵢ: value of the predictor (or independent) variable for the ith observation
* β₀ & β₁: regression parameters (the intercept and slope) to be estimated
* β₀ = the intercept
* β₁ = the slope
* eᵢ: the random error term for the ith observation
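The parameters β₀ and β₁ defined above can be estimated with the standard least-squares formulas: the slope is Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)², and the intercept is ȳ − β₁x̄. A minimal Python sketch (the function name `fit_line` and the sample data are illustrative, not from the source):

```python
def fit_line(x, y):
    # Least-squares estimates for y_i = b0 + b1 * x_i + e_i:
    # slope b1 = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2),
    # intercept b0 = mean_y - b1 * mean_x.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
          / sum((xi - mx) ** 2 for xi in x))
    b0 = my - b1 * mx
    return b0, b1

# Data lying exactly on the line y = 1 + 2x recovers those parameters
b0, b1 = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(b0, b1)  # 1.0 2.0
```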
Regression Analysis Variables
In regression language, the criterion variable is regressed on the
predictor variable.
* Criterion variable: the variable to predict
- The dependent variable; the y-axis in a scatter plot
- Actual values denoted as y
- Predicted values denoted yʹ or ŷ
* Predictor variable: the variable used in the prediction
- The independent variable; the x-axis in a scatter plot
Criterion variable
the variable to predict
Predictor variable
the variable used in the prediction
Determining the line of best fit:
the Least Squares Criterion
A good fit will limit the divergence between our predicted value and
the actual data (the "error")
* With a single line, we cannot fit the data exactly.
* Some points will be above and some below
* How do we trade-off these errors?
* Minimize the square of the errors
* Deviations above or below the line are treated equally, so you square them
Evidence of prediction error is known as
a residual score.
* The difference between y and yʹ (or ŷ).
* eᵢ = yᵢ − ŷᵢ
the least squares criterion
The sum of the squared differences between the actual (y) and predicted (yʹ) values must be as small as possible.
Regression is designed to minimize the sum of the squared differences between y and yʹ.
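The least squares criterion can be checked numerically: for the fitted line, the sum of squared residuals eᵢ = yᵢ − ŷᵢ is at its lowest possible value, so nudging either parameter can only increase it. A minimal sketch (the helper name `sse` and the data are illustrative, not from the source):

```python
def sse(x, y, b0, b1):
    # Sum of squared residuals e_i = y_i - (b0 + b1 * x_i)
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

x, y = [1, 2, 3, 4, 5], [2, 3, 5, 4, 6]

# Least-squares estimates via the usual closed-form formulas
mx, my = sum(x) / len(x), sum(y) / len(y)
b1 = (sum((a - mx) * (b - my) for a, b in zip(x, y))
      / sum((a - mx) ** 2 for a in x))
b0 = my - b1 * mx

best = sse(x, y, b0, b1)
# Any nudge to the intercept or slope increases the squared error
print(best < sse(x, y, b0 + 0.1, b1))  # True
print(best < sse(x, y, b0, b1 + 0.1))  # True
```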