Linear Regression Flashcards
What does ‘strength of a relationship’ in regression refer to?
It is an indication of how well one can predict the response variable (e.g., sales) from the predictor (e.g., advertising budget). A strong relationship implies high predictive accuracy, whereas a weak relationship implies a prediction only slightly better than random guessing.
Definition: Simple Linear Regression
A linear model with one predictor (X) used to predict an outcome (Y).
What does the symbol ‘≈’ mean in a regression/statistical context?
It can be read as ‘is approximately modeled as,’ indicating an approximate relationship rather than an exact equality.
sales ≈ β0 + β1 × TV
What does β0 represent in this equation?
Intercept
Definition: Intercept (β0)
Represents the predicted value of Y when X=0.
sales ≈ β0 + β1 × TV
What does β1 represent in this equation?
Slope
Definition: Slope (β1)
Represents the average change in Y for a one-unit increase in X.
sales ≈ β0 + β1 × TV
What are terms used to refer to β0 and β1 collectively?
- Coefficients
- Parameters
Ordinary Least Squares (OLS) Estimation
A method to estimate β0 and β1 by minimizing the sum of squared residuals.
Residual (e_i)
e_i = y_i − ŷ_i
The difference between an observed value (Y) and the model’s fitted value (Ŷ).
Residual sum of squares (RSS) equation
Include simple form and full form
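A standard way to write both forms, assuming n observations and writing e_i = y_i − ŷ_i for the i-th residual (the hatted symbols denote estimates):

```latex
\mathrm{RSS} = e_1^2 + e_2^2 + \cdots + e_n^2
\qquad\text{and, written out,}\qquad
\mathrm{RSS} = \sum_{i=1}^{n} \left( y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i \right)^2
```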
Equation for slope (β1)
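The usual least squares expression for the slope estimate, assuming x̄ and ȳ denote the sample means of X and Y:

```latex
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
```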
Equation for intercept (β0)
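The usual least squares expression for the intercept estimate, assuming x̄ and ȳ denote the sample means and the hatted β1 is the least squares slope estimate:

```latex
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
```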
Definition: least squares coefficient estimates
They are the intercept and slope estimates chosen to minimize the sum of squared residuals (differences between observed and predicted values), providing the best linear fit to the data under the least squares criterion.
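As a minimal sketch of how these estimates can be computed, the snippet below applies the closed-form simple linear regression formulas with NumPy and cross-checks them against np.polyfit. The data values and variable names are made up for illustration.

```python
import numpy as np

# Hypothetical data: advertising budget (x) and sales (y).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

x_bar, y_bar = x.mean(), y.mean()

# Closed-form least squares estimates for simple linear regression.
beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
beta0_hat = y_bar - beta1_hat * x_bar

# Fitted values, residuals, and the residual sum of squares.
y_hat = beta0_hat + beta1_hat * x
residuals = y - y_hat
rss = np.sum(residuals ** 2)

# Cross-check with NumPy's degree-1 polynomial fit (slope is returned first).
slope_np, intercept_np = np.polyfit(x, y, deg=1)

print("intercept:", beta0_hat, "slope:", beta1_hat, "RSS:", rss)
```

Any other choice of intercept and slope gives a larger RSS on the same data, which is what "best fit under the least squares criterion" means.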
Best-Fit Line
The line Ŷ = β̂0 + β̂1X whose coefficient estimates (β̂0, β̂1) minimize the sum of squared residuals.
Interpretation of β1 in Linear Regression
Hint: Consider MLR as well as SLR
Indicates how much Y is expected to change, on average, when X increases by one unit. In multiple linear regression, the change is interpreted with all other predictors held constant.
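The difference between the two interpretations can be illustrated with simulated data. This is a sketch under assumed values: the true coefficients (1, 2, 3), the correlation between the two predictors, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Two correlated predictors; the true coefficients are chosen for illustration.
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(scale=0.1, size=n)

# Multiple linear regression: the column of ones estimates the intercept.
X = np.column_stack([np.ones(n), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2 = coef

# Simple linear regression of y on x1 alone.
slope_slr, intercept_slr = np.polyfit(x1, y, deg=1)

print("MLR slope for x1 (x2 held constant):", b1)
print("SLR slope for x1 (x2 ignored):      ", slope_slr)
```

In this setup the multiple regression slope for x1 lands near 2, while the simple regression slope lands near 3.5, because x1 also picks up part of the effect of the correlated predictor x2.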
Assumption: Linearity
Y is assumed to be linearly related to X.
Assumption: Independence of Errors
The error terms are assumed to be independent of (uncorrelated with) one another.
Assumption: Exogeneity
The error term is unrelated to X: its expected value is zero at every value of X, i.e., E[ε | X] = 0.
Assumption: Homoscedasticity
The variance of the error term is constant across all values of X.
Assumption: Normality of Errors
The error terms are assumed to follow a normal distribution (especially important for inference).
Definition: Population regression line
The population regression line is the true (but typically unknown) underlying linear relationship between X and Y.
Definition: least squares line
The least squares line is our estimated linear relationship based on a specific sample of data.
What is the distinction between the least squares line and population regression line?
Different samples yield slightly different least squares lines, but the population regression line remains fixed (and unobserved).
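A small simulation can make this distinction concrete. The sketch below assumes a hypothetical population line Y = 2 + 3X plus normal noise, draws many samples from it, and fits a least squares line to each one; the sample size, number of repetitions, and function name are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical population regression line (unknown in practice).
beta0, beta1 = 2.0, 3.0

def fit_one_sample(n=50):
    """Draw one sample from the population model and fit a least squares line."""
    x = rng.uniform(-2, 2, size=n)
    y = beta0 + beta1 * x + rng.normal(scale=1.0, size=n)
    slope, intercept = np.polyfit(x, y, deg=1)
    return intercept, slope

# Each row holds the (intercept, slope) estimated from a different sample.
estimates = np.array([fit_one_sample() for _ in range(1000)])

print("first three fits:\n", estimates[:3])
print("average of estimates:", estimates.mean(axis=0))
print("spread of estimates: ", estimates.std(axis=0))
```

Each fitted line differs from the others, yet the estimates scatter around the fixed population values, and their average comes out close to (2, 3).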