W4 Flashcards
(33 cards)
What is the goal of regression analysis?
To quantify the relationship between a dependent (response) variable and one or more independent (predictor) variables.
What else can regression be used for?
To understand the influence of one variable on another, and to forecast outcomes from predictor values.
Give examples of when to use regression analysis.
- Sales based on price or ad spend
- Market share based on salesforce size
- Wage based on education and age
- Attitude based on cognition or behavior
What’s the difference between correlation and regression?
Regression assumes causality (X → Y), correlation only shows association (X ↔ Y).
What does regression assume that correlation doesn’t?
That the predictor variable (X) causes change in the dependent variable (Y).
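A minimal Python sketch of that difference. It uses hypothetical data that is an assumption, not taken from the slides: eight families chosen so that they reproduce the figures on these cards. Correlation gives the same value whichever variable comes first. The regression slope changes when predictor and response are swapped, because regression treats X as the variable that drives Y.

```python
# Hypothetical data (assumption): family size (X) and number of credit cards (Y)
family_size = [2, 2, 4, 4, 5, 5, 6, 6]
credit_cards = [4, 6, 6, 7, 8, 7, 8, 10]

def mean(values):
    return sum(values) / len(values)

def cross_deviation_sum(a, b):
    """Sum of products of each variable's deviations from its mean."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))

def correlation(a, b):
    """Pearson correlation: symmetric, so correlation(a, b) == correlation(b, a)."""
    return cross_deviation_sum(a, b) / (
        cross_deviation_sum(a, a) * cross_deviation_sum(b, b)
    ) ** 0.5

def slope(predictor, response):
    """Least-squares slope of response on predictor: the order matters."""
    return cross_deviation_sum(predictor, response) / cross_deviation_sum(predictor, predictor)

r_xy = correlation(family_size, credit_cards)
r_yx = correlation(credit_cards, family_size)
b_y_on_x = slope(family_size, credit_cards)  # Y regressed on X
b_x_on_y = slope(credit_cards, family_size)  # X regressed on Y
print(round(r_xy, 3), round(r_yx, 3))
print(round(b_y_on_x, 3), round(b_x_on_y, 3))
```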
What does a regression line represent?
A straight line that best fits the data points: it minimizes the total squared error between predicted and actual values.
Can you place any line on the data?
Yes, but we are only interested in the line with the smallest total error (the sum of squared differences between actual and predicted values).
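To illustrate, the sketch below scores a few candidate lines by their sum of squared errors. The data are hypothetical (assumed, not from the slides). Apart from the last one the lines are arbitrary; the last uses the rounded coefficients 2.87 and 0.97 that appear on these cards.

```python
# Hypothetical data (assumption): family size (X) and number of credit cards (Y)
family_size = [2, 2, 4, 4, 5, 5, 6, 6]
credit_cards = [4, 6, 6, 7, 8, 7, 8, 10]

def sum_squared_errors(intercept, slope, xs, ys):
    """Total squared error of the line Y = intercept + slope * X."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Any line can be drawn through the data; these are candidate (intercept, slope) pairs.
candidate_lines = {
    "mean line": (7.0, 0.0),
    "steep line": (0.0, 1.6),
    "least squares": (2.87, 0.97),
}
sse_by_line = {
    name: sum_squared_errors(b0, b1, family_size, credit_cards)
    for name, (b0, b1) in candidate_lines.items()
}
for name, sse in sse_by_line.items():
    print(f"{name}: SSE = {sse:.2f}")
```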
What are different types of relationships we might see?
- Linear
- Quadratic
- No relationship
- Relationships influenced by outliers
Give an example where regression might falsely suggest causality.
Advertising spend might seem to cause increased sales. Regression assumes it does, but the data alone cannot show that the relationship is causal.
What does simple linear regression do?
It explains how a dependent variable relates to a single independent variable.
What is the formula for simple linear regression?
Y = b₀ + b₁X + error, where b₀ is the intercept (constant), b₁ is the slope coefficient, and X is the predictor.
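The two coefficients can be estimated with the standard closed-form ordinary least squares formulas, b₁ = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)² and b₀ = Ȳ − b₁X̄. A minimal Python sketch, assuming hypothetical data for eight families (not taken from the slides) that happens to reproduce the fitted line on these cards:

```python
# Hypothetical data (assumption): family size (X) and number of credit cards (Y)
family_size = [2, 2, 4, 4, 5, 5, 6, 6]
credit_cards = [4, 6, 6, 7, 8, 7, 8, 10]

def fit_simple_regression(xs, ys):
    """Ordinary least squares estimates of b0 (intercept) and b1 (slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx             # slope: change in Y per one-unit change in X
    b0 = mean_y - b1 * mean_x  # intercept: the line passes through (mean_x, mean_y)
    return b0, b1

b0, b1 = fit_simple_regression(family_size, credit_cards)
print(f"Y = {b0:.2f} + {b1:.2f} * X")
```

With this assumed data it prints `Y = 2.87 + 0.97 * X`.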
What was the example in the slides?
Predicting the number of credit cards a family has based on their family size.
What is the null model in regression?
A model where we assume no relationship and use the mean as the prediction for all observations (in the credit card example the mean is 7, so Y = 7 + error).
Why do we square the errors?
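Because positive and negative differences cancel out: the raw differences from the mean always sum to 0. Squaring makes every difference positive, so adding them up gives a meaningful total error.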
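A short Python check of this, on hypothetical credit-card counts (an assumption, not the slides' data) whose mean is 7, as in the null model above. The raw deviations from the mean cancel to zero, while the squared deviations add up to the total error (the total sum of squares).

```python
# Hypothetical data (assumption): number of credit cards per family
credit_cards = [4, 6, 6, 7, 8, 7, 8, 10]

mean_cards = sum(credit_cards) / len(credit_cards)  # null-model prediction for every family
raw_errors = [y - mean_cards for y in credit_cards]
squared_errors = [e ** 2 for e in raw_errors]

print(mean_cards)           # 7.0
print(sum(raw_errors))      # 0.0: positive and negative errors cancel out
print(sum(squared_errors))  # 22.0: total sum of squares
```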
How do we calculate prediction error?
Subtract the predicted value from the actual value (prediction error = actual − predicted). To get the total error, square each prediction error and add them up.
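A sketch of the per-family calculation in Python, using the fitted line from these cards (2.87 + 0.97X) and hypothetical data (assumed, not from the slides). The helper name `predict` is hypothetical.

```python
# Hypothetical data (assumption): family size (X) and number of credit cards (Y)
family_size = [2, 2, 4, 4, 5, 5, 6, 6]
credit_cards = [4, 6, 6, 7, 8, 7, 8, 10]

def predict(x, b0=2.87, b1=0.97):
    """Predicted number of credit cards from the fitted line on these cards."""
    return b0 + b1 * x

total_squared_error = 0.0
for x, y in zip(family_size, credit_cards):
    error = y - predict(x)  # prediction error = actual - predicted
    total_squared_error += error ** 2
    print(f"X={x}  actual={y}  predicted={predict(x):.2f}  error={error:+.2f}")
print(f"Sum of squared errors: {total_squared_error:.2f}")
```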
What formula do we use to predict Y from X?
Y = 2.87 + 0.97 × X, where 2.87 is the intercept and 0.97 is the slope.
How do we interpret the slope (0.97)?
For each extra family member, the number of credit cards increases by 0.97 on average.
What does the intercept mean?
When family size = 0, the predicted number of credit cards is 2.87.
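Both interpretations can be checked directly from the prediction formula. A minimal Python sketch; the function name `predicted_cards` is hypothetical:

```python
def predicted_cards(family_size, intercept=2.87, slope=0.97):
    """Predicted number of credit cards from the fitted line on these cards."""
    return intercept + slope * family_size

# Slope: going from 4 to 5 family members raises the prediction by 0.97.
change = predicted_cards(5) - predicted_cards(4)
print(round(change, 2))    # 0.97
# Intercept: the prediction when family size is 0.
print(predicted_cards(0))  # 2.87
```

Because no family has zero members, the intercept mainly fixes where the line crosses the Y axis rather than giving a realistic prediction.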
What is R² and how do we interpret it?
R² = Explained sum of squares / Total sum of squares. In this case: 16.5 / 22 = 0.75, so 75% of the variability in credit cards is explained by family size.
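The same ratio can be computed from raw data. The sketch below assumes hypothetical data for eight families (not from the slides), chosen so that it reproduces these figures. The explained sum of squares is computed as Σ(Ŷ − Ȳ)², the squared distance of each prediction from the mean.

```python
# Hypothetical data (assumption): family size (X) and number of credit cards (Y)
family_size = [2, 2, 4, 4, 5, 5, 6, 6]
credit_cards = [4, 6, 6, 7, 8, 7, 8, 10]

n = len(family_size)
mean_x = sum(family_size) / n
mean_y = sum(credit_cards) / n

# Least-squares slope and intercept
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(family_size, credit_cards))
sxx = sum((x - mean_x) ** 2 for x in family_size)
b1 = sxy / sxx
b0 = mean_y - b1 * mean_x
predictions = [b0 + b1 * x for x in family_size]

total_ss = sum((y - mean_y) ** 2 for y in credit_cards)     # error of the null model
explained_ss = sum((p - mean_y) ** 2 for p in predictions)  # improvement from using X
r_squared = explained_ss / total_ss
print(f"R² = {explained_ss:.1f} / {total_ss:.0f} = {r_squared:.2f}")
```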
What’s the goal of comparing simple vs multiple regression?
We’re trying to figure out which model better explains the number of credit cards using different predictors.
How do we decide which model is better?
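We compare adjusted R-squared. Plain R² never decreases when a predictor is added, whereas adjusted R² penalizes extra predictors, so a higher adjusted R² indicates better explanatory power after accounting for the number of predictors.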
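Adjusted R² is commonly computed as 1 − (1 − R²)(n − 1) / (n − k − 1), where n is the number of observations and k the number of predictors. A minimal Python sketch; both models and their R² values are hypothetical:

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R² for n observations and k predictors (excluding the intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical comparison (assumption): 8 families,
# the simple model uses 1 predictor, the multiple model uses 2.
simple = adjusted_r_squared(0.75, n=8, k=1)
multiple = adjusted_r_squared(0.77, n=8, k=2)
print(f"simple: {simple:.3f}, multiple: {multiple:.3f}")
```

Here the multiple model has the higher R² (0.77 vs 0.75) but the lower adjusted R², so by this criterion the simpler model would be preferred.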
What is a dummy variable?
A dummy variable is used to turn a yes/no category into 0 and 1 so it can be used in regression.
How do dummy variables work?
X = 1 if the condition is met (e.g. owns an electric car), and X = 0 if not.
When do we use dummy variables?
When our variable is categorical (nominal or ordinal), like gender, education, or car ownership.
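A minimal Python sketch of dummy coding for a yes/no variable; the helper `to_dummy` and the survey answers are hypothetical:

```python
def to_dummy(values, yes_label="yes"):
    """Return 1 where the value equals yes_label (condition met), else 0."""
    return [1 if v == yes_label else 0 for v in values]

# Hypothetical survey answers (assumption): does the respondent own an electric car?
owns_electric_car = ["yes", "no", "no", "yes", "no"]
x = to_dummy(owns_electric_car)
print(x)  # [1, 0, 0, 1, 0]
```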