W4 Flashcards by Eline Olmos van Velden

What is the goal of regression analysis?

To quantify the relationship between a dependent (response) variable and one or more independent (predictor) variables.

What else can regression be used for?

To understand the influence of one variable on another, and to forecast outcomes from predictor values.

Give examples of when to use regression analysis.

- Sales depending on price or ad spend
- Market share based on salesforce size
- Wage based on education and age
- Attitude based on cognition or behavior

What’s the difference between correlation and regression?

Regression treats one variable as the outcome and assumes a direction of influence (X → Y); correlation only measures association, with no direction (X ↔ Y).

What does regression assume that correlation doesn’t?

That the predictor variable (X) causes change in the dependent variable (Y). This is a modeling assumption, not something the regression itself proves.

What does a regression line represent?

The straight line that best fits the data points: the one that minimizes the sum of squared errors between predicted and actual values.

Can you place any line on the data?

Yes, but we are only interested in the line that gives the smallest total squared error.

What are different types of relationships we might see?

- Linear
- Quadratic
- No relationship
- Relationships influenced by outliers

Give an example where regression might falsely suggest causality.

Advertising spend might seem to cause increased sales. The regression model assumes that direction, but the data alone cannot confirm that the relationship is causal.

What does simple linear regression do?

It explains how a dependent variable relates to a single independent variable.

What is the formula for simple linear regression?

Y = b₀ + b₁X + error, where b₀ is the constant (intercept), b₁ is the coefficient (slope) of X, and X is the predictor.

What was the example in the slides?

Predicting the number of credit cards a family has based on their family size.

What is the null model in regression?

A model where we assume no relationship and use the mean as the prediction for all observations (e.g., Y = 7 + error).

Why do we square the errors?

Because positive and negative errors cancel out: around the mean, the raw differences sum to 0. Squaring each difference makes every error positive, so the sum reflects the total error.

How do we calculate prediction error?

Subtract the predicted value from the actual value and square it.
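To make the null model and the squared-error idea concrete, here is a minimal sketch. The eight data points are an assumption (the cards do not list the slide data); they were picked only because their mean is 7, matching the Y = 7 + error example.

```python
# Illustrative data (an assumption, not taken from the cards): mean is 7.
credit_cards = [4, 6, 6, 7, 8, 7, 8, 10]

# Null model: predict the mean for every observation.
mean_cards = sum(credit_cards) / len(credit_cards)

# Prediction error = actual value minus predicted value.
errors = [y - mean_cards for y in credit_cards]

# Raw errors around the mean cancel out, so their sum is 0.
sum_of_errors = sum(errors)

# Squaring removes the sign, so the total reflects the size of the misses.
sum_of_squared_errors = sum(e ** 2 for e in errors)

print(mean_cards, sum_of_errors, sum_of_squared_errors)  # 7.0 0.0 22.0
```

With these numbers the null model has a total squared error of 22, which is the baseline a regression line has to beat.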

What formula do we use to predict Y from X?

Y = 2.87 + 0.97 × X, where 2.87 is the intercept and 0.97 is the slope.
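As a sketch of where an equation like this comes from, the least-squares intercept and slope can be computed by hand. The data below are an assumption: the cards do not give the slide data, so these eight families were chosen because they reproduce the 2.87 and 0.97 quoted above. The function name is also just illustrative.

```python
# Illustrative data (an assumption, not taken from the cards).
family_size = [2, 2, 4, 4, 5, 5, 6, 6]
credit_cards = [4, 6, 6, 7, 8, 7, 8, 10]


def fit_simple_regression(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Co-variation of x and y, and variation of x, around their means.
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope


b0, b1 = fit_simple_regression(family_size, credit_cards)
print(round(b0, 2), round(b1, 2))  # 2.87 0.97
```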

How do we interpret the slope (0.97)?

For each extra family member, the number of credit cards increases by 0.97 on average.

What does the intercept mean?

When family size = 0, the predicted number of credit cards is 2.87.
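The slope and intercept interpretations can be checked by plugging values into the fitted equation. This is a small sketch; the function name `predict_cards` is hypothetical, and the coefficients are the rounded ones from the cards.

```python
def predict_cards(size, intercept=2.87, slope=0.97):
    """Predicted number of credit cards for a given family size."""
    return intercept + slope * size


# Intercept: the prediction when family size is 0.
at_zero = predict_cards(0)

# Slope: the change in the prediction for one extra family member.
one_more_member = predict_cards(5) - predict_cards(4)

print(at_zero, round(one_more_member, 2))  # 2.87 0.97
```

A family size of 0 is unlikely to occur in real data, so the intercept is best read as the line's starting point rather than a realistic prediction.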

What is R² and how do we interpret it?

R² = Explained sum of squares / Total sum of squares. In this case: 16.5 / 22 = 0.75, so 75% of the variability in credit cards is explained by family size.
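The R² calculation can be sketched as follows. The data are an assumption (not given on the cards), chosen because they reproduce the 16.5, 22, and 0.75 figures above.

```python
# Illustrative data (an assumption, not taken from the cards).
family_size = [2, 2, 4, 4, 5, 5, 6, 6]
credit_cards = [4, 6, 6, 7, 8, 7, 8, 10]

n = len(credit_cards)
mean_x = sum(family_size) / n
mean_y = sum(credit_cards) / n

# Least-squares slope and intercept.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(family_size, credit_cards))
s_xx = sum((x - mean_x) ** 2 for x in family_size)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x
predicted = [intercept + slope * x for x in family_size]

# Total variation, the part the line explains, and the part left over.
total_ss = sum((y - mean_y) ** 2 for y in credit_cards)
explained_ss = sum((p - mean_y) ** 2 for p in predicted)
residual_ss = sum((y - p) ** 2 for y, p in zip(credit_cards, predicted))

r_squared = explained_ss / total_ss
print(round(total_ss, 1), round(explained_ss, 1), round(r_squared, 2))  # 22.0 16.5 0.75
```

The explained and residual sums of squares add up to the total, which is why R² always falls between 0 and 1 for a least-squares line with an intercept.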

What’s the goal of comparing simple vs multiple regression?

We’re trying to figure out which model better explains the number of credit cards using different predictors.

How do we decide which model is better?

We look at adjusted R-squared. A higher value means better explanatory power while accounting for extra predictors.
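The cards do not state the formula, so the sketch below uses the standard definition of adjusted R². The sample size of 8 and the second model's R² of 0.76 are hypothetical values used only to show the penalty at work.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Standard adjusted R²: penalizes R² for each extra predictor."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Hypothetical comparison: one predictor vs. two predictors, 8 observations.
simple_model = adjusted_r_squared(0.75, n_obs=8, n_predictors=1)
multiple_model = adjusted_r_squared(0.76, n_obs=8, n_predictors=2)

print(round(simple_model, 3), round(multiple_model, 3))  # 0.708 0.664
```

In this made-up case the second predictor raises R² slightly but lowers adjusted R², so the simpler model would be preferred.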

What is a dummy variable?

A dummy variable is used to turn a yes/no category into 0 and 1 so it can be used in regression.

How do dummy variables work?

X = 1 if the condition is met (e.g. owns an electric car), and X = 0 if not.
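A minimal sketch of this coding step, using made-up survey answers (the list and variable names are hypothetical):

```python
# Hypothetical yes/no answers to "Do you own an electric car?"
owns_electric_car = ["yes", "no", "no", "yes", "no"]

# Dummy coding: 1 if the condition is met, 0 if not.
dummy = [1 if answer == "yes" else 0 for answer in owns_electric_car]

print(dummy)  # [1, 0, 0, 1, 0]
```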

When do we use dummy variables?

When our variable is categorical (nominal or ordinal), like gender, education, or car ownership.

What does the regression with dummy look like?

Y = b₀ + b₁ × X + error, where X is the dummy variable (e.g. 1 = has electric car, 0 = doesn't).

Can you use both t-test and regression for dummies?

Yes! You get the same t and p values whether you run an independent-samples t-test (assuming equal variances) or a regression with a dummy variable.
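The equivalence can be checked numerically. The sketch below compares the pooled (equal-variance) t statistic with the t statistic of the dummy's slope; the two groups of scores are hypothetical, and the function names are illustrative only. Both tests use n − 2 degrees of freedom, so equal t values imply equal p values.

```python
import math

# Hypothetical scores for the group coded 0 and the group coded 1.
group0 = [3, 4, 5, 4]
group1 = [6, 7, 5, 6]


def mean(values):
    return sum(values) / len(values)


def pooled_t_statistic(a, b):
    """Independent-samples t statistic, assuming equal variances."""
    ss_a = sum((v - mean(a)) ** 2 for v in a)
    ss_b = sum((v - mean(b)) ** 2 for v in b)
    pooled_var = (ss_a + ss_b) / (len(a) + len(b) - 2)
    std_error = math.sqrt(pooled_var * (1 / len(a) + 1 / len(b)))
    return (mean(b) - mean(a)) / std_error


def dummy_regression_t_statistic(a, b):
    """t statistic of the slope when regressing the scores on a 0/1 dummy."""
    x = [0] * len(a) + [1] * len(b)
    y = a + b
    mean_x, mean_y = mean(x), mean(y)
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    residual_ss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    std_error = math.sqrt(residual_ss / (len(y) - 2) / s_xx)
    return slope / std_error


t_from_test = pooled_t_statistic(group0, group1)
t_from_regression = dummy_regression_t_statistic(group0, group1)
print(round(t_from_test, 3), round(t_from_regression, 3))  # 3.464 3.464
```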

Does it matter which group is coded 0 or 1?

No. Only the sign of the coefficient flips, and the intercept becomes the mean of whichever group is coded 0. The test results are the same either way.
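A quick way to see this is to fit the same data twice with the coding reversed. The scores and the helper name `fit` are hypothetical.

```python
def fit(x, y):
    """Return (intercept, slope) of the least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    slope = s_xy / s_xx
    return mean_y - slope * mean_x, slope


# Hypothetical scores: first four belong to group A, last four to group B.
scores = [3, 4, 5, 4, 6, 7, 5, 6]
coded_b_as_1 = [0, 0, 0, 0, 1, 1, 1, 1]
coded_a_as_1 = [1 - v for v in coded_b_as_1]  # same groups, coding reversed

intercept_1, slope_1 = fit(coded_b_as_1, scores)
intercept_2, slope_2 = fit(coded_a_as_1, scores)

print(intercept_1, slope_1)  # 4.0 2.0  (mean of A, then B minus A)
print(intercept_2, slope_2)  # 6.0 -2.0 (mean of B, then A minus B)
```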

Dummy Variables – Multiple Regression (general idea)

In multiple linear regression, dummy variables let us include 3+ mutually exclusive groups by turning them into binary (0/1) variables.

Dummy Variables – Setup example

Suppose we have 3 groups. We create two dummy variables:
- X1 = 1 if in group 2, 0 otherwise
- X2 = 1 if in group 3, 0 otherwise
- If both X1 and X2 are 0, the observation is in group 1 (our reference group)
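The setup above can be sketched in a few lines. The group labels are made up for illustration.

```python
# Hypothetical group membership for six observations.
groups = [1, 2, 3, 2, 1, 3]

# Two dummies are enough for three groups; group 1 is the reference.
x1 = [1 if g == 2 else 0 for g in groups]
x2 = [1 if g == 3 else 0 for g in groups]

for g, d1, d2 in zip(groups, x1, x2):
    print(g, d1, d2)
```

Each observation has at most one dummy equal to 1, and the reference group shows up as the rows where both are 0.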

Dummy Variables – Formula

Y = b₀ + b₁X₁ + b₂X₂ + ... + error. Each dummy coefficient estimates how that group differs from the reference group, holding the other variables constant.

Dummy Variables – Output interpretation (Intercept)

In the regression output, the intercept represents the mean of the reference group (the one where all dummies = 0).
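To check this interpretation, the sketch below fits a regression with two dummies by solving the normal equations directly. The data are hypothetical (group means of 5, 8, and 3), and the helper functions are illustrative rather than any particular library's API.

```python
def solve_linear_system(a, b):
    """Solve a @ coef = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def ols(design, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical data: three groups of three observations each.
groups = [1, 1, 1, 2, 2, 2, 3, 3, 3]
y = [4, 5, 6, 7, 8, 9, 2, 3, 4]

# Columns: constant, X1 (group 2), X2 (group 3). Group 1 is the reference.
design = [[1, int(g == 2), int(g == 3)] for g in groups]
b0, b1, b2 = ols(design, y)

print(round(b0, 6), round(b1, 6), round(b2, 6))  # 5.0 3.0 -2.0
```

The intercept equals the reference group's mean (5), and each dummy coefficient equals that group's mean minus the reference mean (8 − 5 and 3 − 5).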

Dummy Variables – F-test

The F-test in the regression output tells us whether the dummy variables are jointly significant, i.e. whether at least one group differs, just like in a one-way ANOVA.
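The link to one-way ANOVA can be sketched numerically. A regression that contains only group dummies predicts each group's mean, so its explained and residual sums of squares are the between-group and within-group sums of squares from ANOVA. The data below are hypothetical, and the sketch stops at the F statistic (no p-value is computed).

```python
# Hypothetical data: three groups of three observations each.
groups = [1, 1, 1, 2, 2, 2, 3, 3, 3]
y = [4, 5, 6, 7, 8, 9, 2, 3, 4]

grand_mean = sum(y) / len(y)

# Group means are the fitted values of a dummy-only regression.
group_means = {}
for g in set(groups):
    members = [yi for gi, yi in zip(groups, y) if gi == g]
    group_means[g] = sum(members) / len(members)
fitted = [group_means[g] for g in groups]

explained_ss = sum((f - grand_mean) ** 2 for f in fitted)      # between groups
residual_ss = sum((yi - f) ** 2 for yi, f in zip(y, fitted))   # within groups

n_dummies = len(group_means) - 1          # number of dummy variables
df_residual = len(y) - n_dummies - 1      # n - k - 1

f_statistic = (explained_ss / n_dummies) / (residual_ss / df_residual)
print(round(f_statistic, 2))  # 19.0
```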

Dummy Variables – T-test

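T-tests for the dummy variable coefficients tell us whether an individual group is significantly different from the reference group. This is similar to a post-hoc comparison such as Tukey HSD, except that it only compares against the reference group and applies no multiple-comparison correction.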

W4 Flashcards

(33 cards)