W4 Flashcards

(33 cards)

1
Q

What is the goal of regression analysis?

A

To quantify the relationship between a dependent (response) variable and one or more independent (predictor) variables.

2
Q

What else can regression be used for?

A

To understand the influence of one variable on another, and to forecast outcomes based on predictor values.

3
Q

Give examples of when to use regression analysis.

A

- Sales depending on price or ad spend
- Market share based on salesforce size
- Wage based on education and age
- Attitude based on cognition or behavior

4
Q

What’s the difference between correlation and regression?

A

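Regression models a directional relationship (X → Y): X is treated as the predictor and Y as the outcome. Correlation only measures symmetric association (X ↔ Y). The direction in regression is an assumption of the model; the analysis itself does not prove causality.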

5
Q

What does regression assume that correlation doesn’t?

A

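That the predictor variable (X) drives change in the dependent variable (Y). This direction is assumed by the analyst when setting up the model; it is not something the regression can verify.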

6
Q

What does a regression line represent?

A

The straight line that best fits the data points, i.e. the one that minimizes the sum of squared errors between predicted and actual values.

7
Q

Can you place any line on the data?

A

Yes, but we are only interested in the one that gives the smallest total error (the smallest sum of squared differences between actual and predicted values).

8
Q

What are different types of relationships we might see?

A

- Linear
- Quadratic
- No relationship
- Relationships influenced by outliers

9
Q

Give an example where regression might falsely suggest causality.

A

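Advertising spend might seem to cause increased sales. A regression of sales on ad spend treats it as the cause, but a positive slope alone does not prove it; for example, firms may raise ad budgets in periods when sales are already high.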

10
Q

What does simple linear regression do?

A

It explains how a dependent variable relates to a single independent variable.

11
Q

What is the formula for simple linear regression?

A

Y = b₀ + b₁X + error, where b₀ is the intercept (constant), b₁ is the slope coefficient, and X is the predictor.

12
Q

What was the example in the slides?

A

Predicting the number of credit cards a family has based on their family size.

13
Q

What is the null model in regression?

A

A model where we assume no relationship with any predictor and use the mean of Y as the prediction for all observations (e.g., Y = 7 + error, where 7 is the sample mean of Y).

14
Q

Why do we square the errors?

A

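Because the positive and negative differences from the mean cancel out, so summing the raw differences always gives 0. Squaring each difference first makes them all non-negative, so their sum measures total error.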
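The cancellation is easy to check numerically. The sketch below uses a hypothetical set of eight credit-card counts (not given in the cards, chosen only so that the mean is 7 as in the null-model card) and compares the raw and squared errors around the mean.

```python
# Hypothetical data (an assumption, not from the cards): credit cards per family.
credit_cards = [4, 6, 6, 7, 8, 7, 8, 10]

# Null model: predict the mean for every observation.
mean_cards = sum(credit_cards) / len(credit_cards)

# Raw errors: positive and negative deviations cancel.
raw_errors = [y - mean_cards for y in credit_cards]
sum_raw = sum(raw_errors)

# Squared errors: every term is non-negative, so the sum measures total error.
sum_squared = sum(e ** 2 for e in raw_errors)

print(mean_cards, sum_raw, sum_squared)  # 7.0 0.0 22.0
```

With this data the sum of squared errors of the null model is 22, which is the "total sum of squares" that R² is later measured against.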

15
Q

How do we calculate prediction error?

A

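Subtract the predicted value from the actual value to get the error, then square it. Summing these squared errors over all observations gives the total prediction error.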

16
Q

What formula do we use to predict Y from X?

A

Y = 2.87 + 0.97 × X, where 2.87 is the intercept and 0.97 is the slope.
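As a rough illustration of where such an intercept and slope come from, the sketch below fits a least-squares line by hand. The eight data points are an assumption: the cards do not list the slide data, so these values were chosen only because they reproduce the coefficients above. `fit_line` is a hypothetical helper name.

```python
# Hypothetical data (an assumption, not from the cards), chosen so that the
# fitted line matches Y = 2.87 + 0.97 × X.
family_size = [2, 2, 4, 4, 5, 5, 6, 6]
credit_cards = [4, 6, 6, 7, 8, 7, 8, 10]


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    x_mean = sum(x) / len(x)
    y_mean = sum(y) / len(y)
    # Slope = co-variation of X and Y divided by variation of X.
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    slope = s_xy / s_xx
    # The least-squares line always passes through (x_mean, y_mean).
    intercept = y_mean - slope * x_mean
    return intercept, slope


b0, b1 = fit_line(family_size, credit_cards)
print(round(b0, 2), round(b1, 2))  # 2.87 0.97

# Predicted number of credit cards for a family of five.
predicted_for_5 = b0 + b1 * 5
```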

17
Q

How do we interpret the slope (0.97)?

A

For each extra family member, the number of credit cards increases by 0.97 on average.

18
Q

What does the intercept mean?

A

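When family size = 0, the predicted number of credit cards is 2.87. This is an extrapolation: no family has size 0, so the intercept mainly anchors the line rather than describing a real case.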

19
Q

What is R² and how do we interpret it?

A

R² = Explained sum of squares / Total sum of squares. In this case: 16.5 / 22 = 0.75, so 75% of the variability in credit cards is explained by family size.
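The ratio can be sketched in a few lines. The data below is hypothetical (the cards do not list the slide data); it was chosen because it gives a total sum of squares of 22 and an explained sum of squares of about 16.5, matching the figures above.

```python
# Hypothetical data (an assumption, not from the cards).
family_size = [2, 2, 4, 4, 5, 5, 6, 6]
credit_cards = [4, 6, 6, 7, 8, 7, 8, 10]

n = len(credit_cards)
x_mean = sum(family_size) / n
y_mean = sum(credit_cards) / n

# Least-squares slope and intercept.
s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(family_size, credit_cards))
s_xx = sum((x - x_mean) ** 2 for x in family_size)
b1 = s_xy / s_xx
b0 = y_mean - b1 * x_mean
predicted = [b0 + b1 * x for x in family_size]

# Total variation around the mean (error of the null model).
total_ss = sum((y - y_mean) ** 2 for y in credit_cards)
# Variation captured by the regression line.
explained_ss = sum((p - y_mean) ** 2 for p in predicted)
# Variation left over after using the line.
residual_ss = sum((y - p) ** 2 for y, p in zip(credit_cards, predicted))

r_squared = explained_ss / total_ss
print(round(explained_ss, 1), total_ss, round(r_squared, 2))  # 16.5 22.0 0.75
```

Explained and residual sums of squares add up to the total, so R² can equally be written as 1 minus residual over total.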

20
Q

What’s the goal of comparing simple vs multiple regression?

A

We’re trying to figure out which model better explains the number of credit cards using different predictors.

21
Q

How do we decide which model is better?

A

We look at adjusted R-squared. A higher value means better explanatory power while accounting for extra predictors.
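The cards do not give the formula, so the sketch below uses the standard textbook definition of adjusted R², with n observations and k predictors. The sample size of 8 and the second model's R² of 0.76 are made-up values for illustration only.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R² = 1 - (1 - R²) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Simple regression: one predictor, R² = 0.75 (n = 8 is an assumption).
simple_model = adjusted_r_squared(0.75, n_obs=8, n_predictors=1)

# Hypothetical multiple regression: a second predictor lifts R² only to 0.76.
bigger_model = adjusted_r_squared(0.76, n_obs=8, n_predictors=2)

# The tiny gain in R² does not pay for the extra predictor.
print(simple_model > bigger_model)  # True
```

In this made-up case plain R² favors the bigger model while adjusted R² favors the simpler one, which is why the comparison uses the adjusted value.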

22
Q

What is a dummy variable?

A

A dummy variable is used to turn a yes/no category into 0 and 1 so it can be used in regression.

23
Q

How do dummy variables work?

A

X = 1 if the condition is met (e.g. owns an electric car), and X = 0 if not.

24
Q

When do we use dummy variables?

A

When our variable is categorical (nominal or ordinal), like gender, education, or car ownership.

25
Q

What does a regression with a dummy variable look like?

A

Y = b₀ + b₁ × X + error, where X is the dummy variable (e.g. 1 = has electric car, 0 = doesn't).
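A small sketch of what b₀ and b₁ mean in this model: with a single dummy predictor, the least-squares intercept equals the mean of the group coded 0 and the slope equals the difference between the two group means. The outcome values and the `owns_electric_car` labels below are invented for illustration.

```python
# Hypothetical observations (assumptions, not from the cards).
owns_electric_car = ["no", "no", "no", "yes", "yes", "yes"]
outcome = [3, 4, 5, 6, 7, 8]

# Dummy coding: 1 if the condition is met, 0 otherwise.
x = [1 if answer == "yes" else 0 for answer in owns_electric_car]

n = len(x)
x_mean = sum(x) / n
y_mean = sum(outcome) / n

# Least-squares fit of outcome on the dummy.
s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, outcome))
s_xx = sum((xi - x_mean) ** 2 for xi in x)
b1 = s_xy / s_xx
b0 = y_mean - b1 * x_mean

# Group means for comparison.
mean_no = sum(y for xi, y in zip(x, outcome) if xi == 0) / x.count(0)
mean_yes = sum(y for xi, y in zip(x, outcome) if xi == 1) / x.count(1)

print(b0, b1)  # 4.0 3.0
```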
26
Q

Can you use both a t-test and a regression to compare two groups coded as a dummy?

A

Yes. An independent-samples t-test (assuming equal variances) and a regression with a single dummy variable give the same t and p values.
27
Q

Does it matter which group is coded 0 or 1?

A

No. Only the sign of the coefficient flips, and the intercept becomes the mean of the other group. The t and p values are the same either way.
28
Q

Dummy Variables – Multiple Regression (general idea)

A

In multiple linear regression, dummy variables let us include a categorical variable with 3+ mutually exclusive groups by turning it into a set of binary (0/1) variables, one fewer than the number of groups.
29
Q

Dummy Variables – Setup example

A

Suppose we have 3 groups. We create two dummy variables:
- X₁ = 1 if in group 2, 0 otherwise
- X₂ = 1 if in group 3, 0 otherwise
If both X₁ and X₂ are 0, the observation is in group 1 (our reference group).
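The coding scheme above can be sketched as follows. The group labels and the helper name `to_dummies` are hypothetical; group 1 is the reference group and therefore gets no dummy of its own.

```python
# Hypothetical group membership for five observations (an assumption).
groups = [1, 2, 3, 2, 1]


def to_dummies(group):
    """Return (x1, x2) for one observation; group 1 is the reference."""
    x1 = 1 if group == 2 else 0
    x2 = 1 if group == 3 else 0
    return x1, x2


coded = [to_dummies(g) for g in groups]
x1 = [pair[0] for pair in coded]
x2 = [pair[1] for pair in coded]

print(x1)  # [0, 1, 0, 1, 0]
print(x2)  # [0, 0, 1, 0, 0]
```

Because the groups are mutually exclusive, no observation can have both dummies equal to 1.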
30
Q

Dummy Variables – Formula

A

Y = b₀ + b₁X₁ + b₂X₂ + ... + error. Each dummy coefficient estimates how far that group's mean is from the reference group's mean, holding any other predictors in the model constant.
31
Q

Dummy Variables – Output interpretation (Intercept)

A

In the regression output, the intercept represents the mean of the reference group (the one where all dummies = 0).
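A numerical check of this interpretation, as a sketch under stated assumptions: the six observations are invented, and the model is fitted by solving the normal equations with a small hand-written solver (`solve` is a hypothetical helper, not a library function). With only the two group dummies in the model, the intercept should equal the reference group's mean and each coefficient the gap to that mean.

```python
# Hypothetical data (assumptions): three groups with means 5, 8 and 12.
groups = [1, 1, 2, 2, 3, 3]
y = [4, 6, 7, 9, 10, 14]

# Design matrix rows: [constant, X1 (group 2), X2 (group 3)].
rows = [[1, 1 if g == 2 else 0, 1 if g == 3 else 0] for g in groups]


def solve(a, b):
    """Solve the small linear system a·z = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Pick the largest pivot in this column for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


# Normal equations: (XᵀX) b = Xᵀy.
k = 3
xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
b0, b1, b2 = solve(xtx, xty)

print(round(b0, 6), round(b1, 6), round(b2, 6))  # 5.0 3.0 7.0
```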
32
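Q

Dummy Variables – F-test

A

The overall F-test in the regression tells us whether the group means differ at all, i.e. whether at least one dummy coefficient is different from zero. With only the group dummies in the model, it is equivalent to the F-test in a one-way ANOVA.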
33
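Q

Dummy Variables – T-test

A

T-tests for dummy variable coefficients tell us whether each individual group is significantly different from the reference group. This resembles the pairwise comparisons in Tukey HSD, but only against the reference group and without the adjustment for multiple comparisons.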