W5 Flashcards

(38 cards)

1
Q

Regression Assumptions – Importance

A

We need certain assumptions to hold in regression in order to draw valid conclusions. Violating them can lead to misleading or biased results.

2
Q

Why assumptions matter

A

If you break assumptions: (a) You risk biased estimates that reduce the credibility of your results. (b) You might draw flawed conclusions from your tests.

3
Q

How to deal with assumptions

A

2 steps: (a) Diagnostics: Use tools and tests to check if assumptions hold. (b) Solutions: If assumptions are violated, use techniques to fix or minimize the issue.

4
Q

Assumption 1: Linear relationships

A

We assume a linear relationship between the independent and dependent variables. That means: when plotted, the relationship should follow a roughly straight line.

5
Q

What happens if it’s not linear?

A

If the relationship is curved (like a U-shape or step-like), a straight-line regression will fit it poorly. That means your estimates and conclusions could be wrong.

6
Q

Spotting non-linearity

A

Use a residual plot (predicted values vs. residuals). If the residuals scatter randomly around zero = all good. If there’s a curve or other systematic pattern = not linear.
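A minimal sketch of this residual check in Python; the data, variable names, and the choice of statsmodels/matplotlib are illustrative assumptions, not from the course:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf

# Toy data with a deliberately curved relationship so the pattern is visible
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
df = pd.DataFrame({"x": x, "y": 2 + 0.5 * x**2 + rng.normal(0, 2, 200)})

# Fit a straight-line regression and inspect the residuals
model = smf.ols("y ~ x", data=df).fit()

# Residuals vs. fitted values: a random cloud around zero is fine,
# a curve or other systematic pattern points to non-linearity
plt.scatter(model.fittedvalues, model.resid, alpha=0.6)
plt.axhline(0, color="grey", linestyle="--")
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.show()
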

7
Q

Real-world examples of non-linearity

A

Quadratic patterns, step relationships (e.g. from dummy variables), and sudden jumps or gaps in the trend line. All of these violate the linearity assumption.

8
Q

What to do if it’s not linear?

A

You can transform the variables! Example: use a log transformation (like taking the natural log of the variable) to straighten the pattern.
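As a small illustration (the variable and its values are made up), taking the natural log with NumPy before refitting:

import numpy as np

# Hypothetical right-skewed variable; the log compresses large values and
# tends to straighten multiplicative or exponential patterns
income = np.array([20_000, 35_000, 60_000, 120_000, 500_000])
log_income = np.log(income)
print(log_income.round(2))

# The transformed variable then replaces the original one in the regression.
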

9
Q

Regression Assumptions – Overview

A

Regression assumptions are needed to make sure your conclusions are accurate. If violated, they can lead to biased estimates and wrong significance tests. We use diagnostics and solutions to check and improve robustness.

10
Q

Assumption 1: Linear Relationships

A

Assumes a straight-line relationship between independent and dependent variables. The regression line should cut through the scatter in a straight path.

11
Q

Diagnosing Linearity

A

Plot the residuals (errors) vs. fitted values. If the residuals scatter randomly around a flat horizontal line at zero, it’s linear. If there’s a curve or pattern, it’s not.

12
Q

Fixing Non-Linearity

A

You can use transformations like a log or quadratic transformation to straighten the relationship and reduce bias.

13
Q

Assumption 2: Normally Distributed Errors

A

In regression, we assume that the residuals (errors) are normally distributed with a mean of 0 and constant variance. Violations can make significance tests unreliable.

14
Q

Diagnosing Normality

A

Use histograms or Q-Q plots of the residuals. If they form a bell shape or line up well with the normal line, you’re good.
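A minimal Python sketch of both checks, using simulated residuals in place of the residuals from a fitted model (names and data are illustrative):

import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

# Simulated residuals standing in for model.resid from a fitted regression
rng = np.random.default_rng(1)
resid = rng.normal(0, 1, 300)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Histogram: a roughly bell-shaped distribution is consistent with normality
ax1.hist(resid, bins=30)
ax1.set_title("Histogram of residuals")

# Q-Q plot: points hugging the 45-degree line indicate normal residuals
sm.qqplot(resid, line="45", ax=ax2)
ax2.set_title("Q-Q plot of residuals")
plt.show()
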

15
Q

Fixing Normality Issues

A

You can transform the variables (like taking the log) or split the data into groups using dummy variables.

16
Q

Assumption 3: Independence of Error Terms

A

Each error term should be independent of the others. This means one observation’s error (e.g. one person’s) shouldn’t be related to another’s.

17
Q

When Independence is Violated

A

Happens with clusters or repeated surveys. Use time plots or residual plots by group to detect patterns.

18
Q

Assumption 4: Homoscedasticity

A

The variance of residuals should be consistent across all levels of predictors. If the variance spreads out or narrows in a pattern, that’s a problem.

19
Q

Diagnosing Homoscedasticity

A

Use residual vs. fitted plots or scale-location plots. A fan or cone shape means heteroscedasticity (bad).
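A rough scale-location sketch in Python, with simulated data whose error spread grows with the predictor (all names and numbers are made up):

import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

# Toy data where the noise gets wider as x increases (heteroscedasticity)
rng = np.random.default_rng(2)
x = rng.uniform(1, 10, 200)
y = 3 + 2 * x + rng.normal(0, x)
model = sm.OLS(y, sm.add_constant(x)).fit()

# Scale-location plot: sqrt(|standardised residuals|) vs. fitted values;
# a flat band is fine, an upward or funnel-shaped trend signals trouble
std_resid = model.get_influence().resid_studentized_internal
plt.scatter(model.fittedvalues, np.sqrt(np.abs(std_resid)), alpha=0.6)
plt.xlabel("Fitted values")
plt.ylabel("sqrt(|standardised residuals|)")
plt.show()
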

20
Q

Assumption 5: No Multicollinearity

A

Independent variables shouldn’t be too highly correlated. If they are, the model can’t separate their individual effects, so the coefficient estimates become unstable.

21
Q

Diagnosing Multicollinearity

A

Check the correlation matrix of predictors. Watch out if values are near +1 or -1. This suggests high multicollinearity.
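A small illustration in Python with made-up predictors, two of which are nearly identical:

import numpy as np
import pandas as pd

# Toy predictors: x2 is almost a copy of x1, x3 is unrelated
rng = np.random.default_rng(3)
x1 = rng.normal(size=100)
X = pd.DataFrame({
    "x1": x1,
    "x2": x1 + rng.normal(0, 0.05, 100),
    "x3": rng.normal(size=100),
})

# Correlations near +1 or -1 (here x1 vs x2) flag potential multicollinearity
print(X.corr().round(2))
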

22
Q

Fixing Multicollinearity

A

Drop one variable, combine them, or test if it really matters (does it change conclusions?). You can also use regularisation methods in more advanced courses.

23
Q

Why use standardisation in regression?

A

To compare variables measured on different scales (e.g. 1–10 vs. 1–6) by converting them to standard-deviation units.

24
Q

How do you standardise a variable?

A

Use: Xstd = (X − mean) / standard deviation. This rescales the variable to have mean 0 and SD 1.
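The same formula in Python with made-up scores (a sketch, not course material):

import numpy as np

# Hypothetical 1-10 satisfaction scores
x = np.array([3, 7, 8, 5, 10, 6], dtype=float)

# Xstd = (X - mean) / standard deviation
x_std = (x - x.mean()) / x.std(ddof=1)

# The standardised version has mean 0 and SD 1
print(x_std.mean().round(2), x_std.std(ddof=1).round(2))
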

25
Q

What does a standardised coefficient tell you?

A

It shows how many SDs the dependent variable changes per 1 SD increase in the independent variable.

26
Q

Do dummy variables get standardised?

A

No. Dummy variables stay as 0 and 1. The 0 group becomes the reference group in regression.

27
Q

What does the intercept mean in standardised regression?

A

It’s the expected value of the DV when all predictors are at their mean values (i.e. all 0 after standardisation).

28
Q

What is a trend variable?

A

A variable representing time (e.g. T = 1 for 2004, T = 2 for 2005…) to capture change over time in regression.

29
Q

What types of data are used in forecasting?

A

Cross-section (different units, same time), time-series (same unit over time), and panel (combo of both).

30
Q

How do you forecast with a trend regression?

A

Use: Y = b0 + b1*T. Insert a future value for T to predict Y (e.g. sales in 2017 using T = 14).

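A minimal trend-forecast sketch in Python; the yearly sales figures and the use of statsmodels are illustrative assumptions:

import numpy as np
import statsmodels.api as sm

# Hypothetical yearly sales with a roughly linear trend (T = 1, ..., 13)
T = np.arange(1, 14)
sales = 100 + 5 * T + np.random.default_rng(4).normal(0, 3, 13)

# Fit Y = b0 + b1*T
model = sm.OLS(sales, sm.add_constant(T)).fit()
b0, b1 = model.params

# Forecast the next period by plugging in a future value of T
forecast = b0 + b1 * 14
print(round(forecast, 1))
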
31
Q

What do residuals reveal in forecasting?

A

If residuals follow a pattern (e.g. early negative, later positive), the model is not capturing a linear trend well.

32
Q

What are two fixes for non-linearity?

A

1. Log-transform the dependent variable. 2. Add a quadratic (T²) term to capture curve shapes.

33
Q

How does log transformation help?

A

It linearises curved relationships and helps when the dependent variable grows exponentially.

34
Q

How do you interpret log-transformed regression?

A

Use exp(predicted value) to convert back to the original scale (e.g. actual sales instead of log sales).

35
Q

Formula for converting log predictions?

A

Use: Sales = exp(intercept + slope * T) to get back to the real (non-log) scale.

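A small sketch of the whole round trip, with made-up exponential sales data and statsmodels as an illustrative choice: regress log(sales) on T, then use exp() to get a forecast on the original scale.

import numpy as np
import statsmodels.api as sm

# Hypothetical sales growing roughly exponentially over T = 1..10
T = np.arange(1, 11)
sales = 50 * np.exp(0.15 * T) * np.random.default_rng(5).lognormal(0, 0.05, 10)

# Regress log(sales) on the trend variable T
model = sm.OLS(np.log(sales), sm.add_constant(T)).fit()
b0, b1 = model.params

# Sales = exp(intercept + slope * T): forecast on the original scale for T = 12
predicted_sales = np.exp(b0 + b1 * 12)
print(round(predicted_sales, 1))
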
36
Q

Why use a quadratic term in regression?

A

To capture curved growth (e.g. slow at first, then faster later). This models changes in growth rate.

37
Q

What does Y = b0 + b1T + b2T² mean?

A

It models curvature. Negative b1 + positive b2 = initial decline followed by growth after a turning point.

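A quick illustration in Python with invented data that first declines and then grows (variable names and numbers are assumptions):

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# U-shaped series: decline at first, growth after a turning point
T = np.arange(1, 21)
y = 100 - 8 * T + 0.5 * T**2 + np.random.default_rng(6).normal(0, 3, 20)
df = pd.DataFrame({"T": T, "y": y})

# Y = b0 + b1*T + b2*T^2; I(T**2) adds the squared trend term
model = smf.ols("y ~ T + I(T**2)", data=df).fit()
print(model.params.round(2))   # expect a negative T and a positive T**2 coefficient
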
38
Q

How do you choose between log vs. quadratic?

A

Check residuals and fit (R²). Use log if trend looks exponential, or quadratic if there's a turning point.