W5 Flashcards
(38 cards)
Regression Assumptions – Importance
We need to follow certain assumptions in regression to make valid conclusions. Violating them can lead to misleading or biased results.
Why assumptions matter
If you break assumptions: (a) You risk biased estimates that reduce the credibility of your results. (b) You might draw flawed conclusions from your tests.
How to deal with assumptions
2 steps: (a) Diagnostics: Use tools and tests to check if assumptions hold. (b) Solutions: If assumptions are violated, use techniques to fix or minimize the issue.
Assumption 1: Linear relationships
We assume a linear relationship between independent and dependent variables. That means: when plotted, the data should form a straight line.
What happens if it’s not linear?
If the relationship is curved (like a U-shape or step pattern), a straight regression line will systematically mis-fit the data. That means your estimates and conclusions could be wrong.
Spotting non-linearity
Use a residual plot (predicted values vs. residuals). If the residuals scatter randomly around a horizontal line at zero, the linearity assumption holds. If there’s a curve or another systematic pattern, the relationship is not linear.
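The residual check on this card can also be done numerically. A minimal sketch with NumPy: the simulated x/y data and the “fit a parabola to the residuals” trick are illustrative assumptions, not part of the course material.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = 2 + 0.5 * x**2 + rng.normal(0, 1, 200)  # deliberately curved relationship

# Fit a straight line, then inspect the residuals
slope, intercept = np.polyfit(x, y, 1)
fitted = intercept + slope * x
residuals = y - fitted

# Numeric stand-in for eyeballing the plot: fit a parabola to the
# residuals. A clearly non-zero quadratic coefficient signals curvature.
curve_coef = np.polyfit(fitted, residuals, 2)[0]
print(f"quadratic coefficient in residuals: {curve_coef:.4f}")
```

On truly linear data the quadratic coefficient would sit near zero; here it does not, flagging the violation.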
Real-world examples of non-linearity
Quadratic (U-shaped) patterns, step relationships (e.g. from dummy variables), and sudden jumps or gaps in the trend line all violate the linearity assumption.
What to do if it’s not linear?
You can transform the variables! Example: use a log transformation (like taking the natural log of the variable) to straighten the pattern.
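A small sketch of the log-transformation idea from this card, using made-up data that is curved on the raw scale but linear after taking natural logs (the specific data-generating formula is an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(1, 100, 200)
# Multiplicative, curved relationship: y grows like the square root of x
y = 3 * x**0.5 * np.exp(rng.normal(0, 0.1, 200))

# Taking logs of both sides gives log(y) = log(3) + 0.5*log(x) + noise,
# which is a straight line on the log scale.
log_x, log_y = np.log(x), np.log(y)
slope, intercept = np.polyfit(log_x, log_y, 1)
print(f"slope on the log scale: {slope:.3f}")
```

The fitted slope lands close to the true exponent (0.5), showing the pattern has been straightened.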
Regression Assumptions – Overview
Regression assumptions are needed to make sure your conclusions are accurate. If violated, they can lead to biased estimates and wrong significance tests. We use diagnostics and solutions to check and improve robustness.
Assumption 1: Linear Relationships
Assumes a straight-line relationship between independent and dependent variables. The regression line should cut through the scatter in a straight path.
Diagnosing Linearity
Plot the residuals (errors) vs. fitted values. A random scatter around a flat horizontal line at zero means the relationship is linear; a curve or other systematic pattern means it is not.
Fixing Non-Linearity
You can use transformations like a log or quadratic transformation to straighten the relationship and reduce bias.
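The quadratic-transformation option on this card can be sketched by adding an x² term and comparing residual spread. This is a minimal illustration with simulated U-shaped data, not a prescribed workflow:

```python
import numpy as np

rng = np.random.default_rng(6)
x = np.linspace(-3, 3, 200)
y = 1 + x**2 + rng.normal(0, 0.3, 200)  # U-shaped relationship

# Residuals from a straight-line fit vs. from a fit with a quadratic term
linear_resid = y - np.polyval(np.polyfit(x, y, 1), x)
quad_resid = y - np.polyval(np.polyfit(x, y, 2), x)
print(f"residual SD, linear fit:    {np.std(linear_resid):.2f}")
print(f"residual SD, quadratic fit: {np.std(quad_resid):.2f}")
```

The quadratic fit leaves much smaller, patternless residuals, which is exactly what “straightening the relationship” buys you.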
Assumption 2: Normally Distributed Errors
In regression, we assume that the residuals (errors) are normally distributed with a mean of 0 and constant variance. Violations can make significance tests unreliable.
Diagnosing Normality
Use histograms or Q-Q plots of the residuals. If they form a bell shape or line up well with the normal line, you’re good.
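A numeric version of the Q-Q check on this card: compare sample quantiles of the residuals with standard-normal quantiles and see how well they line up. The simulated residuals stand in for real regression residuals (an assumption for illustration); `NormalDist.inv_cdf` from the standard library supplies the theoretical quantiles.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(2)
residuals = rng.normal(0, 1, 500)  # stand-in for regression residuals

# Q-Q idea in numbers: sample quantiles vs. standard-normal quantiles.
# A correlation very close to 1 means the points would sit on the
# straight reference line of a Q-Q plot.
probs = np.linspace(0.05, 0.95, 19)
sample_q = np.quantile(residuals, probs)
theory_q = np.array([NormalDist().inv_cdf(p) for p in probs])
qq_corr = np.corrcoef(sample_q, theory_q)[0, 1]
print(f"Q-Q correlation: {qq_corr:.4f}")
```

Heavily skewed or heavy-tailed residuals would pull this correlation noticeably below 1.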
Fixing Normality Issues
You can transform the variables (like taking the log) or split the data into groups using dummy variables.
Assumption 3: Independence of Error Terms
Each error term should be independent of the others. This means one observation’s error shouldn’t be related to another’s (e.g. one person’s error shouldn’t predict someone else’s).
When Independence is Violated
Violations happen with clustered data or repeated measures (e.g. surveying the same people over time). Use time plots or residual plots by group to detect patterns.
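One way to check independence over time is the lag-1 autocorrelation of the errors. A minimal sketch with simulated errors where each one carries over part of the previous one (the 0.8 carry-over factor is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(5)
# Autocorrelated errors: each error inherits 80% of the previous one
errors = np.zeros(300)
for t in range(1, 300):
    errors[t] = 0.8 * errors[t - 1] + rng.normal(0, 1)

# Lag-1 autocorrelation: near 0 means independent errors; a clearly
# positive value (as here) means the independence assumption fails.
lag1 = np.corrcoef(errors[:-1], errors[1:])[0, 1]
print(f"lag-1 autocorrelation: {lag1:.2f}")
```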
Assumption 4: Homoscedasticity
The variance of the residuals should be constant across all levels of the predictors. If the spread of the residuals widens or narrows systematically, the assumption is violated.
Diagnosing Homoscedasticity
Use residual vs. fitted plots or scale-location plots. A fan or cone shape means heteroscedasticity (bad).
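The fan/cone pattern on this card can be detected numerically: if the size of the residuals trends upward with the fitted values, the spread is not constant. A minimal sketch with simulated data whose noise grows with x (the data and the |residual|-vs-fitted correlation check are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(1, 10, 300)
y = 2 + 3 * x + rng.normal(0, 0.5 * x, 300)  # noise grows with x: a fan shape

slope, intercept = np.polyfit(x, y, 1)
fitted = intercept + slope * x
residuals = y - fitted

# Crude numeric check: a clearly positive correlation between the fitted
# values and |residuals| signals heteroscedasticity.
spread_corr = np.corrcoef(fitted, np.abs(residuals))[0, 1]
print(f"corr(fitted, |residual|): {spread_corr:.3f}")
```

For homoscedastic data this correlation would hover near zero.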
Assumption 5: No Multicollinearity
Independent variables shouldn’t be too highly correlated with each other. If they are, the model can’t separate their individual effects, and the coefficient estimates become unstable.
Diagnosing Multicollinearity
Check the correlation matrix of the predictors. Values near +1 or −1 suggest high multicollinearity.
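A small sketch of the correlation-matrix check on this card. The predictor names (income, wealth, age) and the data-generating rules are made up for illustration; wealth is deliberately built as a near-copy of income.

```python
import numpy as np

rng = np.random.default_rng(4)
income = rng.normal(50, 10, 200)
wealth = income * 2 + rng.normal(0, 3, 200)  # nearly a rescaled copy of income
age = rng.normal(40, 12, 200)                # unrelated predictor

X = np.column_stack([income, wealth, age])
corr = np.corrcoef(X, rowvar=False)
print(np.round(corr, 2))
```

The income–wealth entry sits near +1, flagging multicollinearity, while the entries involving age stay near 0.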
Fixing Multicollinearity
Drop one variable, combine them, or test if it really matters (does it change conclusions?). You can also use regularisation methods in more advanced courses.
Why use standardisation in regression?
To compare variables that use different scales (e.g. 1–10 vs. 1–6) by converting them to standard deviations.
How do you standardise a variable?
Use: Xstd = (X − mean) / standard deviation. This rescales the variable to have mean 0 and SD 1.
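The formula on this card in one line of NumPy (the example scores are made up; note this uses the population SD, matching the plain formula above):

```python
import numpy as np

scores = np.array([3.0, 6.0, 7.0, 4.0, 10.0])  # e.g. responses on a 1-10 scale

# X_std = (X - mean) / standard deviation
z = (scores - scores.mean()) / scores.std()
print(np.round(z, 2))
print(round(float(z.mean()), 10), round(float(z.std()), 10))  # mean 0, SD 1
```

After standardising, each value is expressed in standard deviations from the mean, so items on a 1–10 scale and a 1–6 scale become directly comparable.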