What is a time series? + examples
It’s a sequence of data points collected over time, showing how a variable evolves from period to period.
Think of it like: tracking your monthly expenses or daily steps over a year.
Examples:
* Quarterly sales over the last 5 years
* Monthly CPI over the last 12 years
What are the main types of trend models in time-series analysis?
Linear Trend Model: Assumes a straight-line change over time.
Log-Linear Trend Model: Assumes exponential growth or decline.
Autoregressive Model (AR): Uses past values to predict future ones.
How do lagged models differ from trend models in time-series analysis?
Trend models use time itself as the independent variable, whereas lagged (autoregressive) models use the variable’s own past values as predictors.
Think of it like:
Predicting today’s temperature based on yesterday’s temperature, rather than just the date.
What does a Linear Trend Model do? What would the independent variable be?
It predicts future values by adding a constant amount each time period.
Time is the independent variable.
Formula:
yt = b0 + b1(t) + εt
Example: If your salary increases by £3,000 every year, that’s a linear trend.
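A minimal sketch of fitting a linear trend in Python, assuming numpy and a hypothetical series of quarterly sales (the data values are illustrative only):

```python
import numpy as np

# Hypothetical quarterly sales figures
sales = np.array([102.0, 105.5, 109.0, 113.2, 116.8, 120.1, 124.4, 128.0])
t = np.arange(1, len(sales) + 1)  # time is the independent variable

# Fit y = b0 + b1*t by ordinary least squares
b1, b0 = np.polyfit(t, sales, 1)

# Forecast the next period by stepping time forward once more
forecast = b0 + b1 * (len(sales) + 1)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, next-period forecast = {forecast:.2f}")
```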
What is a Log-Linear Trend Model? + formula
It predicts values that grow (or shrink) at a constant percentage rate over time, making it ideal for modelling exponential growth.
Formula:
ln(yt) = b0 + b1(t) + εt
How would you use a Log-Linear Model to predict values at t=49?
With b0 = 4 and b1 = 0.09, the predicted value at t = 49 is e^(b0 + b1×49) = e^(4 + 4.41) = e^8.41 ≈ 4,492.
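As a quick check, the same calculation in Python using only the given b0 and b1:

```python
import math

b0, b1, t = 4, 0.09, 49

# Log-linear model: ln(y) = b0 + b1*t, so the level prediction is e^(b0 + b1*t)
y_hat = math.exp(b0 + b1 * t)
print(round(y_hat))  # roughly 4492
```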
If you were to graph the predicted values from a Linear Trend Model, a Log-Linear Trend Model, and an Exponential Trend Model, how would the shapes of the lines differ over time?
The linear trend plots as a straight line (a constant amount added each period), while the log-linear and exponential trends plot as curves that become steeper (or flatter) over time, because they change by a constant percentage each period.
What is serial correlation in time-series trend models, and how does it affect the reliability of regression results?
Serial correlation (also called autocorrelation) occurs when the errors (residuals) from a regression model are correlated across time—meaning the error in one period is related to the error in another.
Why it matters:
* Violates a key regression assumption: that errors are independent.
* Leads to biased standard errors, which can distort hypothesis tests and confidence intervals.
* Makes the model less reliable for forecasting.
What should you do if your linear trend model shows serial (auto)correlation in the regression errors?
If a linear trend model shows serial correlation (i.e., errors are correlated across time), follow these steps:
Test for serial correlation using the Durbin-Watson statistic.
If serial correlation is present:
* Try a log-linear model instead.
* If the log-linear model still shows autocorrelation, switch to an autoregressive model (AR).
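A minimal sketch of the first step, assuming the statsmodels library and a hypothetical trended series (a Durbin-Watson value near 2 suggests no serial correlation; values near 0 or 4 suggest positive or negative serial correlation):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

# Hypothetical data: a noisy upward trend
rng = np.random.default_rng(0)
t = np.arange(1, 101)
y = 10 + 0.5 * t + rng.normal(0, 2, size=100)

# Fit the linear trend model y = b0 + b1*t + e by OLS
model = sm.OLS(y, sm.add_constant(t)).fit()

# Durbin-Watson statistic on the residuals
print(durbin_watson(model.resid))
```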
What is Ordinary least squares (OLS) regression?
OLS is used to estimate the coefficients of the trend line by minimising the sum of squared differences between the actual and fitted values.
What is the general rule on whether to use a log-linear model or linear trend model?
If the variable grows at a constant rate, a log-linear model is most appropriate
If the variable increases over time by a constant amount, a linear trend model is most appropriate
What is an autoregressive model (AR) ?
A model in which the dependent variable is regressed against one or more lagged values of itself. It predicts a variable using its own past values.
e.g. a firm’s sales this month could be regressed against its sales in the previous month.
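A minimal sketch of fitting an AR(1) model, assuming the statsmodels library and a hypothetical monthly sales series (values are illustrative only):

```python
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

# Hypothetical monthly sales
rng = np.random.default_rng(1)
sales = 50 + 0.1 * np.cumsum(rng.normal(0, 1, size=60)) + rng.normal(0, 2, size=60)

# Regress sales on one lag of itself: sales_t = b0 + b1 * sales_{t-1} + e_t
result = AutoReg(sales, lags=1).fit()
print(result.params)  # intercept (b0) and lag-1 coefficient (b1)
```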
What does it mean for a time series to be covariance stationary, and why is it important for autoregressive models?
What are the 3 conditions necessary to be considered covariance stationary?
A time series is covariance stationary if its behaviour stays stable over time. The three conditions are:
* Constant and finite mean in all periods.
* Constant and finite variance in all periods.
* Constant and finite covariance between values a given number of periods apart.
Autoregressive models assume the data is stable; if the series drifts or trends too much, the model can give misleading or meaningless forecasts.
How does an AR(1) model forecast future values using past data? Explain using yesterday, today and tomorrow.
Imagine x0 = 5, b0 = 1.2 and b1 = 0.45; how would you calculate Monday’s and Tuesday’s forecasts?
An AR(1) model, xt = b0 + b1(xt−1) + εt, uses yesterday’s value to predict today’s, and then uses today’s prediction to forecast tomorrow’s.
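With the numbers above (x0 = 5, b0 = 1.2, b1 = 0.45), a quick sketch of the two-step (chain-rule) forecast:

```python
b0, b1 = 1.2, 0.45
x0 = 5  # the last observed value (say, Sunday)

monday = b0 + b1 * x0        # 1.2 + 0.45 * 5    = 3.45
tuesday = b0 + b1 * monday   # 1.2 + 0.45 * 3.45 = 2.7525
print(monday, tuesday)
```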
What do you do if your AR model shows serial correlation? What wouldn’t you use?
Add more lags (or seasonal lags) until the residual autocorrelations are no longer significant. You would not use the Durbin-Watson test, because it isn’t valid for models that include lagged values of the dependent variable; instead, t-test the residual autocorrelations.
Why does increasing the number of lags (e.g., from AR(1) to AR(2)) or adjusting for seasonality help fix serial correlation?
Because serial correlation means the model is missing patterns in the data.
Increase Lags (AR(1) → AR(2)):
* Adds more past values to the model.
* Captures short-term dependencies that a single lag might miss.
* Helps explain more variation, reducing leftover structure in the errors.
Analogy:
Predicting today’s mood using only yesterday’s might miss the influence of two days ago. Adding that second lag gives a fuller picture.
Adjust for Seasonality:
* Accounts for repeating patterns (e.g., monthly or yearly cycles).
* Prevents the model from mistaking seasonal effects for random error.
* Common fix: include seasonal lags (e.g., lag 12 for monthly data).
Analogy:
Ice cream sales spike every summer. If the model doesn’t know that, it might treat the spike as an error — when it’s actually seasonal.
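A minimal sketch of both fixes, assuming statsmodels and a hypothetical monthly series with a yearly cycle (AR(2) adds a second lag; a lag-12 term handles annual seasonality in monthly data):

```python
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

# Hypothetical monthly data with a repeating 12-month pattern
rng = np.random.default_rng(2)
months = np.arange(120)
y = 100 + 10 * np.sin(2 * np.pi * months / 12) + rng.normal(0, 3, size=120)

ar2 = AutoReg(y, lags=2).fit()             # AR(2): two past values
seasonal = AutoReg(y, lags=[1, 12]).fit()  # AR with a seasonal (lag-12) term
print(ar2.params, seasonal.params, sep="\n")
```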
How do you test for autocorrelation in an AR(1) model, based on the example below and your findings what would you do?
How do you calculate the SE and t-stat?
The standard error of each residual autocorrelation is approximately 1/√T (where T is the number of observations), and the t-statistic is the autocorrelation divided by that standard error.
Compare to the critical value:
Assume a critical t-value of ±2.0 (95% confidence level):
Lag 2 t-stat = 2.3784 > 2.0 → Significant autocorrelation detected
The AR(1) model is incomplete — it doesn’t capture all the time-based patterns.
Fix:
* Increase the number of lags (e.g., move to AR(2)) to include more past values.
* Adjust for seasonality if patterns repeat over time (e.g., monthly cycles).
More lags help the model capture short-term dependencies that were missed.
Seasonal adjustments account for recurring patterns, reducing unexplained variation.
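A minimal sketch of the test itself, assuming statsmodels; the residuals come from a hypothetical AR(1) fit, the standard error of each autocorrelation is approximated as 1/√T, and the t-stats are compared with ±2.0:

```python
import numpy as np
from statsmodels.tsa.ar_model import AutoReg
from statsmodels.tsa.stattools import acf

# Hypothetical series (illustrative only)
rng = np.random.default_rng(3)
x = 0.1 * np.cumsum(rng.normal(0, 1, size=200)) + rng.normal(0, 1, size=200)

resid = AutoReg(x, lags=1).fit().resid
T = len(resid)

# Autocorrelations of the residuals at lags 1-4 and their t-statistics
rho = acf(resid, nlags=4)[1:]
t_stats = rho / (1 / np.sqrt(T))
print(t_stats)  # any |t| > 2.0 suggests remaining autocorrelation
```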
What does it mean if a time series is mean-reverting? For an AR(1) model, how would you calc the MRL?
What does b1 represent and what does a small or big value mean?
The dependent variable tends to return to a long-term average over time.
For an AR(1) model, the mean-reverting level is MRL = b0 / (1 − b1).
If the current value is above the mean → expected to fall
If below the mean → expected to rise
b1 controls how strongly the past value influences the current one:
* If b1 is close to 1, the series takes longer to revert.
* If b1 is small, the series reverts quickly.
Analogy:
Imagine a ball rolling toward a resting point:
b0 is like the initial push.
MRL is where the ball eventually comes to rest.
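Reusing the AR(1) numbers from earlier (b0 = 1.2, b1 = 0.45), a one-line sketch of the mean-reverting level:

```python
b0, b1 = 1.2, 0.45
mrl = b0 / (1 - b1)   # mean-reverting level of an AR(1) model
print(round(mrl, 2))  # about 2.18
```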
When comparing forecasting models, how can you tell which one performs better, and why is it important to use out-of-sample data?
Use the Root Mean Squared Error (RMSE) to measure how accurate a model’s predictions are.
RMSE tells you the average size of prediction errors — lower is better.
In-sample data is what the model was trained on.
Out-of-sample data is new data that tests how well the model generalises.
Why out-of-sample RMSE matters:
It shows how the model performs in real-world scenarios.
Helps avoid overfitting, where a model looks great on training data but fails on new data.
Bottom line:
Choose the model with the lowest RMSE on out-of-sample data — it’s more likely to make reliable forecasts.
Analogy:
Think of RMSE like a golf score — the lower it is, the better your aim (or prediction accuracy).
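A minimal sketch of an out-of-sample RMSE comparison, assuming numpy and two hypothetical sets of forecasts for the same hold-out data:

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error: the average size of the forecast errors."""
    return np.sqrt(np.mean((np.asarray(actual) - np.asarray(predicted)) ** 2))

# Hypothetical out-of-sample values and two competing models' forecasts
actual = [10.2, 10.8, 11.1, 11.9, 12.4]
model_a = [10.0, 10.5, 11.3, 12.1, 12.2]
model_b = [9.5, 10.0, 10.4, 11.0, 11.5]

print(rmse(actual, model_a), rmse(actual, model_b))  # pick the lower one
```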
What is regression coefficient instability, and how does it affect the reliability of time-series models over different time periods?
Regression coefficient instability means the estimated relationships in your model change over time.
This can happen if the economic environment shifts or the data-generating process evolves.
It creates a trade-off:
* Long time series offer more data but may be less stable.
* Short time series are more stable but may lack statistical power.
Why it matters:
* If coefficients aren’t stable, your model may not be reliable for forecasting.
* You may need to re-estimate the model or use rolling windows to adapt to changing conditions.
Analogy:
It’s like using last year’s map to navigate a city that’s constantly under construction — the roads may have changed.
What defines a random walk in time-series data, and what does it mean for a random walk to have drift?
A random walk is a time-series process where each value equals the previous value plus a random shock: xt = xt−1 + εt.
Because the coefficient on xt−1 is 1, the series does not revert to a mean and can drift indefinitely. A random walk with drift adds a constant term, xt = b0 + xt−1 + εt, so the series tends to move in one direction over time.
Analogy:
Imagine walking in a fog:
Without drift: You take random steps — sometimes forward, sometimes back.
With drift: There’s a gentle slope pushing you forward, so you tend to move in one direction over time.
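A minimal sketch contrasting the two cases, assuming numpy (the drift value of 0.3 is illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
shocks = rng.normal(0, 1, size=100)

# Random walk: x_t = x_{t-1} + e_t
walk = np.cumsum(shocks)

# Random walk with drift: x_t = b0 + x_{t-1} + e_t (here b0 = 0.3)
walk_with_drift = np.cumsum(0.3 + shocks)

print(walk[-1], walk_with_drift[-1])  # the drifted series tends to end further out
```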
What is a unit root in an AR(1) model, and why must the coefficient be less than 1?
A unit root exists when b1 = 1 (the coefficient on the lagged value equals one).
For the model to be stationary (i.e., stable over time), the coefficient b1 must be less than 1 in absolute value:
If |b1| < 1:
The series is mean-reverting and stationary — it fluctuates around a long-term average.
If b1 = 1:
The series has a unit root and becomes a random walk — it does not revert to a mean and can drift endlessly.
Analogy
Imagine a balloon floating in the wind:
If it has no anchor (unit root), it drifts wherever the wind blows — unpredictable and unstable.
If it’s tied to a post (stationary), it might sway, but it stays near the centre.
How does the Dickey-Fuller test help you figure out if a time series has a unit root and is nonstationary?
The Dickey-Fuller (DF) test transforms the AR(1) model to check whether the series is just drifting randomly (i.e., has a unit root) or reverts to a mean (i.e., is stationary).
Dickey-Fuller Test Logic
Subtract xt−1 from both sides of the AR(1) model to get xt − xt−1 = b0 + g1(xt−1) + εt, where g1 = b1 − 1. If the series has a unit root (b1 = 1), then g1 = 0.
You calculate a t-statistic for g1 and compare it to special critical values (not the usual ones); if you cannot reject g1 = 0, the series has a unit root and is nonstationary.
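A minimal sketch using statsmodels’ augmented Dickey-Fuller test on a hypothetical series (a low p-value lets you reject the unit root; a high one suggests the series is nonstationary):

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

# Hypothetical series: a random walk, which should fail to reject the unit root
rng = np.random.default_rng(5)
x = np.cumsum(rng.normal(0, 1, size=200))

stat, p_value, *_ = adfuller(x)
print(stat, p_value)
```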
What is first differencing in time-series analysis, and how does it help fix nonstationary data with a unit root?
First differencing is a technique used to transform a nonstationary time series into a stationary one — which is essential for reliable forecasting. It removes the drift.
Basic First Difference Formula: yt = xt − xt−1
* This is the raw change in the original variable x from one time period to the next.
* It’s a transformation, not a model yet.
* Useful for removing trends and unit roots.
AR-style Model of the Differenced Series: yt = b0 + b1(yt−1) + εt
* This is a regression model applied to the differenced data.
* It assumes that today’s change depends on yesterday’s change, plus a constant and some noise.
* Helps forecast future changes based on past changes in the original series.
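A minimal sketch of both steps, assuming pandas and statsmodels and a hypothetical random-walk series x:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.ar_model import AutoReg

# Hypothetical nonstationary series (a random walk)
rng = np.random.default_rng(6)
x = pd.Series(np.cumsum(rng.normal(0, 1, size=200)))

# Step 1: first difference, y_t = x_t - x_{t-1} (drop the leading NaN)
y = x.diff().dropna()

# Step 2: fit an AR(1) model to the differenced (now stationary) series
result = AutoReg(y, lags=1).fit()
print(result.params)
```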