Lecture 17 Flashcards
What is an autoregressive model
AR model uses past values of a variable to predict its current value
- e.g. AR(1): 1st order autoregressive model:
Yt = B0 + B1·Yt-1 + ut
- interpretation being, today’s value depends on yesterday’s value plus some randomness
How to estimate AR models
- using OLS, but OLS can be biased, especially in small samples, because the regressor Yt-1 is correlated with past error terms (serial correlation makes Yt-1 endogenous)
- it can be shown that OLS underestimates the true autoregressive coefficient; in short dynamic panels this downward bias is known as Nickell bias
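A minimal Monte Carlo sketch of this bias (my own numpy code, not from the lecture): simulate AR(1) data with B1 = 0.9 and estimate it by OLS in a small sample.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_ar1_ols(y):
    """Regress y_t on a constant and y_{t-1}; return the slope estimate."""
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])
    beta = np.linalg.lstsq(X, y[1:], rcond=None)[0]
    return beta[1]

true_b1 = 0.9
T = 50  # small sample, where the bias is most visible
estimates = []
for _ in range(2000):
    u = rng.standard_normal(T)
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = true_b1 * y[t - 1] + u[t]
    estimates.append(fit_ar1_ols(y))

print(np.mean(estimates))  # noticeably below the true 0.9
```

The average estimate lands visibly below 0.9, matching the downward-bias claim; the gap shrinks as T grows.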
Moving Average Models
MA(1): Yt = Q0 + ut + Q1ut-1, where ut is i.i.d
- current yt depends on current and past error terms
- MA models involve shocks
- key point: shocks have effects that propagate for a few periods, but then die out
MA(q): Yt = Q0 + ut + Q1·ut-1 + … + Qq·ut-q
- not easy to estimate with OLS as the error terms are unobservable
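A quick simulation sketch (my own code) of the "shocks die out" point: for an MA(1), the autocorrelation is non-zero at lag 1 but roughly zero beyond it.

```python
import numpy as np

rng = np.random.default_rng(1)
theta1, T = 0.8, 200_000
u = rng.standard_normal(T + 1)
y = u[1:] + theta1 * u[:-1]  # MA(1) with Q0 = 0

def acf(x, lag):
    """Sample autocorrelation at the given lag."""
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

print(acf(y, 1))  # close to theta1 / (1 + theta1**2) ≈ 0.488
print(acf(y, 2))  # close to 0: the shock has fully died out after one period
```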
ARMA models, autoregressive moving average
Yt = u + SUM(Bl·Yt-l) + ut + SUM(Qk·ut-k), sums over l = 1..p and k = 1..q
- captures persistence from past (AR) and shock driven dynamics (MA)
- estimated using MLE or specialised time series methods
Distributed Lag models
- DL
Yt = a0 + a1·xt-1 + a2·xt-2 + … + ak·xt-k + ut
- outcome yt depends on lags of another variable xt
- often used for policy analysis
Autoregressive Distributed Lag Models
- ADL
Autoregressive Distributed Lag:
- Yt = u + SUM(Bl·Yt-l) + SUM(aj·xt-j) + ut, sums over l = 1..p and j = 1..q
- combines both lags of y and lags of x
- very flexible for modelling feedback and dynamic effects
How many lags to include in time series models?
- too few lags and you miss dynamics, too many and you overfit
3 main approaches:
- Rules of thumb - based on data frequency, monthly data? Try 6 or 12 lags - half or full year
- Cross-Validation - more empirical: hold out parts of data, test predictive accuracy. Still not super common in time series due to serial dependence - often done with rolling windows
- Information criteria: model selection tools that balance fit and parsimony
Bayesian information criterion (BIC):
BIC(n) = ln(SSR(n)/T) + (n/T)·ln(T)
- SSR(n): sum of squared residuals with n parameters
- first term: fit (lower is better)
- second term: penalty for complexity
Minimise BIC to find optimal number of lags, BIC tends to choose fewer lags
AIC - Akaike Information Criterion
AIC(n) = ln(SSR(n)/T) + 2n/T
- similar logic to BIC
- uses a smaller penalty -> tends to select more lags
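A sketch of lag selection with these formulas (my own numpy code; the AR(2) design and variable names are illustrative):

```python
import numpy as np

def ar_ssr(y, n):
    """SSR from OLS of y_t on a constant and lags 1..n."""
    Y = y[n:]
    X = np.column_stack([np.ones(len(Y))]
                        + [y[n - j : len(y) - j] for j in range(1, n + 1)])
    beta = np.linalg.lstsq(X, Y, rcond=None)[0]
    resid = Y - X @ beta
    return resid @ resid

# Simulate an AR(2), so the "true" lag length is 2
rng = np.random.default_rng(2)
T_full = 300
u = rng.standard_normal(T_full)
y = np.zeros(T_full)
for t in range(2, T_full):
    y[t] = 0.5 * y[t - 1] + 0.3 * y[t - 2] + u[t]

bic, aic = {}, {}
for n in range(1, 9):
    ssr = ar_ssr(y, n)
    T = len(y) - n  # effective sample size
    bic[n] = np.log(ssr / T) + n * np.log(T) / T
    aic[n] = np.log(ssr / T) + 2 * n / T

bic_choice = min(bic, key=bic.get)
aic_choice = min(aic, key=aic.get)
print(bic_choice, aic_choice)  # AIC's smaller penalty tends to pick at least as many lags
```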
Violation of stationarity common in time series data - seasonality
- what’s the issue?
- why is this problem?
- stationarity means the statistical properties of a series are constant over time; seasonality violates this because certain patterns repeat at regular intervals
- if you ignore it, you might estimate spurious relationships - detecting false signals or incorrect causality
How to fix violation of stationarity?
- Fixed date events: use monthly dummies or specific event dummies, e.g. Christmas, bank holidays - these soak up predictable jumps due to calendar effects
- Seasonal patterns: use sin and cos terms (trigonometric controls) to model smooth cyclical patterns; works well when seasonality is regular and continuous
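A small sketch (my own construction) of what the two kinds of seasonal controls look like as regressor columns for monthly data:

```python
import numpy as np

T = 120                    # 10 years of monthly observations
month = np.arange(T) % 12  # 0 = Jan, ..., 11 = Dec

# (1) Fixed-date events: monthly dummies (drop January to avoid
#     perfect collinearity with the constant)
dummies = np.eye(12)[month][:, 1:]  # shape (120, 11)

# (2) Smooth seasonality: sin/cos terms with a 12-month period
t = np.arange(T)
trig = np.column_stack([np.sin(2 * np.pi * t / 12),
                        np.cos(2 * np.pi * t / 12)])  # shape (120, 2)

X_dummy = np.column_stack([np.ones(T), dummies])  # design matrix, dummy version
X_trig = np.column_stack([np.ones(T), trig])      # design matrix, trig version
print(X_dummy.shape, X_trig.shape)
```

The dummy version spends 11 parameters but catches any monthly shape; the trig version spends only 2 and suits smooth, regular cycles.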
What is a deterministic trend
A predictable, long-term movement in a time series that doesn’t arise from randomness
- deterministic trends break stationarity, as the mean of the series changes over time, violating the assumption that the data’s distribution stays constant
- can model the trend as having a systematic time component, e.g. Yt = k1·t + ut
Stochastic trends
- e.g. random walk
The trend changes randomly and unpredictably over time
- Yt = Yt-1 + ut; the conditional mean is Yt-1, but the variance grows with time: Var(Yt) = t·σ^2
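A quick Monte Carlo sketch (my own code) of the Var(Yt) = t·σ^2 claim:

```python
import numpy as np

rng = np.random.default_rng(3)
sigma, T, n_sims = 1.0, 400, 10_000
u = rng.normal(0.0, sigma, size=(n_sims, T))
Y = np.cumsum(u, axis=1)  # Y_t = Y_{t-1} + u_t, with Y_0 = 0

print(Y[:, 99].var())   # close to 100 * sigma^2
print(Y[:, -1].var())   # close to 400 * sigma^2: the variance keeps growing
```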
Consequences of stochastic trends
- using Yt-1 as a regressor introduces endogeneity, as the regressor and error term are correlated, so OLS underestimates true persistence and the coefficient on Yt-1 is biased downward
- t-stats are no longer valid: if the data has a unit root (non-stationarity), the usual large-sample properties break down
- spurious regression: if y and x both follow stochastic trends, regressing one on the other can give a high R^2 and large t-stats even if they are unrelated
Does heteroskedasticity matter?
Unconditional heteroskedasticity (the variance of ut changes over time regardless of xt) can violate stationarity
- conditional heteroskedasticity (the variance of ut depends on xt) does not affect OLS bias/consistency, but it does affect inference
Testing for stochastic trends: Dickey-Fuller test
Key issue is whether the process has a unit root, i.e. is non-stationary due to a stochastic trend
- test: H0: B1 = 1 vs H1: B1 < 1, in Yt = B0 + B1.Yt-1 + ut
- can't just use a normal t-test: under the null, when B1 = 1, the series is non-stationary and the usual test statistics don't follow standard distributions
- rewrite as ΔYt = B0 + w·Yt-1 + ut, where w = B1 - 1, and test H0: w = 0 vs H1: w < 0
Important notes for the Dickey-Fuller Test
- One sided test
- No need for robust SEs as special distributions already account for the non-standard inference
- Generalises to AR(p) models: the augmented Dickey-Fuller (ADF) test adds lags of ΔYt as extra regressors
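A sketch of the Dickey-Fuller regression done by hand (my own numpy code; -2.86 is the approximate 5% DF critical value for the with-constant case):

```python
import numpy as np

def df_stat(y):
    """t-stat on w in the regression dY_t = B0 + w*Y_{t-1} + u_t."""
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])
    beta = np.linalg.lstsq(X, dy, rcond=None)[0]
    resid = dy - X @ beta
    s2 = resid @ resid / (len(dy) - 2)
    cov = s2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(4)
T = 500
u = rng.standard_normal(T)
walk = np.cumsum(u)              # unit root: H0 is true
stationary = np.zeros(T)
for t in range(1, T):
    stationary[t] = 0.5 * stationary[t - 1] + u[t]  # no unit root

print(df_stat(walk))        # usually above -2.86: fail to reject H0
print(df_stat(stationary))  # far below -2.86: reject the unit root
```

Note the comparison is against Dickey-Fuller critical values, not normal or t ones, per the card above.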
Structural breaks
The regression relationship changes at some known point in time, τ
- estimate a large ADL model with a potential break: add interaction terms that equal 1 if t exceeds τ, each with its own coefficient
- conduct an F-test of whether those break coefficients are all 0; if at least one is non-zero, the regression changes at τ
QLR test
In practice you won't know when the break happens
- so run a Chow test at every possible break date t in a central window
- take the largest F-stat from those
What does 15% trimming mean
Only testing breakpoints within the central 70% of the data
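A sketch of the QLR idea (my own code; the break design and names are illustrative): Chow F-stats at every candidate break in the central 70% of the sample, then take the max.

```python
import numpy as np

def ssr(y, X):
    """Sum of squared residuals from OLS of y on X."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    r = y - X @ beta
    return r @ r

def chow_f(y, X, tau):
    """F-stat for H0: no coefficient change at break date tau."""
    D = (np.arange(len(y)) >= tau).astype(float)
    X_full = np.column_stack([X, X * D[:, None]])  # add post-break interactions
    ssr_r, ssr_u = ssr(y, X), ssr(y, X_full)
    q = X.shape[1]  # number of coefficients allowed to change
    return ((ssr_r - ssr_u) / q) / (ssr_u / (len(y) - X_full.shape[1]))

rng = np.random.default_rng(5)
T, true_break = 200, 120
x = rng.standard_normal(T)
slope = np.where(np.arange(T) < true_break, 1.0, 3.0)  # slope jumps at t = 120
y = slope * x + rng.standard_normal(T)

X = np.column_stack([np.ones(T), x])
window = range(int(0.15 * T), int(0.85 * T))  # 15% trimming
f_stats = {tau: chow_f(y, X, tau) for tau in window}
qlr_break = max(f_stats, key=f_stats.get)
print(qlr_break)  # near the true break at 120
```

The QLR statistic has its own non-standard distribution, so the max-F value is compared against QLR critical values, not ordinary F ones.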
Why would we need HAC standard errors in time series regressions
- in time series, the error term ut might be heteroskedastic and/or autocorrelated
- so regular SEs aren't valid; we need HAC SEs to correct for both
- it can be shown that the usual OLS SEs are wrong in time series unless we account for these features
Newey West SEs
Newey-West estimator adjusts SEs to account for autocorrelation and heteroskedasticity, so you don't underestimate your SEs
- estimate the long-run, HAC variance
- bias-variance tradeoff, bigger m - captures more autocorrelation, but adds more noise (higher variance)
- smaller m is a cleaner estimate, but might miss serial dependence.
Newey-West component breakdown
The Newey-West value is an estimator of the long-run (HAC) variance:
v^ = y0^ + 2·SUM((1 - j/(m+1))·yj^), sum over j = 1..m
- T is the number of time periods
- yj^ is the sample autocovariance of the regression residuals at lag j; y0^ is the variance, i.e. the autocovariance at lag 0
- m is the truncation parameter: how many lags back we are looking
- the (1 - j/(m+1)) terms are the Bartlett weights, which down-weight higher lags
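A sketch implementation (my own, numpy only) of the long-run variance with Bartlett weights (1 - j/(m+1)):

```python
import numpy as np

def newey_west_var(x, m):
    """Newey-West long-run variance of a series x, truncated at m lags."""
    x = x - x.mean()
    T = len(x)
    gamma = [x[: T - j] @ x[j:] / T for j in range(m + 1)]  # autocovariances
    return gamma[0] + 2 * sum((1 - j / (m + 1)) * gamma[j]
                              for j in range(1, m + 1))

rng = np.random.default_rng(6)

iid = rng.standard_normal(100_000)
nw_iid = newey_west_var(iid, m=5)  # ≈ 1: no autocorrelation to correct for

# AR(1) errors with rho = 0.5: true long-run variance = 1/(1 - rho)^2 = 4
u = rng.standard_normal(100_000)
ar = np.zeros(100_000)
for t in range(1, 100_000):
    ar[t] = 0.5 * ar[t - 1] + u[t]
nw_ar = newey_west_var(ar, m=50)  # close to 4, far above the plain variance ≈ 1.33

print(nw_iid, nw_ar)
```

This illustrates the m tradeoff from the card above: for the autocorrelated series a small m would miss most of the serial dependence, while a very large m just adds noise.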