Flashcards in Time Series Deck (71):
Define time-series data.
Time-series data consists of observations of the same variable at different points in time.
Name three types of time-series variables and give examples of each.
Types of time-series variables:
1. Stocks and flows
e.g. GDP, investment, exports, wages
2. Prices and rates
e.g. oil price, interest rates, exchange rates
3. Indices
e.g. price index, Consumer Price Index (CPI)
Define what indices are and explain how they are useful for economists.
An index measures the average change in something relative to a base period.
Indices can transform nominal values into real values, which is useful because economic behaviour is usually influenced by real, not nominal, variables.
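The next example is a minimal Python sketch of the two operations described above. The prices and wages are made up, and the helper names `to_index` and `deflate` are hypothetical. It first rescales a price series so that the base period equals 100. It then divides nominal values by the index to express them in base-period prices.

```python
def to_index(prices, base=0):
    """Rescale a price series so that the base period equals 100."""
    return [100 * p / prices[base] for p in prices]


def deflate(nominal, price_index):
    """Convert nominal values into real values measured in base-period prices."""
    return [n * 100 / pi for n, pi in zip(nominal, price_index)]


# Hypothetical data: price of a fixed basket of goods and nominal wages.
basket_price = [200, 210, 230]
nominal_wage = [500, 525, 575]

cpi = to_index(basket_price)              # [100.0, 105.0, 115.0]
real_wage = deflate(nominal_wage, cpi)    # [500.0, 500.0, 500.0]
print(cpi, real_wage)
```

In this example nominal wages rise by 15%, but real wages do not change: all of the nominal increase is inflation.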
Define what a lag is and why these are useful for economists.
A (time) lag refers to the value of a variable in previous time periods.
First lag of Yt is Yt-1
Second lag of Yt is Yt-2
Lags are useful because there is often a delay between an economic action and a consequence.
Define the first difference of a variable in mathematical notation.
Define the first difference of a variable which has been transformed into logarithms.
Differencing calculates the change in a variable between two successive time periods, in other words the period-to-period change.
First difference refers to the change in value of Yt between (t-1) and t.
∆Yt = Yt - Y(t-1)
First difference of a variable which has been transformed into logarithms:
∆lnYt = lnYt - lnYt-1 = ln(Yt/Yt-1)
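The next example is a minimal Python sketch of lags, first differences and log differences for a short hypothetical series. The function names are illustrative only. Periods with no earlier value are marked `None`, because the first k observations have no k-th lag.

```python
import math


def lag(y, k=1):
    """k-th lag (1 <= k < len(y)): entry t holds y[t-k]; None where no earlier value exists."""
    return [None] * k + list(y[:-k])


def first_difference(y):
    """First difference: ∆Y_t = Y_t - Y_(t-1); None for the first period."""
    return [None] + [y[t] - y[t - 1] for t in range(1, len(y))]


def log_difference(y):
    """First difference of logs: ∆lnY_t = ln(Y_t) - ln(Y_(t-1)) = ln(Y_t / Y_(t-1))."""
    return [None] + [math.log(y[t]) - math.log(y[t - 1]) for t in range(1, len(y))]


y = [100.0, 102.0, 105.0, 103.0]   # hypothetical series
print(lag(y, 1))                  # [None, 100.0, 102.0, 105.0]
print(lag(y, 2))                  # [None, None, 100.0, 102.0]
print(first_difference(y))        # [None, 2.0, 3.0, -2.0]
print(log_difference(y))
```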
How can we find an approximation of the growth rate of a variable?
The growth rate of a variable is approximately equal to the change in the logarithm of that variable. The approximation is good when the change is small.
% change in Yt ≈ 100 x ∆lnYt
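To see how good the approximation is, the Python sketch below compares the exact percentage change with 100 x ∆lnY for a few hypothetical growth rates. The function names are illustrative.

```python
import math


def exact_growth(y_now, y_prev):
    """Exact percentage change from y_prev to y_now."""
    return 100 * (y_now - y_prev) / y_prev


def log_growth(y_now, y_prev):
    """Approximate percentage change: 100 x ∆lnY."""
    return 100 * (math.log(y_now) - math.log(y_prev))


for y_now in (101.0, 105.0, 120.0, 150.0):
    print(f"exact {exact_growth(y_now, 100.0):6.2f}%   log approx {log_growth(y_now, 100.0):6.2f}%")
```

For a 1% change the two measures are almost identical. For a 50% change the log approximation gives only about 40.5%, which is why the approximation is reserved for small changes.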
Name three potential problems with time-series data
Three potential problems with time-series data:
1. Autocorrelation/Serial correlation
2. Volatility clustering
3. Non-stationarity (trends/breaks)
Define volatility clustering and what type of data is often affected by this.
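Volatility clustering is where periods of high volatility tend to be followed by further high volatility, and periods of low volatility by further low volatility. As a result, the series alternates between turbulent and calm spells.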
This is often relevant to financial data.
Define breaks in time-series data.
What may cause breaks to occur?
Why are breaks important to take into consideration in forecasting models?
Breaks in time-series data occur when a pattern in the data (e.g. the population regression coefficients) changes, either abruptly or gradually.
This may be due to structural or policy changes in the economy.
Breaks are important in forecasting models because a model estimated on data from before the break will generally not give reliable forecasts of the variable after the break.
Define autocorrelation/serial correlation.
What type of autocorrelation is most likely to characterise time-series data?
Define this type of autocorrelation
Autocorrelation/serial correlation refers to when a series is correlated with its own lags. In a regression, the concern is usually that the error terms are correlated across time periods.
Alternatively, autocorrelation is where a variable is correlated with itself over various time intervals.
Positive autocorrelation is likely to characterise time-series data.
Positive autocorrelation implies that if a variable is high in one period, it is likely to be high in the next period. Equally, if the variable is low in one period, it is likely to be low in the next period.
Write out formally no autocorrelation.
No autocorrelation: corr(ut, us) = 0 for all t ≠ s
What are the consequences of autocorrelation?
If a time-series is autocorrelated then:
1. OLS estimators are no longer efficient (minimum variance), which means they are no longer BLUE.
2. With positive autocorrelation (the typical case), OLS standard errors are UNDERestimated. This means that t-statistics are OVERestimated and confidence intervals are too narrow, so we are more likely to make TYPE 1 ERRORS, where we incorrectly reject a true null hypothesis.
What are two ways that we can try to detect autocorrelation graphically?
How can we do these in Stata?
Two ways to detect autocorrelation graphically:
1. Plot a chronological line graph of the regression residuals, et. If there are long sequences of negative residuals following each other, or positive residuals following each other, they are likely to be autocorrelated.
regress y x
predict res, residuals
line res year
2. Plot the residuals against their lagged values (i.e. residual value on y-axis, lagged residual value on x-axis) and include a line of best fit.
If the line of best fit is upward sloping, this implies positive autocorrelation.
tsset year
regress y x
predict res, residuals
twoway (scatter res L.res) (lfit res L.res)
How can we formally test for autocorrelation?
How can we allow for violation of the strict exogeneity assumption?
How can we perform this test in Stata?
The simplest way to test for autocorrelation is to regress the residuals on their lagged values (et on et-1).
The null hypothesis is no autocorrelation (coefficient on the lag equals zero).
We then look at the t-statistic/p-value for the coefficient on the lagged residual (et-1) to decide whether to reject the null. For example, if p < 0.01 then we reject the null at all conventional significance levels, since 1%, 5% and 10% are all greater than the p-value.
H0: lag coefficients are equal to zero (no auto-correlation)
H1: lag coefficients are non-zero (autocorrelation)
To formally test for autocorrelation we use Durbin's alternative test for autocorrelation. This test regresses the regression residuals on their lags, and it can include additional lags. It allows for violation of strict exogeneity by also including the regressors of the original model (which may include lags of the outcome variable) in this auxiliary regression.
tsset year
regress y x
estat durbinalt
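The next example is a Python/NumPy sketch of the residual-based test described above, using simulated data: the errors are assumed to follow ut = 0.7ut-1 + vt. The helper names are hypothetical. Setting `include_regressors=True` adds xt to the auxiliary regression, in the spirit of Durbin's alternative test. This is not a reimplementation of Stata's `estat durbinalt`.

```python
import numpy as np


def ols(y, X):
    """OLS coefficients, residuals and conventional (homoskedastic) standard errors."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, k = X.shape
    sigma2 = resid @ resid / (n - k)
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, resid, se


def residual_autocorrelation_test(y, x, include_regressors=False):
    """Regress e_t on e_(t-1) (plus x_t if include_regressors) and return the
    coefficient on e_(t-1) and its t-statistic."""
    X = np.column_stack([np.ones(len(y)), x])
    _, e, _ = ols(y, X)                          # residuals e_t from y on x
    cols = [np.ones(len(e) - 1), e[:-1]]         # constant and e_(t-1)
    if include_regressors:
        cols.append(x[1:])                       # x_t aligned with e_t
    b, _, se = ols(e[1:], np.column_stack(cols))
    return b[1], b[1] / se[1]


# Simulated data (an assumption for illustration): u_t = 0.7 u_(t-1) + v_t.
rng = np.random.default_rng(42)
n = 500
x = rng.standard_normal(n)
v = rng.standard_normal(n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.7 * u[t - 1] + v[t]
y = 1.0 + 2.0 * x + u

rho_hat, t_stat = residual_autocorrelation_test(y, x, include_regressors=True)
print(f"coefficient on e_(t-1): {rho_hat:.2f}, t-statistic: {t_stat:.1f}")
```

A large positive t-statistic leads us to reject H0 of no autocorrelation. If the same regression is run with serially uncorrelated errors (vt instead of ut), the t-statistic is typically small.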
Why can we not use the error terms, ut, to test for autocorrelation? What can we use instead?
We cannot use the error terms, ut, to test for autocorrelation because we never observe these.
Instead, we can use their proxies, the residuals et, after estimating the regression model.
The regression residuals are consistent estimates of the error terms - as the sample size increases, the values of the residuals converge to the true values of the error terms.
What does it mean if an estimate is 'consistent'?
A consistent estimate is one which converges to the true value of the parameter as the sample size increases.
Define strict exogeneity.
Is this likely to be an issue in cross-sectional data?
Is this likely to be an issue in time-series data?
What is the consequence of violation of the strict exogeneity assumption?
Strict exogeneity is where the error terms associated with the outcome variable, ut, are uncorrelated with the explanatory variables in all time periods (past, present and future values of X).
With time-series data, this implies that the explanatory variable, Xt, does not react to past values of Y.
This is unlikely to be an issue in cross-sectional data. With random sampling, observations are independent, so the error term for one observation is unrelated to the explanatory variables of other observations.
This is likely to be violated in time-series data because we are often concerned with policy variables which are impacted by what has happened in the past.
For example, welfare expenditure, highway speed limits, labour input.
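If strict exogeneity is violated then OLS estimates will be biased (although they may still be consistent if the weaker, contemporaneous exogeneity assumption holds).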
What does it mean if an estimate is efficient?
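An efficient estimator is one with the minimum variance within a class of estimators (e.g. under the Gauss-Markov assumptions, OLS has minimum variance among linear unbiased estimators).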
What is a solution for autocorrelation?
How can this be implemented in Stata?
A solution for autocorrelation is to compute Heteroscedasticity and Autocorrelation Consistent (HAC or Newey-West) standard errors.
These take into account both heteroscedasticity (unequal variance) and autocorrelation (correlation between a variable and itself over time).
HAC standard errors can be computed in Stata by replacing regress with newey and specifying the maximum lag of autocorrelation to allow for (the data must be tsset), e.g. newey y x, lag(4)
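The next example is a Python/NumPy sketch of the Newey-West estimator with Bartlett weights, which shows what HAC standard errors compute. The data are simulated (a persistent regressor and positively autocorrelated errors). The truncation lag uses the rule of thumb m = 0.75·T^(1/3), which is an assumption here rather than something from the cards. No small-sample correction is applied, so the numbers need not match Stata's `newey` exactly.

```python
import numpy as np


def newey_west_se(y, X, max_lag):
    """OLS coefficients with Newey-West (HAC) standard errors using Bartlett weights."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ beta
    scores = X * u[:, None]                  # row t is x_t * u_t
    S = scores.T @ scores                    # lag-0 (heteroscedasticity) term
    for j in range(1, max_lag + 1):
        w = 1.0 - j / (max_lag + 1)          # Bartlett weight declines with the lag
        G = scores[j:].T @ scores[:-j]       # sum over t of (x_t u_t)(x_(t-j) u_(t-j))'
        S += w * (G + G.T)
    bread = np.linalg.inv(X.T @ X)
    cov = bread @ S @ bread                  # sandwich estimator
    return beta, np.sqrt(np.diag(cov))


# Hypothetical data: persistent regressor and positively autocorrelated errors.
rng = np.random.default_rng(0)
n = 1000
e1, e2 = rng.standard_normal(n), rng.standard_normal(n)
x, u = np.zeros(n), np.zeros(n)
for t in range(1, n):
    x[t] = 0.8 * x[t - 1] + e1[t]
    u[t] = 0.8 * u[t - 1] + e2[t]
y = 1.0 + 0.5 * x + u
X = np.column_stack([np.ones(n), x])

m = int(0.75 * n ** (1 / 3))                 # rule-of-thumb truncation lag
beta, hac_se = newey_west_se(y, X, max_lag=m)
resid = y - X @ beta
ols_se = np.sqrt(np.diag(resid @ resid / (n - 2) * np.linalg.inv(X.T @ X)))
print(f"slope {beta[1]:.3f}: conventional SE {ols_se[1]:.3f}, HAC SE {hac_se[1]:.3f}")
```

The HAC standard error comes out noticeably larger than the conventional one. This matches the earlier card: under positive autocorrelation, conventional OLS standard errors are too small.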
What is the difference between exogeneity and autocorrelation?
Exogeneity refers to the relationship between the Xts and the Yts - in other words between the explanatory and outcome variables.
Exogeneity implies that the error terms associated with Yt are not correlated with the explanatory variables.
Autocorrelation refers to the relationship between a variable (here, usually the error term) and its own past values.
Autocorrelation implies that there is a correlation between the error terms over time.
Define heteroscedasticity.
Is this likely to be a problem in time-series data?
Heteroscedasticity is where the variance of the error term associated with the outcome variable is not constant over time.
Heteroscedasticity is less likely to be an issue than autocorrelation in time-series data.
What are the consequences of heteroscedasticity?
Heteroscedasticity leads to invalid OLS standard errors which invalidates hypothesis testing and t-statistics.
What are two potential solutions to heteroscedasticity?
Two potential solutions to heteroscedasticity:
1. Use HAC standard errors
2. Build a model of the error terms (e.g. an ARCH/GARCH model of how the error variance changes over time)
What is conditional heteroscedasticity?
In what circumstances will conditional heteroscedasticity be present in time-series data?
Conditional heteroscedasticity is where the VARIANCE of the error terms is autocorrelated - when variance/volatility is high in one period, it is high in the next.
Conditional heteroscedasticity arises when data is characterised by volatility clustering.
What are AR(p) and ADL(p,q) models?
AR(p) is an autoregressive model which regresses a variable on its own lags. It captures the persistence of a variable after an initial shock has occurred.
'p' indicates the number of lags of the variable itself included in the model.
ADL(p, q) is an auto-regressive distributed lag model. This regresses a dependent variable on its own lags, plus the lags of an additional regressor.
'p' indicates the number of lags of the outcome variable, Y
'q' indicates the number of lags of the additional regressor, X.
ADL models can be extended to include multiple regressors.
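The next example is a Python/NumPy sketch of how an ADL(p, q) regression can be set up and estimated by OLS. The data come from a simulated ADL(1, 1) with hypothetical coefficients. Setting q = 0 gives an AR(p). The helper `adl_design` is illustrative, not a library function.

```python
import numpy as np


def adl_design(y, x, p, q):
    """Dependent variable and regressors for ADL(p, q): a constant,
    Y_(t-1), ..., Y_(t-p) and X_(t-1), ..., X_(t-q).
    The first max(p, q) observations are lost to lagging. q = 0 gives an AR(p)."""
    m = max(p, q)
    T = len(y)
    cols = [np.ones(T - m)]
    cols += [y[m - i:T - i] for i in range(1, p + 1)]   # lags of Y
    cols += [x[m - j:T - j] for j in range(1, q + 1)]   # lags of X
    return y[m:], np.column_stack(cols)


# Simulated ADL(1, 1) (hypothetical): Y_t = 0.5 + 0.6 Y_(t-1) + 0.3 X_(t-1) + u_t
rng = np.random.default_rng(1)
T = 5000
x = rng.standard_normal(T)
u = rng.standard_normal(T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.5 + 0.6 * y[t - 1] + 0.3 * x[t - 1] + u[t]

lhs, X = adl_design(y, x, p=1, q=1)
coef, *_ = np.linalg.lstsq(X, lhs, rcond=None)
print(np.round(coef, 2))   # intercept, coefficient on Y_(t-1), coefficient on X_(t-1)
```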
Why are a new set of assumptions needed for the Least Squares method for ADL models?
A new set of Least Squares assumptions is needed for ADL models because strict exogeneity is very unlikely to hold with time-series data (for example, lags of Y appear as regressors).
What are the four key Least Squares assumptions for ADL models?
Four key Least Squares assumptions for ADL models:
1. Conditional mean assumption - the error term has a conditional mean of zero given all the lagged values of the regressors included in the model (Ys and Xs). Importantly, it does NOT condition on their PRESENT or future values, which makes it weaker than strict exogeneity.
E(ut | Yt-1, Yt-2, ..., Xt-1, Xt-2,...) = 0
2. Stationarity of all random variables - this implies that the variables in the series are not a function of time, and the statistical properties such as mean, variance, autocorrelation etc. are all constant over time.
Alternatively, stationarity means that the probability distribution of the variable, Y, is constant over time.
3. No large outliers
4. No perfect multicollinearity
What does the conditional mean assumption of the Least Squares method for ADL models ensure?
The conditional mean assumption ensures that the error terms are not autocorrelated.
In addition, the conditional mean assumption ensures that the model is well-specified and its forecasting power cannot be improved by including more lags of any of the regressors.
Write out mathematically the definition of weak stationarity.
E(Yt) = μ_y for all t
var(Yt) = σ_y^2 for all t
cov(Yt, Yt-j) = γ_j for all t and j (the covariance, and hence the autocorrelation, between Y and its j-th lag depends only on j, not on t)
Write out mathematically the definition of weak dependence.
cov(Yt, Yt-j) → 0 as j → ∞
This implies that, as the distance between the two observations of Y increases, they become uncorrelated.
Define stationarity.
Stationarity is a characteristic of a time-series whose statistical properties remain constant over time, such as mean, variance, autocorrelation etc.
This implies that the time-series is not a function of time.
Name two types of non-stationarity.
Types of non-stationarity:
1. Breaks - a break is where the relationship between variables in a time-series (e.g. the population regression coefficients) changes, either suddenly or gradually.
2. Trends - a trend is a persistent long-term movement in a variable over time.
There are two types of trend:
a) Deterministic trends - this is where the variable is a nonrandom function of time (e.g. a linear function of time)
b) Stochastic trends - the trend is random and varies over time (e.g. a random walk).
Write out the generic regression function for a stationary AR(1) process.
Stationary AR(1) process:
Yt = B0 + B1Yt-1 + ut
where |B1| < 1
This implies that the effect of a shock to Yt dies out as time goes on (the effect after h periods is B1^h), and the variable returns to its mean, B0/(1 - B1).
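The short Python sketch below makes the "shock dies out" statement concrete. In Yt = B0 + B1Yt-1 + ut, a one-unit shock to ut changes Yt+h by B1^h. The mean is B0/(1 - B1) only when |B1| < 1. The function names and parameter values are illustrative.

```python
def ar1_impulse_response(b1, horizons):
    """Effect on Y_(t+h), h = 0..horizons, of a one-unit shock to u_t
    in Y_t = B0 + B1*Y_(t-1) + u_t. The effect after h periods is B1**h."""
    return [b1 ** h for h in range(horizons + 1)]


def ar1_mean(b0, b1):
    """Unconditional mean B0 / (1 - B1) of a stationary AR(1); needs |B1| < 1."""
    if abs(b1) >= 1:
        raise ValueError("|B1| >= 1: the process is not stationary, so there is no fixed mean")
    return b0 / (1 - b1)


print(ar1_impulse_response(0.5, 4))   # [1.0, 0.5, 0.25, 0.125, 0.0625]: the shock dies out
print(ar1_impulse_response(1.0, 4))   # [1.0, 1.0, 1.0, 1.0, 1.0]: unit root, the shock persists
print(ar1_mean(2.0, 0.5))             # 4.0
```

The B1 = 1 case previews the random walk in the next card: with a unit root, a shock never dies out.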
Write out the generic regression function for a non-stationary AR(1) process that follows a random walk.
Non-stationary AR(1) process following a random walk:
Yt = Yt-1 + ut
Write out the generic regression function for a non-stationary AR(1) process that follows a random walk with drift.
Interpret what this might look like graphically.
Non-stationary AR(1) process following a random walk with drift:
Yt = B0 + Yt-1 + ut
Graphically, the series looks like it is trending over time (upwards if B0 > 0, downwards if B0 < 0), while also wandering randomly around that path.
Define what a random walk process is.
A random walk process is one in which the current value of a variable is its value in the last period plus a stochastic error term, ut.
A random walk has a unit root, B1 = 1, in:
Yt = B1Yt-1 + ut
A random walk is said to be difference stationary: its first differences, ∆Yt = ut, are stationary and are not a function of time. The level of the series, however, is non-stationary. Its variance grows with t, so the series wanders without returning to a fixed mean.
What are three problems associated with stochastic trends in AR(1) processes (i.e. processes that follow a random walk)?
Problems associated with stochastic trends in AR(1) processes (random walks):
1. The autoregressive coefficient will be biased towards zero if its true value is one (i.e. the coefficient on Yt-1)
2. The t-statistic associated with the autoregressive coefficient does not have a normal distribution, even in large samples, so the usual normal critical values cannot be used for hypothesis tests.
Note: one exception is the special case of testing for a unit root in an AR(1) model. The distribution of this t-statistic has been tabulated (the Dickey-Fuller distribution), so valid critical values are available.
3. Spurious regressions - two non-stationary series may appear to be correlated when in fact they are not, simply because both are trending over time.
This may be because unobserved trending factors affecting Yt may be correlated with the independent variable Xt.
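Spurious regression can be illustrated with a short Monte Carlo sketch in Python/NumPy. The simulation design (200 replications of two independent random walks of length 200) is an assumption chosen for illustration. The sketch counts how often a conventional t-test finds a "significant" slope when Y is regressed on X. It then repeats the count after first-differencing both series.

```python
import numpy as np


def slope_t_stat(y, x):
    """Conventional OLS t-statistic on the slope in a regression of y on a constant and x."""
    X = np.column_stack([np.ones(len(x)), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - 2)
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1] / se


rng = np.random.default_rng(7)
reps, T = 200, 200
reject_levels = reject_diffs = 0
for _ in range(reps):
    # Two INDEPENDENT random walks: by construction, neither affects the other.
    y = np.cumsum(rng.standard_normal(T))
    x = np.cumsum(rng.standard_normal(T))
    reject_levels += abs(slope_t_stat(y, x)) > 1.96
    reject_diffs += abs(slope_t_stat(np.diff(y), np.diff(x))) > 1.96

print(f"'significant' at 5% in levels: {reject_levels / reps:.0%}")
print(f"'significant' at 5% in first differences: {reject_diffs / reps:.0%}")
```

In levels, the regression finds a "significant" relationship far more often than 5% of the time. In first differences, the rejection rate falls back to roughly the nominal 5%.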
Briefly name two methods for testing for the presence of unit root in an AR(1) process?
What are the problems associated with both of them?
Two methods for testing for a unit root:
1. Informal test - plot a scatter graph of the variable against its lag. If the series has a unit root then the correlation between the variable and its lag will be close to 1, therefore the data should follow a 45 degree line.
HOWEVER this is very difficult to observe with real life data which is often messy.
2. Formal test for the presence of a unit root - Dickey-Fuller test.
If an AR(1) process doesn't capture all the correlation between a variable and its lag then the DF test is invalid and will not test correctly for the presence of a unit root.
In this case we need to extend the model to an AR(p) version and use the Augmented Dickey-Fuller test instead.
In addition, a DF/ADF test may not be able to distinguish between a time-series with a unit root and a process with an autoregressive coefficient just below 1 but not equal to 1.
Define what a Dickey-Fuller test is for and what it entails.
A Dickey-Fuller test is a method for testing for the presence of a unit root.
Given an AR(1) process: Yt = B0 + B1Yt-1 + ut
We are performing a hypothesis test as follows:
H0: B1 = 1 (non-stationarity, there is a unit root)
H1: B1 < 1 (stationarity)
Note that this is a one-tailed test.
Transform this so it is easier to implement:
Yt - Yt-1 = B0 + B1Yt-1 - Yt-1 + ut
∆Yt = B0 + γYt-1 + ut
where γ = B1 - 1
The hypothesis test now becomes:
H0: γ = 0 (non-stationarity - there is a unit root)
H1: γ < 0 (stationarity)
The t-statistic associated with the coefficient estimate γ is called the Dickey-Fuller (DF) statistic.
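This does NOT follow a normal distribution, even in large samples. The critical values are larger in absolute value (more negative) than the normal ones. For example, with an intercept the 5% DF critical value is about -2.86, compared with -1.645 for a one-sided normal test.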
What is an Augmented Dickey-Fuller test used for and what does it entail?
An Augmented Dickey-Fuller test is used to test for the presence of a unit root in AR(p) models.
First transform the AR(p) model to make the hypothesis test easier to implement:
∆Yt = B0 + γYt-1 + C1∆Yt-1 + C2∆Yt-2 + ... + Cp∆Yt-p + ut
H0: γ = 0 (non-stationarity, there is a unit root)
H1: γ < 0 (stationarity)
The t-statistic associated with the coefficient estimate γ is called the Augmented Dickey-Fuller (ADF) statistic.
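The next example is a Python/NumPy sketch that computes the DF/ADF statistic from the transformed regression above. Setting p = 0 gives the plain Dickey-Fuller test. The series are simulated, and -2.86 is the large-sample 5% critical value for the intercept-only case. This is a teaching sketch; in practice a packaged routine such as Stata's `dfuller` would be used.

```python
import numpy as np


def adf_statistic(y, p=0):
    """t-statistic on gamma in
        dY_t = B0 + gamma*Y_(t-1) + C1*dY_(t-1) + ... + Cp*dY_(t-p) + u_t.
    p = 0 gives the (non-augmented) Dickey-Fuller regression."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                               # dy[k] = Y_(k+1) - Y_k
    n = len(dy)
    lhs = dy[p:]                                  # dY_t, dropping p periods for lagged differences
    cols = [np.ones(n - p), y[p:-1]]              # constant and Y_(t-1)
    cols += [dy[p - j:n - j] for j in range(1, p + 1)]   # dY_(t-j)
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, lhs, rcond=None)
    resid = lhs - X @ beta
    sigma2 = resid @ resid / (len(lhs) - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta[1] / se[1]


DF_CRITICAL_5PCT = -2.86    # intercept-only case, large-sample 5% critical value

rng = np.random.default_rng(3)
T = 500
shocks = rng.standard_normal(T)
stationary = np.zeros(T)
for t in range(1, T):
    stationary[t] = 0.5 * stationary[t - 1] + shocks[t]   # B1 = 0.5
random_walk = np.cumsum(shocks)                            # B1 = 1 (unit root)

for name, series in [("stationary AR(1)", stationary), ("random walk", random_walk)]:
    stat = adf_statistic(series, p=2)
    verdict = "reject H0 (unit root)" if stat < DF_CRITICAL_5PCT else "do not reject H0"
    print(f"{name}: ADF = {stat:.2f} -> {verdict}")
```

Because this is a one-tailed test, H0 is rejected only when the statistic falls below the (negative) critical value.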
What is a solution for non-stationary processes with stochastic trends/random walks?
A solution for non-stationary processes with stochastic trends/random walks is first differencing. This eliminates stochastic trends because time-series data exhibiting stochastic trends are difference stationary.
A new series is created where we take the difference between the variable and first lag. This makes the series stationary because it will no longer be a function of time.
Yt = B0 + Yt-1 + ut
Take the first difference of Yt:
Yt - Yt-1 = B0 + Yt-1 - Yt-1 + ut
∆Yt = B0 + ut
∆Yt is stationary
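The Python/NumPy sketch below checks the algebra above numerically. It simulates a random walk with drift using a hypothetical B0 = 0.2. It then confirms that first-differencing leaves exactly B0 + ut, a series whose mean stays roughly constant over time even though the level keeps drifting.

```python
import numpy as np

rng = np.random.default_rng(1)
T, b0 = 300, 0.2                      # hypothetical sample size and drift B0
u = rng.standard_normal(T)
y = np.empty(T)
y[0] = u[0]
for t in range(1, T):
    y[t] = b0 + y[t - 1] + u[t]      # random walk with drift: Y_t = B0 + Y_(t-1) + u_t

dy = np.diff(y)                       # dy[t-1] = Y_t - Y_(t-1) = B0 + u_t
print(f"mean of Y: first half {y[:T // 2].mean():.1f}, second half {y[T // 2:].mean():.1f}")
print(f"mean of dY: first half {dy[:T // 2].mean():.2f}, second half {dy[T // 2:].mean():.2f}")
```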
What are the consequences of breaks in time-series data?
Breaks in time-series data may lead to invalid inferences and invalid hypothesis tests.
How can we test for a break in time-series data when we know the date that the break occurred?
If we know the date that the break occurred then we can test for a break in time-series data by using a Chow test.
First, create a model:
Let T denote the hypothesised break date, and D(T) be a dummy variable which equals 0 before the break date and 1 after the break date.
Yt = B0 + B1Yt-1 + ... + BpYt-p + γ0D(T)t + γ1D(T)t x Yt-1 + ... + γpD(T)t x Yt-p + ut
where the terms D(T)t x Yt-j are interaction variables. They allow the effect of each lag of Yt on its present value to differ depending on whether the observation falls before or after the break date.
If there is NO break then the model should be the same over both subsamples (i.e. before and after the break date).
Perform the hypothesis test:
H0: γ0 = γ1 = ... = γp = 0 (i.e. no break - all the break coefficients are jointly equal to zero)
H1 : at least one of the γis is NOT equal to zero (i.e. a break occurs in the data series)
This hypothesis test can be performed using an F-statistic.
The Chow test effectively uses an F-statistic to test whether a single regression fits the data as well as two separate regressions estimated on the subsamples before and after the break date.
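The next example is a Python/NumPy sketch of the Chow test for the AR(1) case (p = 1) of the model above, using simulated data. The intercept is assumed to shift from 0 to 2 at a known break date. It computes the homoskedasticity-only F-statistic from the restricted and unrestricted sums of squared residuals. A HAC version would be more robust. The large-sample 5% critical value for 2 restrictions is about 3.00.

```python
import numpy as np


def ssr(y, X):
    """Sum of squared OLS residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid


def chow_test_ar1(y, break_index):
    """Homoskedasticity-only F-statistic for H0: gamma0 = gamma1 = 0 in
        Y_t = B0 + B1*Y_(t-1) + gamma0*D_t + gamma1*D_t*Y_(t-1) + u_t,
    where D_t = 1 for observations at or after break_index (0-based), else 0."""
    y = np.asarray(y, dtype=float)
    lhs, ylag = y[1:], y[:-1]
    D = (np.arange(1, len(y)) >= break_index).astype(float)   # dummy aligned with lhs
    X_restricted = np.column_stack([np.ones(len(lhs)), ylag])
    X_unrestricted = np.column_stack([X_restricted, D, D * ylag])
    q = 2                                                     # number of restrictions
    n, k = X_unrestricted.shape
    ssr_r, ssr_u = ssr(lhs, X_restricted), ssr(lhs, X_unrestricted)
    return ((ssr_r - ssr_u) / q) / (ssr_u / (n - k))


rng = np.random.default_rng(5)
T, tau = 300, 150                         # hypothetical sample size and break date
e = rng.standard_normal(T)
y_break, y_stable = np.zeros(T), np.zeros(T)
for t in range(1, T):
    intercept = 0.0 if t < tau else 2.0   # intercept shifts at the break date
    y_break[t] = intercept + 0.5 * y_break[t - 1] + e[t]
    y_stable[t] = 0.5 * y_stable[t - 1] + e[t]

print(f"F with break: {chow_test_ar1(y_break, tau):.1f}")
print(f"F without break: {chow_test_ar1(y_stable, tau):.1f}")
```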
Why is it important to include the optimal number of lags in a model?
What is the consequence of including too few lags?
What is the consequence of including too many lags?
If we include too few lags in a time-series model then this leads to omission of useful information.
If we include too many lags in a time-series model then we estimate too many coefficients. This adds estimation error and makes the coefficient estimates (and forecasts) less precise.
What methods can we use to estimate the optimal number of lags to include in AR(p), ADL(p,q), DL models and unit root tests?
To estimate the optimal lag length in time-series models and for unit root tests we choose p (number of lags) which minimises one of these two criteria:
1. Bayes Information Criterion (BIC), also called the Schwarz Information Criterion
BIC(p) = ln [SSR(p) / T] + (p+1)[ln(T)/T]
2. Akaike Information Criterion (AIC)
AIC(p) = ln[SSR(p) / T] + (p+1)(2/T)
SSR(p) is the sum of squared residuals of a model estimated with p lags
T is the number of observations in the sample
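The next example is a Python/NumPy sketch of lag selection for an AR(p) using the two formulas above, with data simulated from an AR(2). It estimates every candidate model on the same sample (the first max_p observations are dropped for all p), so that the SSRs are comparable across p. That design choice is an assumption of this sketch.

```python
import numpy as np


def select_ar_lag(y, max_p):
    """Return (p chosen by BIC, p chosen by AIC, {p: (BIC, AIC)}) for AR(p), p = 0..max_p."""
    y = np.asarray(y, dtype=float)
    lhs = y[max_p:]
    T = len(lhs)
    scores = {}
    for p in range(max_p + 1):
        cols = [np.ones(T)] + [y[max_p - i:len(y) - i] for i in range(1, p + 1)]
        X = np.column_stack(cols)
        beta, *_ = np.linalg.lstsq(X, lhs, rcond=None)
        ssr = np.sum((lhs - X @ beta) ** 2)
        bic = np.log(ssr / T) + (p + 1) * np.log(T) / T
        aic = np.log(ssr / T) + (p + 1) * 2 / T
        scores[p] = (bic, aic)
    best_bic = min(scores, key=lambda p: scores[p][0])
    best_aic = min(scores, key=lambda p: scores[p][1])
    return best_bic, best_aic, scores


# Simulated AR(2) (hypothetical coefficients): Y_t = 0.6 Y_(t-1) - 0.3 Y_(t-2) + u_t
rng = np.random.default_rng(2)
T = 1000
e = rng.standard_normal(T)
y = np.zeros(T)
for t in range(2, T):
    y[t] = 0.6 * y[t - 1] - 0.3 * y[t - 2] + e[t]

p_bic, p_aic, all_scores = select_ar_lag(y, max_p=6)
print(f"BIC chooses p = {p_bic}, AIC chooses p = {p_aic}")
```

The only difference between the two criteria is the penalty per coefficient: ln(T)/T for the BIC versus 2/T for the AIC. Because ln(T) > 2 once T > 7, the BIC penalises extra lags more heavily.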
What is one problem associated with using the AIC(p) criterion over the BIC(p) criterion to estimate the optimal number of lags to include in a time-series model?
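The AIC(p) criterion tends to overestimate p. Even in large samples it selects too many lags with positive probability, whereas the BIC is consistent (it selects the true lag length with probability approaching one).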
What is the purpose of forecasting, as opposed to the purpose of causal analysis?
The purpose of forecasting is to determine the predictive power of a model, not to estimate causal effects.
Accurate forecasting can be achieved without having a causal interpretation.
What are three methods for evaluating the predictive power of a model?
Three methods for evaluating the predictive power of a model:
1. Adjusted R^2 - this measures the proportion of variance in the outcome variable which is predicted/explained by the explanatory variables.
2. Root Mean Squared Forecast Error (RMSFE) - this is the size of the typical mistake made when using the forecasting model.
3. Pseudo out-of-sample forecasting - this gives an indication of how well the model performs in real time.
What is the difference between forecast errors and predicted residuals?
Forecast errors are calculated for observations OUTSIDE the sample used to estimate the regression model. A forecast error is the mistake made by the forecasting model for a particular out-of-sample observation.
Forecast error = (u-hat)_(t+1 | t) = Yt+1 - (Y-hat)_(t+1 | t)
The predicted residuals are calculated for observations IN the sample used to estimate the model.
et = Yt - (Y-hat)t
How do we calculate RMSFE?
To calculate the RMSFE:
RMSFE = SQRT(E[(Yt+1 - (Y-hat)_(t+1 | t))^2])
where, for an AR(1) model with estimated coefficients b0 and b1, Yt+1 - (Y-hat)_(t+1 | t) = ut+1 - [(b0 - B0) + (b1 - B1)Yt]
What two types of forecast uncertainty does the RMSFE capture?
Two types of forecasting uncertainty that the RMSFE captures:
1. Unknown value of future shock ut+1.
An estimate of the typical error caused by not knowing this future shock is the standard deviation of the predicted residuals.
2. Estimation error in the coefficient estimates b0 and b1.
This is particularly important when using small samples.
Define pseudo out-of-sample forecasting.
Pseudo out-of-sample forecasting is where we hold back the last part of the sample and estimate the model using only the earlier observations. We then forecast the held-back observations one period ahead and compare each forecast with the actual observation. This mimics real-time forecasting using data we already have.
Name three things that pseudo out-of-sample forecasting is useful for.
Pseudo out-of-sample forecasting is useful for:
1. Checking how well the model performs in making predictions towards the end of the sample.
2. Comparing the performance of two or more competing models.
3. Helping to obtain an estimate of the RMSFE
How can we generate a set of pseudo out-of-sample forecasts and forecast errors?
To generate a set of pseudo out-of-sample forecasts and forecast errors:
We hold back the last P observations in the sample for forecasting.
Let s = T-P be the number of observations in the sample initially used to estimate the model.
1. Estimate the model using the reduced data sample for t = 1, ... ,s
2. Compute a pseudo out-of-sample forecast for the period (s+1): (Y-hat)_(s+1 | s)
3. Compute the forecast error for period (s+1): (u-hat)_(s+1 | s) = Ys+1 - (Y-hat)_(s+1 | s)
where Ys+1 is the actual observations of Y at period (s+1)
4. To generate a set of forecasts and forecast errors, repeat these steps for each remaining date s = T-P+1, ..., T-1. Re-estimate the model each time using observations 1 to s, so the final forecast uses all but the last observation in the sample.
We now have a set of pseudo out-of-sample forecasts and forecast errors.
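The next example is a Python/NumPy sketch of these steps for an AR(1) model, using a simulated series with hypothetical parameters. Python indexing starts at 0, so `y[:s]` holds observations 1 to s, and `y[s]` is the actual value of Ys+1. The sketch reports both the root mean of the squared forecast errors and their standard deviation. The two are close when the forecast errors average roughly zero.

```python
import numpy as np


def fit_ar1(y):
    """OLS estimates (b0, b1) of Y_t = B0 + B1*Y_(t-1) + u_t."""
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])
    beta, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    return beta


def pseudo_out_of_sample_ar1(y, P):
    """One-step-ahead pseudo out-of-sample forecasts for the last P observations."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    forecasts, errors = [], []
    for s in range(T - P, T):            # s = T-P, ..., T-1 observations available
        b0, b1 = fit_ar1(y[:s])          # step 1: estimate using observations 1..s only
        forecast = b0 + b1 * y[s - 1]    # step 2: forecast of Y_(s+1) given Y_s
        forecasts.append(forecast)
        errors.append(y[s] - forecast)   # step 3: actual Y_(s+1) minus forecast
    errors = np.array(errors)
    rmsfe = np.sqrt(np.mean(errors ** 2))   # root mean squared forecast error
    return np.array(forecasts), errors, rmsfe


rng = np.random.default_rng(8)
T, P = 300, 100
e = rng.standard_normal(T)
y = np.zeros(T)
y[0] = 2.5
for t in range(1, T):
    y[t] = 1.0 + 0.6 * y[t - 1] + e[t]

forecasts, errors, rmsfe = pseudo_out_of_sample_ar1(y, P)
print(f"RMSFE: {rmsfe:.3f}, SD of forecast errors: {errors.std(ddof=1):.3f}")
```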
Define the Root Mean Squared Forecast Error (RMSFE).
The Root Mean Squared Forecast Error is the magnitude of the typical mistake made when using the forecasting model.
This is similar to the standard deviation of ut, except that it focuses on the forecast error made using the estimated coefficients, not using the population regression line.
What does the standard deviation of the pseudo out-of-sample forecast errors provide an estimate for?
The standard deviation of the pseudo out-of-sample forecast errors provides an estimate for the RMSFE.
SD((u-hat)_(s+1 | s)) = SD(Ys+1 - (Y-hat)_(s+1 | s))
SE(Yt+1 - (Y-hat)_(t+1 | t)) = SD((u-hat)_(t+1 | t))
When comparing forecasting models, what values of adjusted R^2 and RMSFE indicate that one model performs better than another?
When comparing forecasting models, a higher adjusted R^2 and a lower RMSFE indicate that a model performs better than another model with lower and higher values of these respectively.
Define a forecast interval.
Show how the 95% forecast interval for Yt+1 can be calculated.
A forecast interval for a variable is a range of values which contains the future value of that variable a specified proportion of the time (e.g. 95%).
This is a similar concept to confidence intervals in causal analysis.
95% forecast interval for Yt+1:
[(Y-hat)_(t+1 | t) - 1.96 x SE(Yt+1 - (Y-hat)_(t+1 | t)) , (Y-hat)_(t+1 | t) + 1.96 x SE(Yt+1 - (Y-hat)_(t+1 | t))]
SE(Yt+1 - (Y-hat)_(t+1 | t)) is an estimate of the RMSFE
1.96 is the critical value from the standard normal distribution for a two-sided 5% significance level (the 97.5th percentile)
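The next example is a minimal Python sketch of the 95% forecast interval above. The point forecast and estimated RMSFE are hypothetical numbers. The critical value comes from the standard library's `statistics.NormalDist`, which makes the 1.96 explicit and relies on the normality assumption.

```python
from statistics import NormalDist


def forecast_interval(point_forecast, rmsfe_hat, coverage=0.95):
    """Forecast interval: point forecast ± z * estimated RMSFE, with z taken from
    the standard normal distribution (this assumes the forecast error is normal)."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2)    # about 1.96 for 95%
    return point_forecast - z * rmsfe_hat, point_forecast + z * rmsfe_hat


low, high = forecast_interval(2.0, 0.5)   # hypothetical forecast of 2.0 with RMSFE 0.5
print(f"95% forecast interval: [{low:.2f}, {high:.2f}]")
```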
What is a key assumption needed to calculate a forecast interval?
A key assumption needed to construct a forecast interval is that ut+1 is normally distributed, so that we can use critical values from the normal distribution (e.g. 1.96).
What does a Granger causality test measure, and what does it entail?
A Granger causality test does not measure causality. Instead, it measures the predictive content of the regressors/explanatory variables.
The Granger causality test uses an F-statistic to test whether the coefficients on all the lags of the regressor concerned that are included in the model are jointly equal to zero.
H0: C1 = C2 = ... = Cq = 0 (the lags of the predictor have no predictive content)
H1: at least one of the coefficients is non-zero (the regressor has some predictive content)
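The next example is a Python/NumPy sketch of the Granger causality F-test in an ADL(p, q) regression, using simulated data. By construction, lags of x help predict y, while z is unrelated to y. It computes the homoskedasticity-only F-statistic from restricted (no lags of X) and unrestricted regressions. The large-sample 5% critical value for q = 2 restrictions is about 3.00.

```python
import numpy as np


def ssr(y, X):
    """Sum of squared OLS residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid


def granger_f(y, x, p, q):
    """Homoskedasticity-only F-statistic for H0: C1 = ... = Cq = 0 in an ADL(p, q)
    regression of Y_t on a constant, p lags of Y and q lags of X."""
    y, x = np.asarray(y, dtype=float), np.asarray(x, dtype=float)
    m, T = max(p, q), len(y)
    lhs = y[m:]
    y_lags = [y[m - i:T - i] for i in range(1, p + 1)]
    x_lags = [x[m - j:T - j] for j in range(1, q + 1)]
    X_r = np.column_stack([np.ones(T - m)] + y_lags)            # restricted: no lags of X
    X_u = np.column_stack([np.ones(T - m)] + y_lags + x_lags)   # unrestricted
    n, k = X_u.shape
    ssr_r, ssr_u = ssr(lhs, X_r), ssr(lhs, X_u)
    return ((ssr_r - ssr_u) / q) / (ssr_u / (n - k))


rng = np.random.default_rng(6)
T = 500
x = rng.standard_normal(T)          # has predictive content for Y by construction
z = rng.standard_normal(T)          # unrelated to Y by construction
e = rng.standard_normal(T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.5 * y[t - 1] + 0.4 * x[t - 1] + e[t]

f_x = granger_f(y, x, p=2, q=2)
f_z = granger_f(y, z, p=2, q=2)
print(f"F for lags of x: {f_x:.1f}, F for lags of z: {f_z:.1f}")
```

A large F only says that past x improves the forecast of y. As the card stresses, it does not establish a causal effect.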
How do we identify causality in time-series data as opposed to cross-sectional data?
When referring to time-series data, causality is identified by varying the intensity of a treatment to the same subject over different periods of time.
We can follow the time path of the effect of the shock, which is called the shock's dynamic causal effect.
Cross-sectional data identifies a causal effect by either applying the treatment or not to randomly selected subjects at a single point in time.
Define a DL model
A Distributed Lag (DL) model captures the effect of a shock to Xt on Yt, where the shock has a contemporaneous effect and a dynamic (lagged) effect on the dependent variable.
A DL model can be extended to include multiple regressors and multiple lags.
What are the four key assumptions of a DL model?
Four key assumptions of a DL model:
1. Exogeneity: E(ut | Xt, Xt-1, ...) = 0
This ensures that we can make causal inferences about the estimated coefficients.
2. a) Stationarity of all random variables
b) Weak dependence
3. No large outliers
4. No perfect multicollinearity - no exact linear relationship between the Xs (the regressor and its lags).
What are two potential causes of endogeneity in a DL model?
Two potential causes of endogeneity in a DL model are:
1. Omitted variable bias
2. Simultaneity bias (reverse causality)
Why is the exogeneity assumption unlikely to hold in a DL model for policies?
Policy-makers are likely to use past and present information on the explanatory variable to make policy decisions, which will lead to simultaneity bias and therefore endogeneity.
For what type of series is multicollinearity likely to be present?
A highly persistent X is likely to cause (imperfect) multicollinearity in a DL model. High/low values of X in one period make it more likely that we will observe high/low values of X in the subsequent period(s), so X and its lags are highly correlated with each other.
Define a dynamic multiplier.
A dynamic multiplier is the effect of a shock to X on the outcome variable after a given number of periods. It captures the idea that a shock does not just have an instantaneous effect, but continues to affect the outcome variable over time.
Define a Keynesian multiplier
A Keynesian multiplier captures the cumulative effect of a change in spending (for example, expansionary fiscal policy) on the level of GDP.
What might cause a shock to have dynamic (lagged) effects?
Dynamic multipliers may occur when there is a delay in the implementation of a policy, for example central bankers do not observe GDP in real time so will react to it after a shock has occurred.
In a DL model, which coefficient refers to the contemporaneous dynamic multiplier, and which refers to the h-period dynamic multiplier?
In a DL model:
Yt = B0 + B1Xt + B2Xt-1 + ... + Bp+1Xt-p + ut
B1 is the contemporaneous dynamic multiplier.
B2 is the one-period dynamic multiplier.
Bh+1 is the h-period dynamic multiplier (the effect of Xt-h on Yt).
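The next example is a Python/NumPy sketch of estimating dynamic multipliers by OLS in a DL model with two lags. The data are simulated with hypothetical multipliers 0.8, 0.5 and 0.2, and X is exogenous by construction. The running sums printed alongside are cumulative dynamic multipliers: the total effect on Y after h periods of a one-time change in X. These are added here for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
T, r = 2000, 2                                   # sample size and number of lags of X
x = rng.standard_normal(T)                       # exogenous regressor by construction
# Regressors: constant, X_t, X_(t-1), ..., X_(t-r), for t = r, ..., T-1
X = np.column_stack([np.ones(T - r)] + [x[r - h:T - h] for h in range(r + 1)])
true_coefs = np.array([1.0, 0.8, 0.5, 0.2])      # hypothetical B0, B1, B2, B3
y = X @ true_coefs + rng.standard_normal(T - r)

coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
dynamic_multipliers = coefs[1:]                  # h = 0 (contemporaneous), 1, ..., r
cumulative_multipliers = np.cumsum(dynamic_multipliers)
for h, (dm, cm) in enumerate(zip(dynamic_multipliers, cumulative_multipliers)):
    print(f"h = {h}: dynamic multiplier {dm:.2f}, cumulative {cm:.2f}")
```

Following the card's indexing, coefs[1] estimates B1 (contemporaneous) and coefs[h + 1] estimates the h-period dynamic multiplier.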