Time Series Flashcards

Question

What are AR(p) and ADL(p,q) models?

Answer 1

AR(p) is an autoregressive model: it regresses a variable on its own lags, which captures how persistent the variable is after an initial shock has occurred. 'p' is the number of lags of the variable itself included in the model. ADL(p, q) is an autoregressive distributed lag model: it regresses a dependent variable on its own lags plus lags of an additional regressor, $Y_t = \beta_0 + \beta_1 Y_{t-1} + \dots + \beta_p Y_{t-p} + \delta_1 X_{t-1} + \dots + \delta_q X_{t-q} + u_t$. 'p' is the number of lags of the outcome variable, Y, and 'q' is the number of lags of the additional regressor, X. ADL models can be extended to include multiple regressors.
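A minimal NumPy sketch of how an ADL(p, q) regression can be set up and estimated by OLS. The helper names `adl_design` and `fit_adl` are hypothetical, the model uses lags 1 to q of X (no contemporaneous X, matching the definition above), and the data are simulated without noise purely so that the recovered coefficients can be checked by eye.

```python
import numpy as np

def adl_design(y, x, p, q):
    """Regressors for Y_t = b0 + a_1*Y_{t-1} + ... + a_p*Y_{t-p}
                         + d_1*X_{t-1} + ... + d_q*X_{t-q} + u_t.
    Rows start at t = max(p, q) so that every lag exists."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    n, m = len(y), max(p, q)
    cols = [np.ones(n - m)]                              # intercept
    cols += [y[m - i:n - i] for i in range(1, p + 1)]    # Y_{t-1}, ..., Y_{t-p}
    cols += [x[m - j:n - j] for j in range(1, q + 1)]    # X_{t-1}, ..., X_{t-q}
    return y[m:], np.column_stack(cols)

def fit_adl(y, x, p, q):
    """OLS coefficients ordered [b0, a_1..a_p, d_1..d_q]; q = 0 gives an AR(p)."""
    target, regressors = adl_design(y, x, p, q)
    coef, *_ = np.linalg.lstsq(regressors, target, rcond=None)
    return coef

# Noise-free ADL(1, 1) data: Y_t = 1 + 0.5*Y_{t-1} + 2*X_{t-1}
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = np.zeros(200)
for t in range(1, 200):
    y[t] = 1.0 + 0.5 * y[t - 1] + 2.0 * x[t - 1]

print(fit_adl(y, x, p=1, q=1))   # approximately [1.0, 0.5, 2.0]
```

With real data the same function returns noisy estimates; setting q = 0 turns it into an AR(p) regression.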

Answer 2

A new set of Least Squares assumptions is needed for ADL models because strict exogeneity is very unlikely to hold with time-series data.

Answer 3

Four key Least Squares assumptions for ADL models:
1. Conditional mean assumption: the error term has conditional mean zero given all the lagged values of the regressors included in the model (the Ys and Xs), but importantly NOT given their PRESENT values: $E(u_t \mid Y_{t-1}, Y_{t-2}, \ldots, X_{t-1}, X_{t-2}, \ldots) = 0$.
2. Stationarity of all random variables: the statistical properties of the series, such as the mean, variance and autocorrelations, are constant over time, so the series is not a function of time. Equivalently, the probability distribution of the variable, Y, is constant over time.
3. No large outliers.
4. No perfect multicollinearity.

Answer 4

The conditional mean assumption implies that the error terms are not autocorrelated (serially uncorrelated). It also means that the model is correctly specified: its forecasts cannot be improved by including more lags of any of the regressors.

Answer 5

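Weak stationarity requires, for every date t:
$E(Y_t) = \mu_Y$
$\operatorname{var}(Y_t) = \sigma_Y^2$
$\operatorname{cov}(Y_t, Y_{t-j}) = \gamma_j$ for every lag j
The covariance depends only on the lag j, not on the date t, so the autocorrelation between Y and its j-th lag is the same throughout the sample.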

Answer 6

Weak dependence: $\operatorname{cov}(Y_t, Y_{t-j}) \to 0$ as $j \to \infty$. This implies that, as the distance between two observations of Y increases, they become uncorrelated.

Answer 7

Stationarity is a characteristic of a time series whose statistical properties, such as its mean, variance and autocorrelations, remain constant over time. This implies that the distribution of the series is not a function of time.

Answer 8

Types of non-stationarity:
1. Breaks: a break is a sudden, unexpected change in the relationship between variables in a time series.
2. Trends: a trend is a persistent long-term movement in a variable over time. There are two types of trend:
a) Deterministic trends: the variable follows a nonrandom function of time, such as a linear trend.
b) Stochastic trends: the trend is random and varies over time.

Answer 9

Stationary AR(1) process: $Y_t = \beta_0 + \beta_1 Y_{t-1} + u_t$, where $|\beta_1| < 1$. This implies that the effect of a shock to $Y_t$ dies out as time goes on, and the variable returns to its mean, $\beta_0 / (1 - \beta_1)$.
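The decay of a shock can be traced numerically. In $Y_t = \beta_0 + \beta_1 Y_{t-1} + u_t$, a one-unit shock to $u_0$ raises $Y_h$ by $\beta_1^h$ (the intercept drops out). The sketch below uses a hypothetical helper, `ar1_impulse_response`, to compare $\beta_1 = 0.5$ with the unit-root case $\beta_1 = 1$.

```python
def ar1_impulse_response(beta1, horizon):
    """Effect of a one-unit shock to u_0 on Y_0, Y_1, ..., Y_horizon
    in Y_t = beta0 + beta1 * Y_{t-1} + u_t."""
    response = [1.0]                              # impact effect on Y_0
    for _ in range(horizon):
        response.append(beta1 * response[-1])     # each period scales the effect by beta1
    return response

stationary = ar1_impulse_response(0.5, 5)
unit_root = ar1_impulse_response(1.0, 5)
print(stationary)   # [1.0, 0.5, 0.25, 0.125, 0.0625, 0.03125]
print(unit_root)    # [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
```

With $|\beta_1| < 1$ the effect shrinks geometrically towards zero; with $\beta_1 = 1$ it never dies out, which is the random-walk case.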

Answer 10

Non-stationary AR(1) process following a random walk: $Y_t = Y_{t-1} + u_t$

Answer 11

Non-stationary AR(1) process following a random walk with drift: $Y_t = \beta_0 + Y_{t-1} + u_t$. With a positive intercept, $\beta_0 > 0$, the series drifts upward, so it may look like it is trending over time.

Answer 12

A random walk is a process in which the current value of a variable equals its value in the previous period plus a stochastic error term, $u_t$. A random walk has a unit root, $\beta_1 = 1$, in $Y_t = \beta_1 Y_{t-1} + u_t$. A random walk is said to be difference stationary: its first differences, $\Delta Y_t = u_t$, are stationary and do not depend on time. The level of the series, however, is non-stationary: with $Y_0 = 0$ and i.i.d. errors, $\operatorname{var}(Y_t) = t\sigma_u^2$, which grows over time.
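A quick Monte Carlo check of the variance claim, using only the standard library. The helper `random_walk`, the seed, the horizon and the number of simulated paths are arbitrary choices for illustration.

```python
import random
import statistics

def random_walk(n, rng, sigma=1.0):
    """One path with Y_0 = 0 and Y_t = Y_{t-1} + u_t, u_t ~ N(0, sigma^2)."""
    path = [0.0]
    for _ in range(n):
        path.append(path[-1] + rng.gauss(0.0, sigma))
    return path

rng = random.Random(42)
paths = [random_walk(100, rng) for _ in range(2000)]

# The variance of the level across simulated paths grows roughly like t * sigma^2 ...
level_var = {t: statistics.variance([p[t] for p in paths]) for t in (10, 50, 100)}
# ... while the first difference dY_100 = u_100 keeps a variance close to sigma^2.
diff_var = statistics.variance([p[100] - p[99] for p in paths])

for t, v in level_var.items():
    print(f"var(Y_{t}) ~ {v:.1f}")
print(f"var(dY_100) ~ {diff_var:.2f}")
```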

Answer 13

Problems associated with stochastic trends in AR(1) processes (random walks):
1. The OLS estimate of the autoregressive coefficient (the coefficient on $Y_{t-1}$) is biased towards zero if its true value is one.
2. The t-statistic on the autoregressive coefficient has a non-normal distribution, even in large samples. This means that the usual normal critical values, confidence intervals and hypothesis tests are invalid. Note: one exception is the unit-root test in a random walk model, where the distribution of the t-statistic under the null can be tabulated.
3. Spurious (false) regressions: two non-stationary series may appear to be correlated when in fact they are not, simply because both are trending over time. This can happen because unobserved trending factors affecting $Y_t$ are correlated with the independent variable $X_t$.
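A small Monte Carlo sketch of spurious regression, using only the standard library: two random walks generated independently of each other are regressed on one another, and the conventional t-test on the slope comes out "significant" far more often than 5% of the time. The sample size, number of trials and seed are arbitrary; `ols_slope_tstat` and `random_walk` are hypothetical helpers.

```python
import math
import random

def ols_slope_tstat(x, y):
    """Slope and homoskedasticity-only t-statistic from regressing y on x with an intercept."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(ssr / (n - 2) / sxx)
    return slope, slope / se

def random_walk(n, rng):
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path

rng = random.Random(1)
trials, rejections = 500, 0
for _ in range(trials):
    x = random_walk(200, rng)
    y = random_walk(200, rng)             # generated independently of x
    _, t_stat = ols_slope_tstat(x, y)
    rejections += abs(t_stat) > 1.96      # "significant" at the nominal 5% level
share = rejections / trials
print(f"share of |t| > 1.96: {share:.2f}")   # well above the nominal 0.05
```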

Answer 14

Two methods for testing for a unit root:
1. Informal test: plot a scatter graph of the variable against its lag. If the series has a unit root, the correlation between the variable and its lag will be close to 1, so the points should lie close to a 45-degree line. HOWEVER, this is very difficult to judge with real-life data, which is often messy.
2. Formal test for the presence of a unit root: the Dickey-Fuller (DF) test. If an AR(1) model doesn't capture all the serial correlation between a variable and its lags, the errors are autocorrelated and the DF test is invalid. In this case we extend the model to an AR(p) and use the Augmented Dickey-Fuller (ADF) test instead.
In addition, DF/ADF tests have low power: they may fail to distinguish between a series with a unit root and a stationary series whose autoregressive coefficient is just below 1.

Answer 15

A Dickey-Fuller test is a method for testing for the presence of a unit root. Given an AR(1) process $Y_t = \beta_0 + \beta_1 Y_{t-1} + u_t$, we perform the hypothesis test:
H0: $\beta_1 = 1$ (non-stationarity, there is a unit root)
H1: $\beta_1 < 1$ (stationarity)
Note that this is a one-tailed test. Subtract $Y_{t-1}$ from both sides so it is easier to implement:
$Y_t - Y_{t-1} = \beta_0 + \beta_1 Y_{t-1} - Y_{t-1} + u_t$
$\Delta Y_t = \beta_0 + \gamma Y_{t-1} + u_t$, where $\gamma = \beta_1 - 1$
The hypothesis test now becomes:
H0: $\gamma = 0$ (non-stationarity, there is a unit root)
H1: $\gamma < 0$ (stationarity)
The t-statistic on the estimate $\hat{\gamma}$ is called the Dickey-Fuller (DF) statistic. It does NOT follow a normal distribution, even in large samples. Its critical values are LARGER in absolute value (more negative) than the normal ones: with an intercept only, the 5% critical value is about -2.86, compared with -1.645 for a one-sided test using the normal distribution.
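A standard-library sketch of the DF regression above, assuming the intercept-only case and the conventional (homoskedasticity-only) standard error. `dickey_fuller_stat` and `simulate_ar1` are hypothetical helpers; -2.86 is the commonly tabulated large-sample 5% critical value for this case.

```python
import math
import random

DF_CRIT_5PCT = -2.86   # large-sample 5% critical value, intercept-only DF regression

def dickey_fuller_stat(y):
    """t-statistic on gamma in dY_t = b0 + gamma * Y_{t-1} + u_t (homoskedasticity-only SE)."""
    lagged = y[:-1]                                           # Y_{t-1}
    dy = [cur - prev for prev, cur in zip(y[:-1], y[1:])]     # dY_t
    n = len(dy)
    xbar, dbar = sum(lagged) / n, sum(dy) / n
    sxx = sum((x - xbar) ** 2 for x in lagged)
    gamma = sum((x - xbar) * (d - dbar) for x, d in zip(lagged, dy)) / sxx
    b0 = dbar - gamma * xbar
    ssr = sum((d - b0 - gamma * x) ** 2 for x, d in zip(lagged, dy))
    return gamma / math.sqrt(ssr / (n - 2) / sxx)

def simulate_ar1(beta1, n, rng):
    """Y_0 = 0 and Y_t = beta1 * Y_{t-1} + u_t with u_t ~ N(0, 1)."""
    y = [0.0]
    for _ in range(n):
        y.append(beta1 * y[-1] + rng.gauss(0.0, 1.0))
    return y

rng = random.Random(0)
for beta1 in (0.5, 1.0):
    stat = dickey_fuller_stat(simulate_ar1(beta1, 250, rng))
    decision = "reject unit root" if stat < DF_CRIT_5PCT else "do not reject unit root"
    print(f"beta1={beta1}: DF={stat:.2f} -> {decision}")
```

For the stationary series the statistic lies far below -2.86; for the unit-root series it should stay above it in roughly 95% of samples.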

Answer 16

An Augmented Dickey-Fuller test is used to test for the presence of a unit root in AR(p) models. First transform the AR(p) model to make the hypothesis test easier to implement:
$\Delta Y_t = \beta_0 + \gamma Y_{t-1} + c_1 \Delta Y_{t-1} + c_2 \Delta Y_{t-2} + \dots + c_p \Delta Y_{t-p} + u_t$
H0: $\gamma = 0$ (non-stationarity, there is a unit root)
H1: $\gamma < 0$ (stationarity)
The t-statistic on the estimate $\hat{\gamma}$ is called the Augmented Dickey-Fuller (ADF) statistic; it is compared with the same non-normal critical values as the DF statistic.
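A NumPy sketch of the ADF regression, assuming homoskedasticity-only standard errors and an intercept but no time trend. `adf_stat` is a hypothetical helper and the AR(2) data are simulated; in practice p would be chosen with an information criterion such as BIC or AIC.

```python
import numpy as np

def adf_stat(y, p):
    """t-statistic on gamma in
    dY_t = b0 + gamma*Y_{t-1} + c_1*dY_{t-1} + ... + c_p*dY_{t-p} + u_t
    (homoskedasticity-only SE). p = 0 reduces to the plain DF regression."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                                    # dy[k] = Y_{k+1} - Y_k
    n = len(dy)
    target = dy[p:]                                    # dY_t once p lagged differences exist
    cols = [np.ones(n - p), y[p:n]]                    # intercept and Y_{t-1}
    cols += [dy[p - i:n - i] for i in range(1, p + 1)] # dY_{t-1}, ..., dY_{t-p}
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    sigma2 = resid @ resid / (len(target) - X.shape[1])
    var_gamma = sigma2 * np.linalg.inv(X.T @ X)[1, 1]
    return coef[1] / np.sqrt(var_gamma)

# Stationary AR(2): Y_t = 0.5*Y_{t-1} + 0.3*Y_{t-2} + u_t, so gamma = 0.5 + 0.3 - 1 = -0.2
rng = np.random.default_rng(1)
u = rng.normal(size=500)
y = np.zeros(500)
for t in range(2, 500):
    y[t] = 0.5 * y[t - 1] + 0.3 * y[t - 2] + u[t]

print(adf_stat(y, p=1))   # compare with DF critical values, e.g. -2.86 at 5%
```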

Answer 17

A solution for non-stationary processes with stochastic trends (random walks) is first differencing. This eliminates stochastic trends because time-series data exhibiting stochastic trends are difference stationary. We create a new series equal to the difference between the variable and its first lag; this series is stationary because its distribution no longer depends on time. For example:
$Y_t = \beta_0 + Y_{t-1} + u_t$
Take the first difference of $Y_t$:
$Y_t - Y_{t-1} = \beta_0 + Y_{t-1} - Y_{t-1} + u_t$
$\Delta Y_t = \beta_0 + u_t$, which is stationary.
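A standard-library sketch of first differencing a random walk with drift. The drift value 0.5, the seed and the sample length are arbitrary; the point is that the level keeps trending while the differenced series fluctuates around $\beta_0$ with a mean that is stable across subsamples.

```python
import random
import statistics

rng = random.Random(3)
beta0 = 0.5                                          # drift, chosen for illustration
y = [0.0]
for _ in range(1000):
    y.append(beta0 + y[-1] + rng.gauss(0.0, 1.0))   # Y_t = B0 + Y_{t-1} + u_t

dy = [cur - prev for prev, cur in zip(y[:-1], y[1:])]   # dY_t = B0 + u_t

half = len(dy) // 2
print(f"Y_T = {y[-1]:.1f}")                                          # level trends upward
print(f"mean of dY, first half:  {statistics.mean(dy[:half]):.2f}")
print(f"mean of dY, second half: {statistics.mean(dy[half:]):.2f}")  # both near beta0
```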

Answer 18

Breaks in time-series data may lead to invalid inferences and invalid hypothesis tests.

Answer 19

If we know the date that the break occurred, we can test for a break in time-series data by using a Chow test. First, set up the model. Let T denote the hypothesised break date, and $D_t(T)$ be a dummy variable which equals 0 before the break date and 1 after the break date:
$Y_t = \beta_0 + \beta_1 Y_{t-1} + \dots + \beta_p Y_{t-p} + \gamma_0 D_t(T) + \gamma_1 [D_t(T) \times Y_{t-1}] + \dots + \gamma_p [D_t(T) \times Y_{t-p}] + u_t$
The interaction terms $D_t(T) \times Y_{t-j}$ allow the effect of each lag of Y on its present value to differ before and after the break date. If there is NO break, the model should be the same over both subsamples (i.e. before and after the break date). Perform the hypothesis test:
H0: $\gamma_0 = \gamma_1 = \dots = \gamma_p = 0$ (no break: the break coefficients are jointly equal to zero)
H1: at least one of the $\gamma_i$ is NOT equal to zero (a break occurs in the data series)
This hypothesis test is performed using an F-statistic. In effect, the Chow test asks whether a single regression over the whole sample fits the data as well as two separate regressions estimated on the subsamples.
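A NumPy sketch of the Chow test for an AR(p) with a known break date, using the homoskedasticity-only F-statistic computed from restricted and unrestricted sums of squared residuals (a heteroskedasticity-robust F-statistic is often preferred in practice). The function names and the simulated break are illustrative assumptions.

```python
import numpy as np

def ssr(X, y):
    """Sum of squared OLS residuals from regressing y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

def chow_f_ar(y, p, break_index):
    """Homoskedasticity-only Chow F-statistic for a break in an AR(p), p >= 1, at a known date.
    break_index is the 0-based position of the first post-break observation."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    target = y[p:]
    lags = np.column_stack([y[p - i:n - i] for i in range(1, p + 1)])   # Y_{t-1}, ..., Y_{t-p}
    dummy = (np.arange(p, n) >= break_index).astype(float)            # D_t(T)
    const = np.ones(n - p)
    X_restricted = np.column_stack([const, lags])                     # no break
    X_unrestricted = np.column_stack([const, lags, dummy, dummy[:, None] * lags])
    q = p + 1                                  # restrictions: gamma_0 = ... = gamma_p = 0
    ssr_r, ssr_u = ssr(X_restricted, target), ssr(X_unrestricted, target)
    dof = len(target) - X_unrestricted.shape[1]
    return ((ssr_r - ssr_u) / q) / (ssr_u / dof)

# Simulated AR(1) whose intercept jumps from 0 to 3 halfway through the sample
rng = np.random.default_rng(0)
n = 400
y = np.zeros(n)
for t in range(1, n):
    intercept = 3.0 if t >= n // 2 else 0.0
    y[t] = intercept + 0.5 * y[t - 1] + rng.normal()

print(chow_f_ar(y, p=1, break_index=n // 2))   # compare with F(2, dof) critical values
```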

Answer 20

If we include too few lags in a time-series model, we omit potentially useful information. If we include too many lags, we estimate more coefficients than necessary, which makes the coefficient estimates less precise.

Answer 21

To estimate the optimal lag length in time-series models and for unit root tests, we choose the p (number of lags) which minimises one of these two criteria:
1. Bayes Information Criterion: $BIC(p) = \ln[SSR(p)/T] + (p+1)\ln(T)/T$
2. Akaike Information Criterion: $AIC(p) = \ln[SSR(p)/T] + (p+1)(2/T)$
where SSR(p) is the sum of squared residuals of a model estimated with p lags and T is the number of observations in the sample.
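A NumPy sketch of lag selection with the two criteria above, for an AR(p) with intercept. One assumption worth flagging: every candidate p is fitted on the same sample (dropping the first `p_max` observations), so that T is identical across models. `ar_ssr` and `choose_lag` are hypothetical helpers and the AR(2) data are simulated.

```python
import numpy as np

def ar_ssr(y, p, p_max):
    """SSR of an AR(p) with intercept, fitted on the common sample t = p_max, ..., n-1."""
    n = len(y)
    target = y[p_max:]
    cols = [np.ones(n - p_max)] + [y[p_max - i:n - i] for i in range(1, p + 1)]
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    return float(resid @ resid)

def choose_lag(y, p_max):
    """Return (p chosen by BIC, p chosen by AIC) over p = 0, ..., p_max."""
    y = np.asarray(y, dtype=float)
    T = len(y) - p_max                      # same number of observations for every p
    bic, aic = {}, {}
    for p in range(p_max + 1):
        fit = np.log(ar_ssr(y, p, p_max) / T)
        bic[p] = fit + (p + 1) * np.log(T) / T
        aic[p] = fit + (p + 1) * 2 / T
    return min(bic, key=bic.get), min(aic, key=aic.get)

rng = np.random.default_rng(4)
u = rng.normal(size=500)
y = np.zeros(500)
for t in range(2, 500):
    y[t] = 0.5 * y[t - 1] + 0.3 * y[t - 2] + u[t]

p_bic, p_aic = choose_lag(y, p_max=6)
print(p_bic, p_aic)
```

Because ln(T) > 2 once T ≥ 8, BIC penalises each extra coefficient more heavily than AIC, so the lag chosen by BIC is never larger than the one chosen by AIC.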

Answer 22

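The AIC(p) criterion tends to overestimate p: its penalty for each additional coefficient, 2/T, is smaller than the BIC penalty, ln(T)/T, whenever T ≥ 8, so even in large samples AIC has a positive probability of choosing too many lags.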

Answer 23

The purpose of forecasting is to determine the predictive power of a model, not to estimate causal effects. Accurate forecasting can be achieved without having a causal interpretation.

Answer 24

Three methods for evaluating the predictive power of a model:
1. Adjusted R^2: this measures the proportion of the variance in the outcome variable which is predicted/explained by the explanatory variables, adjusted for the number of regressors.
2. Root Mean Squared Forecast Error (RMSFE): this is the size of the typical mistake made by the forecasting model.
3. Pseudo out-of-sample forecasting: this gives an indication of how well the model would have performed in real time.

Answer 25

Forecast errors are calculated for observations OUTSIDE the sample used to estimate the regression model; a forecast error is the mistake made by the forecasting model for one such observation: $\hat{u}_{t+1|t} = Y_{t+1} - \hat{Y}_{t+1|t}$. Predicted residuals are calculated for observations IN the sample used to estimate the model: $e_t = Y_t - \hat{Y}_t$.

Answer 26

To calculate the RMSFE: $RMSFE = \sqrt{E[(Y_{t+1} - \hat{Y}_{t+1|t})^2]}$, where, for an AR(1) model with estimated coefficients $b_0$ and $b_1$, $Y_{t+1} - \hat{Y}_{t+1|t} = u_{t+1} - [(b_0 - \beta_0) + (b_1 - \beta_1) Y_t]$.

Answer 27

Two types of forecasting uncertainty that the RMSFE captures:
1. The unknown value of the future shock, $u_{t+1}$. The typical size of this error can be estimated by the standard deviation of the predicted residuals.
2. Estimation error in $b_0$ and $b_1$. This is particularly important in small samples.

Answer 28

Pseudo out-of-sample forecasting is where we generate a forecast for the outcome variable in a period (e.g. T+1) which lies outside the sample used to estimate the model, and compare this forecast with the actual observation of the variable in that period.

Answer 29

Pseudo out-of-sample forecasting is useful for: 1. Checking how well the model performs in making predictions towards the end of the sample. 2. Comparing the performance of two or more competing models. 3. Helping to obtain an estimate of the RMSFE

Answer 30

To generate a set of pseudo out-of-sample forecasts and forecast errors, hold out the last P observations in the sample. Let s = T - P be the number of observations in the sample used to estimate the model for the first forecast.
1. Estimate the model using the reduced data sample, t = 1, ..., s.
2. Compute the pseudo out-of-sample forecast for period s+1: $\hat{Y}_{s+1|s}$.
3. Compute the forecast error for period s+1: $\hat{u}_{s+1|s} = Y_{s+1} - \hat{Y}_{s+1|s}$, where $Y_{s+1}$ is the actual observation of Y in period s+1.
4. Repeat these steps for the remaining dates, s = T - P + 1, ..., T - 1, re-estimating the model each time on all observations up to s (at the final step, all but the last observation in the sample).
We now have P pseudo out-of-sample forecasts and forecast errors.
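A NumPy sketch of the steps above for an AR(1) forecasting model with simulated data. The helper names, the AR(1) specification and P = 100 are illustrative assumptions; the RMSFE is computed as the root mean square of the pseudo out-of-sample forecast errors, which is close to their standard deviation when the errors average roughly zero.

```python
import numpy as np

def fit_ar1(y):
    """OLS estimates [b0, b1] of Y_t = b0 + b1*Y_{t-1} + u_t."""
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])
    coef, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    return coef

def pseudo_oos_ar1(y, P):
    """One-step-ahead pseudo out-of-sample forecasts of the last P observations.
    With 0-based indexing, the forecast of y[s] uses a model estimated on y[0:s]."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    forecasts, errors = [], []
    for s in range(T - P, T):                # s = T-P, ..., T-1
        b0, b1 = fit_ar1(y[:s])              # only data available at date s
        y_hat = b0 + b1 * y[s - 1]           # forecast of y[s] (period s+1 in 1-based terms)
        forecasts.append(y_hat)
        errors.append(y[s] - y_hat)          # pseudo out-of-sample forecast error
    errors = np.array(errors)
    rmsfe = np.sqrt(np.mean(errors ** 2))    # root mean squared POOS forecast error
    return np.array(forecasts), errors, rmsfe

rng = np.random.default_rng(5)
y = np.zeros(400)
for t in range(1, 400):
    y[t] = 1.0 + 0.6 * y[t - 1] + rng.normal()

forecasts, errors, rmsfe = pseudo_oos_ar1(y, P=100)
print(len(errors), round(float(rmsfe), 3))
```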

Answer 31

The Root Mean Squared Forecast Error is the magnitude of the typical mistake made when using the forecasting model. This is similar to the standard deviation of ut, except that it focuses on the forecast error made using the estimated coefficients, not using the population regression line.

Answer 32

The standard deviation of the pseudo out-of-sample forecast errors provides an estimate of the RMSFE: $SE(Y_{t+1} - \hat{Y}_{t+1|t}) = SD(\hat{u}_{s+1|s}) = SD(Y_{s+1} - \hat{Y}_{s+1|s})$, computed over the pseudo out-of-sample forecast dates.

Answer 33

When comparing forecasting models, a higher adjusted R^2 and a lower RMSFE indicate that a model performs better than its competitor.

Answer 34

A forecast interval for a variable is a range of values which contains the future value of that variable a specified proportion of the time (e.g. 95%). This is a similar concept to confidence intervals in causal analysis. The 95% forecast interval for $Y_{t+1}$ is:
$[\hat{Y}_{t+1|t} - 1.96 \times SE(Y_{t+1} - \hat{Y}_{t+1|t}),\ \hat{Y}_{t+1|t} + 1.96 \times SE(Y_{t+1} - \hat{Y}_{t+1|t})]$
where $SE(Y_{t+1} - \hat{Y}_{t+1|t})$ is an estimate of the RMSFE and 1.96 is the two-sided 5% critical value of the standard normal distribution.
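The interval is simple arithmetic once a point forecast and an RMSFE estimate are available. The helper `forecast_interval` and the numbers below (forecast 2.0, RMSFE 0.5) are hypothetical.

```python
def forecast_interval(y_hat, rmsfe, z=1.96):
    """Forecast interval y_hat +/- z * RMSFE; z = 1.96 gives 95% coverage
    only if the forecast error is (approximately) normally distributed."""
    return y_hat - z * rmsfe, y_hat + z * rmsfe

low, high = forecast_interval(2.0, 0.5)
print(f"95% forecast interval: [{low:.2f}, {high:.2f}]")   # [1.02, 2.98]
```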

Answer 35

A key assumption needed to construct a forecast interval is that $u_{t+1}$ is normally distributed, so that we can use critical values from the normal distribution.

Answer 36

A Granger causality test does not measure causality. Instead, it measures the predictive content of the regressors/explanatory variables. The Granger causality test uses the F-statistic of joint significance to test whether the coefficients on all included lags of the regressor concerned are jointly different from zero.
H0: $c_1 = c_2 = \dots = c_q = 0$ (the lags of the predictor have no predictive content)
H1: at least one of the coefficients is non-zero (the regressor has some predictive content)
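A NumPy sketch of the Granger causality F-test in an ADL(p, q), using the homoskedasticity-only F-statistic from restricted and unrestricted sums of squared residuals (robust versions are common in practice). The helper names and the simulated data are illustrative.

```python
import numpy as np

def ssr(X, y):
    """Sum of squared OLS residuals from regressing y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

def granger_f(y, x, p, q):
    """Homoskedasticity-only F-statistic for H0: the q (>= 1) lags of X have zero
    coefficients in a regression of Y_t on p lags of Y and q lags of X."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    n, m = len(y), max(p, q)
    target = y[m:]
    base = [np.ones(n - m)] + [y[m - i:n - i] for i in range(1, p + 1)]
    x_lags = [x[m - j:n - j] for j in range(1, q + 1)]
    ssr_r = ssr(np.column_stack(base), target)          # restricted: no X lags
    X_u = np.column_stack(base + x_lags)                # unrestricted ADL(p, q)
    ssr_u = ssr(X_u, target)
    dof = len(target) - X_u.shape[1]
    return ((ssr_r - ssr_u) / q) / (ssr_u / dof)

rng = np.random.default_rng(6)
n = 300
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.normal()   # X has predictive content for Y

print(granger_f(y, x, p=1, q=2))
```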

Answer 37

When referring to time-series data, causality is identified by varying the intensity of a treatment applied to the same subject over different periods of time. We can follow the time path of the effect of the shock, which is called the shock's dynamic causal effect. With cross-sectional data, a causal effect is identified by either applying the treatment or not to randomly selected subjects at the same point in time.

Answer 38

A Distributed Lag (DL) model captures the effect of a shock to $X_t$ on $Y_t$, where the shock has both a contemporaneous effect and a dynamic (lagged) effect on the dependent variable. A DL model can be extended to include multiple regressors and multiple lags.

Answer 39

Four key assumptions of a DL model:
1. Exogeneity: $E(u_t \mid X_t, X_{t-1}, \ldots) = 0$. This ensures that we can interpret the estimated coefficients as causal effects.
2. a) Stationarity of all random variables; b) weak dependence.
3. No large outliers.
4. No perfect multicollinearity: no exact linear relationship among the regressors ($X_t$ and its lags).

Answer 40

Two potential causes of endogeneity in a DL model are omitted variable bias and simultaneity bias.

Answer 41

Policy-makers are likely to use past and present information, including information on the outcome variable, when setting the explanatory variable. The explanatory variable then responds to the outcome, which leads to simultaneity bias and therefore endogeneity.

Answer 42

A highly persistent regressor is likely to produce (imperfect) multicollinearity among its lags: high/low values of X in one period make high/low values of X more likely in the subsequent period(s), so $X_t$ and its lags are highly correlated.

Answer 43

A dynamic multiplier refers to a shock which does not just have an instantaneous effect, but continues to have an effect on the outcome variable over time.

Answer 44

A Keynesian multiplier captures the cumulative effect of expansionary fiscal policy (e.g. higher government spending) on the level of GDP.

Answer 45

Dynamic multipliers may arise when there is a delay in the implementation of a policy: for example, central bankers do not observe GDP in real time, so they react to a shock only after it has occurred.

Answer 46

In a DL model, $Y_t = \beta_0 + \beta_1 X_t + \beta_2 X_{t-1} + \dots + \beta_{p+1} X_{t-p} + u_t$: $\beta_1$ is the contemporaneous dynamic multiplier, $\beta_2$ is the one-period dynamic multiplier and, in general, $\beta_{h+1}$ is the h-period dynamic multiplier.

Answer 47

A cumulative dynamic multiplier captures the cumulative effect of a change in X on the outcome variable over a period of time. The h-period cumulative dynamic multiplier is the sum of the first h+1 dynamic multipliers: $\beta_1$ is the zero-period cumulative dynamic multiplier, $\beta_1 + \beta_2$ is the one-period cumulative dynamic multiplier, and $\beta_1 + \beta_2 + \dots + \beta_p + \beta_{p+1}$ is the long-run cumulative dynamic multiplier.
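The running sums can be computed directly with `itertools.accumulate`; the three DL coefficients below are hypothetical values chosen for illustration, and the last element of the result is the long-run cumulative dynamic multiplier.

```python
from itertools import accumulate

def cumulative_multipliers(dynamic_multipliers):
    """Running sums B1, B1+B2, ..., B1+...+B(p+1) of the DL coefficients on X_t, X_{t-1}, ..."""
    return list(accumulate(dynamic_multipliers))

# Hypothetical coefficients on X_t, X_{t-1}, X_{t-2}
print(cumulative_multipliers([0.5, 0.25, 0.125]))   # [0.5, 0.75, 0.875]
```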


(71 cards)