Chapter 2 - Tsay Flashcards

(58 cards)

1
Q

forecasting vs predictions

A

Forecasting is about the future (the time dimension).

Prediction is about cross-sections (different units at the same point in time).

2
Q

name an advantage of using linear, as opposed to non-linear, models

A

The bias-variance tradeoff makes it very unlikely that a linear model overfits the data; linear models tend to sit on the underfitting side.

Linear models are also robust to small datasets. Estimating non-linear models reliably requires very large datasets.

3
Q

what is the random-walk type of forecasting?

A

Just use the current value as the forecast for the next period.

4
Q

what is the goal of univariate time series modeling?

A

We want to model the conditional expectation, E[y_t | F_{t-1}], i.e., the expected value of y_t given the information available up to time t-1.

5
Q

elaborate on seasonality

A

Seasonality refers to regular, cyclical patterns that repeat with a fixed period (e.g., monthly or quarterly effects).

6
Q

how to deal with seasonality?

A

STL decomposition, which splits the series into a trend component, a seasonal component, and a remainder.

Seasonal differencing is also widely used.
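
As a rough sketch of STL in Python (the series, its frequency, and period=12 are illustrative assumptions, not from the card):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import STL

# Hypothetical monthly series with a yearly seasonal pattern
idx = pd.date_range("2015-01", periods=120, freq="MS")
y = pd.Series(10 + 2 * np.sin(2 * np.pi * idx.month / 12)
              + np.random.normal(size=120), index=idx)

res = STL(y, period=12).fit()
trend = res.trend        # slow-moving level
seasonal = res.seasonal  # repeating within-year pattern
remainder = res.resid    # what is left after removing trend + seasonality
```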

7
Q

elaborate on seasonal differencing

A

Y_t’ = Y_t - Y_{t-s}

Y_t’ is the seasonally differenced series and s is the seasonal period. Each value is compared with the value one full season earlier; for monthly data, January is compared with the previous January, and so on.
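
A minimal sketch of the formula above, assuming monthly data (s = 12) in a pandas Series; the series itself is made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical monthly data
idx = pd.date_range("2015-01", periods=120, freq="MS")
y = pd.Series(np.random.normal(size=120).cumsum()
              + 5 * np.sin(2 * np.pi * idx.month / 12), index=idx)

s = 12                       # seasonal period for monthly data
y_diff = y.diff(s).dropna()  # Y'_t = Y_t - Y_{t-s}: January minus the previous January, etc.
```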

8
Q

why do seasonal differencing?

A

It removes seasonal patterns, which makes trends easier to identify and helps achieve stationarity.

It also prepares the data for ARIMA modeling.

9
Q

what is linear time series analysis

A

A tool for analyzing the dynamic structure of a time series. Being linear, it is restricted to linear relationships.

10
Q

what do linear time series models use?

A

Prior information up to some point in time, typically historical values of the same variable we are forecasting.

11
Q

elaborate on the foundation of time series analysis

A

Stationarity.

We have strict and weak.

Strict stationarity requires that the joint distribution of any set of consecutive variables in the time series is identical to the joint distribution of the same set shifted by lag l. This is extremely strong and hard to verify empirically.

Weak stationarity requires:
constant mean
lag-l autocovariances that depend only on l, not on time (at lag 0 this is the variance, so the variance is constant too).

12
Q

why do we need stationarity

A

It allows us to make predictions: the statistical properties estimated from past data remain valid going forward.

13
Q

the lag-l autocovariance has 2 important properties

A

1) The lag-0 autocovariance equals the variance, which is constant under weak stationarity: gamma_0 = Var(r_t).

2) The autocovariance is symmetric in the lag: gamma_{-l} = gamma_l.

14
Q

if two variables have 0 correlation, what does this mean?

A

On its own it only means they are uncorrelated; if the variables are jointly normally distributed, zero correlation also implies independence.

15
Q

Elaborate on ACF

A

Autocorrelation function.

We denote the lag-l autocorrelation as “p_l”.

p_l = gamma_l / gamma_0

16
Q

elaborate on requirements for ^p_l to be a consistent estimate of the true lag-l autocorrelation

A

This is framed a bit oddly.

The point is that the sample ACF is a consistent estimator of the true ACF under certain conditions, and these are met under weak stationarity.

Now, if the shock series in our time series happens to be iid with mean 0, we know that the sample autocorrelation will be asymptotically normally distributed with mean 0 and variance given either by 1/T or by Bartlett's formula.

This is important because we can use it to test each autocorrelation (for each lagged variable) and see whether it is statistically different from 0.

If the lag-l autocorrelation coefficient is extreme, we reject the null hypothesis, which is equivalent to saying there is some structure between the lag-l variable and the current variable.

This is done with a regular t-test.

Furthermore, once we have established that r_t and r_{t-l} are correlated, this is evidence that our model should include a term using information from time t-l.

17
Q

Elaborate on the crucial part of the sample lag-l autocorrelation function

A

If the time series is iid with a finite second moment, then the sample lag-l autocorrelation is asymptotically normal with mean 0 and variance 1/T.

This is crucial because it is THE foundation for testing the null hypothesis p_l = 0, so that we can figure out whether autocorrelation is present.

Again, this card is a bit odd: the above is essentially a test for a white noise series.

If we want to test the statistical significance of the ACF of a general weakly stationary time series, we use Bartlett's formula for the variance. Under the condition that the series is weakly stationary (and linear), the sample ACFs are asymptotically normally distributed with mean 0 and variance given by Bartlett's formula.

18
Q

give the t-ratio test for sample lag-l autocorrelation

A

The test statistic is the t-ratio:

t-ratio = ^p_l / SE(^p_l) = ^p_l / (1/sqrt(T)) = sqrt(T) * ^p_l

This test is essentially checking the white noise conditions.
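
A minimal sketch of this test in Python (the simulated series is just a stand-in for real data):

```python
import numpy as np
from statsmodels.tsa.stattools import acf

np.random.seed(0)
r = np.random.normal(size=500)             # stand-in series; replace with actual returns

T = len(r)
rho_hat = acf(r, nlags=10, fft=False)[1:]  # sample ACF for lags 1..10 (drop lag 0)
t_ratios = np.sqrt(T) * rho_hat            # t-ratio under H0: p_l = 0, with SE = 1/sqrt(T)

# Two-sided test at the 5% level: reject H0 when |t| > 1.96
for lag, t in enumerate(t_ratios, start=1):
    print(lag, round(t, 2), abs(t) > 1.96)
```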

19
Q

elaborate on Bartlett

A

Bartlett's result says that if r_t is a weakly stationary series satisfying the linearity requirement, where a_t is a white noise series, then the sample lag-l autocorrelation is asymptotically normal with variance given by Bartlett's formula.

The upshot: if the time series is linear and weakly stationary, we use Bartlett's variance.

20
Q

How can we test for individual lag-l autocorrelations for statistical significance?

A

We use the t-ratio with Bartlett's formula for the standard error.

This is a two-sided test.

21
Q

important thing to remember regarding the sample ACF

A

It is biased in finite samples, with bias of order 1/T. This can be sizable in small samples, but is not an issue in larger ones.

22
Q

“disadvantage” with using t-ratio for testing ACF

A

It only tests one lag at a time.

23
Q

how can we speed up the testing process

A

We test several lags jointly, using a portmanteau test.

There is the traditional Q* statistic, and the Ljung-Box statistic.

24
Q

elaborate on Q* statistic

A

A portmanteau test used to test multiple lags for autocorrelation jointly.

Q*(m) = T ∑ ^p_l^2, summing over l = 1, ..., m

We square the sample autocorrelations for lags 1 through m, sum them, and multiply by the sample size T. We multiply by T because each ^p_l has asymptotic variance 1/T under the null, so each T ^p_l^2 is approximately chi-squared with one degree of freedom; the statistic is asymptotically chi-squared with m degrees of freedom, where m is the number of lags included.

It is a one-sided test: reject if the value is very large.

The null hypothesis is that the autocorrelations at all m lags are 0.

25
Q

elaborate on the extension of Q*

A

The extension is the Ljung-Box statistic. It modifies Q* to increase the power of the test. Recall that the power of a test is the probability that it correctly rejects the null hypothesis when it is false. It is equivalent in spirit to the regular Q* statistic, but uses different correction terms.
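
A minimal sketch using statsmodels' implementation of the Ljung-Box test (the simulated series is illustrative):

```python
import numpy as np
from statsmodels.stats.diagnostic import acorr_ljungbox

np.random.seed(1)
r = np.random.normal(size=500)     # stand-in series; under H0 it is white noise

# Ljung-Box test of H0: p_1 = ... = p_m = 0, here for m = 10 lags.
# Recent statsmodels versions return a DataFrame with lb_stat and lb_pvalue.
print(acorr_ljungbox(r, lags=[10]))
```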
26
Q

why do we care about the ACF?

A

Linear time series models can be characterized by their ACF, so the ACF is used as the foundation for capturing the dynamic relationships in the data.

27
Q

Define white noise

A

A time series is called white noise if it is iid with finite mean and finite variance. This entails constant mean and variance. A defining characteristic of white noise, especially in this context, is that all of its lag-l autocorrelations are 0.

28
Q

define a linear time series. Elaborate on what it entails to have a linear time series

A

A time series is said to be linear if it can be written as:

r_t = mu + ∑ w_i a_{t-i} [i = 0, infinity]

where mu is the mean and {a_t} is a white noise series.

What does this mean? It means that a linear time series is centered around a mean level mu, plus a weighted sum of shocks reaching infinitely far back. The mean of the time series is obviously mu, which leaves the question of the variance:

Var(r_t) = Var(mu) + Var(∑ w_i a_{t-i})
Var(r_t) = 0 + ∑ w_i^2 sigma^2
Var(r_t) = sigma^2 ∑ w_i^2

So the variance of the time series depends very much on both the variance of the residuals/errors/shocks/innovations and their corresponding weights.

Note that this opens up a whole lot of relationships. The only requirements are a constant mean, which entails no long-term trend or drift, and a constant variance. The autocorrelation structure, however, can take all kinds of shapes and dependencies.

29
Q

elaborate on the mechanics of a time series

A

From the formula of a linear time series, we know there is a mean level. The deterministic part of the time series always tries to revert back to the mean, regardless of whether the current value is above or below it.

For instance, consider the AR(1) model:

r_t = ø r_{t-1} + a_t

ø must be smaller than 1 in absolute value. Take ø = 0.9: large values follow large values, small values follow small values, so there is a clear autocorrelation relationship, but because of the factor of 0.9 the series diminishes toward the mean (0 in this case).

BUT: the white noise error series provides the fluctuations. It is because of the errors that it is difficult to spot the correlation structure with the naked eye; yet, precisely because the noise is white, a strong pattern should still show up if it actually exists.

Autocorrelation can take many shapes: large values following large values, small values following large values, etc.
30
Q

elaborate on AR

A

AutoRegressive models. They use a constant plus past returns (with weights) and, finally, an error term. Of course, when forecasting we do not include the error term; it is like a regular residual.

31
Q

elaborate on conditional mean and conditional variance of the AR(1) model

A

E[r_t | r_{t-1}] = ø_0 + ø_1 r_{t-1}

Notice that the a_t term disappears, because its expected value is 0.

For the variance, it is tempting to write:

Var[r_t | r_{t-1}] = ø_1^2 Var(r_{t-1}) + Var(a_t)

but this contains an error: because r_{t-1} is given, it is a constant and therefore has variance 0. We are left with:

Var[r_t | r_{t-1}] = Var(a_t) = sigma_a^2

Thus, the conditional variance of the AR(1) model is equal to the variance of the error term. This is important: it tells us that when we use the model for forecasting, all the uncertainty lies in the error term for period t.

32
Q

when discussing properties of the AR model, what is our first requirement?

A

We need to establish the conditions for (weak) stationarity.
33
Q

elaborate on the stationarity conditions for an AR(1) model. Elaborate on the results

A

We need a constant mean, constant variance and constant lag-l autocovariance. Note that we are now working with unconditional moments.

E[r_t] = E[ø0 + ø1 r_{t-1} + a_t] = ø0 + ø1 E[r_{t-1}] + 0

Now we use the property that the mean MUST be constant. This is where we establish the condition for a constant mean in the AR(1) model:

mu = ø0 + ø1 mu
mu (1 - ø1) = ø0
mu = ø0 / (1 - ø1)

Here we have the requirement: ø1 cannot be 1. Also, the only way for the series to have mean 0 is if ø0 is 0.

On to the variance:

Var(r_t) = ø1^2 Var(r_{t-1}) + sigma_a^2
Var(r_t) = ø1^2 Var(r_t) + sigma_a^2
Var(r_t) (1 - ø1^2) = sigma_a^2
Var(r_t) = sigma_a^2 / (1 - ø1^2)

The variance of the time series is equal to the variance of the error term divided by 1 - ø1^2. For this to be defined, we require 1 - ø1^2 != 0. Solving ø1^2 = 1 gives ø1 = +-1: if ø1 is either 1 or -1, the AR(1) series is not stationary.

Another crucial part: we cannot, by definition, have negative variance, so the denominator must also be positive (the numerator, being a squared quantity, is always positive). From 1 - ø1^2 > 0 we see that ø1 must be smaller than 1 in absolute value; otherwise we get an impossible result.
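
A small simulation (parameter values are arbitrary, chosen only for illustration) that checks the unconditional mean and variance formulas numerically:

```python
import numpy as np

np.random.seed(42)
phi0, phi1, sigma_a = 0.5, 0.8, 1.0      # arbitrary values with |phi1| < 1
T = 100_000

a = np.random.normal(scale=sigma_a, size=T)
r = np.zeros(T)
for t in range(1, T):
    r[t] = phi0 + phi1 * r[t - 1] + a[t]

print(r.mean(), phi0 / (1 - phi1))             # both close to mu = 2.5
print(r.var(), sigma_a**2 / (1 - phi1**2))     # both close to about 2.78
```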
34
Q

elaborate on finding the ACF of AR(1)

A

We take expectations to get the covariance, E[(a_t - mu_a)(r_t - mu)], and use a prior result. The outcome is:

p_l = ø1 p_{l-1}

We have already established that ø1 must be smaller than 1 in absolute value; as a result, the ACF decays.

35
Q

decaying ACF

A

Associated with AR models.

36
Q

what is the trick we use to derive the ACF etc. from AR models

A

We find the mean equation, then take the expression for ø0 and insert it back into the regular AR expression. For instance:

mu = ø0 / (1 - ø1)
ø0 = mu (1 - ø1)

We insert this into the AR equation:

r_t = mu (1 - ø1) + ø1 r_{t-1} + a_t
r_t = mu - mu ø1 + ø1 r_{t-1} + a_t
r_t - mu = ø1 (r_{t-1} - mu) + a_t

Now we can take expectations to find the covariance with the shock:

E[(a_t - mu_a)(r_t - mu)] = E[a_t (r_t - mu)]
= E[a_t (ø1 (r_{t-1} - mu) + a_t)]
= ø1 E[a_t (r_{t-1} - mu)] + E[a_t^2]
= 0 + sigma_a^2
= sigma_a^2

(The first term is 0 because a_t is uncorrelated with anything dated t-1.) We then use this as the expression for E[a_t (r_t - mu)] and continue from there.

37
Q

why is E[a_t(r_t - mu)] = sigma_a^2?

A

Because r_t depends directly on the shock of the same period: a_t enters r_t with weight 1, so the covariance between a_t and r_t equals Var(a_t) = sigma_a^2.
38
Q

how to find the moment equation for AR

A

We need to compute E[(r_t - mu)(r_{t-l} - mu)]. After some manipulation we arrive at the moment equation; dividing by gamma_0 gives the autocorrelation form. For AR(2) it looks like this:

p_l = ø1 p_{l-1} + ø2 p_{l-2}

The result is very important. We can write:

p_l - ø1 p_{l-1} - ø2 p_{l-2} = 0
(1 - ø1 B - ø2 B^2) p_l = 0

We solve this second-degree equation. The INVERSES of the roots are called the characteristic roots. To satisfy stationarity, the characteristic roots (the inverses of the solutions to the equation) must be smaller than 1 in absolute value.
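
A sketch of this root check for an AR(2) model (the coefficients are made up for illustration):

```python
import numpy as np

phi1, phi2 = 1.2, -0.35     # illustrative AR(2) coefficients

# Solve 1 - phi1*B - phi2*B^2 = 0. np.roots expects coefficients ordered from
# the highest power down, so the polynomial is (-phi2)*B^2 + (-phi1)*B + 1.
B_solutions = np.roots([-phi2, -phi1, 1.0])

char_roots = 1.0 / B_solutions   # characteristic roots are the inverses of the solutions
print(np.abs(char_roots))        # stationarity requires all of these to be < 1
```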
39
Q

elaborate on order determination of AR(p)

A

We know that the ACF of an AR model decays, so it can be difficult to find a cutoff point where we are happy. Therefore we use the PACF. One can also use information criteria.
40
Q

elaborate on PACF and relate it to order determination of AR(p) models

A

The PACF measures the direct influence of an earlier lag on the current variable once the propagating effect of the intermediate lags has been accounted for. We are essentially asking how Monday specifically affects Friday when the intermediate days have already been accounted for.

We can find it by fitting AR(p) models from p = 1 and upward; the "newest" coefficient we include at each step holds the partial autocorrelation for that lag. This is typically done using OLS.

The PACF of an AR(p) process cuts off at lag p: p is the last lag with a significant value.
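
A sketch of PACF-based order identification, using a simulated AR(2) series (all parameters are illustrative):

```python
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.tsa.stattools import pacf

np.random.seed(0)
# Simulate an AR(2) process. ArmaProcess takes lag-polynomial coefficients,
# so the AR coefficients enter with flipped signs: 1 - 0.6B - 0.2B^2.
proc = ArmaProcess(ar=np.array([1.0, -0.6, -0.2]), ma=np.array([1.0]))
r = proc.generate_sample(nsample=1000)

print(pacf(r, nlags=10).round(2))   # clearly non-zero at lags 1-2, then cuts off
```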
41
Q

elaborate on information criteria

A

Likelihood-based measures that we can use to determine the order. We have AIC and BIC; they differ in how they penalize the number of parameters included. All of them are likelihood-based and essentially require the MLE, via the log-likelihood ln(likelihood).

AIC = -(2/T) ln(likelihood) + (2/T) x number of parameters

For a Gaussian AR(p) model, the log-likelihood term reduces to the MLE estimate of the error variance, and we get:

AIC = ln(^sigma_a^2) + (2/T) p
BIC = ln(^sigma_a^2) + (p/T) ln(T)

BIC penalizes extra parameters harder than AIC.
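
A sketch of order selection by information criteria, fitting AR(p) for a range of p on simulated data (the data, the lag range, and the exact criterion scaling are illustrative; the point is to pick the order with the lowest value):

```python
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

np.random.seed(0)
# Stand-in data: an AR(2) series generated by direct recursion
r = np.zeros(1000)
eps = np.random.normal(size=1000)
for t in range(2, 1000):
    r[t] = 0.6 * r[t - 1] + 0.2 * r[t - 2] + eps[t]

for p in range(1, 6):
    fit = AutoReg(r, lags=p).fit()
    print(p, round(fit.aic, 1), round(fit.bic, 1))   # choose the p minimizing AIC/BIC
```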
42
Q

what do we need to do after we have fitted some AR(p) model

A

We need to check whether the residual series behaves like white noise. We can use the Ljung-Box test for this. Recall how this is done: we compute the sample ACF of the residual series for, say, k lags, and then enter these into the Ljung-Box sum.
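
A sketch of this residual check (an AR(1) fit on simulated data; the model order and the data are illustrative):

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.stats.diagnostic import acorr_ljungbox

np.random.seed(0)
# Simulated AR(1) data as a stand-in for real returns
r = np.zeros(500)
eps = np.random.normal(size=500)
for t in range(1, 500):
    r[t] = 0.7 * r[t - 1] + eps[t]

fit = ARIMA(r, order=(1, 0, 0)).fit()

# If the model is adequate, the residuals should look like white noise:
# large Ljung-Box p-values mean we cannot reject "no remaining autocorrelation".
print(acorr_ljungbox(fit.resid, lags=[10]))
```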
43
Q

elaborate on measuring the fit of the model

A

We can use the goodness of fit, R^2, still defined as:

R^2 = 1 - RSS/TSS

RSS is still the residual sum of squares. TSS is still the sum of squared deviations of each point from the mean. Literally the same as with the CLRM.

44
Q

something to be aware of regarding R^2 on time series

A

If the time series is not stationary, R^2 can converge to 1 even when the model is poor.
45
Q

elaborate on forecasting

A

We consider an origin, which is our current time step, and a step length called the horizon. We choose the forecast that minimizes the expected squared error, which means we literally just use the model's conditional expectation.

The interesting part is what happens when we forecast far into the future. First, the forecast variance grows with the horizon, which makes sense. The AR(p) forecast converges to the unconditional mean E[r_t] as the horizon increases, and the forecast variance approaches the unconditional variance.
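
A sketch of this mean reversion of multi-step forecasts (simulated AR(1) data with arbitrary parameters):

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

np.random.seed(0)
r = np.zeros(500)
eps = np.random.normal(size=500)
for t in range(1, 500):
    r[t] = 1.0 + 0.8 * r[t - 1] + eps[t]   # unconditional mean = 1 / (1 - 0.8) = 5

fit = ARIMA(r, order=(1, 0, 0)).fit()
fc = fit.get_forecast(steps=30)

print(fc.predicted_mean.round(2))   # drifts toward the unconditional mean (about 5)
print(fc.se_mean.round(2))          # forecast std error grows with the horizon, then levels off
```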
46
Q

elaborate on the mechanics of MA time series

A

First, an MA(1) is of the form:

r_t = mu + a_t - ø1 a_{t-1}

A very interesting model. It models the time series as a constant level mu plus two additional terms:
1) the current error: unpredictable, with variance sigma_a^2
2) the response term to the past error

The response term is interesting. It tells us: "we have observed the latest error as a_{t-1}, and we have the following reaction to it ...". This means we react to both the magnitude and the sign of that error term. If the time series "wants to keep values close to each other", we could have ø1 = -1: with such a model, if the innovation was a_x, the model will try to place the next value at that same level. However, the current error will still cause fluctuation. So the fluctuation in MA models is also due only to the current error.
47
Q

elaborate on the properties of MA models

A

They are weakly stationary by construction. The ACF cuts off HARD at the lag equal to the order of the model, meaning all later lags are zero. The PACF, in contrast, decays.
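
A sketch illustrating the hard ACF cutoff, for a simulated MA(2) process (illustrative parameters):

```python
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.tsa.stattools import acf

np.random.seed(0)
# MA(2): r_t = a_t + 0.6 a_{t-1} + 0.3 a_{t-2} (lag-polynomial form [1, 0.6, 0.3])
proc = ArmaProcess(ar=np.array([1.0]), ma=np.array([1.0, 0.6, 0.3]))
r = proc.generate_sample(nsample=2000)

print(acf(r, nlags=6).round(2))   # non-zero at lags 1-2, roughly 0 from lag 3 onward
```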
48
Q

how do we determine MA order?

A

From the ACF: the order is the lag at which the ACF cuts off.

49
Q

elaborate on the fundamental differences between AR and MA

A

AR models remember where they have previously been. They care about their exact previous levels, and based on these previous levels they decide where the appropriate next point should be.

MA models do not remember where they have been, but they remember the deviations: not necessarily deviations from the mean level, but deviations from the expected state of zero shock. An MA model remembers the shock(s) and reacts to them; for instance, if the shock was large, one possible reaction is to try to maintain a large deviation.

An AR model will always try to converge to its unconditional mean; given no shocks, its dynamics drag it toward that mean. MA models do not do this. Instead, they are based on a mean level and react to shocks that are 0 on average. This ensures their stationarity.
50
Q

elaborate on forecasting MA models

A

Because of their finite memory, their forecasts revert to the mean level very quickly. The 2-step-ahead forecast of an MA(1) model is simply the unconditional mean.

51
Q

Strength of ARIMA?

A

It is about the class of models, which opens up for more flexibility. For instance, we can use ARIMAX, which includes exogenous variables (they must be time-indexed). We could add the "beta" of the stock to the prediction model if we found it useful, though it might not vary much.

52
Q

elaborate on unit root nonstationary time series

A

A series with exactly a unit root. Price series are the classic example. Why? Because they are assumed to follow a geometric Brownian motion, which is essentially a random walk. A random walk can be modeled using an AR model:

r_t = ø0 + ø1 r_{t-1} + a_t
r_t = 0 + 1 * r_{t-1} + a_t
r_t = r_{t-1} + a_t

53
Q

how to model a random walk with drift

A

r_t = mu + r_{t-1} + a_t

54
Q

what happens if the characteristic roots are greater than 1 in absolute value

A

Explosive (exponential) growth.
55
Q

define ARIMA(p,1,q)

A

A process is ARIMA(p,1,q) if the change series c_t = y_t - y_{t-1} follows a stationary and invertible ARMA(p,q) process. The procedure of converting a unit-root nonstationary series into a stationary one is called differencing.
56
Q

elaborate on the well-known unit root testing problem

A

The DF test: Dickey-Fuller. This is a test on an AR(1) process to see whether it contains a unit root or not. If the AR process contains a unit root, it follows a random walk. The statistic is a t-ratio for the hypothesis that the true ø1 is 1:

t-ratio = (^ø1 - 1) / std(^ø1)
57
Q

elaborate on ADF

A

(Remains to be completed.) The ADF test augments the DF regression with lagged difference terms to remove autocorrelation from the variable. This allows the framework to be extended to AR(p) models.
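
A sketch of the ADF test in statsmodels, run on a simulated random walk (which should fail to reject the unit root) and on its first difference (which should reject):

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

np.random.seed(0)
price = np.cumsum(np.random.normal(size=1000))   # random walk: has a unit root

stat, pvalue, usedlag, nobs, crit, icbest = adfuller(price)
print(stat, pvalue)          # large p-value: cannot reject H0 of a unit root

stat_d, pvalue_d, *_ = adfuller(np.diff(price))
print(stat_d, pvalue_d)      # small p-value: unit root rejected after differencing
```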
58