by Jonathan Widarsa

Series Have Friends Too

·

In my previous post, we delved quite deep into time series models like AR, MA, ARMA, and ARIMA. Essentially, by capturing different aspects of a series’ memory, these models usually effectively extract autocorrelation out of data into their structural parameters.

I actually have to apologize—to simplify definitions, I intentionally omitted an important label: univariate. These aforementioned models are univariate time series models, meaning they excel at tasks where all relevant information is contained within a single series.

Thing is, real-world scenarios are rarely so simple. Because univariate models only capture relationships within a single series, they completely ignore the effects of other interconnected series. This is why we have multivariate models. Among them, the simplest (yet remarkably effective) is the Vector Autoregressive (VAR) model.

***

First, let’s define the zero-mean AR(1) model for a scalar process yty_t:

yt=ϕyt1+εty_t = \phi y_{t-1} + \varepsilon_t

where εt𝒩(0,σ2)\varepsilon_t \sim \mathcal{N}(0,\sigma^2). Like we’ve discussed, powerful as it may be, this model completely ignores the influence of other models through its history.

So what if we have two stationary, zero-mean time series y1ty_{1t} and y2ty_{2t}? This is where the VAR model comes in. If we believe that each variable (e.g., GDP growth and inflation) is additionally explained by the other’s past, we can construct it as a VAR process.

Let’s model both variables as simple VAR(1) processes:

y1t=ϕ11y1,t1+ϕ12y2,t1+ε1t,y_{1t} = \phi_{11} \, y_{1,t-1} + \phi_{12} \, y_{2,t-1} + \varepsilon_{1t},
y2t=ϕ21y1,t1+ϕ22y2,t1+ε2t.y_{2t} = \phi_{21} \, y_{1,t-1} + \phi_{22} \, y_{2,t-1} + \varepsilon_{2t}.

Now, we see that each equation looks like a regression of today’s value on all variables’ values from the previous step. Additionally, although ε1t\varepsilon_{1t} and ε2t\varepsilon_{2t} are white noise, they’re contemporaneously correlated (i.e., not independent within the same period) with each other. This means that a surprise shock to y1ty_{1t} and y2ty_{2t} can arrive together at the same time step.

I hope the way the equations are presented above make it easy to tell that the system can be written more compactly in matrix form. We can define the (2×1)(2\times1) vector 𝐲t=(y1t,y2t)\mathbf{y}_t = \left(y_{1t},y_{2t}\right)^{\intercal} and the (2×2)(2\times2) coefficient matrix

𝚽=(ϕ11ϕ12ϕ21ϕ22)\mathbf{\Phi} = \begin{pmatrix} \phi_{11} &\quad \phi_{12} \\ \phi_{21} &\quad \phi_{22} \end{pmatrix}

such that the VAR(1) is simply

𝐲t=𝚽𝐲t1+𝛆t\mathbf{y}_t = \mathbf{\Phi}\,\mathbf{y}_{t-1} + \mathbf{\varepsilon}_t

where 𝛆t𝒩(𝟎,𝚺)\mathbf{\varepsilon}_t \sim \mathcal{N}(\mathbf{0},\mathbf{\Sigma}) is a vector of white noise with covariance matrix 𝚺\mathbf{\Sigma}. Note that the aforementioned contemporaneous correlation between shocks is captured by the off-diagonal elements of 𝚺\mathbf{\Sigma}. In this case, the stationarity condition now requires that all eigenvalues of 𝚽\mathbf{\Phi} lie inside the unit circle (this is the multivariate analogue of |ϕ|<1|\phi|<1).

So far, we’ve only defined the VAR(1) model. Before generalizing to higher orders, it’s worth being precise about what each coefficient means. For y1ty_{1t},

  • ϕ11\phi_{11} is the effect of y1,t1y_{1,t-1} on y1ty_{1t}, also known as the own-lag effect which is analogous to the AR coefficient.
  • ϕ12\phi_{12} is the effect of y2,t1y_{2,t-1} on y1ty_{1t}, also known as the cross-lag effect which captures how y2ty_{2t}‘s past feeds into y1ty_{1t}.

The case for y2ty_{2t} is no different. It’s worth noting that these coefficients are estimated by OLS equation by equation.

With this, we can finally define the general VAR(pp) model as follows:

𝐲t=i=1p𝚽i𝐲ti+𝛆t\mathbf{y}_t = \sum_{i=1}^p \mathbf{\Phi}_i \, \mathbf{y}_{t-i} + \mathbf{\varepsilon}_t

where each 𝚽i\mathbf{\Phi}_i is a (K×K)(K \times K) matrix of coefficients for lag ii. Now, for KK variables and pp lags, each equation then contains KpK\cdot p regressors and so the full system contains K2pK^2\cdot p coefficients in total. As we can see, this number explodes quadratically with more variables and lags, making parameter proliferation a real concern. Therefore, we acknowledge that the choice of pp is consequential.

Fortunately, we have several methods to estimate the optimal pp for VAR models. The standard approach we’ll discuss estimates the VAR for a range of candidate lag lengths p{1,2,...,pmax}p \in \{1,2,…,p_{\text{max}}\} and selects the one that minimizes some information criterion. Two popular criteria are:

AIC(p)=ln|𝚺^p|+2TK2p\text{AIC}(p) = \ln |\hat{\mathbf{\Sigma}}_p| + \frac{2}{T} \cdot K^2p
BIC(p)=ln|𝚺^p|+lnTTK2p\text{BIC}(p) = \ln |\hat{\mathbf{\Sigma}}_p| + \frac{\ln T}{T} \cdot K^2p

where:

  • |𝚺^p||\hat{\mathbf{\Sigma}}_p| is the determinant of the estimated residual covariance matrix
  • TT is the sample size
  • K2pK^2p is the number of estimated slope coefficients

As we can see, both criteria reward better fit (i.e., smaller |𝚺^p||\hat{\mathbf{\Sigma}}_p|) and penalize complexity (i.e., larger K2pK^2p). In practice, we typically report both and check whether they agree.

Once we find an optimal pp and have a fitted VAR, the next question is: does y2ty_{2t} actually help predict y1ty_{1t} beyond what y1ty_{1t}‘s history already tells us? Enter the Granger causality test. We say that y2ty_{2t} Granger-causes y1ty_{1t} if y2ty_{2t}‘s lagged values have statistically significant predictive powers for y1ty_{1t} after conditioning on y1ty_{1t}‘s lags.

Let’s consider a VAR(pp) process with K=2K=2 variables. When we test whether y2ty_{2t} Granger-causes y1ty_{1t}, we’re essentially testing the joint hypothesis:

H0:ϕ12(1)=ϕ12(2)=...=ϕ12(p)=0H_0: \phi_{12}^{(1)} = \phi_{12}^{(2)} = … = \phi_{12}^{(p)} = 0

which explains that all cross-lag coefficients from y2ty_{2t} in the y1ty_{1t} equation are jointly zero. For K>2K>2 variables, we test the joint significance of all lags of a given variable in a given equation. Note that Granger causality isn’t necessarily symmetric, meaning y2ty_{2t} may Granger-cause y1ty_{1t} but not the other way around, or both, or neither. Either way, all these cases are testable.

Granger causality is a nice tool because it tells us whether variables are predictively linked, but we’re still missing on how. For instance, even though we know changes in GDP growth affects inflation, we don’t know how inflation will exactly respond when, say, GDP growth receives an unexpected shock.

The solution to this is the Impulse Response Function (IRF). Basically, the idea is that we trace the dynamic response of each variable in the system to a one-unit shock in one variable, holding all other shocks to zero (by conditioning on them), across hh periods.

As it turns out, any stable VAR(pp) can be rewritten as a Vector Moving Average (VMA(\infty)) representation as follows:

𝐲t=h=0𝚿h𝛆th,\mathbf{y}_t = \sum_{h=0}^{\infty} \mathbf{\Psi}_h \,\mathbf{\varepsilon}_{t-h},

where the matrices 𝚿h\mathbf{\Psi}_h are the impulse response matrices at horizon hh. Within each matrix, the (i,j)(i,j)-th element gives the response of variable ii at time t+ht+h to a unit shock in variable jj at time tt. Plotting this element as a function of hh is what gives the impulse response function for that variable pair.

However, recall that the errors 𝛆t\mathbf{\varepsilon}_t are contemporaneously correlated (i.e., 𝚺\mathbf{\Sigma} is not diagonal), meaning we need to first orthogonalize the shocks. The most common approach to this is the Cholesky decomposition of 𝚺\mathbf{\Sigma}, which imposes a recursive causal ordering on the variables to achieve orthogonality; the variable listed first affects all others contemporaneously while that listed last is affected by all contemporaneously but affects none of them in the same period.

As a side note, since we’re only estimating true values, in practice, IRFs are typically reported with confidence bands (constructed by bootstrap methods) because the coefficient uncertainty in the VAR propagates non-linearly through the VMA representation.

Given we already have the VMA representation, we might as well also apply Forecast Error Variance Decomposition (FVED). Otherwise called variance decomposition, FVED aims to answer a question related to what IRF answers: of the total unpredictability in variable y1ty_{1t} over a hh-step-ahead horizon, what fraction is attributable to shocks in y1ty_{1t} itself, and what fraction comes from shocks in the other variables?

The fortunate thing is that since the orthogonalized shocks in the VMA representation are uncorrelated by construction, their variance contributions simply add up, and each can be expressed as a percentage of the total forecast error variance.

Without going too deep into its inner workings, we introduce several key properties that make FEVD interpretable:

  • At horizon h=0h=0 (i.e., contemporaneous), each variable explains 100% of its own variance by construction.
  • As hh increases, other variables’ shocks can account for a growing share of the variance.
  • The shares across all variables always sum to 100% for any given variable and horizon.

For instance, if inflation’s FEVD reveals that after 8 quarters, 40% of of its forecast error variance is due to shocks in GDP growth, then that is a quantitative statement about how GDP growth affects inflation’s medium-term uncertainty. This is the kind of decomposition that is often reported in central bank research and academic macroeconomics.

Given what we’ve discussed about how a typical VAR analysis works, we can see that, first, we need to verify that all series are stationary (or cointegrated—a topic for another day). Second, we select an optimal lag length using AIC/BIC. Third, we estimate the VAR coefficients using OLS. Fourth, we run the Granger causality tests to identify which predictive links are statistically meaningful. Fifth, we compute IRFs to understand the dynamic shape of said links. Finally, we use FEVD to quantify what share of each variable’s forecast uncertainty is driven by which shocks.

This has been one of the most fun topics I’ve dived deep into and I hope you’ve gained a deeper insight on how to properly run a full VAR analysis moving forward.


Comments

Leave a Reply

Your email address will not be published. Required fields are marked *


More Posts