by Jonathan Widarsa

A Drunk and Her Dog

This is a story of cointegration: of common misconceptions about the relationship between multiple time series, and of how cointegration brings a new perspective to them. Most treatments of cointegration I’ve encountered come with in-depth technical details and derivations that often make the topic more challenging than it needs to be, so I’d like to present it with a more storytelling approach that, fingers crossed, still captures (with some rigor) the main ideas behind the topic.

***

Let’s start by defining two completely independent random walks, generated as ARIMA(0,1,0) processes as

$$y_t = y_{t-1} + \varepsilon_t, \qquad x_t = x_{t-1} + \nu_t,$$

where $\varepsilon_t \sim \text{i.i.d.}(0, \sigma_\varepsilon^2)$ and $\nu_t \sim \text{i.i.d.}(0, \sigma_\nu^2)$ are independent of each other. Now, suppose we regress $y_t$ on $x_t$:

$$y_t = \alpha + \beta x_t + u_t.$$

Given the two processes are independent, we might expect the regression to produce an insignificant $\hat{\beta}$, a low $R^2$, and well-behaved residuals. However, Granger and Newbold (1974) showed that this regression actually routinely produces high $R^2$ values (sometimes even above 0.9) and a $t$-statistic on $\hat{\beta}$ that is wildly significant.

How’s this possible? Since both $y_t$ and $x_t$ are trending (in the stochastic sense), the OLS estimator picks up on the fact that they both wander, and spuriously interprets their shared tendency to wander as a relationship. This is the spurious regression problem: two entirely unrelated non-stationary series can appear statistically related, simply because they share the property of being integrated (that is, each needs to be differenced at least once to become stationary).
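To see the problem in action, here is a minimal simulation sketch. It assumes standard normal shocks, a sample size of 200, and 500 replications (all arbitrary choices of mine, as is the helper name `ols_slope_stats`), and it counts how often a nominal 5% $t$-test on the slope rejects even though the two walks are independent by construction.

```python
import numpy as np

def ols_slope_stats(y, x):
    """OLS of y on a constant and x; returns (slope, slope t-statistic, R^2)."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return coef[1], coef[1] / np.sqrt(cov[1, 1]), r2

rng = np.random.default_rng(0)  # arbitrary seed
T, n_sims = 200, 500            # illustrative sizes, not from the article

t_stats, r2s = [], []
for _ in range(n_sims):
    y = np.cumsum(rng.standard_normal(T))  # random walk 1
    x = np.cumsum(rng.standard_normal(T))  # independent random walk 2
    _, t_stat, r2 = ols_slope_stats(y, x)
    t_stats.append(t_stat)
    r2s.append(r2)

# Share of replications where |t| exceeds the usual 5% cutoff
rejection_rate_levels = np.mean(np.abs(t_stats) > 1.96)
mean_r2_levels = np.mean(r2s)
print(f"rejection rate: {rejection_rate_levels:.2f}, mean R^2: {mean_r2_levels:.2f}")
```

If the usual inference were valid, the rejection rate would sit near 0.05; in this setup it should come out far higher.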

Knowing ARIMA, our first instinct may be to difference both series and work with $\Delta y_t$ and $\Delta x_t$, which are stationary. Although this does solve the spurious regression issue (differenced random walks are white noise), we’re bearing the heavy cost of discarding all information about the levels (raw data) of the series. If $y_t$ and $x_t$ are genuinely related (say, they represent $\log(\text{consumption})$ and $\log(\text{income})$), then their levels contain economically meaningful information, which is the long-run equilibrium relationship that differencing destroys.
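As a rough check that differencing really does cure the spurious significance, the sketch below regresses the first differences of two independent random walks on each other. The sample size, replication count and helper name `slope_t_stat` are my own illustrative assumptions.

```python
import numpy as np

def slope_t_stat(y, x):
    """t-statistic on the slope from an OLS of y on a constant and x."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return coef[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(1)  # arbitrary seed
T, n_sims = 200, 500            # illustrative sizes

t_stats = []
for _ in range(n_sims):
    y = np.cumsum(rng.standard_normal(T))
    x = np.cumsum(rng.standard_normal(T))
    # Regress the differences, which are white noise by construction
    t_stats.append(slope_t_stat(np.diff(y), np.diff(x)))

rejection_rate_diffs = np.mean(np.abs(t_stats) > 1.96)
print(f"rejection rate in differences: {rejection_rate_diffs:.2f}")
```

The rejection rate should land close to the nominal 5%, which is the good news; the bad news is that nothing about the levels survives in this regression.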

Cointegration

Enter cointegration, which is precisely the framework that lets us keep the levels and use them properly. Murray (1994) provides an intuitive illustration: imagine a drunk woman and her dog, each wandering erratically on their own random path, but connected by a leash. Each follows a random walk, yet however far apart they drift momentarily, the leash pulls them back, so over time they’re never too far from each other.

In our example, $y_t$ is the drunk, $x_t$ is the dog, and the leash is the cointegrating relationship. Both series are $I(1)$, that is, integrated of order 1 (they require one round of differencing to become stationary), but a particular linear combination of them is stationary:

$$z_t = y_t - \beta x_t \sim I(0).$$

In other words, two series $y_t$ and $x_t$ are said to be cointegrated if both are $I(1)$ and there exists a coefficient $\beta$ such that $z_t = y_t - \beta x_t$ is $I(0)$. Here, the vector $(1, -\beta)$ is called the cointegrating vector and $z_t$ represents the deviation from the long-run equilibrium.
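A small simulation can make the leash visible. The sketch below builds a cointegrated pair under assumptions of my own choosing: the dog follows a random walk, the equilibrium error follows a stationary AR(1) with coefficient 0.5, and the true coefficient is 2.

```python
import numpy as np

rng = np.random.default_rng(2)   # arbitrary seed
T, beta, rho = 2000, 2.0, 0.5    # illustrative values, not from the article

x = np.cumsum(rng.standard_normal(T))  # the dog: an I(1) random walk

# The leash: a stationary AR(1) deviation from equilibrium
shocks = rng.standard_normal(T)
z = np.zeros(T)
for t in range(1, T):
    z[t] = rho * z[t - 1] + shocks[t]

y = beta * x + z                 # the drunk: I(1), but tied to x

spread = y - beta * x            # the cointegrating combination
print(f"std of y:      {np.std(y):.2f}")
print(f"std of spread: {np.std(spread):.2f}")
```

The level series roams widely, while the spread stays within a narrow band around zero no matter how long the walk goes on.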

This property also redeems the regression at the start of our discussion. Essentially, if $y_t$ and $x_t$ are cointegrated, then regressing $y_t$ on $x_t$ is not spurious at all. The relationship is real, and the OLS estimate of $\beta$ is not just consistent but super-consistent, as we’ll see shortly. In fact, the problem was never running the levels regression per se, but assuming a relationship when none exists. Cointegration is basically the condition that makes the levels regression legitimate.

Engle-Granger Approach

Of course, then, the question becomes: how do we determine whether $y_t$ and $x_t$ are cointegrated? The natural starting point for this is the Engle-Granger method, which is a surprisingly simple two-step procedure that tests for cointegrating relationships.

We first run the OLS regression of $y_t$ on $x_t$ (in levels):

$$y_t = \hat{\alpha} + \hat{\beta} x_t + \hat{u}_t.$$

The estimated coefficient $\hat{\beta}$ (as opposed to the true $\beta$) is our estimate of the cointegrating coefficient, and this is where the remarkable property of super-consistency comes in. In standard OLS with stationary variables, $\hat{\beta}$ converges to the true $\beta$ at rate $T^{1/2}$. Interestingly, with cointegrated $I(1)$ variables, convergence happens at the much faster rate $T$ (for reasons beyond the scope of this article). For practical purposes, this means that we can simply treat $\hat{\beta}$ as if it were the true $\beta$ when we move on to the next step.
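Super-consistency can be checked numerically. The sketch below assumes a cointegrated pair with i.i.d. equilibrium errors and a true coefficient of 2 (my own choices, as is the helper name `slope_error`), and compares the average estimation error at two sample sizes that differ by a factor of 16. Under $T^{1/2}$ convergence the error would shrink by about 4; under rate $T$ it should shrink by about 16.

```python
import numpy as np

def slope_error(T, beta, rng):
    """Absolute error of the OLS slope in a levels regression of y on x."""
    x = np.cumsum(rng.standard_normal(T))    # I(1) regressor
    y = beta * x + rng.standard_normal(T)    # cointegrated with x
    X = np.column_stack([np.ones(T), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return abs(coef[1] - beta)

rng = np.random.default_rng(3)   # arbitrary seed
beta, n_sims = 2.0, 400          # illustrative values

err_small = np.mean([slope_error(100, beta, rng) for _ in range(n_sims)])
err_large = np.mean([slope_error(1600, beta, rng) for _ in range(n_sims)])

# Sample size grows 16-fold: sqrt(T) scaling predicts ~4, T scaling ~16
ratio = err_small / err_large
print(f"error ratio: {ratio:.1f}")
```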

Second, we save the residuals

$$\hat{u}_t = y_t - \hat{\alpha} - \hat{\beta} x_t$$

and test whether they are $I(0)$. If they are, then the series are cointegrated; if they are $I(1)$, the original regression was spurious. The natural tool is the ADF test, which we’ve mentioned several times in previous articles as the go-to test for stationarity. In math-speak, we run

$$\Delta \hat{u}_t = \gamma \hat{u}_{t-1} + \sum_{j=1}^{p} \delta_j \Delta \hat{u}_{t-j} + e_t$$

and test the hypotheses:

  • $H_0: \gamma = 0$ (unit root, no cointegration)
  • $H_1: \gamma < 0$ (stationary residuals, cointegration)

where the lags are included to soak up any residual autocorrelation, with the lag length selected by information criteria (AIC/BIC). It’s important to note that we must use the Engle-Granger critical values instead of the standard ADF critical values. Because $\hat{u}_t$ comes from an estimated regression, and OLS picks the coefficients that make the residuals look as stationary as possible, the standard ADF distribution is too lenient. The Engle-Granger critical values are more negative (i.e., stricter) to account for this.
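The two steps can be sketched by hand with NumPy. This is a minimal illustration rather than a production implementation: the lag length is fixed at one instead of being chosen by AIC/BIC, the function names are mine, and the critical value of roughly -3.34 is an approximate asymptotic 5% figure for two variables with a constant in step one, so check published tables before relying on it.

```python
import numpy as np

def ols(y, X):
    """Return OLS coefficients, standard errors and residuals."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return coef, se, resid

def engle_granger_t(y, x, p=1):
    """t-statistic on gamma from the residual-based ADF regression."""
    T = len(y)
    # Step 1: levels regression, keep the residuals
    _, _, u = ols(y, np.column_stack([np.ones(T), x]))
    # Step 2: regress du_t on u_{t-1} and p lagged differences (no constant)
    du = np.diff(u)
    target = du[p:]
    lagged_level = u[p:-1]
    lagged_diffs = [du[p - j:len(du) - j] for j in range(1, p + 1)]
    X = np.column_stack([lagged_level] + lagged_diffs)
    coef, se, _ = ols(target, X)
    return coef[0] / se[0]

EG_CRIT_5PCT = -3.34  # approximate value, stated as an assumption

rng = np.random.default_rng(4)  # arbitrary seed
T = 500
x = np.cumsum(rng.standard_normal(T))
y_coint = 1.0 + 2.0 * x + rng.standard_normal(T)  # tied to x by construction
y_indep = np.cumsum(rng.standard_normal(T))       # unrelated random walk

t_coint = engle_granger_t(y_coint, x)
t_spur = engle_granger_t(y_indep, x)
print(f"cointegrated pair: t = {t_coint:.2f}")
print(f"independent pair:  t = {t_spur:.2f}")
print(f"reject no-cointegration for the first pair: {t_coint < EG_CRIT_5PCT}")
```

The cointegrated pair should produce a $t$-statistic far below the critical value, while the independent pair will usually (though not always, since the test has a 5% size) fail to reject.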

Despite its elegance, Engle-Granger has real weaknesses:

  • It can identify at most one cointegrating vector, so it suits the bivariate case but not systems with several cointegrating relationships
  • It’s asymmetric (regressing $y$ on $x$ isn’t the same as regressing $x$ on $y$)
  • Estimation errors from step one carry forward to step two, compounding imprecision
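The asymmetry is easy to demonstrate. In the sketch below (simulated data with a true coefficient of 2, my own choice), the slope from regressing $y$ on $x$ is compared with the reciprocal of the slope from regressing $x$ on $y$. The product of the two slopes equals the squared sample correlation, so the two normalisations can only agree if the fit is perfect.

```python
import numpy as np

def slope(dep, reg):
    """Slope from an OLS of dep on a constant and reg."""
    X = np.column_stack([np.ones(len(reg)), reg])
    return np.linalg.lstsq(X, dep, rcond=None)[0][1]

rng = np.random.default_rng(5)  # arbitrary seed
T = 300
x = np.cumsum(rng.standard_normal(T))
y = 2.0 * x + rng.standard_normal(T)

b_yx = slope(y, x)              # y normalised on the left-hand side
b_xy = slope(x, y)              # x normalised on the left-hand side
implied_b_yx = 1.0 / b_xy       # what the reverse regression implies

print(f"direct estimate:  {b_yx:.4f}")
print(f"implied estimate: {implied_b_yx:.4f}")
```

With a cointegrated pair the gap is small and shrinks as the sample grows, but in finite samples the choice of left-hand variable does change the estimate and, potentially, the test outcome.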

Johansen Test

The Johansen test resolves all three weaknesses of Engle-Granger. Unlike Engle-Granger, it operates in a multivariate framework, tests for the number of cointegrating relationships simultaneously, and treats all variables symmetrically.

It’s based on the Vector Error Correction Model (VECM), which we’ll examine closely later. For now, it suffices to know that the VECM for an $n$-dimensional system can be written to isolate a matrix $\Pi$ of rank $r$, where $r$ is the number of cointegrating relationships.

Consider a VAR in levels for an $n \times 1$ vector $\mathbf{y}_t$. After rearranging (subtracting $\mathbf{y}_{t-1}$ from both sides and collecting terms), we arrive at

$$\Delta \mathbf{y}_t = \Pi \mathbf{y}_{t-1} + \sum_{j=1}^{p-1} \Gamma_j \Delta \mathbf{y}_{t-j} + \boldsymbol{\varepsilon}_t,$$

where the matrix $\Pi$ is $n \times n$, and three cases for its rank $r$ arise:

  • $r = 0$: $\Pi = \mathbf{0}$, meaning no cointegration (the system is a standard VAR in differences)
  • $0 < r < n$: $\Pi$ has reduced rank and can be factored as $\Pi = \boldsymbol{\alpha} \boldsymbol{\beta}^{\intercal}$, where $\boldsymbol{\beta}$ is $n \times r$ (the cointegrating vectors) and $\boldsymbol{\alpha}$ is $n \times r$ (the adjustment speeds)
  • $r = n$: $\Pi$ has full rank, meaning all variables are $I(0)$ (a standard VAR in levels is appropriate)
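To make the rearrangement and the reduced-rank case concrete, here is a small numerical sketch for a bivariate VAR(2). For that lag order the rearrangement gives $\Pi = A_1 + A_2 - I$ and $\Gamma_1 = -A_2$, where $A_1$ and $A_2$ are the VAR coefficient matrices. All numbers below are made up for illustration.

```python
import numpy as np

# Hypothetical adjustment speeds and cointegrating vector (n = 2, r = 1)
alpha = np.array([[-0.3], [0.1]])
beta = np.array([[1.0], [-1.0]])
Pi = alpha @ beta.T              # n x n matrix with rank r = 1

# Hypothetical second-lag VAR matrix; A1 is then implied by Pi = A1 + A2 - I
A2 = np.array([[0.2, 0.0], [0.1, 0.1]])
A1 = np.eye(2) + Pi - A2
Gamma1 = -A2

# Check that the VAR and VECM forms give the same y_t for arbitrary inputs
rng = np.random.default_rng(6)   # arbitrary seed
y_lag1, y_lag2, eps = rng.standard_normal((3, 2))

var_form = A1 @ y_lag1 + A2 @ y_lag2 + eps
vecm_form = y_lag1 + Pi @ y_lag1 + Gamma1 @ (y_lag1 - y_lag2) + eps

rank_Pi = np.linalg.matrix_rank(Pi)
print("forms agree:", np.allclose(var_form, vecm_form))
print("rank of Pi: ", rank_Pi)
```

The VECM is therefore just a rewriting of the VAR, and the economics sits entirely in the rank of $\Pi$.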

The Johansen procedure’s role here is to estimate $\Pi$ via reduced rank regression (specifically, maximum likelihood using eigenvalue decomposition) and to test hypotheses about $r$. It provides two likelihood ratio tests.

The trace test tests the null hypothesis that the number of cointegrating vectors is at most $r$, against the alternative that it’s greater:

$$\lambda_{\text{trace}}(r) = -T \sum_{i=r+1}^{n} \ln(1 - \hat{\lambda}_i)$$

where $T$ is the sample size and $\hat{\lambda}_i$ are the ordered eigenvalues (from largest to smallest) produced by the reduced rank regression, which can be read as squared canonical correlations between $\Delta \mathbf{y}_t$ and $\mathbf{y}_{t-1}$ once the short-run dynamics are partialled out. We test this sequentially: start with $r = 0$, and if rejected, test $r \leq 1$, and so on until we fail to reject.

The maximum eigenvalue test is more targeted because it tests the null of exactly $r$ cointegrating vectors against the alternative of $r+1$:

$$\lambda_{\max}(r, r+1) = -T \ln(1 - \hat{\lambda}_{r+1}).$$

The intuition here is that if there’s an additional cointegrating vector, the next largest eigenvalue will be significantly different from zero; the test statistic quantifies that. Note that the critical values for both tests are non-standard: they depend on $n - r$ and on the deterministic specification of the model, and are obtained by simulation (in practice, from published tables).
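Both statistics can be computed by hand in the simplest setting. The sketch below assumes a VAR(1) with no deterministic terms, so there are no lagged differences to partial out; the function name `johansen_stats` and the simulated system are my own. It deliberately stops short of quoting critical values, which should come from published tables for the chosen specification.

```python
import numpy as np

def johansen_stats(Y):
    """Eigenvalues, trace and max-eigenvalue statistics for a VAR(1)
    without deterministic terms. Y has shape (T, n)."""
    dY = np.diff(Y, axis=0)          # Delta y_t
    lag = Y[:-1]                     # y_{t-1}
    T_eff = dY.shape[0]
    # Product moment matrices
    S00 = dY.T @ dY / T_eff
    S11 = lag.T @ lag / T_eff
    S01 = dY.T @ lag / T_eff
    # Eigenvalues of S11^{-1} S10 S00^{-1} S01, sorted largest first
    M = np.linalg.solve(S11, S01.T) @ np.linalg.solve(S00, S01)
    eig = np.sort(np.linalg.eigvals(M).real)[::-1]
    n = len(eig)
    trace = np.array([-T_eff * np.sum(np.log(1 - eig[r:])) for r in range(n)])
    max_eig = -T_eff * np.log(1 - eig)
    return eig, trace, max_eig

rng = np.random.default_rng(7)       # arbitrary seed
T = 500
x = np.cumsum(rng.standard_normal(T))
y = x + rng.standard_normal(T)       # one cointegrating relation: y - x
Y = np.column_stack([y, x])

eig, trace, max_eig = johansen_stats(Y)
for r in range(len(eig)):
    print(f"r = {r}: eigenvalue {eig[r]:.3f}, "
          f"trace {trace[r]:.1f}, max-eig {max_eig[r]:.1f}")
```

In this simulated system the first eigenvalue should be large and the second close to zero, which is the pattern one cointegrating relationship produces. Note that the trace statistic for a given $r$ is simply the sum of the maximum eigenvalue statistics from $r$ onward.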

In practice, these two tests may disagree. When this happens, economic theory should inform the final decision.

Vector Error Correction Model (VECM)

Once we’ve established cointegration (via Johansen) and determined $r$, the VECM is the natural home for our system. Given that we’ve discussed VAR in depth before, we can think of it as a VAR that’s been modified to respect the long-run structure of our data.

We define the VECM as

$$\Delta \mathbf{y}_t = \underbrace{\boldsymbol{\alpha} \boldsymbol{\beta}^{\intercal} \mathbf{y}_{t-1}}_{\text{error correction term}} + \sum_{j=1}^{p-1} \underbrace{\Gamma_j \Delta \mathbf{y}_{t-j}}_{\text{short-run dynamics}} + \boldsymbol{\varepsilon}_t.$$

The long-run term $\boldsymbol{\beta}^{\intercal} \mathbf{y}_{t-1}$ is a vector of $r$ linear combinations of the lagged levels (the cointegrating relationships), where each row of $\boldsymbol{\beta}^{\intercal}$ is one cointegrating vector: the coefficients that make $\boldsymbol{\beta}_i^{\intercal} \mathbf{y}_t$ stationary. Essentially, this term represents the disequilibrium in the previous period, i.e., how far the system was from its long-run equilibrium at time $t-1$.

The speed of adjustment matrix $\boldsymbol{\alpha}$ governs how each variable responds to that disequilibrium. An entry $\alpha_{i,j}$ tells you how much variable $i$ moves in the next period in response to disequilibrium in the $j$-th cointegrating relationship. A large negative $\alpha_{i,j}$ means variable $i$ corrects strongly when the system drifts above the equilibrium. If $\alpha_{i,j} \approx 0$, then variable $i$ doesn’t respond to that disequilibrium at all, meaning it’s weakly exogenous with respect to that relationship, which carries important interpretive weight.

The short-run dynamics $\Gamma_j \Delta \mathbf{y}_{t-j}$ are lagged differences, the same as in a VAR in differences. They capture the transitory, period-to-period dynamics that are unrelated to the long-run equilibrium.

With these terms defined, we see that the VECM is therefore a complete decomposition of the system’s behaviour into what’s happening right now (short run) and what’s pulling the system toward where it ought to be (long run). This is precisely why differencing alone is insufficient: it captures the first component but discards the second.
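To see the error correction mechanism at work, the sketch below simulates a bivariate VECM and recovers the adjustment speeds by OLS. It rests on simplifying assumptions of my own: a single lag (so there are no $\Gamma_j$ terms), a known cointegrating vector $(1, -1)$, and made-up adjustment speeds of $-0.3$ and $0.1$.

```python
import numpy as np

rng = np.random.default_rng(8)   # arbitrary seed
T = 2000
alpha = np.array([-0.3, 0.1])    # hypothetical adjustment speeds
beta = np.array([1.0, -1.0])     # hypothetical cointegrating vector

# Simulate Delta y_t = alpha * (beta' y_{t-1}) + eps_t
Y = np.zeros((T, 2))
for t in range(1, T):
    ect = beta @ Y[t - 1]        # last period's disequilibrium
    Y[t] = Y[t - 1] + alpha * ect + rng.standard_normal(2)

# Estimate alpha equation by equation: regress Delta y_t on the lagged ECT
dY = np.diff(Y, axis=0)
ect_lag = Y[:-1] @ beta
alpha_hat = (ect_lag @ dY) / (ect_lag @ ect_lag)

print("true alpha:     ", alpha)
print("estimated alpha:", np.round(alpha_hat, 3))
```

With these signs, a positive gap (the first variable above the second) pushes the first variable down and the second up, so both close the gap from their own side.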

Long-Run Granger Causality

As a closing statement for our discussion on cointegration, I’d like to connect the VECM to a question posed by long-run Granger causality: does variable $x$ Granger-cause variable $y$ in the long run? In a VECM, this is tested by examining whether the error correction term enters the equation for $\Delta y_t$ significantly (i.e., whether $\alpha_y \neq 0$). If it does, it means that $y$ is being systematically pulled back toward equilibrium, which implies that the long-run trajectory of $x$ does have causal influence over $y$. This is distinct from short-run Granger causality, which is tested by checking whether the lagged $\Delta x$ terms are jointly significant in the $\Delta y$ equation.

These distinctions matter enormously for economic interpretation since finding that $x$ Granger-causes $y$ in the short run but not the long run (or vice versa) tells a very different story than a blanket “Granger causality” result.
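The long-run test boils down to a $t$-test on the adjustment coefficient. The sketch below simulates a system in which only $y$ adjusts (so $x$ is weakly exogenous by construction) and computes the $t$-statistics on both adjustment coefficients. It assumes a single lag, a known cointegrating vector, and that the $t$-statistics can be compared with the usual normal cutoffs; the function name `alpha_t_stats` and all parameter values are mine.

```python
import numpy as np

def alpha_t_stats(Y, beta):
    """OLS estimates and t-statistics for the adjustment speeds,
    regressing each Delta y on the lagged error correction term."""
    dY = np.diff(Y, axis=0)
    ect = Y[:-1] @ beta
    sxx = ect @ ect
    a_hat = (ect @ dY) / sxx
    resid = dY - np.outer(ect, a_hat)
    sigma2 = (resid ** 2).sum(axis=0) / (len(ect) - 1)
    return a_hat, a_hat / np.sqrt(sigma2 / sxx)

rng = np.random.default_rng(9)   # arbitrary seed
T = 1000
alpha = np.array([-0.3, 0.0])    # y adjusts, x does not (hypothetical)
beta = np.array([1.0, -1.0])

Y = np.zeros((T, 2))             # columns: y, x
for t in range(1, T):
    Y[t] = Y[t - 1] + alpha * (beta @ Y[t - 1]) + rng.standard_normal(2)

a_hat, t_stats = alpha_t_stats(Y, beta)
print(f"alpha_y = {a_hat[0]:.3f} (t = {t_stats[0]:.1f})")
print(f"alpha_x = {a_hat[1]:.3f} (t = {t_stats[1]:.1f})")
```

A strongly significant $\alpha_y$ alongside an insignificant $\alpha_x$ is the pattern we would read as $x$ driving $y$ in the long run but not the other way around.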
