A Drunk and Her Dog

This is a story of cointegration: of common misconceptions about the relationship between multiple time series, and of the new perspective cointegration brings to them. Most treatments of cointegration I’ve encountered come with in-depth technical details and derivations that often make the topic seem more challenging than it is, so I’d like to take a more storytelling approach that, fingers crossed, still captures (with some rigor) the main ideas behind it.

***

Let’s start by defining two completely independent random walks, generated as ARIMA(0,1,0) processes:

y_t = y_{t-1} + \varepsilon_t, \qquad x_t = x_{t-1} + \nu_t,

where $\varepsilon_t \sim \text{i.i.d.}(0, \sigma_\varepsilon^2)$ and $\nu_t \sim \text{i.i.d.}(0, \sigma_\nu^2)$ are independent of each other. Now, suppose we regress $y_t$ on $x_t$:

y_t = \alpha + \beta x_t + u_t.

Given that the two processes are independent, we might expect the regression to produce an insignificant $\hat{\beta}$, a low $R^2$, and well-behaved residuals. However, Granger and Newbold (1974) showed that this regression routinely produces high $R^2$ values (sometimes even above 0.9), a $t$-statistic on $\hat{\beta}$ that is wildly significant, and strongly autocorrelated residuals.
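To see the Granger-Newbold result for yourself, here is a minimal Monte Carlo sketch in Python with NumPy. It assumes Gaussian unit-variance shocks, $T = 200$, and classical OLS standard errors; the helper `ols_slope_t_and_r2` and all settings are my own illustrative choices, not taken from the original study.

```python
import numpy as np

def ols_slope_t_and_r2(y, x):
    """OLS of y on [1, x]; returns (beta_hat, classical t-stat on beta_hat, R^2)."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - 2)          # i.i.d. error-variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)          # classical OLS covariance
    t_beta = coef[1] / np.sqrt(cov[1, 1])
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return coef[1], t_beta, r2

rng = np.random.default_rng(0)
T, n_sims = 200, 500
t_stats, r2s = [], []
for _ in range(n_sims):
    # Two independent ARIMA(0,1,0) random walks with unit-variance shocks
    y = np.cumsum(rng.standard_normal(T))
    x = np.cumsum(rng.standard_normal(T))
    _, t_beta, r2 = ols_slope_t_and_r2(y, x)
    t_stats.append(t_beta)
    r2s.append(r2)

t_stats, r2s = np.array(t_stats), np.array(r2s)
reject_rate = np.mean(np.abs(t_stats) > 1.96)     # a correctly sized test rejects ~5%
print(f"share of samples with |t| > 1.96: {reject_rate:.2f}")
print(f"median R^2: {np.median(r2s):.2f}")
```

With independent series a correctly sized 5% test should reject in about one sample in twenty; the share printed here is typically far higher, and it keeps rising as `T` grows.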

How’s this possible? Both $y_t$ and $x_t$ trend (in the stochastic sense), and OLS picks up on the fact that they both wander, spuriously interpreting their coincidental co-movement as a relationship. Worse, the residuals $u_t$ are themselves non-stationary, so the usual $t$ and $F$ distributions no longer apply and the $t$-statistic keeps growing with the sample size instead of settling down. This is the spurious regression problem: two entirely unrelated non-stationary series can appear statistically related simply because they are both integrated (that is, each needs to be differenced at least once to become stationary).

Knowing ARIMA, our first instinct may be to difference both series and work with $\Delta y_t$ and $\Delta x_t$, which are stationary. This does solve the spurious regression issue (differenced random walks are white noise), but we pay the heavy cost of discarding all information about the levels (raw data) of the series. If $y_t$ and $x_t$ are genuinely related (say, they represent $\log\left(\text{consumption}\right)$ and $\log\left(\text{income}\right)$), then their levels contain economically meaningful information: the long-run equilibrium relationship that differencing destroys.
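A similar sketch (again NumPy, Gaussian shocks, illustrative settings; the helper `slope_t_stat` is my own) compares the levels regression with the regression in differences. After differencing, the rejection rate should sit near the nominal 5%, which is the sense in which differencing "fixes" the problem.

```python
import numpy as np

def slope_t_stat(y, x):
    """Classical OLS t-statistic on the slope in a regression of y on [1, x]."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - 2)
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return coef[1] / se

rng = np.random.default_rng(1)
T, n_sims = 200, 500
levels_reject, diffs_reject = 0, 0
for _ in range(n_sims):
    y = np.cumsum(rng.standard_normal(T))
    x = np.cumsum(rng.standard_normal(T))
    levels_reject += int(abs(slope_t_stat(y, x)) > 1.96)
    # Differencing turns each random walk back into its i.i.d. shocks
    diffs_reject += int(abs(slope_t_stat(np.diff(y), np.diff(x))) > 1.96)

levels_rate, diffs_rate = levels_reject / n_sims, diffs_reject / n_sims
print(f"rejection rate in levels: {levels_rate:.2f}, in differences: {diffs_rate:.2f}")
```

The catch, as the paragraph above says, is that the differenced regression can no longer say anything about the levels.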

Cointegration

Enter cointegration, which is precisely the framework that lets us keep the levels and use them properly. Murray (1994) provides an intuitive illustration: imagine a drunk woman and her dog, each wandering erratically on their own random path, but connected by a leash. Taken alone, each follows a random walk, yet however far apart they drift momentarily, the leash pulls them back, so over time they’re never too far from each other.

In our example, $y_t$ is the drunk, $x_t$ is the dog, and the leash is the cointegrating relationship. Both series are $I(1)$, integrated of order 1 (i.e., they require one round of differencing to become stationary), but a particular linear combination of them is stationary:

z_t = y_t - \beta x_t \sim I(0).

In other words, two series $y_t$ and $x_t$ are said to be cointegrated if both are $I(1)$ and there exists a coefficient $\beta$ such that $z_t = y_t - \beta x_t$ is $I(0)$. Here, the vector $(1, -\beta)$ is called the cointegrating vector, and $z_t$ represents the deviation from the long-run equilibrium.
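The leash can be simulated directly. This is a minimal NumPy sketch under assumed, illustrative parameters: the dog $x_t$ is a Gaussian random walk, and the leash $z_t$ is a stationary AR(1) with coefficient `phi = 0.5`, so $y_t = \beta x_t + z_t$ is $I(1)$ while $y_t - \beta x_t$ is $I(0)$.

```python
import numpy as np

rng = np.random.default_rng(2)
T, beta, phi = 1000, 2.0, 0.5   # illustrative values, not from any dataset

# The dog: a pure random walk, x_t = x_{t-1} + nu_t
x = np.cumsum(rng.standard_normal(T))

# The leash: a stationary AR(1) deviation, z_t = phi * z_{t-1} + e_t with |phi| < 1
e = rng.standard_normal(T)
z = np.zeros(T)
for t in range(1, T):
    z[t] = phi * z[t - 1] + e[t]

# The drunk: y_t = beta * x_t + z_t wanders like x_t, but y_t - beta * x_t = z_t does not
y = beta * x + z
spread = y - beta * x

print(f"range of y: {y.max() - y.min():.1f}, range of spread: {spread.max() - spread.min():.1f}")
```

Plotting `y`, `beta * x`, and `spread` side by side shows the picture in the story: two meandering paths and a gap between them that keeps snapping back toward zero.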

This property also redeems the regression at the start of our discussion. Essentially, if $y_t$ and $x_t$ are cointegrated, then regressing $y_t$ on $x_t$ is not spurious at all. The relationship is real, and the OLS estimate of $\beta$ is not just consistent but super-consistent, as we’ll see shortly. In fact, the problem was never running the levels regression per se, but assuming a relationship when none exists. Cointegration is basically the condition that makes the levels regression legitimate.

Engle-Granger Approach

Of course, then, the question becomes: how do we determine whether $y_t$ and $x_t$ are cointegrated? The natural starting point for this is the Engle-Granger method, which is a surprisingly simple two-step procedure that tests for cointegrating relationships.

We first run the OLS regression of $y_t$ on $x_t$ (in levels):

y_t = \hat{\alpha} + \hat{\beta} x_t + \hat{u}_t.

The estimated coefficient $\hat{\beta}$ (not the true $\beta$) gives our estimate of the cointegrating vector, $(1, -\hat{\beta})$, and this is where the remarkable property of super-consistency comes in. In standard OLS with stationary variables, the estimation error $\hat{\beta} - \beta$ shrinks at rate $T^{-1/2}$. With cointegrated $I(1)$ variables, it shrinks at the much faster rate $T^{-1}$ (roughly because the variation of an $I(1)$ regressor, $\sum_t (x_t - \bar{x})^2$, grows like $T^2$ rather than $T$; the full argument is beyond the scope of this article). For practical purposes, this means we can treat $\hat{\beta}$ as if it were the true $\beta$ when constructing the residuals in the next step, although the test’s critical values still have to account for the fact that it was estimated.
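Super-consistency is easy to check by simulation. The sketch below (NumPy; the value $\beta = 0.8$, i.i.d. Gaussian equilibrium errors, and the helper `slope_error` are illustrative assumptions) compares the median absolute error of $\hat{\beta}$ at $T = 100$ and $T = 400$. Under $T^{-1/2}$ convergence, quadrupling the sample would roughly halve the error; under $T^{-1}$ convergence it should cut it by roughly a factor of four.

```python
import numpy as np

def slope_error(T, beta, rng):
    """|beta_hat - beta| from OLS (with intercept) on one cointegrated sample of length T."""
    x = np.cumsum(rng.standard_normal(T))          # I(1) regressor
    y = 1.0 + beta * x + rng.standard_normal(T)    # stationary (here i.i.d.) equilibrium error
    xc = x - x.mean()
    beta_hat = xc @ (y - y.mean()) / (xc @ xc)     # OLS slope with an intercept
    return abs(beta_hat - beta)

rng = np.random.default_rng(3)
beta, n_sims = 0.8, 1000
median_err = {T: np.median([slope_error(T, beta, rng) for _ in range(n_sims)])
              for T in (100, 400)}

# Rate T^{-1/2} predicts a ratio near 2; rate T^{-1} predicts a ratio near 4
ratio = median_err[100] / median_err[400]
print(f"error ratio when T grows 4x: {ratio:.1f}")
```
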

Second, we save the residuals

\hat{u}_t = y_t - \hat{\alpha} - \hat{\beta} x_t

and test whether they are $I(0)$. If they are, the series are cointegrated; if they are $I(1)$, the original regression was spurious. The natural tool is the ADF test, which we’ve mentioned several times in previous articles as the go-to test for a unit root. In math-speak, we run

\Delta \hat{u}_t = \gamma \hat{u}_{t-1} + \sum_{j=1}^{p} \delta_j \Delta \hat{u}_{t-j} + e_t

and test the hypotheses:

$H_0$: $\gamma = 0$ (unit root, no cointegration)
$H_1$: $\gamma < 0$ (stationary residuals, cointegration)

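Here, the lagged differences soak up any residual autocorrelation, with the lag length $p$ selected by an information criterion (AIC/BIC), and the regression needs no constant because residuals from a regression with an intercept already have mean zero. It’s important to note that we must use Engle-Granger critical values rather than the usual ADF critical values. Because OLS chooses $\hat{\beta}$ to make the residual variance as small as possible, $\hat{u}_t$ looks more stationary than the true $z_t$, so the standard ADF distribution is too lenient. The Engle-Granger critical values are more negative (i.e., stricter) to account for this.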
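Here is a from-scratch sketch of both steps in NumPy, assuming at most four ADF lags chosen by AIC on a common sample. The function name `engle_granger_stat` is hypothetical, and the critical values are approximate asymptotic 5% values from MacKinnon’s tables (about $-2.86$ for a plain ADF test with a constant, about $-3.34$ for the two-variable Engle-Granger test with a constant); check them against the tables for your sample size. For real work, `statsmodels.tsa.stattools.coint` implements this test with MacKinnon critical values and p-values.

```python
import numpy as np

def engle_granger_stat(y, x, max_lag=4):
    """Engle-Granger two-step sketch: OLS in levels, then an ADF t-statistic on the residuals.

    The ADF regression has no constant (the residuals already have mean zero), and its
    lag length p in 0..max_lag is chosen by AIC on a common estimation sample.
    Returns (beta_hat, chosen p, t-statistic on gamma).
    """
    # Step 1: levels regression y_t = alpha + beta * x_t + u_t
    X = np.column_stack([np.ones_like(x), x])
    (alpha_hat, beta_hat), *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - alpha_hat - beta_hat * x

    # Step 2: Delta u_t = gamma * u_{t-1} + sum_j delta_j * Delta u_{t-j} + e_t
    du = np.diff(u)                          # du[k] = u[k+1] - u[k]
    target = du[max_lag:]                    # Delta u_t on the common sample
    n_obs = len(target)
    best = None
    for p in range(max_lag + 1):
        cols = [u[max_lag:-1]]               # u_{t-1}
        cols += [du[max_lag - j:len(du) - j] for j in range(1, p + 1)]   # Delta u_{t-j}
        Z = np.column_stack(cols)
        coef, *_ = np.linalg.lstsq(Z, target, rcond=None)
        resid = target - Z @ coef
        ssr = resid @ resid
        aic = n_obs * np.log(ssr / n_obs) + 2 * Z.shape[1]
        if best is None or aic < best[0]:
            se_gamma = np.sqrt(ssr / (n_obs - Z.shape[1]) * np.linalg.inv(Z.T @ Z)[0, 0])
            best = (aic, p, coef[0] / se_gamma)
    _, p_star, t_gamma = best
    return beta_hat, p_star, t_gamma

# Approximate asymptotic 5% critical values (MacKinnon), constant included
ADF_CV_5PCT = -2.86   # ordinary ADF test on a single observed series
EG_CV_5PCT = -3.34    # Engle-Granger residual test with two variables

rng = np.random.default_rng(4)
T = 500
x = np.cumsum(rng.standard_normal(T))
y = 2.0 + 0.5 * x + rng.standard_normal(T)   # cointegrated by construction
beta_hat, p_star, t_gamma = engle_granger_stat(y, x)
print(f"beta_hat = {beta_hat:.3f}, lags = {p_star}, t = {t_gamma:.2f}")
print(f"EG 5% cv = {EG_CV_5PCT} (stricter than ADF {ADF_CV_5PCT})")
print("reject no cointegration at 5%:", t_gamma < EG_CV_5PCT)
```

Comparing against the ADF value of $-2.86$ instead of $-3.34$ would reject "no cointegration" too easily, which is exactly the leniency described above.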

Despite its elegance, Engle-Granger has real weaknesses:

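It can find at most one cointegrating vector, so it is really only suited to the bivariate case (with $n$ variables there can be up to $n-1$)
It’s asymmetric (regressing $y$ on $x$ isn’t the same as regressing $x$ on $y$, and the two normalizations can lead to different conclusions)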
Estimation errors from step one carry forward to step two, compounding imprecision
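The asymmetry is easy to see numerically. In this sketch (NumPy, simulated data with illustrative parameters; the helper `slope` is my own), regressing $y$ on $x$ and $x$ on $y$ imply different cointegrating vectors, because in simple OLS the two slopes multiply to $R^2$ rather than to 1.

```python
import numpy as np

def slope(dep, reg):
    """OLS slope of dep on [1, reg]."""
    rc = reg - reg.mean()
    return rc @ (dep - dep.mean()) / (rc @ rc)

rng = np.random.default_rng(5)
T = 300
x = np.cumsum(rng.standard_normal(T))
y = 0.5 * x + 3.0 * rng.standard_normal(T)    # cointegrated, with a noisy equilibrium error

b_yx = slope(y, x)       # normalized on y: cointegrating vector (1, -b_yx)
b_xy = slope(x, y)       # normalized on x: x - b_xy * y, i.e. an implied y-on-x slope of 1 / b_xy
r2 = np.corrcoef(x, y)[0, 1] ** 2

print(f"y on x: {b_yx:.4f}, implied by x on y: {1 / b_xy:.4f}")
# The two slopes multiply to R^2, so they only agree when the fit is perfect
print(np.isclose(b_yx * b_xy, r2))
```

Since step two runs the ADF test on whichever residuals step one produced, the two normalizations can also give different test statistics.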

Johansen Test

The Johansen test addresses all three weaknesses of Engle-Granger: it works in a multivariate framework that tests directly for the number of cointegrating relationships, it treats all variables symmetrically, and it estimates everything jointly by maximum likelihood rather than in two separate steps.

It’s based on the Vector Error Correction Model (VECM), which we’ll examine closely later. For now, it suffices to know that the VECM for an $n$-dimensional system can be written to isolate a matrix $\Pi$ of rank $r$, where $r$ is the number of cointegrating relationships.

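Consider a VAR($p$) in levels for an $n \times 1$ vector $\mathbf{y}_t$, $\mathbf{y}_t = A_1 \mathbf{y}_{t-1} + \cdots + A_p \mathbf{y}_{t-p} + \boldsymbol{\varepsilon}_t$. After rearranging (subtracting $\mathbf{y}_{t-1}$ from both sides and collecting terms, with $\Pi = A_1 + \cdots + A_p - I_n$ and $\Gamma_j = -(A_{j+1} + \cdots + A_p)$), we arrive at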

\Delta \mathbf{y}_t = \Pi \mathbf{y}_{t-1} + \sum_{j=1}^{p-1} \Gamma_j \Delta \mathbf{y}_{t-j} + \boldsymbol{\varepsilon}_t,

where the matrix $\Pi$ is $n \times n$, and three cases for its rank $r$ arise:

$r=0$: $\Pi = \mathbf{0}$, meaning no cointegration (the system is a standard VAR in differences)
$0 < r < n$: $\Pi$ has reduced rank and can be factored as $\Pi = \boldsymbol{\alpha} \boldsymbol{\beta}^{\intercal}$, where $\boldsymbol{\beta}$ is $n \times r$ (the cointegrating vectors) and $\boldsymbol{\alpha}$ is $n \times r$ (the adjustment speeds)
$r=n$: $\Pi$ has full rank, meaning all variables are $I(0)$ (a standard VAR in levels is appropriate)
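The mapping from VAR($p$) coefficients to $\Pi$ and $\Gamma_j$, and the rank condition on $\Pi$, can be checked in a few lines. This NumPy sketch uses hypothetical $\boldsymbol{\alpha}$, $\boldsymbol{\beta}$ and $\Gamma_1$ for a bivariate system, and the function name `var_to_vecm` is my own. It builds a VAR(2) whose $\Pi = \boldsymbol{\alpha}\boldsymbol{\beta}^{\intercal}$ has rank 1, recovers $\Pi$ and $\Gamma_1$, and confirms that the levels form and the error-correction form give the same one-step prediction.

```python
import numpy as np

def var_to_vecm(A_list):
    """Map VAR(p) matrices [A_1, ..., A_p] to (Pi, [Gamma_1, ..., Gamma_{p-1}]).

    Pi = A_1 + ... + A_p - I and Gamma_j = -(A_{j+1} + ... + A_p).
    """
    n = A_list[0].shape[0]
    Pi = sum(A_list) - np.eye(n)
    Gammas = [-sum(A_list[j:]) for j in range(1, len(A_list))]
    return Pi, Gammas

alpha = np.array([[-0.3], [0.1]])              # hypothetical adjustment speeds (n x r, r = 1)
beta = np.array([[1.0], [-2.0]])               # hypothetical cointegrating vector (1, -2)
Gamma1 = np.array([[0.2, 0.0], [0.1, 0.3]])    # hypothetical short-run matrix

# Build the VAR(2) in levels that corresponds to these VECM parameters
A1 = np.eye(2) + alpha @ beta.T + Gamma1
A2 = -Gamma1
Pi, (G1,) = var_to_vecm([A1, A2])

print(np.linalg.matrix_rank(Pi))   # 1: reduced rank, one cointegrating relationship

# Same one-step prediction from both forms, given y_{t-1} and y_{t-2}
y1, y2 = np.array([1.0, 0.4]), np.array([0.7, 0.5])
var_pred = A1 @ y1 + A2 @ y2
vecm_pred = y1 + Pi @ y1 + G1 @ (y1 - y2)
print(np.allclose(var_pred, vecm_pred))   # True
```
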

The Johansen procedure estimates $\Pi$ by reduced-rank regression (maximum likelihood, which boils down to an eigenvalue problem) and tests hypotheses about $r$. It provides two likelihood ratio tests.
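To make the reduced-rank regression concrete, here is a from-scratch sketch of the eigenvalue step in NumPy. It assumes an unrestricted constant and `k_ar_diff` lagged differences; the function `johansen_eigenvalues` and the simulated system are illustrative, and its output is not guaranteed to match any package’s deterministic-term conventions exactly (statsmodels, for instance, provides `coint_johansen`). The steps are: partial the constant and lagged differences out of $\Delta \mathbf{y}_t$ and $\mathbf{y}_{t-1}$, form the moment matrices $S_{00}$, $S_{11}$, $S_{01}$, and take the eigenvalues of $S_{11}^{-1} S_{10} S_{00}^{-1} S_{01}$.

```python
import numpy as np

def johansen_eigenvalues(Y, k_ar_diff=1):
    """Eigenvalues of the Johansen reduced-rank regression for a T x n array of levels Y.

    Assumes an unrestricted constant and k_ar_diff lagged differences in the VECM.
    Returns the eigenvalues sorted from largest to smallest and the effective sample size.
    """
    dY = np.diff(Y, axis=0)                 # dY[k] = Y[k+1] - Y[k]
    T_eff = dY.shape[0] - k_ar_diff
    Z0 = dY[k_ar_diff:]                     # Delta y_t
    Z1 = Y[k_ar_diff:-1]                    # y_{t-1}
    lagged = [dY[k_ar_diff - j:dY.shape[0] - j] for j in range(1, k_ar_diff + 1)]
    Z2 = np.column_stack([np.ones(T_eff)] + lagged)   # constant + Delta y_{t-j}

    def partial_out(A):
        """Residuals from regressing A on the short-run regressors Z2."""
        coef, *_ = np.linalg.lstsq(Z2, A, rcond=None)
        return A - Z2 @ coef

    R0, R1 = partial_out(Z0), partial_out(Z1)
    S00 = R0.T @ R0 / T_eff
    S11 = R1.T @ R1 / T_eff
    S01 = R0.T @ R1 / T_eff
    # Eigenvalues of S11^{-1} S10 S00^{-1} S01, where S10 = S01^T
    M = np.linalg.solve(S11, S01.T @ np.linalg.solve(S00, S01))
    eig = np.sort(np.linalg.eigvals(M).real)[::-1]
    return eig, T_eff

# Simulated drunk-and-dog system: only the first series adjusts to y1 - y2
rng = np.random.default_rng(6)
T = 1000
Y = np.zeros((T, 2))
for t in range(1, T):
    z = Y[t - 1, 0] - Y[t - 1, 1]
    Y[t] = Y[t - 1] + np.array([-0.3, 0.0]) * z + rng.standard_normal(2)

eig, T_eff = johansen_eigenvalues(Y, k_ar_diff=1)
print(eig)   # expect one clearly positive eigenvalue and one close to zero
```
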

The trace test tests the null hypothesis that the number of cointegrating vectors is at most $r$ against the alternative that it’s greater:

\lambda_{\text{trace}}(r) = -T \sum_{i=r+1}^{n} \ln(1 - \hat{\lambda}_i)

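where $T$ is the sample size and $\hat{\lambda}_1 \geq \hat{\lambda}_2 \geq \cdots \geq \hat{\lambda}_n$ are the eigenvalues from the reduced-rank regression, ordered from largest to smallest. (Each lies between 0 and 1: they are the squared canonical correlations between $\Delta \mathbf{y}_t$ and $\mathbf{y}_{t-1}$ once the lagged differences have been partialled out.) We test sequentially: start with $r = 0$; if it is rejected, test $r \leq 1$, and so on until we fail to reject. The first $r$ we fail to reject is our estimate of the number of cointegrating relationships.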
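Given the eigenvalues, the trace statistics and the sequential procedure take only a few lines. This is a sketch in plain Python; the function names are my own, and the eigenvalues and critical values in the usage example are hypothetical, chosen only to show the logic (real critical values depend on $n - r$ and the deterministic terms and come from published tables).

```python
import math

def trace_stats(eigenvalues, T):
    """lambda_trace(r) = -T * sum_{i=r+1}^{n} ln(1 - lambda_i), for r = 0, ..., n-1."""
    lam = sorted(eigenvalues, reverse=True)
    n = len(lam)
    # 0-based index i = r..n-1 corresponds to the 1-based sum over i = r+1..n
    return [-T * sum(math.log(1 - lam[i]) for i in range(r, n)) for r in range(n)]

def select_rank(stats, crit_values):
    """Sequential testing: return the first r whose null (rank <= r) is not rejected, else n."""
    for r, (stat, cv) in enumerate(zip(stats, crit_values)):
        if stat <= cv:
            return r
    return len(stats)

# Hypothetical inputs for a two-variable system, for illustration only
eig = [0.20, 0.004]
stats = trace_stats(eig, T=500)
crit = [15.5, 3.8]          # hypothetical 5% critical values for r = 0 and r = 1
print(select_rank(stats, crit))   # 1: reject r = 0, fail to reject r <= 1
```
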

The maximum eigenvalue test is more targeted because it tests the null of exactly $r$ cointegrating vectors against the alternative of $r+1$:

\lambda_{\max}(r, r+1) = -T \ln(1 - \hat{\lambda}_{r+1}).

The intuition is that if there’s an additional cointegrating vector, $\hat{\lambda}_{r+1}$ will be significantly different from zero, and the test statistic quantifies that. Note that the critical values for both tests are non-standard: they don’t follow the usual $\chi^2$ distribution, and they depend on $n-r$ and on the deterministic terms (constants, trends) included in the model. In practice they come from tables built by simulation.
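A small numerical check (plain Python, hypothetical eigenvalues for a three-variable system, function names my own) shows how the two statistics relate: each trace statistic is the sum of the maximum eigenvalue statistics from $r$ onward. The trace test pools the evidence from all remaining eigenvalues, while the maximum eigenvalue test looks only at the next one.

```python
import math

def max_eig_stats(eigenvalues, T):
    """lambda_max(r, r+1) = -T * ln(1 - lambda_{r+1}), for r = 0, ..., n-1."""
    lam = sorted(eigenvalues, reverse=True)
    return [-T * math.log(1 - lam_i) for lam_i in lam]

def trace_stats(eigenvalues, T):
    """lambda_trace(r) = -T * sum of ln(1 - lambda_i) over the n - r smallest eigenvalues."""
    lam = sorted(eigenvalues, reverse=True)
    return [-T * sum(math.log(1 - lam_i) for lam_i in lam[r:]) for r in range(len(lam))]

eig = [0.20, 0.05, 0.004]   # hypothetical eigenvalues, for illustration only
T = 500
lmax = max_eig_stats(eig, T)
trace = trace_stats(eig, T)

# lambda_trace(r) = lambda_max(r, r+1) + lambda_max(r+1, r+2) + ... + lambda_max(n-1, n)
for r in range(len(eig)):
    print(r, round(trace[r], 2), round(sum(lmax[r:]), 2))
```
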

In practice, these two tests may disagree. When they do, economic theory should inform the final decision.

Vector Error Correction Model (VECM)

Once we’ve established cointegration (via Johansen) and determined $r$, the VECM is the natural home for our system. Since we’ve discussed VARs in depth before, we can think of it as a VAR that’s been modified to respect the long-run structure of our data.

We define the VECM as

\Delta \mathbf{y}_t = \underbrace{\boldsymbol{\alpha} \boldsymbol{\beta}^{\intercal} \mathbf{y}_{t-1}}_{\text{error correction term}} + \sum_{j=1}^{p-1} \underbrace{\Gamma_j \Delta \mathbf{y}_{t-j}}_{\text{short-run dynamics}} + \boldsymbol{\varepsilon}_t.

The long-run term $\boldsymbol{\beta}^{\intercal} \mathbf{y}_{t-1}$ is a vector of $r$ linear combinations of the lagged levels (the cointegrating relationships), where each row of $\boldsymbol{\beta}^{\intercal}$ is one cointegrating vector: the coefficients that make $\boldsymbol{\beta}_i^{\intercal} \mathbf{y}_t$ stationary. Essentially, this term represents the disequilibrium in the previous period, i.e., how far the system was from its long-run equilibrium at time $t-1$ .

The speed-of-adjustment matrix $\boldsymbol{\alpha}$ governs how each variable responds to that disequilibrium. An entry $\alpha_{i,j}$ tells you how much variable $i$ moves in the next period in response to disequilibrium in the $j$-th cointegrating relationship. If that relationship is normalized on variable $i$ (coefficient 1, as in $z_t = y_t - \beta x_t$), a large negative $\alpha_{i,j}$ means variable $i$ corrects strongly downward whenever the relationship sits above equilibrium. If $\alpha_{i,j} \approx 0$, variable $i$ doesn’t respond to that disequilibrium at all, meaning it’s weakly exogenous with respect to that relationship: in the leash picture, it leads while the other variables do the adjusting.
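The role of $\boldsymbol{\alpha}$ and weak exogeneity can be seen by simulating the drunk and her dog as a VECM. In this NumPy sketch (illustrative values), only the drunk adjusts, $\boldsymbol{\alpha} = (-0.3, 0)^{\intercal}$, the cointegrating vector $(1, -1)$ is assumed known, and there are no short-run terms. Regressing each differenced series on a constant and the lagged disequilibrium should recover an adjustment speed near $-0.3$ for the drunk and near zero for the dog.

```python
import numpy as np

rng = np.random.default_rng(7)
T = 2000
alpha_true = np.array([-0.3, 0.0])   # hypothetical: the dog (x) ignores the leash
y = np.zeros(T)                      # the drunk
x = np.zeros(T)                      # the dog
for t in range(1, T):
    z = y[t - 1] - x[t - 1]          # last period's disequilibrium, beta = (1, -1) assumed
    y[t] = y[t - 1] + alpha_true[0] * z + rng.standard_normal()
    x[t] = x[t - 1] + alpha_true[1] * z + rng.standard_normal()

# Regress Delta y_t and Delta x_t on a constant and z_{t-1}; the slope estimates alpha
ect = (y - x)[:-1]
X = np.column_stack([np.ones(T - 1), ect])
alpha_hat = [np.linalg.lstsq(X, np.diff(s), rcond=None)[0][1] for s in (y, x)]
print(f"alpha_y = {alpha_hat[0]:.3f}, alpha_x = {alpha_hat[1]:.3f}")
```

Because the data were generated without lagged differences, none are included in the regression; with real data you would add the $\Gamma_j \Delta \mathbf{y}_{t-j}$ terms as regressors.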

The short-run dynamics $\Gamma_j \Delta \mathbf{y}_{t-j}$ are the same lagged-difference terms as in a VAR in differences. They capture the transitory, period-to-period dynamics that are unrelated to the long-run equilibrium.

With these terms defined, we see that the VECM decomposes the system’s behaviour into what’s happening right now (short run) and what’s pulling the system toward where it ought to be (long run). This is precisely why differencing alone is insufficient: it captures the first component but discards the second.
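One step of the VECM can be written directly from its equation, which makes the split between the error-correction term and the short-run terms explicit. This is a minimal NumPy sketch; the function `vecm_delta`, the most-recent-first ordering of `y_lags`, and the parameter values in the example are my own illustrative choices.

```python
import numpy as np

def vecm_delta(y_lags, alpha, beta, Gammas, eps):
    """Delta y_t = alpha beta^T y_{t-1} + sum_j Gamma_j Delta y_{t-j} + eps_t.

    y_lags holds levels [y_{t-1}, y_{t-2}, ..., y_{t-p}], most recent first,
    so the p - 1 lagged differences can be formed from consecutive entries.
    """
    ect = beta.T @ y_lags[0]                 # r-vector: last period's disequilibria
    delta = alpha @ ect + eps                # long run: error correction plus shock
    for j, Gamma in enumerate(Gammas, start=1):
        delta = delta + Gamma @ (y_lags[j - 1] - y_lags[j])   # short run: Delta y_{t-j}
    return delta

alpha = np.array([[-0.5], [0.2]])    # hypothetical adjustment speeds
beta = np.array([[1.0], [-1.0]])     # hypothetical cointegrating vector: y1 - y2
y_lags = [np.array([2.0, 1.0])]      # only y_{t-1}, so p = 1 and there are no short-run terms

# y1 sits 1 unit above y2: y1 falls by 0.5 and y2 rises by 0.2, both closing the gap
print(vecm_delta(y_lags, alpha, beta, Gammas=[], eps=np.zeros(2)))
```
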

Long-Run Granger Causality

As a closing note on cointegration, I’d like to connect the VECM to a question about long-run Granger causality: does variable $x$ Granger-cause variable $y$ in the long run? In a VECM, this is tested by examining whether the error correction term enters the equation for $\Delta y_t$ significantly (i.e., whether $\alpha_y \neq 0$). If it does, and $x$ appears in the cointegrating relationship, then $y$ is being systematically pulled back toward an equilibrium defined partly by $x$, which implies that the long-run path of $x$ carries predictive (Granger-causal) information about $y$. This is distinct from short-run Granger causality, which is tested by checking whether the lagged $\Delta x$ terms are jointly significant in the $\Delta y$ equation.
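Once the cointegrating vector is fixed, both tests are ordinary regression tests in the $\Delta y$ equation, since every regressor there is stationary. This sketch (NumPy, simulated data, $\boldsymbol{\beta} = (1, -1)$ assumed known, one lag, classical standard errors; the helper `ols_t_stats` is my own) reads the long-run test off the $t$-statistic on the lagged error correction term and the short-run test off the lagged $\Delta x$ term. With more lags, the short-run test becomes a joint $F$-test on all the lagged $\Delta x$ coefficients.

```python
import numpy as np

def ols_t_stats(y, X):
    """OLS coefficients and classical t-statistics."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return coef, coef / se

rng = np.random.default_rng(8)
T = 1000
y, x = np.zeros(T), np.zeros(T)
for t in range(1, T):
    z = y[t - 1] - x[t - 1]                  # disequilibrium with beta = (1, -1)
    y[t] = y[t - 1] - 0.3 * z + rng.standard_normal()
    x[t] = x[t - 1] + rng.standard_normal()

dy, dx = np.diff(y), np.diff(x)              # dy[k] = Delta y_{k+1}
# Delta y_t on: constant, ECT_{t-1}, Delta y_{t-1}, Delta x_{t-1}
target = dy[1:]
X = np.column_stack([np.ones(T - 2), (y - x)[1:-1], dy[:-1], dx[:-1]])
coef, tstat = ols_t_stats(target, X)
print(f"long run  (ECT_(t-1)):     t = {tstat[1]:.2f}")
print(f"short run (Delta x_(t-1)): t = {tstat[3]:.2f}")
```
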

These distinctions matter enormously for economic interpretation since finding that $x$ Granger-causes $y$ in the short run but not the long run (or vice versa) tells a very different story than a blanket “Granger causality” result.
