This is a story of cointegration: of common misconceptions about the relationship between multiple time series, and of how cointegration brings a new perspective to it. Most treatments of cointegration I’ve encountered come loaded with in-depth technical details and derivations that make the topic seem more challenging than it is, so I’d like to take a more storytelling approach that, fingers crossed, still captures (with some rigor) the main ideas behind it.
***
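Let’s start by defining two completely independent random walks, generated as ARIMA(0,1,0) processes:
$$X_t = X_{t-1} + \varepsilon_t, \qquad Y_t = Y_{t-1} + \eta_t$$
where $\varepsilon_t$ and $\eta_t$ are white noise processes independent of each other. Now, suppose we regress $Y_t$ on $X_t$:
$$Y_t = \alpha + \beta X_t + u_t$$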
Given that the two processes are independent, we might expect the regression to produce an insignificant $\hat{\beta}$, a low $R^2$, and well-behaved residuals. However, Granger and Newbold (1974) showed that this regression routinely produces high $R^2$ values (sometimes even above 0.9) and a $t$-statistic on $\hat{\beta}$ that is wildly significant.
How is this possible? Both $X_t$ and $Y_t$ are trending (in the stochastic sense), so over any finite sample each tends to wander off in some direction, and OLS mistakes this coincidental co-movement for a relationship. The residuals $u_t$ inherit a unit root as well, so the usual $t$ and $R^2$ inference, which assumes stationary errors, breaks down. This is the spurious regression problem: two entirely unrelated non-stationary series can appear statistically related simply because they are both integrated (i.e., they need to be differenced at least once to become stationary).
Knowing ARIMA, our first instinct may be to difference both series and work with $\Delta X_t = X_t - X_{t-1}$ and $\Delta Y_t = Y_t - Y_{t-1}$, which are stationary. Although this does solve the spurious regression problem (differenced random walks are just white noise), it comes at the heavy cost of discarding all information about the levels (raw values) of the series. If $X_t$ and $Y_t$ are genuinely related (say, they represent two linked economic variables), then their levels contain economically meaningful information: the long-run equilibrium relationship that differencing destroys.
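To see both halves of this argument, here is a small Monte Carlo sketch in Python using only NumPy (the helper names `slope_t_stat` and `rejection_rates`, the sample size, and the number of simulations are illustrative choices, not taken from any reference). It regresses one independent random walk on another, first in levels and then in first differences, and records how often the slope’s conventional $t$-statistic exceeds 1.96 in absolute value. With valid inference, that should happen roughly 5% of the time.

```python
import numpy as np

def slope_t_stat(y, x):
    """OLS of y on [1, x]; returns the conventional t-statistic of the slope."""
    Z = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    sigma2 = resid @ resid / (len(y) - 2)
    se = np.sqrt(sigma2 * np.linalg.inv(Z.T @ Z)[1, 1])
    return coef[1] / se

def rejection_rates(n_sims=200, T=200, seed=0):
    """Share of |t| > 1.96 for levels vs. differences of independent random walks."""
    rng = np.random.default_rng(seed)
    levels = diffs = 0
    for _ in range(n_sims):
        x = np.cumsum(rng.standard_normal(T))  # X_t = X_{t-1} + eps_t
        y = np.cumsum(rng.standard_normal(T))  # Y_t = Y_{t-1} + eta_t, independent of X_t
        levels += int(abs(slope_t_stat(y, x)) > 1.96)
        diffs += int(abs(slope_t_stat(np.diff(y), np.diff(x))) > 1.96)
    return levels / n_sims, diffs / n_sims

level_rate, diff_rate = rejection_rates()
print(f"levels: {level_rate:.0%} rejections, differences: {diff_rate:.0%} rejections")
```

Expect the levels rate to sit far above 5% (and to keep rising as `T` grows), while the differenced rate stays close to 5%.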
Cointegration
Enter cointegration, which is precisely the framework that lets us keep the levels and use them properly. Murray (1994) provides an intuitive illustration: imagine a drunk woman and her dog, each wandering erratically along a random path, but connected by a leash. Each follows a random walk, yet however far apart they drift momentarily, the leash pulls them back, so over time they’re never too far from each other.
In our example, $X_t$ and $Y_t$ play the drunk and the dog, and the leash is the cointegrating relationship. Both series are $I(1)$, integrated of order 1 (i.e., they require one round of differencing to become stationary), but a particular linear combination of them is stationary:
$$Y_t - \beta X_t \sim I(0)$$
Formally, two series $X_t$ and $Y_t$ are said to be cointegrated if both are $I(1)$ and there exists a coefficient $\beta$ such that $Y_t - \beta X_t$ is $I(0)$. Here, the vector $(1, -\beta)$ is called the cointegrating vector, and $Y_t - \beta X_t$ represents the deviation from the long-run equilibrium.
This property also redeems the regression at the start of our discussion. If $X_t$ and $Y_t$ are cointegrated, then regressing $Y_t$ on $X_t$ is not spurious at all: the relationship is real, and the OLS estimate of $\beta$ is not just consistent but super-consistent, as we’ll see shortly. The problem was never running the levels regression per se, but assuming a relationship when none exists. Cointegration is precisely the condition that makes the levels regression legitimate.
Engle-Granger Approach
Of course, the question then becomes: how do we determine whether $X_t$ and $Y_t$ are cointegrated? The natural starting point is the Engle-Granger method, a surprisingly simple two-step procedure that tests for a cointegrating relationship.
First, we run the OLS regression of $Y_t$ on $X_t$ in levels:
$$Y_t = \alpha + \beta X_t + u_t$$
The estimated slope $\hat{\beta}$ (not the intercept $\hat{\alpha}$) gives our estimate of the cointegrating vector, $(1, -\hat{\beta})$, and this is where the remarkable property of super-consistency comes in. In standard OLS with stationary variables, $\hat{\beta}$ converges to the true $\beta$ at rate $\sqrt{T}$. With cointegrated variables, convergence happens at the much faster rate $T$ (for reasons beyond the scope of this article). For practical purposes, this means we can simply treat $\hat{\beta}$ as if it were the true $\beta$ when we move on to the next step.
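A quick way to see super-consistency is to simulate it. The sketch below (Python/NumPy; `mean_abs_error` is a hypothetical helper, and the data-generating process $Y_t = 1 + 2X_t + u_t$ with white-noise $u_t$ is an assumption made for illustration) measures the average absolute error of $\hat{\beta}$ at several sample sizes. Under $\sqrt{T}$ convergence, quadrupling $T$ would roughly halve the error. Under rate-$T$ convergence, it should shrink by roughly a factor of four.

```python
import numpy as np

def mean_abs_error(T, n_sims=500, beta=2.0, seed=0):
    """Average |beta_hat - beta| across simulated cointegrated pairs of length T."""
    rng = np.random.default_rng(seed)
    errors = np.empty(n_sims)
    for s in range(n_sims):
        x = np.cumsum(rng.standard_normal(T))        # I(1) regressor: a random walk
        y = 1.0 + beta * x + rng.standard_normal(T)  # Y_t - beta * X_t is I(0)
        x_c, y_c = x - x.mean(), y - y.mean()        # demeaning accounts for the intercept
        beta_hat = (x_c @ y_c) / (x_c @ x_c)         # OLS slope
        errors[s] = abs(beta_hat - beta)
    return errors.mean()

for T in (100, 400, 1600):
    print(T, mean_abs_error(T))
```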
Second, we save the residuals
$$\hat{u}_t = Y_t - \hat{\alpha} - \hat{\beta} X_t$$
and test whether they are $I(0)$. If they are, the series are cointegrated. If they are $I(1)$, the original regression was spurious. The natural tool is the augmented Dickey-Fuller (ADF) test, which we’ve mentioned several times in previous articles as the go-to test for stationarity. In math-speak, we run
$$\Delta \hat{u}_t = \gamma \hat{u}_{t-1} + \sum_{i=1}^{p} \phi_i \Delta \hat{u}_{t-i} + e_t$$
and test the hypotheses:
- $H_0$: $\gamma = 0$ (unit root, no cointegration)
- $H_1$: $\gamma < 0$ (stationary residuals, cointegration)
where the lagged differences $\Delta \hat{u}_{t-i}$ soak up any residual autocorrelation, with the lag length $p$ selected by an information criterion (AIC/BIC). Importantly, we must compare the test statistic with Engle-Granger critical values rather than the standard ADF ones. Because OLS chooses $\hat{\beta}$ to make the residuals as small as possible, $\hat{u}_t$ looks more stationary than the true equilibrium error would, so the standard ADF distribution is too lenient. The Engle-Granger critical values are more negative (i.e., stricter) to account for this.
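Here is a minimal from-scratch sketch of the two steps in Python/NumPy. The function name `engle_granger`, the fixed lag length, and the simulated data are assumptions for illustration. It returns the ADF $t$-statistic on $\gamma$ but does not supply critical values, which must come from Engle-Granger tables.

```python
import numpy as np

def engle_granger(y, x, lags=1):
    """Two-step Engle-Granger: OLS in levels, then an ADF regression on the residuals.

    Returns (alpha_hat, beta_hat, adf_t). adf_t must be compared with
    Engle-Granger critical values, not the ordinary Dickey-Fuller ones.
    """
    y, x = np.asarray(y, dtype=float), np.asarray(x, dtype=float)
    # Step 1: Y_t = alpha + beta * X_t + u_t
    Z = np.column_stack([np.ones_like(x), x])
    (alpha_hat, beta_hat), *_ = np.linalg.lstsq(Z, y, rcond=None)
    u = y - alpha_hat - beta_hat * x
    # Step 2: Delta u_t = gamma * u_{t-1} + sum_i phi_i * Delta u_{t-i} + e_t (no constant,
    # since the step-1 residuals already have mean zero)
    du = np.diff(u)
    dep = du[lags:]
    regs = [u[lags:-1]] + [du[lags - i:len(du) - i] for i in range(1, lags + 1)]
    R = np.column_stack(regs)
    coef, *_ = np.linalg.lstsq(R, dep, rcond=None)
    e = dep - R @ coef
    sigma2 = e @ e / (len(dep) - R.shape[1])
    se_gamma = np.sqrt(sigma2 * np.linalg.inv(R.T @ R)[0, 0])
    return alpha_hat, beta_hat, coef[0] / se_gamma

# Simulated pair that is cointegrated by construction: Y_t - 2 X_t is white noise
rng = np.random.default_rng(0)
x = np.cumsum(rng.standard_normal(500))
y = 1.0 + 2.0 * x + rng.standard_normal(500)
print(engle_granger(y, x, lags=1))
```

In practice, statsmodels packages the same procedure as `statsmodels.tsa.stattools.coint`, which also returns a p-value and critical values. The sketch is only meant to make the two steps explicit.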
Despite its elegance, Engle-Granger has real weaknesses:
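- It can find at most one cointegrating vector, so it’s really designed for the bivariate case (with $n$ variables, there can be up to $n - 1$)
- It’s asymmetric: regressing $Y_t$ on $X_t$ isn’t the same as regressing $X_t$ on $Y_t$, and the two normalizations can lead to different conclusions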
- Estimation errors from step one carry forward to step two, compounding imprecision
Johansen Test
The Johansen test addresses all three weaknesses of Engle-Granger: it operates in a multivariate framework, estimates the cointegrating vectors and tests how many there are in a single system-wide step, and treats all variables symmetrically.
It’s based on the Vector Error Correction Model (VECM), which we’ll examine closely later. For now, it suffices to know that the VECM for an $n$-dimensional system can be written to isolate a matrix $\Pi$ whose rank $r$ equals the number of cointegrating relationships.
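Consider a VAR($p$) in levels for an $n \times 1$ vector $\mathbf{y}_t$:
$$\mathbf{y}_t = A_1 \mathbf{y}_{t-1} + \cdots + A_p \mathbf{y}_{t-p} + \varepsilon_t$$
After rearranging (subtracting $\mathbf{y}_{t-1}$ from both sides and collecting terms), we arrive at
$$\Delta \mathbf{y}_t = \Pi \mathbf{y}_{t-1} + \sum_{i=1}^{p-1} \Gamma_i \Delta \mathbf{y}_{t-i} + \varepsilon_t$$
where $\Pi = A_1 + \cdots + A_p - I_n$ is an $n \times n$ matrix, $\Gamma_i = -(A_{i+1} + \cdots + A_p)$, and three cases for the rank of $\Pi$ arise:
- $\operatorname{rank}(\Pi) = 0$: $\Pi = 0$, meaning no cointegration (the system is a standard VAR in differences)
- $0 < \operatorname{rank}(\Pi) = r < n$: $\Pi$ has reduced rank and can be factored as $\Pi = \alpha\beta'$, where $\beta$ is $n \times r$ (its columns are the cointegrating vectors) and $\alpha$ is $n \times r$ (the adjustment speeds)
- $\operatorname{rank}(\Pi) = n$: $\Pi$ has full rank, meaning all variables are $I(0)$ (a standard VAR in levels is appropriate)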
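As a small worked example (the numbers are purely illustrative), take a bivariate system with one cointegrating vector $\beta = (1, -2)'$ and adjustment speeds $\alpha = (-0.3, 0)'$:
$$\Pi = \alpha\beta' = \begin{pmatrix} -0.3 \\ 0 \end{pmatrix}\begin{pmatrix} 1 & -2 \end{pmatrix} = \begin{pmatrix} -0.3 & 0.6 \\ 0 & 0 \end{pmatrix}$$
The second row is zero, so $\operatorname{rank}(\Pi) = 1$, and we are in the middle case with $r = 1$. If the system is a VAR(1), then
$$A_1 = I_2 + \Pi = \begin{pmatrix} 0.7 & 0.6 \\ 0 & 1 \end{pmatrix}$$
which is triangular with eigenvalues $0.7$ and $1$. The single unit eigenvalue is the one common stochastic trend, consistent with $n - r = 2 - 1 = 1$.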
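The Johansen procedure estimates $\Pi$ via reduced rank regression (specifically, maximum likelihood, which reduces to an eigenvalue problem) and tests hypotheses about its rank $r$. It provides two likelihood ratio tests.
The trace test tests the null hypothesis that there are at most $r$ cointegrating vectors against the alternative that there are more than $r$:
$$\lambda_{\text{trace}}(r) = -T \sum_{i=r+1}^{n} \ln\left(1 - \hat{\lambda}_i\right)$$
where $T$ is the sample size and $\hat{\lambda}_1 \geq \hat{\lambda}_2 \geq \cdots \geq \hat{\lambda}_n$ are the eigenvalues from the reduced rank problem (the squared canonical correlations between $\Delta \mathbf{y}_t$ and $\mathbf{y}_{t-1}$ after the lagged differences are partialled out), ordered from largest to smallest. We test this sequentially: start with $r = 0$; if that is rejected, test $r \le 1$, and so on until we fail to reject.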
The maximum eigenvalue test is more targeted: it tests the null of exactly $r$ cointegrating vectors against the alternative of $r + 1$:
$$\lambda_{\max}(r) = -T \ln\left(1 - \hat{\lambda}_{r+1}\right)$$
The intuition here is that if there’s an additional cointegrating vector, the next largest eigenvalue $\hat{\lambda}_{r+1}$ will be significantly different from zero, and the test statistic quantifies that. Note that the critical values for both tests are non-standard (they depend on $n - r$ and on the deterministic specification of the model, such as whether constants or trends are included) and are obtained by simulation.
In practice, these two tests may disagree. When they do, economic theory should inform the final choice of $r$.
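The statistics themselves are straightforward to compute once the reduced rank problem is set up. The following NumPy sketch assumes an unrestricted constant and `k_diff` lagged differences (both are modeling choices, and `johansen_stats` is a hypothetical helper name). It partials the constant and lagged differences out of $\Delta \mathbf{y}_t$ and $\mathbf{y}_{t-1}$, solves the eigenvalue problem, and returns both statistics for each null rank $r$. It does not produce critical values.

```python
import numpy as np

def johansen_stats(y, k_diff=1):
    """Johansen eigenvalues, trace and max-eigenvalue statistics.

    y: (N, n) array of levels. Assumes an unrestricted constant and k_diff
    lagged differences. Each returned array is indexed by the null rank r.
    """
    y = np.asarray(y, dtype=float)
    dy = np.diff(y, axis=0)
    m = dy.shape[0] - k_diff                  # effective sample size T
    z0 = dy[k_diff:]                          # Delta y_t
    z1 = y[k_diff:-1]                         # y_{t-1}
    # Partial out a constant and Delta y_{t-1}, ..., Delta y_{t-k_diff}
    lags = [dy[k_diff - i:dy.shape[0] - i] for i in range(1, k_diff + 1)]
    z2 = np.column_stack([np.ones(m)] + lags)

    def resid(a):
        coef, *_ = np.linalg.lstsq(z2, a, rcond=None)
        return a - z2 @ coef

    r0, r1 = resid(z0), resid(z1)
    s00 = r0.T @ r0 / m
    s11 = r1.T @ r1 / m
    s01 = r0.T @ r1 / m
    # Solve |lambda * S11 - S10 S00^{-1} S01| = 0 via a Cholesky transform of S11
    l_inv = np.linalg.inv(np.linalg.cholesky(s11))
    sym = l_inv @ s01.T @ np.linalg.solve(s00, s01) @ l_inv.T
    lam = np.sort(np.linalg.eigvalsh(sym))[::-1]           # largest first
    trace = np.array([-m * np.sum(np.log(1 - lam[r:])) for r in range(len(lam))])
    max_eig = -m * np.log(1 - lam)                         # uses lambda_{r+1}
    return lam, trace, max_eig

# Illustrative data: x is a random walk and y2 - 2x is white noise (so r = 1)
rng = np.random.default_rng(1)
x = np.cumsum(rng.standard_normal(500))
y2 = 2 * x + rng.standard_normal(500)
lam, trace, max_eig = johansen_stats(np.column_stack([x, y2]), k_diff=1)
print(lam, trace, max_eig)
```

For real work, `coint_johansen` in `statsmodels.tsa.vector_ar.vecm` reports the trace and maximum eigenvalue statistics (`lr1`, `lr2`) together with tabulated critical values (`cvt`, `cvm`).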
Vector Error Correction Model (VECM)
Once we’ve established cointegration (via Johansen) and determined $r$, the VECM is the natural home for our system. Since we’ve discussed VARs in depth before, we can think of it as a VAR that’s been modified to respect the long-run structure of our data.
We define the VECM as
$$\Delta \mathbf{y}_t = \alpha\beta' \mathbf{y}_{t-1} + \sum_{i=1}^{p-1} \Gamma_i \Delta \mathbf{y}_{t-i} + \varepsilon_t$$
The long-run term $\beta'\mathbf{y}_{t-1}$ is an $r \times 1$ vector of linear combinations of the lagged levels (the cointegrating relationships), where each row of $\beta'$ is one cointegrating vector: a set of coefficients that makes $\beta'\mathbf{y}_t$ stationary. This term represents the disequilibrium in the previous period, i.e., how far the system was from its long-run equilibrium at time $t-1$.
The speed of adjustment matrix $\alpha$ governs how each variable responds to that disequilibrium. An entry $\alpha_{ij}$ tells you how much variable $i$ moves in the next period in response to disequilibrium in the $j$-th cointegrating relationship. A large negative $\alpha_{ij}$ means variable $i$ corrects strongly when that relationship drifts above its equilibrium (with the relationship normalized so that variable $i$ enters with a positive coefficient). If $\alpha_{ij} = 0$, variable $i$ doesn’t respond to that disequilibrium at all, meaning it’s weakly exogenous with respect to that relationship, which carries important interpretive weight.
The short-run dynamics $\sum_{i=1}^{p-1} \Gamma_i \Delta \mathbf{y}_{t-i}$ are lagged differences, exactly as in a VAR in differences. They capture the transitory, period-to-period dynamics that are unrelated to the long-run equilibrium.
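To make the role of $\alpha$ concrete, the sketch below (Python/NumPy, with hypothetical helper names `simulate_vecm` and `estimate_alpha`) simulates a bivariate VECM with no short-run lags, one cointegrating vector $\beta = (1, -2)'$, and adjustment speeds $\alpha = (-0.3, 0)'$, so that the second variable is weakly exogenous. Treating $\beta$ as known (as one might after an Engle-Granger first step), regressing each $\Delta y_{i,t}$ on $\beta'\mathbf{y}_{t-1}$ recovers the adjustment speeds.

```python
import numpy as np

def simulate_vecm(alpha, beta, T=1000, seed=0):
    """Simulate Delta y_t = alpha * (beta' y_{t-1}) + eps_t for a bivariate
    system with one cointegrating vector (r = 1) and no short-run lags."""
    rng = np.random.default_rng(seed)
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    y = np.zeros((T, 2))
    for t in range(1, T):
        z_prev = beta @ y[t - 1]                     # last period's disequilibrium
        y[t] = y[t - 1] + alpha * z_prev + rng.standard_normal(2)
    return y

def estimate_alpha(y, beta):
    """OLS (no constant) of each Delta y_{i,t} on the error correction term z_{t-1}."""
    z_lag = y[:-1] @ np.asarray(beta, dtype=float)   # beta' y_{t-1}
    dy = np.diff(y, axis=0)                          # Delta y_t
    return dy.T @ z_lag / (z_lag @ z_lag)

alpha_true = [-0.3, 0.0]   # variable 2 never responds: weakly exogenous
beta_true = [1.0, -2.0]    # z_t = y_{1,t} - 2 * y_{2,t} is stationary
y = simulate_vecm(alpha_true, beta_true)
print(estimate_alpha(y, beta_true))
```

The first estimate should land near $-0.3$ and the second near zero: only the first variable is pulled back toward equilibrium.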
With these terms defined, we see that the VECM is therefore a complete decomposition of the system’s behaviour into what’s happening right now (short run) and what’s pulling the system toward where it ought to be (long run). This is precisely why differencing alone is insufficient: it captures the first component but discards the second.
Long-Run Granger Causality
As a closing note on cointegration, I’d like to connect the VECM to a question posed by long-run Granger causality: does $X_t$ Granger-cause $Y_t$ in the long run? In a VECM, this is tested by examining whether the error correction term enters the equation for $\Delta Y_t$ significantly (i.e., whether its adjustment coefficient in that equation, $\alpha_Y$, is nonzero). If it does, $Y_t$ is being systematically pulled back toward its equilibrium with $X_t$, which implies that the long-run path of $X_t$ carries predictive (Granger-causal) information about $Y_t$. This is distinct from short-run Granger causality, which is tested by checking whether the lagged $\Delta X_t$ terms are jointly significant in the $\Delta Y_t$ equation.
These distinctions matter enormously for economic interpretation, since finding that $X_t$ Granger-causes $Y_t$ in the short run but not the long run (or vice versa) tells a very different story than a blanket “Granger causality” result.
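Both tests boil down to significance tests in the $\Delta Y_t$ equation. The sketch below (Python/NumPy; the data-generating process, the known $\beta = 2$, the single short-run lag, and the helper name `ols_tstats` are assumptions for illustration) builds a system in which $Y_t$ error-corrects toward $2X_t$ but lagged $\Delta X_t$ has no direct effect. It then reports the $t$-statistic on the error correction term (long run) and on $\Delta X_{t-1}$ (short run). With a single lag, the joint test on the lagged $\Delta X_t$ terms reduces to this one $t$-test.

```python
import numpy as np

def ols_tstats(dep, regs):
    """OLS coefficients and conventional t-statistics."""
    coef, *_ = np.linalg.lstsq(regs, dep, rcond=None)
    resid = dep - regs @ coef
    sigma2 = resid @ resid / (len(dep) - regs.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(regs.T @ regs)))
    return coef, coef / se

rng = np.random.default_rng(2)
T = 1000
x = np.cumsum(rng.standard_normal(T))        # X_t: a random walk
y = np.zeros(T)
for t in range(1, T):
    z_prev = y[t - 1] - 2.0 * x[t - 1]       # error correction term (beta = 2 assumed known)
    y[t] = y[t - 1] - 0.3 * z_prev + rng.standard_normal()

dy, dx = np.diff(y), np.diff(x)
z = y - 2.0 * x
# Regress Delta Y_t on [1, z_{t-1}, Delta Y_{t-1}, Delta X_{t-1}] for t = 2, ..., T-1
dep = dy[1:]
regs = np.column_stack([np.ones(T - 2), z[1:-1], dy[:-1], dx[:-1]])
coef, tstats = ols_tstats(dep, regs)
print("long run, t-stat on z_{t-1}:", tstats[1])
print("short run, t-stat on Delta X_{t-1}:", tstats[3])
```

Here $X_t$ should Granger-cause $Y_t$ in the long run but not in the short run, which is exactly the kind of split result that a blanket test would hide.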