Cointegration and Causality: Johansen, VECM and Granger Explained

A practical guide to cointegration analysis: Johansen trace and max-eigenvalue tests, VECM specification, error-correction terms and Granger causality.

Regress one non-stationary macroeconomic series on another — say, energy consumption on GDP — and you will often obtain a high R² and impressive t-statistics that mean nothing: the classic spurious regression. Cointegration analysis resolves this by testing whether non-stationary series share a genuine long-run relationship. This guide walks thesis and journal-article authors through the Engle-Granger and Johansen tests, VECM specification, the interpretation of the error-correction term, and Granger causality, step by step.

The logic of cointegration analysis: long-run equilibrium

Two series that are each integrated of order one, I(1), may individually wander like random walks; yet if some linear combination of them is stationary, I(0), the series are cointegrated. The intuition is the drunk and her dog: each path looks erratic, but the distance between them never drifts off indefinitely, because an economic mechanism keeps pulling the system back towards its long-run equilibrium. When cointegration holds, the levels relationship is real rather than spurious. Before anything else, establish each variable's order of integration with unit root tests (ADF, PP, KPSS) — the choice of cointegration method hinges on this pretesting.
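The idea can be made concrete with a small simulation. The sketch below is illustrative only: the long-run slope of 2, the AR(1) coefficient of 0.5 and the seed are assumptions chosen for the example, not values from any real dataset. It builds a random walk x and a series y that shares its stochastic trend, then compares how far each series wanders with how far the combination y − 2x wanders.

```python
import random

def simulate_cointegrated_pair(n_obs=1000, beta=2.0, phi=0.5, seed=42):
    """Simulate x (a random walk) and y = beta * x + gap, with gap a stationary AR(1)."""
    rng = random.Random(seed)
    x, y, spread = [], [], []
    level, gap = 0.0, 0.0
    for _ in range(n_obs):
        level += rng.gauss(0.0, 1.0)            # I(1): shocks accumulate forever
        gap = phi * gap + rng.gauss(0.0, 1.0)   # I(0): shocks fade at rate phi
        x.append(level)
        y.append(beta * level + gap)
        spread.append(y[-1] - beta * x[-1])     # the stationary linear combination
    return x, y, spread

def value_range(series):
    """Distance between the highest and lowest value of a series."""
    return max(series) - min(series)

x, y, spread = simulate_cointegrated_pair()
print(f"range of x:      {value_range(x):8.2f}")
print(f"range of y:      {value_range(y):8.2f}")
print(f"range of y - 2x: {value_range(spread):8.2f}")
```

The two levels series drift over a wide range, while the combination stays inside a narrow band around zero. That bounded gap is the distance between the drunk and her dog.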

The Engle-Granger two-step procedure and its limits

The oldest approach has two steps: (1) estimate the levels regression by OLS, and (2) apply a unit root test to its residuals. Stationary residuals imply cointegration. One detail trips up many students: because the residuals are estimated rather than observed, standard ADF critical values do not apply; critical values tabulated for residual-based cointegration tests (such as MacKinnon's) must be used instead. The method is intuitive but has serious limitations:

  • Single equation, single vector: with more than two variables, several cointegrating vectors may exist; Engle-Granger can detect at most one.
  • Normalisation sensitivity: results can change depending on which variable is placed on the left-hand side, and in small samples this can flip the verdict.
  • Error carry-over: estimation error from the first step contaminates the second, leaving the test with low power.
  • Endogeneity and short-run dynamics are ignored in the first-step regression.
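The two steps can be sketched in plain Python. This is a simplified illustration under stated assumptions: the data are simulated, the residual test is a basic Dickey-Fuller regression without augmentation lags, and the 5% critical value of about −3.34 is the approximate asymptotic value for two variables with a constant in the first step. In real work, use the test and critical values reported by your software.

```python
import math
import random

def ols_simple(y, x):
    """OLS of y on a constant and x; returns (intercept, slope, residuals)."""
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    return intercept, slope, resid

def df_tstat(u):
    """t-statistic on rho in: delta u_t = rho * u_{t-1} + e_t (no constant, no lags)."""
    lagged = u[:-1]
    delta = [b - a for a, b in zip(u[:-1], u[1:])]
    s_ll = sum(v * v for v in lagged)
    rho = sum(l * d for l, d in zip(lagged, delta)) / s_ll
    sse = sum((d - rho * l) ** 2 for l, d in zip(lagged, delta))
    sigma2 = sse / (len(delta) - 1)
    return rho / math.sqrt(sigma2 / s_ll)

# Simulated data (an assumption for illustration): x is a random walk and
# y = 1 + 2x + gap, with gap a stationary AR(1), so the pair is cointegrated.
rng = random.Random(1)
x, y, level, gap = [], [], 0.0, 0.0
for _ in range(500):
    level += rng.gauss(0.0, 1.0)
    gap = 0.5 * gap + rng.gauss(0.0, 1.0)
    x.append(level)
    y.append(1.0 + 2.0 * level + gap)

intercept, slope, resid = ols_simple(y, x)   # step 1: levels regression
t_stat = df_tstat(resid)                     # step 2: unit root test on the residuals

EG_CRIT_5PCT = -3.34  # approximate asymptotic value, not the standard ADF value
print(f"long-run slope: {slope:.3f}")
print(f"residual DF t-statistic: {t_stat:.2f} (5% critical value about {EG_CRIT_5PCT})")
print("cointegrated" if t_stat < EG_CRIT_5PCT else "no evidence of cointegration")
```

Swapping x and y in the first step changes the residuals and therefore the test statistic, which is the normalisation sensitivity listed above.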

The Johansen approach: trace and maximum eigenvalue tests

The Johansen method works within a full VAR system rather than a single equation and estimates the number of cointegrating vectors — the rank r — directly. In an n-variable system, the rank of the long-run matrix Π is tested: r = 0 means no cointegration; 0 < r < n means r cointegrating vectors exist; r = n means all series are already stationary and a VAR in levels is appropriate. The rank is determined sequentially with two likelihood-ratio tests:

  1. Trace test: H₀: rank ≤ r against H₁: rank > r. Start at r = 0; if rejected, move to r = 1, and stop the first time H₀ cannot be rejected. That r is your number of vectors.
  2. Maximum eigenvalue test: H₀: rank = r against H₁: rank = r + 1 — a sharper alternative hypothesis. When the two tests disagree, applied work usually defers to the trace test.

Both tests depend on the deterministic components: how the constant and trend enter the model (Johansen's five cases) changes the critical values. The unrestricted-constant case is the most common in practice, and the choice must always be reported.
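The sequential trace procedure described above reduces to a short loop. In the sketch below the function name `select_rank` and the trace statistics are hypothetical. The 5% critical values are those commonly tabulated for the unrestricted-constant case in a three-variable system and are shown only for illustration; always read the values from your own software output for the deterministic case you chose.

```python
def select_rank(trace_stats, critical_values):
    """Return the first r for which H0: rank <= r cannot be rejected."""
    for r, (stat, crit) in enumerate(zip(trace_stats, critical_values)):
        if stat <= crit:       # cannot reject H0: rank <= r, so stop here
            return r
    return len(trace_stats)    # every H0 rejected: full rank, series are stationary

# Hypothetical trace statistics for H0: r <= 0, r <= 1, r <= 2.
trace_stats = [35.20, 12.10, 2.30]
# Illustrative 5% critical values (unrestricted constant, three variables).
critical_5pct = [29.80, 15.49, 3.84]

rank = select_rank(trace_stats, critical_5pct)
print(f"Selected cointegration rank: r = {rank}")   # r = 1
```

Here the first null (r ≤ 0) is rejected and the second (r ≤ 1) is not, so the system has one cointegrating vector and a VECM with r = 1 is the natural next step.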

Engle-Granger vs Johansen vs ARDL at a glance

| Criterion | Engle-Granger | Johansen | ARDL bounds test |
|---|---|---|---|
| Integration requirement | All series I(1) | All series I(1) | Mix of I(0)/I(1) allowed; I(2) ruled out |
| Number of cointegrating vectors | At most 1 | r vectors (determined by rank tests) | Single equation, 1 long-run relationship |
| System structure | Single equation, two steps | Multi-equation VAR/VECM | Single equation, one step |
| Small-sample performance | Weak | Moderate (asymptotic test) | Strong (preferred with 30–80 observations) |
| Typical software command | EViews coint, R aTSA | EViews, Stata vecrank, R urca::ca.jo | EViews ARDL, Stata ardl, R ARDL |

VAR or VECM? Interpreting the error-correction term

Model choice follows directly from the rank result. With no cointegration (r = 0), estimate a VAR in first differences; with cointegration present, differencing away the levels would throw out the long-run information, so the model must be a VECM. The VECM combines short-run dynamics (lagged difference terms) with the error-correction term (ECT), which measures the previous period's deviation from long-run equilibrium.
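In equation form, the VECM described above is usually written as follows. The notation is the conventional one and is an assumption here, since only Π is named in this guide: y_t is the n × 1 vector of variables, p is the lag order of the underlying VAR in levels, μ is a constant, β holds the r cointegrating vectors and α holds the adjustment (ECT) coefficients.

```latex
\Delta y_t = \Pi\, y_{t-1} + \sum_{i=1}^{p-1} \Gamma_i\, \Delta y_{t-i} + \mu + \varepsilon_t,
\qquad \Pi = \alpha \beta'
```

The term β′y_{t−1} is the error-correction term, the previous period's deviation from long-run equilibrium, and the Γ_i matrices carry the short-run dynamics. With r = 0 the matrix Π vanishes and the model collapses to a VAR in first differences.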

The ECT coefficient is the most-quoted number in the model. It should be negative and statistically significant: when the system drifts from equilibrium, the correction mechanism pulls it back. Its magnitude is the speed of adjustment. An estimated ECT coefficient of −0.42 (p < 0.01), for instance, means that 42% of any disequilibrium is corrected in the following period. Adjustment is geometric rather than linear: 58% of the gap remains after one period and about 34% after two, so the half-life of a deviation is ln(0.5)/ln(0.58) ≈ 1.3 periods, and the gap shrinks towards zero without ever closing exactly. A positive coefficient pushes the system away from equilibrium and one below −2 implies explosive oscillation; either signals a specification problem. Revisit the lag length, the deterministic components, or check for structural breaks.
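The adjustment arithmetic is easy to check. The sketch below assumes the simplest reading of the ECT coefficient, namely that a fixed share of the remaining gap closes each period and that nothing else in the system moves. The function names are illustrative.

```python
import math

def remaining_gap(ect_coef, periods):
    """Share of an initial disequilibrium still open after `periods` periods."""
    return (1.0 + ect_coef) ** periods

def periods_to_close(ect_coef, share_closed):
    """Periods needed to close `share_closed` of the gap (needs -1 < ect_coef < 0)."""
    if not -1.0 < ect_coef < 0.0:
        raise ValueError("formula assumes monotonic convergence: -1 < ect_coef < 0")
    return math.log(1.0 - share_closed) / math.log(1.0 + ect_coef)

ect = -0.42
for k in range(1, 5):
    print(f"after {k} period(s): {remaining_gap(ect, k):.1%} of the gap remains")
print(f"half-life:     {periods_to_close(ect, 0.5):.2f} periods")
print(f"90% corrected: {periods_to_close(ect, 0.9):.2f} periods")
```

With a coefficient of −0.42, half of a deviation is gone after about 1.27 periods and 90% after about 4.2 periods. Reporting the half-life alongside the coefficient gives readers a more intuitive sense of the adjustment speed.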

Granger causality: separating short run from long run

Granger causality tests whether the past values of one variable improve the prediction of another: predictive precedence, not causality in the philosophical sense. In a cointegrated system, causality is read through two channels. Short-run causality is assessed with Wald (chi-square) tests on the joint significance of another variable's lagged difference terms in the VECM equation of interest. Long-run causality is read from the significance of the ECT coefficient in each equation: a variable whose equation has a significant ECT coefficient bears the burden of adjustment and is the one driven towards equilibrium in the long run. Running a standard differences-only Granger test when cointegration exists omits the ECT, so it ignores the long-run channel and can miss causality that is actually present.
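Both channels can be seen in a small simulated system. The sketch below is a simplification under stated assumptions: the data are simulated, the long-run relationship is estimated by a first-step OLS rather than by Johansen's maximum-likelihood method, and each equation has a single lag, so the Wald statistic for the short-run channel is simply the squared t-statistic. All function names are illustrative.

```python
import math
import random

def invert(a):
    """Gauss-Jordan inverse with partial pivoting (fine for a handful of regressors)."""
    k = len(a)
    aug = [list(row) + [float(i == j) for j in range(k)] for i, row in enumerate(a)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]

def ols(y, X):
    """OLS via the normal equations; returns (coefficients, standard errors)."""
    n, k = len(X), len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(b * v for b, v in zip(beta, row)) for row, yi in zip(X, y)]
    sigma2 = sum(e * e for e in resid) / (n - k)
    return beta, [math.sqrt(sigma2 * inv[i][i]) for i in range(k)]

# Simulated system (assumed for illustration): x is a pure random walk, while y
# error-corrects towards y = 2x and also responds to last period's change in x.
rng = random.Random(7)
x, y = [0.0, 0.0], [0.0, 0.0]
for t in range(2, 800):
    ect_prev = y[t - 1] - 2.0 * x[t - 1]
    dx_prev = x[t - 1] - x[t - 2]
    x.append(x[t - 1] + rng.gauss(0.0, 1.0))
    y.append(y[t - 1] - 0.4 * ect_prev + 0.5 * dx_prev + rng.gauss(0.0, 1.0))

# Step 1: long-run relationship by OLS; its residual is the ECT.
(c0, b_hat), _ = ols(y, [[1.0, xi] for xi in x])
ect = [yi - c0 - b_hat * xi for xi, yi in zip(x, y)]

def ecm_equation(dep, other):
    """Regress d(dep)_t on const, ECT_{t-1}, d(dep)_{t-1}, d(other)_{t-1}."""
    rows, target = [], []
    for t in range(2, len(dep)):
        rows.append([1.0, ect[t - 1], dep[t - 1] - dep[t - 2], other[t - 1] - other[t - 2]])
        target.append(dep[t] - dep[t - 1])
    coef, se = ols(target, rows)
    return coef, [c / s for c, s in zip(coef, se)]

coef_y, t_y = ecm_equation(y, x)
coef_x, t_x = ecm_equation(x, y)

CHI2_1_5PCT = 3.84            # 5% critical value, chi-square with one restriction
wald_x_to_y = t_y[3] ** 2     # short run: does lagged dx help predict dy?
wald_y_to_x = t_x[3] ** 2     # short run: does lagged dy help predict dx?
print(f"dy equation: ECT coef {coef_y[1]:.3f} (t = {t_y[1]:.2f}), Wald x->y = {wald_x_to_y:.2f}")
print(f"dx equation: ECT coef {coef_x[1]:.3f} (t = {t_x[1]:.2f}), Wald y->x = {wald_y_to_x:.2f}")
```

By construction only y adjusts, so the ECT should be clearly negative and significant in the dy equation, which is the long-run channel running from x to y. A differences-only regression would drop the ECT column and miss that channel entirely.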

Every one of these tests rests on the lag length. Report AIC, SC (BIC) and HQ together; when the parsimonious SC conflicts with the more generous AIC — common in small samples — let a residual serial-correlation LM test arbitrate: the chosen model must leave no autocorrelation behind. On the software side, EViews offers the Johansen test and VECM estimation through menus; Stata uses the varsoc, vecrank and vec command chain; in R, the ca.jo function from the urca package paired with the vars package covers the full workflow, and the script doubles as a reproducibility appendix for your thesis.
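The conflict between criteria is a matter of penalty size, as a small calculation shows. In the sketch below the log-likelihoods are hypothetical numbers for a bivariate VAR with a constant and 40 observations, and the formulas are the common per-observation versions of AIC, SC and HQ. Software packages may scale them differently, but the ranking logic is the same.

```python
import math

def information_criteria(loglik, n_params, n_obs):
    """AIC, SC and HQ per observation (smaller is better)."""
    base = -2.0 * loglik / n_obs
    return {
        "AIC": base + 2.0 * n_params / n_obs,
        "SC": base + n_params * math.log(n_obs) / n_obs,
        "HQ": base + 2.0 * n_params * math.log(math.log(n_obs)) / n_obs,
    }

# Hypothetical values: lag -> (log-likelihood, number of estimated parameters).
# A bivariate VAR(p) with a constant has 2 * (2p + 1) parameters.
n_obs = 40
candidates = {1: (-100.0, 6), 2: (-94.0, 10), 3: (-92.5, 14)}

table = {lag: information_criteria(ll, k, n_obs) for lag, (ll, k) in candidates.items()}
choices = {
    name: min(table, key=lambda lag: table[lag][name])
    for name in ("AIC", "SC", "HQ")
}
for name, lag in choices.items():
    print(f"{name} selects lag {lag}")
```

With these numbers AIC and HQ pick two lags while SC picks one, because SC charges ln(40) ≈ 3.7 per parameter against AIC's 2. This is the situation in which the residual serial-correlation LM test should decide between the two candidates.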

Figure: Approximate usage shares of cointegration methods in applied papers (illustrative). Johansen 38%; ARDL bounds test 34%; Engle-Granger 18%; other (FMOLS, DOLS, etc.) 10%.

Finding cointegration is not the end of the analysis but the beginning; the real finding is who adjusts back to equilibrium, and how fast.

Frequently Asked Questions

How many observations do I need for a cointegration test?

The Johansen test is asymptotic; with annual data, 40 or more observations are generally recommended for reliable results. For shorter series, the ARDL bounds test is usually the better alternative thanks to its small-sample performance.

What if the trace and maximum eigenvalue tests disagree?

When the two conflict, applied work typically follows the trace test, which is considered more robust in small samples. Reviewers nonetheless expect both results to be reported, together with a justification for the one you rely on.

My error-correction term is positive — is my model wrong?

A positive ECT implies the system moves away from equilibrium and usually points to a specification problem. Check the lag length, the deterministic component choice and possible structural breaks; if a break is present, consider break-robust cointegration tests.

What does Celsus offer for cointegration and causality analysis?

We handle the full workflow: unit root pretesting, lag selection, Johansen or ARDL cointegration testing, VECM estimation, and the interpretation of error-correction and causality results in EViews, Stata or R. Outputs are delivered in thesis- and journal-ready format with fully reproducible code.
