A Nobel in economics for teasing causation apart from correlation

Photo: AP

Summary

Even our messy non-random world can offer data that lets us make plausible inferences on cause and effect

A sense of anticipation ahead of the Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel, to cite the award’s full title, is always palpable among academic economists, especially those of us who have had the privilege of working with a future Laureate as doctoral students. To add to the sense of occasion, the Economics prize, by coincidence, happens to fall on Thanksgiving Day in Canada, a major national holiday. This year’s prize was awarded jointly to David Card (half share) and to Joshua Angrist and Guido Imbens (a quarter share each), for their path-breaking work in allowing us to make credible causal claims while analysing data.

In a sense, this year’s prize is the flip side of the 2019 prize, awarded to Michael Kremer, Indian-born Abhijit Banerjee and Esther Duflo. That trio won, in good measure, for their work in popularizing ‘randomized controlled trials’ (RCTs), long a staple in the natural sciences, in the realm of economics research. RCTs allow us to make causal claims by randomly assigning subjects to either a ‘control’ or ‘treatment’ group, the randomness ensuring that any difference between the two groups can plausibly be attributed to the treatment, rather than any unobserved differences; the theory being that such differences should average out once subjects are randomized.
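For readers who like to see the logic run, here is a minimal simulation sketch of why randomization works. Everything in it is invented for illustration: the function name `run_trial`, the hidden trait, and the assumed true effect of 2.0 are hypothetical and are not drawn from any study mentioned here.

```python
import random
from statistics import mean


def run_trial(n=20000, true_effect=2.0, seed=1):
    """Simulate a randomized trial with an unobserved trait (all numbers hypothetical)."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        hidden = rng.gauss(0, 1)            # unobserved difference between subjects
        in_treatment = rng.random() < 0.5   # assignment by coin flip, unrelated to the trait
        outcome = 5.0 * hidden + (true_effect if in_treatment else 0.0) + rng.gauss(0, 1)
        (treated if in_treatment else control).append((hidden, outcome))
    # Gap in the hidden trait between groups: should be close to zero under randomization.
    hidden_gap = mean(h for h, _ in treated) - mean(h for h, _ in control)
    # Gap in outcomes between groups: should be close to the assumed true effect.
    outcome_gap = mean(y for _, y in treated) - mean(y for _, y in control)
    return hidden_gap, outcome_gap


hidden_gap, outcome_gap = run_trial()
print(f"gap in hidden trait: {hidden_gap:.3f}")
print(f"gap in outcomes:     {outcome_gap:.3f}")
```

Because the coin flip ignores the hidden trait, the two groups end up with nearly the same average of it, and the gap in outcomes lands near the effect built into the simulation.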

But RCTs have some major limitations, as your columnist has argued in detail in these pages (‘The experimental turn in economics’, 30 January 2016, bit.ly/3iVXdaC). A key problem is ‘external validity’: can a finding in one context be replicated in another—say, a very different one? Equally importantly, because creating an RCT is not always feasible, nor perhaps even ethical, in many situations such an approach simply cannot address some of the ‘big’ questions in economics, which perforce require that we analyse raw, non-randomized data, and find some way to tease out causality, if it exists.

Bear in mind that a statistical correlation observed between two variables in a data set is not, in itself, evidence of a causal relationship. Take an example close to home. During the pandemic, my classes switched online, with a pre-recorded two-hour lecture followed by a ‘live’ one-hour Q&A session. Attendance at the latter was highly recommended, but not mandatory. Uniformly, I observed that students who attended and participated actively in the discussion session performed better on the course. But is this because my discussion session allowed them to perform better? Flattering as that would be for any professor, it is equally plausible that those who were going to do well anyway chose to participate—what we call ‘reverse causality’. Or, perhaps there were unobserved differences between those who joined and those who did not—say, internet access and time available to read, study and discuss, rather than struggling with poor connectivity, work, and school—that could explain the correlation. Is it non-random selection, rather than causality, that matters more for outcomes in this case?

Economics is filled with such situations, where a correlation in data entices us to draw a causal inference. This may be treacherous in the absence of randomization, which, as noted, is impossible to achieve in most real-world situations.

The genius of David Card, working with the late Alan Krueger, was to find a clever solution: examine a real-world situation that presented a useful natural experiment. In this case, it was two contiguous US states that were otherwise similar and shared a common labour market and general macroeconomic conditions, but one of which increased its minimum wage whilst the other did not. (Readers may find a detailed exposition in fine write-ups on this year’s prize by economist Alex Tabarrok in the Marginal Revolution blog, bit.ly/3v69bmI, and Tim Harford in the Financial Times, on.ft.com/3lCDZZk.) Computing the ‘differences in differences’ before and after the change across the two jurisdictions allowed them to infer that any differential impact on unemployment was very likely driven by the policy change, not any unobserved differences.
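The ‘differences in differences’ arithmetic can be sketched in a few lines. The figures below are made up purely to show the calculation and are not Card and Krueger’s data; the function name `diff_in_diff` is likewise my own label.

```python
def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Change in the treated group minus change in the comparison group."""
    treated_change = treated_after - treated_before
    control_change = control_after - control_before
    # Whatever shifted both groups alike (a shared labour market, the wider
    # economy) cancels out in this subtraction.
    return treated_change - control_change


# Hypothetical average outcomes, invented for illustration only.
estimate = diff_in_diff(
    treated_before=50.0, treated_after=52.0,   # jurisdiction that changed its policy
    control_before=48.0, control_after=49.0,   # neighbouring jurisdiction that did not
)
print(f"difference-in-differences estimate: {estimate:.1f}")
```

With these invented numbers, the treated group changes by 2.0 and the comparison group by 1.0, so the estimate is 1.0. The approach leans on the assumption that, absent the policy change, both groups would have moved in parallel.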

In a similar vein, research by Angrist and Imbens, again with Krueger and published in a series of papers, studied important questions such as whether increased schooling increases people’s earnings, an obvious situation where any assumption of a uni-directional causal link may be problematic. For instance, brighter students may study more and also earn higher incomes because of superior abilities. In one seminal paper, Angrist and Krueger asked whether compulsory schooling could increase wages, and found a brilliant technique for randomization. Given the oddities of the US school system, students born in late December would be one class behind those born in early January, and laws in some states allowed students to drop out at 16. The upshot is that there would be at least some students otherwise almost identical who got a year more of schooling for a purely random reason, and these students were found to earn higher wages, making a causal claim tenable. (Again, readers could check the write-ups of Tabarrok and Harford for more details.)
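The intuition can be illustrated with a small simulation, in the spirit of what statisticians call an instrumental-variables estimate (my label, not one used above). This is a sketch under loudly stated assumptions: the data are simulated, the assumed true return to schooling of 0.10 and all other coefficients are invented, and `lucky_birth` is a hypothetical stand-in for the accident of birth timing.

```python
import random
from statistics import mean


def simulate(n=40000, true_return=0.10, seed=0):
    """Generate hypothetical people; 'ability' is never seen by the analyst."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        ability = rng.gauss(0, 1)          # unobserved, raises schooling and wages
        lucky_birth = rng.randint(0, 1)    # random accident, unrelated to ability
        schooling = 12 + lucky_birth + ability + rng.gauss(0, 1)
        log_wage = true_return * schooling + 0.5 * ability + rng.gauss(0, 0.5)
        rows.append((lucky_birth, schooling, log_wage))
    return rows


def naive_slope(rows):
    """Slope from regressing wages on schooling directly (confounded by ability)."""
    s_bar = mean(r[1] for r in rows)
    w_bar = mean(r[2] for r in rows)
    cov = sum((r[1] - s_bar) * (r[2] - w_bar) for r in rows)
    var = sum((r[1] - s_bar) ** 2 for r in rows)
    return cov / var


def wald_estimate(rows):
    """Wage gap between the two birth groups divided by their schooling gap."""
    lucky = [r for r in rows if r[0] == 1]
    unlucky = [r for r in rows if r[0] == 0]
    wage_gap = mean(r[2] for r in lucky) - mean(r[2] for r in unlucky)
    schooling_gap = mean(r[1] for r in lucky) - mean(r[1] for r in unlucky)
    return wage_gap / schooling_gap


rows = simulate()
naive = naive_slope(rows)
wald = wald_estimate(rows)
print(f"naive estimate: {naive:.3f}")
print(f"Wald estimate:  {wald:.3f}")
```

In this simulated world the naive comparison overstates the return to schooling, because ability drives both schooling and wages, while the comparison that uses only the random accident of birth lands close to the assumed 0.10.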

The beauty of these contributions is that they were not founded on a complex and technical mathematical or statistical result that would be undecipherable to the layman, but on a simple and profound intuition of how randomization may be found even in our messy and non-random world, thus making causal inferences tenable. Three cheers!

Vivek Dehejia is associate professor of economics and philosophy at Carleton University, Canada
