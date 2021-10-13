Bear in mind that a statistical correlation observed between two variables in a data set is not, in itself, evidence of a causal relationship. Take an example close to home. During the pandemic, my classes moved online, with a pre-recorded two-hour lecture followed by a ‘live’ one-hour Q&A session. Attendance at the latter was highly recommended, but not mandatory. Consistently, I observed that students who attended and participated actively in the discussion session performed better in the course. But is this because my discussion session helped them perform better? Flattering as that would be for any professor, it is equally plausible that the causation runs the other way: students who were already doing well were the ones who chose to participate, which is what we call ‘reverse causality’. Or perhaps unobserved differences between those who joined and those who did not could explain the correlation: say, reliable internet access and time to read, study and discuss, as opposed to struggling with poor connectivity, work, and school. In this case, is it non-random selection, rather than causality, that matters more for outcomes?
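To see how selection alone can produce such a correlation, here is a minimal Python sketch. Everything in it is hypothetical and invented for illustration, not drawn from any real course data: the unobserved ‘resources’ score (a stand-in for connectivity and free time), the 70-point baseline, the effect sizes, and the function names `simulate_course` and `attendance_gap`. By construction, attending the Q&A has zero causal effect on grades.

```python
import math
import random
import statistics


def simulate_course(n=10_000, randomized=False, seed=0):
    """Return (attended, grades) for n hypothetical students.

    Assumption: attendance has NO causal effect on grades in this model.
    An unobserved 'resources' score raises the grade and, unless attendance
    is randomized, also raises the chance of attending.
    """
    rng = random.Random(seed)
    attended, grades = [], []
    for _ in range(n):
        resources = rng.gauss(0.0, 1.0)  # unobserved confounder
        if randomized:
            p_attend = 0.5  # coin flip: attendance independent of resources
        else:
            # Self-selection: better-resourced students are more likely to join.
            p_attend = 1.0 / (1.0 + math.exp(-2.0 * resources))
        attended.append(rng.random() < p_attend)
        # Grade depends on resources plus noise, never on attendance.
        grades.append(70.0 + 8.0 * resources + rng.gauss(0.0, 5.0))
    return attended, grades


def attendance_gap(attended, grades):
    """Mean grade of attendees minus mean grade of non-attendees."""
    yes = [g for a, g in zip(attended, grades) if a]
    no = [g for a, g in zip(attended, grades) if not a]
    return statistics.fmean(yes) - statistics.fmean(no)


if __name__ == "__main__":
    for randomized in (False, True):
        gap = attendance_gap(*simulate_course(randomized=randomized))
        label = "randomized" if randomized else "self-selected"
        print(f"{label:>13}: attendees score {gap:+.1f} points higher")
```

In the self-selected run, attendees still come out well ahead even though the Q&A does nothing in this model; when attendance is assigned by coin flip, the gap shrinks to roughly zero. That is the logic behind randomized experiments: random assignment severs the link between the hidden confounder and who receives the ‘treatment’.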