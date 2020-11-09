As populists have risen to power in countries from Brazil and the United States in the West to Turkey, India and the Philippines in Asia, pre-election polling appears to be systematically underestimating their vote counts, particularly when they are incumbents. Democracies around the world have vastly different election systems (for instance, the US has an electoral college based on states and India has a first-past-the-post parliamentary system by constituency), so it is non-trivial to compare them. However, there is one common factor among populists: they connect directly with their electoral base through social media.

The presidential election system in the US is arcane. The basic unit for the setting of election rules, counting of votes and certification is the county. There are over 3,140 county-equivalent subdivisions in the US. A typical pre-election national poll in the US “surveys” between 1,500 and 5,000 people. This group is called a sample, and the aim of the survey methodology is to make the sample as representative as possible of each county and of the whole population. Since internet tools are widely available, surveys of vastly different quality, from the statistically robust to the poorly designed, get reported. National polls immediately prior to the presidential election gave Joseph Biden a 12-percentage-point advantage over Donald Trump with a margin of error of +/- 3 percentage points. In states like Pennsylvania, Biden held a survey advantage of about 5 points. Once all the votes are counted, Biden is likely to have won 4% more votes across the US than Trump, and a little less than 1% more in Pennsylvania. Nationwide and in most states, Trump’s actual vote shares are about 4-5 percentage points greater than the polls suggested, which is 1-2 points outside the margin of error.
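The arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustration under textbook assumptions: a simple random sample, a 95% confidence level and a worst-case 50/50 split. Real polls apply weighting and design adjustments that typically widen the margin. The function names are hypothetical, and splitting the miss on the lead evenly between the two candidates is a simplification that ignores undecided and third-party voters.

```python
import math

def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate 95% margin of error (as a fraction) for a simple random sample."""
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)

def margin_miss(polled_lead, actual_lead):
    """Return (miss on the lead, rough miss per candidate), both in points."""
    miss = polled_lead - actual_lead
    # Assumption: the miss is shared equally by the two candidates.
    return miss, miss / 2

# Sample sizes quoted in the column for a typical national poll.
for n in (1500, 5000):
    print(f"n={n}: +/- {100 * margin_of_error(n):.1f} points")

# Polled and likely actual national leads quoted in the column.
lead_miss, per_candidate = margin_miss(polled_lead=12, actual_lead=4)
print(f"miss on the lead: {lead_miss} points, about {per_candidate:.0f} per candidate")
```

Under these assumptions, a sample of 1,500 gives a margin of roughly 2.5 points and a sample of 5,000 roughly 1.4 points, so a miss of about 4 points per candidate is hard to attribute to sampling error alone.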

Many things could have gone wrong with pre-election polls in the US this year. First, nearly 159 million people voted in this presidential election, the largest number in US history, handing President-elect Biden the largest-ever vote total of more than 75 million. Despite his loss, Trump increased his popular vote total by nearly 8 million votes, to about 71 million. Turnout also exceeded two-thirds of eligible voters for the first time in over a century. These two factors alone may have complicated the surveys, but a well-designed survey should have been able to pick them up. Another factor could be that the demographic groupings that surveys typically use may have been even less homogeneous than in prior elections. For instance, Latino voters in and around Miami-Dade County favoured Democrats less than they did in the last presidential election. One reason offered by non-statisticians is that people may not reveal their true preferences in a survey. While that could be true for processes that involve the fear of recrimination, it is unlikely to be a factor in situations where surveys and actual votes are anonymous, as in India and the US.

Tenuous as it is, if one were to generalize from the US experience in 2016 and 2020 and the Indian experience in 2019, polling practice will need to change. These changes are necessitated by a few imperatives: 1) a growing number of less-educated voters are voting for populists, and these voters are more difficult to reach through typical pre-election surveys; 2) the structure of the voting unit and how it fits into calling the overall election needs improved modelling; and 3) we live in an age when politicians are reaching their core base directly through social media and hyper-local radio. Getting a representative sample of voters requires a much wider, but still random, sample than used before. It also means that survey organizations need to use the same social media methods as politicians do, rather than relying on phone calls, as in the US and in some surveys done in India. Improved modelling means that, in addition to surveys, pollsters will need to add detailed voting-unit information, including prior election performance data and stochastic forecasts. Linear modelling, which is typical of election forecasts, will need to be complemented with “option analysis”, which allows for the “knock-in” and “knock-out” characteristics that arise in different types of elections. In this year’s US election, for the first time, some forecasters added detailed scenario and pathway analysis, which is one way to account for non-linearity.
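To make the idea of stochastic forecasting and pathway analysis concrete, here is a minimal Monte Carlo sketch in Python. It is not the method of any particular forecaster. Every input is hypothetical: the five battleground states, their electoral votes and polled leads, the safe votes, the winning threshold and the size of the polling errors are all invented for illustration. The sketch assumes a polling error shared by all states plus an independent error in each state, and it treats each distinct set of states carried in a winning trial as one pathway.

```python
import random

# Hypothetical battlegrounds: name -> (electoral votes, polled lead for candidate A in points).
STATES = {
    "State 1": (20, 5.0),
    "State 2": (16, 1.0),
    "State 3": (11, 3.0),
    "State 4": (10, 7.0),
    "State 5": (29, -2.0),
}
SAFE_VOTES_A = 40  # hypothetical votes A holds outside the battlegrounds
VOTES_TO_WIN = 70  # hypothetical winning threshold

def simulate(trials=20000, national_sd=3.0, state_sd=3.0, seed=1):
    """Return (share of trials won by A, number of distinct winning pathways seen)."""
    rng = random.Random(seed)
    wins = 0
    pathways = set()
    for _ in range(trials):
        # One polling error shared by every state, plus an independent one per state.
        national_error = rng.gauss(0, national_sd)
        carried = frozenset(
            name
            for name, (_, lead) in STATES.items()
            if lead + national_error + rng.gauss(0, state_sd) > 0
        )
        total = SAFE_VOTES_A + sum(STATES[name][0] for name in carried)
        # The threshold is the non-linear step: one vote short counts as a loss.
        if total >= VOTES_TO_WIN:
            wins += 1
            pathways.add(carried)
    return wins / trials, len(pathways)

probability, n_pathways = simulate()
print(f"A wins in {probability:.0%} of trials via {n_pathways} distinct pathways")
```

The shared error term matters in a sketch like this: if polls miss in the same direction everywhere, several states can flip together, which a model that treats each state independently would understate.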

Short of polling every single voter in a country and assuming they do not change their mind before the actual ballot, there is no way to get a forecast exactly right. While stochastic forecasting could indeed become state-of-the-art, its complexities are difficult to convey to lay audiences. Projecting that Trump had a 10-20% chance of victory with 3,000 fewer pathways than Biden may infuriate people more than it informs them. That said, pre-election forecasting is an important and necessary tool for democracies to function. Forecasts will simply have to get better and be communicated more effectively than they have been in recent years.

P.S.: “I haven’t trusted polls since I read that 62% of women had affairs during their lunch hour. I’ve never met a woman in my life who would give up lunch for sex,” said humorist Erma Bombeck.

Narayan Ramachandran is chairman, InKlude Labs. Read Narayan’s Mint columns at www.livemint.com/avisiblehand

