Even big data can’t foretell who’ll win the White House
Updated: 07 Sep 2020, 08:43 PM IST
For all their trumped-up powers of prediction, algorithms are clueless about the outcome
The US presidential election is upon us. Most readers would agree that this will be the most important poll in the history of American democracy—244 years old by one measure, but only 55 years old by a more accurate one. I refer here to the fact that America was not a complete democracy until 1965, when then President Lyndon B. Johnson signed the Voting Rights Act, which finally gave African-Americans the unfettered right to vote.
I can’t help but think back with a chuckle to when globalization pundits described India’s growing economic association with the US, driven largely by offshoring, as the “meeting of the world’s largest democracy with the world’s oldest”. Evidently lost on them was the fact that African-Americans were not given the right to vote until well into the 1960s, almost 20 years after India became a full democracy.
Such comparisons between the American and Indian democratic traditions usually made it into the presentations of Indian information technology services firms that were trying to win offshore outsourcing contracts after the turn of the century. The dotcom bust of 2000 wiped out many jobs held by Indian programmers living in the US on H-1B visas. But a concurrent boom in fibre-optic cable links serendipitously created a glut of telecom bandwidth, which allowed American firms to send work offshore to India, where most of those H-1B programmers had returned.
I used to argue that the US decibel level on issues of immigration and trade barriers followed a four-year cycle of peaks and valleys, cresting right before an election, after which the deafening roar would peter out to a distant rumble. Once an election is decided, I would argue, the country goes back to business as usual and H-1B visas get issued in much the same numbers.
Protectionism and harangues over immigration are now no longer just election-year rhetoric. The world began changing inexorably in 2016. A backlash from an uneasy Caucasian populace in countries such as Britain and America has created a new world order that one needs to accept and adjust to. While globalization and trade are far from dead, they have been knocked into a different playing field, where the rules are different and the players tend to play rougher.
While I am apolitical, at least in public, I can’t help but observe the US political process with interest, and note how it has been affected in so many ways by the use, misuse and timing of “data-driven” announcements “on both sides”. Never has a US election campaign been smudged with as much invective as the current one. Mutual hatred between the Democratic and Republican parties seems to be the only constant theme, and constructive debate on issues affecting both America and its sphere of global influence has been pushed to the fringes.
The run-up to this election is unique. Data scientists, along with the polling organizations that almost universally failed in 2016 to predict both Brexit and the outcome of the US presidential election, would do well to approach the mug’s game of making predictions far more gingerly this time.
For a predictive algorithm, all you need to reflect the mood of a US population of more than 300 million is a representative sample of about 380 people, which is enough for a margin of error of roughly five percentage points at the conventional 95% confidence level. However, polling organizations use samples of around 1,000 for most of their polls to bring the margin of error down to about three percentage points, and they conduct these polls regularly during election season.

The spectre of an “October surprise”, which has turned many a past election on its head, looms much larger this year than ever before. In 2016, it was the Federal Bureau of Investigation (FBI) reopening its probe of candidate Hillary Clinton’s personal email server. The case was reopened less than two weeks before the 2016 ballot, and FBI director James Comey revealed just two days before the election that the FBI had determined Clinton should not be prosecuted, just as his agency had already concluded in July that year. Yet that single event arguably cost Clinton the White House.
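The sample sizes cited earlier (about 380, and around 1,000) follow from the textbook formula for estimating a proportion, n = z²·p(1−p)/E², where z is the normal quantile for the chosen confidence level and E is the margin of error. The sketch below is illustrative only and rests on stated assumptions: a simple random sample, a 95% confidence level, and the worst-case 50/50 split (p = 0.5). The function names are hypothetical, and real pollsters also weight their samples, which this ignores. It also shows why population size barely matters: the finite population correction for 300 million people changes almost nothing.

```python
import math
from typing import Optional

# Two-sided 95% confidence level: the 0.975 quantile of the standard normal.
Z_95 = 1.959964


def required_sample_size(margin: float, z: float = Z_95, p: float = 0.5,
                         population: Optional[int] = None) -> int:
    """Respondents needed to estimate a proportion within +/- margin.

    Assumes simple random sampling. p = 0.5 is the worst case because it
    maximises the variance p * (1 - p). If a population size is given,
    the finite population correction is applied.
    """
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


def margin_of_error(n: int, z: float = Z_95, p: float = 0.5) -> float:
    """Half-width of the confidence interval for a proportion from n respondents."""
    return z * math.sqrt(p * (1 - p) / n)


if __name__ == "__main__":
    # +/- 5 points for a population of 300 million (hypothetical round figure).
    print(required_sample_size(0.05, population=300_000_000))  # 385
    # Margin of error, in percentage points, for a typical 1,000-person poll.
    print(round(margin_of_error(1000) * 100, 1))  # 3.1
```

Because the margin of error shrinks only with the square root of n, going from roughly 385 to 1,000 respondents trims it from about five points to about three, and nothing in this arithmetic accounts for surprises that change voters’ minds after the poll is taken.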
This year, however, a number of surprises lurk before election day on 3 November. First, it has been over a hundred years since a pandemic played a role in a US election, and there is simply no data on which to base an empirical analysis of whether a pandemic favours a challenger or an incumbent. Second, while the stock market has been sizzling, the economy has clearly not, and the cost in terms of job losses is staggering. No one really knows how these economic indicators influence an election when they are in such stark contrast to one another. Third, the US has been roiled by protests and hard crackdowns more reminiscent of those seen in China 30 years ago than of anything the US itself has witnessed since the civil rights uprisings of the 1960s. And, last but not least, fierce battles rage over whether Americans can exercise their right to vote during the pandemic by mailing in their ballots, rather than risk infection at possibly crowded polling booths.
What might fans of algorithms and big data have to say? If we have truly crossed the chasm of being able to understand how data functions and how it can lead businesses or election campaigns to make good decisions, we should not be where we are. Given all the so-called intelligent data algorithms that have been written about over the last few years, a reliable prediction should have emerged some time ago. But here we are, still completely clueless.
Siddharth Pai is founder of Siana Capital, a venture fund management company focused on deep science and tech in India