Big Data and the US election
The US election is upon us. We watch the see-sawing in the polls with bated breath—last Wednesday’s Mint carried a blurb that Donald Trump now has a slender lead over Hillary Clinton. The truth is we will not know the result until all is said and done on 8 November.
While I am apolitical, at least in public, I can’t help but observe the political process itself with interest, and note how it has been affected in so many ways by the use, misuse and timing of data.
Never has a US election campaign been smudged with as much invective as the current one. Mutual hatred seems to be the one constant theme, and constructive debate on the issues, both inside America and within its large sphere of global influence, has been pushed to the fringes. And much of what lies at the core of this shift to hatred is the use and misuse of data in ways that one simply couldn’t imagine a few years ago.
Data never dies in the Internet age. Everything one has ever written or been recorded saying is saved forever on some obscure server somewhere on the planet. Going in and deleting data doesn’t work either. It can simply be recovered, even by a hacker with limited skills. Try pulling out of Facebook, for example. While your account will be shut down, Facebook will tell you that you are better off deactivating it so you can come back to it at some future time. It brings to mind Don Henley and Glenn Frey’s immortal line from the song Hotel California, the Eagles’ greatest hit: “You can check out any time you like, but you can never leave”.
Data is now measured in zettabytes (one zettabyte is equivalent to one trillion gigabytes). Studies show that in 2006, the Internet produced about 0.16 zettabytes of data, a figure that grew to 8.5 zettabytes by the end of last year, which works out to a compound annual growth rate of roughly 50%. Someone trawling the Internet is bound to stumble upon something of shock value, which can be used at the appropriate time to completely take the wind out of a candidate’s sails.
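The growth rate implied by two endpoint figures follows from the standard compound-growth formula. What follows is a minimal sketch, taking the 0.16 and 8.5 zettabyte figures above as given; the function name `implied_cagr` is illustrative, and treating the span as either nine or ten years is an assumption, since the exact count depends on how one reads "decade".

```python
def implied_cagr(start, end, years):
    """Compound annual growth rate that turns `start` into `end` over `years` years."""
    return (end / start) ** (1 / years) - 1

# Endpoint figures quoted in the text, in zettabytes.
start_zb, end_zb = 0.16, 8.5

# The number of years is an assumption: 2006 to end-2015 is nine years,
# while "the next decade" suggests ten.
for years in (9, 10):
    rate = implied_cagr(start_zb, end_zb, years)
    print(f"{years} years: {rate:.0%} per year")
```

Either reading lands in the neighbourhood of 50% a year, which is the order of magnitude that matters for the argument.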
The media has had a field day discussing the various “October surprises” that came out last month: Federal Bureau of Investigation (FBI) director James Comey’s decision to reopen the email case against Clinton (yes, those emails are still live on a server somewhere) and Trump’s foul language about women, which was taped and digitized a decade ago. And so it should. The voting public has a right to know about these things, and the media, however biased it may or may not be, still has a duty to bring these sorts of issues out into the open when people are trying to decide who their next leader will be.
And then there are the ubiquitous polls. For a confidence level of 95% with a margin of error of 5%, a sample of about 385 is all you need to reflect the mood of the 300-plus million US population. However, poll organizations like Gallup use a sample size of around 1,000 for most of their polls, which brings the margin of error down to about 3%, and conduct these polls on a regular basis during the election season. These polls are then made public in the media, and take on more significance each time an October (or earlier) “surprise” surfaces. The New York Times, in an article on 31 October, pointed out that some of these surprises were held out as true turning points that permanently shifted the race. In truth, while many of the polls have shifted back and forth, and despite “state maps”, FBI flip-flops and “Bayesian flips” that involve sophisticated statistical reasoning, we are no closer to knowing who will win the race.
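For readers who want to check the sample-size arithmetic, here is a rough sketch using the textbook formula for a proportion, n = z² · p(1 − p) / e². It assumes simple random sampling, the worst-case split p = 0.5, and a population large enough that no finite-population correction is needed. The function names are illustrative, and this is not a description of how Gallup or any other pollster actually designs its surveys.

```python
import math

Z_95 = 1.96  # z-score for a two-sided 95% confidence level

def sample_size(margin, p=0.5, z=Z_95):
    """Respondents needed for a given margin of error (very large population)."""
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

def margin_of_error(n, p=0.5, z=Z_95):
    """Margin of error for a simple random sample of n respondents."""
    return z * math.sqrt(p * (1 - p) / n)

# A 5% margin at 95% confidence needs a few hundred respondents.
print(sample_size(0.05))

# A sample of 1,000 tightens the margin to about three percentage points.
print(round(margin_of_error(1000) * 100, 1))
```

Note that the population size does not appear in the formula at all, which is why a few hundred respondents can stand in for 300 million people. Real polls also weight and adjust their samples, so their published margins will differ from this idealised figure.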
What say you, fans of ‘analytics’ and ‘big data’? If we have truly crossed the chasm of being able to understand how data functions and how it can lead businesses—and by extension, election campaigns—into making the right decisions, we should not be where we are. Given all the so-called intelligent data algorithms that have been written over the past few years, a clear winner should have emerged some time ago. These same algorithms have been guiding the decisions made by businesses for a while now—especially with respect to strategic decisions and shifts in company direction based on consumer behaviour. In a world where corporations have had to make binary choices (either path X or path Y, but not both), these algorithms fall short.
The New York Times piece goes on to say that this may well be because many voters are still undecided or, like me, will refuse to divulge their choice to a pollster. These voters sit on the margin, which is where the final result always lies in a binary choice; they will be the ones who finally decide who wins on Tuesday. It also points out that a majority of voters dislike both candidates and would prefer not to vote for either of them. This is a sad state of affairs for the American polity. That said, as a quote variously attributed to Alexis de Tocqueville and to Joseph de Maistre has it, “the people get the government they deserve”.
And the big data experts get back to the drawing board to continue to create fear, uncertainty and doubt—all of which can only result in more business for them and their ilk.
Siddharth Pai is a world-renowned technology consultant who has personally led over $20 billion in complex, first-of-a-kind outsourcing transactions.