It is presidential election season in the US, which means opinion poll time. Which always means plenty of material for number nuts like me.
Irrelevant aside: Did you know that in Panaji, Goa, you can find a statue of the “Opinion Pollacho Baap”, or “Father of Opinion Poll”?
That apart. Let’s say a Time magazine poll has Barack Obama leading Mitt Romney by a margin of 52% to 48%. It also says, as such polls usually do, that these numbers come with a “margin of error” of 4%.
So what does this mean? If the election were held on the same day Time conducted its poll, would Obama win?
But before you answer: a week later, another Time poll has Romney ahead 51% to 49%, with the same margin of error. What does this mean? Has Romney surged ahead, so that if the election were held then, he would win?
In both cases, we can’t say. The reason is that margin of error. Applied to these hypothetical Time polls, it suggests that the race is a dead heat. Why so, I will try to explain, and I admit to no margins of error.
What such polls do is gauge a national opinion. Whom will you vote for, they ask, Obama or Romney? They can’t ask every American voter that question (there are the logistics of approaching millions of people, and that’s the November election’s job anyway), so they ask it of a representative sample. “Representative”, because you don’t want pollsters approaching, let’s say, only mothers-in-law. The sample must resemble the overall voting population, and that means other in-laws too, as well as more ordinary folk.
What pollsters learn from the responses of this sample, they extrapolate to the country as a whole.
Which raises the question: how large a sample? A sample size of one doesn’t cut it. Why? Well, imagine a pollster asking Lady Gaga, “I say, whom will you vote for, milady, for President?” Suppose Gaga, for her own perhaps intoxicated reasons, mutters “Mulberry Vodka Desdemona”. Would the pollster be right in reporting “100% of the country wants MV Desdemona as President”, thus setting champagne flowing in obscure Desdemona households?
You’d laugh him off his podium if he did that. (Though of course, he would be right if he reported “100% of my sample wants Desdemona as Prez”, not that that tells us anything much).
It would be just as laughable to extrapolate things about a whole country from sample sizes of two (Gaga, Justin Bieber) and three (Gaga, Bieber, Shania Twain) and so forth. Yet as the pollster increases his sample size, there comes a point when he can legitimately make claims about a whole country. Example: the Obama vs Romney results that I started this article with.
When does that point come?
That really depends on how accurate the claim needs to be. It’s true that the larger your sample, the more accurate your numbers will be. It’s also true that beyond a point, the gains in accuracy become too small to matter. For opinion polls, pollsters will typically pick a sample size that gives them a 4% margin of error. (We can calculate the size from a given margin of error, but that, another time).
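For the impatient, here is a rough version of that calculation as a minimal Python sketch. It assumes the textbook formula for a simple random sample at 95% confidence, n = z²·p(1−p)/E², with z ≈ 1.96 and the worst case p = 0.5 (a 50-50 split needs the largest sample). Real pollsters weight and adjust their samples, so treat this as an illustration, not as Time’s method; the function names are labels chosen for this sketch.

```python
import math

# z-score for 95% confidence under the normal approximation
# (the textbook convention; an assumption, not any pollster's stated method)
Z_95 = 1.96

def sample_size(margin, p=0.5, z=Z_95):
    """People needed for a simple random sample to have the given margin
    of error, expressed as a fraction (0.04 means four points).
    p = 0.5 is the worst case and gives the largest sample."""
    return math.ceil(z * z * p * (1 - p) / (margin * margin))

def margin_of_error(n, p=0.5, z=Z_95):
    """The reverse calculation: margin of error for a sample of n people."""
    return z * math.sqrt(p * (1 - p) / n)

for m in (0.05, 0.04, 0.03):
    print(f"{m:.0%} margin: about {sample_size(m)} people")
print(f"601 people: margin of {margin_of_error(601):.1%}")
```

For a four-point margin this asks for 601 people; a three-point margin needs 1,068. Because the required sample grows with the square of 1/E, halving the margin roughly quadruples the sample, which is why the gains in accuracy eventually stop being worth the cost.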
What that margin means, roughly, is this: if you conducted the same poll 100 times, each time with a fresh sample, about 95 of them would produce numbers within four points of the true national figure. Turn that around: 95 times in 100, the truth lies within four points of whatever your poll says. Taking our first Time poll figures, Obama’s true support most plausibly lies between 48 and 56, Romney’s between 44 and 52. A repeat poll might show Obama 55, Romney 45; or, as the second poll has it, Romney 51, Obama 49. All possible. Because of the margin of error, each of those is effectively the same result.
And the other five times? Your numbers will be more than four points off the truth. Or, possibly, “Desdemona”.
Put another way, you can be 95% confident that your poll is within its margin of error of the national mood. There’s a 5% chance that it is off by more than that.
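To see that 95-in-100 claim in action, here is a minimal Python simulation. It assumes things a real poll never knows: a hypothetical “true” national share of 52% for one candidate, a simple random sample of 601 voters (the textbook size for a four-point margin at 95% confidence), and voters answering independently. The function name `simulate_polls` and its parameters are illustrative choices, not anything pollsters use.

```python
import random

def simulate_polls(true_share=0.52, n=601, polls=4000, margin=0.04, seed=1):
    """Run many independent simulated polls of n voters each, drawn from a
    population where true_share of voters back the candidate. Return the
    fraction of polls that land within `margin` of the true share."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(polls):
        # each respondent independently backs the candidate
        # with probability true_share; True counts as 1 in the sum
        backers = sum(rng.random() < true_share for _ in range(n))
        if abs(backers / n - true_share) <= margin:
            hits += 1
    return hits / polls

print(f"{simulate_polls():.1%} of simulated polls were within four points")
```

It should print a figure close to 95%. The polls that miss mostly miss narrowly, landing just beyond the four-point band rather than somewhere absurd.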
Confusing? It really need not be. The point is this: because opinion polls cannot ask every voter in the country for a response, they must take a sample. Which means we also need a sense of how close to reality their results are. That’s what the margin of error tells us.
And this applies elsewhere, too. For example, despite what you may have heard, scientists haven’t actually found the Higgs Boson. Instead, their experiments suggest, with a very high level of confidence, that it exists. How confident? Well, if the Higgs Boson did not exist, the chance that random fluctuations alone would produce a signal as strong as the one they saw is about one in 3.5 million.
That’s the kind of standard that discoveries in particle physics are held to. Talk about Pollacho Baap.
Which is why I’m more willing to place a bet on the existence of the Higgs Boson than on a future Desdemona presidency in the US.
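The 3.5 million figure corresponds to what particle physicists call a five-sigma result: the chance that a normally distributed fluctuation lands at least five standard deviations above its expected value. A minimal Python sketch of that arithmetic, using the standard library’s complementary error function, sets it beside the 95% convention used for polls. The one-sided tail for discoveries and the normal approximation are the usual conventions, assumed here; `one_sided_tail` is a name chosen for this sketch.

```python
import math

def one_sided_tail(sigmas):
    """Probability that a standard normal variable lands more than
    `sigmas` standard deviations above its mean."""
    return 0.5 * math.erfc(sigmas / math.sqrt(2))

# Opinion polls: 95% confidence means a 5% chance of landing more than
# 1.96 standard deviations away on EITHER side (two-sided).
print(f"poll, 1.96 sigma two-sided: {2 * one_sided_tail(1.96):.3f}")

# Particle physics "discovery" standard: five sigma, one-sided.
p = one_sided_tail(5)
print(f"discovery, 5 sigma: p = {p:.2e}, about 1 in {1 / p:,.0f}")
```

The poll line comes out at about 0.05, and the five-sigma line at roughly one in 3.5 million: the same idea as the pollster’s margin of error, with a far stricter bar.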
(Thanks to Gurudatt Bhobe and Harish Karnick for their thoughts on this essay.)
Once a computer scientist, Dilip D’Souza now lives in Mumbai and writes for his dinners. A Matter of Numbers will explore the joy of mathematics, with occasional forays into other sciences.
To read Dilip D’Souza’s previous columns, go to www.livemint.com/dilipdsouza