The survey was conducted across 35 assembly constituencies (every odd numbered constituency among the 70 constituencies for the Delhi assembly), giving an average of about 96 respondents per constituency. How much information can we get by surveying just 96 respondents? Photo: HT

AAP poll survey raises more questions than it answers

Survey gives access to raw data, but it is sponsored by a political party, and its methodology is unknown

In what is probably a first in opinion polling in India, the Aam Aadmi Party (AAP) has made public the raw data from a poll it conducted in August.

According to the data put out by the party on its website (http://goo.gl/trRrTm), 3,372 people responded to the survey, in which they were asked questions related to who they were planning to vote for, the issues they considered important and their views on who was going to win the elections.

For an independent analyst not affiliated to any polling agency, this is exciting, since access to raw data means an opportunity to examine correlations that are not normally reported. For example, how many people who plan to vote for the Congress also think it is going to form the next government (78%)? What proportion of people who consider corruption the biggest electoral issue actually plan to vote for the Aam Aadmi Party (18%)? These are questions that opinion polls can normally answer, but the answers are usually out of reach since polling agencies do not release raw data.

The most interesting part of the survey is that 27% of the respondents (916 out of 3,372) declined to answer the question on which party they were going to vote for. Given that other polls usually don’t disclose this number, it is hard to say whether this is high; but assuming it is, it is probably because the survey makes clear upfront who is conducting it. The fifth question asks which party the respondent will vote for. The next is on whether the respondent has made up her mind, and the seventh question is about whether the respondent knows about the Anna Hazare campaign against corruption.

While the question on voting intention comes before the interviewer’s affiliation with AAP is made explicit, it is not clear whether the interviewer conceals the sponsor’s identity until that point. Parties are free to sponsor their own surveys, but from an information-gathering perspective it is important that the sponsor’s identity be kept secret till the end of the interview, for knowledge of the sponsor might bias the responses. For example, if I’m yet to decide who I’m going to vote for and a surveyor sponsored by the Bharatiya Janata Party (BJP) interviews me, it is likely that I’ll tell him that I’ll vote for the BJP.

The other important drawback of this survey is that the methodology has not been disclosed. If the survey was sponsored by a political party and the methodology is unknown, how do we know that the sample surveyed is actually random? One check we can do is to compare the demographic distribution of the respondents with the corresponding distribution of the population. A large discrepancy here would reveal a degree of non-randomness in the sampling, though the converse does not hold: a sample that was not chosen at random can still mirror the demographic characteristics of the population.
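A minimal sketch of such a demographic check, using a chi-square goodness-of-fit statistic. All the numbers here are hypothetical, purely for illustration: the survey data would supply the observed counts per demographic band, and census figures the population shares.

```python
# Hypothetical demographic check: observed respondent counts per age band
# compared with expected counts derived from census proportions.
# Every number below is illustrative, not from the AAP survey.
observed = [820, 910, 760, 540, 342]            # respondents per age band
census_shares = [0.24, 0.26, 0.22, 0.17, 0.11]  # population proportions (hypothetical)

n = sum(observed)
expected = [share * n for share in census_shares]

# Chi-square goodness-of-fit statistic: sum of (O - E)^2 / E over bands.
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# The 5% critical value for 4 degrees of freedom is about 9.49; a statistic
# above it would suggest the sample's demographics diverge from the population's.
print(round(chi2, 2), "vs critical value 9.49")
```

Note that passing this test is necessary but not sufficient for randomness, as the article points out: a quota-driven or otherwise non-random sample can match the population's demographics exactly.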

The survey was conducted across 35 assembly constituencies (every odd numbered constituency among the 70 constituencies for the Delhi assembly), giving an average of about 96 respondents per constituency. How much information can we get by surveying just 96 respondents?

To answer this question, we can use the Central Limit Theorem and Tchebycheff’s Inequality (the more mathematically minded readers are encouraged to refer to the slides of a lecture by Chennai Mathematical Institute’s Rajeeva Karandikar: http://goo.gl/2S5SPX).
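The formula itself is not reproduced in the article, but the standard Tchebycheff-type bound of the kind Karandikar presents, with p the true vote share and pn the share estimated from n respondents, takes this form (a reconstruction of the textbook inequality, not a quote from the slides):

```latex
P\left(|p_n - p| > \epsilon\right) \;\le\; \frac{p(1-p)}{n\epsilon^2} \;\le\; \frac{1}{4n\epsilon^2}
```

The second inequality follows because p(1-p) is at most 1/4, attained at p = 1/2.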

This bound can be improved further using the Central Limit Theorem.
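By the Central Limit Theorem, pn is approximately normally distributed with mean p and variance p(1-p)/n, which gives the tighter 99% confidence interval (again the standard construction, not quoted from the slides):

```latex
p_n \pm 2.58\,\sqrt{\frac{p(1-p)}{n}} \;\subseteq\; p_n \pm \frac{2.58}{2\sqrt{n}}
```

For n = 640 the half-width 2.58/(2*sqrt(640)) is just over 0.05, which is where the five-percentage-point figure below comes from.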

Cutting out the math, this simply means that if you interview 640 respondents, the absolute difference between the true vote share of a party ‘p’ and the estimated vote share ‘pn’ is less than five percentage points 99% of the time. If you interview only 100 respondents, then ‘p’ and ‘pn’ can vary by as much as 32 percentage points (99% confidence interval)!
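A quick sketch of how these margins of error scale with sample size, using the standard 99%-confidence constants. (The article's 32-point figure for 100 respondents does not fall out of either of these simple bounds, which give roughly 50 points via Tchebycheff and 13 points via the CLT, so it may rest on a different calculation in Karandikar's slides.)

```python
import math

def chebyshev_margin(n, confidence=0.99):
    # From P(|p_n - p| > eps) <= 1/(4*n*eps^2): set the right-hand side
    # to 1 - confidence and solve for eps.
    return math.sqrt(1.0 / (4 * n * (1 - confidence)))

def clt_margin(n, z=2.58):
    # Normal approximation with the worst case p = 0.5: eps = z * sqrt(0.25/n).
    # z = 2.58 corresponds to 99% confidence.
    return z * math.sqrt(0.25 / n)

for n in (100, 640, 1597):
    print(f"n={n}: Tchebycheff ±{chebyshev_margin(n):.3f}, CLT ±{clt_margin(n):.3f}")
```

At n = 640 the CLT margin is about ±0.051, matching the five-point claim; at n = 96 to 100, even the optimistic CLT margin is around ±13 points, so constituency-level estimates are very noisy either way.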

Given that half the contests in parliamentary elections are decided by seven percentage points or less (http://goo.gl/PM5G6k), a 32 percentage point error in the estimated vote share doesn’t give us too much information. So, it is not wise to read too much into the constituency-level survey.

At the state level, however, 2,456 (3,372 minus 916) voters indicated which party they will vote for. Of these, 1,597 said they have made up their minds on whom to vote for. Assuming that these 1,597 respondents are a truly random sample of Delhi’s voters (a leap of faith, since we don’t know the sampling methodology, and we also cannot rule out a bias among those who indicated they might change their minds) and using Karandikar’s formula above, the expected vote shares of the various parties are shown in the accompanying table. As of this survey in August this year, the BJP was expected to get the largest share of votes. However, we need to keep in mind that this comes from a survey sponsored by a political party, whose methodology we don’t know, in which a large proportion of respondents refused to answer the key question and some indicated they were yet to make up their minds. As they say in the cookery column, add salt to taste.
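For the state-level sample of 1,597 decided voters, a 99% confidence interval can be attached to each party's estimated share using the normal approximation described above. The party names and vote counts below are entirely hypothetical, since the accompanying table is not reproduced here; only the sample size of 1,597 comes from the survey.

```python
import math

def share_interval(votes, n, z=2.58):
    """99% normal-approximation confidence interval for a vote share."""
    p = votes / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

N = 1597  # decided voters in the survey
# Hypothetical counts for illustration only; the real figures are in the table.
for party, votes in [("Party A", 550), ("Party B", 480)]:
    lo, hi = share_interval(votes, N)
    print(f"{party}: {votes / N:.1%} (99% CI {lo:.1%} to {hi:.1%})")
```

With n = 1,597, each share carries a margin of roughly plus or minus three percentage points, so only gaps larger than that between parties are meaningful at this confidence level.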
