Not in data’s name: How not to be misled by biased statistics

Going by official statistics, there was a marked increase in the number of crimes committed against women in India in 2013—compared to the previous year, the number of crimes against women increased by a whopping 27%. However, before we jump to the conclusion that the number of crimes against women saw a significant increase in 2013, we need to look at the context.

In December 2012, a woman in Delhi was assaulted and raped by five men in a moving bus, causing massive national outrage (the incident came to be known in popular imagination as the “Nirbhaya case”). The victim later died of her injuries in a hospital. The widespread protests that followed drew attention to crimes against women. Women were encouraged to speak up and report any cases of sexual assault or violence. At the same time, police came under greater pressure to register cases of crimes against women, including sexual crimes, rather than brushing them under the carpet.

The protests and their aftermath meant that, in 2013, women in India were much more likely to report crimes and assaults, and police were more likely to register such cases. Hence, it is more than likely that the 27% rise in crimes against women that year was largely a consequence of increased reporting and data collection, rather than an increase in the incidence of such crimes.
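The arithmetic behind this argument can be sketched in a few lines. The figures below are purely illustrative assumptions (they are not official statistics): the true number of incidents is held constant, and only the share of incidents that gets reported and registered changes.

```python
def reported_cases(true_incidents: int, reporting_rate: float) -> int:
    """Cases that show up in the records = actual incidents x share reported."""
    return round(true_incidents * reporting_rate)


# Hypothetical numbers, chosen only to illustrate the mechanism.
TRUE_INCIDENTS = 1000   # assumed identical in both years
RATE_BEFORE = 0.300     # assumed share of incidents registered in year 1
RATE_AFTER = 0.381      # assumed share of incidents registered in year 2

before = reported_cases(TRUE_INCIDENTS, RATE_BEFORE)
after = reported_cases(TRUE_INCIDENTS, RATE_AFTER)
increase_pct = 100 * (after - before) / before

print(f"Recorded cases: {before} -> {after} ({increase_pct:.0f}% rise)")
print("Change in actual incidents: 0%")
```

Under these assumptions, the records show a 27% rise even though nothing changed on the ground. The recorded count alone cannot distinguish a change in incidence from a change in reporting.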

This story is relevant in the context of some recent discussions on social media regarding the case of mob lynchings in India. On Wednesday, rallies were held in several cities under the #NotInMyName banner to protest against recent cases of mob violence and lynchings across the country, and the government’s indifferent response. 

The protests were carried out on Twitter as well, and the hashtag “#NotInMyName” that the protesters used was soon trending in India. Not unexpectedly, left-right flame wars ensued, with one side coming out in support of the protests and the other decrying them as being without basis and “portraying India in bad light”.

While most such social media “discussions” are unworthy of comment, what made this particular set of discussions interesting was the introduction of data into the argument. Using data from media archives, one user reconstructed the number of possible lynchings over the last few years, especially the years when the previous government was in power, to argue that the number of lynching incidents this year is not out of the ordinary, and thus that the protests are baseless.

The introduction of data into any argument is usually welcome, as it can provide a solid basis for what can otherwise degenerate into a charade of name-calling. What made the data problematic in this particular case, however, was the choice of data source: a survey of newspaper archives.

The problem with using newspaper archives to reconstruct historical data is twofold. Firstly, the news media suffers from what can be described as the spectacularness bias. To put it simply, “man bites dog” is far more newsworthy than “dog bites man”. In other words, a mundane occurrence is seldom newsworthy since the quantum of information it conveys is rather low. For this reason, a good reporter is always on the lookout for news that is either surprising, counterintuitive or rare.

At the same time, the media likes to ride on existing narratives. An event that supports a prevailing narrative is more likely to get media attention than one that either runs contrary to or is unrelated to the narrative. 

These two together imply that media attention to a class of events (mob lynchings, for example) is at best volatile. Depending upon the prevailing narratives, the likelihood of an event being reported can vary significantly with time. Hence, using media archives to count how many events of a particular class happened in a particular period of time can lead to highly inconsistent data, from which little insight can be drawn (unless one is constructing an argument about the media’s inconsistencies, that is). 

In this particular case (mob lynchings), making use of official statistics is unlikely to be of much help either. As we saw in the case of crimes against women, the number of cases filed and registered is also a function of the prevailing narrative and atmosphere. If there is a prevailing narrative that mob lynching is in some sense “acceptable”, at the margin the number of lynching cases filed will be lower. In an atmosphere where lynching cases are being widely reported, on the other hand, more cases are likely to get filed.

Moreover, official crime statistics are messy for other reasons. Firstly, when a person has committed several crimes, only the first one goes into the statistical records, and there is no science to determine which of the crimes is listed first. For example, if a robber kills someone in the course of his robbery, and gets accused of both robbery and homicide, the case can go into the records as either a robbery or a homicide, but not both. This makes crime records notoriously messy. 
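A small sketch shows how this first-listed-offence rule distorts the tallies. The case list and function names below are hypothetical, made up for illustration; they do not reflect any actual record format.

```python
from collections import Counter

# Hypothetical cases: each is the list of charges in the order an officer
# happened to write them down.
cases = [
    ["robbery", "homicide"],   # robber kills someone; robbery listed first
    ["homicide", "robbery"],   # same kind of incident; homicide listed first
    ["robbery"],
]


def count_first_listed(case_list):
    """Tally only the first charge of each case, as described above."""
    return Counter(charges[0] for charges in case_list)


def count_all_charges(case_list):
    """Tally every charge in every case, for comparison."""
    return Counter(charge for charges in case_list for charge in charges)


recorded = count_first_listed(cases)
actual = count_all_charges(cases)

print("Recorded:", dict(recorded))
print("Actual:  ", dict(actual))
```

With these made-up cases, the records show one homicide where two occurred, and which incident gets counted as a homicide depends entirely on the order in which the charges were written down.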

And when it comes to the specific case of lynching, the crime records have no such official category. There is murder, rioting, dacoity and even “making preparation and assembly for committing dacoity”, but there is no “lynching”. So a case of lynching might get classified under different heads by different officers, making it nearly impossible to get accurate data. In fact, this might have been the reason for the use of media archives in the above discussion.

Finally, even if we were to prove that the current levels of mob violence are nothing out of the ordinary, does this mean that people should not protest? The trouble with making such an argument is that it is nearly impossible to determine what level of violence or crime is “acceptable” and what level should trigger protest. Protesters should be free to protest whenever they feel like it, and the worst that can happen is that the government might deem the protests to be too trivial and simply ignore them!

In this particular case, though, with the prime minister commenting on the issue, the government has admitted that the protests did have some merit. 

Counterintuitively, someone analysing this period of Indian history in the future might conclude that the incidence of lynchings went up following the prime minister’s call for peace. Much like the case of crimes against women described above, it would be yet another case of greater reporting and filing of complaints being mistaken for a higher crime rate.

The moral of the story is that the next time data is introduced into an argument, stop and question where the data came from, and how it was collected. Think about possible biases that might have gone into collection of the data, and how those biases might influence any insights you are trying to draw from the data. 

Tailpiece: a large number of data sets that we use in everyday discussions and analysis come out of surveys. This piece from Yes, Prime Minister illustrates how clever design can result in surveys giving the answer you’re looking for. 

Comments are welcome at feedback@livemint.com
