How citizen data led India’s covid battle

Municipal workers in Srinagar in early March, 2020. By 3 March, 2020, the fact that this was going to be a fast-spreading virus became clear. (Photo: Reuters)


  • When the government failed to serve up data needed to make key decisions, public-spirited citizens stepped up.
  • With new variants now altering the course of both the pandemic and vaccine development, transparent data has become even more important in the pandemic’s second year.

CHENNAI : March 2021 marks one year of India’s battle against the pandemic. Despite the time that has passed, large gaps remain in our understanding of its impact, and Indian researchers are using novel methods to fill in the blanks. For instance, two economists recently used a large sample survey, typically used to capture consumer behaviour, to estimate uncounted deaths. They found that deaths in India likely shot up in 2020, but more on account of the disruption of non-covid health services than directly from the SARS-CoV-2 virus itself.

Renuka Sane, an associate professor at the National Institute of Public Finance and Policy (NIPFP), and veteran economist Ajay Shah used the Centre for Monitoring Indian Economy (CMIE)’s Consumer Pyramids Household Survey, which surveys over 200,000 households three times a year, seeking details about the whereabouts of each family member.

The survey does not ask about cause of death. Using this data, Sane and Shah found a substantial increase in mortality in 2020. Their estimates show that deaths from all causes almost doubled between May and August compared with the same period in past years. Among poorer households, rural households, women, and the non-elderly, the increase was greater than covid alone would have suggested. “What this would suggest is that the disruption of health services caused by covid led to this increase in mortality, and not covid alone,” Shah told Mint.

This approach is emblematic of the creative solutions that Indian researchers have had to come up with to access data that in many other countries is routinely available via government sources. In South Africa, for instance, weekly updated all-cause mortality data is made publicly available by the country’s medical council.

In India, however, the end of 2020 did not mean that data for this crucial year would now be available—the most recent national all-cause mortality data is for 2018. As a result, researchers have turned to novel sources—including burial grounds and cremation ground records.

Dr. Arun N. Madhavan is a doctor of internal medicine who runs a clinic in Palakkad. He led a team of volunteers who scanned the state’s media for news reports and obituaries that mentioned a death from covid-19, and cross-referenced them against the state’s official list. By November 2020, the group said that 45% of the deaths they tallied never showed up in the government’s list. By early 2021, however, the group had to stop their watchdog work because most of the media had “moved on” and had stopped publishing local news about covid or obituaries, Madhavan said.

Throughout the pandemic, Indian journalists, scientists and citizens have had to battle the Indian state for access to basic data that is often easily available in most other parts of the world, including in resource-constrained Latin American countries. With new virus variants now altering the course of both the pandemic and vaccine development, transparent data will only become more important in the pandemic’s second year. Will India change course at least now?

The data gaps

By 3 March 2020, it had become clear that this was going to be a fast-spreading virus. From 30 January to 1 March, India reported only three cases, all of them among returnees from Wuhan in Kerala. Then, from 2 March onwards, super-spreaders and clusters were identified for the first time, including one cluster of 36 cases related to one Hyderabad engineer.

But just as panic set in, the reliable data streams necessary to make key decisions stopped. In mid-May 2020, as numbers were beginning to rise across the country, Telangana dropped information about testing from its daily bulletin with no explanation, and for the next month, no data on testing was available. It also reports little information by district. Additionally, Telangana does not release details of what share of its tests are rapid antigen tests and what share are RT-PCR tests. That breakdown matters because antigen tests have a lower sensitivity and, thus, are far more likely to produce false negatives.

Telangana was not the only state that did not give out enough data. Most states, including Bihar and Maharashtra, do not give district-level testing data. Other regions had their own idiosyncrasies: Chennai municipal corporation reports cumulative cases only, requiring one to subtract the previous day’s total to know how many new cases have been reported. Mumbai’s municipal corporation has an enviable dashboard which is updated each day, but no archive, and, as a result, no access to past data.
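The subtraction needed to recover daily figures from a cumulative series, as with Chennai’s numbers, can be sketched in a few lines of Python. This is only a minimal illustration: the function name and the totals below are hypothetical and are not actual corporation data.

```python
def daily_new_cases(cumulative):
    """Convert a list of cumulative case totals into daily new-case counts.

    The first day has no previous total to subtract, so the result is
    one element shorter than the input.
    """
    return [today - yesterday
            for yesterday, today in zip(cumulative, cumulative[1:])]


# Hypothetical cumulative totals for four consecutive days (not real data).
totals = [100, 130, 175, 240]
print(daily_new_cases(totals))  # [30, 45, 65]
```

A negative value in the output would indicate that an earlier total had been revised downwards, which is one reason an archive of past bulletins matters.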

But no government agency has fallen as short on data dissemination as the Union government itself. Data about non-covid health was the first victim. The Integrated Disease Surveillance Programme (IDSP) provides weekly alerts about disease outbreaks and the action taken to contain them. These reports stopped appearing from the middle of March until August 2020 with little explanation, and the number of disease outbreaks reported fell too.

Pulse of the pandemic

The National Health Mission’s Health Management Information System (NHM-HMIS) tracks indicators on utilisation of health services from over 200,000 health facilities and is updated nearly every day. After Mint wrote about the severe curtailment in health services in March 2020 as compared to previous months and years, the NHM stopped publishing this data. In August, the NHM-HMIS finally published data for April, May and June. After that data was also written about, the portal once again stopped publishing new data (see Chart 1).

Data on covid had similar gaps. When the Indian Council of Medical Research published a paper analysing the first million covid tests in India, the data showed, among other things, that for over half of the people tested by then, no information about the circumstances of the detection of the case (whether they were symptomatic, asymptomatic contacts, healthcare workers, etc.) had even been taken down (see Chart 2).

The bright spots

From this chasm of official apathy, a diverse range of voices has emerged to fill in the gaps. One key website, often mistaken for a government entity, is run by a volunteer-driven community of developers, public health professionals, data scientists, designers and other citizens who came together over a Telegram group in early March. The site is now such a reliable source of updated data on covid cases, deaths and vaccines that media and research organisations in India and across the world rely solely on it. In fact, the recently released Economic Survey 2020-21 uses its data, rather than crediting the Ministry of Health and Family Welfare (MoHFW).

Some states innovated too. In the early days of the pandemic, until the end of July 2020, Karnataka disclosed the name, age, gender, comorbidities and district of every new case, and assigned them a unique patient number, which allowed for tracking recovery times and outcomes. Bihar systematically monitored the testing status of returning migrants, allowing for early indications of high-spread source areas like Delhi, even before the official data had begun to show the surge in Delhi. Tamil Nadu continues to report demographic information for every covid death. West Bengal’s bulletins report the quantities of personal protective equipment including N95 masks and oxygen in great detail at the district level.

While many states possibly maintain demographic details for every positive case, Tamil Nadu and Andhra Pradesh disclosed this data—the world’s largest contact tracing dataset—to a team of researchers led by Ramanan Laxminarayan of Princeton University, who were then able to publish the first peer-reviewed paper on transmission patterns in India.

Much of the understanding of the epidemic in India has emerged from sero-surveys, and India has seen a flowering of them beyond those conducted nationally by the Indian Council of Medical Research.

“If you were to list out countries by the number of sero-surveys they did, India would be close to top of that list. Adjusted by income and health capacity, India probably outperformed most governments other than some in East Asia,” Anup Malani, professor at the University of Chicago’s Medical School, said.

Malani has led multiple sero-surveys in India with the IDFC Institute. IDFC contacted governments that were willing, and then tailored their sero-surveys around specific public health questions that the government needed answered.

The data from these sero-surveys is then analysed and shared on Twitter by a small but busy community of public health professionals, scientists and data enthusiasts to help identify important trends in the evolution of the pandemic. Murad Banaji, a lecturer in mathematics at Middlesex University in the United Kingdom who has spent the last year locked down in India, has shared his analysis of sero-survey data from Mumbai and Bihar on Twitter. Dr Anupam Singh, a doctor and a “hobbyist coder”, closely follows Indian and international research around not just sero-surveys but also drug and vaccine efficacy, posting threads with updates from vaccine trials.

Opaque governments

Despite these efforts, governments continue to be opaque instead of supporting private data researchers. On 23 February, 2021, the director of public health in Telangana, Dr G Srinivasa Rao, announced that his department would no longer issue a daily covid bulletin, and would now only publish a weekly one. Two days later, the state High Court directed the department to resume daily updates, and when the bulletin was back online the next day, the numbers showed an uptick: small, but the first in months (see Chart 3).

With multiple states reporting new surges, the pandemic is far from over in India. If the hope had been that the last year would bring about significant change in India’s data collection and dissemination record, that fond hope has now dimmed.

“We understand and appreciate that collection of covid-19 data is a complex task that involves multi-level coordination,” the collective said in an email. “But could things have been better? We do think so.”

First of all, they said, there should be standardized data reporting formats across states. “This would have ensured a similar level and quality of data coming from all states. Unfortunately, almost a year into the pandemic, states continue to put out different levels of information in their bulletins. A central portal with case data would have been beneficial as well. With cases going down, we hope states do not stop publishing the daily bulletin. As seen recently, cases can go up in future and covid-19 data is still important for us to project the trajectory of the pandemic,” they added. Whether things change before a second wave, if there is one, may make all the difference.

Rukmini S is a Chennai-based journalist
