Opinion | An old data science trick may hold the key to extensive coronavirus testing

  • Given the limitations on the testing capability in the early days in India, it might have been reasonable to focus testing on the targeted groups
  • However, given that almost 80% of the infected are asymptomatic, targeted testing would not help gauge the wider community spread

Since the early days of the coronavirus pandemic, the WHO has been repeating the mantra 'Test, Test, Test', to be followed by treatment, isolation and contact tracing to limit the spread of covid-19 infection. While countries have tried to follow this guidance to varying degrees, all have been daunted by the sheer scale of testing involved. The shortage of affordable testing kits has compounded the problem.

Before exploring further, we need to understand the purposes behind testing, which could be any of the following:

i) To find out whether those who display covid-like symptoms have the infection.

ii) To find out whether direct or indirect contacts of confirmed cases are infected, even if they display mild or no symptoms.

iii) To periodically test frontline covid warriors in both the health and non-health sectors.

iv) To gauge the extent of the spread in different regions and zones.

South Korea is cited as an example of a country where extensive testing was used early on as a strategy to limit the spread of infection. The positive outcome of that approach is visible in the speed with which it flattened the curve with minimal casualties. Countries that tested more in the beginning have been able to contain the infection far better. While this is a good strategy, the sheer scale of testing involved poses a major problem for countries with bigger populations.

To manage the scale, ICMR in India, and similar organisations in other countries, adopted testing protocols that restrict testing to certain high-risk groups. Even in the USA, which has a reasonably high level of testing (43,915 tests per million, compared with around 2,135 per million in India, as of 24 May 2020), it is still not possible to test everyone who needs to be tested. You cannot just walk in and get tested, even if you feel that you have mild cold-like symptoms. In the absence of widespread testing, many states in India have adopted extensive contact tracing and quarantining to limit the spread of the infection. Kerala is cited as one such example where this approach has worked very well (Kerala currently has around 1,762 tests per million).

Given the limitations on the testing capability in the early days in India, it might have been reasonable to focus testing on the targeted groups. However, with the spread of the infection and given that almost 80% of the infected are asymptomatic, targeted testing would not help gauge the wider community spread. As they say, lack of evidence of infection due to limited testing is not evidence of the lack of infection. Only wider testing would give us some idea of which strategies (containment or otherwise) are working and which are not. Extensive testing has become even more critical in view of the widespread movement of large numbers of migrants across the country, which facilitates the spread of infection.

Before dwelling further on which approach should be followed for testing, we should explore the types of testing in a bit more detail. Two widely used testing methodologies are:

i) RT-PCR testing, which looks for the presence of the covid-19 virus RNA. It uses throat and nasal swabs.

ii) Antibody testing, which looks for the presence of antibodies to the covid-19 virus among infected people. It uses a blood sample.

There are a few other types of testing kits being considered. However, by far, these two are the most widely used ones.

While RT-PCR testing is said to be very robust, it can give false negatives during the early days of the infection. In addition, it is expensive and takes a little longer. Even so, it is the gold-standard confirmatory test.

Antibody testing has the advantage of being able to indicate whether you are currently infected with covid or were infected some time back. It is also relatively cheap. Given these two attributes, it is a good test for gauging community transmission. However, it has a few drawbacks as well. It cannot be used as a confirmatory test for current infection. Since this test depends on antibodies, there could be false negatives until the body develops enough antibodies to be detected.

Coming back to the sheer scale of testing required in a country like India, what can be done to get a truer picture of where we stand with respect to the spread of the infection? It is good to remember that it is not sufficient to test a few million people once and hope things are under control. You would need to keep testing the same population over and over again. This could go on for months, if not years. For a country like India, this could mean testing millions of people every week just to get a sense of where we are headed.

It is here that data science can help in a big way. One of the standard techniques is random testing of the population to get a sense of how far the infection has spread. While ICMR has been doing some such sampling from the beginning to gauge the community spread, the scope and the scale of those tests were probably very limited. Now that there is a large increase in the number of cases being reported, we need to adopt strategies that give a truer picture by using some of the data engineering techniques. One such randomised testing effort was carried out in New York State at public places like grocery stores, covering many demographic segments, and indicated an infection rate of 13.9% among the population. But with 5000 samples, the scope of this test was probably limited.
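To see how much uncertainty comes from sample size alone, the sketch below computes a normal-approximation confidence interval for a sample proportion, using the 13.9% rate and 5000 samples quoted above. It assumes a simple random sample, which a survey at grocery stores is not, so it captures sampling error only and says nothing about sampling bias. The function names and the 95% confidence level (z = 1.96) are illustrative choices, not details of the New York study.

```python
import math


def proportion_confidence_interval(positives, sample_size, z=1.96):
    """Normal-approximation (Wald) interval for a sample proportion.

    Assumes a simple random sample; z=1.96 corresponds to ~95% confidence.
    Returns (estimate, lower bound, upper bound).
    """
    if sample_size <= 0 or not 0 <= positives <= sample_size:
        raise ValueError("need 0 <= positives <= sample_size and sample_size > 0")
    p_hat = positives / sample_size
    margin = z * math.sqrt(p_hat * (1 - p_hat) / sample_size)
    return p_hat, max(0.0, p_hat - margin), min(1.0, p_hat + margin)


def required_sample_size(expected_rate, margin, z=1.96):
    """Smallest simple random sample giving the requested margin of error."""
    return math.ceil(z ** 2 * expected_rate * (1 - expected_rate) / margin ** 2)


# 13.9% of 5000 samples is 695 positives (figures quoted in the article).
estimate, low, high = proportion_confidence_interval(695, 5000)
print(f"estimate {estimate:.3f}, 95% interval {low:.3f} to {high:.3f}")
```

Under these assumptions the margin of error comes to roughly one percentage point either way. For a survey of this size, the bigger risk is likely to be who gets sampled rather than how many.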

Random sampling is one of the oldest tricks in the book, used by statisticians and data scientists to make sense of data when the population size is huge. Such sampling methodologies are widely used by internet and social media companies to spot trends in very large datasets when detailed analysis is impossible. We see it used regularly by psephologists on our TV channels to arrive at election predictions from a few representative samples. As we have seen, these predictions can vary widely based on the methodology and the samples used. However, with a proper sampling strategy covering all the demographic and social segments (a 'representative sample') and a reasonably large sample, it is possible to avoid 'sampling bias' and get an accurate sense of the extent of the infection.
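One common way to build a representative sample is stratified sampling: split the population into segments, sample each in proportion to its size, and weight each segment's result by its population share. The sketch below is a minimal illustration of that idea. The segment names and all the numbers are hypothetical, chosen only to show how an unweighted average can mislead when segments are sampled out of proportion.

```python
def proportional_allocation(segment_populations, total_sample):
    """Split a sample across segments in proportion to their population.

    Rounding means the parts may not add up exactly to total_sample.
    """
    total = sum(segment_populations.values())
    return {
        name: round(total_sample * population / total)
        for name, population in segment_populations.items()
    }


def weighted_prevalence(segment_populations, segment_results):
    """Population-weighted infection rate.

    segment_results maps segment name -> (positives, number tested).
    """
    total = sum(segment_populations.values())
    return sum(
        (segment_populations[name] / total) * (positives / tested)
        for name, (positives, tested) in segment_results.items()
    )


# Hypothetical example: two segments, each tested with 100 samples
# even though one segment is much larger than the other.
populations = {"urban": 300, "rural": 700}
results = {"urban": (30, 100), "rural": (5, 100)}

naive = sum(p for p, _ in results.values()) / sum(n for _, n in results.values())
weighted = weighted_prevalence(populations, results)
print(f"naive {naive:.3f}, weighted {weighted:.3f}")
```

In this made-up case the naive rate overstates the spread because the smaller, more infected segment is over-represented in the sample. Weighting by population share corrects for that.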

While antibody testing is ideal if you want to cover a larger sample, pooled RT-PCR also lets you cover larger populations with fewer tests. Pooled PCR works best when the infection rate is low. ICMR recommends pooling samples from 5 people. We could adopt one pooled test per family. It is reasonable to assume that if one member of the family is positive, the others are infected or likely to be infected as well. Stanford University used pooled testing to gauge the early spread of covid-19 in the Bay Area. There are studies that indicate that you can pool up to 32 samples or more.
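Why pooling pays off only at low infection rates can be seen with a little arithmetic. The sketch below assumes a simple two-stage scheme (test the pool once; if it is positive, retest every member individually), a perfectly accurate test, and infections that are independent of each other. None of these assumptions comes from the ICMR guidance, and the independence assumption would not hold for a family pool, where infections tend to cluster. The function names are illustrative.

```python
def expected_tests_per_person(prevalence, pool_size):
    """Expected tests per person under two-stage pooling.

    One test per pool, plus one test per member if the pool is positive.
    Assumes independent infections and a perfectly accurate test.
    """
    if pool_size < 1:
        raise ValueError("pool_size must be at least 1")
    if pool_size == 1:
        return 1.0  # no pooling: everyone is tested once
    prob_pool_positive = 1 - (1 - prevalence) ** pool_size
    return 1 / pool_size + prob_pool_positive


def best_pool_size(prevalence, max_pool_size=32):
    """Pool size (1..max_pool_size) that minimises expected tests per person."""
    return min(
        range(1, max_pool_size + 1),
        key=lambda k: expected_tests_per_person(prevalence, k),
    )


for rate in (0.02, 0.10, 0.40):
    per_person = expected_tests_per_person(rate, 5)
    print(f"infection rate {rate:.0%}: {per_person:.2f} tests per person in pools of 5")
```

With a 2% infection rate, pools of 5 need about 0.3 tests per person under these assumptions. At a 40% infection rate the same pools need more than one test per person, so pooling no longer saves anything.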

While targeted testing would continue to play a critical role in identifying and treating infection, random sampling methods would help us answer many critical questions about community transmission, such as which zones have a wider spread and how widespread the infection is among different demographic segments. This would result in a more scientific marking of zones as red, orange and green, based on wider statistical indicators rather than on a few directly detected cases. India is endowed with some top-notch data scientists and statisticians who can help policymakers navigate the covid-19 crisis with more meaningful and targeted analyses that are both effective and scalable.

The author is a Bengaluru-based data engineer. Views expressed are his own.
