Data hazard: Why is TCS classified as a rubber company or printing service?
Summary
- We have lots of data that isn’t adding up to what it should in this age of specialized analysis. India needs to improve the accuracy of inputs used for key national statistics to get a clearer picture of the economy.
India is home to nearly 18% of the world population, has a billion-plus phones linked to the internet via affordable mobile services and generates an estimated 20% of global data, a figure that’s growing fast.
India’s pioneering digital public infrastructure (DPI) is digitalizing government, citizen and business services. All this is producing vast quantities of data. As Nandan Nilekani famously said, “India will be data rich before it will be economically rich.”
How well we use our ‘data richness’ may hold the key to economic prosperity as well. For a country as large, complex and diverse as India, data unlocks the ability to frugally measure, and thus manage, the allocation of capital, resources and benefits.
Under our open data policy, large amounts of data in various government systems are being put in the public domain, so that analysis can be done and insights drawn for making better policy decisions. However, the results are contingent on the availability of updated and credible data.
Our current data and statistics systems are in need of a major overhaul on many fronts—survey methods, completeness, timeliness, formats and indicators, data storage and management, analytics and dissemination, to name a few. The Union budget highlighted this as an area of focus.
There is another problem with our data. Like the proverbial Schrödinger’s cat, it’s there, yet not there; and the only way to determine its status is by opening and peering into ‘data silos.’
Such ‘Schrödinger’s data,’ while nominally ‘available’ for use, cannot be reliably analysed. It is worth examining how our codification systems are set up, how users interact with them, and how they affect statistics, often with unintended consequences.
Take the case of how a company in India gets classified. Upon incorporation, it is issued a Corporate Identification Number (CIN), a 21-character alphanumeric string with information such as its listing status, industry classification, year and state of incorporation codified in it.
This is a key code that represents the company in many other government systems. Embedded within it is the industry or NIC code, which is based on the nature of the company’s economic activity (agricultural, manufacturing, financial services, etc).
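The structure described above can be sketched in code. This is a minimal illustration, assuming the commonly documented 21-character layout: one letter for listing status (L or U), a five-digit industry code, a two-letter state code, a four-digit year, a three-letter company type and a six-digit registration number. The names `parse_cin` and `ParsedCIN` are hypothetical, and the sample CIN is made up for illustration; it is not any real company's identifier.

```python
import re
from dataclasses import dataclass

# Assumed layout: status(1) + industry(5) + state(2) + year(4) + type(3) + reg. no.(6)
CIN_PATTERN = re.compile(
    r"^(?P<listing>[LU])"
    r"(?P<industry>\d{5})"
    r"(?P<state>[A-Z]{2})"
    r"(?P<year>\d{4})"
    r"(?P<company_type>[A-Z]{3})"
    r"(?P<registration>\d{6})$"
)


@dataclass(frozen=True)
class ParsedCIN:
    listed: bool          # True when the first character is "L"
    industry_code: str    # five-digit code; its meaning depends on the NIC version
    state: str            # two-letter state of incorporation
    year: int             # year of incorporation
    company_type: str     # three-letter company type, e.g. "PLC"
    registration: str     # six-digit registration number


def parse_cin(cin: str) -> ParsedCIN:
    """Split a CIN into its embedded fields, or raise ValueError."""
    match = CIN_PATTERN.match(cin.strip().upper())
    if match is None:
        raise ValueError(f"not a well-formed 21-character CIN: {cin!r}")
    return ParsedCIN(
        listed=match["listing"] == "L",
        industry_code=match["industry"],
        state=match["state"],
        year=int(match["year"]),
        company_type=match["company_type"],
        registration=match["registration"],
    )
```

For example, `parse_cin("L22210MH1995PLC000000")` yields a listed company incorporated in Maharashtra in 1995 with industry code `22210`. Note that nothing in the string records which edition of the NIC list that industry code was taken from.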
This classification of economic activity is provided by the government under its National Industrial Classification (NIC) system, and forms the basis of many crucial economic estimations. It is used for the Periodic Labour Force Survey, Index of Industrial Production and even gross domestic product (GDP) calculations.
While the NIC code list is updated every few years, CINs are not. There are currently different versions of the NIC code list in use, including 2004 and 2008.
In addition, there is a separate activity type-based company classification system that stock exchanges (BSE and NSE) have designed and use. In other words, there is no single source of truth.
Compounding the problem, NIC codes in many CINs do not represent the true nature of a company’s activity. Take big IT firms, for example. TCS gets classified as either a rubber products manufacturer (as per NIC 2008) or printing service (as per NIC 2004); Infosys as a primary education provider (NIC 2008) or health and social work organization (NIC 2004); and Wipro as a jewellery manufacturer (NIC 2008) or a maker of electronic equipment (NIC 2004).
InterGlobe Aviation, which has in its CIN the latest 2008 NIC code for ‘Computer Programming, Consultancy and Related activities,’ runs IndiGo, an airline.
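The version problem can be sketched as a lookup in which the same code yields different answers depending on which edition of the NIC list is consulted. This is only an illustration: the table below covers just the three IT-firm cases mentioned above, the labels are paraphrased, and the two-digit division numbers are assumptions that should be checked against the official NIC 2004 and NIC 2008 tables. The names `DIVISION_LABELS`, `describe` and `readings` are hypothetical.

```python
from typing import Dict, Optional

# Illustrative subset only; verify against the official NIC tables before use.
DIVISION_LABELS: Dict[str, Dict[str, str]] = {
    "2004": {
        "22": "publishing and printing",
        "32": "electronic equipment",
        "85": "health and social work",
    },
    "2008": {
        "22": "rubber and plastics products",
        "32": "other manufacturing (includes jewellery)",
        "85": "education",
    },
}


def describe(industry_code: str, nic_version: str) -> Optional[str]:
    """Label the two-digit division of a code under one NIC version."""
    division = industry_code[:2]
    return DIVISION_LABELS.get(nic_version, {}).get(division)


def readings(industry_code: str) -> Dict[str, str]:
    """Return every version's label for the code, skipping unknown divisions."""
    result: Dict[str, str] = {}
    for version in sorted(DIVISION_LABELS):
        label = describe(industry_code, version)
        if label is not None:
            result[version] = label
    return result
```

Under these assumed tables, `readings("22210")` returns one label per NIC version, and the two labels disagree. Because a CIN does not state which version its code came from, any system that reads the code has to guess, and neither guess need describe what the company actually does.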
Shakespeare wrote, “What’s in a name? That which we call a rose by any other name would smell as sweet.” The reality is that governance and service delivery are not poetry but prose. Misclassifying companies impacts our national statistics, policy measures, resource allocation, taxation and more.
Unit-level data gets anonymized and aggregated as it progresses up the ladder of national statistics, with senior decision makers unlikely to be aware of errors in classification.
Company codes and classifications are not the only place where such problems occur. Recently, I discovered that the Indian governing body of a major international sport is categorized as a Hindu Undivided Family, as per the code assigned to it.
The MSME Udyam portal, an initiative to foster the ease of doing business and a way to access benefits, subsidies and incentives, is another case in point.
While registering, MSMEs are asked to choose their activity code from the NIC codes displayed. Their understanding of NIC codes, or lack thereof, will have a long-term impact on MSME statistics and thus affect policy.
The issue is complex. It is not easy to change a CIN or monitor the data users self-report. Policymakers have been holding consultations with stakeholders to arrive at a data architecture that allows for collecting, classifying, codifying, sharing, collaborating on, analysing, inferring from and taking action on data.
With artificial intelligence (AI) becoming all-pervasive, it is all the more crucial that data be unlocked from various systems and used safely in compliance with the laws for the benefit of all.
We have brilliant technologists with Indic AI tools at their disposal, astute policy thinkers and sharp decision-makers who can rethink the system design that forms the backbone of many statistical analyses.
The need of the hour is a foolproof and resilient solution to ensure better data that paints an accurate picture of the economy and steers decisions in the right direction.