One of the arguments frequently trotted out in favour of data localization is that it will help us improve our national competence in Artificial Intelligence (AI). The government seems to believe that by keeping data in-country, we will finally have the training data that our data scientists require to give us the edge we desperately need in AI.

I think this notion is daft. There is no straight line between ordering multinational companies to keep data in the country and magically developing AI prowess. Much of the data collected (particularly the metadata) will be arranged in structures that are intelligible only to the companies collecting it. Simply locating that data in India will not enable us to make any sense of it. More often than not, the behavioural insights that these companies obtain by processing data, which the government seems most eager to benefit from, are useful only in the context of their own specific business models and remain largely unusable by anyone else, with the possible exception of direct competitors. For instance, it seems unlikely that the user preferences gathered by Flipkart would be of much use to Ola. Physically locating data in India with a view to making it more widely available is, therefore, hardly likely to benefit domestic AI research.

It is not as if we don’t already have data that we can use to develop high-quality AI products and services. Just last week, I spent some time with people from a very promising startup that was using case transcripts available on the websites of the Supreme Court and all the state high courts to build an AI-based legal research tool. They told me all the data they needed to train their AI models was easily available. What they were struggling with was finding data centres in India capable of offering the quantity and performance of GPUs—graphics processing units, technically, though these do a lot more than the term suggests—that they needed for their AI model to generate answers within a reasonable time frame.

What we really need to do is ensure that our AI startups have access to all the building blocks they need to grow their business. The NITI Aayog has announced a ₹7,000 crore AI mission and seems to have come up with a road map that includes the establishment of five centres of research excellence, 20 institutional centres for transformational AI and a cloud computing platform called Airawat. I am not sure whether this is enough.

Last month, I participated in a workshop organized by the Takshashila Institution to gather inputs from an independent, non-partisan expert group on how the government could best promote the development of AI in the country. Unsurprisingly, none of the ideas that came up in the course of our discussions involved localizing data collected by foreign companies in order to extract value from it. Instead, the recurrent theme that ran through the entire workshop was figuring out methods by which we could incentivize homegrown AI development.

One idea was that we should set ourselves the target of incentivizing as many as 500 leading AI researchers to work in India over the next three years, on the condition that each of them trains 5 to 10 PhD or Master’s students over the duration of their stay. Another suggestion was to set ourselves the target of ensuring that Indian AI research papers feature among the top 10 most cited in the field.

There was also a suggestion to establish centres of excellence, data repositories and other platforms for AI research to address the infrastructural shortcomings that beset the country. We could also take advantage of the Atal Innovation Mission and, in particular, the Atal Tinkering Labs to make equipment, training and sandbox facilities available to schools across the country, with the objective of nurturing AI skills from a young age.

One of the more interesting suggestions was to incentivize entrepreneurs by earmarking funds in the form of challenge grants that could be awarded to teams that achieve extraordinary breakthroughs in solving identified problems. For instance, it was suggested that we set ourselves the target of building, in time for our 75th Independence Day, the technology required to conduct simultaneous real-time translation of the Prime Minister’s speech across 30 different Indian languages. To make it sufficiently challenging, the technology should be able to handle a live and unrehearsed question-and-answer session between people speaking in each of the 30 different languages.

The reason a moonshot like this will be useful is that, beyond an immediate demonstration of AI capability, success will give us advanced abilities in natural language processing across multiple Indian languages. I don’t need to spell out how something like this could be used across a number of different domains in a country as diverse as India.

We have the option of building our AI future using incentives or diktats. It is easy to force companies collecting our data to store it here, in the hope that we can absorb, by osmosis, some of the techniques they have developed.

However, it will probably be of much greater long-term benefit to develop our own AI models using data we already have or can easily collect to solve problems that are uniquely ours.

Rahul Matthan is a partner at Trilegal and author of ‘Privacy 3.0: Unlocking Our Data Driven Future’
