OPEN APP
Home / Opinion / Columns /  India badly needs an overarching policy framework for data governance
Listen to this article

Just like an artist creates unique masterpieces with simple colours and a chef dishes out exotic flavours from simple ingredients, data scientists glean insights by linking different datasets. Velocity, volume and variety of data is growing exponentially as it zips around the world. Despite its non-rivalrous nature, it is equated with oil and gold, thanks to the underlying value potential.

For example, besides model, age and claims history, automobile insurers can consider how a particular vehicle is driven and tax evasions can be plugged by sharing income tax data with the goods and services tax network. Yes, sometimes one-and-one may indeed add up to eleven!

Thus, interconnections are even more important than amassing standalone datasets. Hence, it is crucial to separate issues relating to value creation and value extraction.

A panoply of policy, legislative and normative data frameworks are being proposed by the central government as well as the states. These include the Personal Data Protection Bill, National Strategies for Cyber Security and Artificial Intelligence (AI) and Frameworks for Non-Personal Data and Responsible AI. And, platforms proposed across domains entail data-sharing mandates, often with government involvement.

Personal vs non-personal

Data that directly or indirectly identifies a particular individual is considered as ‘personal data’ (PD). Accordingly, every other data should be ‘non-personal data’ (NPD).

However, such mutually exclusive and orthogonal binary classifications have limitations. For example, you could be uniquely identified even in a huge crowd if you are the only one with location tracking enabled on a mobile phone even as location data is considered NPD.

Individuals have been re-identified with more than 90% accuracy by combining anonymized data (treated as NPD) with public datasets! In addition, anonymization itself could be reversible just like a spectrum analyser can show up proportions of basic colours of a particular shade.

More ways to splice!

Other data classifications include: at rest - in transit; on the edge - in the cloud; encrypted - unencrypted; structured - unstructured; low frequency - high frequency; real-time - historical; national - trans-national; physical - physiological; public sector - private sector; individual - community; raw - processed.

The list is indeed endless. Even researchers often juxtapose empirical data with simulated data to test their hypotheses.

There may be more than one way to skin a cat, but data can be sliced and diced in more ways than a ton! Yes, there are ways to cull out data at an extremely minute and infinitesimally sharp and focused manner. After all, online search results, shopping recommendations, medical treatments and even loan offers may be hyper-personalized.

Algorithmic trading in securities can be triggered based on concurrent linking of seemingly disparate datasets like weather forecast, currency exchange rate and crude oil production.

Similarly, a fraud could be detected by using an array of factors like location, frequency, quantum and, lo and behold, even how much pressure the user applies on the mobile screen and the way it is tilted!

The ability of generating ‘synthetic data’ using ‘digital twins’ also opens immense opportunities for innovation. For example, the best vaccine candidates have been and are being proactively identified by testing their efficacy against potential mutations of the SARS-CoV-2 virus even before these emerge.

policy framework

Data is the elephant in the digital room. To deal with it, India needs an overarching policy framework for data governance rather than being blindsided by discrete uni-dimensional instrumentalities based on simple binaries that may lead to cracks and overlaps.

Nobel Prize winner for Economics Ronald Coase famously said: “If you torture the data long enough, it will confess." All the same, data must be handled with care, combined with accountability and transparency to ensure its rightful usage for the larger public good.

The binaries may be useful in appreciating different aspects of data, but the Schrodinger’s cat might just meow “The-lines-they’re-a-blurrin", inspired by another Nobel laureate, Bob Dylan!

Deepak Maheshwari is Senior Fellow at Centre for The Digital Future.

Subscribe to Mint Newsletters
* Enter a valid email
* Thank you for subscribing to our newsletter.

Never miss a story! Stay connected and informed with Mint. Download our App Now!!

Close
×
Edit Profile
My ReadsRedeem a Gift CardLogout