Anyone who possesses a large enough store of data can reasonably expect to glean powerful insights from it. More often than not, these insights are used to enhance advertising revenues or ensure greater customer stickiness. In other instances, they have been subverted to alter our political preferences and manipulate us into taking decisions we otherwise would not have taken.
The ability to generate insights places those who have access to these data sets at a distinct advantage over those whose data is contained within them. It allows the former to benefit from the data in ways that the latter may not even have thought possible when they consented to provide it. Given how easily these insights can be used to harm the people to whom the data pertains, there is a need to mitigate the effects of this data asymmetry.
Privacy law attempts to do this by providing data principals with tools they can use to exert control over their personal data. It requires data fiduciaries to obtain informed consent from data principals before collecting their data and forbids them from using it for any purpose other than that which has been previously notified. This is why, even when consent has been obtained, data fiduciaries cannot collect more data than is absolutely necessary to achieve the stated purpose and are only allowed to retain that data for as long as is necessary to fulfil it.
In India, we have gone a step further and built techno-legal solutions to help reduce this data asymmetry. The Data Empowerment and Protection Architecture (DEPA) framework makes it possible to extract data from the silos in which it resides and transfer it, on the instructions of the data principal, to other entities that can use it to provide the data principal with additional services. This data micro-portability dilutes the historical advantage that incumbents enjoy on account of having collected data over the entire duration of their customer engagement. It reduces data asymmetries by establishing infrastructure for a competitive market in data-based services, allowing data principals to choose from a range of options as to how their data could be used for their benefit by service providers.
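To make the mechanics concrete, here is a minimal sketch in Python of how such a consent-driven transfer might be enforced. The `ConsentArtifact` structure and its field names are illustrative assumptions for this example only, not the actual DEPA or account aggregator specification.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class ConsentArtifact:
    """Illustrative stand-in for a DEPA-style consent record (hypothetical fields)."""
    data_principal: str    # the person whose data is being moved
    data_provider: str     # the silo currently holding the data
    data_consumer: str     # the service provider receiving the data
    purpose: str           # the notified purpose, and nothing more
    fields: list[str]      # only the fields needed for that purpose
    expires_at: datetime   # consent lapses once the purpose is served

def transfer(artifact: ConsentArtifact, requested_fields: list[str]) -> list[str]:
    """Release only what the data principal has consented to."""
    if datetime.now() > artifact.expires_at:
        raise PermissionError("consent has expired")
    # Purpose limitation and data minimisation in one step:
    return [f for f in requested_fields if f in artifact.fields]

consent = ConsentArtifact(
    data_principal="asha",
    data_provider="BankA",
    data_consumer="LenderB",
    purpose="loan underwriting",
    fields=["account_balance", "transaction_history"],
    expires_at=datetime.now() + timedelta(days=30),
)
print(transfer(consent, ["account_balance", "browsing_history"]))
# -> ['account_balance']  (the un-consented field is never released)
```

The point of the sketch is that the data principal's instructions, not the incumbent's possession of the silo, determine what moves and to whom.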
This, however, is not the only type of asymmetry we have to deal with in this age of big data. In a recent article, Stefaan Verhulst of GovLab at New York University pointed out that it is no longer enough to possess large stores of data—you need to know how to effectively extract value from it. Many businesses might have vast stores of data that they have accumulated over the years they have been in operation, but very few of them are able to effectively extract useful signals from that noisy data.
Without the know-how to translate data into actionable information, merely owning a large data set is of little value.
Unlike data asymmetries, which can be mitigated by making data more widely available, information asymmetries can only be addressed by radically democratizing the techniques and know-how needed to extract value from data. This know-how is largely proprietary and hard to access even in a fully competitive market. What's more, in many instances, the computing power required far exceeds the capacity of entities whose main business is not data analysis.
That said, we have recently begun to see new marketplaces that address this requirement: platforms that offer access to off-the-shelf models and AI algorithms that can be used to extract value from a range of different data sets. The resulting democratization of data science has allowed ordinary businesses to extract value from the data they own in ways that were not previously possible. This, in turn, has begun to chip away at the information asymmetry that separated those with technical knowledge from those with data.
Before we close, there is one other type of asymmetry that is often discussed in the context of data. As technology improves, decisions taken by algorithms, or on the basis of their suggestions, will affect us in increasingly significant ways. Today, AI is used to determine our eligibility for loans, the value of our insurance premiums and even the nature and duration of prison sentences. Each of these decisions affects the lives and livelihoods of ordinary people, and any bias inherent in the algorithm can unfairly prejudice those to whom the decision applies. This is particularly true of so-called 'black box' algorithms, in which the rationale for decisions remains opaque even to the programme's operator. This inability to understand how automated decisions are taken is what Verhulst refers to as intelligence asymmetry, and it needs to be addressed if we are to prevent harm on account of algorithmic bias.
The regulatory response has traditionally been to require that automated decisions be accompanied by an explanation of the basis on which they were arrived at. But explainability is often a trade-off against accuracy: algorithms whose decisions can be explained are, more often than not, less accurate than those whose decisions are made inexplicably, inside a black box.
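The trade-off can be seen in miniature with standard open-source tooling. The sketch below is a simplified illustration on synthetic data, not a claim about any deployed system: it compares a shallow decision tree, whose rules can be read off directly, with a gradient-boosted ensemble whose internal reasoning is much harder to inspect. On most non-trivial data sets, the opaque model tends to win on accuracy.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in for a real decision problem (loans, premiums, parole).
X, y = make_classification(n_samples=2000, n_features=20,
                           n_informative=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An explainable model: every decision reduces to a short, readable rule.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

# A 'black box': hundreds of trees voting together, with no single rule to point to.
boosted = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

print("explainable tree accuracy: ", tree.score(X_test, y_test))
print("black-box ensemble accuracy:", boosted.score(X_test, y_test))
print(export_text(tree))  # the entire 'explanation' fits on one screen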
But is this a trade-off we are willing to make? What if a black box algorithm can accurately diagnose the lesion on your skin as malignant far sooner than any human dermatologist can hope to? Would you still insist that life-saving algorithms should not exist simply because their decisions cannot be explained?
Rahul Matthan is a partner at Trilegal and hosts a podcast called Ex Machina. His Twitter handle is @matthan