India’s challenge of ‘dark data’
Only 16% of individuals in India leave a digital footprint
Google processes tens of thousands of terabytes of data a day, Wal-Mart handles and stores over a million transactions an hour, and Akamai analyses over 100 million of the digital clues that we leave about our whereabouts, hopes and desires each day to enable better targeting of advertising. The promise of Big Data is that if we can unravel the patterns in these unwieldy streams, organizations can bring the right things to the right people at the right time. It is a promise of speed and specificity. When you can get the right thing faster, you can accomplish more in a given time frame; the rate of transactions speeds up. Growth is about doing more and doing it faster. Gross domestic product itself is roughly a measure of transactions within a year. It is no wonder that Big Data is big news.
But India has a different challenge.
Only 16% of individuals leave a digital footprint. What about the rest? The one billion others? Spread over more than 650,000 settlements, their individual needs, desires, hopes and quests are dark to the world. This is a problem of Dark Data, not Big Data. Without this knowledge, we are left to stumble through an inefficient agenda of inclusion.
Even as we move towards the Big Data future where fulfilment comes faster, we can’t ignore the Dark Data world. It represents far too large a population to be left out.
The more fully and accurately we understand these dark ecosystems and the people who make them up, the more effective we can be. However, unlike in the Big Data world, where the challenge is ferreting out meaning from unwieldy streams of digital footprints, the Dark Data challenge is about finding any data at all. Addressing this requires an entirely distinct set of mechanisms.
For any organization serving this segment, every data point about its customers comes with a cost, often a burden on the customer’s time, and a challenge of accuracy. Given this, it is paramount to establish what is most relevant to know before we go after it. In the universe of possible descriptors of human behaviour, which ones matter? How will a given data point drive decision-making? Is it worth the trouble? The task is not simple. If Big Data is analogous to picking up all the objects and examining their properties, Dark Data is about having to know ahead of time which object you want. This requires a deeper perspective and a different kind of exploration, research and development.
Yet even if you are certain of the value of a data point, obtaining it with useful accuracy at large scale is hard. Let’s say you want to know something as basic as occupation. Establishing a consistent classification that is understood by a large group of field workers is not easy. Most people in the informal economy have multiple occupations. They do daily-wage labour by day and sell flowers at a bus stop in the evening, or work their fields in the morning and make and sell papads in the afternoon. Are they in agriculture, manufacturing or retail? One field officer may check one box, another a different one. Both well-designed data acquisition tools and training of field staff are essential for consistency.
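To make the occupation problem concrete, here is a minimal sketch of a form record that lets a field officer list every activity instead of forcing a single box. Everything in it is an illustrative assumption rather than a description of any real survey tool: the `Sector` categories, the `Activity` fields, the made-up income shares, and the rule that the "primary" sector is the one with the largest share of income.

```python
from dataclasses import dataclass
from enum import Enum


class Sector(Enum):
    # Hypothetical classification; a real form would define its own list.
    AGRICULTURE = "agriculture"
    MANUFACTURING = "manufacturing"
    RETAIL = "retail"
    WAGE_LABOUR = "wage_labour"


@dataclass(frozen=True)
class Activity:
    sector: Sector
    description: str
    share_of_income: float  # assumed field: fraction of income, 0.0 to 1.0


def primary_sector(activities):
    """Return the sector of the activity with the largest income share.

    Using income share as the deciding rule is an assumption; the point is
    that the rule is written down once, not left to each field officer.
    """
    return max(activities, key=lambda a: a.share_of_income).sector


def all_sectors(activities):
    """Return the set of every sector the respondent works in."""
    return {a.sector for a in activities}


# Illustrative respondent with invented income shares.
respondent = [
    Activity(Sector.AGRICULTURE, "works own fields in the morning", 0.60),
    Activity(Sector.MANUFACTURING, "makes papads in the afternoon", 0.25),
    Activity(Sector.RETAIL, "sells papads in the afternoon", 0.15),
]
```

With a record like this, two field officers who capture the same activities would arrive at the same classification, because `primary_sector` applies one stated rule rather than individual judgement.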
Furthermore, even if the data element is straightforward, there is the critical issue of reliability.
In organizations operating in the informal economy, data is often gathered by hundreds or thousands of junior-level staff with little awareness of, or sensitivity to, its nuances. Filling in a form is often a mere statutory formality, with no checks. Strong mechanisms for data auditing are absolutely essential for accuracy. If there are no checks, people take shortcuts. Just tick any box, no one will ask.
Without relevant, consistent and accurate data points across an organization, the data cannot deliver analysable value. The challenge of Dark Data is not trivial, but with some cleverness, and with the aid of some tools and technology, it can be met.
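One simple form such an audit could take is a spot check: re-verify a random sample of submitted forms and see how often the verified answer differs from the one originally recorded, officer by officer. The sketch below assumes this approach; the function names, the record layout `(officer_id, original_value, verified_value)` and the sample data are all hypothetical.

```python
import random
from collections import defaultdict


def draw_audit_sample(records, fraction, seed=0):
    """Pick a reproducible random subset of records to re-verify.

    `records` must be a list; `fraction` is the share to audit (e.g. 0.1).
    At least one record is drawn whenever the list is non-empty.
    """
    rng = random.Random(seed)
    k = min(len(records), max(1, round(len(records) * fraction)))
    return rng.sample(records, k)


def mismatch_rate_by_officer(audited):
    """Share of audited entries per officer where the re-check disagreed.

    `audited` is an iterable of (officer_id, original_value, verified_value).
    """
    totals = defaultdict(int)
    mismatches = defaultdict(int)
    for officer, original, verified in audited:
        totals[officer] += 1
        if original != verified:
            mismatches[officer] += 1
    return {officer: mismatches[officer] / totals[officer] for officer in totals}


# Invented audit results for two hypothetical field officers.
audited = [
    ("officer_a", "retail", "retail"),
    ("officer_a", "agriculture", "agriculture"),
    ("officer_b", "retail", "agriculture"),
    ("officer_b", "retail", "retail"),
]
rates = mismatch_rate_by_officer(audited)
```

A persistently high mismatch rate for one officer would be a signal to retrain or to look more closely, not proof of shortcuts on its own.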
Tara Thiagarajan is chairperson, Madura Microfinance.