Big Data enters Indian policy
The view from above as your plane lands in Mumbai during the monsoon months is a revelation. Some parts of the city are a sea of blue tarpaulins. Home owners apparently use them on terraces to prevent rainwater from seeping into living spaces. I often wonder whether the presence of tarpaulins is a proxy indicator of the poor quality of housing stock in Mumbai.
Economists have traditionally used numbers in their work. They have now begun to use images as well. The use of satellite images is part of a broader shift towards newer types of inputs for policymaking. There seems to be some change in India as well.
The second volume of the Economic Survey, released by the Union finance ministry last week, shows that the innovative use of images has begun to make its overdue presence felt in Indian policy analysis.
The eighth chapter uses satellite data to see whether India is more urbanized than most traditional indicators suggest. There are two sets of traditional indicators. The administrative metric to define a town depends on whether it is governed by an urban local body such as a municipality. The census uses three metrics to identify a town: the population should be more than 5,000, the density of the settlement should be at least 400 people per sq. km, and more than three out of every four residents should have employment outside agriculture. Satellite images show that India is far more urbanized than these traditional metrics suggest.
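For readers who think in code, the two traditional definitions can be written as simple rules. What follows is only a sketch of the thresholds as stated above, not the official census procedure; the `Settlement` record, its field names and the example figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Settlement:
    """Hypothetical record for one settlement; field names are illustrative."""
    name: str
    population: int
    area_sq_km: float
    non_farm_share: float       # fraction of residents employed outside agriculture (0 to 1)
    has_urban_local_body: bool  # governed by a municipality or similar body


def is_administrative_town(s: Settlement) -> bool:
    # Administrative metric: a town is whatever an urban local body governs.
    return s.has_urban_local_body


def is_census_town(s: Settlement) -> bool:
    # Census metric: all three thresholds described in the text must hold.
    if s.area_sq_km <= 0:
        raise ValueError("area_sq_km must be positive")
    density = s.population / s.area_sq_km  # people per sq. km
    return (
        s.population > 5000
        and density >= 400
        and s.non_farm_share > 0.75
    )


# Made-up numbers: dense and mostly non-agricultural, but with no municipality.
example = Settlement("Example settlement", 12000, 10.0, 0.80, False)
example_is_census_town = is_census_town(example)
example_is_administrative_town = is_administrative_town(example)
```

A settlement like this made-up one counts as a town on the census rule but not on the administrative one, which is how the two traditional metrics can disagree with each other even before satellite data enters the picture.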
The finance ministry has used a chart based on work done by urban scholars at the IDFC Institute in Mumbai to show how the density of built-up area in the Kozhikode Metropolitan Area spread between 1975 and 2014. The Economic Survey also uses satellite data from the Global Human Settlement Layer to map the density of built-up area in the country as a whole.
Economists have now begun to use machine learning—or the use of computer algorithms that learn from data—to extract information from satellite images. The finance ministry has also used this new empirical tool in its recent work on urbanization.
There were two similar examples in the first volume of the Economic Survey that was released in January. These examples did not involve satellite images but the use of Big Data for policy analysis.
The first example was for the estimation of internal migration. India is a country on the move. However, most of the information about internal migration comes from either the census or sample surveys. The economists in the finance ministry used monthly data on unreserved railway passenger travel over five years as a proxy for migration in search of work. The data was collated for every pair of railway stations in India.
The extent of migration in this analysis based on railway passenger data was far higher than what the census suggests. The use of Big Data also allows policymakers to understand the direction of labour flows, down to the district level. The standard gravity model of trade was adapted to understand internal migration. The extent of migration is directly proportional to the size of the two regional economies and inversely proportional to the distance between them.
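The gravity relationship described above can be sketched in a few lines. This is the simplest textbook form, with every exponent set to one; it is not the specification the finance ministry actually estimated. The function name, the `constant` parameter and the input figures are assumptions made purely for illustration.

```python
def gravity_flow(size_origin: float, size_destination: float,
                 distance_km: float, constant: float = 1.0) -> float:
    """Predicted flow between two regions under a basic gravity model.

    The flow rises with the product of the two economies' sizes and
    falls with the distance between them. `constant` is a hypothetical
    scaling factor that would normally be estimated from data.
    """
    if distance_km <= 0:
        raise ValueError("distance_km must be positive")
    return constant * size_origin * size_destination / distance_km


# Made-up figures: two regional economies of size 100 and 200.
flow_near = gravity_flow(100.0, 200.0, 500.0)   # regions 500 km apart
flow_far = gravity_flow(100.0, 200.0, 1000.0)   # same regions, 1,000 km apart
```

In this simple form, doubling the distance halves the predicted flow, while doubling the size of either economy doubles it.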
The second example is the way interstate trade in India was estimated using Big Data, covering both trade between firms and trade within the same firm. The source of the data was the new GSTN, or the Goods and Services Tax Network, that has been put in place as part of the technological backbone of the new indirect tax system.
The economists used Central sales tax invoices for trade between two states to estimate the extent to which states trade with each other. There were gaps in the data, such as a missing name of the exporting state or a missing tax identification number of a firm. Geographic information system (GIS) software was used to pinpoint the location of firms. These locations were then spatially merged with a state shapefile to identify the state in which each firm is located.
Earlier this year, the Office for National Statistics in the UK opened a new campus where new types of data will be used to understand economic change: traffic sensors to measure economic activity, mobile phone data to understand commuter traffic patterns and satellite images to estimate population change. Night lights have already been extensively used by economists to understand regional patterns in economic growth. Satellite images have been used to estimate the expected yields from farms based on the colour of the standing crop.
The use of all this data may seem intrusive to privacy activists, and their concerns are genuine. Optimists argue that policymaking will be better informed with the ability to analyse large amounts of data in real time. Google chief economist Hal Varian famously said at a conference hosted in 2014 by the European Central Bank: “We don’t have any better ways to predict the future. What we’re working on is predicting the present.”
Note: Anybody wanting to understand these emerging issues better would do well to read two review articles: (1) “Big Data: New Tricks For Econometrics” by Hal Varian, Journal Of Economic Perspectives, Spring 2014; (2) “The View From Above: Applications Of Satellite Data In Economics” by Dave Donaldson and Adam Storeygard, Journal Of Economic Perspectives, Fall 2016.
Niranjan Rajadhyaksha is executive editor of Mint.
Comments are welcome at email@example.com. Read Niranjan’s previous Mint columns at www.livemint.com/cafeeconomics