Multidisciplinary data science
I have been sceptical of data science for years because I have witnessed how it has been abused. Firms are awash with operational and cost data, and they plough through it incessantly in search of information that will help them win the never-ending race to get more by spending less. I must confess that over the years I have sold many a consulting project by promising senior executives “insights” from sifting through data, so that they could make better money-oriented decisions about their operations. In truth, these projects seldom arrive at real insights.
Executives tend to stick to their own industry and expect to find insights in the data available within it, or better yet, within the microcosm of the operation they are responsible for. A few years ago, I partnered with professors at the Indian School of Business to conduct a benchmarking study on “captives” (as today’s global in-house centres were then called). The captive construct was under threat about 10 years ago, since the prevailing wisdom was that it was better to outsource and “monetize” captive operations by selling them to information technology (IT) or business process outsourcing (BPO) service providers. The argument was that service providers were better equipped to capture economies of scale. This benchmarking project’s data were very detailed and covered items such as seat utilization across multiple shifts, and transport and facilities costs. The eventual aim was to help captive managers show their bosses abroad that they could do business in India just as cost-effectively as service providers could.
My scepticism was fuelled further when I noticed that the explosion of digital data in the wake of the Internet boom of the last decade had spawned an entire genre of “big data analytics” firms, which have had some spectacular failures in the last year; for instance, their predictions about the Brexit referendum and the US presidential election were completely off the mark.
I recently reset my views after speaking with Dharam Amin, a friend who holds two master’s degrees from Columbia University: one in business and the other in engineering. Amin has worked across industries and functions over a long, multi-faceted career, and was fresh from a Columbia engineering alumni get-together when we last met. Trained in more than one discipline himself, he pointed out to me that what is actually needed is a multidisciplinary approach to data science rather than a relentless focus on money.
Amin’s alma mater set up a “Data Science Institute” (DSI) five years ago, and it looks at data in a vastly different way than most firms do. He spoke of Jeannette Wing, a professor at the DSI who came to the university recently after years at Microsoft. She says data science is still a new and emerging field and doesn’t yet have a crisp definition.
With that caveat out of the way, she goes on to say that at its core, data science draws on inductive reasoning (as opposed to deductive reasoning). While the conclusion of a deductive process is certain, the conclusion of an inductive process is only probably true, given the data it starts from. Statistical modelling allows a researcher to systematically quantify and reason about the uncertainties inherent in inductive reasoning. And today’s huge computing power makes it possible to automate such statistical reasoning at scale.
Data science as it is known today, then, fundamentally draws on statistics and computer science. Further, it is not just about analysing data, since it involves the whole data “lifecycle”: generating, collecting, storing, managing, analysing, visualizing and, finally, interpreting through storytelling. DSI’s view is that data is present in every field and can be used to better engineer creativity and new thinking in each of them. Twelve different schools within the university provide data for the institute’s efforts. These data come from varied disciplines; the DSI works with data from the university’s medical centre, and also with the largest repository of declassified government documents in the US.
The view is that since data is everywhere, data science is broadly applicable to all sorts of disciplines.
This is heartening, since data science is now seen as useful not only in business, where it is used solely to make more money, but also in fields as far apart as medical care and history. As Alan Watts, the British philosopher, once said: “We cannot proceed with a fully productive technology if it must inevitably Los Angeles-ize (or Bengaluru-ize?) the whole earth, poison the elements, destroy all wildlife, and sicken the bloodstream with the promiscuous use of antibiotics and insecticides. Yet this will be the certain result of the technological enterprise conducted in the hostile spirit of a conquest of nature with the main object of making money.”
On another note, Wing recounts her days at Microsoft, where she realized that the company had stepped in to fund research that fell between the cracks of traditional funding: government sources support academia, and private sources support start-ups, but neither covered this ground. This gap is a conundrum for most researchers in scientific disciplines who would like to take their discoveries out of the lab and apply them in the real world. Few are willing to fund such “long view” start-ups, and the role Microsoft played in funding data science research 10 or more years ago was pivotal. Wing says she used to remind the firm’s CEO and chief financial officer that the company would not be earning certain revenues had it not invested in the space long ago.
I stand corrected; data science is real science. Had it not been funded by organizations with a long view, we would not be on the threshold of a revolution in the multidisciplinary use of data, one very different from the cost-cutting and revenue enhancement it was typically put to in the past.
Siddharth Pai is a world-renowned technology consultant who has personally led over $20 billion in complex, first-of-a-kind outsourcing transactions.