Big data trends for CXOs
Updated: 13 May 2016, 02:13 AM IST
Modern data science can provide a formal approach to many challenges involved with using new technologies, methods and data
Information technologies have always provided a rich environment for introducing new vocabulary and new concepts. In many cases, the phenomenon is justified. It’s hard to imagine a field that changes more rapidly or more consistently over time. Unfortunately, this appetite for new concepts and terminology can also act as a smokescreen that hides weak thinking and poorly formulated value propositions.
Take, for example, the concept of actionable data. The term seems simple enough. Actionable data is data that can be consumed by an algorithm or used to drive the outcome of a decision or process. Many organizations are struggling today to deal with the quantity and variety of information generated within their enterprise, let alone consider new sources of data. Data may in fact be actionable, but not available at the point where decisions need to be made. Nevertheless, the volume and variety of available data continue to grow. In this sea of data, it is easy to ignore data that is not conveniently packaged or well understood.
By some estimates, more than 80% of the data being created is unstructured: it lacks a formal description of its contents and is thus even harder to bring into the decision process. Perhaps the most actionable advice for CXOs (a term for C-level executives) regarding actionable data is to have a formal strategy for the discovery, curation, and synthesis of data in addition to traditional functions such as governance, quality assurance, and data retention.
Regardless of the approach to decision-making, there will always be a question of what data to use to address a particular challenge. Perhaps the biggest ‘Big Data’ mistake is to presuppose that any set of data is sufficient simply because it is available. Such data is commonly referred to as a convenience sample. There is no logical reason to presume that the data in hand is optimal, or even sufficient, for any particular decision. That judgment requires some analysis comparing the merits and implications of data that could be brought to bear on the problem with data that exists but is not accessible (e.g. covert, confidential, not disclosed). Only by assessing the relative size and importance of different universes of data in some meaningful way can we rationally determine that we are using sufficient data to make a data-based decision. It is nevertheless possible to make decisions with imperfect data. However, we must have some idea of the data we are not using and a clear understanding of the constraints on the types of decisions we can make and defend.
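To make the convenience-sample point concrete, here is a minimal Python sketch. Everything in it is hypothetical: the segments, the spend figures, and the `coverage_report` helper are invented for illustration, and the toy assumes the full universe of records is known, which in practice it rarely is and would have to be estimated.

```python
from statistics import mean

# Hypothetical universe of customer records: (segment, is_accessible, annual_spend).
# In a real setting the inaccessible records are, by definition, not in hand;
# they are listed here only to show what the convenience sample leaves out.
population = [
    ("online", True, 120.0),
    ("online", True, 150.0),
    ("online", True, 130.0),
    ("in_store", False, 60.0),
    ("in_store", False, 80.0),
    ("in_store", True, 70.0),
    ("partner", False, 40.0),
    ("partner", False, 50.0),
]


def coverage_report(records):
    """Compare the data in hand with the full universe it was drawn from."""
    in_hand = [r for r in records if r[1]]          # the convenience sample
    sample_mean = mean(r[2] for r in in_hand)       # what the available data says
    true_mean = mean(r[2] for r in records)         # what the whole universe says
    return {
        "coverage": len(in_hand) / len(records),    # share of the universe we can see
        "sample_mean": sample_mean,
        "true_mean": true_mean,
        "bias": sample_mean - true_mean,            # error introduced by convenience
    }


report = coverage_report(population)
print(report)
```

In this toy, the accessible records cover half the universe and overstate average spend, because the easy-to-reach segment happens to be the high-spending one. The point is not the arithmetic but the habit: estimate how much of the relevant universe the data in hand covers before defending a decision based on it.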
Many new techniques can also be extremely helpful, such as data-sensing approaches that recognize when new data has become available to the enterprise and assess its value without completely ingesting it first.
At the other end of the spectrum are the tools and methods of Big Data that are used to create insight. If we consider data to be the new oil, these tools and methods are like refineries. Again, there are some basic misconceptions. For example, techniques like machine learning and artificial intelligence (AI) promise to take the drudgery out of making sense of data, and they can deliver. They cannot, however, think for us. We still need to be able to train such methods in terms of what we are looking for. Unleashing an unsupervised learning algorithm on a large dataset can be valuable in certain scenarios, but the vast majority of business decision-making still requires a “goal” and some degree of provenance as to how conclusions were reached.
New methods of organizing information, such as neural networks (inspired by our understanding of human information processing), promise to help modern computer science access new insights with a degree of efficiency and speed that was the stuff of science fiction only a few decades ago. We must remember, however, the fundamental truth that expert knowledge, proper problem formulation, and careful attention to why a method is selected remain critical components of success.
Modern data science can provide a formal approach to many challenges involved with using new technologies, methods and data. It is critical, however, to remember that the skills that have made IT organizations successful to date are only table stakes. Simply scaling up these skills is insufficient and potentially dangerous. Constant learning, including formal processes for empirical testing of new methods and capabilities, is critical to the evolving organization.
Anthony J. Scriffignano is senior vice-president and chief data scientist at Dun and Bradstreet.