Watson must continue to learn from data sets: Bryson Koehler
Bryson Koehler, chief technology officer of IBM’s recently formed Watson & Cloud platform, is also a general manager and a distinguished engineer at IBM. Prior to this role, he served as executive vice president, distinguished engineer, chief information officer and chief technology officer of The Weather Company. IBM acquired The Weather Company’s product and technology assets in January 2016, and has a license to use the weather data and “The Weather Channel” name as part of the acquisition.
In a recent interview in Mumbai, Koehler spoke about the importance of cloud computing infrastructure and good training data sets to take advantage of the capabilities of Watson; how the company defines ‘cognitive’; and the prospects of the Watson platform in sectors other than healthcare. Edited excerpts:
Besides healthcare, which other sectors get traction from Watson?
We talk about health a lot but other industries are also using Watson. We have seen tremendous uptake in financial services. Security is another area. If you look at the constant risks of security and Watson’s ability to understand security and the state of the Internet and the networks around us, it is phenomenal. I also see massive opportunities in the oil and gas industry, as Watson understands the earth’s geology now. The Weather Company is also leveraging Watson to continue to improve weather modelling.
How has Watson been evolving, including the use of voice technology?
The evolution of Watson has been focused around the IBM cloud. Watson is inextricably linked to the infrastructure, to data, to (data) models and the ability of all of that to play together. Why is that important? Because, if you think about voice, it’s about training—how do I cut down on training time? How do I continue to learn faster? That’s where data comes up. Even with great computing horsepower and great algorithmic work, (Watson) cannot continue to learn and improve unless it has the right amount of training data. So a lot of work is related to the data side of what’s become the Watson Data Platform.
Instead of the speech team and all of the individual Watson teams having to spend a lot of time managing a lot of different training data sets, we bring the Watson Data Platform consistently across all of the IBM cloud to manage data, not just for Watson but for customers as well.
Can you give details of how you are capturing speech data?
IBM has hundreds of thousands of employees around the world who are constantly having conversations, calling up the help desk or HR, for instance. If you look at the number of languages IBM employees speak, IBM is a microcosm of the training data as well. We want to make sure that Watson has access to an ever-increasing data set. Three things are important when we talk about the Watson Data Platform: public data, private data and licensed data. As we work to improve Watson, the public data and the licensed data are very important. We crawl the Internet every day to collect all the information that is publicly available. IBM also has a rich history of licensed data sets; think about the weather, for example. We need to make sure that Watson continues to learn from those data sets, but we also have to ensure that our customers can keep their data private.
IBM talks a lot about cognitive. How do you define the term?
A cognitive system is a system that can understand, reason, learn and interact; you need those four elements for what we believe to be cognitive applications. A cognitive application understands the environment and the data. It can understand and learn to work around the data, but it also has to be interactive, whether that’s through speech, text, visualization or tactile… there has to be an interaction with the system for it to be truly cognitive. We look at cognitive as helping a human make a smarter decision.
How many firms are paying clients of Watson?
Anyone can sign up on Bluemix (cloud platform) and use Watson APIs—it’s difficult to put a number to it but it’s pervasive.