Working to improve quality of translations, says Google’s Barak Turovsky
Barak Turovsky is the head of product and design for Google Translate and machine learning at Google Inc. His focus is on applying advanced machine learning techniques to deliver “magical experiences” and break language barriers across web, mobile applications, Search, Chrome and other products.
Prior to joining Google in 2011, he was director of product for Microsoft’s mobile advertising business and head of mobile commerce at PayPal.
Turovsky will be speaking at EmTech India 2018—an emerging technology conference organized by Mint and MIT Technology Review—on 9 March in Gurgaon. Edited excerpts of an email interview:
Google Translate is said to be doing over 140 billion translations a day. How far has Google neural machine translation (GNMT) technology come today?
We started working on GNMT in early 2016, estimating that it would take us three years to launch our first language on neural technology. We finished rolling out neural technology for 96 languages (including 11 Indian languages) within 18 months.
In June last year, you had remarked that translation by machines can be likened to the capability of a five-year-old. What progress have you made in this direction? What are some of the new challenges you are working on?
There are several very exciting opportunities in GNMT that we are working on. One is low-resource training: leveraging multi-language training to improve translation quality for languages where we don’t have enough training data. This is particularly exciting for Indian languages, where we used low-resource training to overcome a severe shortage of training data. As a result, we achieved a pretty amazing quality improvement for Indian and other languages, and we are working on expanding this approach to more languages and use cases.
Another exciting opportunity is zero-shot translation: we are now working on an approach that leverages multi-language training to offer translations for language pairs for which we have no training data. For example, in one (machine translation) model across English, Japanese and Korean, all our training data is between English and Japanese and (English and) Korean. We would like to translate between Japanese and Korean, but we don’t have any training data for this language pair. Using multi-language training, GNMT can still translate between the two languages, in the same manner that a human being who speaks English, Japanese and Korean can translate between Japanese and Korean. This is a very promising development, which will benefit Indian languages that generally suffer from a lack of training data.
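The interview does not spell out how one model serves several language pairs. One approach described in Google's published multilingual translation research is to prepend an artificial token naming the target language to each source sentence, so a single model trains on all pairs at once. The sketch below only illustrates that data preparation step under this assumption. The toy corpus and the helper names `tag_source` and `build_training_pairs` are hypothetical, and no model is trained here.

```python
from typing import List, Set, Tuple

# (source language, target language, source text, target text)
Example = Tuple[str, str, str, str]


def tag_source(text: str, tgt_lang: str) -> str:
    """Prepend an artificial token that names the desired target language."""
    return f"<2{tgt_lang}> {text}"


def build_training_pairs(corpus: List[Example]) -> List[Tuple[str, str]]:
    """Turn a mixed-language corpus into (tagged source, target) pairs."""
    return [(tag_source(src, tgt_lang), tgt) for (_, tgt_lang, src, tgt) in corpus]


# Hypothetical toy corpus: only English<->Japanese and English<->Korean data.
corpus: List[Example] = [
    ("en", "ja", "Hello", "こんにちは"),
    ("ja", "en", "こんにちは", "Hello"),
    ("en", "ko", "Hello", "안녕하세요"),
    ("ko", "en", "안녕하세요", "Hello"),
]

training_pairs = build_training_pairs(corpus)

# Language directions the model would actually see during training.
seen_directions: Set[Tuple[str, str]] = {(s, t) for (s, t, _, _) in corpus}

# A zero-shot request: Japanese input tagged for Korean output, a direction
# that never appears in the training data above.
zero_shot_input = tag_source("こんにちは", "ko")
```

In this setup the Japanese-to-Korean request looks like any other input to the model, which is why a model trained only on the English pairs can be asked for it at all. How well it answers depends on the trained model, which this sketch does not cover.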
How are Google’s efforts to use “crowd-sourcing” to translate culturally sensitive text such as jokes and idioms working out?
We are very excited about the tremendous user response to our Google Translate Community crowd-sourcing initiative. Almost 10 million language enthusiasts from many countries contributed more than 700 million translated words, to help Google Translate improve quality and help users better communicate in their languages.
We invest a lot of effort in India, given the high priority of India for Google, as well as the lack of a large corpus of training data. We ran two Translate Community Translatathons, one in 2014 (Hindi), and the other in 2015 (Hindi, Bengali, Telugu, Marathi, Tamil, Gujarati, Kannada, Malayalam and Punjabi).
User response and impact have been tremendous: globally, around 15% of all translations on Google Translate properties are served using Translate Community contributions, while for English to Hindi translations, the ratio is much higher, at 22%. Huge thanks to our India users for this amazing achievement!
While you have said that there will always be some limitations in machine translation compared to human translation, when do you think we will reach an inflection point, or have we reached one already?
We are continuously working on further improving the quality of machine translations. The launch of GNMT was a huge step forward, but there is a lot of exciting work ahead of us.
Since GNMT requires a lot of compute power and energy, how are these challenges being met for more environment-friendly results?
Google is deeply committed to sustainability and ensuring we have a positive impact on the environment. In 2010, we began buying renewable energy at scale. Today, we are the world’s largest corporate buyer of renewable power, with commitments reaching 2.6 gigawatts (2,600 megawatts) of wind and solar energy. Our data centres, which are the engines of the internet, are some of the most efficient in the world.