Believers turn to the scriptures for solace and spiritual guidance, but researchers at Dartmouth College have turned to the Bible for a different reason. They have used the holy book to improve translations done by computers.

How? The Bible, in its many versions, is a repository of an enormous amount of data, and algorithms improve when they get more data to train on. The researchers developed an algorithm that trained on various versions of the book and can now convert written works into different styles while preserving the context of the text and making sense of idiomatic expressions.

Researchers are increasingly striving to help machines translate words from one language to another the way professional translators would. This means that machines must understand the context of words and sentences, and make sense of idioms, phrases and jokes.

However, although billions of words are translated daily by multilingual machine translation services such as Google Translate, Microsoft Translator and Systran’s Pure Neural Machine Translator, machines have a long way to go before they can function as fluently as humans do.

The Dartmouth research team saw in the Bible “a large, previously untapped data set of aligned parallel text (or translation/s)”, according to a 23 October press statement by the university. For instance, each version of the Bible contains more than 31,000 verses that the researchers used to produce over 1.5 million unique pairings of source and translated verses for machine learning training sets.

According to the research published in the journal Royal Society Open Science, this is not the first parallel data set created for style translation. But it is the first that uses the Bible. Other texts that have been used in the past, ranging from Shakespeare to Wikipedia entries, provide data sets that are either much smaller or not as well suited for the task of translation.

“The English-language Bible comes in many different written styles, making it the perfect source text to work with for style translation,” said Keith Carlson, a PhD student at Dartmouth and lead author of the research paper.

An added benefit for the research team is that the Bible is already thoroughly indexed by the consistent use of book, chapter and verse numbers. The predictable organization of the text across versions eliminates the risk of errors that could be caused by automatic methods of matching different versions of the same text.
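
To illustrate why shared verse numbering makes alignment straightforward, here is a minimal Python sketch. It is not the researchers' code: the version names, the placeholder verse texts and the `build_pairs` helper are all assumptions made for illustration. It simply matches verses on their (book, chapter, verse) reference and emits one source/target pair per direction.

```python
from itertools import permutations

# Hypothetical toy data: version name -> {(book, chapter, verse): text}.
# The texts are placeholders, not quotations from any real edition.
versions = {
    "version_a": {
        ("Genesis", 1, 1): "formal wording of the first verse",
        ("John", 11, 35): "formal wording of a short verse",
    },
    "version_b": {
        ("Genesis", 1, 1): "plain wording of the first verse",
        ("John", 11, 35): "plain wording of a short verse",
    },
    "version_c": {
        # This version is missing one verse, so it yields fewer pairs.
        ("Genesis", 1, 1): "simple wording of the first verse",
    },
}


def build_pairs(versions):
    """Pair up verses that share a reference across every ordered pair of versions."""
    pairs = []
    for source, target in permutations(versions, 2):
        # Only references present in both versions can be aligned.
        shared_refs = versions[source].keys() & versions[target].keys()
        for ref in sorted(shared_refs):
            pairs.append(
                (source, target, ref, versions[source][ref], versions[target][ref])
            )
    return pairs


pairs = build_pairs(versions)
for source, target, ref, source_text, target_text in pairs:
    print(f"{source} -> {target} {ref}: {source_text!r} => {target_text!r}")
```

Because the reference itself is the join key, no text-matching heuristic is needed; a verse missing from one version is simply skipped for the pairs that involve that version.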

“The Bible is a ‘divine’ data set to work with to study this task,” said Daniel Rockmore, a professor of computer science at Dartmouth and a contributing author on the study. “Humans have been performing the task of organizing Bible texts for centuries, so we didn’t have to put our faith into less reliable alignment algorithms.”

To define “style” for the study, the Dartmouth researchers refer to sentence length, the use of passive or active voice, and word choice, which together can produce texts with varying degrees of simplicity or formality. The team used 34 stylistically distinct Bible versions ranging in linguistic complexity from the “King James Version” to the “Bible in Basic English”. The texts were fed into two algorithms: a statistical machine translation system called “Moses” and a neural network framework commonly used in machine translation, “Seq2Seq”. Neural networks are algorithms loosely modelled on the human brain.
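
As a rough illustration of how such style markers can be measured, the Python sketch below computes two crude proxies: average sentence length and average word length. This is an assumption-laden example, not the method used in the study; the `style_features` function and the sample sentences are hypothetical, and it does not attempt to detect passive voice or judge word choice.

```python
import re


def style_features(text):
    """Return crude style proxies for a passage of English text."""
    # Split on sentence-ending punctuation; drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Treat runs of letters and apostrophes as words.
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return {"avg_sentence_len": 0.0, "avg_word_len": 0.0}
    return {
        "avg_sentence_len": len(words) / len(sentences),
        "avg_word_len": sum(len(w) for w in words) / len(words),
    }


# Hypothetical sample passages, written for this example only.
plain = "The cat sat. The dog ran fast."
ornate = "Whereupon the exceedingly venerable feline reposed, notwithstanding considerable disturbance."

features = style_features(plain)
print(features)
print(style_features(ornate))
```

On these samples, the plainer passage scores lower on both measures, which is the kind of contrast one would expect between a simplified version of a text and a more formal one.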

While different versions of the Bible were used to train the computer code, systems could ultimately be developed that translate the style of any written text for different audiences. As an example, a style translator could take an English-language selection from Moby Dick and translate it into different versions suitable for young readers, non-native English speakers, or any one of a variety of audiences, according to the researchers.
