Sorting the DNA of books
Updated: 23 Sep 2011, 11:26 PM IST
I never recommend books; it takes a shamanistic insight to know what makes a person pick one book over another. The team behind the Book Genome Project believes that a computer can do what many people are no good at, by analysing the building blocks, or DNA, of a book.
Started in 2003 as a University of Idaho project by Aaron Stanton, founder and CEO of Booklamp, the Book Genome Project analyses books to identify the different elements of writing in them. The public face of the project, Booklamp.org, launched on 15 August after two years of beta testing. The team scans books acquired through ties with publishers for language, character and theme, and uses these to form links between them.
Stanton told Mint about the project by email, describing the elements of analysis, the “DNA” and “RNA” of books. He writes, “We look at both the story, that is the thematic elements, and the language, things like the pacing, dialogue and descriptive lines in the story.”
The goal is to help readers find new books of their choice more effectively than is possible through either their buying history or recommendations from friends, because “how many people will be able to recommend thousands of books across all genres?”
Right now, Booklamp makes recommendations based on a database of only about 20,000 books, so they are not always useful, but this “will get better as we get more books into the database”, says Stanton.
So what goes into a book, according to the Book Genome Project?
The two components, language and story, are divided into sub-categories. “Language is made up of pacing, perspective, description, density, motion, and dialogue,” says Stanton, “and is about the technical aspects of an author’s writing style. Pacing refers to the way paragraphs are broken down, and a scene with high pacing will be one where the reader’s eye moves down quickly. Motion has to do with physical motion, action in the book. There is also story DNA, the thematic content of a book. There are more than 2,000 thematic elements that are all measured and categorized.”
Booklamp is able to track elements in books that readers might not have noticed actively, but which form part of why they like a book. “Each ingredient is measured, and each text is broken down and categorized in terms of its thematic ‘expressions’. Some books express more romance than crime, others more nature than cities.”
He adds, “Each ‘gene’ of the Story DNA is measured relative to the other genes in a given book and in relation to the dominant themes of the entire corpus.”
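One way to picture this two-level measurement is the minimal sketch below. It is not Booklamp's method, which the project does not spell out here. It assumes each theme gets a raw score per book, scales those scores within the book, and then compares each share with the theme's average share across the corpus. The function names, theme labels and scores are all hypothetical.

```python
from collections import defaultdict


def within_book_shares(theme_scores):
    """Scale a book's raw theme scores so they sum to 1 (relative to its other themes)."""
    total = sum(theme_scores.values())
    if total == 0:
        return {theme: 0.0 for theme in theme_scores}
    return {theme: score / total for theme, score in theme_scores.items()}


def corpus_average_shares(corpus):
    """Average each theme's within-book share over every book in the corpus."""
    sums = defaultdict(float)
    for scores in corpus.values():
        for theme, share in within_book_shares(scores).items():
            sums[theme] += share
    return {theme: total / len(corpus) for theme, total in sums.items()}


def relative_expression(book_scores, corpus):
    """Ratio of each theme's share in one book to its average share in the corpus."""
    averages = corpus_average_shares(corpus)
    shares = within_book_shares(book_scores)
    return {
        theme: shares[theme] / averages[theme]
        for theme in shares
        if averages.get(theme, 0.0) > 0
    }


# Hypothetical raw theme scores, invented purely for illustration.
corpus = {
    "book_a": {"romance": 6, "crime": 2, "nature": 2},
    "book_b": {"romance": 1, "crime": 8, "nature": 1},
    "book_c": {"romance": 2, "crime": 2, "nature": 6},
}
profile = relative_expression(corpus["book_a"], corpus)
```

In this toy corpus a value above 1 means the book expresses a theme more strongly than the average book does, and a value below 1 means less strongly.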
That sounds confusing, but using Booklamp helps make sense of the process. For example, search for The Da Vinci Code, by Dan Brown, and the results will show other books about ancient conspiracies, like The Templar Legacy, by Steve Berry.
Click “Show DNA” and you’ll see that the writing is balanced, with equal parts motion, description and dialogue, while story elements like history, religious institutions and culture are prominent, as are secrets.
You can decide which genres to make matches from, so choosing only humour, for example, shows Under the Rose, by Diana Peterfreund, which has a similar writing style and themes like religion and mystery.
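Genre-restricted matching of this kind can be sketched as a similarity search over theme profiles. The article does not say how Booklamp compares books, so cosine similarity is used here only as a common stand-in. The catalogue entries, titles and numbers are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse theme profiles stored as dicts."""
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0.0) * b.get(k, 0.0) for k in keys)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(query, catalogue, allowed_genres, top_n=3):
    """Rank books in the allowed genres by similarity to the query book."""
    candidates = [
        (cosine_similarity(query["dna"], book["dna"]), book["title"])
        for book in catalogue
        if book["genre"] in allowed_genres and book["title"] != query["title"]
    ]
    candidates.sort(reverse=True)  # most similar first
    return [title for _, title in candidates[:top_n]]


# Hypothetical catalogue; the profiles are invented, not real Booklamp data.
catalogue = [
    {"title": "Thriller X", "genre": "thriller",
     "dna": {"history": 0.5, "religion": 0.3, "secrets": 0.2}},
    {"title": "Comedy Y", "genre": "humour",
     "dna": {"religion": 0.4, "secrets": 0.4, "romance": 0.2}},
    {"title": "Comedy Z", "genre": "humour",
     "dna": {"romance": 0.9, "cities": 0.1}},
]
picks = recommend(catalogue[0], catalogue, allowed_genres={"humour"})
```

Restricting `allowed_genres` to humour drops every thriller from the candidates, so the humour title that shares themes with the query book is ranked first.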
As more books are added to the database, such recommendations will become even more useful, according to Stanton.
“We make no claims to rightness,” he says, adding that Booklamp is continuing to evolve, and that the priority is to add more books. “Sometimes a suggestion is amazing: suggesting the works of Richard Bachman, a Stephen King pseudonym, when comparing against his books. At other times, the connection isn’t obvious.”
In contrast to Booklamp, personal recommendations are based on exposure to fewer books, while recommendations based on buying history (as on Amazon) tend mostly to show other books by the same author, or at least of the same genre.
Compare this with searching for Frank Herbert’s Dune, turning off science fiction, and seeing James Clavell’s Gai-Jin. It’s an excellent match, but it’s unlikely that anyone would have thought of making the connection, and it wouldn’t have come up in any buying database either. It’s almost as if a friend who knows me well made the suggestion.