As trends go, how does sushi fare in comparison with sausage? When did coffee become more popular than tea? Are we worried more about diabetes or malaria? And who is more famous in history: Sigmund Freud, Albert Einstein or Charles Darwin?
One way to answer these questions is to count how often these names and words appear in books published over a very long time span. But that is a lot of books: according to a post on the Inside Google Books blog, there are about 130 million in the world or, to be precise, 129,864,880.
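The counting idea can be tried on a small scale. What follows is a minimal sketch, not Google's method: the three-entry `BOOKS` list is an invented toy corpus, and the helper names `count_phrase` and `counts_by_year` are hypothetical. It counts exact, case-sensitive occurrences of each name per publication year.

```python
import re
from collections import Counter

# Invented toy corpus: (publication year, text). A real study would use
# millions of scanned books instead of three sentences.
BOOKS = [
    (1905, "Charles Darwin changed biology. Darwin wrote on finches."),
    (1925, "Albert Einstein and Sigmund Freud were both widely read."),
    (1955, "Albert Einstein died in 1955. Einstein reshaped physics."),
]

def count_phrase(text, phrase):
    """Count case-sensitive, whole-word occurrences of `phrase` in `text`."""
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return len(re.findall(pattern, text))

def counts_by_year(books, phrases):
    """Return {phrase: Counter({year: occurrences})} for the given books."""
    result = {p: Counter() for p in phrases}
    for year, text in books:
        for p in phrases:
            n = count_phrase(text, p)
            if n:
                result[p][year] += n
    return result

totals = counts_by_year(
    BOOKS, ["Albert Einstein", "Sigmund Freud", "Charles Darwin"]
)
for name, per_year in totals.items():
    print(name, dict(per_year))
```

Only the full two-word name is counted here, so a bare "Einstein" or "Darwin" is ignored. That is a choice made for this sketch; a serious comparison of fame would have to decide how to treat surnames used alone.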
Some years ago Google started to scan these books and has so far digitized nearly 15 million books published between 1500 and 2008 in English, French, Spanish, German, Russian and Chinese, drawn from libraries worldwide. That’s about 11.5% of all books in the world. But what do you do with this massive word bank?
Google joined hands with a team of Harvard researchers and the result of their four-year collaboration is an ingenious tool called Google Books Ngram Viewer (ngrams.googlelabs.com) that enables experts as well as ordinary people to “quantify a wide variety of cultural and historical trends”. The Harvard team calls it “culturomics”.
Launched in December, the Ngram Viewer lets you track the history of words or phrases—how often a word or phrase has appeared in books over the past 500-odd years, whether it’s still popular, and also lists the books in which it appears. You put in two or more words and it shows you the comparative trend on a timeline graph.
For example, try “diabetes, malaria, plague, cholera, polio” and it shows that we are no longer worried about plague and cholera; our biggest concern is diabetes. Or try “Islam, Christianity, Hinduism, Buddhism” and it shows a massive dip in mentions of Christianity from around 1850; and though Christianity is still at No. 1, Islam and Buddhism have been moving up from around 1920. In fact, the use of the word “God” has been declining since the 1840s.
The Ngram tool draws on five million scanned books (the databank does not use all the books that Google has scanned so far) and every time you search for a word or phrase, it runs through a digital archive of 500 billion words and offers “a glimpse into their history and popularity over the years”.
Ngram Viewer is as simple to use as a search on Google. You enter a word (Ngram is case sensitive), and you get a chart. Enter two words and you get two overlapping charts. You can change the years tracked from 1500 to 2008 or anywhere in between. You get a macro view (tea and coffee started to appear more frequently around 1800) or narrow it down (from around 1968 coffee appears more often than tea, and the gap is widening by the year).
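The comparison described above can be imitated in a few lines. This is a rough sketch under stated assumptions, not the Viewer's actual code: the `CORPUS` dictionary is invented, the function names are hypothetical, and it assumes the chart plots each word's share of all words printed in a given year. Matching is case sensitive, so "Coffee" and "coffee" are counted separately, and the `start` and `end` arguments play the role of the adjustable year range.

```python
from collections import Counter

# Invented per-year word lists standing in for scanned books.
CORPUS = {
    1960: "tea tea tea coffee milk bread".split(),
    1968: "tea coffee coffee Coffee sugar".split(),
    1975: "coffee coffee coffee tea cup".split(),
}

def relative_frequency(corpus, word, start, end):
    """Share of each year's words that are exactly `word` (case sensitive)."""
    series = {}
    for year in sorted(corpus):
        if start <= year <= end:
            words = corpus[year]
            series[year] = Counter(words)[word] / len(words)
    return series

def first_year_ahead(corpus, a, b, start, end):
    """First year in the range where `a` is more frequent than `b`, else None."""
    freq_a = relative_frequency(corpus, a, start, end)
    freq_b = relative_frequency(corpus, b, start, end)
    for year in freq_a:
        if freq_a[year] > freq_b[year]:
            return year
    return None

print(relative_frequency(CORPUS, "coffee", 1500, 2008))
print(first_year_ahead(CORPUS, "coffee", "tea", 1500, 2008))
```

Dividing by the number of words in each year matters because far more books were published in recent centuries than in early ones; raw counts alone would make almost every word look as if it were on the rise.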
Can Ngram be used to predict culture and trends? A post on the Google Blog says: “Scholars interested in topics such as philosophy, religion, politics, art and language have employed qualitative approaches such as literary and critical analysis with great success. As more of the world’s literature becomes available online, it’s increasingly possible to apply quantitative methods to complement that research.”
Ngram tells us a lot about how culture changes, in terms of what people are talking about. Type in “slavery” and you will see peaks around the Civil War, when the US amended its constitution to abolish slavery, and around the time of the Civil Rights movement. It also tells us how language evolves: how old words fall into disuse or get modified. Try “hippie, bohemian, freak, nonconformist” and it shows how “hippie” surged almost overnight in the early 1960s. I tried “out of the box, outside the box” and although the latter is the correct usage, the former has always been the more popular expression.
So, who is more famous: Helen Mirren, Salman Rushdie or Bill Clinton? And can their fame be measured? Come to think of it, at what age do celebrities become famous? According to Harvard researchers, “the most famous actors tend to become famous earlier (around 30) than the most famous writers (around 40) and politicians (after 50).”
Shekhar Bhatia is a former editor, Hindustan Times, a science buff and a geek at heart.