Mumbai: Last week, Google Inc. introduced a major improvement in its search algorithm to help it better determine when to give users more up-to-date and relevant results.
“Given the incredibly fast pace at which information moves in today’s world,” said Amit Singhal, a Google Fellow, and head of Google’s core ranking team, on the company’s official blogpost, “the most recent information can be from the last week, day or even minute, and depending on the search terms, the algorithm needs to be able to figure out if a result from a week ago about a TV show is recent, or if a result from a week ago about breaking news is too old.”
Delivering results: Google’s Amit Singhal (left) and Rajan Patel
An algorithm is a set of mathematical steps that solves a problem. Google Fellow is a designation the company reserves for its elite master engineers; Singhal’s area is ranking algorithms.
Tweaking algorithms to yield better search results is not new to Google. Over the past decade, the owner of the world’s most popular search engine has introduced many innovations such as PageRank, named after Google’s co-founder and chief executive Larry Page, which works by counting the number and quality of links to a page to determine a rough estimate of how important the website is.
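To make the link-counting idea concrete, here is a minimal sketch of the textbook PageRank iteration on a four-page toy Web. It is an illustration only, not Google’s production code: the function name `pagerank`, the `toy_web` graph and the iteration count are assumptions made for this example, and the damping factor of 0.85 is the value commonly quoted for the published algorithm.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate page importance from links.

    `links` maps each page to the list of pages it links to.
    Hypothetical helper for illustration, not Google's implementation.
    """
    # Collect every page that appears as a source or a target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal importance

    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its importance evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its importance over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Toy Web: "c" is linked to by three pages, "d" by none.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
most_important = max(ranks, key=ranks.get)
```

In this toy graph the page with the most incoming links comes out on top, and a link from an important page counts for more than a link from an obscure one, which is the “number and quality” idea described above.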
In February, it had introduced another algorithmic improvement christened “Panda” to improve user experience by catching and demoting “low-quality” sites that did not provide useful original content or otherwise add much value while simultaneously giving better rankings to high-quality sites—“those with original content and information such as research, in-depth reports and insightful analysis”.
Singhal, along with numerous other Google scientists, analysts and engineers, continuously works on refining searches, and with good reason. The search firm answers more than one billion questions a day from people in 181 countries and in 146 languages. It indexes millions of Web pages, but the challenge is to return only the most relevant and most recent results, depending on the context of the query. Moreover, the company in February had a nearly 90% share of the global search engine market, according to research firm StatCounter.
Microsoft Corp.’s Bing search engine garnered a mere 4.37% share and was marginally ahead of Yahoo at 3.93%.
To maintain its huge lead, Google has to ensure that its searches remain relevant. With this in mind, the company’s engineers have been making around 500 changes to its search algorithms every year, or at least one change a day, according to Singhal.
One such engineer is 31-year-old Rajan Patel. Known as a “search scientist” at Google, Patel is a versatile statistician who was on the team that developed Google Flu Trends, and who conducted statistical analyses for early-phase clinical trials when he worked at Amgen Inc., a biotechnology company. Patel, who holds a PhD in biostatistics from Emory University, is trying to develop better ways to evaluate the quality of search results and new signals to improve their ranking. A signal is a quantitative measure derived from a document, a query, or a combination of both, that can help produce better search results. “Today our algorithms rely on more than 200 unique signals, some of which you’d expect, like how often the search terms occur on the Web page, if they appear in the title or whether synonyms of the search terms occur on the page,” Patel says.
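As a rough illustration of how a handful of signals might be combined, the sketch below scores a page using only the three signals Patel mentions: how often the search terms occur on the page, whether they appear in the title, and whether synonyms occur on the page. The function name `score_page`, the weights and the sample pages are hypothetical; Google has not published how its 200-plus signals are weighted.

```python
def score_page(query_terms, title, body, synonyms=None, weights=None):
    """Combine three simple signals into one score (illustrative only)."""
    synonyms = synonyms or {}
    # Hypothetical weights chosen for the example, not Google's values.
    weights = weights or {"body": 1.0, "title": 3.0, "synonym": 0.5}

    title_words = title.lower().split()
    body_words = body.lower().split()

    score = 0.0
    for term in query_terms:
        term = term.lower()
        # Signal 1: how often the search term occurs on the page.
        score += weights["body"] * body_words.count(term)
        # Signal 2: whether the search term appears in the title.
        if term in title_words:
            score += weights["title"]
        # Signal 3: whether synonyms of the search term occur on the page.
        for alternative in synonyms.get(term, []):
            score += weights["synonym"] * body_words.count(alternative.lower())
    return score


flu_synonyms = {"flu": ["influenza"]}
relevant = score_page(["flu"], "Flu trends", "flu season flu shots influenza", flu_synonyms)
unrelated = score_page(["flu"], "Cricket scores", "india won the match", flu_synonyms)
```

Here the page about flu scores 5.5 (two body matches, one title match and one synonym match) and the cricket page scores zero, so the first would be ranked higher for the query.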
Google’s latest change, incidentally, builds on the momentum from its “Caffeine” Web indexing system, which was introduced in 2010, and allows the company to crawl and index the Web for fresh content quickly. Different searches have different “freshness” needs.
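One simple way to picture differing “freshness” needs is to let a result’s value decay with age at a rate that depends on the kind of query. The sketch below is an assumption made for illustration, not a description of Google’s algorithm: the function name `freshness_score` and the half-life figures are invented for the example.

```python
def freshness_score(age_hours, half_life_hours):
    """Return a value between 0 and 1 that halves every `half_life_hours`."""
    return 0.5 ** (age_hours / half_life_hours)


WEEK_IN_HOURS = 7 * 24

# Hypothetical half-lives: breaking news goes stale in hours,
# a page about a TV show stays useful for weeks.
news_score = freshness_score(WEEK_IN_HOURS, half_life_hours=6)
tv_show_score = freshness_score(WEEK_IN_HOURS, half_life_hours=2 * WEEK_IN_HOURS)
```

With these made-up numbers, a week-old page about a TV show keeps most of its value while a week-old breaking-news item is worth almost nothing, which mirrors the distinction Singhal draws.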
Changes to algorithms at Google undergo extensive quality evaluation before being released in the public domain. A typical algorithmic change begins as an idea from an engineer. It is then implemented on a test version of Google. “Before and after” results pages are generated and presented to “raters”, people who are trained to evaluate search quality. If the feedback is positive, Google may run what it terms a “live experiment” (also called a “sandbox”), where it tries out the updated algorithm on a very small percentage of Google users to see, for instance, whether searchers click the new top result more often.
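The click comparison in such a live experiment can be sketched as a standard two-proportion test: compare how often users clicked the top result under the old and the new algorithm, and check whether the gap is larger than chance would explain. This is a textbook statistical check shown under assumed numbers, not Google’s actual evaluation pipeline; the function name `compare_click_rates` and the click counts are hypothetical.

```python
import math


def compare_click_rates(control_clicks, control_views, test_clicks, test_views):
    """Return (control rate, test rate, z-score) for top-result clicks."""
    control_rate = control_clicks / control_views
    test_rate = test_clicks / test_views

    # Pooled click rate under the assumption that both versions perform the same.
    pooled = (control_clicks + test_clicks) / (control_views + test_views)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / control_views + 1 / test_views))

    # If nobody (or everybody) clicked, the rates cannot be told apart.
    z_score = 0.0 if std_err == 0 else (test_rate - control_rate) / std_err
    return control_rate, test_rate, z_score


# Hypothetical experiment: 1,000 searches served by each version.
control_rate, test_rate, z_score = compare_click_rates(400, 1000, 460, 1000)
looks_better = z_score > 1.96  # conventional 5% threshold for a two-sided test
```

In this invented example the click rate rises from 40% to 46%, and the z-score clears the conventional threshold, so the change would count as a measurable improvement rather than noise.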
In 2010, Google ran 13,311 precision evaluations to test whether potential algorithm changes had a positive or negative impact on the precision of its results. Based on all this experimentation, evaluation and analysis, it introduced 516 improvements to search.
The process is highly automated. In very few cases, Google has manual controls to address spam and security concerns such as malware and viruses. Google also manually intervenes in search results for legal reasons, for instance to remove child sexual abuse content (child pornography) or copyright-infringing material.
Despite these improvements, some researchers have raised questions over the efficacy of the searches. According to an Experian Hitwise report released in August, 81.3% of searches on Yahoo Search resulted in a visit to a website, with Bing a close second at 80.6%. By contrast, Google’s success rate was significantly lower at 67.6%.
The report partly attributed Google’s “not-so-accurate” performance to the massive number of library books in its database. To scan millions of older works, Google and its library partners used optical character recognition (OCR) programmes “that are not 100% foolproof—especially when processing old texts typeset in archaic fonts or with foreign-language characters”.
Due to OCR errors, noted the report, Google Books contains a huge number of word misidentifications that can lead Internet users down false trails, especially when users conduct “one-word searches”.
Patel said he’s “not too sure how the report arrived at this conclusion”.