Are you reading this because you are intrigued by the word “sex” in the headline? If you are, you’ve just proved the point that I am going to make: Nothing sells as much as sex.
Let me explain. My friend Namita Bhandare and I run a blog, Asian Window—a news aggregator that focuses on South Asia. On average, we get a modest 400-500 visitors a day. Like all bloggers, our endeavour is to be relevant and interesting, and, at the same time, draw more visitors.
We get a good number of hits on what I call our bread-and-butter posts: stories on politics, foreign policy, the economy and the South Asian diaspora, along with opinion pieces and book reviews. But the posts that really draw readers are the ones about sex. Every day, at least a dozen people still stumble upon our blog while searching for “sexy housewife porn site”. Our year-old link to the story on Savita Bhabhi (bless her creators!), the star of the eponymous Indian porn website that the government banned last year, is still a big draw, followed by a story headlined “Politics and sex in India”.
But the problem is that where there is sex, there is also spam. It arrives as comments on our blog, inviting our readers to “seduction points”. Some spammers seem to use these comments to promote their businesses, websites selling anything from cookware to “sexy” lingerie; others lead you to photos of escort services and worse. The built-in filters block the bulk of this unwanted traffic, but quite a few comments manage to sneak through. These tricky ones have to be dealt with manually, which is quite a pain.
Spammers are a clever and inventive bunch. They don’t sit and write these comments by hand; they use a spambot, an automated program that posts spam in bulk and is built to slip past spam filters. The challenge for the good guys is to write a program that can tell who wrote an email or comment: a person or a computer. If it’s a person, the message is probably genuine; if it’s a computer program, it’s most likely spam.
A few years ago, a group of computer scientists designed a novel filter called the “Completely Automated Public Turing Test To Tell Computers and Humans Apart”, popularly known by its acronym, Captcha. It’s the skewed, distorted words you have to type in before you can post a comment on a blog, forward someone a story from a website or fill in a form online. The test determines whether the user is a human or a computer. Because the characters are warped, computer programs have great difficulty reading them, while a person can usually make them out and type them in. If you get it right, you’ve “proved” to the computer that you’re a human and not a software program.
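Behind the scenes, this kind of test is a challenge-and-response exchange: the site remembers the answer to the text it shows you, then compares it with what you type. Here is a minimal Python sketch under stated assumptions. The function names, the six-character answers drawn from letters and digits, and the in-memory store are all illustrative, and the crucial step, drawing the text as a warped image that software struggles to read, is left out.

```python
import secrets
import string

ALPHABET = string.ascii_uppercase + string.digits

# Hypothetical in-memory store mapping a challenge id to its expected answer.
_pending: dict[str, str] = {}


def new_challenge(length: int = 6) -> tuple[str, str]:
    """Create a challenge and return (challenge_id, text_to_render).

    A real site would draw text_to_render as a distorted image and never
    send the plain text to the browser; that rendering step is omitted here.
    """
    challenge_id = secrets.token_hex(8)
    answer = "".join(secrets.choice(ALPHABET) for _ in range(length))
    _pending[challenge_id] = answer
    return challenge_id, answer


def verify(challenge_id: str, typed: str) -> bool:
    """Check what the user typed; each challenge can be answered only once."""
    expected = _pending.pop(challenge_id, None)
    if expected is None:
        return False  # unknown or already-used challenge
    # Case-insensitive, constant-time comparison of the encoded strings.
    return secrets.compare_digest(typed.strip().upper().encode(), expected.encode())


cid, text = new_challenge()
print(verify(cid, text.lower()))  # True: case is ignored
print(verify(cid, text))          # False: the challenge was already used
```

Removing the answer as soon as it is checked means a spambot cannot keep guessing against the same challenge.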
In April, The New York Times reported that spammers have outsourced the job of cracking Captchas to India, China and Bangladesh. The going rate: about $1 (Rs47.3) for a thousand solved Captchas.
The computer scientists who designed this brilliant software estimate that about 200 million Captchas are typed every day, each taking about 10 seconds to fill in. Collectively, that’s more than 550,000 hours a day. They considered this a gross waste of “a most precious resource: human brain cycles”, and wondered how that effort could be put to better use.
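The daily total is easy to check. This short Python calculation (the variable names are illustrative) multiplies the two quoted figures and converts seconds to hours:

```python
captchas_per_day = 200_000_000  # figure quoted by the designers
seconds_each = 10               # approximate time to type one Captcha

total_seconds = captchas_per_day * seconds_each  # 2 billion seconds
total_hours = total_seconds / 3600               # 3,600 seconds in an hour
print(f"{total_hours:,.0f} hours a day")         # prints "555,556 hours a day"
```

The exact result, a little over 555,000 hours, is where the rounded daily figure comes from.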
Their solution is an ingenious piece of software called reCaptcha: instead of one distorted word, it shows two. One verifies that you are human; the other helps build a digital library. Here is how it works. When the pages of a book are scanned, some words often come out smudged or faded, and the character-recognition software that turns the scans into text cannot read them; it takes a human to decipher them. So every time you fill in a reCaptcha, part of your effort goes towards digitizing books for online libraries: you are actually correcting scanned pages, one word at a time. At the rate of 30 million words a day, reCaptcha will have digitized all 130 years of The New York Times by the end of this year.
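The two-word idea can be sketched in a few lines of Python. This is a simplified illustration, not reCaptcha's actual code: the names (`UnknownWord`, `submit`) and the rule that a smudged word is accepted once three users type the same reading (`AGREEMENT_THRESHOLD`) are assumptions made for the example. Only the word whose answer is already known decides whether you pass; your reading of the other word is merely recorded as a vote.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional

AGREEMENT_THRESHOLD = 3  # hypothetical: matching readings needed to accept a word


@dataclass
class UnknownWord:
    """A smudged word from a scanned page that the scanning software could not read."""
    image_id: str
    votes: Counter = field(default_factory=Counter)
    accepted: Optional[str] = None


def submit(control_answer: str, control_typed: str,
           unknown: UnknownWord, unknown_typed: str) -> bool:
    """Grade one two-word response and return whether the user passed.

    The control word (answer already known) decides pass or fail. The reading
    of the unknown word counts as a vote only if the user passed, and the word
    is transcribed once enough independent readings agree.
    """
    passed = control_typed.strip().lower() == control_answer.lower()
    if passed and unknown.accepted is None:
        reading = unknown_typed.strip().lower()
        unknown.votes[reading] += 1
        if unknown.votes[reading] >= AGREEMENT_THRESHOLD:
            unknown.accepted = reading
    return passed


word = UnknownWord("page12_word7")
for typed in ["morning", "Morning", "mourning", "morning"]:
    submit("upon", "upon", word, typed)
print(word.accepted)  # morning
```

Ignoring the unknown-word reading from anyone who fails the control word is what keeps spambots from polluting the transcription.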
I have installed reCaptcha on our blog. Even if some spammers slip through the cracks, we will at least have the satisfaction of having put them to good work.
Shekhar Bhatia is a former editor, Hindustan Times, a science buff and a geek at heart.