Software robots, or bots for short, are more human than you might think. They imitate us to a great degree. While some bots are benevolent, helping us find the right answers or the lowest airfares and hotel rates, the evil ones can steal our data, impersonate us and even crash our websites.
Consider this. If you have proprietary content or products and pricing information; sign-up, login and account pages; payment processors; Web forms such as contact pages, discussion forums and reviews—all staples of the digital age—then you are definitely a soft target for bad bots, according to the “2017 Bad Bot Report” by US-based bot detection and mitigation company Distil Networks Inc.
India ranked 10th among the bad bot originating countries in 2014, moving up dramatically to take second place in 2015. In 2016, things improved a bit, yet India still ranked seventh among the top 10, with the US in first place, followed by the Netherlands and China, according to the report.
Most of us use bots, knowingly or unknowingly. If you use Apple Inc.’s Siri, Microsoft Corp.’s Cortana, Amazon.com Inc.’s Alexa or Google Inc.’s Allo, for instance, then you are using Artificial Intelligence, or AI, bots. You can also find smart chatbots on many messaging apps, including Facebook Messenger, WhatsApp, Kik and Telegram. A bot, simply put, is a software program that operates as an agent for a user or another program or simulates a human activity. It is any application that performs an automated task.
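In the spirit of that definition—“any application that performs an automated task”—the simplest possible bot is just a script that repeats a chore a human would otherwise do by hand. The toy fare-watcher below is purely illustrative; the routes and prices are invented for the example.

```python
# A toy bot: automatically scan a set of airfares and flag the ones
# that have dropped below a threshold. Routes and prices are invented.
def price_alert_bot(prices, threshold):
    """Return the items whose price is below the threshold."""
    return [item for item, price in prices.items() if price < threshold]

fares = {"DEL-BOM": 4200.0, "DEL-BLR": 3100.0, "DEL-CCU": 5600.0}
print(price_alert_bot(fares, 4500.0))  # ['DEL-BOM', 'DEL-BLR']
```

Run on a schedule against live data, even a trivial loop like this becomes the kind of fare-hunting agent described above.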
Search-engine crawlers, which most users are familiar with, are examples of good bots because they index websites by keywords to help people match their search-engine queries with the best set of websites for a given question. However, we also have “feed fetcher” bots that ferry website content to mobile and Web applications and display it to users. Other good bots include commercial crawlers that are used for authorized data extractions, usually on behalf of digital marketing tools, and there are monitoring bots too, which ensure the proper functioning of various online features.
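One courtesy that separates good crawlers from bad ones is that they honour a site’s robots.txt file before fetching anything. A minimal sketch of that check, using only Python’s standard library (the rules and crawler name here are made up for illustration):

```python
# Good-bot etiquette: consult robots.txt before crawling a path.
# The rules and the "FriendlyCrawler" user agent are illustrative.
from urllib.robotparser import RobotFileParser

def is_allowed(robots_txt, user_agent, path):
    """Return True if robots.txt permits user_agent to fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)

rules = """\
User-agent: *
Disallow: /private/
"""

print(is_allowed(rules, "FriendlyCrawler/1.0", "/index.html"))    # True
print(is_allowed(rules, "FriendlyCrawler/1.0", "/private/data"))  # False
```

Bad bots, by definition, skip this step entirely—robots.txt is a convention, not an enforcement mechanism.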
Bad bots, on the other hand, scrape data from sites without permission in a bid to reuse the data (e.g., pricing, inventory levels) and gain a competitive edge, according to the Distil Networks report. The truly ugly ones undertake criminal activities, such as fraud and outright theft.
In 2016, bad bots accounted for 19.9% of all website traffic—a 6.98% increase over 2015. Both human and bad-bot traffic numbers were higher than the prior year. Good bots decreased by 30.5% in 2016, accounting for 18.8% of all website traffic. While the volume of bad-bot traffic grew, its share relative to other traffic remained roughly constant. The reason: more people are coming online from developing nations, using multiple devices (including smartphones, tablets, work and personal laptops) to access the Internet, the report notes.
In 2016, 82% of sites with sign-up pages were victims of bot activity aimed at creating fake accounts. Among other things, bad bots resort to competitive data mining, personal and financial data harvesting, digital ad fraud, spam and transaction fraud. Bad bots also love scraping unique content or using stolen credentials to take over accounts, but the cascading secondary effects like Denial of Service (unexpected spikes in bad bot traffic that can lead to slowdowns or downtime) can be just as damaging, notes the Distil Networks report. In 2016, 97% of sites were victims of Web-scraping bots.
Some businesses even hire bad bots. Developers that create sophisticated scraping bots can earn as much as $128,000 (around Rs83 lakh) per year. Renting out bots-for-hire can cost as little as $3.33 per hour, according to Distil Networks.
Bad bots display another human trait—they lie. About 76% claimed to be one of the most popular browsers: Chrome, Safari, Internet Explorer or Firefox. In 2016, 16.1% of bad bots self-reported as mobile users. Mobile Internet service providers (ISPs) accounted for 9.4% of bad bot traffic, the report notes.
Bad bots also live in the cloud and are experimenting with mobile. Amazon AWS, the world’s leading cloud host, generated four times the amount of bad bot traffic (16.37%) that OVH SAS, a French ISP, did (3.94%). Clouds such as Amazon, Digital Ocean and Google Cloud were all in the top 10 bad bot originating ISPs. T-Mobile, responsible for 1.98% of bad bot traffic, popped into the top 10 as well.
Bots even have fights like humans. For instance, benevolent bots that are designed to improve articles on Wikipedia sometimes have online “fights” over content which can continue for years, according to a research paper published in February in the science journal PLOS ONE.
Editing bots on Wikipedia undo vandalism, enforce bans, check spelling, create links and import content automatically, whereas other (non-editing) bots can mine data, identify data or flag copyright infringements, according to the PLOS ONE research paper by the University of Oxford and the Alan Turing Institute in the UK.
The researchers found that the German editions of Wikipedia had the fewest conflicts between bots, with each undoing another’s edits 24 times, on average, over 10 years. This shows relative efficiency, says the research paper, when compared with bots on the Portuguese Wikipedia edition, which undid another bot’s edits 185 times, on average, over 10 years. Bots on the English Wikipedia undid another bot’s work 105 times, on average, over 10 years, three times the rate of human reverts. The paper notes that while bots constitute a tiny proportion (0.1%) of Wikipedia editors, they are behind a significant proportion of all edits. According to the researchers, the findings have implications not only for how we design artificial agents but also for how we study them, implying a need for more research into the sociology of bots.
Can we rid ourselves of bad bots? If you’re using a Web-application firewall (WAF) and are filtering out known violator user agents and Internet Protocol (IP) addresses, that’s a good start, according to the Distil Networks report. However, bad bots rotate through IPs and cycle through user agents to evade these WAF filters. You’ll need a way to differentiate humans from bad bots that use headless browsers, browser automation tools and man-in-the-browser malware, suggests Distil Networks.
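The WAF-style first line of defence the report describes amounts to checking each request against static blocklists. A rough sketch, with made-up entries (the user agents and TEST-NET IP addresses below are invented for illustration), also shows why it fails against rotation and spoofing:

```python
# Static WAF-style filtering: block known bad user agents and IPs.
# All entries are invented; 203.0.113.x and 192.0.2.x are reserved
# documentation (TEST-NET) addresses.
BLOCKED_USER_AGENTS = {"EvilScraper/2.1", "python-requests/2.0"}
BLOCKED_IPS = {"203.0.113.7", "198.51.100.23"}

def passes_waf_filter(ip, user_agent):
    """Return True if the request survives both blocklists."""
    return ip not in BLOCKED_IPS and user_agent not in BLOCKED_USER_AGENTS

print(passes_waf_filter("203.0.113.7", "Mozilla/5.0"))     # False: blocked IP
print(passes_waf_filter("192.0.2.1", "EvilScraper/2.1"))   # False: blocked UA
# A bot on a fresh IP with a spoofed Chrome user agent sails through:
print(passes_waf_filter("192.0.2.99", "Mozilla/5.0 (Chrome)"))  # True
```

The last line is exactly the evasion the report warns about: rotate the IP, spoof a popular browser string, and static lists see nothing wrong.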
Some businesses try to avoid bad bots by geo-fencing their websites—blocking users from foreign nations where the company doesn’t do business. Moreover, a whitelist (as opposed to blacklist) policy that imposes an age limit for browser versions stops up to 10% of bad bots, according to the Distil Networks report.
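The whitelist idea can be sketched simply: admit only browser families you recognise, and only at or above a minimum version, which screens out bots announcing long-obsolete browsers. The family names and version floors below are assumptions for the example, not figures from the report.

```python
# Whitelist policy with a browser-version age limit. The families and
# minimum major versions are illustrative assumptions.
MIN_VERSIONS = {"Chrome": 50, "Firefox": 45, "Safari": 9}

def allowed_browser(family, major_version):
    """Refuse unknown browser families and stale versions."""
    floor = MIN_VERSIONS.get(family)
    return floor is not None and major_version >= floor

print(allowed_browser("Chrome", 57))      # True
print(allowed_browser("Chrome", 30))      # False: version too old
print(allowed_browser("AncientBot", 9))   # False: not whitelisted
```

Because it defaults to refusal, a whitelist catches bots that a blacklist would miss—anything not explicitly recognised is turned away.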
Governments, meanwhile, have tried to address the bad bot problem with legislation. For instance, the US Congress passed the Better Online Ticket Sales Act (BOTS) in September, aimed at prohibiting the use and sale of software that circumvents security measures on ticket-seller websites. However, while legislation is a welcome deterrent, scraping is a technical problem, and it’s difficult to legislate against those you can’t identify.