A local online retailer was paying a lot for its ecommerce hosting. It had a decent server and a Content Delivery Network. Even so, the site suffered from slow response times and time-out errors throughout the day. An analysis showed that 84% of the site’s traffic came from third parties aggressively scraping its content with bots.
Bots (short for ‘robots’) are automated programs that visit your site with a pre-programmed purpose. For example, GoogleBot is a web crawler: it visits your site and copies your content into the Google search engine so that customers can find you. GoogleBot (provided it really is from Google) is a good bot.
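The caveat "provided it is from Google" matters because any bot can put "Googlebot" in its user agent. One common way to check is a reverse DNS lookup on the visiting IP, followed by a forward lookup on the returned hostname to confirm it maps back to the same IP. The Python sketch below illustrates the idea. The function name and the injectable lookup parameters are made up for this example, the domain suffixes are the ones commonly documented for Google's crawlers, and the default forward lookup only handles IPv4.

```python
import socket

# Hostname suffixes commonly documented for genuine Google crawlers.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if `ip` passes a forward-confirmed reverse DNS check.

    The two lookup parameters are hypothetical hooks so the function can be
    exercised without network access; by default they use the socket module.
    """
    if reverse_lookup is None:
        # gethostbyaddr returns (hostname, aliases, addresses).
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        # gethostbyname_ex returns (hostname, aliases, IPv4 addresses).
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]
    try:
        host = reverse_lookup(ip).lower().rstrip(".")
        if not host.endswith(GOOGLE_SUFFIXES):
            return False
        # The hostname must resolve back to the IP that made the request.
        return ip in forward_lookup(host)
    except OSError:
        # Covers socket.herror and socket.gaierror (failed lookups).
        return False
```

A request that claims to be GoogleBot but fails this check can reasonably be treated as a bad bot.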
Ecommerce managers should be familiar with DDoS attacks. These are carried out by bad bots whose objective is to take down a server by bombarding it with traffic requests. More sophisticated bad bots have application-layer objectives such as competitive data mining, personal and financial data harvesting, digital ad fraud or transaction fraud.
Examples of competitive data mining are the price checker systems run by suppliers such as skuuudle, pricefy, prisync, insitetrac, ecwid etc. These companies offer paid services that scrape inventory content for business intelligence or marketing advantage. Worst of all, financial investment businesses scrape sites to track business performance. A report from Optimas estimated that hedge funds would pay $2B in 2020 to collect and store data scraped from websites. If yours is a reasonably large ecommerce business that is looking for a buyer, this is likely to be happening to you.
According to Distil Networks, over half of all global bad bots originate in the US. Amazon was the single largest source of bad bots, accounting for 18% of the global total. That said, by far the most blocked country remains Russia, followed by Ukraine. For the record, bots lie: over half claim to be Google Chrome.
From an ecommerce manager’s perspective, all these bad bots are definitely bad and need to be defended against. To use a car analogy, it’s as if somebody has fitted a hidden handbrake to your car that slows it down and makes it less efficient. To defend against bad bots you need a Web Application Firewall (WAF). Example WAF service providers include Cloudflare, Citrix, Fortinet, Trustwave, Sucuri, Akamai, F5, Radware, Incapsula/Imperva and SiteLock. The three key factors in choosing what’s best for your business are cost, up-time and programmability. For an example of what down-time can look like, see the Cloudflare blog, which makes a good (if worrying) read. It’s not good when the service that you pay to protect your website actually causes your entire site to fail.
Programmability matters if you use tools to monitor your own web server or SEO performance. Unless you configure an exception for them, tools such as Screaming Frog are blocked by default.
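In practice, programmability usually means being able to put an "allow" rule for your own tools ahead of the generic bot-blocking rules. The Python sketch below illustrates that ordering only. It is not the rule syntax of any particular WAF product, and the network range, user agent markers and function name are all assumptions made up for this example.

```python
import ipaddress

# Hypothetical range used by your own office or monitoring tools.
TRUSTED_NETWORKS = [ipaddress.ip_network("203.0.113.0/28")]

# Assumed substrings that a generic bot rule might match in the user agent.
BLOCKED_AGENT_MARKERS = ("screaming frog", "python-requests", "curl")


def decide(ip, user_agent):
    """Return "allow" or "block" for a request.

    The trusted-network rule is evaluated first, so your own crawler is let
    through even though its user agent would otherwise be blocked.
    """
    address = ipaddress.ip_address(ip)
    if any(address in network for network in TRUSTED_NETWORKS):
        return "allow"
    agent = user_agent.lower()
    if any(marker in agent for marker in BLOCKED_AGENT_MARKERS):
        return "block"
    return "allow"
```

With these assumed values, a crawl launched from inside the trusted range is allowed, while the same tool run from any other address is blocked.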
Back to the story from the start of this article. The company needed a quick fix followed by a longer-term solution. Although the bots were being launched from thousands of different IP addresses, they were all based in one European country: France. The client was not selling much to that country. So in the short term, rather than blocking individual IPs, the solution was to block all traffic originating from France. The longer-term solution was to install an enterprise-level Web Application Firewall. The server calmed down.
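The decision to block a country rather than individual IPs rests on a simple tally: how much of the traffic comes from each country, and how many distinct addresses it is spread across. A minimal Python sketch of that tally follows. It assumes each request has already been tagged with a country code by some geolocation step that is not shown, and the function name and input shape are made up for this example.

```python
from collections import Counter, defaultdict


def traffic_by_country(requests):
    """Summarise requests given as (ip, country_code) pairs.

    Returns a list of (country_code, share_of_requests, unique_ip_count)
    tuples, ordered from the busiest country to the quietest.
    """
    hits = Counter()
    unique_ips = defaultdict(set)
    for ip, country in requests:
        hits[country] += 1
        unique_ips[country].add(ip)
    total = sum(hits.values())
    return [
        (country, count / total, len(unique_ips[country]))
        for country, count in hits.most_common()
    ]
```

A country that accounts for a large share of requests, spread over thousands of addresses, and that generates few sales is a candidate for a temporary country-level block, because blocking the addresses one by one would not keep up.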