How to Combat AI Bot Traffic on Your Website

Artificial intelligence (AI) tools are becoming a staple for web developers. We use them to write and troubleshoot code, analyze data, and more. We’re finding new uses for these models every day.

The downside is in how AI models gather information. They scrape the web and index the data. For instance, AI doesn’t “know” WordPress without first indexing related documentation, tutorials, and code snippets. It doesn’t conjure answers out of thin air.

There are a few issues with this. First, the practice raises copyright concerns: is it OK for ChatGPT or Gemini to ingest copyrighted content and repackage it for their users? The legal and moral ramifications are beyond our expertise, so we'll focus on the other elephant in the room.

The bots deployed by AI models can be a traffic nightmare for some websites. How bad is it? Wikimedia claims its bandwidth usage rose 50% due to AI scrapers.

Perhaps that’s an extreme case, given Wikimedia’s size. However, smaller organizations feel the impact, too. The extra traffic hits website owners in the wallet and drags down site performance.

Blocking AI bots is one way to combat the issue. Let’s look at how to keep these tools from hogging your server resources (not to mention taking your content).

Blocking AI Bots Isn’t Easy

Keeping various bots at bay typically requires adding entries to your site’s robots.txt file. That tells a specific bot it’s not welcome. It’s up to the bot to respect your instructions, though.
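For instance, a pair of robots.txt entries asking two widely reported AI crawlers (GPTBot, used by OpenAI, and CCBot, used by Common Crawl) to stay away from the entire site might look like this:

```
# Ask specific AI crawlers to skip the whole site
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
```

Each `User-agent` line names a bot; the `Disallow: /` beneath it requests that the bot avoid every path on the site.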

Search engines usually comply. AI tools are another story: reports indicate that some companies ignore robots.txt and crawl sites regardless. So it’s not the complete, quick fix you might have hoped for.

In addition, new tools and models are being released all the time. Each one unleashes a different army of bots on the web. So, even if you could block each bot via robots.txt, there’s always more to find. It’s a game of virtual whack-a-mole.

The result is an imperfect process for keeping AI bots away from your content. It requires routine checks to ensure you’re blocking all known bots. And even that isn’t foolproof.

The good news is that service providers and individual developers are keeping track. In the next section, we’ll dig into their solutions.

Methods for Blocking Those Pesky AI Bots

Traffic spikes from AI models are becoming more common, leading to some new tools for combating them. None are 100% effective, but they can help slow bots down and save you precious bandwidth.

Here are a few options worth checking out:

ai.robots.txt

Here’s a manual bot-blocking solution you can use with any website. It’s an open list of web crawlers known to belong to AI models. The list is regularly updated to include new bots as they come online.

The package comes with three methods for blocking:

  • robots.txt: A list of user agents to paste into your site’s robots.txt file. Remember that the rules in this file are advisory only; there’s no guarantee a bot will honor your request.
  • .htaccess: This file works with Apache web servers and will block bots from accessing your site. A bot with a matching user agent visiting your site will receive an error message. This is better for content protection, but repeat offenders may keep hammering your site.
  • nginx-block-ai-bots.conf: This configuration file for Nginx servers works similarly to the .htaccess file above.
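To illustrate how the .htaccess approach works, here is a minimal sketch (not the project’s actual file; the user-agent list is abbreviated to three example bots) that returns a 403 Forbidden error to matching crawlers on Apache:

```apache
# Deny requests whose User-Agent matches known AI crawlers
<IfModule mod_rewrite.c>
  RewriteEngine On
  # [NC] makes the match case-insensitive
  RewriteCond %{HTTP_USER_AGENT} (GPTBot|CCBot|ClaudeBot) [NC]
  # [F] sends a 403 Forbidden response; [L] stops further rules
  RewriteRule .* - [F,L]
</IfModule>
```

Unlike robots.txt, this actively refuses the request at the server rather than relying on the bot’s cooperation.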

This method requires ongoing maintenance but is simple to set up, provided you have server access. Check for updated bot listings and update your file accordingly.
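The Nginx variant follows the same idea. A minimal illustrative sketch (again, not the project’s actual file, and with an abbreviated bot list):

```nginx
# The map block must live in the http context.
# It sets $ai_bot to 1 when the User-Agent matches a known AI crawler.
map $http_user_agent $ai_bot {
    default                     0;
    ~*(GPTBot|CCBot|ClaudeBot)  1;
}

server {
    listen 80;
    server_name example.com;

    # Refuse matching bots with a 403 Forbidden response
    if ($ai_bot) {
        return 403;
    }
}
```

When you update the bot list, remember to reload Nginx (e.g., `nginx -s reload`) so the new configuration takes effect.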

ai.robots.txt provides manual bot-blocking tools.

Block AI Crawlers

Block AI Crawlers is a WordPress plugin that automatically tweaks your site’s robots.txt file. Install it and block known AI bots with a single click.

It’s a handy tool for WordPress sites, as new bots can be added via plugin updates. That reduces the burden on website owners and makes for a “set it and forget it” approach.

Block AI Crawlers is an easy-to-use WordPress plugin.

Cloudflare AI Labyrinth

Cloudflare’s solution to combating AI bots is to use (wait for it) generative AI. Their AI Labyrinth tool springs into action when it detects unauthorized crawling of a site. It redirects the offending bot into a set of AI-generated content. From there, the bot wastes time and resources scanning phony web pages.

The company also uses this trap to identify bad actors. It adds them to a list where they can be blocked for good.

Cloudflare is a content delivery network (CDN) that stands between visitors and your web server. Trapping and blocking AI bots at this level prevents them from accessing your site, saving you some bandwidth.

It’s an automated tool and requires no configuration. Cloudflare users can turn it on and relax.

Cloudflare customers have access to the automated AI Labyrinth tool.

Take Control of Who’s Crawling Your Website

For all the convenience of AI tools, there is concern about how they obtain content. Allowing their bots unfettered access to websites is problematic: overzealous crawlers can slow down your site and eat up your server resources.

The problem is likely to get worse, given the absence of regulation. AI companies can choose not to comply with robots.txt requests, and no one is making them. That leaves site owners, web hosts, and security companies to pick up the slack.

The situation is similar to fighting spam. We use a combination of manual and automated tools to mitigate the issue. However, some bad actors inevitably slip through the cracks.

None of the solutions above is perfect, but they provide some relief from this new phenomenon. Here’s hoping the future brings new, more effective methods for putting bots in their place.

