Cloudflare, a prominent cloud service provider, has introduced a new free tool aimed at preventing AI companies from scraping content from its clients’ websites for the purpose of training large language models. This tool is available to all Cloudflare customers, including those on free plans, and will automatically update over time to identify and block offending bots.
In a blog post, Cloudflare’s team shared insights about how their clients are dealing with the surge of bots that scrape content for AI model training. According to the company’s internal data, a significant majority of customers (85.2%) have chosen to block AI bots from accessing their sites, even those bots that properly identify themselves.
The company also revealed the most active bots over the past year. The Bytedance-owned Bytespider bot attempted to access approximately 40% of websites under Cloudflare’s management, while OpenAI’s GPTBot tried on 35%. These two bots, along with Amazonbot and ClaudeBot, were the top four AI bot crawlers by number of requests on Cloudflare’s network.
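For bots that do identify themselves, a site can ask them to stay away by naming their user-agent tokens in a robots.txt file. The sketch below is only an illustration of that general idea, not Cloudflare’s tool: the robots.txt contents, the example URL, and the helper names `build_parser` and `is_allowed` are all hypothetical, and the tokens are simply the bot names mentioned above. It uses Python’s standard `urllib.robotparser` module to check which crawlers such a file would turn away.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the four bot names come from the article,
# everything else here is an assumption made for illustration.
ROBOTS_TXT = """\
User-agent: Bytespider
User-agent: GPTBot
User-agent: Amazonbot
User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(text):
    """Parse robots.txt text into a RobotFileParser (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def is_allowed(parser, user_agent, url="https://example.com/article"):
    """Return True if the given user-agent token may fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)
for agent in ("GPTBot", "Bytespider", "SomeOtherBot"):
    print(agent, "allowed" if is_allowed(parser, agent) else "blocked")
```

A robots.txt rule is a request that well-behaved crawlers honor voluntarily, so a check like this only describes what a compliant bot would do; it cannot stop a crawler that ignores the file or hides its identity.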
However, consistently and fully blocking AI bots from accessing content has proven challenging due to the rapid pace of AI model development. Some companies have been accused of skirting or outright breaking the rules around blocking scrapers. For instance, Perplexity AI was recently accused of scraping websites without the necessary permissions.
Despite the challenges, Cloudflare says it is committed to addressing this issue. The company expressed concern that some AI companies may persistently adapt to evade bot detection. To counter this, Cloudflare plans to continue monitoring the situation, add more bot blocks to its AI Scrapers and Crawlers rule, and evolve its machine learning models to help content creators maintain control over their content.