The AI company Perplexity is complaining their bots can't bypass Cloudflare's firewall
(www.searchenginejournal.com)
This isn’t about AI crawlers. This is about users using AI tools.
There’s a massive difference in server load between a user summarizing one page from your site and a bot trying to hit every page simultaneously.
Should Cloudflare block users who use ad block extensions in their browser now?
The point of the article is that Cloudflare is blocking legitimate traffic, created by individual humans, by classifying that traffic as bot traffic.
Bot traffic is blocked because it creates outsized server load, which user-created traffic doesn’t do.
People use Cloudflare to protect their sites against bot traffic so that human users can access the site without it being DDoS’d. By classifying user-generated traffic and scraper-generated traffic as the same thing, Cloudflare is misclassifying traffic and blocking human users from accessing websites.
Websites are not able to opt out of this classification scheme. If they want to use Cloudflare for bot protection then they have to also agree that users using AI tools cannot access their sites even if the website owner wants to allow it. Cloudflare is blocking legitimate traffic and not allowing their customers to opt out of this scheme.
It should be pretty easy to understand how a website owner would be upset if their users couldn’t access their website.
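To make the load distinction concrete, here is a minimal sketch of a sliding-window check that separates "one client fetching one page" from "one client hitting many pages at once". Everything in it is an assumption for illustration: the `RateClassifier` name, the 10-second window, and the 20-page threshold are all made up, and this is not how Cloudflare actually classifies traffic.

```python
from collections import defaultdict, deque


class RateClassifier:
    """Hypothetical classifier: flags a client as bot-like when it requests
    more than `max_pages` distinct paths within `window_seconds`."""

    def __init__(self, window_seconds=10.0, max_pages=20):
        self.window_seconds = window_seconds
        self.max_pages = max_pages
        # client_id -> deque of (timestamp, path), oldest first
        self._hits = defaultdict(deque)

    def observe(self, client_id, path, now):
        """Record one request and return 'bot-like' or 'user-like'."""
        hits = self._hits[client_id]
        hits.append((now, path))
        # Drop requests that have fallen out of the sliding window.
        while hits and now - hits[0][0] > self.window_seconds:
            hits.popleft()
        distinct_paths = {p for _, p in hits}
        if len(distinct_paths) > self.max_pages:
            return "bot-like"
        return "user-like"


clf = RateClassifier()
# A user asking an AI tool to summarize a single page: one request.
user_verdict = clf.observe("user", "/article", now=0.0)
# A crawler requesting 50 different pages within half a second.
crawler_verdicts = [
    clf.observe("crawler", f"/page/{i}", now=0.01 * i) for i in range(50)
]
```

Under these made-up thresholds the single summarization request stays "user-like", while the crawler flips to "bot-like" once it passes 20 distinct pages inside the window. A check keyed on request volume rather than on the tool's identity is the distinction this comment is arguing for.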
And their "AI tool" looks just like the hundreds of AI scraping bots. I've already said the answer is easy: they need to differentiate themselves enough to convince Cloudflare to make an exception for them.
Until then, they're "just another AI company scraping data"
Well, Cloudflare is adding, to the control panel, the ability to whitelist Perplexity and other AI sources (default: on).
Looks like they differentiated themselves enough.
That option is likely only for paid accounts. Freebie users like me have to make our own anti-bot WAF rules or, as I do, just put every page I expect a user to visit behind a managed challenge. Adding exceptions uses up precious space in those rules, which I've already filled with exceptions for genuine instance-to-instance traffic.
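For anyone wondering what "challenge everything, with exceptions" looks like as logic, here is a rough sketch of first-match rule evaluation. This is not Cloudflare's rule language or API; the `Rule` and `Request` types, the action names, and the `/inbox` and `/post/` paths are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Request:
    path: str
    user_agent: str


@dataclass
class Rule:
    name: str
    matches: Callable[[Request], bool]
    action: str  # hypothetical action labels, e.g. "skip" or "managed_challenge"


def evaluate(rules: List[Rule], request: Request, default: str = "allow") -> str:
    """Return the action of the first rule that matches, else the default."""
    for rule in rules:
        if rule.matches(request):
            return rule.action
    return default


# Exceptions must come first; otherwise the broad challenge rule wins.
RULES = [
    # Hypothetical exception for instance-to-instance (federation) traffic.
    Rule("federation exception", lambda r: r.path.startswith("/inbox"), "skip"),
    # Hypothetical catch-all for pages a human is expected to visit.
    Rule("challenge user pages", lambda r: r.path.startswith("/post/"), "managed_challenge"),
]

federation = evaluate(RULES, Request("/inbox", "other-instance"))
page_view = evaluate(RULES, Request("/post/123", "some-browser"))
other = evaluate(RULES, Request("/static/logo.png", "some-browser"))
```

Every extra exception (say, one for a specific AI tool) is another entry in that list, which is why a limited rule budget fills up fast.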
But I am glad they were able to convince Cloudflare. Good for them.