this post was submitted on 19 Aug 2025

866 points (99.3% liked)

The AI company Perplexity is complaining their bots can't bypass Cloudflare's firewall (www.searchenginejournal.com)

submitted 3 months ago* (last edited 3 months ago) by Davriellelouna@lemmy.world to c/technology@lemmy.world

[–] GamingChairModel@lemmy.world 19 points 3 months ago (3 children)

gaining unauthorized access to a computer system

And my point is that defining "unauthorized" to include visitors using unauthorized tools/methods to access a publicly visible resource would be a policy disaster.

If I put a banner on my site that says "by visiting my site you agree not to modify the scripts or ads displayed on the site," does that make my visit with an ad blocker "unauthorized" under the CFAA? I think the answer should obviously be "no," and that the way to define "authorization" is whether the website puts up some kind of login/authentication mechanism to block or allow specific users, not to put a simple request to the visiting public to please respect the rules of the site.

To me, a robots.txt is more like a friendly request to unauthenticated visitors than it is a technical implementation of some kind of authentication mechanism.

Scraping isn't hacking. I agree with the Third Circuit and the EFF: If the website owner makes a resource available to visitors without authentication, then accessing those resources isn't a crime, even if the website owner didn't intend for site visitors to use that specific method.
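On the robots.txt point, here is a minimal sketch of why it works as a request and not an access control. It uses Python's standard `urllib.robotparser`. The robots.txt content, the bot names, and the `may_fetch` helper are all made up for illustration. The check runs entirely on the client side, so a crawler that never calls it is not stopped by anything.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /private/

User-agent: *
Disallow:
"""


def may_fetch(user_agent: str, url: str) -> bool:
    """Report what robots.txt asks of this user agent.

    This is a voluntary, client-side check. The server does not
    enforce the answer; a crawler can simply skip calling this.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# A polite crawler consults the file before requesting a page.
print(may_fetch("ExampleBot", "https://example.com/private/page"))
print(may_fetch("SomeOtherBot", "https://example.com/private/page"))
```

Nothing here involves credentials or a login, which is the difference from an authentication mechanism that I'm pointing at.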

[–] Glitchvid@lemmy.world 19 points 3 months ago* (last edited 3 months ago) (3 children)

When sites put challenges like Anubis or other measures to authenticate that the viewer isn't a robot, and scrapers then employ measures to thwart that authentication (via spoofing or other means) I think that's a reasonable violation of the CFAA in spirit — especially since these mass scraping activities are getting attention for the damage they are causing to site operators (another factor in the CFAA, and one that would promote this to felony activity.)

The fact is these laws are already on the books, we may as well utilize them to shut down this objectively harmful activity AI scrapers are doing.

[–] ubergeek@lemmy.today 11 points 3 months ago (1 children)

The fact is these laws are already on the books, we may as well utilize them to shut down this objectively harmful activity AI scrapers are doing.

Silly plebe! Those laws are there to target the working class, not to be used against corporations. See: Copyright.

[–] RangerAndTheCat@lemmy.dbzer0.com 4 points 3 months ago

[–] tomalley8342@lemmy.world 8 points 3 months ago (1 children)

Nah, that would also mean using NewPipe, youtube-dl, ReVanced, and Tachiyomi would be a crime, and it would only take the re-introduction of WEI to extend that criminalization to the rest of the web ecosystem. It would be extremely shortsighted and foolish of me to cheer on the criminalization of user spoofing and browser automation because of this.

[–] Glitchvid@lemmy.world 5 points 3 months ago* (last edited 3 months ago) (1 children)

Do you think DoS/DDoS activities should be criminal?

If you're a site operator and the mass AI scraping is genuinely causing operational problems (not hard to imagine, I've seen what it does to my hosted repository pages) should there be recourse? Especially if you're actively trying to prevent that activity (revoking consent in cookies, authorization captchas).

In general I think the idea of "your right to swing your fists ends at my face" applies reasonably well here — these AI scraping companies are giving lots of admins bloody noses and need to be held accountable.

I really am amenable to arguments wrt the right to an open web, but look at how many sites are hiding behind CF and other portals, or outright becoming hostile to any scraping at all; we're already seeing the rapid death of the ideal because of these malicious scrapers, and we should be using all available recourse to stop this bleeding.

[–] tomalley8342@lemmy.world 3 points 3 months ago (1 children)

DoS attacks are already a crime, so of course the need for some kind of solution is clear. But any proposal that gatekeeps the internet and restricts the freedoms with which the user can interact with it is no solution at all. To me, the openness of the web shouldn't be something that people just consider, or are amenable to. It should be the foundation that all reasonable proposals treat as a principal truth.

[–] ubergeek@lemmy.today 2 points 3 months ago (1 children)

How "open" a website is, is up to the owner, and that's all. Unless we're talking about de-privatizing the internet as a whole, here.

[–] tomalley8342@lemmy.world 2 points 3 months ago (1 children)

How “open” a website is, is up to the owner, and that’s all.

As someone who registered this account on this platform in response to Reddit's API restrictions, it would be hypocritical of me to accept such a belief.

[–] ubergeek@lemmy.today 1 points 3 months ago (1 children)

Well, until we abolish capitalism, that's the state of things. Unless you feel like Nazis MUST be freely given access to everything too?

[–] tomalley8342@lemmy.world 2 points 3 months ago

Well, until we abolish capitalism, that’s the state of things.

I can see that things are the way things are. Accepting it is a different matter.

Unless you feel like Nazis MUST be freely given access to everything too?

To me, the "access" that I am referring to (the interface with which you gain access to a service) and that "access" (your behavior once you have gained access to a service) are different topics. The same distinction can be made with the concern over DoS attacks mentioned earlier in the thread. The user's behavior of overwhelming a site's traffic is the root concern, not the interface that the user is connecting with.
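To make the behavior-versus-interface distinction concrete, here is a rough sketch of a token-bucket rate limiter that keys only on how often a client sends requests and never looks at the User-Agent or the tool being used. The class name, the parameters, and the idea of identifying a client by a plain string are assumptions for illustration, not how any particular site or CDN does it.

```python
import time
from typing import Dict, Optional, Tuple


class BehaviorRateLimiter:
    """Token bucket per client; ignores what software the client runs."""

    def __init__(self, rate: float, burst: float) -> None:
        self.rate = rate    # tokens refilled per second
        self.burst = burst  # maximum tokens a client can hold
        # client id -> (tokens remaining, time of last update)
        self._buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, client: str, now: Optional[float] = None) -> bool:
        """Return True if this request fits within the client's budget."""
        now = time.monotonic() if now is None else now
        tokens, last = self._buckets.get(client, (self.burst, now))
        # Refill according to elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        allowed = tokens >= 1.0
        if allowed:
            tokens -= 1.0
        self._buckets[client] = (tokens, now)
        return allowed


# Hypothetical usage: 1 request per second, bursts of up to 2.
limiter = BehaviorRateLimiter(rate=1.0, burst=2.0)
results = [limiter.allow("client-a", now=0.0) for _ in range(3)]
print(results)
```

A browser, a script, and a scraper all get the same budget here; only the one that floods the site gets throttled.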

[–] Aatube@lemmy.dbzer0.com 4 points 3 months ago

That same logic is how Aaron Swartz was cornered into suicide for scraping JSTOR. That reading of the law is widely agreed to be a bad idea by a wide range of lawspeople, including SCOTUS, whose 2021 decision in Van Buren v. United States struck this interpretation off the books.

[–] EncryptKeeper@lemmy.world 3 points 3 months ago (1 children)

If I put a banner on my site that says "by visiting my site you agree not to modify the scripts or ads displayed on the site," does that make my visit with an ad blocker "unauthorized" under the CFAA?

How would you “authorize” a user to access assets served by your systems based on what they do with them after they've accessed them? That doesn’t logically follow so no, that would not make an ad blocker unauthorized under the CFAA. Especially because you’re not actually taking any steps to deny these people access either.

AI scrapers, on the other hand, are a type of user that you’re not authorizing to begin with, and if you’re using Cloudflare's bot protection you’re putting into place a system to deny them access. To purposefully circumvent that access would be considered unauthorized.

[–] GamingChairModel@lemmy.world 4 points 3 months ago

That doesn’t logically follow so no, that would not make an ad blocker unauthorized under the CFAA.

The CFAA also criminalizes "exceeding authorized access" in every place it criminalizes accessing without authorization. My position is that mere permission (in a colloquial sense, not necessarily technical IT permissions) isn't enough to define authorization. Social expectations and even contractual restrictions shouldn't be enough to define "authorization" in this criminal statute.

To purposefully circumvent that access would be considered unauthorized.

Even as a normal non-bot user who sees the Cloudflare landing page because they're on a VPN or happen to share an IP address with someone who was abusing the network? No, circumventing those gatekeeping functions is no different than circumventing a paywall on a newspaper website by deleting cookies or something. Or using a VPN or relay to get around rate limiting.

The idea of criminalizing scrapers or scripts would be a policy disaster.

[–] finitebanjo@lemmy.world 1 points 3 months ago (1 children)

Site owners currently do and should have the freedom to decide who is and is not allowed to access the data, and to decide what purpose it gets used for. Idgaf if you think scraping is malicious or not, it is and should be illegal to violate clear and obvious barriers against them at the cost of the owners and unsanctioned profit of the scrapers off of the work of the site owners.

[–] GamingChairModel@lemmy.world 0 points 3 months ago (1 children)

to decide what purpose it gets used for

Yeah, fuck everything about that. If I'm a site visitor I should be able to do what I want with the data you send me. If I bypass your ads, or use your words to write a newspaper article that you don't like, tough shit. Publishing information is choosing not to control what happens to the information after it leaves your control.

Don't like it? Make me sign an NDA. And even then, violating an NDA isn't a crime, much less a felony punishable by years of prison time.

Interpreting the CFAA to cover scraping is absurd and draconian.

[–] finitebanjo@lemmy.world 1 points 3 months ago* (last edited 3 months ago) (1 children)

If you want anybody and everyone to be able to use everything you post for any purpose, right on, good for you, but don't try to force your morality on others who rely on their writing, programming, and artworks to make a living to survive.

[–] GamingChairModel@lemmy.world 0 points 3 months ago (1 children)

I'm gonna continue to use ad blockers and yt-dlp, and if you think I'm a criminal for doing so, I'm gonna say you don't understand either technology or criminal law.

[–] finitebanjo@lemmy.world 1 points 3 months ago* (last edited 3 months ago) (1 children)

That's a crime yeah and if Alphabet co wants to sue you for $1.34 damages then they have that right, just as we should have the right to sue them if their AI crawlers make our site unusable and plagiarize our work to the effect of thousands of dollars, or even press charges for the criminal act of intentional disruption of services.

[–] GamingChairModel@lemmy.world 0 points 3 months ago (1 children)

That's a crime yeah and if Alphabet co wants to sue you for $1.34 damages then they have that right

So yeah, I stand by my statement that anyone who thinks this is a crime, or should be a crime, has a poor understanding of either the technology or the law. In this case, even mentioning Alphabet suing for damages means that you don't know the difference between criminal law and civil law.

press charges for the criminal act of intentional disruption of services

That's not a crime, and again reveals gaps in your knowledge on this topic.

[–] finitebanjo@lemmy.world 1 points 3 months ago (1 children)

That is actually a crime, you will get prison for DDoS in the USA, UK, and EU. Presumably you will disappear if you do it in China.

[–] GamingChairModel@lemmy.world 0 points 3 months ago (1 children)

you will get prison for DDoS in the USA

Who said anything about DDoS? I'm using ad blockers and saving/caching/archiving websites with a single computer, and not causing damage. I'm just using the website in a way the owner doesn't like. That's not a crime, nor should it be.

[–] finitebanjo@lemmy.world 1 points 3 months ago (1 children)

press charges for the criminal act of intentional disruption of services

That’s not a crime, and again reveals gaps in your knowledge on this topic.

We did

[–] GamingChairModel@lemmy.world 0 points 3 months ago (1 children)

You were talking about $1.34 in damages, which doesn't sound like downtime or disruption.

[–] finitebanjo@lemmy.world 0 points 3 months ago

You appear to have misread

just as we should have the right to sue them if their AI crawlers make our site unusable and plagiarize our work to the effect of thousands of dollars, or even press charges for the criminal act of intentional disruption of services.

YOU caused Google lost ad revenue

GOOGLE's crawlers have crippled sites