this post was submitted on 09 Jun 2026

177 points (97.3% liked)

New York Becomes First State to Ban Bots That Scrape News Sites (news.bgov.com)

submitted 2 days ago by TryingToBeGood@reddthat.com to c/technology@lemmy.world

top 19 comments

[–] Zarxrax@lemmy.world 33 points 2 days ago (1 children)

Inaccurate headline. The bill doesn't ban web scraping, it just requires that bots accurately identify themselves through the user agent string, and maybe some additional requirements to disclose the purpose of scraping the data.
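For anyone wondering what "identify themselves through the user agent string" could look like in practice, here is a rough Python sketch. The bot name, info URL, purpose label, and the layout of the string are all made up for illustration; nothing here is a format the bill prescribes.

```python
from urllib.request import Request

# Hypothetical values for illustration only; not taken from the bill.
BOT_NAME = "ExampleNewsBot"
BOT_VERSION = "1.0"
BOT_INFO_URL = "https://example.com/bot-info"
BOT_PURPOSE = "search-indexing"


def build_user_agent(name: str, version: str, info_url: str, purpose: str) -> str:
    """Build a User-Agent string that names the bot, links to a page
    describing it, and states why it is fetching pages."""
    return f"{name}/{version} (+{info_url}; purpose={purpose})"


def make_request(url: str) -> Request:
    """Prepare (but do not send) an HTTP request that carries the
    self-identifying User-Agent header."""
    user_agent = build_user_agent(BOT_NAME, BOT_VERSION, BOT_INFO_URL, BOT_PURPOSE)
    return Request(url, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    req = make_request("https://example.com/news/article")
    # urllib stores header names capitalized as "User-agent".
    print(req.get_header("User-agent"))
```

A "stealth" crawler would be one that sends a browser-like string here instead, so the site cannot tell it apart from a human visitor.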

[–] deliriousdreams@fedia.io 7 points 2 days ago (1 children)

And there's a fine if the company doesn't comply, which is basically now gonna be considered the cost of doing business.

[–] partofthevoice@lemmy.zip 2 points 1 day ago (1 children)

That’s still a win. Make the act less lucrative. Fewer entrepreneurial types to take after it.

[–] deliriousdreams@fedia.io 2 points 1 day ago

Oh, I'm not saying it's not better than doing nothing. I'm just saying I'm still dissatisfied with the results because of the socioeconomic implications of businesses being treated legally like people.

I agree with you though.

[–] toiletobserver@lemmy.world 24 points 2 days ago (1 children)

How will they enforce it? Pretty please?

[–] TryingToBeGood@reddthat.com 19 points 2 days ago (4 children)

The New York measure defines a stealth crawler as any software that retrieves, scrapes or otherwise accesses a website, including AI agents. Under the bill, the attorney general’s office would be able to sue companies that fail to disclose such activity. Violations could net civil penalties of up to $15,000 per day.

🤔

[–] Texas_Hangover@lemmy.radio 1 points 1 day ago

If said companies don't fall under New York jurisdiction, what happens then?

[–] Pika@sh.itjust.works 15 points 2 days ago* (last edited 2 days ago)

"This website or search engine is not designed for the state of New York" I assume is going to be a disclaimer we will be seeing soon.

[–] pelespirit@sh.itjust.works 8 points 2 days ago (2 children)

Soooo, that means archive.is too. This fucking sucks.

[–] XLE@piefed.social 4 points 2 days ago* (last edited 2 days ago)

I hope those tools are exempt because, just like a browser, they respond only to specific commands issued by a human user. They don't "crawl" pages in the way we describe bots that jump from page to page.

[–] Prove_your_argument@piefed.social 2 points 2 days ago

Isn’t there something like 340 news sites that actively block them already?

[–] toiletobserver@lemmy.world 2 points 2 days ago

Thanks, that becomes the cost of doing business. Harumph.

[–] AceFuzzLord@lemmy.zip 15 points 2 days ago (1 children)

The only bots I hope get exempted are Internet Archive bots. Only bots I support.

[–] rob_t_firefly@lemmy.world 6 points 2 days ago

The article is about "stealth bots" that don't identify themselves as such. The Internet Archive bots have always been clearly identifiable.

[–] Pika@sh.itjust.works 9 points 2 days ago* (last edited 2 days ago)

Honestly, I would love it if forced identification were required, but archival services need a hard exemption from being blocked as well.

[–] rob200@retrofed.com 6 points 2 days ago (2 children)

Realistically, what good would it do? Once you've already scraped the pattern of news sites, it's already over. All this actually does is prevent new startups from competing in the AI space. So really this is the world-record fastest enshittification of a medium. Whether you like or hate AI, this is actually an enshittification of it. (I hate AI.)

[–] fonix232@fedia.io 1 points 1 day ago

What are you on about?

For AI purposes the really useful part of a news site is the actual news - you know, the stuff that changes practically every minute - not the "structure" of the site.

These news sites aren't being scraped for training data anymore but to provide near-real-time, up-to-date information to the models.

Meaning e.g. Gemini can scan your news article, extract the useful information for the user, and deliver it to the user, without them ever going to your news site and providing the interaction that at the end of the day is converted to money - money your site needs to run.

[–] nullspace@lemmy.world 5 points 2 days ago

I'm guessing it's to eliminate the issue of a site not getting clicks because the article you were about to read is already summarized for you. It also opens the door for revenue negotiations for allowing their content to be scraped for that purpose, as the scraper bots would now be identified.

[–] tangeli@piefed.social 7 points 2 days ago

Why only news sites?

Any bot that ignores robots.txt should be banned.
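For reference, honoring robots.txt is not much work for a well-behaved bot. A minimal Python sketch using the standard library's `urllib.robotparser`; the robots.txt contents, bot names, and URLs below are invented for the example:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named bot may crawl everything except
# /premium/, and every other bot is shut out entirely.
ROBOTS_TXT = """\
User-agent: ExampleNewsBot
Disallow: /premium/

User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("ExampleNewsBot", "https://example.com/news/story"),
        ("ExampleNewsBot", "https://example.com/premium/story"),
        ("OtherBot", "https://example.com/news/story"),
    ]:
        print(agent, url, is_allowed(ROBOTS_TXT, agent, url))
```

Note that the check depends on the bot reporting its real name: a crawler that lies about its user agent can pick whichever rule set suits it, which is why identification and robots.txt go together.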