this post was submitted on 18 Aug 2025

1147 points (99.0% liked)

Technology

76415 readers

3499 users here now

This is a most excellent place for technology news and articles.

Our Rules

Follow the lemmy.world rules.
Only tech related news or articles.
Be excellent to each other!
Mod approved content bots can post up to 10 articles per day.
Threads asking for personal tech support may be deleted.
Politics threads may be removed.
No memes allowed as posts, OK to post as comments.
Only approved bots from the list below, this includes using AI responses and summaries. To ask if your bot can be added please contact a mod.
Check for duplicates before posting, duplicates may be removed
Accounts 7 days and younger will have their posts automatically removed.

Approved Bots

founded 2 years ago

MODERATORS

L3s@lemmy.world

enu@lemmy.world

technopagan@lemmy.world

L4s@lemmy.world

L3s@hackingne.ws

L4s@hackingne.ws

1147

Codeberg: army of AI crawlers are extremely slowing us; AI crawlers learned how to solve the Anubis challenges. (i.imgur.com)

submitted 2 months ago by Pro@programming.dev to c/technology@lemmy.world

235 comments fedilink hide all child comments

cross-posted from: https://programming.dev/post/35852706

Source.

top 50 comments

sorted by: hot top controversial new old

[–] SufferingSteve@feddit.nu 312 points 2 months ago* (last edited 2 months ago) (8 children)

There once was a dream of the semantic web, also known as web2. The semantic web could have enabled easy to ingest information of webpages, removing soo much of the computation required to get the information. Thus preventing much of the air crawling cpu overhead.

What we got as web2 instead was social media. Destroying facts and making people depressed at a newer before seen rate.

Web3 was about enabling us to securely transfer value between people digitally and without middlemen.

What crypto gave us was fraud, expensive jpgs and scams. The term web is now even so eroded that it has lost much of its meaning. The information age gave way for the misinformation age, where everything is fake.

[–] Marshezezz@lemmy.blahaj.zone 99 points 2 months ago (40 children)

Capitalism is grand, innit. Wait, not grand, I meant to say cancer

load more comments (40 replies)

[–] tourist@lemmy.world 66 points 2 months ago (6 children)

Web3 was about enabling us to securely transfer value between people digitally and without middlemen.

It's ironic that the middlemen showed up anyway and busted all the security of those transfers

You want some bipcoin to buy weed drugs on the slip road? Don't bother figuring out how to set up that wallet shit, come to our nifty token exchange where you can buy and sell all kinds of bipcoins

oh btw every government on the planet showed up and dug through our insecure records. hope you weren't actually buying shroom drugs on the slip rod

also we got hacked, you lost all your bipcoins sorry

At least, that's my recollection of events. I was getting my illegal narcotics the old fashioned way.

[–] Chee_Koala@lemmy.world 20 points 2 months ago (4 children)

the old fashioned way.

A whole swath of trained toads using a special made tube network?

[–] tourist@lemmy.world 18 points 2 months ago (2 children)

getting into a car with a stranger who said he was 15 minutes away two hours ago

load more comments (2 replies)

load more comments (3 replies)

[–] raspberriesareyummy@lemmy.world 16 points 2 months ago

also we got hacked, you lost all your bipcoins sorry

aaaaaaaaand - it's gone!

load more comments (4 replies)

[–] muusemuuse@sh.itjust.works 19 points 2 months ago

Sound like it went the same way everything else went. The less money is involved the more trustworthy it is.

[–] GreenShimada@lemmy.world 18 points 2 months ago (2 children)

Mr. Internet, tear down these walls! (for all these walled gardens)

Return the internet to the wild. Let it run feral like dinosaurs on an island.

Let the grannies and idiots stick themselves in the reservations and asylums run by billionaires.

Let's all make Neocities pages about our hobbies and dirtiest, innermost thoughts. With gifs all over.

load more comments (2 replies)

load more comments (4 replies)

[–] zifk@sh.itjust.works 99 points 2 months ago (2 children)

Anubis isn't supposed to be hard to avoid, but expensive to avoid. Not really surprised that a big company might be willing to throw a bunch of cash at it.

[–] sudo@programming.dev 35 points 2 months ago* (last edited 2 months ago) (3 children)

This is what I've kept saying about POW being a shit bot management tactic. Its a flat tax across all users, real or fake. The fake users are making money to access your site and will just eat the added expense. You can raise the tax to cost more than what your data is worth to them, but that also affects your real users. Nothing about Anubis even attempts to differentiate between bots and real users.

If the bots take the time, they can set up a pipeline to solve Anubis tokens outside of the browser more efficiently than real users.

[–] black_flag@lemmy.dbzer0.com 18 points 2 months ago (3 children)

Yeah but ai companies are losing money so in the long run Anubis seems like it should eventually return to working.

load more comments (3 replies)

load more comments (2 replies)

[–] randomblock1@lemmy.world 19 points 2 months ago (2 children)

No, it's expensive to comply (at a massive scale), but easy to avoid. Just change the user agent. There's even a dedicated extension for bypassing Anubis.

Even then AI servers have plenty of compute, it realistically doesn’t cost much. Maybe like a thousandth of a cent per solve? They're spending billions on GPU power, they don't care.

I've been saying this since day 1 of Anubis but nobody wants to hear it.

load more comments (2 replies)

[–] PhilipTheBucket@piefed.social 96 points 2 months ago (12 children)

I feel like at some point it needs to be active response. Phase 1 is a teergrube type of slowness to muck up the crawlers, with warnings in the headers and response body, and then phase 2 is a DDOS in response or maybe just a drone strike and cut out the middleman. Once you've actively evading Anubis, fuckin' game on.

[–] turbowafflz@lemmy.world 112 points 2 months ago (4 children)

I think the best thing to do is to not block them when they're detected but poison them instead. Feed them tons of text generated by tiny old language models, it's harder to detect and also messes up their training and makes the models less reliable. Of course you would want to do that on a separate server so it doesn't slow down real users, but you probably don't need much power since the scrapers probably don't really care about the speed

[–] xthexder@l.sw0.com 63 points 2 months ago

I love catching bots in tarpits, it's actually quite fun

[–] 31ank@ani.social 47 points 2 months ago* (last edited 2 months ago)

Some guy also used zip bombs against AI crawlers, don't know if it still works. Link to the lemmy post

[–] phx@lemmy.ca 20 points 2 months ago (2 children)

Yeah that was my thought. Don't reject them, that's obvious and they'll work around it. Feed them shit data - but not too obviously shit - and they'll not only swallow it but eventually build up to levels where it compromises them.

I've suggested the same for plain old non-AI data stealing. Make the data useless to them and cost more work to separate good from bad, and they'll eventually either sod off or die.

A low power AI actually seems like a good way to generate a ton of believable - but bad - data that can be used to fight the bad AI's. It doesn't need to be done real-time either as datasets can be generated in advance

load more comments (2 replies)

[–] sudo@programming.dev 18 points 2 months ago* (last edited 2 months ago) (2 children)

The problem is primarily the resource drain on the server and tarpitting tactics usually increase that resource burden by maintaining the open connections.

load more comments (2 replies)

[–] TIN@feddit.uk 36 points 2 months ago

Wasn't this called black ice in Neuromancer? Security systems that actively tried to harm the hacker?

load more comments (10 replies)

[–] prole@lemmy.blahaj.zone 85 points 2 months ago (5 children)

Tech bros just actively making the internet worse for everyone.

[–] ShaggySnacks@lemmy.myserv.one 65 points 2 months ago

Tech bros just actively making ~~the internet~~ society worse for everyone.

FTFY.

load more comments (4 replies)

[–] UnderpantsWeevil@lemmy.world 48 points 2 months ago (2 children)

I mean, we really have to ask ourselves - as a civilization - whether human collaboration is more important than AI data harvesting.

[–] devfuuu@lemmy.world 25 points 2 months ago* (last edited 2 months ago) (2 children)

I think every company in the world is telling everyone for a few months now that what matter is AI data harvesting. There's not even a hint of it being a question. You either accept the AI overlords or get out of the internet. Our ONLY purpose it to feed the machine, anything else is irrelevant. Play along or you shall be removed.

load more comments (2 replies)

load more comments (1 replies)

[–] londos@lemmy.world 44 points 2 months ago (4 children)

Can there be a challenge that actually does some maliciously useful compute? Like make their crawlers mine bitcoin or something.

[–] raspberriesareyummy@lemmy.world 67 points 2 months ago (6 children)

Did you just say use the words "useful" and "bitcoin" in the same sentence? o_O

[–] polle@feddit.org 74 points 2 months ago (9 children)

The saddest part is, we thought crypto was the biggest waste of energy ever and then the LLMs entered the chat.

load more comments (9 replies)

[–] kameecoding@lemmy.world 33 points 2 months ago (8 children)

Bro couldn't even bring himself to mention protein folding because that's too socialist I guess.

[–] londos@lemmy.world 19 points 2 months ago* (last edited 2 months ago)

You're 100% right. I just grasped at the first example I could think of where the crawlers could do free work. Yours is much better. Left is best.

load more comments (7 replies)

load more comments (4 replies)

load more comments (3 replies)

[–] oeuf@slrpnk.net 43 points 2 months ago (2 children)

Crazy. DDoS attacks are illegal here in the UK.

load more comments (2 replies)

[–] thatonecoder@lemmy.ca 41 points 2 months ago (1 children)

I know this is the most ridiculous idea, but we need to pack our bags and make a new internet protocol, to separate us from the rest, at least for a while. Either way, most “modern” internet things (looking at you, JavaScript) are not modern at all, and starting over might help more than any of us could imagine.

[–] Pro@programming.dev 44 points 2 months ago* (last edited 2 months ago) (13 children)

Like Gemini?

From official Website:

Gemini is a new internet technology supporting an electronic library of interconnected text documents. That's not a new idea, but it's not old fashioned either. It's timeless, and deserves tools which treat it as a first class concept, not a vestigial corner case. Gemini isn't about innovation or disruption, it's about providing some respite for those who feel the internet has been disrupted enough already. We're not out to change the world or destroy other technologies. We are out to build a lightweight online space where documents are just documents, in the interests of every reader's privacy, attention and bandwidth.

load more comments (13 replies)

[–] wetbeardhairs@lemmy.dbzer0.com 35 points 2 months ago (4 children)

Gosh. Corporations are rampantly attempting to access resources so they can perform copyright infringement en-masse. I wonder if there is a legal mechanism to stop them? Oh, no there isn't because our government is fully corrupted.

load more comments (4 replies)

[–] nialv7@lemmy.world 34 points 2 months ago* (last edited 2 months ago) (1 children)

We had a trust based system for so long. No one is forced to honor robots.txt, but most big players did. Almost restores my faith in humanity a little bit. And then AI companies came and destroyed everything. This is why we can't have nice things.

[–] Shapillon@lemmy.world 19 points 2 months ago

Big players are the ones behind most AIs though.

[–] zbyte64@awful.systems 30 points 2 months ago (6 children)

Is there nightshade but for text and code? Maybe my source headers should include a bunch of special characters that then give a prompt injection. And sprinkle some nonsensical code comments before the real code comment.

load more comments (6 replies)

[–] 0x0@lemmy.zip 23 points 2 months ago (3 children)

It's always a cat-n-mouse game.

load more comments (3 replies)

[–] cupcakezealot@piefed.blahaj.zone 23 points 2 months ago

reminder to donate to codeberg and forgejo :)

[–] Kyrgizion@lemmy.world 19 points 2 months ago (3 children)

Eventually we'll have "defensive" and "offensive" llm's managing all kinds of electronic warfare automatically, effectively nullifying each other.

[–] ProdigalFrog@slrpnk.net 32 points 2 months ago (4 children)

That's actually a major plot point in Cyberpunk 2077. There's thousands of rogue AI's on the net that are constantly bombarding a giant firewall protecting the main net and everything connected to it from being taken over by the AI.

load more comments (4 replies)

load more comments (2 replies)

[–] StopSpazzing@lemmy.world 18 points 2 months ago* (last edited 2 months ago) (2 children)

Is there a migration tool? If not would be awesome to migrate everything including issues and stuff. Bet even more people would move.

[–] BlameTheAntifa@lemmy.world 17 points 2 months ago

Codeberg has very good migration tools built in. You need to do one repo at a time, but it can move issues, releases, and everything.

load more comments (1 replies)

[–] mfed1122@discuss.tchncs.de 16 points 2 months ago* (last edited 2 months ago) (11 children)

Okay what about...what about uhhh... Static site builders that render the whole page out as an image map, making it visible for humans but useless for crawlers 🤔🤔🤔

[–] lapping6596@lemmy.world 25 points 2 months ago (2 children)

Accessibility gets throw out the window?

load more comments (2 replies)

load more comments (10 replies)

load more comments