this post was submitted on 14 Aug 2025

477 points (98.4% liked)

This website is for humans (localghost.dev)

submitted 4 months ago by MarcellusDrum@lemmy.ml to c/technology@lemmy.world

[–] drmoose@lemmy.world 14 points 4 months ago (5 children)

As someone who's been on the web since the 90s I hate this.

The web was designed to be user agent agnostic. Desktop, phone, fridge, AI agents, curl, Python script - whatever agent you are using shouldn't matter for access. That's the whole point of the open internet, period.

[–] Allero@lemmy.today 26 points 4 months ago* (last edited 4 months ago) (1 children)

When the Web was first designed, some of the concerns we have today were nonexistent.

I believe in freedom of information, and would love for the information I share to be accessed in any way a given user wants.

But I have to stand defensive and support the author here, too. The modern LLM boom aims to essentially replace original resources with AI-generated summaries step by step. This is detrimental to the Internet, and to knowledge as we know and preserve it.

First, there is an event commonly called Google Zero, which is briefly mentioned in the article. If you don't know what it is, it is the not-so-hypothetical-anymore moment when Google (or, really, any other large player) essentially accumulates all information on the Web, feeds it to AI, and from then on stops serving links, going straight to answers based on training data. Users will jump to this - they already do - because it offers convenience. But for any independent creators it means having no audience, no money, and no means to produce new quality content, trapping users in a self-contained loop that loses nuance, timeliness, and truthfulness, and stays under corporate control. This goes beyond cooking recipes and personal notes - it permeates science, political discussion, and much more.

Second, LLMs multiply traffic coming to sites, which becomes an infrastructural problem. Bots access sites at much higher rates than humans do, and when their intent is to scrape your entire website every now and again and there are dozens of them, this becomes huge.

Third, having proprietary models train on the data I provide without any attribution, copyright etc. lets giant corporations profit off my back, while at the same time making it so that fewer genuine users will see what I produce. This means careers of authors, journalists etc. are dying, and this also means these corporations are left free to abuse each and every one of us without any consent.

Fourth, and I wonder if you see it by now, LLMs and the way they represent data, along with SEO tools meant to drive information through the search bots, begin to shape how we talk. All I say doesn't have to be a list of points, yet it is. It could be less verbose, more readable, yet it is the way it is. Because when we interact with the products of such developments too much, we begin shaping our own language in a way that is less human-readable and more meant for machines, without us often being aware of it. This is a real issue of communication.

So, as much as I hate it, I'm gonna protect a lot of the data I share.

[+] drmoose@lemmy.world -26 points 4 months ago (1 children)

I fundamentally disagree with all of your claims. The web was already ruined by SEO farms, and I know that because I worked in organic growth for years. AI is not making this worse.

The traffic argument is nonsensical in 2025. Serving a 200kb HTML file costs literally nothing. So if you want to write and share something you can do it without spending a single penny, ever.

I understand the frustration and confusion here but all of this whining is lacking any real vision. Information should be free and accessible to all and the rest can be solved without changing this core principle.

So while you "protect" what you share, we all will continue to grow and share information freely and actually contribute real change, not start breaking the looms.

[–] Allero@lemmy.today 20 points 4 months ago* (last edited 4 months ago)

Locking information into corporate-controlled loops is antithetical to freedom and accessibility.

Having a single proprietary point of entry, or even a few of them, into the entire knowledge of mankind is not sharing.

This is the part people are willing to protect. Actual peer-to-peer sharing of information, with as few private choke points as possible.

And having the web ruined by SEO is not an argument to keep going. It's already worse than it should be, and search engines already provide worse quality results than before. This needs to be reversed, not reinforced.

[–] whyNotSquirrel@sh.itjust.works 14 points 4 months ago (2 children)

Open until your server is down because LLMs are overloading it

[–] kescusay@lemmy.world 15 points 4 months ago (1 children)

At my company, we had to implement all sorts of WAF rules precisely for that reason. Those things are fucking aggressive.

[–] bravesilvernest@lemmy.ml 8 points 4 months ago

Same. And just because page size is "low" doesn't mean shit when they're flooding requests. Try having public research data and watch how much your costs go up just due to load balancer throughput.
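
For a rough sense of scale, here is a back-of-the-envelope sketch of how a "low" page size adds up under sustained crawler load. Only the 200 KB page size comes from the thread; the request rates and the bot count are made-up assumptions for illustration, not measurements from anyone here, and the function name is hypothetical.

```python
PAGE_SIZE_KB = 200  # page size quoted in the thread


def monthly_transfer_gb(requests_per_second: float,
                        page_size_kb: float = PAGE_SIZE_KB,
                        days: int = 30) -> float:
    """Return data served over `days` in decimal gigabytes (1 GB = 1,000,000 KB)."""
    seconds = days * 24 * 60 * 60
    total_kb = requests_per_second * seconds * page_size_kb
    return total_kb / 1_000_000


# Assumption: a quiet blog with one human page view every 100 seconds.
human_gb = monthly_transfer_gb(0.01)

# Assumption: 10 crawlers, each sustaining 5 requests per second.
bots_gb = monthly_transfer_gb(10 * 5)

print(f"humans: {human_gb:.1f} GB/month")  # about 5.2 GB
print(f"bots:   {bots_gb:.1f} GB/month")   # about 25920 GB, i.e. roughly 26 TB
```

Under these assumed numbers the same page goes from a few gigabytes a month to tens of terabytes. Whether that is "free" depends entirely on who is paying for the throughput.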

[+] drmoose@lemmy.world -22 points 4 months ago (1 children)

overloading from 200kb of html? We're not in dialup era anymore

[–] xthexder@l.sw0.com 4 points 4 months ago (1 children)

It's one request Michael, what could it cost? 200kb?

[–] drmoose@lemmy.world -5 points 4 months ago (1 children)

Literally free. So many tech illiterate people in this thread that still think website hosting costs money.

[–] Shivering6658@midwest.social 2 points 4 months ago (1 children)

I disagree, someone has to pay for the internet, server and electricity to serve your supposedly "free" page. If you are talking about something like GitHub Pages? Not really a solution if Microsoft just...I don't know, got rid of freebies once they have you locked into hosting your content with them? Corpos would never ensure they have you locked in as a consumer before turning the screws.

[–] drmoose@lemmy.world -2 points 4 months ago* (last edited 4 months ago) (1 children)

There are thousands of alternatives to GitHub Pages that will host your content for free. Taking your static website to a different host literally takes 5 minutes. At the scale of GitHub or Netlify your 1gb/mo bandwidth blog website is a rounding error and well worth the potential conversion price. The ignorance in this thread is astounding.

You can "disagree" all you want but the reality is that information hosting made huge gains in the past 20 years, to the point where the only cost to share a blog these days is the domain name. Even if you don't want to feed the "free corporate machines", you can easily host it for $5/mo, and if you can't afford that I don't know what to tell you.

[–] Shivering6658@midwest.social 3 points 4 months ago* (last edited 4 months ago) (1 children)

Okay sure, the internet is free and corpos are always going to give away web hosting...or I do self-host and know that someone has to pay the bill despite it being "200 kb"...bruh, they ain't giving it away for free, your content and everything you "own" are the price you pay for admission. Startup or mega corp, if you are not paying for it, you are the product.

[–] drmoose@lemmy.world -1 points 4 months ago (1 children)

I literally host like 20 websites for free and here you're telling me to cry over hosting prices lol

[–] Shivering6658@midwest.social 0 points 4 months ago (1 children)

Nope. Just disagree that it's free. You and your sites are the product, you just got duped into taking the free lunch.

[–] drmoose@lemmy.world 0 points 4 months ago

Been eating this free lunch for over a decade with no signs of it ending. Pretty good dupe!

[–] JustARaccoon@lemmy.world 12 points 4 months ago (1 children)

They did have a lot of concerns with abuse though, and you can see that in the way the cookies debate went before they were supported in their current form. I think AI crawlers tanking bandwidth for websites and misusing the data they scrape would 100% be something the Mozilla from back then would've had concerns over allowing or encouraging.

[–] drmoose@lemmy.world -1 points 4 months ago (1 children)

You're conflating two different issues. The topic is "whom is the web for?", not bandwidth distribution and optimization.

If an LLM bot is being abusive then that's no different from any other user agent behaving like this, and we should expand these protections from intentional/unintentional ddos irrelevant of user agent.
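
One way to picture protection that ignores the user agent is a per-client token bucket, which throttles whoever sends too many requests and never looks at the User-Agent header. This is a minimal sketch under assumptions: the class name, the choice of client key (an IP address here), and the limits are all hypothetical, and a real deployment would more likely do this in a reverse proxy or WAF than in application code.

```python
import time
from typing import Dict, Optional, Tuple


class PerClientRateLimiter:
    """Token bucket per client key (e.g. an IP address).

    The User-Agent header is never consulted: a browser, curl, or a crawler
    is throttled the same way once it exceeds the allowed rate.
    """

    def __init__(self, capacity: float, refill_per_second: float) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        # client key -> (tokens remaining, timestamp of last update)
        self._buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, client_key: str, now: Optional[float] = None) -> bool:
        """Return True if this request may proceed, consuming one token."""
        now = time.monotonic() if now is None else now
        tokens, last_seen = self._buckets.get(client_key, (self.capacity, now))
        # Refill for the time elapsed since the last request, capped at capacity.
        elapsed = now - last_seen
        tokens = min(self.capacity, tokens + elapsed * self.refill_per_second)
        allowed = tokens >= 1.0
        if allowed:
            tokens -= 1.0
        self._buckets[client_key] = (tokens, now)
        return allowed


# Hypothetical limits: bursts of 2 requests, refilling 1 request per second.
limiter = PerClientRateLimiter(capacity=2, refill_per_second=1)
print(limiter.allow("203.0.113.7", now=0.0))  # first request in the burst
```

The `now` parameter exists only so the behaviour can be checked with fixed timestamps; in normal use it is left out and the monotonic clock is used.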

[–] ernest314@lemmy.zip 7 points 4 months ago* (last edited 4 months ago)

I think your starting point (allowing bot user agents to crawl the web has overlooked benefits) is a good one, but things aren't black and white--there are clear drawbacks, too. Bots obviously have orders of magnitude higher potential for abuse; to the point where bot traffic--as it currently stands in the real world--is qualitatively different from human traffic.

we should expand these protections from intentional/unintentional ddos irrelevant of user agent.

Sure, but targeted regulation based on heuristics (in this case, user agent) is also a widely accepted practice. DUI laws exist, even though the goals (fewer murders and safer roads) are already separately regulated.

Would it be nice if we didn't have to do this? Or there were some other solution? Sure, but I have no idea where to even start, unfortunately.
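
As an illustration of the user-agent heuristic discussed above, a site could match the User-Agent header against a short list of crawler tokens. This is a hedged sketch, not anyone's actual rule set: the token list is only an example and the function name is hypothetical. The header is self-reported, so a client can send any value it likes, which is why this stays a heuristic.

```python
from typing import Iterable

# Example tokens only; a real list would need to be maintained and is an
# assumption here, not something taken from the thread.
CRAWLER_TOKENS = ("GPTBot", "CCBot", "ClaudeBot")


def looks_like_crawler(user_agent: str,
                       tokens: Iterable[str] = CRAWLER_TOKENS) -> bool:
    """Return True if the User-Agent string contains a known crawler token.

    Matching is case-insensitive substring matching. The header is
    self-reported, so a False result does not prove the client is human.
    """
    ua = user_agent.lower()
    return any(token.lower() in ua for token in tokens)


print(looks_like_crawler("Mozilla/5.0 (compatible; GPTBot/1.0)"))
print(looks_like_crawler("Mozilla/5.0 (X11; Linux x86_64) Firefox/128.0"))
```

A check like this is cheap to run before heavier measures, but it only catches bots that identify themselves honestly.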

[–] frezik@lemmy.blahaj.zone 7 points 4 months ago (1 children)

Instructions unclear, built whole site with nested tables.

[–] caseyweederman@lemmy.ca 3 points 4 months ago

Each one had better be in its own iframe.

[–] xeroxguts@lemmy.dbzer0.com 2 points 4 months ago

Lol this is such a bizarre comment. Back then, AI wasn't scraping everything humans made for the profit of a few. It was a non-issue, and therefore you have no standing in claiming that "that was the whole point."

This works as well on my phone as it does on my computer, and loads faster than most modern websites, making it that much more accessible to MORE humans.

The web designer isn't limiting access, they are expanding on it - for humans. The people actually sentient and able to understand their words rather than just copy and recontextualize them.