this post was submitted on 13 Dec 2025

Technology

Self-hosting anything that is deemed "content" openly on the web in 2025 is a battle of attrition between you and forces who are able to buy tens of thousands of proxies to ruin your service for data they can resell.

This is depressing. Profoundly depressing. I look at the statistics board for my reverse proxy and I never see less than 96.7% of requests classified as bots at any given moment. The web is filled with crap: bots that pretend to be real people and flood you. All of that because I want to have my little corner of the internet where I put my silly little code for other people to see.

I have to learn to protect myself from industrial actors in order to put anything online, because anything a person makes is valuable, and that value will be sucked dry by every tech giant to be emulsified, liquified, strained, and ultimately inexorably joined in an unholy mesh of learning weights.

dotslashme@infosec.pub 5 points 3 days ago

I feel like this wouldn't reduce costs, since the load is the same, just moved to a different daemon (in this case nginx). I, for one, pay for bandwidth on my VPS, so the cost for me would be the same.

One thought I've had is to use a slow-loris-style technique (drip-feeding the response very slowly), combined with a small pool of connections and an AI poisoner, to keep the scraper occupied for as long as possible without using a lot of bandwidth.

Maybe the AI companies would analyze the HTTP response, realize that it is bullshit, and stop sending requests for a while...
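The tarpit idea in this comment can be sketched in Python with `asyncio`. This is a minimal sketch under stated assumptions, not the commenter's implementation: the word list, pool size, delay, chunk count, and port are all hypothetical values, the "poisoner" is just random word salad, and it assumes something in front of it (for example the reverse proxy) already decides which requests are bots and forwards only those here. Strictly speaking, slowloris is the name of a client-side attack that holds a server's connections open; the server-side mirror image shown here is usually called a tarpit.

```python
import asyncio
import random

# Hypothetical filler vocabulary for the "AI poisoner": real words, no meaning.
WORDS = ["kernel", "quantum", "recipe", "lattice", "banana", "protocol",
         "sonnet", "gradient", "marmalade", "socket", "theorem", "opera"]

POOL_SIZE = 8        # assumed: hold at most 8 scrapers at once
CHUNK_DELAY = 10.0   # assumed: seconds to wait between chunks
CHUNKS = 360         # assumed: 360 chunks x 10 s = about an hour per scraper


def babble(rng: random.Random, n_words: int = 12) -> str:
    """Return one newline-terminated line of random words."""
    return " ".join(rng.choice(WORDS) for _ in range(n_words)) + "\n"


def make_handler(pool: asyncio.Semaphore, chunks: int = CHUNKS,
                 delay: float = CHUNK_DELAY, seed=None):
    """Build a connection handler that drip-feeds nonsense to each client."""
    rng = random.Random(seed)

    async def handle(reader: asyncio.StreamReader,
                     writer: asyncio.StreamWriter) -> None:
        try:
            # Read and discard the request head; every path gets the same answer.
            await asyncio.wait_for(reader.readuntil(b"\r\n\r\n"), timeout=30)
            if pool.locked():
                # Pool is full: refuse cheaply instead of holding more sockets.
                writer.write(b"HTTP/1.1 503 Service Unavailable\r\n"
                             b"Connection: close\r\nContent-Length: 0\r\n\r\n")
                await writer.drain()
                return
            async with pool:
                # No Content-Length: the body ends when we close the connection.
                writer.write(b"HTTP/1.1 200 OK\r\n"
                             b"Content-Type: text/plain; charset=utf-8\r\n"
                             b"Connection: close\r\n\r\n")
                for _ in range(chunks):
                    writer.write(babble(rng).encode())  # about 90 bytes per chunk
                    await writer.drain()
                    await asyncio.sleep(delay)
        except (asyncio.TimeoutError, asyncio.IncompleteReadError,
                asyncio.LimitOverrunError, ConnectionError):
            pass  # client gave up or sent garbage; just drop it
        finally:
            writer.close()

    return handle


async def main(host: str = "127.0.0.1", port: int = 8081) -> None:
    pool = asyncio.Semaphore(POOL_SIZE)
    server = await asyncio.start_server(make_handler(pool), host, port)
    async with server:
        await server.serve_forever()


if __name__ == "__main__":
    asyncio.run(main())
```

Run it with `python tarpit.py` (file name hypothetical) and point suspected-bot traffic at port 8081. With the assumed settings each chunk averages about 90 bytes, so a scraper held for the full hour receives roughly 33 KB; when all eight slots are busy, extra connections get an immediate empty 503 instead of queueing, which caps both open sockets and bandwidth. Whether scrapers actually wait that long depends on their client timeouts, which this sketch cannot control.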

nyan@lemmy.cafe 3 points 3 days ago

Rather like the proportion of spam to legitimate email.

snoons@lemmy.ca 4 points 3 days ago

I need to remember this when I come across another brain-dead (literally) AI zealot.

cy_narrator@discuss.tchncs.de -1 points 3 days ago

I would be honored if ChatGPT or any other AI thought my code was worth training on.