this post was submitted on 01 Oct 2025
971 points (97.4% liked)

[–] mesamunefire@piefed.social 37 points 2 months ago (11 children)

Heh, I wonder if all the "old" content getting messed with and/or removed is causing issues with the algorithm/scraper.

[–] NuXCOM_90Percent@lemmy.zip 39 points 2 months ago (10 children)

For unauthorized scrapers? Definitely

For paid API usage? That tends to not be public for obvious reasons but, allegedly, people have, allegedly, done tests and found "deleted" content in the results.

[–] mesamunefire@piefed.social 11 points 2 months ago (3 children)

I've heard the same, but I haven't seen real evidence anywhere, so I'm skeptical. But yes, I agree: if they CAN get that data, it means the training data is better-ish....

But we are still on this site for a reason :)

[–] stsquad@lemmy.ml 3 points 2 months ago

It's all relative, I guess. I can see why the original GPTs used the Reddit corpus for training. However, I've always been a little sceptical about the quality of any training set drawn from social media, given how much it exaggerates the extremes of people's behaviour.
