this post was submitted on 25 Apr 2026

Technology


If so, are these programs that claim to 'poison' the training datasets effective?

[–] fiat_lux@lemmy.zip 2 points 10 hours ago (1 children)

We can see that it’s solved by the fact that AI models continue to get better despite an increasing amount of AI-generated data being present in the world that training data is being drawn from.

Even if it logically followed that model improvement means model collapse is a solved problem, which it absolutely doesn't, the premise that models are improving to a significant degree is itself up for debate.

[Image: line graph of MMLU Pro benchmark scores over time, showing plateauing values. Caption: Massive Multitask Language Understanding (MMLU) benchmark vs. time, 07-2023 to 01-2026]

A lot of people really want to believe that AI is going to just “go away” somehow, and this notion of model collapse is a convenient way to support that belief

For some people, model collapse may be an argument used to support a hope that AI will go away, but the existence of that hope does not alter the validity of the model collapse problem.

You can tell it's not a solved problem because researchers are still trying to quantify the risk and severity of collapse - as you can see even just from the abstracts in the links I provided.

Some choice excerpts from the abstracts, for those who don't want to click the links:

Our results show that even the smallest fraction of synthetic data (e.g., as little as 1% of the total training dataset) can still lead to model collapse

...we establish ... that collapse can be avoided even as the fraction of real data vanishes. On the other hand, we prove that some assumptions ... are indeed necessary: Without them, model collapse can occur arbitrarily quickly, even when the original data is still present in the training set.
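
A toy sketch of the intuition, not taken from either linked paper and not a claim about real LLM training: repeatedly fit a Gaussian to data, sample a new synthetic set from the fit, and refit. The function names, sample sizes, and the two regimes ("replace" the data each generation vs. "accumulate" it alongside the original) are all assumptions chosen for illustration.

```python
import math
import random


def fit_gaussian(data):
    """Maximum-likelihood mean and standard deviation of a list of floats."""
    n = len(data)
    mu = math.fsum(data) / n
    var = math.fsum((x - mu) ** 2 for x in data) / n
    return mu, math.sqrt(var)


def simulate(generations, n, accumulate, seed=0):
    """Return the fitted std after repeatedly training on model-generated samples.

    accumulate=False: each generation trains only on the previous model's output.
    accumulate=True:  each generation trains on the real data plus all synthetic
                      data generated so far.
    """
    rng = random.Random(seed)
    pool = [rng.gauss(0.0, 1.0) for _ in range(n)]  # stand-in for "real" data
    mu, sigma = fit_gaussian(pool)
    for _ in range(generations):
        synthetic = [rng.gauss(mu, sigma) for _ in range(n)]
        if accumulate:
            pool.extend(synthetic)  # real data stays in the training set
        else:
            pool = synthetic  # real data is discarded
        mu, sigma = fit_gaussian(pool)
    return sigma


if __name__ == "__main__":
    print("replace:   ", simulate(300, 10, accumulate=False))
    print("accumulate:", simulate(300, 10, accumulate=True))
```

In the replace regime the fitted spread shrinks toward zero over generations, while in the accumulate regime it stays on the same order as the original data. That only shows why the answer depends on assumptions about how data is mixed; it says nothing about which regime real training pipelines are in.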

[–] XLE@piefed.social 1 points 3 hours ago

It's really interesting reading a conversation between somebody who knows what they're talking about, providing sources, and a known troll (FaceDeer) who can only go "nuh-uh" and complain about ghosts.