this post was submitted on 25 Apr 2026
Technology
Alright, so instead of simply saying "include external data in your training run", extend that to "and also filter the data to exclude erroneous stuff." That's a routine part of curating training data in real-world AI training as well. I was already writing a lot, so I didn't feel that adding more detail there would have enhanced it.
The basic point remains the same: real-world training accounts for the things that were necessary to force model collapse to happen in that old paper I linked. It's a solved problem. We can see that it's solved by the fact that AI models continue to get better, despite an increasing amount of AI-generated data being present in the world that training data is being drawn from. Indeed, most models these days use synthetic training data that is intentionally AI-generated.
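For anyone who wants to see the mechanism in miniature, here is a toy sketch in Python. It is not the experiment from the paper linked upthread, and it says nothing about how large models behave in practice. It only refits a one-dimensional Gaussian generation after generation, once on purely self-generated samples and once with half of each training set drawn fresh from the "real" distribution. The function names (`fit`, `run`) and every parameter (sample size, generation count, the 50% mix) are assumptions chosen for illustration.

```python
import random


def fit(samples):
    """Return (mean, std) of samples, using the unbiased variance estimate."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)
    return mean, var ** 0.5


def run(generations, n, real_fraction, seed=0):
    """Refit a Gaussian repeatedly (hypothetical toy setup).

    Each generation trains on n samples: a share `real_fraction` drawn
    fresh from the real distribution N(0, 1), the rest drawn from the
    previous generation's fitted model.
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation 0 matches the real distribution
    n_real = int(n * real_fraction)
    for _ in range(generations):
        real = [rng.gauss(0.0, 1.0) for _ in range(n_real)]
        synthetic = [rng.gauss(mu, sigma) for _ in range(n - n_real)]
        mu, sigma = fit(real + synthetic)
    return mu, sigma


# Closed loop: every generation sees only the previous model's output.
closed_mu, closed_sigma = run(1000, 20, real_fraction=0.0)
# Mixed loop: half of every training set is fresh real data.
mixed_mu, mixed_sigma = run(1000, 20, real_fraction=0.5)

print("closed loop std:", closed_sigma)
print("mixed loop std: ", mixed_sigma)
```

In the closed loop the fitted spread shrinks toward zero because small estimation errors compound across generations, while the mixed loop stays anchored near the real spread. Whether that toy result carries over to real training pipelines is exactly what the replies below dispute.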
A lot of people really want to believe that AI is going to just "go away" somehow, and this notion of model collapse is a convenient way to support that belief. So it's very persistent and makes for great clickbait. But it's just not so. If nothing else, the exact same training data that was used to create those earlier models is still around. AI models are never going to get worse than they are now because if they did get worse we'd just throw them out and go back to the earlier ones that worked better, perhaps re-training with the same data but better training techniques or model architectures.
Even if it logically followed that model improvement means model collapse is a solved problem, which it absolutely doesn't, even the premise that models are improving to a significant degree is up for debate.
Model collapse may for some people be an argument used to support a hope that AI will go away, but the reality of that hope does not alter the validity of the model collapse problem.
You can tell it's not a solved problem because researchers are still trying to quantify the risk and severity of collapse - as you can see even just from the abstracts in the links I provided.
Some choice excerpts from the abstracts, for those who don't want to click the links:
It's really interesting reading a conversation between somebody who knows what they're talking about, providing sources, and a known troll (FaceDeer) who can only go "nuh-uh" and complain about ghosts.