This is really the fear we should all have. And I've wondered about this specifically in the case of Thiel, who seems quite off their rocker.
Some things we know.
Architecturally, the underpinnings of LLMs existed long before the modern crop. "Attention Is All You Need" is basic reading these days; Google literally invented the transformer, but failed to create the first LLM. This is important.
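To make that concrete, here's a toy numpy sketch of the scaled dot-product attention at the core of that paper. The shapes and values are made up for illustration; this is the mechanism, not any production implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # subtract max for numerical stability
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    d_k = q.shape[-1]
    scores = q @ k.swapaxes(-2, -1) / np.sqrt(d_k)   # every token scores every other token
    return softmax(scores) @ v                        # weighted mix of the values

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8                               # toy sizes
x = rng.normal(size=(seq_len, d_model))
print(attention(x, x, x).shape)                       # self-attention: q, k, v from same input -> (4, 8)
```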
Modern LLMs came from basically two kinds of scaling. First, massively scale the transformer itself. Second, massively scale the training dataset. This is what OpenAI did. What Google missed was that the emergent properties of networks change with scale. But just scaling a large neural network alone isn't enough; you need enough data to let it converge on interesting and useful features.
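For a sense of how those two knobs relate, here's some back-of-envelope math using the Chinchilla rule of thumb (roughly 20 training tokens per parameter) and the standard training-compute estimate C ≈ 6·N·D. The budgets printed are illustrative, not anyone's actual numbers:

```python
# Chinchilla-style heuristic: ~20 training tokens per parameter,
# training compute C ≈ 6 * N * D. Rough, illustrative numbers only.
def compute_optimal(params: float) -> tuple[float, float]:
    tokens = 20 * params           # data needed to make the parameters worth having
    flops = 6 * params * tokens    # rough total training compute
    return tokens, flops

for n in (1e9, 70e9, 400e9):
    tokens, flops = compute_optimal(n)
    print(f"{n/1e9:>4.0f}B params -> ~{tokens/1e12:.1f}T tokens, ~{flops:.1e} FLOPs")
```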
On the first part, scaling the network: this is basically what we've done so far, along with some cleverness around how training data is presented, to improve existing generative models. Larger models are basically better models. There is some nuance here, but not much. There have been no new architectural improvements that have produced the kind of order-of-magnitude jump we saw going from the LSTM/GAN days to transformers.
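For what "larger" means in practice, parameter count for a decoder-only transformer can be sketched with the common approximation N ≈ 12 · layers · d_model² (attention plus MLP weights, embeddings ignored). The two shapes below are roughly GPT-2-small-ish and GPT-3-ish, picked for illustration:

```python
# Common approximation: N ≈ 12 * n_layers * d_model^2
# (attention + MLP weights; embeddings ignored).
def approx_params(n_layers: int, d_model: int) -> float:
    return 12 * n_layers * d_model ** 2

for layers, width in [(12, 768), (96, 12288)]:    # GPT-2-small-ish, GPT-3-ish shapes
    print(f"{layers} layers, d_model={width}: ~{approx_params(layers, width)/1e9:.2f}B params")
```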
Now, what we also know is that it's incredibly opaque what is actually presented to the public. Some open-source models are in the range of hundreds of billions of parameters, but most aren't that big. I have Qwen3-VL on my local machine; it's 33 billion parameters. I think I've seen some 400B-parameter models in the open-source world, but I haven't bothered downloading them because I can't run them. We don't actually know how many billions of parameters models like Opus 4.5, or whatever shit stack OpenAI is shipping these days, really have. It's probably in the range of 200B-500B, which we can infer from the upper limits of what fits on the most advanced server-grade hardware. Beyond that, it's MoE: multiple expert networks on multiple GPUs conferring on a result.
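That inference from hardware is just arithmetic on the weights. A quick sketch (weights only; KV cache and activations add more on top):

```python
# Why a 33B model runs locally but a 400B one doesn't: the weights alone
# need params * bits / 8 bytes of memory.
def weight_vram_gb(params_billions: float, bits: int) -> float:
    return params_billions * bits / 8     # billions of params -> GB, conveniently

for params in (33, 400):
    for bits in (16, 8, 4):
        print(f"{params}B @ {bits}-bit: ~{weight_vram_gb(params, bits):.1f} GB")
```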
What we haven't seen is any kind of stepwise, order-of-magnitude improvement since the 3.5-to-4 jump OpenAI made a few years ago. It's been very... iterative, which is to say underwhelming, since 2023. It's very clear that an upper limit was reached: most of the improvements have been QoL and nice engineering, but nothing has fundamentally or noticeably improved in the underlying quality of these models. That is in and of itself interesting, and there are several possible explanations.
Getting very far beyond this takes us past the hardware limitations of even the most advanced manufacturing currently available. I think the most a Blackwell card has is ~288GB of VRAM? It might be that at this scale we simply don't have the hardware to peek over the hedge and see how a larger model would perform. That's one explanation: we hit the memory limits of the hardware, and we might not see a major performance improvement until GPUs reach the terabyte range of memory.
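Flipping the earlier arithmetic around, here's how many of today's biggest cards a single giant model would need just to hold its weights, taking the ~288GB figure above as an assumption rather than a spec sheet:

```python
import math

CARD_GB = 288   # assumed per-card VRAM, per the figure above

def cards_needed(params_billions: float, bits: int = 8) -> int:
    weight_gb = params_billions * bits / 8       # weights only, no KV cache
    return math.ceil(weight_gb / CARD_GB)

for n in (500, 2000, 10000):                     # 0.5T, 2T, 10T parameters
    print(f"{n}B params @ 8-bit: {cards_needed(n)} card(s) minimum")
```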
Another explanation could be that, at the consumer level, they stopped throwing more compute at the problem. Remember the MoE thing? Well, these companies allegedly are supposed to make money. It's possible they just stopped throwing more resources at their product lines, and that more MoE actually does yield better performance.
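For anyone fuzzy on what MoE actually is, here's a toy top-k routing sketch: each token only activates a couple of expert networks, so total parameters can grow without per-token compute growing with them. Everything here is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n_experts, top_k, d = 8, 2, 16

w_gate = rng.normal(size=(d, n_experts))                        # router weights
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]   # toy expert layers

def moe_forward(x):
    logits = x @ w_gate
    chosen = np.argsort(logits)[-top_k:]                           # pick the k best experts
    gates = np.exp(logits[chosen]) / np.exp(logits[chosen]).sum()  # normalize their scores
    return sum(g * (x @ experts[i]) for g, i in zip(gates, chosen))

token = rng.normal(size=d)
print(moe_forward(token).shape)   # (16,): only 2 of the 8 experts ran for this token
```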
In the first scenario I outlined, executives would be limited to the same useful but kinda-crappy LLMs we all have access to. In the second scenario, executives might have access to super-powered, high-expert-count MoE versions.
If the second scenario is true, and highly clustered LLMs can demonstrate an additional stepwise performance improvement, then we're already fucked. But if that were the case, it's not like Western companies have a monopoly on GPUs, or even on models, and we're not seeing that kind of massive performance bump elsewhere. So it's likely that MoE also has its limits, and they've been reached at this point. It's also possible we've hit the limits of the training data: that even having consumed all 400,000 years of humanity's output, it's still too dumb to draw a full glass of wine. I don't believe this, but it is possible.
Thanks for the detailed answer!