As for the ‘corporate control’ aspect, all this stuff is racing towards locally run anyway (since it’s free).
I am not at all sure about that. I use an RX 7900 XTX and a Framework Desktop with an AI Max 395+, both of which I got to run LLMs and diffusion models locally, so I certainly have no personal aversion to local compute.
But there are a number of factors pulling in different directions. I am very far from certain that the end game here is local compute.
In favor of local
- Privacy.
- Information security. It's not that there aren't attacks that can be performed using just distribution of static models (If Anyone Builds It, We All Die has some interesting theoretical attacks along those lines), but if you're running important things at an institution that depend on some big, outside service, you're creating attack vectors into your company's systems. Not to mention that even if you trust the AI provider and whatever government has access to their servers, you may not trust them to be able to keep attackers out of their infrastructure. True, this also applies to many other cloud-based services, but there are a number of places that run services internally for exactly this reason.
- No network dependency for operation, in terms of uptime. Especially for things like, say, voice recognition in places with an intermittent connection, this is important.
- Good latency. And no bandwidth restrictions. Though a lot of uses today really are not very sensitive to either.
- For some locales, regulatory restrictions. Let's say that one is generating erotica with generative AI, which is a popular application. The Brits just made portraying strangulation in pornography illegal. I suspect that if a random cloud service permits generation of erotic material involving strangulation, it's probably open to trouble. A random Brit running a model locally may well not be in compliance with the law (I don't recall if it's just commercial provision or not), but in practical terms it's probably not particularly enforceable. That may be a very substantial factor based on where someone lives. And the Brits are far from the most severe: Iranian law, for example, permits execution for producing pornography involving homosexuality.
In favor of cloud
- Power usage. This is, in 2025, very substantial. A lot of people have phones or laptops that run off batteries of limited size. Current parallel-compute hardware that can run powerful models at a useful rate can be pretty power-hungry. My RX 7900 XTX can pull 355 watts; that's wildly outside the power budget of a portable device. An Nvidia H100 is 700 W, and there are systems that use a bunch of those. Even if you need to spend some power to transfer data, it's massively outweighed by getting the parallel compute off the battery (a rough back-of-envelope comparison follows this list). My guess is that even if people shift some compute to be local (e.g. offline speech recognition), it may be very common for people with smartphones to use a lot of software that talks to remote servers for heavy-duty parallel compute.
- Cooling. Even if you have a laptop plugged into wall power, you need to dissipate the heat. You can maybe use eGPU accelerators for laptops (I kind of suspect that eGPUs might see some degree of resurgence for this specific market, if they haven't already), but even then, it's noisy.
- Proprietary models. If proprietary models wind up dominating, which I think is a very real possibility, AI service providers have a very strong incentive to keep their models private, and one way to do that is to not distribute the model.
- Expensive hardware. Right now, a lot of the hardware is really expensive. It looks like an H100 runs maybe $30k at the moment, maybe $45k. A lot of the applications are "bursty": you need to have access to an H100, but you don't need the kind of sustained access that would keep that expensive hardware busy. As long as the costs and applications look like that, there's a very strong incentive to time-share hardware: buy a pool of them and share them among users. If I'm using my hardware 1% of the time, I only need to pay something like 1% as much if I'm willing to use shared hardware (the sketch after this list works through the arithmetic). We used to do this back when all computers were expensive: people had dumb terminals and teletypes that connected to "real" computers, with multiple users sharing access to the hardware. That could very much become the norm again. It's true that I expect hardware capable of a given level of parallel compute will probably tend to come down in price (though there's a lot of unfilled demand to meet), and it's true that the software can probably be made more hardware-efficient than it is today. Those argue for costs coming down. But it's also true that the software guys can probably produce better output and more-interesting applications if they get more-powerful hardware to play with, and that argues for upward pressure.
- National security restrictions. One possible world we wind up in is one where large parallel-compute systems are restricted, because it's too dangerous to permit people to be running around with artificial superintelligences. In the Yudkowsky book I link to above, for example, the authors want international law to entirely prohibit beefy parallel-compute capability from being available to pretty much anyone, due to the risks of artificial superintelligence, and I'm pretty sure that there are also people who just want physical access to parallel compute restricted, which would be a lot easier if the only people who could get the hardware were regulated datacenters. I am not at all sure that this will actually happen, but there are people who have real security concerns here, and that position might become a consensus one in the future. Note that I think we may already be "across the line" here with existing hardware if parallel compute can be sharded to a sufficient degree across many smaller systems: your Bitcoin mining datacenter running racks of Nvidia 3090s might already be enough, if you can design a superintelligence that can run on it.
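To make the power and cost points above concrete, here is a rough back-of-envelope sketch. Every number in it is an illustrative assumption (a 60 Wh laptop battery, a $30k H100 amortized over three years, 1% utilization), not a measurement; swap in your own figures.

```python
# Back-of-envelope numbers for the "power usage" and "expensive hardware" points above.
# All inputs are illustrative assumptions; adjust to taste.

# --- Power: running a 355 W GPU off a laptop-sized battery ---
battery_wh = 60.0          # typical laptop battery capacity in watt-hours (assumed)
gpu_watts = 355.0          # RX 7900 XTX board power under load
runtime_min = battery_wh / gpu_watts * 60
print(f"355 W GPU on a {battery_wh:.0f} Wh battery: ~{runtime_min:.0f} minutes of runtime")

# --- Cost: owning an H100 outright vs. time-sharing one ---
h100_price = 30_000.0      # rough purchase price, USD (assumed)
amortize_years = 3         # assumed useful life
utilization = 0.01         # "bursty" use: hardware busy 1% of the time
hours = amortize_years * 365 * 24

cost_per_busy_hour_owned = h100_price / (hours * utilization)
cost_per_busy_hour_shared = h100_price / hours   # a pool keeps the card roughly fully busy
print(f"Owned, 1% utilized:     ~${cost_per_busy_hour_owned:.2f} per hour of actual use")
print(f"Shared, fully utilized: ~${cost_per_busy_hour_shared:.2f} per hour of actual use")
```

Even ignoring electricity, cooling, and the provider's margin, the utilization gap alone is roughly the factor-of-100 difference the time-sharing argument rests on, and the battery math shows why 355 W simply doesn't fit in a portable power budget.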
I think that a lot of people who say this have looked at material produced by a combination of early models and operators who hadn't spent time adapting to whatever limitations can't be addressed on the software side. And, yeah, those models had limitations ("generative AI can't do fingers!"), but those have rapidly been getting ironed out.
I remember posting one of the first images I generated with Flux to a community here, a jaguar lying next to a white cat. This was me just playing around. I wouldn't have been able to tell you that it wasn't a photograph. And that was some time back, and I'm not a full-time user professionally aiming to get the most out of this stuff.
kagis
Yeah, here we are.
https://sh.itjust.works/post/27441182
"Cats"
https://lemmy.today/pictrs/image/b97e6455-2c37-4343-bdc4-5907e26b1b5d.png
I could not distinguish between that and a photograph. It doesn't have the kind of artifacts that I could identify. At the time, I was shocked, because I hadn't realized that the Flux people had been doing the kind of computer-vision processing on their training images required to do that kind of lighting work at generation time. That's using a model that's over a year old (forever, at the rate things are changing), from a non-expert on just local hardware, and it was just a first pass, not a "generate 100 and pick the best" or something that had any tweaking involved.
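For anyone who wants to try that kind of first-pass local generation, a minimal sketch using Hugging Face's diffusers FluxPipeline follows. It assumes a recent diffusers install with Flux support, the FLUX.1-schnell weights (the smaller, distilled Flux variant), and a GPU with decent VRAM; the prompt and parameters are just placeholders, not what I used.

```python
# Minimal single-pass local image generation with a Flux model via diffusers.
# Assumes `pip install diffusers transformers accelerate torch` and enough VRAM
# (or enough system RAM for CPU offload). Deliberately untuned: one prompt, one pass.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell",   # distilled, fast Flux variant
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()            # trade speed for lower VRAM use

image = pipe(
    prompt="a jaguar lying next to a white cat, natural light, photo",
    num_inference_steps=4,                 # schnell is distilled to need only a few steps
    guidance_scale=0.0,                    # schnell is run without classifier-free guidance
    height=1024,
    width=1024,
    generator=torch.Generator("cpu").manual_seed(0),
).images[0]

image.save("cats.png")
```

On consumer hardware the CPU-offload line is often the difference between this running and it running out of memory; everything else is defaults.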
Flux was not especially amenable, as diffusion models go, to the generation of pornography last I looked, but I am quite certain that there will be photography-oriented and real-video-oriented models that are very much aimed at pornography.
And that was done with the limited resources available in the past. There is now a lot of capital going towards advancing the field, and a lot of scale coming.