Google releases Gemma 4 open models : technology

[–] brucethemoose@lemmy.world 14 points 2 weeks ago* (last edited 2 weeks ago)

They seem to have held back the "big" locally runnable model.

It's also kinda conservative/old, architecture wise: 16-bit weights, sliding window attention interleaved with global attention. No MTP, no QAT (yet), no tightly integrated vision, no hybrid mamba like Qwen/Deepseek, nothing weird like that. It's especially glaring since we know Google is using an exotic architecture for Gemini, and has basically infinite resources for experimentation.

It also feels kinda "deep fried" like GPT-OSS to me, see: https://github.com/ikawrakow/ik_llama.cpp/issues/1572

it is acting crazy. it can't do anything without the proper chat template, or it goes crazy.

IMO it's not very interesting, especially with so many other models that run really well on desktops.

[–] brucethemoose@lemmy.world 13 points 2 weeks ago* (last edited 2 weeks ago) (1 children)

Also, for any interested, desktop inference and quantization is my autistic interest. Ask my anything.

I don't like Gemma 4 much so far, but if you want to try it anyway:

On Nvidia with no CPU offloading, watch this PR and run it with TabbyAPI: https://github.com/turboderp-org/exllamav3/pull/185
With CPU offloading, watch this PR and the mainline llama.cpp issues they link. Once Gemma4 inference isn't busted, run it in IK or mainline llama.cpp: https://github.com/ikawrakow/ik_llama.cpp/issues/1572
If you're on an AMD APU, like a Mini PC server, look at: https://github.com/lemonade-sdk/lemonade
On an AMD or Intel GPU, either use llama.cpp or kobold.cpp with the vulkan backend.
Avoid ollama like it's the plague.
Learn chat templating and play with it in mikupad before you use a "easy" frontend, so you understand what its doing internally (and know when/how it goes wrong): https://github.com/lmg-anon/mikupad

But TBH I'd point most people to Qwen 3.5/3.6 or Step 3.5 instead. They seem big, but being sparse MoEs, they can run quite quickly on single-GPU desktops: https://huggingface.co/models?other=ik_llama.cpp&sort=modified

[–] TrippinMallard@lemmy.ml 4 points 2 weeks ago (1 children)

What's wrong with ollama?

[–] brucethemoose@lemmy.world 9 points 2 weeks ago* (last edited 2 weeks ago)

Ughhh, I could go on forever, but to keep it short:

Tech bro enshittification: https://old.reddit.com/r/LocalLLaMA/comments/1p0u8hd/ollamas_enshitification_has_begun_opensource_is/
Hiding attribution to the actual open source project it's based on: https://old.reddit.com/r/LocalLLaMA/comments/1jgh0kd/opinion_ollama_is_overhyped_and_its_unethical/
A huge support drain on llama.cpp, without a single cent, nor a notable contribution, given back.
Constant bugs and broken models from "quick and dirty" model support updates, just for hype.
Breaking standard GGUFs.
Deliberately misnaming models (like the Deepseek Qwen distills and "Deepseek") for hype.
Horrible defaults (like ancient default models, 4096 context, really bad/lazy quantizations).
A bunch of spam, drama, and abuse on Linkedin, Twitter, Reddit and such.

Basically, the devs are Tech Bros. They're scammer-adjacent. I've been in local inference for years, and wouldn't touch ollama if you paid me to. I'd trust Gemini API over them any day.

I'd recommend base llama.cpp or ik_llama.cpp or kobold.cpp, but if you must use an "turnkey" and popular UI, LMStudio is way better.

But the problem is, if you want a performant local LLM, nothing about local inference is really turnkey. It's just too hardware sensitive, and moves too fast.

[–] madcaesar@lemmy.world 6 points 2 weeks ago (1 children)

What could this be used for?

[–] baatliwala@lemmy.world 10 points 2 weeks ago* (last edited 2 weeks ago) (1 children)

Local LLMs, probably even ones you can host on phones. But they won't be as powered of course

[–] madcaesar@lemmy.world 4 points 2 weeks ago (3 children)

Yea I get that, but does anyone have any practical ideas for local LLM?

[–] Imgonnatrythis@sh.itjust.works 17 points 2 weeks ago (1 children)

Literature summarization, data analysis, not being a pawn in corporate data harvesting.

[–] XLE@piefed.social 0 points 2 weeks ago

As long as you don't care if the summaries and analyses are wrong!

[–] felsiq@piefed.zip 10 points 2 weeks ago (1 children)

Home assistant is the big one imo, voice control for a private smart home is useful and low-stakes so hallucinations won’t be the end of the world

[–] leftascenter@jlai.lu 4 points 2 weeks ago

I'm eagerly waiting for a locally run phone assistant. Just for voice control while driving.

[–] baatliwala@lemmy.world 2 points 2 weeks ago* (last edited 2 weeks ago)

In addition to what the others said, some apps allow you to link to an LLM model for additional features.

For eg Immich has prebuilt models you can choose depending on how powerful your PC is, which will give facial recognition and powerful NLP-like search capabilities for your library. So if they think this is model good they can make a new prebuilt one using this as a base. Software like Microsoft Teams uses LLM for better background blurring for video calls, so maybe an open source equivalent can make use of it.

Also you can use it for other stuff like image generation too

[+] mrnobody@reddthat.com -14 points 2 weeks ago (2 children)

Why would anyone care about Gemini or other AI here? I mean, i get this is the tech Space, but AI=bad.

[–] brucethemoose@lemmy.world 14 points 2 weeks ago* (last edited 2 weeks ago)

There's a whole lot of interest in locally runnable ML. It was there even before ChatGPT 3.5 started the tech bro hype train, when tinkerers were messing with GPT-J 6B and GAN models.

In a nutshell, it's basically Lemmy vs Reddit. Local and community-developed vs toxic and corporate.

[–] axum@lemmy.blahaj.zone 1 points 2 weeks ago

Openly downloadable ai models are here. You may as well download one and play with it on your own hardware so that you can learn the it's and outs of it as well as limitations and use cases

Technology

Our Rules

Approved Bots