this post was submitted on 03 Apr 2026
23 points (62.9% liked)

Technology

[–] brucethemoose@lemmy.world 13 points 2 weeks ago* (last edited 2 weeks ago) (1 children)

Also, for anyone interested, desktop inference and quantization are my autistic interest. Ask me anything.

I don't like Gemma 4 much so far, but if you want to try it anyway:


But TBH I'd point most people to Qwen 3.5/3.6 or Step 3.5 instead. They seem big, but being sparse MoEs, they can run quite quickly on single-GPU desktops: https://huggingface.co/models?other=ik_llama.cpp&sort=modified
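To put rough numbers on the sparse-MoE point, here is a back-of-envelope sketch. Every figure in it is a made-up placeholder (100B total parameters, 10B active per token, 4.5 bits per weight, 60 GB/s of memory bandwidth), not the spec of any model named above. It also assumes decode speed is bound purely by how many weight bytes must be read per token, which ignores compute, KV cache reads, and whether the weights sit in VRAM or system RAM.

```python
def weight_bytes(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size in bytes of the given number of quantized weights."""
    return params_billion * 1e9 * bits_per_weight / 8


def decode_tokens_per_second_bound(
    active_params_billion: float,
    bits_per_weight: float,
    bandwidth_gb_s: float,
) -> float:
    """Rough upper bound on decode speed if every active weight is read
    from memory once per generated token (bandwidth-bound assumption)."""
    bytes_per_token = weight_bytes(active_params_billion, bits_per_weight)
    return bandwidth_gb_s * 1e9 / bytes_per_token


# Hypothetical placeholder numbers, chosen only for illustration.
TOTAL_B = 100.0       # total parameters, in billions
ACTIVE_B = 10.0       # parameters touched per token in the sparse MoE
BITS = 4.5            # average bits per weight after quantization
BANDWIDTH = 60.0      # memory bandwidth in GB/s

footprint_gb = weight_bytes(TOTAL_B, BITS) / 1e9
dense_bound = decode_tokens_per_second_bound(TOTAL_B, BITS, BANDWIDTH)
moe_bound = decode_tokens_per_second_bound(ACTIVE_B, BITS, BANDWIDTH)

print(f"weights to store: {footprint_gb:.2f} GB")
print(f"dense-model bound: {dense_bound:.2f} tok/s")
print(f"sparse-MoE bound:  {moe_bound:.2f} tok/s")
```

The takeaway under these assumptions: you still have to store all the weights somewhere, but the per-token read cost scales with the active parameters, so the speed bound improves by roughly the total-to-active ratio.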

[–] TrippinMallard@lemmy.ml 4 points 2 weeks ago (1 children)
[–] brucethemoose@lemmy.world 9 points 2 weeks ago* (last edited 2 weeks ago)

Ughhh, I could go on forever, but to keep it short:

Basically, the devs are Tech Bros. They're scammer-adjacent. I've been in local inference for years, and wouldn't touch ollama if you paid me to. I'd trust Gemini API over them any day.

I'd recommend base llama.cpp or ik_llama.cpp or kobold.cpp, but if you must use a "turnkey" and popular UI, LMStudio is way better.

But the problem is, if you want a performant local LLM, nothing about local inference is really turnkey. It's just too hardware sensitive, and moves too fast.