this post was submitted on 16 Jun 2026
Technology
TBH local models aren't as good as cloud ones. Even with 16GB of VRAM you aren't getting anywhere close to a >100GB cloud LLM.
No, it’s not quite as strong, and especially the initial prefill can take a bit. I also sometimes run into infinite thinking loops where I have to stop it and re-run my last prompt.
It’s surprising how close Qwen 3.6 gets on the benchmarks to Claude models, though. Especially when running locally with 200k context, I’ve found it’s good enough to be a daily driver. Despite the faults, it’s better than paying Anthropic $200 a month so they can rate limit me and collect my data.
I prefer to run with a cheap pay-per-prompt cloud model. You can find really good open models that cost $0.50 per million tokens.
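For a rough sense of the trade-off, here is a back-of-the-envelope sketch using the two prices mentioned in this thread: a $200 a month subscription and $0.50 per million tokens. It assumes a single blended rate for input and output tokens, which real providers often don't use, and the function name is purely illustrative.

```python
def break_even_tokens(subscription_usd: float, usd_per_million_tokens: float) -> float:
    """Tokens per month at which pay-per-token spend equals a flat subscription.

    Assumption: one blended price for input and output tokens.
    """
    return subscription_usd / usd_per_million_tokens * 1_000_000


# Prices quoted in the thread: $200/month vs. $0.50 per million tokens.
tokens = break_even_tokens(200.0, 0.50)
print(f"{tokens:,.0f} tokens/month")  # 400,000,000 tokens/month
```

Under those assumptions, usage below that monthly volume would be cheaper on the pay-per-token model.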