this post was submitted on 11 Mar 2026
49 points (94.5% liked)

Technology


Evaluating 35 open-weight models across three context lengths (32K, 128K, 200K), four temperatures, and three hardware platforms—consuming 172 billion tokens across more than 4,000 runs—we find that the answer is “substantially, and unavoidably.” Even under optimal conditions—best model, best temperature, temperature chosen specifically to minimize fabrication—the floor is non-zero and rises steeply with context length. At 32K, the best model (GLM 4.5) fabricates 1.19% of answers, top-tier models fabricate 5–7%, and the median model fabricates roughly 25%.

[–] RandAlThor@lemmy.ca 8 points 4 hours ago (2 children)

This is pretty bonkers. How TF are they fabricating answers?????

[–] bad1080@piefed.social 8 points 3 hours ago (1 children)
[–] snooggums@piefed.world 5 points 3 hours ago (2 children)

Aka being wrong, but with a fancy name!

When Cletus is wrong because he mixed up a dog and a cat when describing their behavior, do we call it hallucinating? No.

[–] Scipitie@lemmy.dbzer0.com 13 points 3 hours ago (1 children)

Accepting concepts like "right" and "wrong" gives those tools way too much credit, basically following the AI narrative of the corporations behind them. They can only be used about the output but not the tool itself.

To be precise:

LLMs can't be right or wrong because the way they work has no link to any reality - it's stochastics, not evaluation. I also don't like the term hallucination for the same reason. It's simply a too-high temperature setting jumping into a close-by but unrelated vector set.
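A minimal sketch of what "temperature" does during sampling, for readers who haven't seen it. The scores and the function name `softmax_with_temperature` are made up for illustration and are not taken from the study or from any particular model; real systems do this over tens of thousands of tokens.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; higher temperature flattens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate next tokens (illustrative only).
logits = [4.0, 2.0, 1.0]

cold = softmax_with_temperature(logits, 0.5)  # sharper: top token dominates
hot = softmax_with_temperature(logits, 2.0)   # flatter: unlikely tokens gain mass

print("T=0.5:", [round(p, 4) for p in cold])
print("T=2.0:", [round(p, 4) for p in hot])
```

Note that in this toy example the unlikely tokens keep a small but non-zero probability even at the lower temperature, so lowering the temperature shrinks the chance of an off-target pick without removing it.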

Why this is an important distinction: arguing that an LLM is wrong is arguing on the ground of ChatGPT and the like. It's then an "oh, but we can make them better!" and their marketing departments rejoice.

To take your calculator analogy: just as those tools have floating point errors that are inherent to them, wrong outputs are a core part of LLMs.

We can minimize that, but then they automatically lose part of their function. This limitation is way stronger on LLMs than limiting a calculator to 16 digits after the comma, though...
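For anyone unfamiliar with the floating point errors mentioned in the calculator analogy, here is a small sketch using standard binary floats in Python. The variable names are just for illustration.

```python
import math

# 0.1 and 0.2 have no exact binary representation, so the sum is slightly off.
total = 0.1 + 0.2

exactly_equal = (total == 0.3)            # strict comparison fails
close_enough = math.isclose(total, 0.3)   # tolerance-based comparison passes

print(repr(total))
print("exactly equal to 0.3:", exactly_equal)
print("close to 0.3:", close_enough)
```

The error here is tiny and well understood, which is the contrast the comment draws with LLM output.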

[–] CubitOom@infosec.pub 4 points 2 hours ago* (last edited 2 hours ago) (1 children)

What word would you propose to use instead?

Fabrication?

[–] Scipitie@lemmy.dbzer0.com 3 points 2 hours ago (1 children)

That's my problem: any single word humanizes the tool, in my opinion. Perhaps something like "stochastic debris" comes close, but there's no chance to counter the combined force of pop culture, corp speak, and humanity's talent to see humanoid behavior everywhere but in each other. :(

[–] Telorand@reddthat.com 3 points 2 hours ago (1 children)

We do enjoy pareidolia, don't we?

[–] deranger@sh.itjust.works 1 points 51 minutes ago

Pareidolia just means seeing patterns that aren’t there; it’s not implicitly human. If you see a dog in the clouds, that’s pareidolia.

[–] bad1080@piefed.social 4 points 3 hours ago

if you have a lobby you get special names, look at the pharma industry who coined the term "discontinuation syndrome" for a simple "withdrawal"

[–] ji59@hilariouschaos.com 1 points 3 hours ago* (last edited 3 hours ago)

Because guessing the correct answer is more successful than saying nothing.
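A rough sketch of that incentive, assuming a simple grading scheme where a correct answer scores 1 and "I don't know" scores 0. The scheme, the 20% hit rate, and the names `expected_score` and `ABSTAIN_SCORE` are assumptions for illustration, not how any specific model or the study in the post is scored.

```python
def expected_score(p_correct, wrong_penalty=0.0):
    """Average score of guessing: +1 if right, -wrong_penalty if wrong."""
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty

ABSTAIN_SCORE = 0.0  # assumed score for saying nothing

# With no penalty for wrong answers, even a 20% guess beats abstaining.
guess_plain = expected_score(0.2)

# If a wrong answer costs as much as a right one earns, the same guess loses.
guess_penalized = expected_score(0.2, wrong_penalty=1.0)

print("guess, no penalty:", guess_plain, "vs abstain:", ABSTAIN_SCORE)
print("guess, penalty 1: ", guess_penalized, "vs abstain:", ABSTAIN_SCORE)
```

Under the no-penalty assumption, guessing always comes out ahead on average, which is one way to read the comment above.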