In AI, a "hallucination" is just as much "there" as a non-"hallucination." It's a way for scientists to stomp their feet and say that the wrong output is the computer's fault rather than a natural consequence of how LLMs work.
So, what can we glean from this? Here are a few of my observations.
Current studies largely treat LLMs as black boxes... Just as... neuroscience investigations into individual neuronal activity and synaptic interactions shape theories of cognition like learning and memory, analyzing neurons – the fundamental computational units of LLMs – is essential for decoding hallucination. By scrutinizing neurons’ activation patterns in relation to hallucinations, we can gain deeper insights into model reliability.
So these researchers are left poking at the compiled code of a closed source database. What a pain.
The funny part is, although they insist it's not a black box...
The process begins by generating a balanced dataset of faithful (green check) and hallucinatory (red cross) responses using the TriviaQA benchmark. We extract the contribution profiles of neurons specifically on the answer tokens to train a linear classifier. Neurons assigned positive weights by this classifier are identified as "H-Neurons", distinguishing them from normal neurons based on their predictive role in generating hallucinations.
... The researchers clearly have no idea what the bad nodes are doing to cause the bad output. They can just observe that when those nodes are hit, a bad thing happens. So the nodes themselves are black boxes to them.
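For what it's worth, the quoted pipeline boils down to something very ordinary: fit a linear classifier on per-neuron activation data and read off the positive weights. Here's a toy sketch in plain Python; the sizes, the planted data, and the name `h_neurons` are all my inventions, not the paper's code.

```python
import math
import random

random.seed(0)

N_NEURONS = 50          # toy size; a real model has millions of neurons
N_SAMPLES = 200
PLANTED = {3, 17, 41}   # neurons we secretly make predictive, for the demo

def make_example(hallucinated):
    """Fake 'contribution profile': one value per neuron, plus a label."""
    x = [random.gauss(0, 1) for _ in range(N_NEURONS)]
    if hallucinated:
        for i in PLANTED:
            x[i] += 2.0  # these neurons contribute more on hallucinations
    return x, 1 if hallucinated else 0

# Balanced dataset of faithful (0) and hallucinatory (1) responses.
data = [make_example(i % 2 == 0) for i in range(N_SAMPLES)]

# Plain logistic regression trained by stochastic gradient descent.
w, b, lr = [0.0] * N_NEURONS, 0.0, 0.1
for _ in range(50):
    for x, y in data:
        p = 1 / (1 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
        g = p - y
        w = [wi - lr * g * xi for wi, xi in zip(w, x)]
        b -= lr * g

# "H-Neurons": the neurons the classifier weights most positively.
h_neurons = set(sorted(range(N_NEURONS), key=lambda i: w[i])[-3:])
print(h_neurons == PLANTED)  # the classifier recovers the planted neurons
```

In the paper's setting the "profiles" come from the model's actual activations on answer tokens rather than planted Gaussians, but the classification step itself really is this mundane: correlate, then label whatever gets positive weight.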
Our investigation reveals that a remarkably sparse subset of neurons – comprising less than 0.1% of the model’s total neurons – can accurately predict whether the model will produce hallucinated responses.
The "bad" nodes are everywhere: even though they make up less than a thousandth of the database, they're scattered all across it. The mystery deepens.
Our investigation reveals that H-Neurons originate during the pre-training phase...observed "parameter inertia" suggests that standard instruction tuning does not effectively restructure the underlying hallucination mechanics; instead, it largely preserves these pre-existing circuits... Findings suggest that hallucinations are not merely artifacts of model scaling or alignment procedures, but rather deeply rooted in the fundamental training objectives that shape LLM behavior from their inception.
The "bad" nodes are among the first ones added to models, before anything else is filtered or further trained. This is very funny because it implies they're part of something crucial.
We hypothesize that the neurons identifying hallucinations do not merely encode factual errors, but rather drive a fundamental behavior we term over-compliance: the model's tendency to satisfy user prompts even at the expense of truthfulness, safety, or integrity. Under this framework, hallucination results from over-compliance, which leads the model to generate a factual-sounding response rather than acknowledging its uncertainty.
They coined a (second) new phrase: this earliest behavior baked into the model, which persists even after further training, they call "over-compliance," and they insist it's the model trying to bullshit a user extra hard.
Alternative hypothesis: what if this data is simply the basis for even making the results legible?
This originates from the inherent characteristics of the next-token prediction objective. This training paradigm does not distinguish between factually correct and incorrect continuations – it merely rewards fluent text generation.
Never mind, they just said it outright.
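Their point about the objective is easy to make concrete. Cross-entropy at training time only asks "how probable was the observed next token?" and has no notion of whether that token was factually right. A toy sketch, with a vocabulary and probabilities I invented for illustration:

```python
import math

def token_loss(model_probs, observed_token):
    """Standard next-token cross-entropy for a single position."""
    return -math.log(model_probs[observed_token])

# Imagine the prompt "The capital of Australia is ..."
model_probs = {"Canberra": 0.4, "Sydney": 0.4, "Melbourne": 0.2}

# If a training document happened to contain the wrong answer, the
# objective pushes probability toward it just as readily as toward
# the right one:
loss_right = token_loss(model_probs, "Canberra")
loss_wrong = token_loss(model_probs, "Sydney")
print(loss_right == loss_wrong)  # True: the loss can't tell them apart
```

Both continuations get identical loss here, so gradient descent treats "trained on a wrong fact" and "trained on a right fact" exactly the same.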
And a "hallucination" is also an inaccurate humanization of the actual meaning: "statistical relationship that we AI folks don't like."
"Hallucinations" even include accurate data.
It is a trash marketing buzzword.
Anthropic is performatively wringing their hands; we know they don't really mind causing harm because they're already complicit. Anthropic CEO Dario Amodei, and his anti-LGBT executive goon Jason Clinton, make Sam Altman look almost ethical and honest.
How the heck have I not heard of this one yet? It already looks promising - I see video, chat, channel organization...
So when all your friends want to screenshare on somewhere else, they install Movim and then sign up on...
And then when they jump into chats, will they be able to search the room history and post memes (images)?
I have too. It's... really not ready for primetime, based on present-day experience watching rooms lurch into view (and I don't think it's even possible to search for them on Android). Has video at least gotten better?
I don't think you're missing anything. I'm pretty sure this is the trend. People buy Mac Minis, probably don't even download a local model, FA, and FO.
Don't the tools we have include internet and even (gasp) book literacy, rather than going to a chatbot? At very best, the evidence that AI helps anyone is shaky. At worst, we are witnessing a reverse Flynn effect in education right now, and this alleged tool - besides not doing what was promised and not even making enough money to prop itself up - has been caught enticing children into suicide. If a billionaire genius like Sam Altman can't code in a guardrail to save a child's life, how can you?
Why encourage it?
Are the children being taught a tool, or are they being used as guinea pigs?
I got into volunteering through TEALS, Microsoft’s nonprofit.
Good for you / I'm sorry to hear that
The class runs on Chromebooks managed by Google Classroom, writing code on code.org—which is powered by AWS.
My condolences to the students. It sounds like they're already being brought up in a world where they are expected to own nothing and be happy.
I hope you teach them about how terrible this privacy violation is, and how they are slowly being groomed into dependency.
Corporate infrastructure is already the foundation of public CS education.
That's very sad too.
...wait, you're upset because you want to indoctrinate the children with more stuff?
That's incorrect. Wrong responses will still be generated even if you remove the sampling randomness that makes responses to the same question vary.
If that wasn't the case, this paper wouldn't exist.
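To spell it out: with sampling turned off (greedy/argmax decoding), the model deterministically emits whatever token it scores highest, and if that token is wrong, it's wrong every single time. The numbers below are invented for illustration:

```python
# Toy next-token distribution for some factual question. In greedy
# decoding we always pick the argmax -- no randomness involved.
next_token_probs = {"1912": 0.55, "1905": 0.30, "1897": 0.15}
true_answer = "1905"  # hypothetical ground truth for this toy example

greedy = max(next_token_probs, key=next_token_probs.get)
print(greedy)                 # "1912", every single run
print(greedy == true_answer)  # False: deterministic, and still wrong
```

Randomness only decides *which* token gets drawn from the distribution; it doesn't put the probability mass on the right answer in the first place.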