this post was submitted on 08 Jun 2025
771 points (95.7% liked)

Technology

71143 readers
2960 users here now

This is a most excellent place for technology news and articles.


Our Rules


  1. Follow the lemmy.world rules.
  2. Only tech related news or articles.
  3. Be excellent to each other!
  4. Mod approved content bots can post up to 10 articles per day.
  5. Threads asking for personal tech support may be deleted.
  6. Politics threads may be removed.
  7. No memes allowed as posts, OK to post as comments.
  8. Only approved bots from the list below, this includes using AI responses and summaries. To ask if your bot can be added please contact a mod.
  9. Check for duplicates before posting, duplicates may be removed
  10. Accounts 7 days and younger will have their posts automatically removed.

Approved Bots


founded 2 years ago
MODERATORS
 

LOOK MAA I AM ON FRONT PAGE

you are viewing a single comment's thread
view the rest of the comments
[–] auraithx@lemmy.dbzer0.com 1 points 7 hours ago* (last edited 7 hours ago) (1 children)

This is an elegant metaphor, but it fails to capture the essential difference between symbolic enumeration and neural computation. Representing an LLM as a decompression function that reconstructs a giant transition table assumes that the model is approximating a complete, enumerable mapping of inputs to outputs. That’s not what is happening. LLMs are not trained to reproduce every possible sequence. They are trained to generalize over an effectively infinite space of token combinations, including many never seen during training.

Your thought experiment—recording the output for every possible input at temperature 0—would indeed give you a deterministic function that could be stored. But this imagined table is not a Markov chain. It is a cached output of a deep contextual function, not a probabilistic state machine. A Markov model, by definition, uses transition probabilities based on fixed state history and lacks internal computation. An LLM generates the distribution through recursive transformation of continuous embeddings with positional and attention-based conditioning. That is not equivalent to symbolically defining state transitions, even if you could record the output for every input.

The analogy to a spigot algorithm for pi misses the point. That algorithm computes digits of a predefined number. An LLM doesn't compute a predetermined output. It computes a probability distribution conditioned on a context it was never explicitly trained on, using representations learned across many dimensions. The model encodes distributed knowledge and compositional patterns. A Markov table does not. Even a giant table with manually filled hypothetical entries lacks the inductive bias, generalization, and emergent capabilities that arise from the structure of a trained network.

Equivalence in output does not imply equivalence in function. Replacing a rich model with an exhaustively recorded output set may yield the same result, but it loses what makes the model powerful: the reasoning behavior from structure, not just output recall. The function is not a shortcut to a table. It is the intelligence.

[–] vrighter@discuss.tchncs.de 1 points 7 hours ago* (last edited 7 hours ago) (1 children)

"lacks internal computation" is not part of the definition of markov chains. Only that the output depends only on the current state (the whole context, not just the last token) and no previous history, just like llms do. They do not consider tokens that slid out of the current context, because they are not part of the state anymore.

And it wouldn't be a cache unless you decide to start invalidating entries, which you could just, not do.. it would be a table with token-alphabet-size^context length size, with each entry being a vector of size token_alphabet_size. Because that would be too big to realistically store, we do not precompute the whole thing, and just approximate what each table entry should be using a neural network.

The pi example was just to show that how you implement a function (any function) does not matter, as long as the inputs and outputs are the same. Or to put it another way if you give me an index, then you wouldn't know whether I got the result by doing some computations or using a precomputed table.

Likewise, if you give me a sequence of tokens and I give you a probability distribution, you can't tell whether I used A NN or just consulted a precomputed table. The point is that given the same input, the table will always give the same result, and crucially, so will an llm. A table is just one type of implementation for an arbitrary function.

There is also no requirement for the state transiiltion function (a table is a special type of function) to be understandable by humans. Just because it's big enough to be beyond human comprehension, doesn't change its nature.

[–] auraithx@lemmy.dbzer0.com 1 points 3 hours ago

You're correct that the formal definition of a Markov process does not exclude internal computation, and that it only requires the next state to depend solely on the current state. But what defines a classical Markov chain in practice is not just the formal dependency structure but how the transition function is structured and used. A traditional Markov chain has a discrete and enumerable state space with explicit, often simple transition probabilities between those states. LLMs do not operate this way.

The claim that an LLM is "just" a large compressed Markov chain assumes that its function is equivalent to a giant mapping of input sequences to output distributions. But this interpretation fails to account for the fundamental difference in how those distributions are generated. An LLM is not indexing a symbolic structure. It is computing results using recursive transformations across learned embeddings, where those embeddings reflect complex relationships between tokens, concepts, and tasks. That is not reducible to discrete symbolic transitions without losing the model’s generalization capabilities. You could record outputs for every sequence, but the moment you present a sequence that wasn't explicitly in that set, the Markov table breaks. The LLM does not.

Yes, you can say a table is just one implementation of a function, and from a purely mathematical perspective, any function can be implemented as a table given enough space. But the LLM’s function is general-purpose. It extrapolates. A precomputed table cannot do this unless those extrapolations are already baked in, in which case you are no longer talking about a classical Markov system. You are describing a model that encodes relationships far beyond discrete transitions.

The pi analogy applies to deterministic functions with fixed outputs, not to learned probabilistic functions that approximate conditional distributions over language. If you give an LLM a new input, it will return a meaningful distribution even if it has never seen anything like it. That behavior depends on internal structure, not retrieval. Just because a function is deterministic at temperature 0 does not mean it is a transition table. The fact that the same input yields the same output is true for any deterministic function. That does not collapse the distinction between generalization and enumeration.

So while yes, you can implement any deterministic function as a lookup table, the nature of LLMs lies in how they model relationships and extrapolate from partial information. That ability is not captured by any classical Markov model, no matter how large.