Temporal coding in neural populations
Single-unit recordings reveal compressed sequential activity and slowly varying temporal context signals that can support a mental timeline.
Although in-context learning has become a powerful mechanism in large language models (LLMs), it remains unclear how these models retrieve and organize information as context length grows. We investigate whether in-context retrieval in modern language models resembles human episodic memory or is instead driven by a narrower mechanism specialized for continuing local sequences.
Using ideas from cognitive science, we analyze retrieval as a function of temporal lag and connect those behavioral signatures to specific components inside the model. The emerging picture is that transformers often show a strong forward-contiguity bias: once a cue is retrieved, the model strongly prefers what came next, rather than reinstating a richer temporal neighborhood around the cue.
In human episodic memory, recalling one item from an experience can reinstate nearby items from the same episode. That kind of retrieval is not just "predict the next token." It is a structured recovery of temporal context, where a cue can reactivate neighbors that were encoded around it.
The schematic below illustrates the target comparison. During encoding, items are experienced in sequence. At recall, a cue can in principle recover a local neighborhood of that sequence rather than only the immediately following item. This distinction gives us a concrete way to compare human memory with model behavior.

We next ask which parts of the model are responsible for this temporal behavior. Scoring individual attention heads shows that most heads have little lag structure, but a small subset in later layers stands out sharply. Those heads exhibit a pronounced asymmetric peak at lag +1, the position immediately after the cue, indicating a mechanism that is especially tuned to move forward from a cue into what came next.
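One way to make this kind of head scoring concrete is sketched below. It is a minimal illustration under stated assumptions, not the procedure used here: the input is assumed to be a single list of distinct tokens presented twice, `attn[q][k]` is assumed to hold one head's attention weight from query position `q` to key position `k`, and the function names `attention_by_lag` and `forward_asymmetry` are hypothetical.

```python
from collections import defaultdict

def attention_by_lag(attn, list_len, max_lag=3):
    """Average attention weight as a function of lag for one head.

    Assumption: the prompt is one list of `list_len` distinct tokens
    presented twice, so the token at query position q (second pass)
    first appeared at position q - list_len. Lag is measured relative
    to that first occurrence (the cue).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for q in range(list_len, 2 * list_len):
        cue = q - list_len
        for lag in range(-max_lag, max_lag + 1):
            k = cue + lag
            if 0 <= k < list_len:  # stay within the first presentation
                totals[lag] += attn[q][k]
                counts[lag] += 1
    return {lag: totals[lag] / counts[lag] for lag in sorted(counts)}

def forward_asymmetry(profile):
    """Attention at lag +1 minus attention at lag -1."""
    return profile.get(1, 0.0) - profile.get(-1, 0.0)

# Synthetic head (illustrative numbers only): most of its weight goes
# to the token that followed the cue, the rest to the cue itself.
n = 5
toy_head = [[0.0] * (2 * n) for _ in range(2 * n)]
for q in range(n, 2 * n):
    toy_head[q][q - n + 1] = 0.9
    toy_head[q][q - n] = 0.1

profile = attention_by_lag(toy_head, n)
print(profile)
print("forward asymmetry:", forward_asymmetry(profile))
```

A head with no lag structure, such as one that attends uniformly over earlier positions, would give an asymmetry near zero under this score, whereas the synthetic head above concentrates its weight at lag +1.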

To test whether those heads are merely correlated with the retrieval pattern or actually responsible for it, we compare conditional response probability across lags under three interventions: no ablation, random-head ablation, and targeted induction-head ablation. In this analysis, a large spike at lag +1 means the model strongly prefers the item that followed the cue during encoding.
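For readers unfamiliar with the measure, the sketch below shows one standard way to compute a lag-based conditional response probability from recall sequences. It assumes each recall sequence is a list of 0-based serial positions in output order; the function name `lag_crp` and the toy sequences are illustrative assumptions, and how model outputs are mapped onto "recalls" is not specified here.

```python
from collections import Counter

def lag_crp(recalls, list_len, max_lag=3):
    """Conditional response probability by lag.

    For every transition, each not-yet-recalled item counts as a
    possible response at its lag from the just-recalled item, and the
    item actually produced counts as an actual response. The CRP at a
    lag is actual / possible.
    """
    actual = Counter()
    possible = Counter()
    for seq in recalls:
        recalled = set()
        for prev, nxt in zip(seq, seq[1:]):
            recalled.add(prev)
            if nxt in recalled:  # ignore repeated recalls
                continue
            for pos in range(list_len):
                if pos not in recalled:
                    possible[pos - prev] += 1
            actual[nxt - prev] += 1
    return {
        lag: actual[lag] / possible[lag]
        for lag in range(-max_lag, max_lag + 1)
        if lag != 0 and possible[lag] > 0
    }

# Toy data: one strictly forward recall and one with a skip and a
# backward step.
crp = lag_crp([[0, 1, 2], [0, 2, 1]], list_len=3)
for lag, prob in sorted(crp.items()):
    print(f"lag {lag:+d}: {prob:.2f}")
```

Normalizing by the number of transitions that were still possible at each lag matters because large lags are available less often than small ones; without it, the curve would be biased toward short lags even for random recall.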
Without ablation, the model shows a pronounced lag +1 peak. Removing random heads weakens the effect only modestly, but ablating induction heads nearly eliminates it. This result indicates that the dominant forward-contiguity bias is carried by a small, specialized subset of heads rather than being spread diffusely across the network.
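The logic of the three-way comparison can be illustrated with a toy sketch. It assumes zero-ablation (dropping a head's contribution entirely) and a size-matched random control drawn from the remaining heads; whether the analysis here used zero-ablation, mean-ablation, or another scheme is not stated. The helper names, the `(layer, head)` keys, and all numbers are hypothetical.

```python
import random

def ablate_heads(head_outputs, ablated):
    """Sum per-head output vectors, skipping the heads in `ablated`.

    Skipping a head is equivalent to zeroing its contribution
    (zero-ablation, assumed here for simplicity).
    """
    dim = len(next(iter(head_outputs.values())))
    total = [0.0] * dim
    for head, vec in head_outputs.items():
        if head in ablated:
            continue
        for i, v in enumerate(vec):
            total[i] += v
    return total

def random_control(all_heads, n, exclude, seed=0):
    """Pick n heads at random from those not in `exclude`."""
    rng = random.Random(seed)
    pool = sorted(h for h in all_heads if h not in exclude)
    return set(rng.sample(pool, n))

# Synthetic per-head contributions: [evidence for the lag +1 item,
# evidence for some other item]. Two heads carry most of the signal.
induction = {(5, 1), (6, 3)}
heads = {(layer, h): [0.1, 0.1] for layer in range(4, 7) for h in range(4)}
for key in induction:
    heads[key] = [2.0, 0.0]

control = random_control(heads, len(induction), exclude=induction)
print("no ablation:      ", ablate_heads(heads, set()))
print("random ablation:  ", ablate_heads(heads, control))
print("targeted ablation:", ablate_heads(heads, induction))
```

Matching the number of randomly ablated heads to the number of targeted heads is what lets the comparison separate "these specific heads matter" from "removing any two heads hurts".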

Human episodic memory is characterized by reinstatement of temporal context, whereas current language models often rely on a more local and asymmetric forward-retrieval mechanism. The ablation result makes that distinction causal rather than descriptive.