We compare how humans and large language models coordinate in groups when they share a goal but cannot communicate directly. We study that problem with Group Binary Search, a common-interest game in which each player proposes a number and the group only sees feedback about how far the sum of their guesses is from a hidden target.
Good performance requires more than understanding the instructions: each player has to infer what the others are doing from aggregate feedback alone, decide how strongly to react, and know when to stop changing a guess so the group can stabilize around a shared solution.
In every round, each agent privately submits a number. The guesses are summed and compared with the hidden target, which the group learns about only through public feedback.
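The round structure described above can be sketched as a small simulation. This is only an illustration under stated assumptions, not the experimental code: the function names `feedback` and `simulate`, the signed-gap form of the feedback, and the update rule (each player reacts with probability `react_prob` by an equal share of the gap) are all hypothetical choices made for the example.

```python
import random

def feedback(guesses, target):
    """Public feedback for one round (assumed form): the signed gap
    between the summed guesses and the hidden target.
    Positive means the group's sum is too high."""
    return sum(guesses) - target

def simulate(initial, target, react_prob=0.5, rounds=20, seed=0):
    """Play up to `rounds` rounds and return the list of guess vectors.

    Hypothetical update rule: each player independently reacts with
    probability `react_prob`, moving against the gap by an equal share
    of its size (at least 1). Players never see each other's guesses.
    """
    rng = random.Random(seed)
    guesses = list(initial)
    history = [list(guesses)]
    for _ in range(rounds):
        gap = feedback(guesses, target)
        if gap == 0:
            break  # the group has hit the target
        step = max(1, abs(gap) // len(guesses))
        direction = -1 if gap > 0 else 1
        guesses = [
            g + direction * step if rng.random() < react_prob else g
            for g in guesses
        ]
        history.append(list(guesses))
    return history

if __name__ == "__main__":
    for round_guesses in simulate([0, 0, 0], target=25, react_prob=0.5):
        print(round_guesses, "sum =", sum(round_guesses))
```

Varying `react_prob` in this toy model gives a feel for the trade-off in the text: if everyone always reacts, the group can overshoot; if no one reacts, the sum never moves.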


The clearest behavioral difference is switching. As human groups approach a solution, fewer participants change their guess from one round to the next. That spontaneous stabilization reduces noise and effectively lowers the number of moving parts the group must coordinate.
LLM groups behave very differently. Across multiple model families, the tendency to switch stays extremely high even near the end of the game. This persistent volatility is one of the main reasons model groups typically underperform humans: they react in the right direction, but they do not reliably settle into the stable roles that make collective convergence possible.
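One simple way to quantify switching is the fraction of players whose guess changes between consecutive rounds. The sketch below assumes a game is recorded as a list of per-round guess lists; the name `switch_rates` and this data layout are illustrative assumptions, not the measure used in the study.

```python
def switch_rates(history):
    """Return, for each pair of consecutive rounds, the fraction of
    players whose guess differs from their previous guess."""
    rates = []
    for prev, curr in zip(history, history[1:]):
        changed = sum(1 for a, b in zip(prev, curr) if a != b)
        rates.append(changed / len(curr))
    return rates

if __name__ == "__main__":
    # Toy three-player game: one player adjusts once, then everyone holds.
    toy_history = [[1, 2, 3], [1, 3, 3], [1, 3, 3]]
    print(switch_rates(toy_history))
```

Under this measure, a group that stabilizes would show rates falling toward zero in later rounds, while persistently volatile groups would keep rates near one.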

These results show that many current LLMs understand the task but still coordinate in a distinctly non-human way, marked by high volatility and a bias toward action. Our ongoing work uses these diagnostics to design training objectives that reward adaptability, stability, and complementary roles in model groups. Our goal is to make LLM group behavior easier to compare with, and eventually align with, human coordination strategies.