Behavioral experiments: Compressed timeline of past and future
Behavioral experiments test whether people search the recent past and the near future along an ordered, compressed timeline.
To better understand perception-to-action learning in mammalian brains, we embed a neural-level cognitive model in an end-to-end differentiable network and train it in virtual environments designed to mimic experimental tasks from neuroscience and cognitive science. The resulting system forms a map-like internal representation in which the ordinal position of a neuron in the chain of sequentially activated neurons conveys information about what it encodes. Mechanistically, this is implemented by recurrent dynamics with a spectrum of decay constants followed by a linear readout, effectively yielding a discrete approximation of the real-domain Laplace transform and its inverse. In this way, time-varying input functions are encoded in the instantaneous activity of sequentially activated units.
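The mechanism described above can be illustrated with a minimal NumPy sketch. It is not the model's actual implementation: the Post approximation of order `k`, the grid of decay constants, and the names `post_inverse` and `tau_star` are assumptions chosen for illustration. A bank of leaky integrators with decay constants `s` holds the Laplace transform of a unit impulse, and a fixed linear readout (repeated numerical derivatives with respect to `s`) approximately inverts it, so that units peak one after another.

```python
import math
import numpy as np

k = 4                                    # order of the Post approximation (assumed)
s = np.geomspace(k / 5.0, k / 0.5, 40)   # spectrum of decay constants, ascending
tau_star = k / s                         # time lag each readout unit is nominally tuned to
dt = 0.01
t = np.arange(1200) * dt                 # 12 s of simulated time

# Leaky integrators dF/dt = -s * F after a unit impulse at t = 0.
F = np.ones_like(s)
F_hist = np.empty((t.size, s.size))
for n in range(t.size):
    F_hist[n] = F
    F = F * np.exp(-s * dt)              # exact decay over one time step

def post_inverse(F_hist, s, k):
    """Linear readout: (-1)^k / k! * s^(k+1) * d^k F / ds^k (numerical derivative)."""
    D = F_hist
    for _ in range(k):
        D = np.gradient(D, s, axis=1)    # derivative along the s axis
    return ((-1) ** k / math.factorial(k)) * s ** (k + 1) * D

f_tilde = post_inverse(F_hist, s, k)     # shape (time, units)
peak_times = t[np.argmax(f_tilde, axis=0)]

# Edge units are distorted by the one-sided differences, so inspect the interior.
interior = slice(2 * k, -2 * k)
print(np.round(peak_times[interior], 2))
print(np.round(tau_star[interior], 2))
```

In this sketch, the interior units reach their peaks in order of `tau_star`, so a unit's position in the sequence indicates how long ago the input occurred. Because the decay constants are geometrically spaced, the peak times are evenly spaced on a logarithmic axis.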
To extend this framework from time to physical and abstract spaces, we allow the network to learn modulatory signals that convert maps of time into maps of task-relevant latent variables. Modulating the rate of decay (the strength of the recurrent connection) by the magnitude of the latent variable's rate of change converts a map of time into a map of that latent variable, so that neurons activate sequentially as a function of the logarithm of the variable's magnitude. Thus, functions of any latent variable can be encoded in instantaneous neural activity, as long as the time derivative of that variable can be learned from the input.
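As a rough illustration of this modulation, the sketch below scales each unit's decay by a signal `alpha` equal to the magnitude of the latent variable's rate of change. The function `run`, the velocity profiles, and the use of `abs` are illustrative assumptions, and `alpha` is supplied by hand here, whereas the network would have to learn it from its input. Under these assumptions, the state depends only on how far the latent variable has moved, not on how long that took.

```python
import numpy as np

s = np.geomspace(0.1, 10.0, 30)          # spectrum of decay constants (assumed values)

def run(velocity, dt):
    """Integrate dF/dt = -alpha(t) * s * F, with alpha = |dx/dt| as the modulatory signal."""
    F = np.ones_like(s)                  # state just after an impulse at x = 0
    for v in velocity:
        alpha = abs(v)                   # hypothetical hand-supplied modulatory signal
        F = F * np.exp(-s * alpha * dt)  # decay is paused whenever x does not change
    return F

dt = 0.01
slow = np.full(400, 0.5)                                      # 4 s at 0.5 units/s
fast_then_still = np.concatenate([np.full(100, 2.0),          # 1 s at 2.0 units/s
                                  np.zeros(300)])             # then 3 s standing still

F_slow = run(slow, dt)
F_fast = run(fast_then_still, dt)

# Both profiles cover 2.0 units, so both end in the same state exp(-s * 2.0).
print(np.allclose(F_slow, F_fast))
```

With `alpha` fixed at 1 the same code reduces to the purely temporal case, which is one way to see that the map of time and the map of the latent variable share a single mechanism.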
To test this, we built a virtual version of the "accumulating towers task," which is used in neuroscience to study evidence accumulation. In this environment, the agent moves through a virtual corridor and sees towers appear intermittently on the left and right walls. At the end of the track it must decide which side contained more towers. The correct latent variable is therefore not present in any single frame; it has to be inferred by integrating information over time.
The visual stream changes continuously as the agent moves, and the network has to discover a task-relevant latent dimension that summarizes the left-minus-right evidence carried by those observations.
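The structure of the task can be made concrete with a toy abstraction that ignores the visual input entirely. The function `simulate_trial`, the tower probabilities, the corridor length, and the tie convention below are all hypothetical choices, not the parameters of the actual environment. The sketch only shows that the decision variable is a running left-minus-right count that no single observation contains.

```python
import numpy as np

def simulate_trial(rng, n_positions=50, p_left=0.15, p_right=0.05):
    """Toy trial: towers appear independently at each corridor position (assumed)."""
    left = rng.random(n_positions) < p_left      # True where a left tower appears
    right = rng.random(n_positions) < p_right    # True where a right tower appears
    # Running evidence: +1 for each left tower, -1 for each right tower.
    evidence = np.cumsum(left.astype(int) - right.astype(int))
    # Correct choice at the end of the track: +1 left, -1 right, 0 for a tie.
    target = int(np.sign(evidence[-1]))
    return left, right, evidence, target

rng = np.random.default_rng(0)
left, right, evidence, target = simulate_trial(rng)
print(evidence)
print(target)
```

In the full task the agent never observes `evidence` directly; it has to recover something equivalent from the pixel stream.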


The network begins with a convolutional encoder that extracts task-relevant structure from the pixel input. Those visual features are then passed to a cognitively inspired recurrent core built from Laplace-domain dynamics with a spectrum of decay constants followed by a linear readout. In effect, the model constructs a map of time and then learns modulatory signals that convert that map of time into a map of the task-relevant latent variable.
In the towers task, the latent variable is the accumulated evidence. Modulating the decay dynamics by the change in evidence yields units that become sequentially active as a function of the magnitude of that evidence. As a result, the hidden state forms an ordered internal axis that can support both content-based and address-based retrieval, much like a compressed mental map.

These structured agents also learn the task at a level competitive with other recurrent architectures that have many trainable parameters.

After training, the hidden units become tuned to different magnitudes of accumulated evidence, and sorting them by their preferred evidence reveals a smooth tiling of the evidence axis. In other words, the agent learns a compressed number line of evidence from raw visual experience.
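The sorting analysis described above can be sketched as follows. The activity matrix here is synthetic (Gaussian tuning curves with randomly placed centers) and stands in for trial-averaged hidden-unit activity binned by accumulated evidence. The variable names and tuning width are assumptions, and no result from the trained agent is reproduced.

```python
import numpy as np

rng = np.random.default_rng(1)
evidence_axis = np.arange(-10, 11)            # evidence bins (left minus right towers)
n_units = 30

# Synthetic stand-in for recorded activity: one Gaussian tuning curve per unit,
# with centers in arbitrary (unsorted) order.
centers = rng.uniform(-10, 10, n_units)
width = 2.0
activity = np.exp(-0.5 * ((evidence_axis[None, :] - centers[:, None]) / width) ** 2)

# Preferred evidence: the bin where each unit is most active.
preferred = evidence_axis[np.argmax(activity, axis=1)]

# Sort units by preferred evidence; plotting sorted_activity as an image
# would show a diagonal band if the units tile the evidence axis.
order = np.argsort(preferred, kind="stable")
sorted_activity = activity[order]
print(preferred[order])
```

Applied to real hidden-unit data, the same three steps (bin by evidence, find each unit's peak, sort) would show whether the units tile the evidence axis.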
