Temporal coding in neural populations
Single-unit recordings reveal compressed sequential activity and slowly varying temporal context signals that can support a mental timeline.
Many sequence models fail on long-range temporal dependencies because they try to preserve the entire past at roughly uniform resolution. The compressed-memory work in DeepSITH, SITHCon, and our later reinforcement-learning extension starts from a different premise: recent history should be represented in fine detail, while distant history should be retained more coarsely on a logarithmic timeline.
This design is directly motivated by the same log-compressed mental maps that appear in cognitive models and neural data. Instead of asking a network to invent an arbitrary recurrent state, we give it a structured representation of recent and distant history in which each unit corresponds to a different temporal scale. That makes it easier to preserve informative events across long delays without wasting capacity on uniform frame-by-frame storage.
The payoff is twofold. First, the representation supports long-range time-series prediction. Second, because logarithmic compression turns multiplicative changes in time into additive shifts, it provides a natural route to time-scale invariance.
DeepSITH replaces the generic recurrent state with a Scale-Invariant Temporal History (SITH) layer that explicitly encodes what happened when across many time scales. At each step, the model keeps a fixed-size, logarithmically compressed summary of the past: nearby events are represented precisely, while distant events are stored more coarsely.
Stacking these memory layers with learned readouts lets the network build increasingly abstract temporal features while still retaining access to the underlying temporal structure. In the NeurIPS DeepSITH paper, this architecture outperformed strong recurrent baselines such as LSTMs and GRUs on tasks that required learning long-range temporal dependencies.

The key representation is shown below for a single input signal. Instead of storing the past at evenly spaced delays, the memory samples history with receptive fields that become broader farther back in time. Recent events are encoded by many narrow filters, whereas distant events are summarized by fewer, wider filters.
This is the central compression trick. A fixed number of units can cover a very long temporal range because the representation is uniform in relative resolution rather than uniform in clock time. That is exactly the kind of tradeoff you would want if nearby events need precise timing while distant events are useful mainly at a coarser scale.
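As a rough illustration of that tradeoff, the sketch below builds a small bank of receptive fields whose centers are geometrically spaced and whose widths grow in proportion to their centers. It is not the published SITH implementation: the Gaussian-in-log-time filter shape, the function names, and the parameters (tau_min, tau_max, rel_width) are all assumptions chosen for clarity.

```python
import numpy as np

def log_spaced_centers(tau_min, tau_max, n_units):
    """Hypothetical helper: geometrically spaced filter centers (in time steps)."""
    return np.geomspace(tau_min, tau_max, n_units)

def log_compressed_memory(signal, centers, rel_width=0.2):
    """Summarize the past of `signal` (most recent sample last) with
    receptive fields that are Gaussian on a log-time axis.

    Assumption: this filter shape is an illustrative stand-in, not the
    filter used in the DeepSITH paper.
    """
    lags = np.arange(1, len(signal) + 1)   # lag 1 is the most recent sample
    past = signal[::-1]                    # past[i] is the value at lag i + 1
    # Distance between each lag and each center, measured in log time.
    z = (np.log(lags)[None, :] - np.log(centers)[:, None]) / rel_width
    weights = np.exp(-0.5 * z ** 2)
    weights /= weights.sum(axis=1, keepdims=True)  # each unit averages its window
    return weights @ past

# 16 units cover lags from 1 to 1000 steps.
centers = log_spaced_centers(1.0, 1000.0, 16)

# A single event that happened 100 steps ago.
signal = np.zeros(1000)
signal[-100] = 1.0
memory = log_compressed_memory(signal, centers)

print("ratio between neighboring centers:", centers[1] / centers[0])
print("most active unit:", int(np.argmax(memory)), "center:", centers[np.argmax(memory)])
```

Because neighboring centers differ by a constant ratio rather than a constant offset, sixteen units span three orders of magnitude in lag, and the event is localized by the unit whose center lies closest to its lag.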

The next step was SITHCon, a convolutional model built on top of the same compressed memory. The crucial mathematical idea is simple: if a temporal pattern is stretched or compressed by a factor a, then logarithmic time turns that rescaling into a translation.
Key identity: log(ax) = log(a) + log(x)
If x denotes elapsed time and a changes the speed of the pattern, then multiplying time by a becomes an additive shift on the log-time axis. That means the same temporal structure appears in a similar form after rescaling, just shifted in the compressed coordinate.
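The identity can be checked numerically. In the minimal sketch below, a pattern is sampled at geometrically spaced time points; speeding the pattern up by a factor equal to a whole number of grid steps produces the same samples, shifted by that many positions. The test pattern, the grid spacing, and the variable names are illustrative assumptions, not taken from the papers.

```python
import numpy as np

# Log-spaced sample points: taus[k] = ratio ** k.
ratio = 2.0 ** 0.25
taus = ratio ** np.arange(64)

def pattern(t):
    """Arbitrary test pattern in clock time (an assumption for illustration)."""
    return t * np.exp(-t / 5.0)

shift = 8
a = ratio ** shift            # speed-up factor; equals 4 for this grid

slow = pattern(taus)          # f(t) sampled on the log-spaced grid
fast = pattern(a * taus)      # f(a t): the same pattern, a times faster

# Since a * taus[k] == taus[k + shift], rescaling time is a pure index shift.
assert np.allclose(fast[:-shift], slow[shift:])
print("rescaling by", a, "shifts the samples by", shift, "positions")
```

On an evenly spaced clock-time grid the same rescaling would stretch or squeeze the samples instead of translating them, which is why the log-spaced grid is the useful coordinate here.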
This is why the final figure matters. Two signals that evolve at different speeds can look very different in raw clock time, but they become much easier to align once they are represented on the compressed temporal axis. In the ICML SITHCon paper, convolution and pooling applied to that representation yielded robust generalization across temporal rescalings that standard temporal convolutional networks did not achieve.
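The reason convolution and pooling help can be sketched in a few lines. Convolution along the log-time axis is shift-equivariant, and a global max over that axis discards the shift, so two inputs that differ only by a temporal rescaling yield the same pooled feature. The example below is a toy under stated assumptions, not the SITHCon architecture: the bump-shaped inputs and the fixed kernel (standing in for a learned filter) are hypothetical.

```python
import numpy as np

n_units = 64
idx = np.arange(n_units)      # positions along the log-compressed time axis

def bump(center, width=3.0):
    """Hypothetical memory activation: a smooth bump centered at `center`."""
    return np.exp(-0.5 * ((idx - center) / width) ** 2)

# The same temporal pattern at two speeds appears as the same bump at two
# positions on the log-time axis (a shift of 8 units in this toy setup).
slow = bump(40.0)
fast = bump(32.0)

# Fixed weights standing in for a learned convolutional filter (assumption).
kernel = np.array([-1.0, 0.0, 2.0, 0.0, -1.0])

def conv_then_pool(memory):
    """Convolve along log time, then take a global max over that axis."""
    return np.convolve(memory, kernel, mode="same").max()

feature_slow = conv_then_pool(slow)
feature_fast = conv_then_pool(fast)

assert np.isclose(feature_slow, feature_fast)
print("pooled features:", feature_slow, feature_fast)
```

The invariance holds only while the shifted pattern stays inside the range covered by the memory; a rescaling large enough to push it past either edge of the axis would change the pooled value.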
The same principle carried into the AAAI work by Md Kabir and colleagues, where compressed memory supported reinforcement-learning agents trained across a range of temporal scales without needing a separate memory tuned for each one. In other words, the representation is not just compact; it makes downstream learning more stable when the world speeds up or slows down.

Taken together, these papers show how a brain-inspired compressed memory can serve as a practical engineering primitive. In DeepSITH it improves long-range prediction, in SITHCon it turns rescaling into a tractable invariance problem, and in the AAAI reinforcement-learning work it enables agents to operate across multiple temporal scales using the same underlying memory framework.