Learning from infant perspective: slow changes for development of a visual system in humans and machines
Head-mounted infant videos reveal a natural slow-to-fast developmental curriculum, and training in that order improves later visual recognition.
In collaboration with Dr. Linda Smith, we investigate object learning in children and compare it with object learning in machines. In each trial of the behavioral task, the child hears the name of an object category and must select the matching item from a small array of alternatives. The task covers eight familiar object categories, each instantiated by multiple examples, and evaluates recognition in five conditions: realistic images are unaltered natural photographs; silhouettes preserve the coarse outline while removing texture and internal detail; geons probe recognition from simple volumetric parts; blurred images retain only coarse low-frequency structure; and feature patches isolate sparse local information. The response is simple enough for toddlers, yet the task still yields a clean measure of category recognition under controlled visual transformations.
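As a rough illustration of how responses in this kind of forced-choice task could be scored, the sketch below tallies proportion correct per condition and reports the chance level implied by the array size. The trial record layout, the function name `accuracy_by_condition`, the array size of 4, and the category names are all hypothetical placeholders chosen for the example; they are not the study's actual data format, design parameters, or results.

```python
from collections import defaultdict


def accuracy_by_condition(trials):
    """Return {condition: (proportion_correct, chance_level)}.

    Each trial is a hypothetical tuple:
        (condition, target_category, chosen_category, n_alternatives)
    Chance is 1 / n_alternatives, assuming the array size is fixed
    within a condition (an assumption made for this sketch).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    chance = {}
    for condition, target, chosen, n_alternatives in trials:
        total[condition] += 1
        correct[condition] += int(chosen == target)
        chance[condition] = 1.0 / n_alternatives
    return {c: (correct[c] / total[c], chance[c]) for c in total}


# Toy trials for illustration only; not real behavioral data.
toy_trials = [
    ("realistic", "cup", "cup", 4),
    ("realistic", "dog", "dog", 4),
    ("silhouette", "cup", "cup", 4),
    ("silhouette", "dog", "cat", 4),
    ("blurred", "cup", "shoe", 4),
]

scores = accuracy_by_condition(toy_trials)
for condition, (accuracy, chance_level) in scores.items():
    print(f"{condition}: accuracy={accuracy:.2f}, chance={chance_level:.2f}")
```

Reporting chance alongside accuracy makes it easier to see whether a condition is merely harder or whether recognition has collapsed to guessing.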

We found that children identified objects from silhouettes about as well as from realistic images, and better than from blurred images, geons, or feature patches. This suggests that global shape strongly supports recognition, but performance drops when the image is degraded by blur, reduced to abstract volumetric parts, or limited to very sparse local evidence.
Using the children's data, we built a developmentally motivated benchmark to evaluate computational models, creating a direct comparison between early human object recognition and machine vision. Because all five conditions are built from the same underlying categories, the benchmark can ask which visual cues are sufficient for recognition and which transformations expose limits in either children or models. This creates a more developmentally meaningful test of generalization than conventional top-1 classification benchmarks.
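One way a model could be put through the same forced-choice format is sketched below: the model assigns a score to each alternative in the array for the named category and "selects" the highest-scoring one. This is a minimal sketch, assuming a model that embeds category names and images in a shared vector space and using cosine similarity as the score; the benchmark's actual model readout is not described here, and `cosine` and `choose_alternative` are hypothetical helper names.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def choose_alternative(label_embedding, image_embeddings):
    """Return the index of the alternative most similar to the named category.

    Assumption: the model provides one embedding for the category name and
    one per image in the array, all in the same space.
    """
    similarities = [cosine(label_embedding, e) for e in image_embeddings]
    return max(range(len(similarities)), key=similarities.__getitem__)


# Toy 2-D embeddings for illustration only.
label = [1.0, 0.0]
array = [[0.0, 1.0], [1.0, 0.1], [-1.0, 0.0]]
choice = choose_alternative(label, array)
print(choice)
```

Because the model's choice has the same form as a child's response (an index into the array), both can be scored with the same per-condition accuracy measure.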
We used the benchmark to evaluate models trained on different datasets and at different scales. Larger models generally perform better, but the figure below also shows that performance depends strongly on the transformation.

