Object segmentation from common fate: Motion energy processing enables human-like zero-shot generalization to random dot stimuli

Abstract

Motion is a crucial cue for human object perception. Previous studies have demonstrated that humans can recognize objects even in synthetic random dot stimuli where only motion, not appearance, is informative. Remarkably, this works without prior exposure to these types of stimuli, indicating a purely zero-shot capability akin to Wertheimer’s “common fate” Gestalt principle, proposed a hundred years ago as a “law of organization in perceptual forms.” Here, we evaluate computer vision approaches and a neuroscience-inspired motion energy model for zero-shot figure-ground segmentation of random dot stimuli. Examining a broad range of models, we find that state-of-the-art optical flow models struggle to estimate motion patterns in random dot videos, resulting in poor figure-ground segmentation performance. Conversely, the neuroscience-inspired model significantly outperforms all optical flow models in this context. For a direct comparison, we conduct a psychophysical study using a shape identification task as a proxy to measure human segmentation performance. All state-of-the-art optical flow models fall short of human performance, but only the motion energy model matches human capability. Specifically, we use the extensively validated motion energy model proposed by Simoncelli and Heeger in 1998, which has been fitted to a broad range of neural recordings in cortex area MT. This neuroscience-inspired model successfully addresses the lack of human-like zero-shot generalization to random dot stimuli in current computer vision models, and thus establishes a compelling link between the Gestalt psychology of human object perception and cortical motion processing in the brain. Code, models, and datasets will be published.
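To make the stimulus class concrete, the following NumPy sketch generates a random-dot video in which a square figure is defined solely by coherent motion (“common fate”): figure and background dots are statistically identical in every single frame, so only the motion across frames reveals the shape. This is an illustrative sketch, not the paper’s stimulus code; the function and parameter names are our own.

```python
import numpy as np

def random_dot_video(num_frames=8, size=64, dot_prob=0.1, shift=2, seed=0):
    """Generate a random-dot video where a square figure is visible
    only through coherent motion, never through appearance.

    Illustrative sketch; names and defaults are assumptions, not the
    paper's actual stimulus generator.
    """
    rng = np.random.default_rng(seed)
    # Figure and background dot fields have identical statistics,
    # so a single frame carries no shape information.
    bg = (rng.random((size, size)) < dot_prob).astype(np.float32)
    fig = (rng.random((size, size)) < dot_prob).astype(np.float32)
    mask = np.zeros((size, size), dtype=bool)
    mask[16:48, 16:48] = True  # square figure region

    frames = []
    for t in range(num_frames):
        # Figure dots translate rightward over time; background is static.
        shifted_fig = np.roll(fig, shift * t, axis=1)
        frames.append(np.where(mask, shifted_fig, bg))
    return np.stack(frames)  # shape: (num_frames, size, size)
```

Because appearance is uninformative here, any model that segments the square must exploit the motion signal, which is exactly what makes these stimuli a zero-shot test for optical flow and motion energy models.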

Publication
NeurIPS 2024
Matthias Tangemann
PhD candidate
Matthias Kümmerer
Postdoc

I’m interested in understanding how we use eye movements to gather information about our environment. This includes building saliency models and models of eye movement prediction, such as my line of DeepGaze models. I also work on questions of model evaluation and benchmarking, and I’m the main organizer of the MIT/Tuebingen Saliency Benchmark.

Matthias Bethge
Professor for Computational Neuroscience and Machine Learning & Director of the Tübingen AI Center

Matthias Bethge is Professor for Computational Neuroscience and Machine Learning at the University of Tübingen and director of the Tübingen AI Center, a joint center of the University of Tübingen and the MPI for Intelligent Systems that is part of the German AI strategy.