The human visual system is adept at nearly instantaneously interpreting a rich 3D environment filled with varied surfaces and textures. This feat of perceptual inference is astonishing in light of the massive amount of information entering the eye at any one instant and its inherent ambiguity. We study the human visual system's solution to this problem via various psychophysical experiments incorporating computational techniques. One goal is to develop a mathematical description of the human visual system's sensitivity to statistical structure in natural images. To do so, we test human sensitivity to statistical assumptions about natural image regularities. Complementary to our quantitative model comparisons, these experiments test various models' efficacy in capturing perceptually relevant information.
Figure: Stimuli and results.
Left: Example stimulus showing a texture generated from natural images. Middle: Example stimulus generated with the ICA model. Right: Psychometric functions for the different models as a function of patch size. The closer observers are to chance level, the less distinguishable the image model is from the true statistics of natural images.
The results of an initial study (Gerhard & Bethge, 2011) reveal that human observers are highly sensitive to the statistical regularities of natural images, even when very little information is provided. In a two-alternative forced-choice experiment, we presented observers with samples drawn from two sources: (1) natural photographs and (2) a statistical model. Figure (A) shows two example textures made by tiling the respective samples. The task was to identify the texture whose samples originated from natural images.
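The tiling step described above can be sketched as follows. This is a minimal illustration, not the stimulus code used in the study: the function names `sample_patches` and `tile_patches`, the grid size, and the random array standing in for a photograph are all assumptions made for the example.

```python
import numpy as np

def sample_patches(image, patch_size, n_patches, rng):
    """Draw n_patches random square patches from a 2D grayscale image."""
    h, w = image.shape
    rows = rng.integers(0, h - patch_size + 1, size=n_patches)
    cols = rng.integers(0, w - patch_size + 1, size=n_patches)
    return np.stack([image[r:r + patch_size, c:c + patch_size]
                     for r, c in zip(rows, cols)])

def tile_patches(patches, grid_rows, grid_cols):
    """Arrange patches of shape (n, p, p) into one texture of shape
    (grid_rows * p, grid_cols * p), filling the grid row by row."""
    n, p, _ = patches.shape
    if n != grid_rows * grid_cols:
        raise ValueError("number of patches must equal grid_rows * grid_cols")
    grid = patches.reshape(grid_rows, grid_cols, p, p)
    # Reorder to (grid row, pixel row, grid col, pixel col) before flattening.
    return grid.transpose(0, 2, 1, 3).reshape(grid_rows * p, grid_cols * p)

rng = np.random.default_rng(0)
image = rng.random((64, 64))  # hypothetical stand-in for a natural photograph
patches = sample_patches(image, patch_size=4, n_patches=36, rng=rng)
texture = tile_patches(patches, grid_rows=6, grid_cols=6)
```

In a trial, one such texture would be built from natural-image patches and the other from model samples of the same patch size.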

We tested seven models. Five were natural image models: a random filter model capturing only second-order pixel correlations (RND), the independent component analysis model (ICA), a spherically symmetric model (L2S), the Lp-spherical model (LPS), and the mixture of elliptically contoured distributions (MEC). We also tested sensitivity to independent phase scrambling in the Fourier basis (IPS) and to global phase scrambling (GPS), which preserves correlations among phases and among amplitudes but destroys dependencies between phases and amplitudes.
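For readers unfamiliar with phase scrambling, the sketch below shows one common way to randomize Fourier phases while keeping the amplitude spectrum of a patch. Whether it matches the exact IPS procedure of the study is an assumption; the function name `phase_scramble` and the random input patch are hypothetical.

```python
import numpy as np

def phase_scramble(patch, rng):
    """Return a real patch with the same Fourier amplitudes as `patch`
    but with randomized phases."""
    spectrum = np.fft.fft2(patch)
    amplitude = np.abs(spectrum)
    # Phases of the spectrum of real white noise are antisymmetric,
    # so the scrambled spectrum stays Hermitian and its inverse is real.
    noise = rng.standard_normal(patch.shape)
    phase = np.angle(np.fft.fft2(noise))
    # Keep the original DC phase so the mean intensity is unchanged.
    phase[0, 0] = np.angle(spectrum[0, 0])
    scrambled = np.fft.ifft2(amplitude * np.exp(1j * phase))
    return scrambled.real

rng = np.random.default_rng(1)
patch = rng.random((8, 8))  # hypothetical stand-in for a natural image patch
scrambled = phase_scramble(patch, rng)
```

Because the amplitude spectrum is untouched, the scrambled patch keeps the second-order (correlation) structure of the original while losing the structure carried by the phases.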

Figure (B) shows human discrimination performance as a function of sample size in pixels. Low values indicate better model performance. Observers were above chance in all cases except at patch size 3x3 for MEC. The relative ordering of the models parallels the models' average log-loss ordering, suggesting that the human visual system may have near-perfect knowledge of natural image statistical regularity and that log-loss is a perceptually relevant measure for model comparison. Furthermore, observers' above-chance performance indicates that, even for very small image patches, current models lag behind the human visual system in understanding natural scenes. As part of this work, we also ask which features human observers use to judge whether an image is natural, which could lead to improvements in current statistical models of natural images.
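Whether performance in a two-alternative task is "above chance" can be checked with an exact one-sided binomial test. The sketch below is a generic illustration, not the analysis used in the study; the function name `p_above_chance` and the trial counts are hypothetical.

```python
from math import comb

def p_above_chance(n_correct, n_trials, chance=0.5):
    """One-sided exact binomial p-value: the probability of at least
    n_correct correct responses if the observer were guessing."""
    return sum(
        comb(n_trials, k) * chance**k * (1 - chance) ** (n_trials - k)
        for k in range(n_correct, n_trials + 1)
    )

# Hypothetical counts: 60 and 55 correct responses out of 100 trials.
p_60 = p_above_chance(60, 100)
p_55 = p_above_chance(55, 100)
```

A small p-value means the observed proportion correct is unlikely under guessing, that is, the model samples were distinguishable from natural image samples.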

Selected References

M. Kümmerer, T. Wallis, and M. Bethge
How close are we to understanding image-based saliency?
arXiv, 2014

H. E. Gerhard, F. A. Wichmann, and M. Bethge
How Sensitive Is the Human Visual System to the Local Statistics of Natural Images?
PLoS Computational Biology, 9(1), 2013
#natural image statistics, #psychophysics

H. E. Gerhard and M. Bethge
Towards rigorous study of artistic style: a new psychophysical paradigm
Art and Perception, 2, 23-44, 2014
#psychophysics, #texture discrimination, #stylometry