Predicting Fixations From Deep and Low-Level Features

Matthias Kümmerer, Thomas SA Wallis, Leon A Gatys, Matthias Bethge

January 2017

Abstract

Learning what properties of an image are associated with human gaze placement is important both for understanding how biological systems explore the environment and for computer vision applications. Recent advances in deep learning enable us, for the first time, to explain a significant portion of the information expressed in the spatial structure of fixations. Our saliency model DeepGaze II uses the VGG network (trained for object recognition in the ImageNet challenge) to convert an image into a high-dimensional feature space, which is then read out by a second, very simple network to yield a density prediction. DeepGaze II is currently the best-performing model for predicting fixations during free viewing of still images (MIT Saliency Benchmark, AUC and sAUC). By retraining on other datasets, we can explore how the features driving fixations change across tasks or over presentation time. Additionally, the modular architecture of DeepGaze II allows us to quantify how predictive certain features are for fixations. We demonstrate this by replacing the VGG network with very simple isotropic mean-luminance-contrast features; the resulting network outperforms all saliency models that predate the use of pretrained deep networks (including models with high-level features such as Judd or eDN). Using DeepGaze and the Mean-Luminance-Contrast model (MLC), we can separate how much low-level and high-level features contribute to fixation selection in different situations.
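The modular idea described above can be illustrated with a minimal pure-Python sketch, assuming a per-pixel readout (equivalent to stacked 1x1 convolutions) followed by a softmax over all pixels so that the output is a fixation density. The toy feature extractor (mean luminance and RMS contrast in circular windows at two radii) only stands in for the MLC features, and the optional log center bias is an assumption. The function names (`local_mean_contrast`, `readout`, `to_density`), layer sizes, and window shapes are hypothetical; this is not the published DeepGaze II or MLC implementation.

```python
import math
import random

def local_mean_contrast(image, radius):
    """Hypothetical low-level features: mean luminance and RMS contrast
    in a circular (isotropic) window of the given radius around each pixel."""
    h, w = len(image), len(image[0])
    feats = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [image[yy][xx]
                    for yy in range(max(0, y - radius), min(h, y + radius + 1))
                    for xx in range(max(0, x - radius), min(w, x + radius + 1))
                    if (yy - y) ** 2 + (xx - x) ** 2 <= radius ** 2]
            mean = sum(vals) / len(vals)
            contrast = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
            row.append([mean, contrast])
        feats.append(row)
    return feats

def readout(features, w1, b1, w2, b2):
    """Per-pixel two-layer network (ReLU hidden layer), i.e. 1x1 convolutions.
    It only sees each pixel's feature vector, so any feature extractor fits."""
    scores = []
    for row in features:
        out_row = []
        for f in row:
            hidden = [max(0.0, sum(wi * fi for wi, fi in zip(w_row, f)) + b)
                      for w_row, b in zip(w1, b1)]
            out_row.append(sum(a * hv for a, hv in zip(w2, hidden)) + b2)
        scores.append(out_row)
    return scores

def to_density(scores, log_center_bias=None):
    """Softmax over all pixels: the prediction becomes a probability
    distribution over fixation locations (optional log center bias assumed)."""
    h, w = len(scores), len(scores[0])
    logits = [[scores[y][x] + (log_center_bias[y][x] if log_center_bias else 0.0)
               for x in range(w)] for y in range(h)]
    m = max(max(row) for row in logits)  # subtract max for numerical stability
    exps = [[math.exp(v - m) for v in row] for row in logits]
    z = sum(sum(row) for row in exps)
    return [[v / z for v in row] for row in exps]

# Usage on a random 6x8 "image" with untrained random weights (illustration only).
random.seed(0)
image = [[random.random() for _ in range(8)] for _ in range(6)]
f1 = local_mean_contrast(image, 1)
f2 = local_mean_contrast(image, 2)
# Concatenate the two scales: 4 feature channels per pixel.
features = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(f1, f2)]
w1 = [[random.gauss(0, 1) for _ in range(4)] for _ in range(3)]
b1 = [0.0] * 3
w2 = [random.gauss(0, 1) for _ in range(3)]
density = to_density(readout(features, w1, b1, w2, 0.0))
```

Because `readout` sees only a per-pixel feature vector, replacing the toy features with deep-network activations changes only the input dimension of `w1`. That separation between feature extractor and readout is what makes it possible to compare how predictive low-level and high-level features are with the same downstream machinery.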

Matthias Kümmerer

Postdoc

I’m interested in understanding how we use eye movements to gather information about our environment. This includes building saliency models and models of eye movement prediction, such as my line of DeepGaze models. I also work on how to evaluate model quality and on benchmarking, and I’m the main organizer of the MIT/Tuebingen Saliency Benchmark.

Matthias Bethge

Professor for Computational Neuroscience and Machine Learning & Director of the Tübingen AI Center