Meaning maps and saliency models based on deep convolutional neural networks are insensitive to image meaning when predicting human fixations

Marek A Pedziwiatr, Matthias Kümmerer, Thomas SA Wallis, Matthias Bethge, Christoph Teufel

January 2021

Abstract

Eye movements are vital for human vision, and it is therefore important to understand how observers decide where to look. Meaning maps (MMs), a technique to capture the distribution of semantic information across an image, have recently been proposed to support the hypothesis that meaning rather than image features guides human gaze. MMs have the potential to be an important tool far beyond eye-movement research. Here, we examine central assumptions underlying MMs. First, we compare the performance of MMs in predicting fixations with that of saliency models, showing that DeepGaze II – a deep neural network trained to predict fixations based on high-level features rather than meaning – outperforms MMs. Second, we show that whereas human observers respond to changes in meaning induced by manipulating object-context relationships, MMs and DeepGaze II do not. Together, these findings challenge …

Matthias Kümmerer

Postdoc

I’m interested in understanding how we use eye movements to gather information about our environment. This includes building saliency models and models of eye movement prediction, such as my line of DeepGaze models. I also work on the question of how to evaluate model quality and on benchmarking, and I’m the main organizer of the MIT/Tuebingen Saliency Benchmark.

Matthias Bethge

Professor for Computational Neuroscience and Machine Learning & Director of the Tübingen AI Center