Adversarial Examples

We aim to explore and narrow the gap between human and machine perception. To this end, we focus on adversarial examples: minimal, almost imperceptible image perturbations that completely derail artificial neural networks. We approach the problem from two angles. First, we try to quantify the robustness of neural network models unambiguously, which has been notoriously difficult in the past. Second, we try to design robust architectures, which involves probabilistic generative models and Bayesian inference.

Decision-based Attacks

Until recently, it was unclear how much risk adversarial perturbations pose to the safety of real-world machine learning applications, because most methods used to generate such perturbations rely either on detailed model information (gradient-based attacks) or on confidence scores such as class probabilities (score-based attacks), neither of which is available in most real-world scenarios. We emphasise the importance of attacks that rely solely on the final model decision and reliably find adversarial perturbations.
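To make the idea of a decision-based attack concrete, the following is a minimal sketch, not the method from the publications listed on this page. It assumes a hypothetical two-dimensional toy model (`toy_decide`) that returns only a label, and the function names, step sizes, and acceptance rule are illustrative assumptions. The search starts from a point that is already misclassified and performs a random walk that moves closer to the original input while querying nothing but the final decision.

```python
import math
import random


def l2(a, b):
    """Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def toy_decide(x):
    """Hypothetical black-box model: returns only the final label (0 or 1)."""
    return 1 if x[0] + x[1] > 1.0 else 0


def decision_based_attack(decide, original, start,
                          steps=2000, noise=0.1, shrink=0.05, seed=0):
    """Random-walk search that uses only decide(x) -> label.

    `start` must already be misclassified. Each step proposes a noisy
    point pulled slightly toward `original` and keeps it only if it is
    still misclassified and closer to `original` than the current point.
    """
    rng = random.Random(seed)
    true_label = decide(original)
    if decide(start) == true_label:
        raise ValueError("start must already be misclassified")
    best = list(start)
    for _ in range(steps):
        dist = l2(best, original)
        # Random proposal, scaled to the current distance (assumed heuristic).
        candidate = [b + noise * dist * rng.gauss(0.0, 1.0) for b in best]
        # Contract the proposal toward the original input.
        candidate = [c + shrink * (o - c) for c, o in zip(candidate, original)]
        # Accept only on the basis of the model's final decision.
        if decide(candidate) != true_label and l2(candidate, original) < dist:
            best = candidate
    return best
```

A call such as `decision_based_attack(toy_decide, [0.0, 0.0], [3.0, 3.0])` returns a point that the toy model still labels differently from the original but that lies closer to it than the starting point. No gradients or class probabilities are needed, which is what makes this kind of attack applicable to models that expose only their decisions.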

Defenses using generative modelling

Despite much effort, deep neural networks remain highly susceptible to tiny input perturbations. Even for MNIST, one of the most common toy datasets in computer vision, no neural network model exists for which adversarial perturbations are large and make semantic sense to humans. We aim to design a robust network architecture by fundamentally adapting the information processing in neural networks, using feedback connections to perform analysis by synthesis (ABS).
Figure: Adversarial examples for our ABS models are perceptually meaningful. For each sample (randomly chosen from each class) we show the minimally perturbed L2 adversarial found by any attack. Our ABS models have clearly visible and often semantically meaningful adversarials.
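The analysis-by-synthesis idea can be illustrated with a small sketch. This is a toy under stated assumptions, not the ABS architecture itself: each class gets a hypothetical linear "generator" (a prototype plus one latent direction), inference is a grid search over the latent value rather than a learned or gradient-based procedure, and the class whose generator reconstructs the input best is chosen. All names (`GENERATORS`, `GRID`, `classify_by_synthesis`) are illustrative.

```python
def decode(prototype, direction, z):
    """Toy generator: maps a scalar latent z to a point in input space."""
    return [p + z * d for p, d in zip(prototype, direction)]


def sq_error(a, b):
    """Squared reconstruction error between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_fit(x, prototype, direction, grid):
    """Smallest reconstruction error over all latent values in the grid."""
    return min(sq_error(x, decode(prototype, direction, z)) for z in grid)


def classify_by_synthesis(x, generators, grid):
    """Pick the class whose generator explains the input best."""
    errors = {label: best_fit(x, proto, direc, grid)
              for label, (proto, direc) in generators.items()}
    label = min(errors, key=errors.get)
    return label, errors


# Hypothetical class-conditional generators: (prototype, latent direction).
GENERATORS = {
    "a": ([0.0, 0.0], [1.0, 0.0]),  # class "a": points along the line y = 0
    "b": ([0.0, 2.0], [1.0, 0.0]),  # class "b": points along the line y = 2
}
# Latent values searched during inference (assumed range and resolution).
GRID = [i / 10 for i in range(-30, 31)]
```

With these toy generators, `classify_by_synthesis([1.0, 0.3], GENERATORS, GRID)` selects class "a", because the input lies much closer to what the "a" generator can produce. Changing the decision requires moving the input toward something the other class's generator can actually synthesise, which gives an intuition for why adversarial perturbations for such models tend to be large and semantically meaningful.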

Key Publications

L. Schott, J. Rauber, W. Brendel, and M. Bethge
Towards the first adversarially robust neural network model on MNIST
International Conference on Learning Representations (ICLR), 2019

W. Brendel, J. Rauber, and M. Bethge
Decision-Based Adversarial Attacks: Reliable Attacks Against Black-Box Machine Learning Models
International Conference on Learning Representations (ICLR), 2018

J. Rauber, W. Brendel, and M. Bethge
Foolbox: A Python toolbox to benchmark the robustness of machine learning models
Reliable Machine Learning in the Wild Workshop, 34th International Conference on Machine Learning, 2017
