Natural image generation is currently one of the most actively explored fields in Deep Learning. A surprising recent result has been that feature representations from networks trained on a purely discriminative task can be used for state-of-the-art image synthesis (Gatys et al., 2015). However, it is still unclear which aspects of the pre-trained network are critical for high generative performance. Candidates include the architecture of the convolutional neural network (CNN), such as the number of layers, the specific pooling techniques, and the relationship between filter complexity and filter scale (larger filters are more non-linear); the training task and the network's performance on that task; and the data it was trained on.