van Hateren's Natural Image Dataset

This dataset contains approximately 4000 monochrome, calibrated images, photographed by Hans van Hateren. It can be used freely for scientific, non-commercial uses. If you publish work based on these data, please cite the article where this collection of images and its calibration are first described:
  van Hateren, J. H. and van der Schaaf, A. (March 1998). Independent Component Filters of Natural Images Compared with Simple Cells in Primary Visual Cortex. Proceedings: Biological Sciences, 265(1394), 359-366.


These images were obtained with a Kodak DCS420 digital camera. For details and a description of the calibration, see the Methods section of the article cited above.

The format of the files is as follows: 1536 horizontal by 1024 vertical pixels, stored row by row, starting with the horizontal row at the upper left-hand corner of the image. Each pixel is a 2-byte unsigned integer in big-endian byte order. The files do not contain a header, but start immediately with the data in the above format. The pixels are linear in intensity (strictly so for the *.iml image set, approximately so for the *.imc image set; see the section Differences between *.imc and *.iml below).

The intensity scaling is determined by the settings of the camera for each picture. As these were recorded in the original image files, I compiled a list (see the file camera settings below) which gives these settings: column 1 = image number; 2 = ISO setting (i.e., electronic equivalent); 3 = aperture; 4 = reciprocal shutter time (1/s); 5 = factor for converting pixel values to luminance (cd/m²; luminance = factor * pixel value). I determined the conversion factor from the ISO setting, aperture, and shutter time, and from a calibration picture of an object (grey overcast sky) for which I simultaneously measured the luminance with a Minolta luminance meter. This calibration should be considered only an approximation, as the spectral sensitivity of the camera is not identical to the human photopic sensitivity curve. The order of magnitude should be correct, though, so the calibration should be useful for statistical purposes.

Note that although the pixels in the files are 16 bits deep, the camera digitized the original images at 12 bits (and stored them as 8 bits after a nonlinear transformation). Effectively, the bit depth will be close to 12 bits, but see the above article for a further discussion of this issue and the calibration. The angular resolution of the pictures is approximately 1 minute of arc per pixel (for the 1536x1024 pixels).
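The use of the conversion factor (column 5 of the camera settings list) can be sketched in Python as follows. This is only a minimal sketch under stated assumptions: the settings list is assumed to be plain text with whitespace-separated columns in the order given above, the function names parse_settings and to_luminance are hypothetical, and the sample line in the usage note contains made-up values, not real camera settings.

```python
import numpy as np

def parse_settings(text):
    """Parse the camera settings list into a dict keyed by image number.

    Assumption: whitespace-separated columns in the documented order:
    image number, ISO, aperture, reciprocal shutter time (1/s), factor.
    """
    settings = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or incomplete lines
        try:
            number = int(fields[0])
            iso, aperture, inv_shutter, factor = (float(x) for x in fields[1:5])
        except ValueError:
            continue  # skip header or other non-numeric lines
        settings[number] = {
            "iso": iso,
            "aperture": aperture,
            "inv_shutter": inv_shutter,
            "factor": factor,
        }
    return settings

def to_luminance(img, factor):
    """Approximate luminance in cd/m^2: luminance = factor * pixel value."""
    return np.asarray(img, dtype=np.float64) * factor
```

For example, parse_settings("1 200 5.6 125 0.25") (made-up values) yields an entry for image 1, and to_luminance(img, entry["factor"]) then gives a floating-point luminance array of the same shape as img.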
The *.imc and *.iml extensions are newspeak, standing for 'image calibrated' and 'image linear'.

Remarks: Code for loading and displaying images:
The following code for MATLAB® on a PC takes care of the byte swapping:
f1 = fopen('imk00001.imc', 'rb', 'ieee-be');  % open the file as big-endian
w = 1536; h = 1024;
buf = fread(f1, [w, h], 'uint16');  % w-by-h matrix; transpose (buf') for an h-by-w image
fclose(f1);
For Python, use this code to load an image:
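import array, sys
import numpy
filename = 'imk00001.iml'
with open(filename, 'rb') as handle:
    s = handle.read()  # whole file: 1536*1024 two-byte values, no header
arr = array.array('H', s)
if sys.byteorder == 'little':
    arr.byteswap()  # the file data are big-endian
img = numpy.array(arr, dtype='uint16').reshape(1024, 1536)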
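As an alternative sketch, NumPy can decode the big-endian data directly through the dtype string '>u2', so no explicit byte swap is needed on any architecture. The function name load_vanhateren is hypothetical, and the size check only reflects the header-less 1536x1024 layout described above.

```python
import numpy as np

WIDTH, HEIGHT = 1536, 1024  # image size given in the format description

def load_vanhateren(path):
    """Read one header-less *.iml or *.imc file as a (1024, 1536) uint16 array."""
    data = np.fromfile(path, dtype=">u2")  # '>u2' = big-endian unsigned 16 bit
    if data.size != WIDTH * HEIGHT:
        raise ValueError(
            "expected %d pixels, found %d" % (WIDTH * HEIGHT, data.size)
        )
    # Reshape to rows x columns and convert to the machine's native byte order.
    return data.reshape(HEIGHT, WIDTH).astype(np.uint16)
```

The resulting array can then be displayed with any plotting package, for instance with matplotlib's imshow(img, cmap='gray'), if that package is installed.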

Converting the images to other formats:
The images can be converted to other formats using the convert program of ImageMagick, e.g.
convert -size 1536x1024 -depth 16 gray:imk00001.iml imk00001.fits
convert -size 1536x1024 -depth 16 -endian MSB gray:imk00001.iml imk00001.fits
depending on your architecture. For most desktop PCs the latter one should be used.
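If ImageMagick is not available, a conversion can also be sketched with the Python standard library alone. The binary PGM format (Netpbm 'P5') stores 16-bit samples most significant byte first when the maximum value exceeds 255, which matches the byte order of these files, so it should be enough to prepend a short text header to the raw data. The function name raw_to_pgm and the file names in the usage note are hypothetical.

```python
def raw_to_pgm(src_path, dst_path, width=1536, height=1024):
    """Wrap a header-less big-endian 16-bit image in a binary PGM (P5) file."""
    with open(src_path, "rb") as src:
        data = src.read()
    expected = width * height * 2  # two bytes per pixel
    if len(data) != expected:
        raise ValueError("expected %d bytes, found %d" % (expected, len(data)))
    # PGM header: magic number, width and height, maximum sample value.
    header = ("P5\n%d %d\n65535\n" % (width, height)).encode("ascii")
    with open(dst_path, "wb") as dst:
        dst.write(header)
        dst.write(data)  # pixel data are copied unchanged
```

A call such as raw_to_pgm('imk00001.iml', 'imk00001.pgm') leaves the pixel values untouched; only the header is added.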

Differences between *.imc and *.iml

The *.iml image set ('linear') consists of the raw images produced by the camera, linearized with the lookup table generated by the camera for each image. The images are slightly blurred by the point-spread function of the camera (in particular due to the optics of the lens). For projects where a strictly linear relationship between scene luminance and pixel values is important (e.g., when looking at contrast variations over images) this may be the set of choice.

The *.imc image set ('calibrated') is computed from the *.iml set by deconvolving the images with the point-spread function corresponding to the lens aperture used (see the Methods section of the article cited above). This strongly reduces the blur at sharp edges and lines. The deconvolution occasionally leads to overshoots and undershoots; the latter can produce negative pixel values in a minority of images. For those images this was compensated for by adding a fixed offset to all pixel values of the image. These offsets are listed below in the IMC offset list. Although they are generally quite small, they slightly compromise the linearity of the relationship between scene luminance and pixel value. Therefore this image set is best suited for projects where well-defined edges are of more importance than strict linearity.
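For analyses that need the deconvolved values without the added offset, the offset can be subtracted again after loading. This is a minimal sketch, assuming that the value in the offset list is the constant that was added to every pixel of that image; remove_imc_offset is a hypothetical helper, and the result is a signed array because undershoots may become negative again.

```python
import numpy as np

def remove_imc_offset(img, offset):
    """Subtract the fixed offset that was added to an *.imc image.

    Assumption: `offset` is the per-image constant from the offset list.
    The result is signed, since undershoots can be negative.
    """
    return np.asarray(img, dtype=np.int32) - int(offset)
```

Images that do not appear in the offset list would simply be used as loaded (equivalent to an offset of 0).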


The data is provided as is without any warranty. If you have questions regarding this website or the downloads, please contact Philipp Lies.

Paul Ivanov from UC Berkeley provides an alternative mirror of the data set on his website:

Thanks to Hans van Hateren for providing all the information and Malte Persike for the ImageMagick commands.