The visual system can extract depth information from the disparity between the two retinal images. Any system that uses disparity information must identify corresponding points in the two images. This correspondence problem is a principal difficulty in depth from stereo, and many questions remain open about how the visual system solves it. In this work, we seek to understand how depth inference can emerge from unsupervised learning of statistical regularities in binocular images. As a first step, we acquire a database of training data by rendering virtual 3D scenes into stereo images from two cameras positioned like a pair of eyes. This provides an extensive repository of stereo images along with precise depth and disparity maps. In the future, we will use these data as ground truth for a quantitative analysis and comparison of different models for depth inference.
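
To illustrate how a rendered depth map relates to a disparity map, the following is a minimal sketch, not the rendering pipeline used in this work. It assumes a simplified geometry of two rectified, parallel pinhole cameras, for which disparity is d = f * B / Z. Cameras that converge like real eyes would need a different model. The function name `depth_to_disparity`, the focal length, and the baseline value are hypothetical and chosen only for illustration.

```python
def depth_to_disparity(depth_map, focal_px, baseline_m):
    """Convert a depth map (metres) to a disparity map (pixels).

    Assumption: rectified, parallel pinhole cameras, so that
    disparity = focal_px * baseline_m / depth.
    """
    if focal_px <= 0 or baseline_m <= 0:
        raise ValueError("focal_px and baseline_m must be positive")

    disparity_map = []
    for row in depth_map:
        out_row = []
        for z in row:
            # Non-positive depth is treated as "no surface" (assumption)
            # and mapped to zero disparity to avoid division by zero.
            out_row.append(focal_px * baseline_m / z if z > 0 else 0.0)
        disparity_map.append(out_row)
    return disparity_map


# Hypothetical values: 800 px focal length, 6.5 cm baseline.
depth = [[1.0, 2.0], [4.0, 0.0]]
disp = depth_to_disparity(depth, focal_px=800.0, baseline_m=0.065)
```

Under this model, nearer surfaces yield larger disparities, and doubling the depth halves the disparity.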