*** NEW! Optimised Parameters. See this doc for detailed parameter settings and algorithm details for our best descriptors. ***

Multi-view Stereo Correspondence Dataset

The dataset consists of corresponding patches sampled from 3D reconstructions of the Statue of Liberty (New York), Notre Dame (Paris) and Half Dome (Yosemite). Initial point cloud reconstructions were computed using Noah Snavely's Photo Tourism algorithm [Snavely 2008], from which dense depth maps were computed using Michael Goesele's multi-view stereo algorithm [Goesele 2007]. Corresponding interest points were found by mapping between images using the stereo depth maps; this dataset consists of patches sampled around each interest point. Difference of Gaussian and Harris interest points have been used.

Description of the data

Each zipfile contains 1024x1024 bitmap (.bmp) images, each containing a 16x16 array of image patches.


Each patch is sampled as 64x64 grayscale, with a canonical scale and orientation. For details of how the scale and orientation are established, please see the paper.
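
The layout above implies a simple mapping from a patch's position in the sequence to a bitmap and a pixel rectangle. The following is a minimal sketch of that arithmetic, not code from the dataset authors. It assumes that patch indices are zero-based and continue from one bitmap to the next in file order (both assumptions), and it uses the row-major ordering described for info.txt. The name locate_patch is hypothetical.

```python
PATCH_SIZE = 64                   # each patch is 64x64 pixels
GRID = 16                         # each bitmap holds a 16x16 array of patches
PATCHES_PER_IMAGE = GRID * GRID   # 256 patches per 1024x1024 bitmap


def locate_patch(patch_index):
    """Return (image_index, (left, top, right, bottom)) for a patch.

    Assumes zero-based patch indices that run left to right, top to
    bottom within a bitmap and then continue in the next bitmap.
    """
    if patch_index < 0:
        raise ValueError("patch_index must be non-negative")
    image_index, within = divmod(patch_index, PATCHES_PER_IMAGE)
    row, col = divmod(within, GRID)
    left, top = col * PATCH_SIZE, row * PATCH_SIZE
    return image_index, (left, top, left + PATCH_SIZE, top + PATCH_SIZE)
```

The rectangle is returned in (left, top, right, bottom) order, which is the box convention used by common imaging libraries such as Pillow's Image.crop, so it can be passed to a crop call once the corresponding bitmap has been loaded.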

Two associated metadata files are included. The first file "info.txt" contains the match information. Each row of info.txt corresponds to a separate patch, with the patches ordered from left to right and top to bottom in each bitmap image. The first number on each row of info.txt is the 3D point ID from which that patch was sampled -- patches with the same 3D point ID are projected from the same 3D point (into different images). The second number on each row is not used at present.
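
As an illustration of how info.txt might be used, the sketch below groups patches by 3D point ID. It assumes whitespace-separated fields, treats the zero-based row number as the patch index, and skips blank lines without counting them; all three are assumptions rather than documented behaviour. The function name group_patches_by_point is hypothetical.

```python
from collections import defaultdict


def group_patches_by_point(lines):
    """Map each 3D point ID to the patch indices (row numbers) sampled from it."""
    groups = defaultdict(list)
    patch_index = 0
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # assumption: blank lines are not patches
        point_id = int(fields[0])  # the second field is unused at present
        groups[point_id].append(patch_index)
        patch_index += 1
    return dict(groups)


# Possible usage with the real file:
#   with open("info.txt") as f:
#       groups = group_patches_by_point(f)
```

Any group with two or more entries then lists patches projected from the same 3D point, subject to the reference image restriction described for interest.txt.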

The file "interest.txt" has information about the original interest points. Each row of interest.txt also corresponds to a separate patch, so it has the same number of rows as info.txt. The first number is the ID of the reference image in which the interest point was found. IMPORTANT: in order to establish matches and non-matches, you must use patches with the same reference image ID. Correspondences were found by projecting between images using this reference image only, so it is possible that patches with different 3D point IDs and different reference image IDs actually correspond to the same 3D point. The remaining fields in interest.txt are: x, y, orientation, scale (log2 units). To make sure that non-matches were sufficiently different, we checked that these values were sufficiently far apart when establishing non-matches.
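
A row of interest.txt could be read along the following lines. This is a sketch under stated assumptions: fields are taken to be whitespace-separated and to appear in the order listed above (reference image ID, x, y, orientation, scale). The names InterestPoint, parse_interest_line and comparable are hypothetical.

```python
from typing import NamedTuple


class InterestPoint(NamedTuple):
    ref_image_id: int   # ID of the reference image
    x: float
    y: float
    orientation: float
    scale: float        # log2 units


def parse_interest_line(line):
    """Parse one row, assuming the field order given in the description."""
    fields = line.split()
    return InterestPoint(int(fields[0]), *(float(v) for v in fields[1:5]))


def comparable(a, b):
    """True if two patches share a reference image ID.

    Only then can their 3D point IDs be used to label the pair as a
    match or a non-match.
    """
    return a.ref_image_id == b.ref_image_id
```

Checking comparable before comparing 3D point IDs is one way to honour the restriction that matches and non-matches be drawn from patches with the same reference image ID.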

To allow researchers to replicate our learning results (if desired), we have included the match files that we used to generate the results in the paper. These are named "m50_n1_n2.txt", where n1 and n2 are the numbers of matches and non-matches present in the file. The format of each row is as follows:

patchID1   3DpointID1   unused1   patchID2   3DpointID2   unused2
"matches" have the same 3DpointID, and correspond to interest points that were detected within 5 pixels in position, and agreeing to within 0.25 octaves of scale and pi/8 radians in angle. "non-matches" have different 3DpointIDs, and correspond to interest points lying outside a range of 10 pixels in position, 0.5 octaves of scale and pi/4 radians in angle.

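Given the row format above, a pair can be labelled by comparing its two 3D point IDs. The sketch below assumes that every row has exactly six whitespace-separated integer fields; that is an assumption about the files, not something stated here. The function name parse_match_line is hypothetical.

```python
def parse_match_line(line):
    """Return (patch_id1, patch_id2, is_match) for one row of a match file.

    A pair is labelled a match when both patches carry the same 3D point ID.
    """
    patch1, point1, _unused1, patch2, point2, _unused2 = (
        int(v) for v in line.split()
    )
    return patch1, patch2, point1 == point2


# Possible usage with one of the match files (file name left as a placeholder):
#   with open(match_file_path) as f:
#       pairs = [parse_match_line(row) for row in f if row.strip()]
```

The returned patch IDs can then be used to look up the two patches in the bitmaps before computing descriptor distances.
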
Download the datasets

Follow the links below to download zipfiles for each of the 3 datasets. Each contains more than 400,000 patches and is around 1-2 GB in size. The first set of patches is computed from Difference of Gaussian (DOG) maxima:

        Liberty   ::   Notre Dame   ::   Half Dome

A second set has been computed at multi-scale Harris corners:

        Liberty (Harris)   ::   Notre Dame (Harris)   ::   Half Dome (Harris)

Optimised Parameters

See Simon's document describing the optimal parameter settings and implementation details for the best descriptors found in the course of our experiments (with performance / computation time tradeoffs).

Old Dataset

Our original dataset (accompanying the CVPR'2007 paper) is available here. The current dataset is more suitable for training descriptors based on difference of Gaussian, or Harris corners, as the patches are centred on real interest point detections, rather than being projections of 3D points as is the case in the old dataset.


If you have any questions please contact Matthew Brown, Simon Winder or Gang Hua.


We'd like to thank Noah Snavely and Michael Goesele for making their camera/point cloud and multi-view stereo reconstructions available to us.


  • [Snavely 2008] Noah Snavely, Steven M. Seitz, Richard Szeliski, "Modeling the world from Internet photo collections," International Journal of Computer Vision, 2008.

  • [Goesele 2007] Michael Goesele, Noah Snavely, Brian Curless, Hugues Hoppe, Steven M. Seitz, "Multi-View Stereo for Community Photo Collections," International Conference on Computer Vision, 2007.