The Inria 3DMovie Dataset contains all the stereo pairs and their annotations used in our ICCV 2013 paper [1]. Most of this data was extracted from the "StreetDance 3D" [Giwa and Pasquini, 2010] and "Pina" [Wenders, 2011] stereo movies. Some of the negative stereo pairs were harvested from Flickr and were originally shot using the Fuji W3 camera. The dataset includes stereo pairs, ground truth segmentations, poses and person bounding boxes, and is split into training and test parts.
The training set includes:
The test set includes:
All the annotations were produced manually. The stereo pairs are provided as JPEG files. Disparity estimated using [2] is also provided for each stereo pair.
If you use this dataset, please cite:
[1] K. Alahari, G. Seguin, J. Sivic, I. Laptev
Pose Estimation and Segmentation of People in 3D Movies
Proceedings of the International Conference on Computer Vision (ICCV), 2013.
http://www.di.ens.fr/willow/research/stereoseg/
The code folder contains three demo MATLAB scripts, which load and display sample frames, disparity and the corresponding ground truth.
>> demo_persondetection
>> demo_pose
>> demo_segmentation
This dataset is split into several folders, one for each task and train/test part, each folder containing at least three subdirectories:
frames, which holds the stereo pairs
disparity, which holds the disparity maps
labels, which holds the appropriate annotations
A visualization directory is also provided for some of the subdatasets to show how the ground truth looks out of the box.
Disparity maps are provided as MAT-files, the uv variable holding the whole flow computed between the left and the right image. We use the horizontal component of the flow as disparity, i.e. uv(:,:,1).
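As a minimal sketch of the convention above, the following MATLAB snippet extracts the horizontal flow component from a uv array. A small synthetic array stands in for a loaded file so the snippet runs on its own. The file name in the comment is hypothetical, and the assumption that the second layer holds the vertical component is ours.

```matlab
% Synthetic stand-in for the uv variable stored in a disparity MAT-file.
% With the dataset you would instead do something like:
%   data = load('disparity/some_frame.mat');   % hypothetical file name
%   uv = data.uv;
uv = zeros(4, 6, 2);
uv(:, :, 1) = 3;     % horizontal flow component
uv(:, :, 2) = 0.5;   % assumed vertical flow component

% The horizontal component is used as the disparity map.
disparity = uv(:, :, 1);

% Basic sanity check: one disparity value per pixel.
[h, w] = size(disparity);
fprintf('Disparity map: %d x %d, range [%g, %g]\n', ...
        h, w, min(disparity(:)), max(disparity(:)));
```

The resulting matrix can be displayed with imagesc(disparity) alongside the left frame.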
Segmentation labels are MATLAB files containing a 3D array named det_gt, where each layer of the third dimension det_gt(:,:,i) is the ground truth segmentation mask for a single person.
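To illustrate the layout described above, here is a rough sketch that counts the annotated people and merges the per-person layers into a single label map. The synthetic det_gt and the assumption that person pixels are nonzero in each mask are ours, not taken from the dataset.

```matlab
% Synthetic stand-in for det_gt: two people in a 4 x 5 image.
% With the dataset, det_gt would be loaded from a labels MAT-file.
det_gt = false(4, 5, 2);
det_gt(1:2, 1:2, 1) = true;   % mask of person 1
det_gt(3:4, 3:5, 2) = true;   % mask of person 2

% One layer of the third dimension per annotated person.
num_people = size(det_gt, 3);

% Build a label map: 0 for background, i for pixels of person i.
% Where masks overlap, the person with the higher index wins
% (an arbitrary choice made for this sketch).
label_map = zeros(size(det_gt, 1), size(det_gt, 2));
for i = 1:num_people
    mask = det_gt(:, :, i) > 0;
    label_map(mask) = i;
end

% Pixels covered by at least one person.
any_person = any(det_gt > 0, 3);
```

Keeping the layers separate, as the dataset does, preserves overlapping masks. The merged label map is only a convenience for display.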
We provide a summary MATLAB file which holds a single pos element, a struct array in which each struct pos(i) has an im field specifying the image location, a pose field specifying the coordinates of each of the 10 annotated joints, and, for the training set, an extra occluded field specifying for each joint whether it is occluded or not.
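The following sketch shows one way to walk through such a struct array. The synthetic pos array, the file names, the 10 x 2 layout of pose (one [x y] row per joint) and the logical occluded vector are all assumptions made for illustration. Check size(pos(1).pose) on the real data before relying on them.

```matlab
% Synthetic stand-in for the pos struct array from the summary file.
% The 10 x 2 pose layout and the file names are assumptions.
pos = struct('im', {'a.jpg', 'b.jpg'}, ...
             'pose', {rand(10, 2), rand(10, 2)}, ...
             'occluded', {false(10, 1), [true; false(9, 1)]});

for i = 1:numel(pos)
    n_joints = size(pos(i).pose, 1);
    % The occluded field is only present for the training set.
    if isfield(pos, 'occluded')
        n_occluded = nnz(pos(i).occluded);
    else
        n_occluded = NaN;   % occlusion information not available
    end
    fprintf('%s: %d joints, %d occluded\n', ...
            pos(i).im, n_joints, n_occluded);
end
```

The isfield check lets the same loop run on both the training and the test summary files.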
The 10 joints are:
For the training set, we provide a MATLAB file very similar to the one we provide for pose estimation. It contains a single pos element, which is a struct array in which each struct has an im field giving the example left image filename and x1, y1, x2, y2 fields specifying the detection bounding box.
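As an illustration of these fields, the sketch below converts the corner coordinates into [x y width height] rows, the form expected by MATLAB's rectangle('Position', ...). The synthetic pos array, the file names and the inclusive pixel convention (hence the + 1) are assumptions.

```matlab
% Synthetic stand-in for the training bounding box struct array.
pos = struct('im', {'left_0001.jpg', 'left_0002.jpg'}, ...
             'x1', {10, 40}, 'y1', {20, 15}, ...
             'x2', {60, 90}, 'y2', {180, 200});

% Convert corner coordinates into [x y width height] rows.
% The + 1 assumes inclusive pixel coordinates.
boxes = zeros(numel(pos), 4);
for i = 1:numel(pos)
    boxes(i, :) = [pos(i).x1, pos(i).y1, ...
                   pos(i).x2 - pos(i).x1 + 1, ...
                   pos(i).y2 - pos(i).y1 + 1];
end
```

Each row of boxes can then be drawn over the corresponding left image.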
For the test dataset, we provide XML files compatible with the VOC devkit, as well as a summary txt file, in which each line provides information on a single bounding box annotation in the form filename 1.0 x1 y1 x2 y2.
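A line in this format could be parsed along the following lines. The example lines are made up, the field name value for the second column is our own label (its meaning is not specified here), and the sketch assumes filenames contain no spaces.

```matlab
% Two example lines in the "filename 1.0 x1 y1 x2 y2" format.
% File names and coordinates are made up for this sketch; with the
% dataset, the lines would be read from the summary txt file.
txt_lines = {'frame_0001 1.0 12 30 85 210', ...
             'frame_0002 1.0 40 22 130 240'};

annotations = struct('filename', {}, 'value', {}, 'box', {});
for i = 1:numel(txt_lines)
    % Split on single spaces; assumes no spaces in the filename.
    tokens = strsplit(strtrim(txt_lines{i}), ' ');
    annotations(i).filename = tokens{1};
    annotations(i).value = str2double(tokens{2});     % second column
    annotations(i).box = str2double(tokens(3:6));     % [x1 y1 x2 y2]
end
```

Several lines may share the same filename when an image contains more than one annotated person, so grouping by filename is a natural next step.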
If you have any questions, please contact Guillaume Seguin (guillaume.seguin@ens.fr).
This work is partly supported by the Quaero Programme, funded by OSEO, the MSR-INRIA laboratory, ERC grant Activia, Google and the EIT ICT Labs.
[2] A. Ayvaci, M. Raptis, S. Soatto
Sparse Occlusion Detection with Optical Flow
In International Journal of Computer Vision, 2011
We do not own the copyright of the videos and stereo pairs, and only provide them for non-commercial research purposes.