Instance-level video segmentation from object tracks



We address the problem of segmenting multiple object instances in complex videos. Our method does not require manual pixel-level annotation for training, and relies only on readily available object detectors or visual object tracking. Given object bounding boxes as input, we cast video segmentation as a weakly-supervised learning problem. Our proposed objective combines (a) a discriminative clustering term for background segmentation, (b) a spectral clustering term for grouping pixels of the same object instance, and (c) linear constraints enabling instance-level segmentation. We propose a convex relaxation of this problem and solve it efficiently using the Frank-Wolfe algorithm. We report results and compare our method to several baselines on a new video dataset for multi-instance person segmentation.
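
For readers unfamiliar with the Frank-Wolfe algorithm mentioned above, the following is a minimal, generic sketch of it and not the paper's actual solver. It assumes a toy objective (squared distance to a fixed `target` vector) and a toy feasible set (the probability simplex); the names `frank_wolfe_simplex`, `grad` and `target` are hypothetical and chosen for illustration only. The key property it shows is that each iteration needs only a linear minimization over the feasible set, which on the simplex reduces to picking the coordinate with the smallest gradient entry.

```python
def frank_wolfe_simplex(grad, n, num_iters=2000):
    """Minimize a smooth convex function over the probability simplex.

    grad: function returning the gradient (list of n floats) at a point x.
    n: dimension of the problem.
    """
    # Start from the barycenter of the simplex (a feasible point).
    x = [1.0 / n] * n
    for k in range(num_iters):
        g = grad(x)
        # Linear minimization oracle: over the simplex, the minimizer of
        # <g, s> is the vertex e_i with the smallest gradient coordinate.
        i = min(range(n), key=lambda j: g[j])
        # Standard diminishing step size for Frank-Wolfe.
        gamma = 2.0 / (k + 2.0)
        # Convex combination x <- (1 - gamma) * x + gamma * e_i,
        # which keeps x inside the simplex without any projection.
        x = [(1.0 - gamma) * xj for xj in x]
        x[i] += gamma
    return x


# Toy objective (an assumption for illustration): f(x) = 0.5 * ||x - target||^2.
target = [0.2, 0.5, 0.3]


def grad(x):
    return [xj - tj for xj, tj in zip(x, target)]


solution = frank_wolfe_simplex(grad, n=3)
print(solution)
```

Because `target` already lies in the simplex, the iterates should approach it; every iterate stays feasible by construction, which is what makes the method attractive for constrained convex relaxations.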


CVPR 2016 Paper / Poster


The Inria 3DMovie Dataset v2 contains all the stereo pairs and their annotations used in our CVPR 2016 paper.


Extended results


This work is partly funded by the MSR-INRIA laboratory and ERC grants Activia and VideoWorld.

