Instance-level video segmentation from object tracks

People

Abstract

We address the problem of segmenting multiple object instances in complex videos. Our method does not require manual pixel-level annotation for training and relies only on readily available object detectors or visual object tracking. Given object bounding boxes as input, we cast video segmentation as a weakly-supervised learning problem. Our proposed objective combines (a) a discriminative clustering term for background segmentation, (b) a spectral clustering term for grouping pixels of the same object instance, and (c) linear constraints enabling instance-level segmentation. We propose a convex relaxation of this problem and solve it efficiently using the Frank-Wolfe algorithm. We report results and compare our method to several baselines on a new video dataset for multi-instance person segmentation.
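
For readers unfamiliar with the Frank-Wolfe algorithm named in the abstract, the following is a minimal sketch on a toy problem. It is not the paper's objective or released code. It assumes a simple quadratic, f(Z) = 0.5 * ||Z - A||_F^2, minimized over matrices Z whose rows lie on the probability simplex, which loosely mimics a relaxed per-pixel assignment to k labels. The function name `frank_wolfe_rows`, the target matrix `A`, and the parameters `n_iters` and `tol` are all hypothetical and chosen for illustration only.

```python
import numpy as np


def frank_wolfe_rows(A, n_iters=500, tol=1e-10):
    """Toy Frank-Wolfe: minimize 0.5 * ||Z - A||_F^2 over row-stochastic Z.

    Illustrative assumption only; the paper's actual objective is different.
    """
    A = np.asarray(A, dtype=float)
    n, k = A.shape
    Z = np.full((n, k), 1.0 / k)  # feasible start: every row is uniform
    rows = np.arange(n)
    for _ in range(n_iters):
        grad = Z - A  # gradient of the quadratic objective
        # Linear minimization oracle: over the simplex, the minimizer of a
        # linear function is a vertex, i.e. a one-hot row at the smallest
        # gradient entry.
        S = np.zeros_like(Z)
        S[rows, grad.argmin(axis=1)] = 1.0
        # Frank-Wolfe duality gap, an upper bound on the suboptimality.
        gap = float(np.sum(grad * (Z - S)))
        if gap <= tol:
            break
        # Exact line search for a quadratic, clipped to stay feasible.
        step = min(1.0, gap / float(np.sum((S - Z) ** 2)))
        Z = Z + step * (S - Z)  # convex combination keeps Z feasible
    return Z


if __name__ == "__main__":
    A = np.array([[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]])
    Z = frank_wolfe_rows(A)
    print(Z)
    print("row sums:", Z.sum(axis=1))
```

Each iteration only needs a gradient and a linear minimization over the feasible set, with no projection step, which is the property that makes Frank-Wolfe attractive for constrained convex relaxations of this kind.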

Paper

CVPR 2016 Paper / Poster

BibTeX

@InProceedings{Seguin16,
    author = "Seguin, Guillaume and Bojanowski, Piotr and Lajugie, Rémi and Laptev, Ivan",
    title = "Instance-level video segmentation from object tracks",
    booktitle = "Proc. CVPR",
    year = "2016"
}

Dataset

The Inria 3DMovie Dataset v2 contains all the stereo pairs and their annotations used in our CVPR 2016 paper.

Code

Extended results

Acknowledgements

This work is partly funded by the MSR-INRIA laboratory and ERC grants Activia and VideoWorld.

Copyright Notice

The documents contained in these directories are included by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright.