Is object localization for free? – Weakly Supervised Object Recognition with Convolutional Neural Networks

Abstract

Successful methods for visual object recognition typically rely on training datasets containing lots of richly annotated images. Detailed image annotation, e.g. by object bounding boxes, however, is both expensive and often subjective. We describe a weakly supervised convolutional neural network (CNN) for object classification that relies only on image-level labels, yet can learn from cluttered scenes containing multiple objects. We quantify its object classification and object location prediction performance on the Pascal VOC 2012 (20 object classes) and the much larger Microsoft COCO (80 object classes) datasets. We find that the network (i) outputs accurate image-level labels, (ii) predicts approximate locations (but not extents) of objects, and (iii) performs comparably to its fully-supervised counterparts using object bounding box annotation for training.

Paper

Technical report (HAL-01015140, June 2014)

Extended version (CVPR 2015, April 2015)

BibTeX

@inproceedings{Oquab15,
  author    = "Oquab, M. and Bottou, L. and Laptev, I. and Sivic, J.",
  title     = "Is object localization for free? -- Weakly-supervised learning with convolutional neural networks",
  booktitle = "Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition",
  year      = "2015"
}

Code

V1.0 (May 2015): Training/testing code (Torch-based; includes a pre-trained network for Pascal VOC 2012 classification): Code (358MB)

V1.1 (Oct 2015): Training/testing code (searches over fewer scales and fixes pooling over scales at test time): Code (358MB)

Acknowledgements

This work was supported by the MSR-INRIA laboratory, ERC grant Activia (no. 307574), ERC grant Leap (no. 336845) and the ANR project Semapolis (ANR-13-CORD-0003).

Copyright Notice

The documents contained in these directories are included by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright.