Learning and Transferring Mid-Level Image Representations using Convolutional Neural Networks

People

Abstract

Convolutional neural networks (CNNs) have recently shown outstanding image classification performance in the large-scale visual recognition challenge (ILSVRC2012). The success of CNNs is attributed to their ability to learn rich mid-level image representations, as opposed to the hand-designed low-level features used in other image classification methods. Learning CNNs, however, amounts to estimating millions of parameters and requires a very large number of annotated image samples. This property currently prevents the application of CNNs to problems with limited training data. In this work we show how image representations learned with CNNs on large-scale annotated datasets can be efficiently transferred to other visual recognition tasks with a limited amount of training data. We design a method to reuse layers trained on the ImageNet dataset to compute a mid-level image representation for images in the PASCAL VOC dataset. We show that despite differences in image statistics and tasks in the two datasets, the transferred representation leads to significantly improved results for object and action classification, outperforming the current state of the art on the PASCAL VOC 2007 and 2012 datasets. We also show promising results for object and action localization.
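The transfer idea in the abstract can be illustrated with a minimal sketch. This is not the released code: it is a toy under stated assumptions, written in plain Python. The hand-picked matrix `FROZEN_WEIGHTS` is a hypothetical stand-in for layers pre-trained on a large source dataset, the six 2-D points stand in for a small target dataset, and a single logistic-regression layer stands in for the newly trained part. All function and variable names are illustrative.

```python
import math

# Hypothetical stand-in for layers pre-trained on a large source dataset.
# These weights are never updated while training on the target task.
FROZEN_WEIGHTS = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]


def extract(x):
    """Frozen feature extractor: one linear layer followed by ReLU."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in FROZEN_WEIGHTS]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_new_layer(features, labels, epochs=200, lr=0.1):
    """Fit only the new output layer (logistic regression, plain SGD)."""
    w, b = [0.0] * len(features[0]), 0.0
    for _ in range(epochs):
        for f, y in zip(features, labels):
            p = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b)
            g = p - y  # gradient of the log loss w.r.t. the pre-activation
            w = [wi - lr * g * fi for wi, fi in zip(w, f)]
            b -= lr * g
    return w, b


def predict(x, w, b):
    """Score a new input: frozen features, then the newly trained layer."""
    f = extract(x)
    return sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b)


# Toy target dataset with very few labeled samples (an assumption for
# illustration, not data from the paper).
inputs = [[1.0, 1.0], [-1.0, -1.0], [1.2, 0.8], [-0.9, -1.2], [0.8, 1.1], [-1.1, -0.8]]
labels = [1, 0, 1, 0, 1, 0]

# Features are computed once with the frozen layers; only w and b are learned.
w, b = train_new_layer([extract(x) for x in inputs], labels)
print(predict([1.0, 1.0], w, b), predict([-1.0, -1.0], w, b))
```

Only the output layer is estimated from the target data, so the number of learned parameters stays small even when few labeled samples are available. That is the property the abstract relies on.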

Paper

CVPR 2014
Technical report (HAL-00911179, Nov. 2013)

BibTeX

@inproceedings{Oquab14,
	author = "Oquab, M. and Bottou, L. and Laptev, I. and Sivic, J.",
	title = "Learning and Transferring Mid-Level Image Representations using Convolutional Neural Networks",
	booktitle = "CVPR",
	year = "2014",
}

Code

Testing-only version (cuda-convnet based) that reproduces the results of the CVPR 2014 paper: Code (839 MB)

Training/testing code (Torch based) coming soon.

Results: Objects

We show high-ranked images from the PASCAL VOC 2012 object classification test set together with the corresponding score maps.
For more results see objects.pdf

Results for object classification and localization

Results: Actions

We show high-ranked images from the PASCAL VOC 2012 action classification test set together with the corresponding score maps.
For more results see actions.pdf

Acknowledgements

The authors would like to thank Alex Krizhevsky for making convolutional neural network code available. This work is partly supported by the Quaero Programme, funded by OSEO, the MSR-INRIA laboratory, ERC grant Activia, and the EIT ICT Labs.

Copyright Notice

The documents contained in these directories are included by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright.