2013 Computer Vision Internships in the Willow Group
We are looking for strongly motivated candidates with an interest in computer vision and in applications of machine learning to computer vision problems. A good background in applied mathematics, strong programming skills, and prior experience with Matlab are required. The internships can lead to a PhD in the Willow Group.
Proposed internship topics:
1. Large-scale image classification and object detection with Deep Convolutional Neural Networks
2. Predicting actions in places
3. Point cloud triangulation
4. Learning discriminative part models
5. Modeling viewpoint variation in object detection
We will assign topics to qualified students on a first-come, first-served basis. To apply, please send us your CV and visit us in the lab to discuss the topics.
Project supervisors: Leon Bottou <leon@bottou.org>, Ivan Laptev <Ivan.Laptev@ens.fr> and Josef Sivic <Josef.Sivic@ens.fr>
Location: Willow Group, Laboratoire d'Informatique de l'École Normale Supérieure
Goal
You will experiment with a very recent and apparently groundbreaking approach to image classification based on deep convolutional neural networks [Krizhevsky12]. The goals are to replicate the state-of-the-art results of [Krizhevsky12] and to extend the method to object detection.
Motivation
Recognizing thousands of object categories in images is a long-standing goal of computer vision. In recent years, research on large-scale image classification, e.g., [Sanchez11], has been sparked by the large amounts of image data now available and by large-scale datasets such as ImageNet. Convolutional Neural Network (CNN) based methods have existed for several decades; until recently, however, successful applications of CNNs had only been shown for relatively limited problems such as handwritten digit recognition [LeCun90] and face detection [Rowley98]. The groundbreaking results of [Krizhevsky12], presented at the Large Scale Visual Recognition Challenge 2012 workshop (ILSVRC2012), now indicate that CNNs are a highly competitive tool when powered by large amounts of image data. It may be that the amount of image data and the processing power of modern GPUs needed to train successful CNN classifiers have just been reached, and that many exciting new applications of CNNs lie ahead. This internship will investigate this very timely topic by first reproducing the results of [Krizhevsky12], then investigating the performance and properties of the method when applied to other classification tasks, such as PASCAL VOC, and finally extending the classification method of [Krizhevsky12] to the more challenging task of object detection. This is an exploratory internship topic in an exciting and emerging area, which may have a significant impact on the current state of visual recognition.
Project description
The project will build on the publicly available codebase of [Krizhevsky12] and will proceed in three steps: (i) replicating the results of [Krizhevsky12], (ii) evaluating the method on other classification benchmarks such as PASCAL VOC, and (iii) extending it to object detection.
The project will be co-supervised by Leon Bottou, who is one of the world's leading experts on neural networks and large-scale learning.
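As a minimal illustration of the building blocks involved (and not of the actual codebase of [Krizhevsky12], which is GPU-based and far more elaborate), a single convolution / ReLU / max-pooling stage can be sketched in a few lines of NumPy; the image and filter below are random placeholders:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution of a single-channel image with one filter."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y+kh, x:x+kw] * kernel)
    return out

def relu(x):
    """Rectified linear non-linearity, as used in [Krizhevsky12]."""
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    """Non-overlapping max pooling over size x size windows."""
    h, w = x.shape
    h, w = h - h % size, w - w % size
    return x[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

# One conv -> ReLU -> pool stage on a random 8x8 "image".
rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))
kernel = rng.standard_normal((3, 3))
feature_map = max_pool(relu(conv2d(image, kernel)))
print(feature_map.shape)  # (3, 3)
```

A real network stacks several such stages (with many filters per layer) followed by fully connected layers, and learns the filters by back-propagation.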
Requirements
We are looking for strongly motivated candidates with an interest in computer vision and machine learning. The project requires a strong background in applied mathematics and excellent programming skills. The project will also involve using and possibly programming GPUs. Prior experience with GPUs will also be useful, but is not required. If we find a mutual match, the project can lead to a PhD at the Willow Group.
References
[Krizhevsky12] A. Krizhevsky, I. Sutskever, and G. Hinton. ImageNet Classification with Deep Convolutional Neural Networks. In Proc. NIPS 2012.
[Sanchez11] J. Sanchez, F. Perronnin. High-dimensional signature compression for large-scale image classification, In Proc. CVPR 2011.
[LeCun90] Y. Le Cun, B. Boser, J.S. Denker, D. Henderson, R. Howard, W. Hubbard, L. Jackel. Handwritten digit recognition with a back-propagation network. In Proc. NIPS 1990.
[Rowley98] H. A. Rowley, S. Baluja, and T. Kanade. Neural Network-Based Face Detection. IEEE PAMI, 20(1):23-38, 1998.
Project supervisors: Ivan Laptev <Ivan.Laptev@ens.fr>, Josef Sivic <Josef.Sivic@ens.fr> and Aude Oliva, CSAIL, MIT, Visiting professor at Willow in Spring 2013.
Location: Willow Group, Laboratoire d'Informatique de l'École Normale Supérieure
Goal
The goal of this project is to design algorithms able to predict human actions for particular places. Given images or videos of places as input data, the aim is to learn visual predictors of actions using supervision (a) acquired by mining textual resources, e.g. thousands of movie scripts available on the Internet, and (b) obtained by large-scale manual image labelling using crowdsourcing.
Motivation
What is the person in the left figure below trying to do? What actions can we expect to happen in the scenes depicted on the right? Currently, there exists very little computer vision technology that can answer these and similar questions.
A classical framework for classifying objects, actions, or events from still images or videos in computer vision involves identifying visually informative features within a category. Humans, on the other hand, rely heavily on other sources of information to predict which object or action is occurring or about to happen, such as the place or visual context (e.g., cooking and eating in a kitchen) and the affordances of objects and spatial structure in the world (e.g., a chair affords sitting). The aim of this internship is to bring this human-like strategy of action prediction to computer vision, increasing both the number of different actions artificial systems can learn to discriminate and the overall recognition accuracy of current systems.
Given still images or dynamic visual scenes, we will train predictors of human actions. The problem will be formulated as an automatic image tagging task. Action tags (run, sit, having a meeting, getting married, ...) will be obtained from discriminative classifiers trained directly on image data. We will in particular investigate different sources of supervision to train such classifiers. These sources will include (a) knowledge mined from generic textual resources (e.g., movie scripts) describing what people do in particular scenes, and (b) annotations obtained by large-scale manual image labelling using Amazon Mechanical Turk. We will also aim to provide functional categorization and grouping of places according to their similarity in typical human actions, and will identify places and situations that allow for better action predictions. The project will build on our recent work on action recognition [3,4] (Willow) and scene understanding [1,2] (MIT) and will be co-supervised by Aude Oliva (CSAIL, MIT), who is a visiting professor at Willow in Spring 2013.
Requirements
We are looking for strongly motivated candidates with an interest in computer vision and machine learning. The project requires a strong background in applied mathematics and excellent programming skills. Prior experience with text processing will also be useful, but is not required. If we find a mutual match, the project can lead to a PhD at the Willow Group.
References
[1] Xiao, J., Hays, J., Ehinger, K. A., Oliva, A., & Torralba, A. (2010). SUN Database: Large Scale Scene Recognition from Abbey to Zoo. Proceedings of the 23rd IEEE Conference on Computer Vision and Pattern Recognition (pp. 3485-3492), IEEE Computer Society.
[2] SUN dataset and scene category recognition benchmark: http://groups.csail.mit.edu/vision/SUN/
[3] M. Marszałek, I. Laptev and C. Schmid. (2009). Actions in Context. In Proc. CVPR'09, Miami, US.
[4] V. Delaitre, J. Sivic and I. Laptev. (2011). Learning person-object interactions in still images. In Proc. NIPS'11, Granada, Spain.
Project supervisor: Jean Ponce <Jean.Ponce@ens.fr>
Location: Willow Group, Laboratoire d'Informatique de l'École Normale Supérieure
Project description
Extremely effective multi-view stereo algorithms are available today, for example the PMVS software (cf. [1] and http://www.di.ens.fr/pmvs/ ). Combined with structure-from-motion software such as Bundler (cf. [2] and http://phototour.cs.washington.edu/bundler ), they make it possible to model complex objects and environments easily and with great precision, generally in the form of a point cloud. Going from this point cloud to a triangulation, which is easier to manipulate and visualize, remains problematic, as most meshing algorithms available today do not take the sensor positions into account. One exception is a class of algorithms that build a Delaunay tetrahedralization of the point cloud and delete the tetrahedra intersected by the rays joining the sensors to the measured points [3,4]. The subject of this internship is the implementation of such an algorithm, adapted to very large-scale data (hundreds of millions of points) and able to take into account the (approximate) visibility and adjacency information available for the points reconstructed by PMVS.
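As a toy illustration of the carving idea, far from the large-scale implementation the internship targets, SciPy's Delaunay tetrahedralization can be combined with a crude sampled ray test; the point cloud and the single camera position below are synthetic, and an exact ray/tetrahedron intersection test would replace the sampling in practice:

```python
import numpy as np
from scipy.spatial import Delaunay

rng = np.random.default_rng(0)
points = rng.random((200, 3))            # toy stand-in for a PMVS point cloud
camera = np.array([0.5, 0.5, 2.0])       # one hypothetical sensor position

tet = Delaunay(points)                   # Delaunay tetrahedralization

# Mark ("carve") every tetrahedron intersected by a camera-to-point ray,
# found here by densely sampling each ray and locating the tetrahedron
# containing each sample.
carved = np.zeros(len(tet.simplices), dtype=bool)
ts = np.linspace(0.0, 0.999, 50)         # stop just short of the surface point
for p in points:
    samples = camera + ts[:, None] * (p - camera)
    hit = tet.find_simplex(samples)      # -1 for samples outside the hull
    carved[hit[hit >= 0]] = True

print(len(tet.simplices), carved.sum())  # total vs. carved tetrahedra
```

The surviving (uncarved) tetrahedra approximate the occupied space; the surface mesh is then extracted from the boundary between carved and uncarved tetrahedra, as in [3,4].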
References
[1] Yasutaka Furukawa and Jean Ponce, Accurate, Dense, and Robust Multi-View Stereopsis, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 32, Issue 8, Pages 1362-1376, August 2010.
[2] Noah Snavely, Steven M. Seitz, Richard Szeliski, Photo Tourism: Exploring image collections in 3D, ACM Transactions on Graphics (Proceedings of SIGGRAPH 2006), 2006.
[3] Jean-Daniel Boissonnat, Olivier Faugeras, E. Le Bras-Mehlman, Representing stereo data with the Delaunay triangulation, Artificial Intelligence, 44:41-87, 1990.
[4] Hoang-Hiep Vu, Patrick Labatut, Jean-Philippe Pons, Renaud Keriven, High Accuracy and Visibility-Consistent Dense Multiview Stereo, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 5, pp. 889-901, 2012.
Project supervisor: Jean Ponce <jean.ponce@ens.fr>
Location: Willow Group, Département d’informatique, Ecole normale supérieure (http://www.di.ens.fr/willow/)
Project description:
Object detection and categorization are fundamental and difficult computer vision tasks [1,2]. One of the difficulties arising in these problems is that the variation in appearance can be higher within a class than between classes, making direct comparison between instances of the same class meaningless. The objective of this project is to learn discriminative deformable sub-parts among training exemplars that can move independently, forming a flexible model robust to occlusion and high intra-class variations.
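The trade-off at the heart of such part models, placing each part where its appearance score minus a quadratic deformation penalty is maximal, as in [1], can be sketched on synthetic data (the feature map, anchors, and cost weight below are placeholders, and real systems compute this efficiently with distance transforms):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy feature map: one appearance score per cell (e.g. a filter response
# on a grid of HOG cells), plus two hypothetical parts with anchors.
appearance = rng.standard_normal((12, 12))
anchors = [(3, 3), (8, 8)]               # each part's preferred location
deform_cost = 0.5                        # weight on squared displacement

def place_part(score_map, anchor, cost):
    """Best part placement: appearance score minus quadratic deformation."""
    ys, xs = np.indices(score_map.shape)
    penalty = cost * ((ys - anchor[0]) ** 2 + (xs - anchor[1]) ** 2)
    total = score_map - penalty
    best = np.unravel_index(np.argmax(total), total.shape)
    return best, total[best]

total_score = 0.0
for a in anchors:
    loc, s = place_part(appearance, a, deform_cost)
    total_score += s
    print(a, "->", loc)                  # parts settle near their anchors
```

The detection score of a window is then the root filter response plus the sum of such optimal part scores; learning adjusts the filters, anchors, and deformation weights discriminatively.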
References:
[1] P. Felzenszwalb, R. B. Girshick, D. McAllester and D. Ramanan. Object Detection with Discriminatively Trained Part Based Models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9):1627-1645, 2010.
[2] S. Lazebnik, C. Schmid and J. Ponce. Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2006.
Project supervisor: Jean Ponce <jean.ponce@ens.fr>
Location: Willow Group, Département d’informatique, Ecole normale supérieure (http://www.di.ens.fr/willow/)
Project description:
Image categorization and object detection are fundamental computer vision tasks [1]. Most existing methods, however, essentially ignore the effect of viewpoint on these problems, including the appearance changes and occlusions it induces [2]. The objective of this project is to construct new visual models capable of explicitly handling viewpoint variations. We will consider an approach that builds an intermediate 2D/3D deformable model capable of representing all possible viewpoints simultaneously, while remaining compact and easy to learn.
References:
[1] P. Felzenszwalb, R. B. Girshick, D. McAllester and D. Ramanan. Object Detection with Discriminatively Trained Part Based Models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9):1627-1645, 2010.
[2] O. Duchenne, A. Joulin and J. Ponce. A Graph-matching Kernel for Object Categorization. In Proc. International Conference on Computer Vision, 2011.
6. More internship topics available upon request.
Talk to the course instructors if you would like to hear about additional internship topics.