2012 Computer Vision Internships in the Willow Group

We are looking for strongly motivated candidates with an interest in computer vision and in applications of machine learning to computer vision problems. A good background in applied mathematics, strong programming skills and prior experience with Matlab are required. The internships can lead to a PhD in the Willow Group.

Proposed internship topics:

  1. Person detection and tracking in crowds
  2. Scene clustering and alignment in TV series
  3. More internship topics are available on request

We will assign topics to qualified students on a first-come, first-served basis. To apply, please send us your CV and come to visit us in the lab to discuss the topics.


Project Title: Person detection and tracking in crowds

Project supervisors: Ivan Laptev <Ivan.Laptev@ens.fr> and Josef Sivic <Josef.Sivic@ens.fr>

Location: Willow Group, Laboratoire d'Informatique de l'École Normale Supérieure

Goal

The goal of this project is to detect and track people in crowded video scenes such as open-air concerts, marathons and rush-hour streets.

Motivation

Person detection and tracking in crowded scenes is a very challenging vision task due to heavy occlusions, high person densities and significant variation in people’s appearance. Recent work on detection and tracking has demonstrated improvements from combining local detectors [Felzenszwalb10] with additional, more global cues such as spatial relations among objects [Desai09], dynamic interactions among people [Pellegrini10] and constraints on the crowd density within the scene [Rodriguez11a]. This research direction currently offers good opportunities for further improvements and for a significant impact on the state of the art. Detecting and tracking people in crowded scenes is a crucial component of a wide range of applications, including surveillance, group behavior modeling and crowd disaster prevention.

Project description

We have recently published two papers [Rodriguez11a, Rodriguez11b] advancing the state of the art in person detection and tracking in crowds. This project will build on this work and will extend it in several ways:

  1. As shown in [Rodriguez11a], better crowd density estimates will immediately improve person detection and tracking. Hence, the first task will be to experiment with alternative spatial and spatio-temporal video features to obtain better crowd density estimates. The generalization of density estimation to new scenes will also be addressed.
  2. Crowd densities change smoothly over time; in addition, people enter and leave the scene only at specific locations (frame boundaries, doors, etc.). These constraints are very useful because they can prevent the algorithm from making wrong decisions. The second task of the project will be to leverage these constraints and to develop an algorithm for spatio-temporal track detection that respects the expected track structure and the temporal evolution of the crowd density.
  3. If the first two tasks succeed, the project may continue along several open research directions. One of these is the detection and tracking of people in low-resolution videos, a task that has not been addressed in the literature so far.
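The two constraints of task 2 can be illustrated with a minimal sketch. It assumes that per-frame crowd density maps are already available as NumPy arrays and that door regions are given as axis-aligned rectangles; the function names, the smoothing factor `alpha` and the border `margin` are hypothetical illustration choices, not the method of [Rodriguez11a] or of this project. Temporal smoothness is imposed here with a simple exponential moving average, and a track is accepted only if it starts and ends near the frame boundary or inside a door region (unless it is cut by the start or end of the video).

```python
import numpy as np


def smooth_densities(densities, alpha=0.7):
    """Exponential moving average over per-frame density maps.

    densities: array of shape (T, H, W). alpha weights the previous
    smoothed map (hypothetical choice of temporal smoothing).
    """
    densities = np.asarray(densities, dtype=float)
    smoothed = np.empty_like(densities)
    smoothed[0] = densities[0]
    for t in range(1, len(densities)):
        smoothed[t] = alpha * smoothed[t - 1] + (1.0 - alpha) * densities[t]
    return smoothed


def in_entry_zone(x, y, width, height, margin=20, doors=()):
    """True if (x, y) lies within `margin` pixels of the frame border
    or inside one of the door rectangles (x0, y0, x1, y1)."""
    near_border = (x < margin or y < margin
                   or x >= width - margin or y >= height - margin)
    in_door = any(x0 <= x <= x1 and y0 <= y <= y1 for x0, y0, x1, y1 in doors)
    return near_border or in_door


def track_is_plausible(track, num_frames, width, height, margin=20, doors=()):
    """track: list of (frame, x, y) tuples sorted by frame.

    A track may start anywhere if it starts in the first frame (and end
    anywhere if it ends in the last frame); otherwise its first and last
    points must lie in an entry/exit zone.
    """
    f_start, x_start, y_start = track[0]
    f_end, x_end, y_end = track[-1]
    starts_ok = f_start == 0 or in_entry_zone(
        x_start, y_start, width, height, margin, doors)
    ends_ok = f_end == num_frames - 1 or in_entry_zone(
        x_end, y_end, width, height, margin, doors)
    return starts_ok and ends_ok
```

In a full system such checks would more plausibly enter a global track optimization as soft costs than act as hard filters; the hard version above only shows what the constraints express.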

This project will be part of our ongoing work on crowd analysis and may lead to a publication as well as to a new PhD thesis.

Requirements

The student should be strongly motivated by computer vision and should have some prior experience with image and/or video analysis. Strong programming skills and prior experience with Matlab are also required.

References

[Desai09] C. Desai, D. Ramanan, and C. Fowlkes. Discriminative models for multi-class object layout. In Proc. ICCV, 2009.

[Felzenszwalb10] P. Felzenszwalb, R. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part-based models. IEEE PAMI, 32(9), 2010.

[Lempitsky10] V. Lempitsky and A. Zisserman. Learning to count objects in images. In Proc. NIPS, 2010.

[Pellegrini10] S. Pellegrini, A. Ess, K. Schindler, and L. Van Gool. You’ll never walk alone: Modeling social behavior for multi-target tracking. In Proc. ICCV, 2009.

[Rodriguez11a] M. Rodriguez, I. Laptev, J. Sivic, and J.-Y. Audibert. Density-aware person detection and tracking in crowds. In Proc. ICCV, 2011.

[Rodriguez11b] M. Rodriguez, J. Sivic, I. Laptev, and J.-Y. Audibert. Data-driven crowd analysis in videos. In Proc. ICCV, 2011.


Project Title: Scene clustering and alignment in TV series

Project supervisors: Josef Sivic <Josef.Sivic@ens.fr> and Ivan Laptev <Ivan.Laptev@ens.fr>

Location: Willow Group, Laboratoire d'Informatique de l'École Normale Supérieure

Goal

The goal of this project is to automatically find all shots of frequently occurring scenes in TV videos and to establish visual correspondences between the objects of a scene as depicted in different shots. These correspondences will then provide an “object-centric” view of the video: for example, finding and playing all shots fixated on the “dining table” or on “a particular shelf” in the kitchen scene of the TV series “Friends”.

Motivation

Our recent work on the analysis of person-object and person-scene interactions [Delaitre11, Fouhey11] requires large numbers of examples of such interactions in realistic scenarios. This internship will explore the possibility of automatically collecting examples of such person-object and person-scene interactions from TV videos by finding and aligning all video shots depicting a particular object in the scene. Finding all occurrences of a particular object or scene in the video is very challenging, as objects and scenes are seen from different viewpoints and under different illumination, and may be partially occluded by other objects and people. In addition, the methods must be efficient, as the goal is to analyze ~100 hours of TV footage (several seasons of a TV series).

Project description 

The project will employ state-of-the-art techniques from large-scale visual search and recognition of particular objects and places [Chum07, Chum11, Sivic03], as well as scene classification [Lazebnik06]. The project has a strong implementation component and is expected to result in a working system that can automatically analyze ~100 hours of TV video.
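To make the retrieval component more concrete, here is a minimal sketch of bag-of-visual-words search with tf-idf weighting in the spirit of [Sivic03], followed by a simplified average query expansion inspired by [Chum07]. It assumes that local descriptors have already been quantized into visual words, so that each shot (or keyframe) is represented by a vector of word counts. The function names and the parameter `num_expand` are hypothetical, and the spatial verification of top results used in [Chum07] is deliberately omitted.

```python
import numpy as np


def tfidf_matrix(counts):
    """counts: (num_shots, vocab_size) visual-word counts per shot.

    Returns L2-normalised tf-idf vectors (one row per shot) and idf weights.
    """
    counts = np.asarray(counts, dtype=float)
    tf = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1.0)
    doc_freq = (counts > 0).sum(axis=0)
    idf = np.log(len(counts) / np.maximum(doc_freq, 1))
    vecs = tf * idf
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    return vecs / np.maximum(norms, 1e-12), idf


def query_vector(query_counts, idf):
    """Weight a query's word counts with the database idf and normalise."""
    q = np.asarray(query_counts, dtype=float)
    q = q / max(q.sum(), 1.0) * idf
    return q / max(np.linalg.norm(q), 1e-12)


def search(db, q):
    """Rank shots by cosine similarity (rows of db are unit vectors)."""
    return np.argsort(-(db @ q), kind="stable")


def search_with_expansion(db, q, num_expand=3):
    """Rank once, add the top `num_expand` results to the query,
    renormalise, and rank again (no spatial verification)."""
    first = search(db, q)
    expanded = q + db[first[:num_expand]].sum(axis=0)
    expanded /= max(np.linalg.norm(expanded), 1e-12)
    return search(db, expanded)
```

For ~100 hours of video, the count matrix would normally be stored sparsely (for example as an inverted file) rather than as a dense array; the dense NumPy version only illustrates the scoring.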

Requirements

The student should be strongly motivated by computer vision and should have some prior experience with image and/or video analysis. Strong programming skills and prior experience with Matlab are also required.

References

[Brown03] M. Brown and D. Lowe. Recognizing panoramas. In Proc. ICCV, 2003.

[Chum07] O. Chum, J. Philbin, J. Sivic, M. Isard, and A. Zisserman. Total Recall: Automatic query expansion with a generative feature model for object retrieval. In Proc. ICCV, 2007.

[Chum11] O. Chum, A. Mikulik, M. Perdoch, and J. Matas. Total Recall II: Query expansion revisited. In Proc. CVPR, 2011.

[Delaitre11] V. Delaitre, J. Sivic, and I. Laptev. Learning person-object interactions for action recognition in still images. In Proc. NIPS, 2011.

[Fouhey11] D. Fouhey, V. Delaitre, A. Efros, A. Gupta, I. Laptev, and J. Sivic. People watching: Human actions as a cue for single view geometry. In submission, 2011.

[Lazebnik06] S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Proc. CVPR, 2006.

[Philbin10] J. Philbin, J. Sivic, and A. Zisserman. Geometric latent Dirichlet allocation on a matching graph for large-scale image datasets. International Journal of Computer Vision, 2010.

[Liu08] F. Liu, Y. Hu, and M. Gleicher. Discovering panoramas in web videos. In Proc. 16th ACM International Conference on Multimedia, 2008.