ERC VideoWorld

Modeling, Interpreting, and Manipulating Digital Video





VideoWorld is a research project funded by the European Research Council (ERC) and coordinated by Jean Ponce. It is hosted by the Department of Computer Science of École Normale Supérieure (UMR 8548, a joint ENS/CNRS/INRIA laboratory) in downtown Paris.

Summary: Digital video is everywhere, at home, at work, and on the Internet. Yet, effective technology for organizing, retrieving, improving, and editing its content is nowhere to be found. Models for video content, interpretation and manipulation inherited from still imagery are obsolete, and new ones must be invented. With a new convergence between computer vision, machine learning, and signal processing, the time is right for such an endeavor. Concretely, we will develop novel spatio-temporal models of video content learned from training data and capturing both the local appearance and nonrigid motion of the elements (persons and their surroundings) that make up a dynamic scene. We will also develop formal models of the video interpretation process that leave behind the architectures inherited from the world of still images to capture the complex interactions between these elements, yet can be learned effectively despite the sparse annotations typical of video understanding scenarios. Finally, we will propose a unified model for video restoration and editing that builds on recent advances in sparse coding and dictionary learning, and will allow for unprecedented control of the video stream. This project addresses fundamental research issues, but its results are expected to serve as a basis for groundbreaking technological advances for applications as varied as film post-production, video archival, and smart camera phones.



Staff:
 


PI: Jean Ponce
Permanent staff:
Ivan Laptev and Josef Sivic
PhD students:
Louise Benoît, Florent Couzinié-Devy, Piotr Bojanowski, Rafael Sampaio de Rezende (Fall 2013)
Post-doctoral researchers:
Minsu Cho, Jian Sun

Invited professors: Alyosha Efros, Rene Vidal
Alumni: Y-Lan Boureau (post-doc, NYU), Olivier Duchenne (Intel, Korea), Armand Joulin (post-doc, Stanford), Oliver Whyte (Microsoft)




Publications:

 

·         Journal articles:

o   J. Mairal, F. Bach, and J. Ponce. Task-Driven Dictionary Learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(4):791-804, 2012.

o   O. Whyte, J. Sivic, A. Zisserman, and J. Ponce. Non-uniform Deblurring for Shaken Images. International Journal of Computer Vision, 98(2):168-186, 2012.

o   O. Duchenne, F. Bach, I. Kweon, and J. Ponce. A Tensor-Based Algorithm for High-Order Graph Matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33:2383-2395, 2011.

·         Articles published in the proceedings of peer-reviewed conferences:

o P. Bojanowski, F. Bach, I. Laptev, J. Ponce, C. Schmid, and J. Sivic. Finding Actors and Actions in Movies. Accepted for publication in International Conference on Computer Vision, 2013.

o M. Cho, K. Alahari, and J. Ponce. Learning Graphs to Match. Accepted for publication in International Conference on Computer Vision, 2013.

o J. Sun and J. Ponce. Learning Discriminative Part Detectors for Image Classification and Cosegmentation. Accepted for publication in International Conference on Computer Vision, 2013.

o F. Couzinié-Devy, J. Sun, K. Alahari, and J. Ponce. Learning to Estimate and Remove Non-uniform Image Blur. In IEEE Conference on Computer Vision and Pattern Recognition, 2013.

o A. Joulin and S.B. Kang. Recovering Stereo Pairs from Anaglyphs. In IEEE Conference on Computer Vision and Pattern Recognition, 2013.

o M. Rubinstein, A. Joulin, J. Kopf. Unsupervised Joint Object Discovery and Segmentation in Internet Images. In IEEE Conference on Computer Vision and Pattern Recognition, 2013.

o V. Delaitre, D. Fouhey, I. Laptev, J. Sivic, A. Gupta, and A. Efros. Scene semantics from long-term observation of people. In European Conference on Computer Vision, 2012.

o D. Fouhey, V. Delaitre, A. Gupta, A. Efros, I. Laptev, and J. Sivic. People Watching: Human Actions as a Cue for Single-View Geometry. In European Conference on Computer Vision, 2012.  

o A. Joulin and F. Bach. A convex relaxation for weakly supervised classifiers. In International Conference on Machine Learning, 2012.

o H. Azizpour and I. Laptev. Object Detection using Strongly-Supervised Deformable Part Models. In European Conference on Computer Vision, 2012.

o A. Joulin, F. Bach, and J. Ponce. Multi-Class Cosegmentation. In IEEE Conference on Computer Vision and Pattern Recognition, 2012.

o L. Benoît, J. Mairal, F. Bach, and J. Ponce. Sparse Image Representation with Epitomes. In IEEE Conference on Computer Vision and Pattern Recognition, 2011.  

o  Y.-L. Boureau, N. Le Roux, F. Bach, J. Ponce, and Y. LeCun. Ask the locals: Multi-way local pooling for image recognition. In International Conference on Computer Vision, 2011.

o  O. Duchenne, A. Joulin, and J. Ponce. A Graph-Matching Kernel for Object Categorization. In International Conference on Computer Vision, 2011.

·         Submission

o F. Couzinié-Devy, J. Mairal, F. Bach, and J. Ponce. Dictionary Learning for Deblurring and Digital Zoom. Submitted to Journal of Mathematical Imaging and Vision, 2013.

·         Defended PhD theses

o O. Duchenne. Non-Rigid Image Alignment for Object Recognition, Ecole normale supérieure de Cachan, 2013.

o A. Joulin. Convex Optimization for Cosegmentation, Ecole normale supérieure de Cachan, 2013.

o Y.-L. Boureau. Learning Hierarchical Feature Extractors for Image Recognition, New York University, 2012.

o O. Whyte. Removing Camera Shake Blur and Unwanted Occluders from Photographs, Ecole normale supérieure de Cachan, 2012.