Object recognition and computer vision 2015/2016 (Reconnaissance d’objets et vision artificielle)

Jean Ponce, Ivan Laptev, Cordelia Schmid and Josef Sivic

Course Information

Class time: Tuesday 16:15-19:15



List of received reports: Please follow this link. 

Course description

Automated object recognition (and, more generally, scene analysis) from photographs and videos is the grand challenge of computer vision. This course presents the image, object, and scene models, as well as the methods and algorithms, used today to address this challenge.


There will be three programming assignments representing 50% (10% + 20% + 20%) of the grade. The supporting materials for the programming assignments and final projects will be in Matlab.

Final project

The final project will represent 50% of the grade. Suggested topics for final projects will be added here. See examples from last year.

Collaboration policy

You can discuss the assignments and final projects with other students in the class. Discussions are encouraged and are an essential component of the academic environment. However, each student has to work out their assignment alone (including any coding, experiments or derivations) and submit their own report. For the final project, you may work alone or in a group of at most two people. If working in a group, we expect a more substantial project and an equal contribution from each student in the group. The final project report needs to explicitly specify the contribution of each student. Both students are expected to present the project at the oral presentation and to contribute equally to writing the report. The assignments and final projects will be checked for originality. Any uncredited reuse of material (text, code, results) will be considered plagiarism and will result in zero points for the assignment or final project. If plagiarism is detected, the student will be reported to MVA.

Computer vision and machine learning talks

You are welcome to attend seminars in the Willow group. Please see the current seminar schedule. Typically, these are one-hour research talks given by visiting speakers. The talks are held at 23 avenue d'Italie. Ring the bell to get into the building, take the elevator to the 5th floor, and then ring the bell again to get to the Inria reception.

Course schedule (subject to change):



Topic and reading materials.



Sep 29

Introduction (J. Ponce);

Instance-level recognition I. - Camera geometry (J. Ponce)

Class logistics, assignments, final projects (I. Laptev and J. Sivic)

Background materials: History: J. Mundy, Object recognition in the geometric era: A retrospective; Camera geometry: Forsyth & Ponce, Ch. 1-2; Hartley & Zisserman, Ch. 6.




Oct 6

Instance-level recognition II. - Local invariant features (1.5 hrs, C. Schmid);

Materials: Mikolajczyk & Schmid, Scale and affine invariant interest point detectors, IJCV 2004; D. Lowe, Distinctive image features from scale-invariant keypoints, IJCV 2004; R. Szeliski (pdf), Sections 4.1, 4.1.1 and 4.1.2 from Chapter 4: Feature detection and matching.

Instance-level recognition III. - Correspondence and image matching (1.5 hrs, C. Schmid)


Materials: R. Szeliski (pdf), Sections 4.1.3 (feature matching) and 6.1 (feature-based alignment).

Assignment: Assignment 1 out.




Oct 13

Sparse coding and dictionary learning for image analysis (3 hrs, J. Ponce)

Materials: Bach, Mairal, Ponce and Sapiro, Tutorial on sparse coding and dictionary learning for image analysis, CVPR'10.




Oct 20

Instance-level recognition IV. - Efficient visual search (1.5 hrs, C. Schmid)


Materials: Muja & Lowe, Fast approximate nearest neighbors with automatic algorithm configuration, VISAPP'09; Sivic & Zisserman, Video Google: Efficient visual search of videos (chapter from this book); Philbin et al., Object retrieval with large vocabularies and fast spatial matching, CVPR'07.

Jégou et al., Improving bag-of-features for large scale image search, IJCV 2010; Jégou et al., Aggregating local image descriptors into compact codes, PAMI 2011.

Bag-of-features models for category-level recognition (1.5 hrs, C. Schmid)

Materials: Csurka et al., Visual categorization with bags of keypoints, 2004

Assignments: Assignment 1 due.

Assignment 2 out.




Oct 27

Neural networks; Optimization methods (N. Le Roux)


1. Python examples

2. For more details on neural networks you can watch the video lectures by Hugo Larochelle. The website also includes links to useful reading materials such as “Practical Recommendations for Gradient-Based Training of Deep Architectures” by Y. Bengio.

3. The draft of the book on deep learning by Y. Bengio



Nov 3

Convolutional neural networks for visual recognition (J. Sivic)


Y. LeCun et al., Gradient-based learning applied to document recognition, Proceedings of the IEEE 86(11): 2278–2324, 1998.

M.D. Zeiler, R. Fergus, Visualizing and Understanding Convolutional Networks, ECCV 2014.

M. Oquab et al., Learning and Transferring Mid-Level Image Representations using Convolutional Neural Networks, CVPR 2014.

Assignments: Assignment 2 due.

Assignment 3 out.

Final project topics are out. Final project proposals are due on Nov 24.



Nov 10

Structured models for category-level localization and pose estimation (J. Sivic)


Felzenszwalb et al., A Discriminatively Trained, Multiscale, Deformable Part Model, CVPR'08; PASCAL VOC Challenge; Yang and Ramanan, Articulated Human Detection with Flexible Mixtures of Parts, PAMI'13; P. Felzenszwalb and D. Huttenlocher, Distance Transforms of Sampled Functions.

Girshick et al., Rich feature hierarchies for accurate object detection and semantic segmentation, CVPR 2014.



Nov 17

Motion and human actions I. (J. Sivic)


R. Szeliski (pdf), Section 5.1.2 (Dynamic snakes and CONDENSATION); Chapter 8 (Dense motion estimation), in particular Sections 8.1 (Incremental refinement) and 8.2.2 (Learned motion models).

Efros et al., Recognizing action at a distance, ICCV 2003.

Laptev et al., Learning realistic human actions from movies, CVPR’08; Wang et al., Dense trajectories and motion boundary descriptors for action recognition, CVPR’11.

Assignments: Assignment 3 due.



Nov 24

Motion and human actions II. (C. Schmid)


Assignments: Final project proposal due (Nov 24).



Dec 1

3D object recognition and convolutional neural networks (M. Aubry)

Weakly-supervised learning (I. Laptev)

Materials: Bojanowski et al., Finding Actors and Actions in Movies, ICCV 2013; Bojanowski et al., Weakly supervised action labeling in videos under ordering constraints, ECCV 2014; Vu et al., Predicting Actions from Static Scenes, ECCV 2014; Delaitre et al., Scene semantics from long-term observation of people, ECCV 2012.




Jan 7

Jan 8

Final project presentations and evaluation (I. Laptev, J. Sivic)

Jan 7: 13:00-16:00, Room INFO 1, 2nd underground level, Rataud building (24 on this map).

Jan 8: 13:00-16:00, Room R, 2nd underground level, Rataud wing (23 on this map).

Relevant literature:


D.A. Forsyth and J. Ponce, "Computer Vision: A Modern Approach", Prentice-Hall, 2nd edition, 2011


J. Ponce, M. Hebert, C. Schmid and A. Zisserman "Toward Category-Level Object Recognition", Lecture Notes in Computer Science 4170, Springer-Verlag, 2007


O. Faugeras, Q.T. Luong, and T. Papadopoulo, "Geometry of Multiple Images", MIT Press, 2001.


R. Hartley and A. Zisserman, "Multiple View Geometry in Computer Vision", Cambridge University Press, 2004.


J. Koenderink, "Solid Shape", MIT Press, 1990


R. Szeliski, "Computer Vision: Algorithms and Applications", 2009. A draft of this new book can be downloaded online.