Reconnaissance d’objets et vision artificielle 2016/2017
Object recognition and computer vision 2016/2017

Jean Ponce, Ivan Laptev, Cordelia Schmid and Josef Sivic

Course Information

Class time: Tuesday 16:15-19:15

Room: Salle Conference, 46 rue d'Ulm.

Teaching assistant: Gul Varol


List of received reports: Please follow this link.

Course description

Automated object recognition -- and more generally scene analysis -- from photographs and videos is the grand challenge of computer vision. This course presents the image, object, and scene models, as well as the methods and algorithms, used today to address this challenge.


There will be three programming assignments representing 50% (10% + 20% + 20%) of the grade. The supporting materials for the programming assignments and final projects will be in Matlab.

Final project

The final project will represent 50% of the grade. Suggested topics for final projects will be added here. See examples from last year.

Collaboration policy

You can discuss the assignments and final projects with other students in the class. Discussions are encouraged and are an essential component of the academic environment. However, each student must complete their assignment alone (including any coding, experiments or derivations) and submit their own report. For the final project, you may work alone or in a group of at most two people. If working in a group, we expect a more substantial project and an equal contribution from each student in the group. The final project report must explicitly specify the contribution of each student. Both students are expected to present the project at the oral presentation and to contribute equally to writing the report. The assignments and final projects will be checked for originality. Any uncredited reuse of material (text, code, results) will be considered plagiarism and will result in zero points for the assignment or final project. If plagiarism is detected, the student will be reported to MVA.

Computer vision and machine learning talks

You are welcome to attend seminars in the Willow group. Please see the current seminar schedule. Typically, these are one-hour research talks given by visiting speakers. The talks are at 2 Rue Simone IFF. When you enter the building, tell the receptionist you are attending the seminar.

Course schedule (subject to change):



Topic and reading materials.



Oct 4

Introduction; Camera geometry (J. Ponce)

Class logistics, assignments, final projects (I. Laptev and J. Sivic)

Background materials:

History: J. Mundy, Object recognition in the geometric era: A retrospective; Camera geometry: Forsyth & Ponce, Ch. 1-2; Hartley & Zisserman, Ch. 6.




Oct 11

Instance-level recognition I. - Local invariant features, correspondence, image matching (3hrs, J. Sivic)


Mikolajczyk & Schmid, Scale and affine invariant interest point detectors, IJCV 2004; D. Lowe, Distinctive image features from scale-invariant keypoints, IJCV 2004; R. Szeliski (pdf), Sections 4.1, 4.1.1 and 4.1.2 from Chapter 4: Feature detection and matching; R. Szeliski (pdf), Sections 4.1.3 (feature matching) and 6.1 (feature-based alignment).

Assignments: Assignment 1 out.



Oct 18

Sparse coding and dictionary learning for image analysis (3hrs, J. Ponce)

Materials: Bach, Mairal, Ponce, Sapiro, Tutorial on sparse coding and dictionary learning for image analysis, at CVPR'10.




Oct 25

Instance-level recognition II. - Efficient visual search (1.5hrs, C. Schmid)


Muja & Lowe, Fast approx. nearest neighbors with automatic algorithm configuration, VISAPP'09; Sivic & Zisserman, Video Google: Efficient visual search of videos (chapter from this book); Philbin et al., Object retrieval with large vocabularies and fast spatial matching, CVPR'07; Jegou et al., Improving bag-of-features for large scale image search, IJCV 2010; Jegou et al., Aggregating local image descriptors into compact codes, PAMI 2011.

Bag-of-feature models for category-level recognition (1.5hrs, C. Schmid)

Materials: Csurka et al., Visual categorization with bags of keypoints, 2004

Assignments: Assignment 2 out.



Nov 1

Holidays. No lecture.

Assignments: Assignment 1 due on Wed Nov 2.


Nov 8

Neural networks; Optimization methods (3hrs, N. Le Roux)


1. Python examples

2. For more details on neural networks you can watch the video lectures by Hugo Larochelle. The website also includes links to useful reading materials such as “Practical Recommendations for Gradient-Based Training of Deep Architectures” by Y. Bengio.

3. The draft of the book on deep learning by Y. Bengio

Assignments: Assignment 2 due.

Assignment 3 out.



Nov 15

Convolutional neural networks for visual recognition I. (J. Sivic)


Y. LeCun et al., Gradient-based learning applied to document recognition, Proc. of the IEEE 86(11): 2278–2324, 1998; M.D. Zeiler, R. Fergus, Visualizing and Understanding Convolutional Networks, ECCV 2014; M. Oquab et al., Learning and Transferring Mid-Level Image Representations using Convolutional Neural Networks, CVPR 2014.

Final project topics are out. New due date for project proposals: Dec 6.



Nov 22

Convolutional neural networks for visual recognition II. (I. Laptev, J. Sivic)


K. Simonyan and A. Zisserman, Very Deep Convolutional Networks for Large-Scale Visual Recognition, 2014; K. He et al., Deep Residual Learning for Image Recognition, CVPR 2016.

Dalal and Triggs, Histograms of oriented gradients for human detection, CVPR 2005; Felzenszwalb et al., A Discriminatively Trained, Multiscale, Deformable Part Model, CVPR’08; Pascal VOC Challenge; Girshick et al., Rich feature hierarchies for accurate object detection and semantic segmentation, CVPR 2014; Girshick, Fast R-CNN, CVPR 2015; Ren et al., Faster R-CNN: Towards real-time object detection with region proposal networks, NIPS 2015.

Assignments: Assignment 3 due.




Nov 29

Motion and human actions I. (C. Schmid)


Brox and Malik, Large Displacement Optical Flow: Descriptor Matching in Variational Motion Estimation, PAMI 2011; Weinzaepfel et al., DeepFlow: Large displacement optical flow with deep matching, CVPR 2013; Laptev et al., Learning realistic human actions from movies, CVPR 2008; Wang et al., Dense trajectories and motion boundary descriptors for action recognition, CVPR 2011; Simonyan and Zisserman, Two-stream convolutional networks for action recognition in videos, NIPS 2014; Tran et al., Learning spatiotemporal features with 3D convolutional networks, ICCV 2015.





Dec 6

Human pose estimation; Weakly-supervised learning I (I. Laptev)


Yang and Ramanan, Articulated Human Detection with Flexible Mixtures-of-Parts, PAMI 2013; Toshev and Szegedy, DeepPose: Human Pose Estimation via Deep Neural Networks, CVPR 2014; Wei et al., Convolutional Pose Machines, CVPR 2016.

Oquab et al., Is object localization for free? - Weakly-supervised learning with convolutional neural networks, CVPR 2015; Kantorov et al., ContextLocNet: Context-aware deep network models for weakly supervised localization, ECCV 2016.

Assignments: Final project proposal due.




Dec 13

3D object recognition and Convolutional neural networks (M. Aubry)

Weakly-supervised learning II (I. Laptev)


Qi et al., Volumetric and Multi-View CNNs for Object Classification on 3D Data, CVPR 2016; Qi et al., PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation, arXiv 2017; Aubry and Russell, Understanding deep features with computer-generated imagery, ICCV 2015.

Bojanowski et al., Finding Actors and Actions in Movies, ICCV 2013; Bojanowski et al., Weakly supervised action labeling in videos under ordering constraints, ECCV 2014; Vu et al., Predicting Actions from Static Scenes, ECCV 2014; Delaitre et al., Scene semantics from long-term observation of people, ECCV 2012.



Jan 9, Jan 10 and Jan 11

Final project presentations and evaluation (I. Laptev, J. Sivic)

Jan 09: 13:00-17:00

Jan 10: 13:00-17:00

Jan 11: 13:00-17:00

The presentations will take place in Salle Alan Turing (1st floor) at the Inria Paris research center, 2 Rue Simone IFF, 75012 Paris. Directions are here. When you enter the building, tell the receptionist you are attending the presentations and go directly to the first floor (no special access card is needed).

Relevant literature:


D.A. Forsyth and J. Ponce, "Computer Vision: A Modern Approach", Prentice-Hall, 2nd edition, 2011.


J. Ponce, M. Hebert, C. Schmid and A. Zisserman, "Toward Category-Level Object Recognition", Lecture Notes in Computer Science 4170, Springer-Verlag, 2007.


O. Faugeras, Q.T. Luong, and T. Papadopoulo, "Geometry of Multiple Images", MIT Press, 2001.


R. Hartley and A. Zisserman, "Multiple View Geometry in Computer Vision", Cambridge University Press, 2004.


J. Koenderink, "Solid Shape", MIT Press, 1990.


R. Szeliski, "Computer Vision: Algorithms and Applications", 2010. Online book.