Reconnaissance d’objets et vision artificielle 2022/2023
Object recognition and computer vision 2022/2023

Ivan Laptev, Jean Ponce, Cordelia Schmid and Josef Sivic

Course Information

Lecture time: Tuesday 16:15-19:15

Lecture room: Salle Dussane, ENS Ulm, 45 rue d’Ulm, Paris (4/10, 25/10, 8/11, 15/11, 22/11, 29/11, 6/12, 13/12)

Salle Jaurès, ENS Ulm, 29 rue d’Ulm, Paris (11/10, 18/10)

Teaching assistants: Ricardo Garcia, Guillaume Le Moing and Antoine Yang

News

Final project topics are out: submit project proposals by Dec 6.
/!\ Today's lecture (November 8) will start at 16:30 (15 min later than usual).
Lectures 4-7 have been rearranged; see the course schedule.
/!\ The class is now on Google Classroom. If you have not received the access link, please fill in this form.
/!\ Assignment 1 is now out. The assignment notebook can be downloaded here.
2023 Willow internship topics (to be updated during the course).

Course description

Automated object recognition, and more generally scene analysis, from photographs and videos is the grand challenge of computer vision. This course presents the image, object, and scene models, as well as the methods and algorithms, used today to address this challenge.

Assignments

There will be three programming assignments representing 50% (10% + 20% + 20%) of the grade. The supporting materials for the programming assignments and final projects will be in Python and make use of Jupyter notebooks. For additional technical instructions on the assignments, please follow this link.

Final project

The final project will represent 50% of the grade.

Collaboration policy

You can discuss the assignments and final projects with other students in the class. Discussions are encouraged and are an essential component of the academic environment. However, each student has to work out their assignment alone (including any coding, experiments or derivations) and submit their own report. For the final project, you may work alone or in a group of at most 3 people. If working in a group, we expect a more substantial project and an equal contribution from each student in the group. The final project report needs to explicitly specify the contribution of each student. All group members are expected to present the project at the oral presentation and to contribute equally to writing the report. The assignments and final projects will be checked for originality. Any uncredited reuse of material (text, code, results) will be considered plagiarism and will result in zero points for the assignment / final project. If plagiarism is detected, the student will be reported to the MVA.

Computer vision and machine learning talks

You are welcome to attend seminars in the Willow group. Please see the current seminar schedule. Typically, these are one-hour research talks given by visiting speakers. The talks take place at 2 rue Simone Iff. When you enter the building, tell the receptionist you are attending a seminar.

Course schedule (subject to change):

Lecture	Date	Topic and reading materials	Slides
1	Oct 4 Salle Dussane	Class logistics: assignments, final projects, grading (I. Laptev); Introduction to visual recognition; Instance-level recognition I. - Local invariant features (C. Schmid). Materials: Mikolajczyk & Schmid, Scale and affine invariant interest point detectors, IJCV 2004; D. Lowe, Distinctive image features from scale-invariant keypoints, IJCV 2004.	[logistics] [intro] [local features]
2	Oct 11 Salle Jaurès	Camera geometry (J. Ponce); Instance-level recognition II. - Correspondence, image matching (I. Laptev). Materials: History: J. Mundy, Object recognition in the geometric era: A retrospective; Camera geometry: Forsyth & Ponce, Ch. 1-2; Hartley & Zisserman, Ch. 6; R. Szeliski, Sections 7.1, 7.1.1, 7.1.2 and 7.1.3 from Chapter 7: Feature detection and matching; R. Szeliski, Section 8.1 from Chapter 8: Image alignment. Assignments: Assignment 1 out.	[geometry] [image matching]
3	Oct 18 Salle Jaurès	Instance-level recognition III. - Efficient visual search (J. Sivic). Materials: Muja & Lowe, Fast approximate nearest neighbors with automatic algorithm configuration, VISAPP'09; Sivic & Zisserman, Video Google: Efficient visual search of videos (chapter from this book); Philbin et al., Object retrieval with large vocabularies and fast spatial matching, CVPR'07; Jegou et al., Improving bag-of-features for large scale image search, IJCV 2010; Jegou et al., Aggregating local image descriptors into compact codes, PAMI 2011; Iscen et al., Efficient Diffusion on Region Manifolds, CVPR 2017; Arandjelovic et al., NetVLAD: CNN architecture for weakly-supervised place recognition, PAMI 2018.	[search]
4	Oct 25 Salle Dussane	Sparse coding and dictionary learning for image analysis (3 hrs, J. Ponce). Materials: Bach, Mairal, Ponce, Sapiro, Tutorial on sparse coding and dictionary learning for image analysis, CVPR'10; Eboli, Sun and Ponce, End-to-end interpretable learning of non-blind image deblurring, ECCV 2022; Lecouat, Ponce and Mairal, Lucas-Kanade reloaded: End-to-end super-resolution from raw image bursts, ICCV 2021. Assignments: Assignment 1 due; Assignment 2 out.	[sparse coding]
	Nov 1	No lecture (All Saints' Day)
5	Nov 8 Salle Dussane	Supervised learning and deep learning; Optimization and regularization for neural networks; Sequence models and transformers (A. Joulin). Materials: For more details on neural networks, you can watch the video lectures by Hugo Larochelle. The website also includes links to useful reading materials such as “Practical Recommendations for Gradient-Based Training of Deep Architectures” by Y. Bengio.	[intro_nn] [code]
6	Nov 15 Salle Dussane	Neural networks for visual recognition I. (G. Varol). Materials: Y. LeCun et al., Gradient-based learning applied to document recognition, Proc. of the IEEE 86(11): 2278–2324, 1998; A. Krizhevsky et al., ImageNet Classification with Deep Convolutional Neural Networks, NeurIPS 2012; M.D. Zeiler and R. Fergus, Visualizing and Understanding Convolutional Networks, ECCV 2014; M. Oquab et al., Learning and Transferring Mid-Level Image Representations using Convolutional Neural Networks, CVPR 2014; K. Simonyan and A. Zisserman, Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014; K. He et al., Deep Residual Learning for Image Recognition, CVPR 2016; A. Vaswani et al., Attention is all you need, NeurIPS 2017; A. Dosovitskiy et al., An image is worth 16x16 words: Transformers for image recognition at scale, ICLR 2021. Basics of CNNs by A. Vedaldi: http://www.robots.ox.ac.uk/~vedaldi//assets/teach/2019/c18-notes.pdf Assignments: Assignment 2 due Nov 18; Assignment 3 out.	[NN for visual recognition I]
7	Nov 22 Salle Dussane	Neural networks for visual recognition II. (I. Laptev). Materials: CVPR’08; Pascal VOC Challenge; Girshick et al., Rich feature hierarchies for accurate object detection and semantic segmentation, CVPR 2014; Girshick, Fast R-CNN, ICCV 2015; Ren et al., Faster R-CNN: Towards real-time object detection with region proposal networks, NIPS 2015; Redmon et al., You only look once: Unified, real-time object detection, CVPR 2016; Zhou et al., Objects as points, 2019; Long et al., Fully convolutional networks for semantic segmentation, CVPR 2015; Chen et al., DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs, PAMI 2017; He et al., Mask R-CNN, ICCV 2017.	[NN for visual recognition II]
8	Nov 29 Salle Dussane	Human action recognition (C. Schmid). Materials: Brox and Malik, Large Displacement Optical Flow: Descriptor Matching in Variational Motion Estimation, PAMI 2011; Weinzaepfel et al., DeepFlow: Large displacement optical flow with deep matching, ICCV 2013; Laptev et al., Learning realistic human actions from movies, CVPR 2008; Wang et al., Dense trajectories and motion boundary descriptors for action recognition, CVPR 2011; Simonyan and Zisserman, Two-stream convolutional networks for action recognition in videos, NIPS 2014; Tran et al., Learning spatiotemporal features with 3D convolutional networks, ICCV 2015. Assignments: Assignment 3 due. Final project topics are out.	[action_part1 action_part2]
9	Dec 6 Salle Dussane	Weakly-supervised learning; Self-supervised learning; Vision for robotics (I. Laptev). Materials: ECCV 2016; Oquab et al., Is object localization for free? - Weakly-supervised learning with convolutional neural networks, CVPR 2015; Alayrac et al., Unsupervised learning from narrated instruction videos, CVPR 2016; Varol et al., Learning from Synthetic Humans, CVPR 2017; Hasson et al., Learning joint reconstruction of hands and manipulated objects, CVPR 2019; Miech et al., End-to-End Learning of Visual Representations from Uncurated Instructional Videos, CVPR 2020. Final project proposal due.	[weaksup selfsup robotics]
10	Dec 13 Salle Dussane	Generative models (G. Varol); Deep Learning and 3D data (M. Aubry). Materials: Generative models: VAEs: D. Kingma and M. Welling, Auto-Encoding Variational Bayes, ICLR 2014; A. Ramesh et al., Zero-Shot Text-to-Image Generation, ICML 2021. GANs: I. Goodfellow et al., Generative adversarial nets, NIPS 2014; T. Karras et al., A Style-Based Generator Architecture for Generative Adversarial Networks, CVPR 2019. Diffusion: J. Ho et al., Denoising diffusion probabilistic models, NeurIPS 2020; R. Rombach et al., High-Resolution Image Synthesis with Latent Diffusion Models, CVPR 2022. DL and 3D: 1. 3D analysis: Qi et al., Volumetric and multi-view CNNs for object classification on 3D data, CVPR 2016; Qi et al., PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation, CVPR 2017; Groueix et al., 3D-CODED: 3D correspondences by deep deformation, ECCV 2018. 2. 3D generation: Fan et al., A point set generation network for 3D object reconstruction from a single image, CVPR 2017; Groueix et al., AtlasNet: A papier-mâché approach to learning 3D surface generation, CVPR 2018; Park et al., DeepSDF: Learning continuous signed distance functions for shape representation, CVPR 2019; Mildenhall et al., NeRF: Representing scenes as neural radiance fields for view synthesis, ECCV 2020. 3. Training with synthetic data: Tobin et al., Domain randomization for transferring deep neural networks from simulation to the real world, IROS 2017; Torralba and Efros, Unbiased look at dataset bias, CVPR 2011; Ganin et al., Domain-adversarial training of neural networks, JMLR 2016.	[generative_models] [DL_and_3D]
11		Final project presentations and evaluation. Jan 9: 10:30-12:00 and 13:00-16:00; Jan 10: 10:30-12:00 and 13:00-16:00. The presentations will be virtual; links will be provided. Final project reports due Jan 16.

Relevant literature:

[1]	D.A. Forsyth and J. Ponce, "Computer Vision: A Modern Approach", Prentice-Hall, 2nd edition, 2011.
[2]	J. Ponce, M. Hebert, C. Schmid and A. Zisserman, "Toward Category-Level Object Recognition", Lecture Notes in Computer Science 4170, Springer-Verlag, 2007.
[3]	O. Faugeras, Q.T. Luong, and T. Papadopoulo, "Geometry of Multiple Images", MIT Press, 2001.
[4]	R. Hartley and A. Zisserman, "Multiple View Geometry in Computer Vision", Cambridge University Press, 2004.
[5]	J. Koenderink, "Solid Shape", MIT Press, 1990.
[6]	R. Szeliski, "Computer Vision: Algorithms and Applications", 2nd edition, 2022. Online book.