Description:
The final project amounts to 50% of the final grade. You will have the opportunity to choose your own research topic and to work on a method recently published at a top-quality computer vision conference (ECCV, ICCV, CVPR) or journal (IJCV, TPAMI). We also provide a list of interesting topics / papers below. If you would like to work on another topic (not from the list below), which you may have seen during the class or elsewhere, please discuss it with the class instructors (I. Laptev and J. Sivic). You may work alone or in a group of 2-3 people. If working in a group, we expect a more substantial project and an equal contribution from each student in the group.
Your task will be to:
(i) read and understand the research paper,
(ii) implement (a part of) the paper, and
(iii) perform qualitative/quantitative experimental evaluation.
Evaluation and due dates:
Re-using other people’s code:
You can re-use other people’s code. However, you should clearly indicate in your report/presentation what is your own code and what was provided by others (don’t forget to indicate the source). We expect projects to strike a balance between implementation and experimental evaluation. For example, if you implement a difficult algorithm from scratch, only a few qualitative experimental results may suffice. On the other hand, if you rely entirely on someone else’s implementation, we expect a strong quantitative experimental evaluation with analysis of the obtained results and comparison with baseline methods.
Suggested papers / topics:
Below are some suggested papers and topics for the final projects. If you would like to work on a different topic, please discuss your choice with the course instructors (I. Laptev and J. Sivic).
Topic 1. - Spatio-temporal alignment of videos
Paper: Aligning Sequences and Actions by Maximizing Space-Time Correlations, Y. Ukrainitz and M. Irani, ECCV 2006
Project page: http://www.wisdom.weizmann.ac.il/~vision/SpaceTimeCorrelations.html
Description: Implement the spatio-temporal alignment algorithm described in (Ukrainitz and Irani 2006). Demonstrate spatio-temporal alignment on their video sequences available here (focus on alignment of human actions, i.e. you can skip Sections 5.1 and 6 of the paper). Demonstrate spatio-temporal alignment on your own captured videos. For groups of 2-3 people, experiment with different features for alignment, e.g. HOG3D, and apply the resulting alignment cost to action retrieval in the feature-length movie Coffee and Cigarettes. The zip file with annotations is here. The summary of annotations is here. Ask the course instructors for the video.
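To fix ideas, the core of the method is a search for the space-time transformation that maximizes a correlation measure between the two sequences. Below is a minimal, hypothetical sketch of this idea (plain NumPy, illustrative function names, brute-force integer shifts only); the actual paper optimizes a parametric transformation in a coarse-to-fine manner rather than exhaustively.

```python
import numpy as np

def best_spacetime_shift(seq1, seq2, max_dx=8, max_dy=8, max_dt=4):
    # seq1, seq2: float volumes of shape (T, H, W), e.g. gradient magnitudes.
    # Returns the integer shift (dt, dy, dx) of seq2 that maximizes the
    # normalized correlation with seq1 -- a toy stand-in for the parametric,
    # coarse-to-fine optimization of Ukrainitz & Irani.
    a = seq1 - seq1.mean()
    best_score, best_shift = -np.inf, (0, 0, 0)
    for dt in range(-max_dt, max_dt + 1):
        for dy in range(-max_dy, max_dy + 1):
            for dx in range(-max_dx, max_dx + 1):
                shifted = np.roll(seq2, shift=(dt, dy, dx), axis=(0, 1, 2))
                b = shifted - shifted.mean()
                score = (a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
                if score > best_score:
                    best_score, best_shift = score, (dt, dy, dx)
    return best_shift, best_score
```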
Topic 2. - Activities in first-person camera view
Paper: Detecting Activities of Daily Living in First-Person Camera Views
H. Pirsiavash and D. Ramanan, CVPR 2012
Project page: http://deepthought.ics.uci.edu/ADLdataset/adl.html
Description: Continuously wearable (first-person view) cameras such as Google Glass may soon become widespread and will enable new applications such as life-logging or memory enhancement. Computer vision technology to access, organize and search visual content from these types of cameras is currently being developed (see this workshop page for examples of recent work in this direction). This project will investigate recent work on detecting activities of daily living (such as washing dishes, making tea, or watching TV) in video footage from a wearable sensor. The goals of the project are: (i) reproduce results from the paper on their dataset, available here; while the code and intermediate results are available, you may also choose to re-implement parts of this work yourself. (ii) Improve on the reported clip classification and temporal sliding window detection results (Table 3 in the paper) by trying simple extensions of the work. Examples include trying out other kernels (e.g. Hellinger), combining the object-based representation described in the paper with the bag-of-visual-words representation from assignment 2, or improving the temporal detection accuracy with dynamic programming (see for example this paper). (iii) Finally, you can capture your own data and apply the algorithm to your own videos. A wearable camera will be available from the class instructors.
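For the kernel extension in step (ii), a minimal sketch of the Hellinger kernel on bag-of-features histograms is given below (plain NumPy, illustrative function name); equivalently, you can simply square-root the L1-normalized histograms and keep a linear SVM.

```python
import numpy as np

def hellinger_kernel(X, Y):
    # X: (n, V) and Y: (m, V) histograms (rows are clips, columns are bins).
    # K(x, y) = sum_i sqrt(x_i * y_i) on L1-normalized histograms.
    X = X / (X.sum(axis=1, keepdims=True) + 1e-12)
    Y = Y / (Y.sum(axis=1, keepdims=True) + 1e-12)
    return np.sqrt(X) @ np.sqrt(Y).T
```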
Topic 3. - Face recognition with “Tom-vs-Pete” classifiers
Paper: Tom-vs-Pete Classifiers and Identity-Preserving Alignment for Face Verification
T. Berg and P.N. Belhumeur, BMVC 2012.
Related project page: http://www.cs.columbia.edu/CAVE/projects/faceverification/
Description: Face recognition is one of the oldest problems in computer vision and has recently received much attention both in research and industry. Current top-performing techniques take advantage of large amounts of annotated faces (e.g. in the LFW and PubFig datasets) and use them to pre-train discriminative face properties for recognizing new people. In this project you will partly implement and experiment with the recently published “Tom-vs-Pete” classifiers, which achieve top results on the LFW face verification benchmark. You will be given the data and the code for facial feature detection and description. Your task will be to implement and compare techniques in the BMVC12 paper above. The amount of work will be adapted to the number of people working on the project.
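To illustrate the idea, here is a rough, hypothetical sketch (NumPy + scikit-learn, illustrative function names): train many binary classifiers, each separating one reference identity from another, and use the vector of their scores on a new face as an identity-preserving descriptor. Note that the paper derives the final verification decision from a trained classifier over such scores rather than from the plain distance used here.

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_tvp_classifiers(features, identities, pairs):
    # features: (N, D) aligned face descriptors; identities: (N,) labels;
    # pairs: list of (id_a, id_b) reference-identity pairs ("Tom" vs "Pete").
    classifiers = []
    for id_a, id_b in pairs:
        mask = np.isin(identities, [id_a, id_b])
        clf = LinearSVC(C=1.0).fit(features[mask], identities[mask] == id_a)
        classifiers.append(clf)
    return classifiers

def tvp_descriptor(classifiers, feature):
    # Signed distances to all Tom-vs-Pete hyperplanes form the descriptor.
    return np.array([clf.decision_function(feature[None])[0] for clf in classifiers])

def same_person_score(classifiers, f1, f2):
    # Toy verification score: two faces match if their descriptors are close.
    return -np.linalg.norm(tvp_descriptor(classifiers, f1) - tvp_descriptor(classifiers, f2))
```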
Topic 4. - Reconstructing an image from its local descriptors
Paper: P. Weinzaepfel, H. Jegou, and P. Perez, Reconstructing an image from its local descriptors, CVPR 2011
Description: The goal of this project is to (i) implement the image reconstruction method described in Weinzaepfel et al., (ii) demonstrate reconstruction results on several examples similar to those shown in the paper, and (iii) show example reconstructions on several sequences of video frames. You can pick a few example videos from here. For groups of 2-3 people, you should also experiment with reconstructions based on visual vocabularies, rather than nearest-neighbour matching. The goal would be to demonstrate reconstructions of images from the Oxford Buildings dataset. The images and the extracted visual words, including their spatial positions and shape, can be found here. The goal is to reconstruct an image given only its visual word representation (and a database of images with the same visual words extracted). You can experiment with different approaches, such as (i) taking the mean or median representative for each visual word, or (ii) using for reconstruction only a subset of images with high similarity, measured by the normalized scalar product of tf-idf vectors.
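For option (ii), the similarity measure is the standard retrieval score used with bag-of-visual-words representations; a minimal NumPy sketch (illustrative function names, word counts assumed already extracted) could look like this:

```python
import numpy as np

def tfidf_similarities(query_counts, database_counts):
    # query_counts: (V,) visual-word counts of the query image;
    # database_counts: (N, V) counts for the N database images.
    df = (database_counts > 0).sum(axis=0) + 1            # document frequency per word
    idf = np.log(database_counts.shape[0] / df)

    def tfidf(counts):
        tf = counts / (counts.sum() + 1e-12)
        v = tf * idf
        return v / (np.linalg.norm(v) + 1e-12)            # L2 normalization

    q = tfidf(query_counts)
    D = np.vstack([tfidf(row) for row in database_counts])
    return D @ q                                          # one cosine similarity per image
```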
Topic 5. - Feature correspondence by graph matching with geometric invariant potentials
Paper: O. Duchenne, F. Bach, I. Kweon, and J. Ponce, A Tensor-Based Algorithm for High-Order Graph Matching, PAMI 2011
Description: Graph matching is a powerful tool for feature correspondence and object recognition. To successfully apply graph matching to computer vision problems, however, well-defined features (for nodes and edges) and well-designed invariant potentials (measuring how consistent matched edges or nodes are) are essential. Based on these, graph matching algorithms can be used for various specific applications. The goal of this project is to develop robust invariant potentials for image matching, 3D shape matching, or video matching. Students are encouraged to propose several potentials and select the better ones based on their performance on the task. Basic graph matching code is available (here and here), and any feature extractor can be used depending on the specific problem (e.g. 2D interest points, segmentations, superpixels, 3D shape features, or spatio-temporal points). In this project you will: (i) test the basic graph matching code with basic invariant potentials on two sets of synthetically generated 2D and 3D points (as in Duchenne et al.); (ii) develop novel invariant potentials for a specific problem (matching 2D images, 3D shapes, or video streams); (iii) quantitatively evaluate them in comparison with the previous potentials (i.e. those described in Duchenne et al.) on suitable datasets (at least here for 2D images and here for 3D shapes) and qualitatively analyze the successes and errors. Motivated students may further investigate how this approach applies to other specific vision applications beyond general matching.
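As a concrete starting point for step (ii), below is a small, hypothetical sketch (plain NumPy, illustrative function names) of a third-order potential in the spirit of Duchenne et al.: the angles of a triangle formed by three feature points are invariant to similarity transformations, so two point triples can be compared through their angle vectors.

```python
import numpy as np

def triangle_angles(p1, p2, p3):
    # Interior angles of the triangle (p1, p2, p3); invariant to similarity transforms.
    def angle_at(a, b, c):
        u, v = b - a, c - a
        cosang = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)
        return np.arccos(np.clip(cosang, -1.0, 1.0))
    return np.array([angle_at(p1, p2, p3), angle_at(p2, p3, p1), angle_at(p3, p1, p2)])

def third_order_potential(triple_a, triple_b, sigma=0.2):
    # Consistency of matching point triple triple_a (image 1) to triple_b (image 2):
    # a Gaussian similarity between the two triangles' angle vectors.
    diff = triangle_angles(*triple_a) - triangle_angles(*triple_b)
    return np.exp(-np.sum(diff ** 2) / (2 * sigma ** 2))
```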
Topic 6. - Image analogies for non-aligned images
Related Papers:
1) A. Hertzmann, C. Jacobs, N. Oliver, B. Curless, D. Salesin. Image Analogies. SIGGRAPH 2001 Conference Proceedings.
2) Ken Chatfield, James Philbin, Andrew Zisserman. Efficient Retrieval of Deformable Shape Classes Using Local Self-Similarities. ICCV 2009.
3) Eli Shechtman, Michal Irani. Matching Local Self-Similarities Across Images and Videos. CVPR 2007.
4) P. Weinzaepfel, H. Jegou, and P. Perez, Reconstructing an image from its local descriptors, CVPR 2011
Description: Image analogies are a great way to explain / learn relationships between two images. Given, for example, a photograph of the Eiffel tower and a watercolor painting of it, it is easy for the human eye to see the relationship between the two. In fact, you can even find an iPhone application that takes your photograph and converts it into a watercolor painting. Your goal in this project is to *learn* this function so that, given a photograph of another object, you are able to produce its painting. Earlier papers (see [1]) attempted this problem when the two inputs (photograph and painting) were seen from the *exact* same viewpoint. In order to do this, they used the Approximate Nearest Neighbor package (available here, here and here in MATLAB). Your goal would be to extend this approach to *similar* (but *not* exact) viewpoints by using the self-similarity descriptors for matching [2, 3] (a small sketch of this descriptor is given after the step list below). A dataset of famous buildings in Paris is available here. A dataset of watercolor images will be made available to you shortly. The project should proceed in the following steps.
(i) Implementation and testing of the main Image Analogies algorithm. Preliminary results on Paris dataset.
(ii) Implementation and testing of the self-similarity descriptor. Preliminary results on Paris dataset.
(iii) Use of the self-similarity descriptor to create a *correspondence map*, similar to what optical flow might give you. You are encouraged to interact regularly with the instructors to decide and work out the exact approach for this part.
(iv) Use of this map and dataset to generate watercolor paintings of other famous landmarks.
(v) Use of this map to recreate the *viewpoint* change between input photo/painting on the new photograph.
(vi) Experiments with the current setting and extensions to other styles of painting (line drawings, frescoes), other objects (such as faces), etc.
(vii)** Can effects like this be produced by this approach, using the self-similarity descriptor for the transformation?
As this is an exploratory project, students are encouraged to gather their own dataset, or to explore better methods for establishing the correspondence map required to perform this task. You are also encouraged to extend this approach to videos, and to keep in frequent contact with the instructors / project guides for any queries or problems.
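For steps (ii) and (iii) above, a rough, hypothetical sketch of the local self-similarity descriptor of Shechtman & Irani [3] is given below (plain NumPy, illustrative names and simplified parameters; assumes a grayscale image and a center pixel far enough from the border): correlate the small central patch with every patch in a surrounding region, then pool the resulting correlation surface into log-polar bins.

```python
import numpy as np

def self_similarity_descriptor(img, cx, cy, patch=5, region=40,
                               n_angles=12, n_radii=3, var_noise=25.0):
    # img: 2D float array; (cx, cy): center pixel, assumed to lie at least
    # (region + patch) / 2 pixels away from every image border.
    half_p, half_r = patch // 2, region // 2
    center = img[cy - half_p:cy + half_p + 1, cx - half_p:cx + half_p + 1]

    ys, xs = np.mgrid[-half_r:half_r + 1, -half_r:half_r + 1]
    corr = np.zeros(ys.shape)
    for i in range(ys.shape[0]):
        for j in range(ys.shape[1]):
            y, x = cy + ys[i, j], cx + xs[i, j]
            p = img[y - half_p:y + half_p + 1, x - half_p:x + half_p + 1]
            corr[i, j] = np.exp(-np.sum((p - center) ** 2) / var_noise)  # SSD -> similarity

    # Pool into log-polar (angle, radius) bins, keeping the max correlation per bin.
    radius = np.hypot(ys, xs)
    angle = np.mod(np.arctan2(ys, xs), 2 * np.pi)
    r_edges = np.logspace(0, np.log10(half_r), n_radii + 1)
    desc = np.zeros((n_angles, n_radii))
    for a in range(n_angles):
        for r in range(n_radii):
            in_bin = ((angle >= 2 * np.pi * a / n_angles) &
                      (angle < 2 * np.pi * (a + 1) / n_angles) &
                      (radius >= r_edges[r]) & (radius < r_edges[r + 1]))
            if in_bin.any():
                desc[a, r] = corr[in_bin].max()
    desc = desc.ravel()
    return (desc - desc.min()) / (desc.max() - desc.min() + 1e-8)  # stretch to [0, 1]
```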
Your own chosen topic.
You can also choose your own topic, e.g. based on a paper which has been discussed in the class. Please validate the topic with the course instructors (I. Laptev or J. Sivic) first. You can discuss the topic with the course instructors after the class, or email Ivan.Laptev@ens.fr or Josef.Sivic@ens.fr. Below is a short list of recent papers that may provide motivation:
Joint topics with the “Introduction to graphical model” class (F. Bach and G. Obozinski).
A joint project between the two classes is expected to be more substantial and will have a strong machine learning as well as computer vision component. Please contact the instructors of both courses if you are interested in a joint project. We will discuss and adjust the requirements from each course depending on the size of the group.
Example topics:
Topic J1 - Hierarchical Context
Paper: Exploiting Hierarchical Context on a Large Database of Object Categories
Myung Jin Choi, Joseph Lim, Antonio Torralba, and Alan S. Willsky, CVPR 2010
http://people.csail.mit.edu/torralba/publications/hcontext.pdf
Topic J2 - Tracking objects
Paper: Globally-Optimal Greedy Algorithms for Tracking a Variable Number of Objects
H. Pirsiavash, D. Ramanan, and C. Fowlkes, CVPR 2011
http://www.ics.uci.edu/~dramanan/papers/tracking2011.pdf
Topic J3 - Activity forecasting
Paper: Activity Forecasting
Kris M. Kitani, Brian D. Ziebart, Drew Bagnell, and Martial Hebert, ECCV 2012
Project page: http://www.cs.cmu.edu/~kkitani/ActivityForecasting.html
This topic is particularly suitable for someone also taking the “Reinforcement learning” class by Remi Munos.
You can also define your own topic for a joint project between the two classes. You need to validate the topic with the instructors for both courses.
Send the pdf file of your report to Ivan Laptev <Ivan.Laptev@ens.fr>.