Description:
The final project amounts to 50% of the final grade. You will have the opportunity to choose your own research topic and to work on a method recently published at a top-quality computer vision conference (ECCV, ICCV, CVPR) or journal (IJCV, TPAMI). We also provide a list of interesting topics / papers below. If you would like to work on another topic (not from the list below), which you may have seen during the class or elsewhere, please discuss it with the class instructors (I. Laptev and J. Sivic). You may work alone or in a group of 2-3 people. If working in a group, we expect a more substantial project and an equal contribution from each student in the group.
Your task will be to:
(i) read and understand the research paper,
(ii) implement (a part of) the paper, and
(iii) perform qualitative/quantitative experimental evaluation.
Evaluation and due dates:
Re-using other people’s code:
You can re-use other people’s code. However, you should clearly indicate in your report/presentation what is your own code and what was provided by others (don’t forget to indicate the source). We expect projects to strike a balance between implementation and experimental evaluation. For example, if you implement a difficult algorithm from scratch, only a few qualitative experimental results may suffice. On the other hand, if you rely entirely on someone else’s implementation, we expect a strong quantitative experimental evaluation with analysis of the obtained results and comparison with baseline methods.
Suggested papers / topics:
Below are some suggested papers and topics for the final projects. If you would like to work on a different topic, please discuss your choice with the course instructors (I. Laptev and J. Sivic).
Topic 1. - Spatio-temporal alignment of videos
Paper: Aligning Sequences and Actions by Maximizing Space-Time Correlations, Y. Ukrainitz and M. Irani, ECCV 2006
Project page: http://www.wisdom.weizmann.ac.il/~vision/SpaceTimeCorrelations.html
Description: Implement the spatio-temporal alignment algorithm described in (Ukrainitz and Irani 2006). Demonstrate spatio-temporal alignment on their video sequences available here (focus on alignment of human actions, i.e. you can skip Sections 5.1 and 6 of the paper). Demonstrate spatio-temporal alignment on your own captured videos. For groups of 2-3 people, experiment with different features for alignment, e.g. HOG3D, and apply the resulting alignment cost to action retrieval in the feature-length movie Coffee and Cigarettes. The zip file with annotations is here. The summary of annotations is here. Ask the course instructors for the video.
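To fix ideas, the core of the method is a search for the space-time transformation that maximizes a correlation measure between the two sequences. Below is a minimal, hypothetical sketch of this idea (plain NumPy, illustrative function names, brute-force integer shifts only); the actual paper optimizes a parametric transformation in a coarse-to-fine manner rather than exhaustively.

```python
import numpy as np

def best_spacetime_shift(seq1, seq2, max_dx=8, max_dy=8, max_dt=4):
    # seq1, seq2: float volumes of shape (T, H, W), e.g. gradient magnitudes.
    # Returns the integer shift (dt, dy, dx) of seq2 that maximizes the
    # normalized correlation with seq1 -- a toy stand-in for the parametric,
    # coarse-to-fine optimization of Ukrainitz & Irani.
    a = seq1 - seq1.mean()
    best_score, best_shift = -np.inf, (0, 0, 0)
    for dt in range(-max_dt, max_dt + 1):
        for dy in range(-max_dy, max_dy + 1):
            for dx in range(-max_dx, max_dx + 1):
                shifted = np.roll(seq2, shift=(dt, dy, dx), axis=(0, 1, 2))
                b = shifted - shifted.mean()
                score = (a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
                if score > best_score:
                    best_score, best_shift = score, (dt, dy, dx)
    return best_shift, best_score
```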
Topic 2. - Activities in first-person camera view
Paper: Detecting Activities of Daily Living in First-Person Camera Views
H. Pirsiavash and D. Ramanan, CVPR 2012
Project page: http://deepthought.ics.uci.edu/ADLdataset/adl.html
Description: Continuously wearable (first-person view) cameras such as Google Glass may soon become widespread and will enable new applications such as life-logging or memory enhancement. Computer vision technology to access, organize and search visual content from these types of cameras is currently being developed (see this workshop page for examples of recent work in this direction). This project will investigate recent work on detecting activities of daily living (such as washing dishes, making tea, or watching TV) in video footage from a wearable sensor. The goals of the project are: (i) reproduce results from the paper on their dataset, available here; while the code and intermediate results are available, you may also choose to re-implement parts of this work yourself. (ii) Improve on the reported clip classification and temporal sliding window detection results (Table 3 in the paper) by trying simple extensions of the work. Examples include trying out other kernels (e.g. Hellinger), combining the object-based representation described in the paper with the bag-of-visual-words representation from assignment 2, or improving the temporal detection accuracy with dynamic programming (see for example this paper). (iii) Finally, you can capture your own data and apply the algorithm to your own videos. A wearable camera will be available from the class instructors.
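For the kernel extension in step (ii), a minimal sketch of the Hellinger kernel on bag-of-features histograms is given below (plain NumPy, illustrative function name); equivalently, you can simply square-root the L1-normalized histograms and keep a linear SVM.

```python
import numpy as np

def hellinger_kernel(X, Y):
    # X: (n, V) and Y: (m, V) histograms (rows are clips, columns are bins).
    # K(x, y) = sum_i sqrt(x_i * y_i) on L1-normalized histograms.
    X = X / (X.sum(axis=1, keepdims=True) + 1e-12)
    Y = Y / (Y.sum(axis=1, keepdims=True) + 1e-12)
    return np.sqrt(X) @ np.sqrt(Y).T
```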
Topic 3. - Face recognition with “Tom-vs-Pete” classifiers
Paper: Tom-vs-Pete Classifiers and Identity-Preserving Alignment for Face Verification
T. Berg and P.N. Belhumeur, BMVC 2012.
Related project page: http://www.cs.columbia.edu/CAVE/projects/faceverification/
Description: Face recognition is one of the oldest problems in computer vision and has recently received much attention both in research and industry. Current top-performing techniques take advantage of large amounts of annotated faces (e.g. in the LFW and PubFig datasets) and use them to pre-train discriminative face properties for recognizing new people. In this project you will partly implement and experiment with the recently published “Tom-vs-Pete” classifiers, which achieve top results on the LFW face verification benchmark. You will be given the data and the code for facial feature detection and description. Your task will be to implement and compare techniques in the BMVC12 paper above. The amount of work will be adapted to the number of people working on the project.
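To illustrate the idea, here is a rough, hypothetical sketch (NumPy + scikit-learn, illustrative function names): train many binary classifiers, each separating one reference identity from another, and use the vector of their scores on a new face as an identity-preserving descriptor. Note that the paper derives the final verification decision from a trained classifier over such scores rather than from the plain distance used here.

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_tvp_classifiers(features, identities, pairs):
    # features: (N, D) aligned face descriptors; identities: (N,) labels;
    # pairs: list of (id_a, id_b) reference-identity pairs ("Tom" vs "Pete").
    classifiers = []
    for id_a, id_b in pairs:
        mask = np.isin(identities, [id_a, id_b])
        clf = LinearSVC(C=1.0).fit(features[mask], identities[mask] == id_a)
        classifiers.append(clf)
    return classifiers

def tvp_descriptor(classifiers, feature):
    # Signed distances to all Tom-vs-Pete hyperplanes form the descriptor.
    return np.array([clf.decision_function(feature[None])[0] for clf in classifiers])

def same_person_score(classifiers, f1, f2):
    # Toy verification score: two faces match if their descriptors are close.
    return -np.linalg.norm(tvp_descriptor(classifiers, f1) - tvp_descriptor(classifiers, f2))
```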
Topic 4. - Reconstructing an image from its local descriptors
Paper: P. Weinzaepfel, H. Jegou, and P. Perez, Reconstructing an image from its local descriptors, CVPR 2011
Description: The goal of this project is to (i) implement the image reconstruction method described in Weinzaepfel et al., (ii) demonstrate reconstruction results on several examples similar to those shown in the paper, and (iii) show example reconstructions on several sequences of video frames. You can pick a few example videos from here. For groups of 2-3 people, you should also experiment with reconstructions based on visual vocabularies, rather than nearest-neighbour matching. The goal would be to demonstrate reconstructions of images from the Oxford Buildings dataset. The images and the extracted visual words, including their spatial positions and shape, can be found here. The goal is to reconstruct an image given only its visual word representation (and a database of images with the same visual words extracted). You can experiment with different approaches, such as (i) taking the mean or median representative for each visual word, or (ii) using for reconstruction only a subset of images with high similarity, measured by the normalized scalar product of tf-idf vectors.
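For option (ii), the similarity measure is the standard retrieval score used with bag-of-visual-words representations; a minimal NumPy sketch (illustrative function names, word counts assumed already extracted) could look like this:

```python
import numpy as np

def tfidf_similarities(query_counts, database_counts):
    # query_counts: (V,) visual-word counts of the query image;
    # database_counts: (N, V) counts for the N database images.
    df = (database_counts > 0).sum(axis=0) + 1            # document frequency per word
    idf = np.log(database_counts.shape[0] / df)

    def tfidf(counts):
        tf = counts / (counts.sum() + 1e-12)
        v = tf * idf
        return v / (np.linalg.norm(v) + 1e-12)            # L2 normalization

    q = tfidf(query_counts)
    D = np.vstack([tfidf(row) for row in database_counts])
    return D @ q                                          # one cosine similarity per image
```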
Topic 5. - Feature correspondence by graph matching with geometric invariant potentials
Paper: O. Duchenne, F. Bach, I. Kweon, and J. Ponce, A Tensor-Based Algorithm for High-Order Graph Matching, PAMI 2011
Description: Graph matching is a powerful tool for feature correspondence and object recognition. To successfully apply graph matching to computer vision problems, however, well-defined features (for nodes and edges) and well-designed invariant potentials (measuring how consistent matched edges or nodes are) are essential. Based on these, graph matching algorithms can be used for various specific applications. The goal of this project is to develop robust invariant potentials for image matching, 3D shape matching, or video matching. Students are encouraged to propose several potentials and select the better ones based on their performance on the task. Basic graph matching code is available (here and here), and any feature extractor can be used depending on the specific problem (e.g. 2D interest points, segmentations, superpixels, 3D shape features, or spatio-temporal points). In this project you will: (i) test the basic graph matching code with basic invariant potentials on two sets of synthetically generated 2D and 3D points (as in Duchenne et al.); (ii) develop novel invariant potentials for a specific problem (matching 2D images, 3D shapes, or video streams); (iii) quantitatively evaluate them in comparison with the previous potentials (i.e. those described in Duchenne et al.) on suitable datasets (at least here for 2D images and here for 3D shapes) and qualitatively analyze the successes and errors. Motivated students may further investigate how this approach applies to other specific vision applications beyond general matching.
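As a concrete starting point for step (ii), below is a small, hypothetical sketch (plain NumPy, illustrative function names) of a third-order potential in the spirit of Duchenne et al.: the angles of a triangle formed by three feature points are invariant to similarity transformations, so two point triples can be compared through their angle vectors.

```python
import numpy as np

def triangle_angles(p1, p2, p3):
    # Interior angles of the triangle (p1, p2, p3); invariant to similarity transforms.
    def angle_at(a, b, c):
        u, v = b - a, c - a
        cosang = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)
        return np.arccos(np.clip(cosang, -1.0, 1.0))
    return np.array([angle_at(p1, p2, p3), angle_at(p2, p3, p1), angle_at(p3, p1, p2)])

def third_order_potential(triple_a, triple_b, sigma=0.2):
    # Consistency of matching point triple triple_a (image 1) to triple_b (image 2):
    # a Gaussian similarity between the two triangles' angle vectors.
    diff = triangle_angles(*triple_a) - triangle_angles(*triple_b)
    return np.exp(-np.sum(diff ** 2) / (2 * sigma ** 2))
```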
Topic 6. - Image analogies for non-aligned images
Related Papers:
1) A. Hertzmann, C. Jacobs, N. Oliver, B. Curless, D. Salesin. Image Analogies. SIGGRAPH 2001 Conference Proceedings.
2) Ken Chatfield, James Philbin, Andrew Zisserman. Efficient Retrieval of Deformable Shape Classes Using Local Self-Similarities. ICCV 2009.
3) Eli Shechtman, Michal Irani. Matching Local Self-Similarities Across Images and Videos. CVPR 2007.
4) P. Weinzaepfel, H. Jegou, and P. Perez, Reconstructing an image from its local descriptors, CVPR 2011
Description: Image analogies are a great way to explain / learn relationships between two images. Given, for example, a photograph of the Eiffel tower and a watercolor painting of it, it is easy for the human eye to see the relationship between the two. In fact, you can even find an iPhone application that takes your photograph and converts it into a watercolor painting. Your goal in this project is to *learn* this function so that, given a photograph of another object, you are able to produce its painting. Earlier papers (see [1]) attempted this problem when the two inputs (photograph and painting) were seen from the *exact* same viewpoint. In order to do this, they used the Approximate Nearest Neighbor package (available here, here and here in MATLAB). Your goal would be to extend this approach to *similar* (but *not* exact) viewpoints by using the self-similarity descriptors for matching [2, 3] (a small sketch of this descriptor is given after the step list below). A dataset of famous buildings in Paris is available here. A dataset of watercolor images will be made available to you shortly. The project should proceed in the following steps.
(i) Implementation and testing of the main Image Analogies algorithm. Preliminary results on Paris dataset.
(ii) Implementation and testing of the self-similarity descriptor. Preliminary results on Paris dataset.
(iii) Use of the self-similarity descriptor to create a *correspondence map*, similar to what optical flow might give you. You are encouraged to interact regularly with the instructors to decide and work out the exact approach for this part.
(iv) Use of this map and dataset to generate watercolor paintings of other famous landmarks.
(v) Use of this map to recreate the *viewpoint* change between input photo/painting on the new photograph.
(vi) Experiments with the current setting and extensions to other styles of painting (line drawings, frescoes), other objects (such as faces), etc.
(vii)** Can effects like this be produced by this approach, using the self-similarity descriptor for the transformation?
As this is an exploratory project, students are encouraged to gather their own dataset, or to explore better methods for establishing the correspondence map required to perform this task. You are also encouraged to extend this approach to videos, and to keep in frequent contact with the instructors / project guides for any queries or problems.
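For steps (ii) and (iii) above, a rough, hypothetical sketch of the local self-similarity descriptor of Shechtman & Irani [3] is given below (plain NumPy, illustrative names and simplified parameters; assumes a grayscale image and a center pixel far enough from the border): correlate the small central patch with every patch in a surrounding region, then pool the resulting correlation surface into log-polar bins.

```python
import numpy as np

def self_similarity_descriptor(img, cx, cy, patch=5, region=40,
                               n_angles=12, n_radii=3, var_noise=25.0):
    # img: 2D float array; (cx, cy): center pixel, assumed to lie at least
    # (region + patch) / 2 pixels away from every image border.
    half_p, half_r = patch // 2, region // 2
    center = img[cy - half_p:cy + half_p + 1, cx - half_p:cx + half_p + 1]

    ys, xs = np.mgrid[-half_r:half_r + 1, -half_r:half_r + 1]
    corr = np.zeros(ys.shape)
    for i in range(ys.shape[0]):
        for j in range(ys.shape[1]):
            y, x = cy + ys[i, j], cx + xs[i, j]
            p = img[y - half_p:y + half_p + 1, x - half_p:x + half_p + 1]
            corr[i, j] = np.exp(-np.sum((p - center) ** 2) / var_noise)  # SSD -> similarity

    # Pool into log-polar (angle, radius) bins, keeping the max correlation per bin.
    radius = np.hypot(ys, xs)
    angle = np.mod(np.arctan2(ys, xs), 2 * np.pi)
    r_edges = np.logspace(0, np.log10(half_r), n_radii + 1)
    desc = np.zeros((n_angles, n_radii))
    for a in range(n_angles):
        for r in range(n_radii):
            in_bin = ((angle >= 2 * np.pi * a / n_angles) &
                      (angle < 2 * np.pi * (a + 1) / n_angles) &
                      (radius >= r_edges[r]) & (radius < r_edges[r + 1]))
            if in_bin.any():
                desc[a, r] = corr[in_bin].max()
    desc = desc.ravel()
    return (desc - desc.min()) / (desc.max() - desc.min() + 1e-8)  # stretch to [0, 1]
```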
Your own chosen topic.
You can also choose your own topic, e.g. based on a paper which has been discussed in the class. Please validate the topic with the course instructors (I. Laptev or J. Sivic) first. You can discuss the topic with the course instructors after the class, or email Ivan.Laptev@ens.fr or Josef.Sivic@ens.fr. Below is a short list of recent papers that may provide motivation:
Joint topics with the “Introduction to graphical model” class (F. Bach and G. Obozinski).
A joint project between the two classes is expected to be more substantial and will have a strong machine learning as well as computer vision component. Please contact the instructors of both courses if you are interested in a joint project. We will discuss and adjust the requirements from each course depending on the size of the group.
Example topics:
Topic J1 - Hierarchical Context
Paper: Exploiting Hierarchical Context on a Large Database of Object Categories
Myung Jin Choi, Joseph Lim, Antonio Torralba, and Alan S. Willsky, CVPR 2010
http://people.csail.mit.edu/torralba/publications/hcontext.pdf
Topic J2 - Tracking objects
Paper: Globally-Optimal Greedy Algorithms for Tracking a Variable Number of Objects
H. Pirsiavash, D. Ramanan, and C. Fowlkes, CVPR 2011
http://www.ics.uci.edu/~dramanan/papers/tracking2011.pdf
Topic J3 - Activity forecasting
Paper: Activity Forecasting
Kris M. Kitani, Brian D. Ziebart, Drew Bagnell, and Martial Hebert, ECCV 2012
Project page: http://www.cs.cmu.edu/~kkitani/ActivityForecasting.html
This topic is particularly suitable for someone also taking the “Reinforcement learning” class by Remi Munos.
You can also define your own topic for a joint project between the two classes. You need to validate the topic with the instructors for both courses.
Send the pdf file of your report to Ivan Laptev <Ivan.Laptev@ens.fr>.