
Object recognition and computer vision 2009/2010


Assignment 3: Bag-of-Features Image Classification

Jean Ponce, Ivan Laptev, Cordelia Schmid and Josef Sivic

(adapted from S. Lazebnik, UNC)

Due date: December 15th 2009


The Data (13 MB) (source: Caltech Vision Group)

The goal of the assignment is to implement a simple system for bag-of-features image classification. The system should perform four-class classification; the four classes are airplanes, motorbikes, faces, and cars. The data file contains training and test subdirectories for each category: each test subdirectory contains 50 images, and each training subdirectory contains 100 images. You must test your system on all the test images.

System Outline and Implementation Details

  1. Feature extraction. You can use one of the two following methods. When setting the parameters of the detector, make sure you get at least a few hundred regions per image.

  2. Feature description. Use this code for computing 128-dimensional SIFT descriptors of circular regions, such as the ones returned by the blob detector from Assignment 1. Note that this code is not rotation-invariant, i.e., it does not normalize the patches by rotating them so that the horizontal direction is aligned with the dominant gradient orientation of the patch. However, rotation invariance is not really necessary for this assignment.

  3. Dictionary computation. Run k-means clustering on a subset of all training features to learn the dictionary centers. For k-means you can use the following Matlab/C code; compile it by running "mex vgg_kmiter.cxx" in the Matlab command window. Set the dictionary size to about 500. New: if you have problems compiling the above k-means code, try using this alternative k-means function.

  4. Feature quantization and histogram computation. For each feature in a training or test image, find the index of the nearest codevector in the dictionary. You may want to use this code for fast computation of squared Euclidean distances between two sets of vectors (i.e., all descriptors in an image and the codebook). After quantization, represent each image by the histogram of its visual-word indices (see MATLAB's "hist" function). Because different images can have different numbers of features, normalize each histogram to sum to one.

  5. Classifier training. Implement the nearest-neighbor (NN) classifier, which assigns each test image to one of the four classes. Use the Chi2 distance as discussed in class; you can use the following function to compute it.

  6. Baseline. As a baseline, convert each image to grayscale and subsample it to 25x25 pixels, resulting in a 625-dimensional descriptor. You can use the Matlab function "imresize" for the sub-sampling. Use this simple descriptor with the nearest-neighbor classifier and the standard L2 distance, and compare its performance to the bag-of-features representation.
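To illustrate the feature-extraction step, here is a pure-NumPy difference-of-Gaussians blob detector sketch. This is not the Assignment 1 code (which you should use in practice); the function names, scales, and threshold are all illustrative choices.

```python
import numpy as np

def gauss1d(sigma):
    """Normalized 1-D Gaussian kernel with radius 3*sigma."""
    r = int(3 * sigma)
    x = np.arange(-r, r + 1)
    k = np.exp(-x**2 / (2.0 * sigma**2))
    return k / k.sum()

def smooth(img, sigma):
    """Separable Gaussian smoothing: filter columns, then rows."""
    k = gauss1d(sigma)
    tmp = np.apply_along_axis(lambda m: np.convolve(m, k, mode="same"), 0, img)
    return np.apply_along_axis(lambda m: np.convolve(m, k, mode="same"), 1, tmp)

def dog_blobs(img, sigmas, thresh):
    """Detect blobs as spatial local maxima of |DoG| responses."""
    stack = np.array([smooth(img, s) for s in sigmas])
    dog = stack[1:] - stack[:-1]          # difference of adjacent scales
    blobs = []
    for s in range(dog.shape[0]):
        r = np.abs(dog[s])
        for i in range(1, r.shape[0] - 1):
            for j in range(1, r.shape[1] - 1):
                patch = r[i - 1:i + 2, j - 1:j + 2]
                if r[i, j] >= thresh and r[i, j] == patch.max():
                    blobs.append((i, j, sigmas[s]))
    return blobs

# toy example: a single synthetic Gaussian blob centered at (20, 20)
y, x = np.mgrid[0:40, 0:40]
img = np.exp(-((x - 20.0)**2 + (y - 20.0)**2) / (2 * 2.0**2))
blobs = dog_blobs(img, sigmas=[1.0, 2.0, 4.0], thresh=0.05)
```

On real images, the threshold controls how many regions survive; tune it until you get at least a few hundred per image, as required above.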
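The dictionary-computation step can be sketched in Python/NumPy as plain Lloyd's-iteration k-means (the assignment itself uses the provided vgg_kmiter MATLAB/C code; the toy 2-D data below stands in for 128-D SIFT descriptors):

```python
import numpy as np

def kmeans(X, k, n_iter=20, seed=0):
    """Cluster the rows of X into k centers with Lloyd's algorithm."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # squared Euclidean distance from every point to every center
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):           # skip empty clusters
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

# toy data: two well-separated 2-D clusters
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
centers, labels = kmeans(X, k=2)
```

For the assignment, X would hold a subset of all training descriptors and k would be about 500.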
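The quantization and histogram step can be sketched as follows; the distance computation uses the expansion ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2, which is the same trick the provided fast-distance code exploits. The tiny 2-D codebook is illustrative only:

```python
import numpy as np

def quantize(descriptors, codebook):
    """Index of the nearest codevector for each descriptor row,
    via ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2 (no explicit loops)."""
    d2 = (
        (descriptors ** 2).sum(axis=1, keepdims=True)
        - 2.0 * descriptors @ codebook.T
        + (codebook ** 2).sum(axis=1)
    )
    return d2.argmin(axis=1)

def bof_histogram(descriptors, codebook):
    """Bag-of-features histogram of visual-word indices, normalized
    to sum to one so images with different feature counts compare."""
    words = quantize(descriptors, codebook)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

# toy example: 2 codevectors, 4 descriptors (2 near each codevector)
codebook = np.array([[0.0, 0.0], [10.0, 10.0]])
desc = np.array([[0.1, 0.2], [9.5, 10.1], [0.0, 1.0], [8.0, 9.0]])
h = bof_histogram(desc, codebook)   # -> [0.5, 0.5]
```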
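The classifier-training step can be sketched like this, using one common convention for the Chi2 distance (a 0.5 factor and a small epsilon to avoid division by zero; check it against the function handed out in class). The three-bin histograms and labels are made up for illustration:

```python
import numpy as np

def chi2_distance(h1, h2, eps=1e-10):
    """Chi-squared distance between two normalized histograms."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def nn_classify(test_hist, train_hists, train_labels):
    """Label of the training histogram nearest in Chi2 distance."""
    d = [chi2_distance(test_hist, h) for h in train_hists]
    return train_labels[int(np.argmin(d))]

# toy example: two training histograms with known labels
train = np.array([[0.8, 0.2, 0.0], [0.1, 0.1, 0.8]])
labels = ["airplane", "face"]
pred = nn_classify(np.array([0.7, 0.3, 0.0]), train, labels)  # -> "airplane"
```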
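The baseline descriptor can be sketched in NumPy with block averaging standing in for MATLAB's "imresize" (the averaging scheme is an assumption; imresize's interpolation differs slightly but the resulting 625-dimensional descriptor plays the same role):

```python
import numpy as np

def tiny_descriptor(gray, size=25):
    """Subsample a grayscale image to size x size by block averaging,
    then flatten to a size*size-dimensional descriptor."""
    h, w = gray.shape
    rows = np.linspace(0, h, size + 1).astype(int)
    cols = np.linspace(0, w, size + 1).astype(int)
    out = np.empty((size, size))
    for i in range(size):
        for j in range(size):
            out[i, j] = gray[rows[i]:rows[i + 1],
                             cols[j]:cols[j + 1]].mean()
    return out.ravel()

# toy 100x100 "image": each 4x4 block is averaged into one cell
img = np.arange(100 * 100, dtype=float).reshape(100, 100)
d = tiny_descriptor(img)
```

Feed these descriptors to the same nearest-neighbor classifier, but with the L2 distance instead of Chi2, to get the baseline numbers to compare against.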

What to hand in

You should prepare a (very brief) report including the following:


Instructions for formatting and handing-in assignments:



Send the pdf file of your report and the zipped code as two separate files to Josef Sivic <Josef.Sivic@ens.fr>.