
Object recognition and computer vision 2009/2010


Assignment 3: Bag-of-Features Image Classification

Jean Ponce, Ivan Laptev, Cordelia Schmid and Josef Sivic

(adapted from S. Lazebnik, UNC)

Due date: December 15th 2009


The Data (13 MB) (source: Caltech Vision Group)

The goal of the assignment is to implement a simple system for bag-of-features image classification. The system should perform four-class classification; the four classes are airplanes, motorbikes, faces, and cars. The data file contains training and test subdirectories for each category: each test subdirectory contains 50 images, and each training subdirectory contains 100 images. You must test your system on all the test images.

System Outline and Implementation Details

  1. Feature extraction. You can use one of the two following methods. When setting the parameters of the detector, make sure you get at least a few hundred regions per image.

  2. Feature description. Use this code for computing 128-dimensional SIFT descriptors of circular regions, such as the ones returned by the blob detector from Assignment 1. Note that this code is not rotation-invariant, i.e., it does not normalize the patches by rotating them so that the horizontal direction is aligned with the dominant gradient orientation of the patch. However, rotation invariance is not really necessary for this assignment.

  3. Dictionary computation. Run k-means clustering on a subset of all training features to learn the dictionary centers. For k-means you can use the following Matlab/C code; compile it by running "mex vgg_kmiter.cxx" in the Matlab command window. Set the dictionary size to about 500. New: if you have problems compiling the above k-means code, try using this alternative k-means function.

  4. Feature quantization and histogram computation. For each feature in a training or test image, find the index of the nearest codevector in the dictionary. You may want to use this code for fast computation of squared Euclidean distances between two sets of vectors (i.e., all descriptors in an image and the codebook). After quantization, represent each image by the histogram of its visual-word indices (see MATLAB's "hist" function). Because different images can have different numbers of features, normalize each histogram to sum to one.

  5. Classifier training. Implement the nearest-neighbor (NN) classifier, which assigns each test image to one of the four classes. Use the Chi2 distance as discussed in class; you can use the following function to compute it.

  6. Baseline. As a baseline, convert each image to grayscale and subsample it to 25x25 pixels, resulting in a 625-dimensional descriptor. You can use the Matlab function "imresize" for the sub-sampling. Use this simple descriptor with the nearest-neighbor classifier and the standard L2 distance, and compare its performance to the bag-of-features representation.
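To illustrate the feature-extraction step, here is a pure-NumPy difference-of-Gaussians blob detector sketch. This is not the Assignment 1 code (which you should use in practice); the function names, scales, and threshold are all illustrative choices.

```python
import numpy as np

def gauss1d(sigma):
    """Normalized 1-D Gaussian kernel with radius 3*sigma."""
    r = int(3 * sigma)
    x = np.arange(-r, r + 1)
    k = np.exp(-x**2 / (2.0 * sigma**2))
    return k / k.sum()

def smooth(img, sigma):
    """Separable Gaussian smoothing: filter columns, then rows."""
    k = gauss1d(sigma)
    tmp = np.apply_along_axis(lambda m: np.convolve(m, k, mode="same"), 0, img)
    return np.apply_along_axis(lambda m: np.convolve(m, k, mode="same"), 1, tmp)

def dog_blobs(img, sigmas, thresh):
    """Detect blobs as spatial local maxima of |DoG| responses."""
    stack = np.array([smooth(img, s) for s in sigmas])
    dog = stack[1:] - stack[:-1]          # difference of adjacent scales
    blobs = []
    for s in range(dog.shape[0]):
        r = np.abs(dog[s])
        for i in range(1, r.shape[0] - 1):
            for j in range(1, r.shape[1] - 1):
                patch = r[i - 1:i + 2, j - 1:j + 2]
                if r[i, j] >= thresh and r[i, j] == patch.max():
                    blobs.append((i, j, sigmas[s]))
    return blobs

# toy example: a single synthetic Gaussian blob centered at (20, 20)
y, x = np.mgrid[0:40, 0:40]
img = np.exp(-((x - 20.0)**2 + (y - 20.0)**2) / (2 * 2.0**2))
blobs = dog_blobs(img, sigmas=[1.0, 2.0, 4.0], thresh=0.05)
```

On real images, the threshold controls how many regions survive; tune it until you get at least a few hundred per image, as required above.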
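The dictionary-computation step can be sketched in Python/NumPy as plain Lloyd's-iteration k-means (the assignment itself uses the provided vgg_kmiter MATLAB/C code; the toy 2-D data below stands in for 128-D SIFT descriptors):

```python
import numpy as np

def kmeans(X, k, n_iter=20, seed=0):
    """Cluster the rows of X into k centers with Lloyd's algorithm."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # squared Euclidean distance from every point to every center
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):           # skip empty clusters
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

# toy data: two well-separated 2-D clusters
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
centers, labels = kmeans(X, k=2)
```

For the assignment, X would hold a subset of all training descriptors and k would be about 500.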
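The quantization and histogram step can be sketched as follows; the distance computation uses the expansion ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2, which is the same trick the provided fast-distance code exploits. The tiny 2-D codebook is illustrative only:

```python
import numpy as np

def quantize(descriptors, codebook):
    """Index of the nearest codevector for each descriptor row,
    via ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2 (no explicit loops)."""
    d2 = (
        (descriptors ** 2).sum(axis=1, keepdims=True)
        - 2.0 * descriptors @ codebook.T
        + (codebook ** 2).sum(axis=1)
    )
    return d2.argmin(axis=1)

def bof_histogram(descriptors, codebook):
    """Bag-of-features histogram of visual-word indices, normalized
    to sum to one so images with different feature counts compare."""
    words = quantize(descriptors, codebook)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

# toy example: 2 codevectors, 4 descriptors (2 near each codevector)
codebook = np.array([[0.0, 0.0], [10.0, 10.0]])
desc = np.array([[0.1, 0.2], [9.5, 10.1], [0.0, 1.0], [8.0, 9.0]])
h = bof_histogram(desc, codebook)   # -> [0.5, 0.5]
```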
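The classifier-training step can be sketched like this, using one common convention for the Chi2 distance (a 0.5 factor and a small epsilon to avoid division by zero; check it against the function handed out in class). The three-bin histograms and labels are made up for illustration:

```python
import numpy as np

def chi2_distance(h1, h2, eps=1e-10):
    """Chi-squared distance between two normalized histograms."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def nn_classify(test_hist, train_hists, train_labels):
    """Label of the training histogram nearest in Chi2 distance."""
    d = [chi2_distance(test_hist, h) for h in train_hists]
    return train_labels[int(np.argmin(d))]

# toy example: two training histograms with known labels
train = np.array([[0.8, 0.2, 0.0], [0.1, 0.1, 0.8]])
labels = ["airplane", "face"]
pred = nn_classify(np.array([0.7, 0.3, 0.0]), train, labels)  # -> "airplane"
```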
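The baseline descriptor can be sketched in NumPy with block averaging standing in for MATLAB's "imresize" (the averaging scheme is an assumption; imresize's interpolation differs slightly but the resulting 625-dimensional descriptor plays the same role):

```python
import numpy as np

def tiny_descriptor(gray, size=25):
    """Subsample a grayscale image to size x size by block averaging,
    then flatten to a size*size-dimensional descriptor."""
    h, w = gray.shape
    rows = np.linspace(0, h, size + 1).astype(int)
    cols = np.linspace(0, w, size + 1).astype(int)
    out = np.empty((size, size))
    for i in range(size):
        for j in range(size):
            out[i, j] = gray[rows[i]:rows[i + 1],
                             cols[j]:cols[j + 1]].mean()
    return out.ravel()

# toy 100x100 "image": each 4x4 block is averaged into one cell
img = np.arange(100 * 100, dtype=float).reshape(100, 100)
d = tiny_descriptor(img)
```

Feed these descriptors to the same nearest-neighbor classifier, but with the L2 distance instead of Chi2, to get the baseline numbers to compare against.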

What to hand in

You should prepare a (very brief) report including the following:


Instructions for formatting and handing-in assignments:



Send the pdf file of your report and the zipped code as two separate files to Josef Sivic <Josef.Sivic@ens.fr>.