INRIA Visual Recognition and Machine Learning
Summer School 2012 
Simple Face Detector: Practical session 



Goal  
The goal of this session is to gain basic practical experience with SVM classification and with visual object category detection in still images. We will consider a simple face detector based on the common “scanning-window” technique. Our implementation of the detector consists of the following steps:

  • Part 1: Load, format and normalize positive and negative training images
  • Part 2: Learn and evaluate a linear SVM classifier; choose the best C value
  • Part 3: Use the SVM classifier to score image patches and detect faces in test images


Getting started 
Before starting, download the code and images from cvml2012-practical-face-detection.zip. Unzip the archive and start Matlab in the directory cvml2012-practical-face-detection/matlab.


Exercise description
Open and edit the script exercise.m in the Matlab editor. The script contains commented code and a description of every step of this exercise. You will need to fill in some parts of this script with your own code. The steps of the exercise that require you to modify the code are marked in red below.

  • Part 1: Preparing training data
    Go through the steps of loading and visualizing the training images. Run mean-variance normalization, then format the images into SVM-acceptable input by running the provided lines of code. Make sure you understand the format of the variables Xtrain, ytrain, Xval and yval, as you will need to work with them in the next steps.
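
For readers who want to see the idea outside Matlab, here is a minimal Python sketch of mean-variance normalization followed by flattening each image into one feature vector. It is not the code from exercise.m: the function names, the one-sample-per-row layout and the +1/-1 label convention are assumptions made for illustration, so check the script for the actual layout of Xtrain and Xval.

```python
import math

def normalize_patch(patch):
    """Flatten a 2-D patch (list of rows) and scale it to zero mean, unit variance."""
    pixels = [float(v) for row in patch for v in row]
    mean = sum(pixels) / len(pixels)
    var = sum((v - mean) ** 2 for v in pixels) / len(pixels)
    std = math.sqrt(var) or 1.0  # constant patch: avoid dividing by zero
    return [(v - mean) / std for v in pixels]

def build_dataset(pos_patches, neg_patches):
    """Return (X, y): one normalized vector per sample, labels +1 (face) / -1 (non-face)."""
    X = [normalize_patch(p) for p in pos_patches + neg_patches]
    y = [1] * len(pos_patches) + [-1] * len(neg_patches)
    return X, y
```

Normalizing every patch separately removes differences in overall brightness and contrast, so the classifier only sees the pattern of intensities within each window.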

  • Part 2: SVM classification
    Train and test a linear SVM classifier by following the provided code. SVM training and evaluation are done with the svmclass and svmvalmod functions, respectively. Next, fill in your own code implementing the following steps:

       1.1 Compute the linear hyperplane W from the SVM support vectors and alpha coefficients
       1.2 Recompute the confidence values for the training and validation sets using W and the bias b. Make
            sure your accuracy values match the ones returned by svmvalmod
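
The two steps above can be sketched in Python as follows. This is an illustration of the standard linear SVM relations, not the Matlab solution: it assumes the alpha coefficients already carry the sign of the corresponding label (alpha_i * y_i). If the training function returns unsigned alphas, multiply them by the labels of the support vectors first. All function names are hypothetical.

```python
def hyperplane_from_support_vectors(support_vectors, signed_alphas):
    """W = sum_i a_i * x_i, where a_i is assumed to be alpha_i * y_i."""
    dim = len(support_vectors[0])
    W = [0.0] * dim
    for sv, a in zip(support_vectors, signed_alphas):
        for j in range(dim):
            W[j] += a * sv[j]
    return W

def confidence(W, b, x):
    """Signed confidence of sample x: dot(W, x) + b."""
    return sum(wj * xj for wj, xj in zip(W, x)) + b

def accuracy(W, b, X, y):
    """Fraction of samples whose predicted sign matches the +1/-1 label."""
    correct = 0
    for x, label in zip(X, y):
        predicted = 1 if confidence(W, b, x) >= 0 else -1
        correct += predicted == label
    return correct / len(y)
```

Because the kernel is linear, the sum over support vectors collapses into a single vector W, so scoring a sample costs one dot product regardless of how many support vectors the model has.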
     

    Illustrate W as an image using the provided code. Why does it resemble a face? How do you explain the different values in W? Next, write your own code to retrain the SVM for different values of C:

       2.1 Fill in the for-loop to train the SVM for each C value, and compute W and the classification
            accuracy on the training and validation samples in each iteration. Select the SVM model that maximizes
            accuracy on the validation set.

       2.2 Visualize W as an image at each iteration. Why does W look more like a face for small
            C values?
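
The model-selection loop in step 2.1 has a simple shape, sketched below in Python. The sketch deliberately leaves training and evaluation abstract: train_fn and val_accuracy_fn are hypothetical callbacks standing in for whatever the exercise script uses, and the tie-breaking rule (keep the first C that reaches the best accuracy) is an assumption.

```python
def select_best_C(c_values, train_fn, val_accuracy_fn):
    """Train one model per C and keep the one with the highest validation accuracy.

    train_fn(C) returns a trained model; val_accuracy_fn(model) returns its
    accuracy on the validation set. Both are placeholders for illustration.
    """
    best = None  # (C, model, validation accuracy)
    for C in c_values:
        model = train_fn(C)
        acc = val_accuracy_fn(model)
        # Strict '>' keeps the earliest C among equally good models.
        if best is None or acc > best[2]:
            best = (C, model, acc)
    return best
```

Selecting on the validation set rather than the training set matters here, because training accuracy generally keeps rising with C even when the model starts to overfit.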


    The best classification hyperplane W looks like an average face image. Could we not simply use such an average image as the classification hyperplane? Try it out by executing the provided code and see what accuracy you get.

  • Part 3: Face detection
    Follow the provided code and its comments to read a test image, extract overlapping pixel patches from it, and classify the patches with the linear SVM. Display the bounding boxes of the patches with the highest classification scores.
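
A scanning-window pass can be sketched in Python as below. This is an illustration only, working at a single scale on a grayscale image stored as a list of rows. The function names, the step parameter and the choice to apply the same mean-variance normalization to every window as was used for training are assumptions, not a description of the provided Matlab code.

```python
import math

def normalize(pixels):
    """Scale a flat list of pixel values to zero mean and unit variance."""
    mean = sum(pixels) / len(pixels)
    var = sum((v - mean) ** 2 for v in pixels) / len(pixels)
    std = math.sqrt(var) or 1.0  # constant window: avoid dividing by zero
    return [(v - mean) / std for v in pixels]

def scan_windows(image, win_h, win_w, step=1):
    """Yield (row, col, flat_pixels) for every window that fits inside the image."""
    height, width = len(image), len(image[0])
    for r in range(0, height - win_h + 1, step):
        for c in range(0, width - win_w + 1, step):
            yield r, c, [float(v) for row in image[r:r + win_h] for v in row[c:c + win_w]]

def score_windows(image, W, b, win_h, win_w, step=1):
    """Score each window with the linear classifier; return (score, row, col), best first."""
    detections = []
    for r, c, pixels in scan_windows(image, win_h, win_w, step):
        x = normalize(pixels)
        score = sum(wj * xj for wj, xj in zip(W, x)) + b
        detections.append((score, r, c))
    detections.sort(key=lambda d: d[0], reverse=True)
    return detections
```

With step=1 the windows overlap heavily, which is why neighboring windows around a face tend to receive similarly high scores.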

    Scanning-window classification of image patches typically produces multiple responses around the target object. The standard remedy is non-maximum suppression (NMS): remove any detector response that lies in the neighborhood of a detection with a locally maximal confidence score. NMS is usually applied to all detections in the image whose confidence exceeds a certain threshold. Try NMS face detection with different threshold values and on different images:

       3.1 Try different threshold values for pre-selecting the windows passed to the NMS step by modifying
            the parameter confthresh
       3.2 Try different threshold values for the NMS detections by modifying confthreshnms
       3.3 Try detection with different thresholds on the images img1.jpg, img2.jpg, img3.jpg and
            img4.jpg. Can you find a single NMS threshold that gives perfect face detection in all images?
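
To make the role of the two thresholds concrete, here is a minimal Python sketch of one common greedy form of NMS. It is not the NMS routine shipped with the exercise. The names confthresh and confthreshnms are borrowed from the text, but their exact semantics here (pre-selection before NMS, final cut after NMS) are an assumption, and so are the box format (x1, y1, x2, y2), the overlap measure and the max_overlap parameter.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, confthresh, confthreshnms, max_overlap=0.3):
    """Greedy NMS: visit boxes by decreasing score, drop those overlapping a kept box."""
    # Pre-select windows whose confidence exceeds confthresh.
    candidates = sorted(
        ((s, b) for s, b in zip(scores, boxes) if s > confthresh),
        key=lambda t: t[0],
        reverse=True,
    )
    kept = []
    for s, b in candidates:
        if all(iou(b, kept_box) <= max_overlap for _, kept_box in kept):
            kept.append((s, b))
    # Final cut on the surviving local maxima.
    return [(s, b) for s, b in kept if s > confthreshnms]
```

A low confthresh lets weak windows compete in the suppression step, while confthreshnms decides which of the surviving local maxima are reported as faces.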



  • Homework
    A linear SVM classifier is efficient for analyzing many image sub-windows but is less accurate than an RBF SVM. A non-linear SVM, however, is often too slow to be applied in a window-scanning fashion. To improve detection results while limiting execution time, one can apply the non-linear SVM only to the samples that have passed the linear SVM. Train a non-linear SVM using a faster SVM implementation (e.g. LIBSVM or SVM-Light) and apply it to the face detections returned by the linear classifier. Visually compare the detection results of the linear and RBF SVMs on different images.
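
The two-stage idea can be sketched in Python as follows. The sketch does not call LIBSVM or SVM-Light. It only writes out the standard RBF decision function and the filtering logic, assuming the support vectors, signed coefficients (alpha_i * y_i), bias and gamma have already been obtained from whichever trainer you use. All function and parameter names are hypothetical.

```python
import math

def rbf_decision(x, support_vectors, signed_alphas, b, gamma):
    """f(x) = sum_i a_i * exp(-gamma * ||x - sv_i||^2) + b, with a_i = alpha_i * y_i."""
    total = b
    for sv, a in zip(support_vectors, signed_alphas):
        sq_dist = sum((xi - si) ** 2 for xi, si in zip(x, sv))
        total += a * math.exp(-gamma * sq_dist)
    return total

def cascade(samples, linear_score, linear_thresh, rbf_score):
    """Apply the slow RBF scorer only to samples that pass the fast linear scorer.

    Returns a list of (sample index, RBF score) for the surviving samples.
    """
    results = []
    for idx, x in enumerate(samples):
        if linear_score(x) > linear_thresh:
            results.append((idx, rbf_score(x)))
    return results
```

The RBF score costs one kernel evaluation per support vector, so keeping linear_thresh high enough to reject most windows is what makes the second stage affordable.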
     




2012, Ivan Laptev <Ivan.Laptev@ens.fr>, Josef Sivic <Josef.Sivic@ens.fr>