ENS/INRIA Visual Recognition and Machine Learning Summer School -
Scholarship application project
Stitching photo-mosaics
Adapted from A. Efros (CMU), S. Lazebnik (UNC), and A. Zisserman (Oxford).
Due date: April 15th 2013
The goal of the project is to automatically stitch images acquired by a panning camera into a mosaic, as illustrated in figures 1 and 2 below.

Fig. 1: Three images acquired by a panning camera.
Fig. 2: Images stitched to a mosaic.

Algorithm outline:
1. Choose one image as the reference frame.
2. Estimate the homography between each of the remaining images and the reference image. To estimate the homography between two images, use the following procedure:
   a. Detect local features in each image.
   b. Extract a feature descriptor for each feature point.
   c. Match feature descriptors between the two images.
   d. Robustly estimate the homography using RANSAC.
3. Warp each image into the reference frame and composite the warped images into a single mosaic.

Tips and detailed description of the algorithm:
The algorithm is described using the example of stitching the images shown in figure 1.

1. Choosing the reference image. The preferred option is to choose the middle image of the sequence as the reference frame, because it results in a mosaic with less distortion. However, you can choose any image of the sequence as the reference frame. If you choose image 1 or 3 as the reference frame, directly estimating the homography between images 1 and 3 will be difficult because they overlap very little. To deal with this issue, you might have to "chain" homographies. The order of the product depends on the convention. For example, if Hij maps points in image i to the corresponding points in image j (x_j ~ Hij * x_i, with column vectors), then H13 = H23 * H12: H12 is applied first, then H23.

2. Estimating homography.

a. Detect local features in each image. You can use the function harris.m by A. Efros, which implements a simple single-scale Harris corner detector. Note that detecting local features with sub-pixel accuracy is not required for this assignment.
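For readers working outside MATLAB, here is a minimal NumPy sketch of a single-scale Harris detector in the spirit of harris.m. It is not a port of that file. The box-filter smoothing (in place of the usual Gaussian), k = 0.04, the relative threshold and the 3x3 non-maximum suppression are all assumptions chosen for illustration. The function returns a 2-by-n array of (x, y) locations, the same layout as im1_pts in step c.

```python
import numpy as np


def box_filter(img, r):
    """Mean over a (2r+1)x(2r+1) window; stands in for Gaussian smoothing (assumption)."""
    k = 2 * r + 1
    c = np.pad(img, r, mode="edge").cumsum(axis=0).cumsum(axis=1)
    c = np.pad(c, ((1, 0), (1, 0)))  # leading zero row/column for the integral image
    return (c[k:, k:] - c[:-k, k:] - c[k:, :-k] + c[:-k, :-k]) / (k * k)


def harris_corners(img, k=0.04, r=2, rel_thresh=0.01, max_pts=500):
    """Single-scale Harris corners of a grayscale image, as a 2-by-n array of (x, y)."""
    img = np.asarray(img, dtype=float)
    iy, ix = np.gradient(img)  # derivatives along rows (y) and columns (x)
    sxx = box_filter(ix * ix, r)
    syy = box_filter(iy * iy, r)
    sxy = box_filter(ix * iy, r)
    # Harris response: det(M) - k * trace(M)^2 of the smoothed structure tensor M
    resp = sxx * syy - sxy ** 2 - k * (sxx + syy) ** 2
    # 3x3 non-maximum suppression: keep pixels not exceeded by any neighbour
    h, w = resp.shape
    padded = np.pad(resp, 1, mode="constant", constant_values=-np.inf)
    neigh = np.stack([padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)])
    keep = (resp >= neigh.max(axis=0)) & (resp > rel_thresh * resp.max())
    ys, xs = np.nonzero(keep)
    order = np.argsort(-resp[ys, xs])[:max_pts]  # strongest responses first
    return np.vstack([xs[order], ys[order]])
```

The integral-image box filter keeps the sketch dependency-free beyond NumPy. A Gaussian window (for example from SciPy) is the more common choice and gives a more isotropic response.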
accuracy is not required for this assignment. b. Extract SIFT descriptor for each feature point. You can use
the Matlab SIFT descriptor function find_sift.m from S. Lazebnik to produce a 128-dimensional SIFT descriptor
around a circular region at each feature point. Note that this code is not
rotation-invariant, i.e., it does not attempt to normalize the patches by
rotating them so that the horizontal direction is aligned with the dominant
gradient orientation of the patch. c. Match feature descriptors between two images. Implement
Feature Matching (See Section 5 in “Multi-Image
Matching using Multi-Scale Oriented Patches” by Brown et al.). That is, you will need to
find pairs of features that look similar and are thus likely to be in
correspondence. We will call this set of matched features “tentative”
correspondences. You may find function dist2.m useful for distance
computations. For thresholding, use the simpler
approach due to Lowe of thresholding on the ratio
between the first and the second nearest neighbors. Consult Figure 6b in the
paper for picking the threshold. Note: You can ignore the fast indexing described
in section 6 of the paper. You
can visualize the tentative correspondences between two images by displaying
the feature displacements. For example, to visualize tentative correspondences
between image 1 and 2: (i) show image 1, (ii) show
detected features in image 1 (display only region centers as points, do not
worry about the regions’ scale), (iii) show displacements between detected
features in image 1 and matched features in image 2 by line segments. This is
illustrated in figure 3 and can be achieved using the following matlab code, where im1_pts and im2_pts are 2-by-n matrices holding (x,y) image locations of tentatively corresponding
features in image 1 and image 2, respectively: figure;
clf; imagesc(im1rgb);
hold on; % show features detected in image 1 plot(im1_pts(1,:),im1_pts(2,:),'+g');
% show displacements line([im1_pts(1,:);
im2_pts(1,:)],[im1_pts(2,:); im2_pts(2,:)],'color','y')
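The matching in step c can be sketched in NumPy as follows. The dist2 function here is our own re-implementation of the squared-distance computation that dist2.m provides. The helper ratio_match is hypothetical. It applies the ratio test to plain Euclidean distances, assumes one descriptor per row, and uses a placeholder threshold of 0.6. Check both the threshold and whether the ratio should be taken on distances or on squared distances against Figure 6b of Brown et al. before relying on it.

```python
import numpy as np


def dist2(a, b):
    """Squared Euclidean distances between the rows of a (n x d) and b (m x d)."""
    d = (a * a).sum(axis=1)[:, None] + (b * b).sum(axis=1)[None, :] - 2.0 * a @ b.T
    return np.maximum(d, 0.0)  # clamp small negative values caused by round-off


def ratio_match(desc1, desc2, ratio=0.6):
    """Return a k-by-2 array of (i, j) index pairs that pass the ratio test.

    Row i of desc1 is matched to its nearest neighbour j in desc2 only if
    d(1-NN) < ratio * d(2-NN). desc2 must have at least two rows.
    """
    d = np.sqrt(dist2(desc1, desc2))
    nn = np.argsort(d, axis=1)[:, :2]  # indices of the two nearest neighbours
    rows = np.arange(d.shape[0])
    d1 = d[rows, nn[:, 0]]
    d2 = d[rows, nn[:, 1]]
    keep = d1 < ratio * d2
    return np.column_stack([rows[keep], nn[keep, 0]])
```

Suppose pts1 and pts2 hold the 2-by-n feature locations and m = ratio_match(desc1, desc2). Then pts1[:, m[:, 0]] and pts2[:, m[:, 1]] are the im1_pts and im2_pts used for the visualization above. A descriptor that is about equally close to two candidates is rejected, which is exactly the ambiguity the ratio test is meant to filter out.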
d. Robustly estimate the homography using RANSAC. Use samples of 4 point correspondences to compute each homography hypothesis. You will need to write a function of the form

    H = computeH(im1_pts, im2_pts)

where, again, im1_pts and im2_pts are 2-by-n matrices holding the (x,y) locations of n (= 4) point correspondences from the two images, and H is the recovered 3x3 homography matrix. To compute the entries of H, you will need to set up a linear system of 2n equations (two per correspondence), i.e. a matrix equation of the form Ah = 0, where h is a vector holding the 9 entries of H. Because H is only defined up to scale, it has 8 degrees of freedom, so 4 correspondences are enough. The solution of the homogeneous least-squares system Ah = 0 (subject to ||h|| = 1) is the right singular vector of A corresponding to the smallest singular value. In MATLAB:

    [U,S,V] = svd(A);
    h = V(:,end);
    H = reshape(h, 3, 3)';  % assumes h lists the entries of H row by row

For more details on homography estimation from point correspondences, see the note written by David Kriegman.

For RANSAC, a very simple implementation that performs a fixed number of sampling iterations is sufficient. Output the single transformation that has the most inliers over all iterations. For the RANSAC parameters (number of iterations, inlier threshold), try a few reasonable values and pick the ones that work best. To sample matches at random, you can use the MATLAB randperm function. Display the set of inliers as illustrated in figure 3. Finally, after you find the set of inliers using RANSAC, don't forget to re-estimate the homography from all inliers.

3. Warping and compositing. Warp each image into the reference frame
using the estimated homography, and composite the warped images into a single mosaic. You can use the vgg_warp_H function for warping. Here is example code to warp and composite images 1 and 2. First, define a mosaic image to warp all the images onto. Here we assume that image 2 is the reference image, and we map it to the origin of the mosaic image using the identity homography (eye(3) in MATLAB):

    bbox = [-400 1200 -200 700];  % image space for mosaic
    % warp image 2 (the reference) to the mosaic image
    Im2w = vgg_warp_H(Im2, eye(3), 'linear', bbox);

Next, warp image 1 to a separate mosaic image using the estimated homography H12 between image 1 and image 2:

    % warp image 1 to the mosaic image
    Im1w = vgg_warp_H(Im1, H12, 'linear', bbox);

Finally, combine the mosaic images by taking, at each pixel, the maximum value over the warped images. This tends to produce fewer artifacts than averaging the warped images:

    imagesc(double(max(Im1w, Im2w))/255);
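To show what the warping step does, here is a small NumPy stand-in for vgg_warp_H. It is our own sketch, not that function, and rests on several assumptions:
- H maps pixel coordinates of the input image into mosaic coordinates (the direction in which H12 is passed above).
- Pixel coordinates are 0-based.
- bbox is ordered [xmin, xmax, ymin, ymax].
- Sampling is nearest-neighbour rather than bilinear.
- Mosaic pixels with no source pixel are 0, so the max compositing leaves them to the other image.

The hypothetical helper warped_corners projects an image's corners through H, which is one way to choose bbox instead of hard-coding it.

```python
import numpy as np


def project(H, pts):
    """Apply a 3x3 homography to a 2-by-n array of (x, y) points."""
    ph = H @ np.vstack([pts, np.ones(pts.shape[1])])
    return ph[:2] / ph[2]


def warped_corners(H, shape):
    """Corners of an image with the given (rows, cols) shape, mapped through H."""
    h, w = shape
    corners = np.array([[0, w - 1, w - 1, 0], [0, 0, h - 1, h - 1]], dtype=float)
    return project(H, corners)


def warp_h(im, H, bbox):
    """Inverse-warp grayscale im into the mosaic window bbox = [xmin, xmax, ymin, ymax].

    H maps im's (x, y) pixel coordinates to mosaic coordinates; nearest-neighbour
    sampling; mosaic pixels with no source pixel are set to 0.
    """
    xmin, xmax, ymin, ymax = bbox
    xs, ys = np.meshgrid(np.arange(xmin, xmax + 1), np.arange(ymin, ymax + 1))
    mosaic_pts = np.vstack([xs.ravel(), ys.ravel()]).astype(float)
    # for every mosaic pixel, find where it comes from in the source image
    src = np.linalg.inv(H) @ np.vstack([mosaic_pts, np.ones(xs.size)])
    h, w = im.shape
    valid = np.abs(src[2]) > 1e-12  # skip points mapped to infinity
    sx = np.full(xs.size, -1)
    sy = np.full(xs.size, -1)
    # clip before rounding so extreme values cannot overflow the integer cast
    sx[valid] = np.rint(np.clip(src[0, valid] / src[2, valid], -1, w)).astype(int)
    sy[valid] = np.rint(np.clip(src[1, valid] / src[2, valid], -1, h)).astype(int)
    inside = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.zeros(xs.size)
    out[inside] = im[sy[inside], sx[inside]]
    return out.reshape(xs.shape)


def composite_max(*warped):
    """Per-pixel maximum over warped images, like max(Im1w, Im2w) above."""
    return np.maximum.reduce(warped)
```

In this representation, chaining homographies (step 1) is a matrix product. With x_j ~ Hij x_i, project(H23 @ H12, pts) gives the same points as project(H23, project(H12, pts)).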
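As a companion to step 2d, here is a NumPy sketch of computeH and of the fixed-iteration RANSAC loop described there. The names compute_h and ransac_h are hypothetical. The inlier threshold of 3 pixels, the 1000 iterations and the fixed random seed are placeholders to tune, as the text suggests. This plain (unnormalized) DLT omits the coordinate normalization usually recommended for better numerical conditioning.

```python
import numpy as np


def compute_h(p1, p2):
    """DLT estimate of H with p2 ~ H p1, from 2-by-n arrays of (x, y) points (n >= 4)."""
    rows = []
    for (x, y), (u, v) in zip(p1.T, p2.T):
        # two equations per correspondence, linear in the 9 entries of H (row-major)
        rows.append([-x, -y, -1.0, 0.0, 0.0, 0.0, u * x, u * y, u])
        rows.append([0.0, 0.0, 0.0, -x, -y, -1.0, v * x, v * y, v])
    A = np.array(rows)
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)  # right singular vector of the smallest singular value
    return H / H[2, 2]  # fix the arbitrary scale (assumes H[2, 2] != 0)


def project(H, pts):
    """Apply a 3x3 homography to a 2-by-n array of (x, y) points."""
    ph = H @ np.vstack([pts, np.ones(pts.shape[1])])
    return ph[:2] / ph[2]


def ransac_h(p1, p2, n_iter=1000, thresh=3.0, seed=0):
    """Fixed-iteration RANSAC: keep the 4-point hypothesis with the most inliers, then refit."""
    rng = np.random.default_rng(seed)
    n = p1.shape[1]
    best = np.zeros(n, dtype=bool)
    with np.errstate(divide="ignore", invalid="ignore"):  # degenerate samples may give inf/NaN
        for _ in range(n_iter):
            idx = rng.permutation(n)[:4]  # analogous to randperm in MATLAB
            H = compute_h(p1[:, idx], p2[:, idx])
            err = np.linalg.norm(project(H, p1) - p2, axis=0)  # transfer error in image 2
            inliers = err < thresh  # NaN errors compare as False
            if inliers.sum() > best.sum():
                best = inliers
    if best.sum() < 4:
        raise ValueError("RANSAC found fewer than 4 inliers")
    return compute_h(p1[:, best], p2[:, best]), best  # re-estimate from all inliers
```

The returned boolean mask selects the inlier columns of im1_pts and im2_pts, which is what the inlier display in figure 3 needs.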