Learning and calibrating per-location classifiers for visual place recognition

Petr Gronat, Guillaume Obozinski, Josef Sivic and Tomas Pajdla

Abstract:

The aim of this work is to localize a query photograph by finding other images depicting the same place in a large geotagged image database. This is a challenging task due to changes in viewpoint, imaging conditions and the large size of the image database. The contribution of this work is two-fold. First, we cast the place recognition problem as a classification task and use the available geotags to train a classifier for each location in the database, in a similar manner to per-exemplar SVMs in object recognition. Second, as only a few positive training examples are available for each location, we propose a new approach to calibrate all the per-location SVM classifiers using only the negative examples. The calibration we propose relies on a significance measure essentially equivalent to the p-values classically used in statistical hypothesis testing. Experiments are performed on a database of 25,000 geotagged street view images of Pittsburgh and demonstrate improved place recognition accuracy of the proposed approach over previous work.
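
To make the calibration step concrete, here is a minimal sketch (our own illustration, not the paper's code; the function names and the add-one smoothing are assumptions). It converts raw per-location SVM scores into empirical p-values computed from each classifier's scores on the shared negative set, and ranks locations by significance:

import numpy as np

def empirical_p_value(neg_scores, query_score):
    # Fraction of negative-example scores that are >= the query score,
    # with add-one smoothing so the p-value is never exactly zero.
    neg = np.sort(neg_scores)
    n_higher = len(neg) - np.searchsorted(neg, query_score, side="left")
    return (n_higher + 1) / (len(neg) + 1)

def rank_locations(query_scores, neg_scores_per_location):
    # query_scores[i]: raw SVM score of the query under location i's classifier
    # neg_scores_per_location[i]: that classifier's scores on the negative set
    p_values = np.array([empirical_p_value(neg, s)
                         for s, neg in zip(query_scores, neg_scores_per_location)])
    return np.argsort(p_values)  # most significant (smallest p-value) first

Because each p-value is measured against the same pool of negatives, scores from independently trained per-location classifiers become directly comparable, which is the point of the calibration.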

Citation:

Petr Gronat, Guillaume Obozinski, Josef Sivic, and Tomas Pajdla. Learning and Calibrating Per-Location Classifiers for Visual Place Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013.

BibTeX:

@InProceedings{Gronat_2013_CVPR,
author = {Petr Gronat and Guillaume Obozinski and Josef Sivic and Tomas Pajdla},
title = {Learning and Calibrating Per-Location Classifiers for Visual Place Recognition},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2013}
}

Datasets and evaluation:

The dataset is available on request by email. Please write to name.surname@inria.fr (replace name and surname with the first author's name and surname).

Database:

We performed experiments on a database of Google Street View images [1]. We downloaded panoramas covering an area of roughly 1.3 x 1.3 km in Pittsburgh (U.S.). For each panorama we generate 12 overlapping perspective views at each of two elevation angles, 4 and 24 degrees, to capture both the street-level scene and the building facades. This results in 24 perspective views per panorama, each with a 90-degree field of view and a resolution of 960x720 pixels. In total, the dataset contains 25,000 perspective images.
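
For illustration, the sketch below shows how one such perspective view could be sampled from an equirectangular panorama by gnomonic projection. It is a minimal reimplementation under our own conventions (y-axis pointing down, positive pitch looking up); the function name is ours and this is not the code used in the paper:

import numpy as np

def render_view(pano, yaw_deg, pitch_deg, fov_deg=90.0, width=960, height=720):
    # pano: H x W x 3 uint8 equirectangular panorama
    ph, pw = pano.shape[:2]
    f = 0.5 * width / np.tan(np.radians(fov_deg) / 2.0)  # focal length (pixels)
    # pixel grid centered on the principal point
    x, y = np.meshgrid(np.arange(width) - width / 2.0,
                       np.arange(height) - height / 2.0)
    rays = np.stack([x, y, np.full_like(x, f)], axis=-1)  # z forward, y down
    rays /= np.linalg.norm(rays, axis=-1, keepdims=True)
    p, t = np.radians(pitch_deg), np.radians(yaw_deg)
    Rx = np.array([[1, 0, 0],
                   [0, np.cos(p), -np.sin(p)],
                   [0, np.sin(p), np.cos(p)]])   # pitch (positive = look up)
    Ry = np.array([[np.cos(t), 0, np.sin(t)],
                   [0, 1, 0],
                   [-np.sin(t), 0, np.cos(t)]])  # yaw (heading)
    rays = rays @ (Ry @ Rx).T                    # rotate camera rays into the world
    lon = np.arctan2(rays[..., 0], rays[..., 2])           # longitude in [-pi, pi]
    lat = np.arcsin(np.clip(rays[..., 1], -1.0, 1.0))      # latitude in [-pi/2, pi/2]
    u = ((lon / (2 * np.pi) + 0.5) * pw).astype(int) % pw  # panorama column (wraps)
    v = np.clip(((lat / np.pi + 0.5) * ph).astype(int), 0, ph - 1)  # panorama row
    return pano[v, u]

The 24 views per panorama then correspond to 12 yaw angles spaced 30 degrees apart, rendered at each of the two pitch values.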

Query set:

As a query set with known ground-truth GPS positions, we use 8,999 panoramas from the Google Street View research dataset [2], which cover approximately the same area but were captured at a different time, and typically depict the same places from different viewpoints and under different illumination conditions. To generate a test query, we first select a panorama at random and then render a perspective image with a random orientation and a random elevation (pitch); a sketch of this sampling is given below. In this way we synthesize 4,000 query test images.
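
A minimal sketch of this sampling procedure, reusing render_view() from the sketch above; the random seed and the pitch range are assumptions, since the exact sampling ranges are not specified here:

import numpy as np

rng = np.random.default_rng(0)  # fixed seed, for reproducibility of the sketch

def synthesize_queries(panoramas, n_queries=4000):
    queries = []
    for _ in range(n_queries):
        pano = panoramas[rng.integers(len(panoramas))]  # random panorama
        yaw = rng.uniform(0.0, 360.0)    # random orientation (heading)
        pitch = rng.uniform(0.0, 30.0)   # hypothetical pitch range (assumption)
        queries.append(render_view(pano, yaw, pitch))
    return queries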

[1] P. Gronat, M. Havlena, J. Sivic, T. Pajdla. Building Streetview Datasets for Place Recognition and City Reconstruction. Tech. Rep. CTU-CMP-2011-16, Czech Technical University in Prague, 2011.
[2] Google, "ICMLA 2011 StreetView Recognition Challenge", http://www.icmla-conference.org/icmla11/challenge.htm.

Acknowledgements:

This work was partly supported by the ERC grant LEAP, ANR project Semapolis (ANR-13-CORD-0003) and the Intelligence Advanced Research Projects Activity (IARPA) via Air Force Research Laboratory, contract FA8650-12-C-7212. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, AFRL, or the U.S. Government.