Paper
M. Cho, K. Alahari, J. Ponce
Learning Graphs to Match
Proceedings of the IEEE International Conference on Computer Vision (2013)
PDF |
Abstract |
BibTeX |
Slides PDF (38MB) |
Slides PPTX (68MB)
Abstract
Many tasks in computer vision are formulated as graph
matching problems. Despite the NP-hard nature of the
problem, fast and accurate approximations have led to significant
progress in a wide range of applications. Learning
graph models from observed data, however, still remains a
challenging issue. This paper presents an effective scheme
to parameterize a graph model, and learn its structural attributes
for visual object matching. For this, we propose a
graph representation with histogram-based attributes, and
optimize them to increase the matching accuracy. Experimental
evaluations on synthetic and real image datasets
demonstrate the effectiveness of our approach, and show
significant improvement in matching accuracy over graphs
with pre-defined structures.
BibTeX
@InProceedings{cho2013,
author = {Minsu Cho and Karteek Alahari and Jean Ponce},
title = {Learning Graphs to Match},
booktitle = {Proceedings of the IEEE International Conference on Computer Vision},
year = {2013}
}
Dataset
Code
Overview
The goal of this work is to learn a class-specific graph model for matching problems.
Graph matching is widely used in many computer vision problems, and
much progress has been made recently in applications such as
shape analysis, image matching, action recognition, and object categorization.
For many tasks, however, a natural question arises: How can we obtain a good graph model
for a target object to match? Recent studies have revealed that simple graphs with
hand-crafted structures and similarity functions, typically used in graph matching,
are insufficient to capture the inherent structure underlying the problem at hand.
As a consequence, a better optimization of the graph matching objective does not guarantee
better correspondence accuracy. Previous learning methods for graph matching tackle this issue
by learning a set of parameters in the objective function [Caetano et al. 2009, Leordeanu et al. 2012].
Although it is useful to learn such a matching function for two graphs of a certain class,
a more apt goal would be to learn a graph model to match, which provides an optimal matching
to all instances of the class. Such a learned graph would better model the inherent structure
in the target class, thus resulting in better performance for matching.
Figure 1 illustrates the difference between the previous approaches and ours.
Figure 1. Comparison between the previous approaches and the proposed model learning: (a) learning graph matching; (b) learning a class-specific graph model for matching.
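For context, the graph matching objective mentioned above is commonly posed as a quadratic assignment problem; in a standard form (the notation here is ours, not quoted from the paper),

\max_{\mathbf{x} \in \{0,1\}^{nm}} \; \mathbf{x}^{\top} \mathbf{M}\, \mathbf{x} \quad \text{s.t.} \quad \sum_{a} x_{ia} \le 1, \quad \sum_{i} x_{ia} \le 1,

where x_{ia} = 1 when node i of one graph is assigned to node a of the other, the diagonal entries of M encode node (appearance) similarities, and the off-diagonal entries M_{ia;jb} encode pairwise (edge) similarities. Learning approaches parameterize these similarity functions and tune the parameters so that the maximizer agrees with ground-truth correspondences.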
To this end, we present a generalized formulation
for graph matching, which is an extension of the previous learning approaches,
and propose to learn a graph model based
on a particular, yet rather general, graph representation with histogram-based attributes
for nodes and edges.
We show that all attributes of such a graph can be learned in a structured-output max-margin framework
[Tsochantaridis et al. 2005].
The proposed method reconstructs a graph model inherent in a target class,
and provides significant improvement in matching performance, as demonstrated in our experiments
on synthetic and real datasets.
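To make the learning step concrete, here is a minimal sketch of a structured-output max-margin update of the kind cited above. It is not the authors' implementation: the names and array shapes are ours, edge terms are dropped so that inference reduces to a linear assignment problem, and sim[i, a] stands for a vector of histogram similarities (e.g. SIFT-based) between model node i and candidate scene node a.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def joint_feature(sim, match):
    """phi(G, y): sum of per-node similarity feature vectors over matched pairs."""
    return sum(sim[i, a] for i, a in enumerate(match))

def predict(sim, w, loss_aug=None):
    """Highest-scoring assignment. With node terms only, this is a linear
    assignment problem, solved exactly by the Hungarian algorithm."""
    score = sim @ w                              # (N, M) node-to-node scores
    if loss_aug is not None:                     # margin rescaling (Hamming loss)
        score = score + loss_aug
    rows, cols = linear_sum_assignment(-score)   # negate to maximize
    return cols

def train(examples, dim, epochs=20, lr=0.1, reg=1e-3):
    """Structured SVM learned by stochastic subgradient descent.
    examples: list of (sim, y_true), with sim of shape (N, M, dim) and
    y_true[i] the ground-truth scene node matched to model node i."""
    w = np.zeros(dim)
    for _ in range(epochs):
        for sim, y_true in examples:
            n_nodes = sim.shape[0]
            loss_aug = np.ones(sim.shape[:2])
            loss_aug[np.arange(n_nodes), y_true] = 0.0  # zero loss on the truth
            y_hat = predict(sim, w, loss_aug)            # loss-augmented inference
            # Subgradient of the hinge term: phi(y_hat) - phi(y_true)
            w -= lr * (reg * w + joint_feature(sim, y_hat)
                       - joint_feature(sim, y_true))
    return w
```

In the paper's setting, the feature map also contains edge (geometric) histogram terms, so loss-augmented inference becomes a graph matching problem in its own right; an approximate graph matching solver then replaces the Hungarian algorithm, and the learned weights on nodes and edges play the role of the structural attributes.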
Figure 2. Learned graph models and their matching results.
Figure 2 above shows the graph models learned for several object classes, together with example matching results.
For each class, we show the learned graph model, the learned appearance of its node attributes,
and matching results. In the graph model, bigger circles represent stronger nodes,
and darker lines denote stronger edges. In the second and fifth columns,
the node attributes are visualized as edge responses computed from the learned SIFT attributes.
For each model, several matching examples with high scores are shown.
The results show that the learned graph model enables robust matching in spite of
deformation and appearance changes.
Acknowledgements
This research was supported in part by the ERC advanced
grant VideoWorld, the Institut Universitaire de France, and the
Quaero programme funded by OSEO.