End-to-end weakly-supervised semantic alignment



We tackle the task of semantic alignment where the goal is to compute dense semantic correspondence aligning two images depicting objects of the same category. This is a challenging task due to large intra-class variation, changes in viewpoint and background clutter. We present the following three principal contributions. First, we develop a convolutional neural network architecture for semantic alignment that is trainable in an end-to-end manner from weak image-level supervision in the form of matching image pairs. The outcome is that parameters are learnt from rich appearance variation present in different but semantically related images without the need for tedious manual annotation of correspondences at training time. Second, the main component of this architecture is a differentiable soft inlier scoring module, inspired by the RANSAC inlier scoring procedure, that computes the quality of the alignment based on only geometrically consistent correspondences thereby reducing the effect of background clutter. Third, we demonstrate that the proposed approach achieves state-of-the-art performance on multiple standard benchmarks for semantic alignment.


I. Rocco, R. Arandjelović and J. Sivic
End-to-end weakly-supervised semantic alignment
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018
[Paper on arXiv]


        author       = "Rocco, I. and Arandjelovi\'c, R. and Sivic, J.",
        title        = "End-to-end weakly-supervised semantic alignment",
        booktitle    = "Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition",
        year         = "2018",


Training/testing code (PyTorch)

See also


This work has been partly supported by ERC grant LEAP (no. 336845), the Inria CityLab IPL, CIFAR Learning in Machines & Brains program and ESIF, OP Research, development and education Project IMPACT No. CZ.02.1.01/0.0/0.0/15 003/0000468.

Copyright Notice

The documents contained in these directories are included by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright.