Documentation - C API
Gaussian Scale Space
Author:
Andrea Vedaldi
Karel Lenc
Michal Perdoch

scalespace.h implements a scale space, a data structure fundamental to the computation of covariant features such as SIFT, Hessian-Affine, Harris-Affine, and Harris-Laplace.

  • Overview
  • Usage
  • Scale space

Overview

A scale space is a data structure representing an image at multiple resolution levels. Mathematically, it is defined as a three-dimensional function of two spatial coordinates (usually denoted as $ x $ and $ y $) and a scale coordinate ($ \sigma $). It is usually stored as a pyramid, with the coarse scales being represented at a lower resolution in order to reduce redundancy (low-pass images can be represented accurately with a coarser sampling rate).

<!-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -->
@section scalespace-usage Usage
<!-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -->

A scale space is represented by an instance of the ::VlScaleSpace object.

@code
VlScaleSpace * ss = vl_scalespace_new(width, height,
                                      numOctave, firstOctave,
                                      numLevel, firstLevel, lastLevel) ;
@endcode

The scale space object has a number of functionalities meant to help in developing feature detectors:

- Local maxima and minima in space and scale can be detected with ::vl_scalespace_find_local_extrema() and refined to sub-pixel accuracy with ::vl_scalespace_refine_local_extrema(). Local extrema are filtered based on the <em>peak threshold</em> and the <em>edge threshold</em>.
- An affinely-warped image patch can be extracted from the scale space by using ::vl_scalespace_affinely_normalize_patch().

<!-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -->
@subsection scalespace-tech Scale space
<!-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -->

In order to search for image blobs at multiple scales, the SIFT detector constructs a scale space, defined as follows. Let $ I_0(\mathbf{x}) $ denote an idealized <em>infinite resolution</em> image. Consider the <em>Gaussian kernel</em>

@f[
g_{\sigma}(\mathbf{x}) = \frac{1}{2\pi\sigma^2} \exp \left( -\frac{1}{2} \frac{\mathbf{x}^\top\mathbf{x}}{\sigma^2} \right)
@f]

The <b>Gaussian scale space</b> is the collection of smoothed images

@f[
I_\sigma = g_\sigma * I_0, \quad \sigma \geq 0.
@f]

The image at infinite resolution $ I_0 $ is useful conceptually, but is not available to us; instead, the input image $ I_{\sigma_n} $ is assumed to be pre-smoothed at a nominal level $ \sigma_n = 0.5 $ to account for the finite resolution of the pixels. Thus in practice the scale space is computed by

@f[
I_\sigma = g_{\sqrt{\sigma^2 - \sigma_n^2}} * I_{\sigma_n}, \quad \sigma \geq \sigma_n.
@f]

Scales are sampled at logarithmic steps given by

@f[
\sigma = \sigma_0 2^{o+s/S}, \quad s = 0,\dots,S-1, \quad o = o_{\min}, \dots, o_{\min}+O-1,
@f]

where $ \sigma_0 = 1.6 $ is the <em>base scale</em>, $ o_{\min} $ is the <em>first octave index</em>, @em O the <em>number of octaves</em> and @em S the <em>number of scales per octave</em>.

Blobs are detected as local extrema of the <b>Difference of Gaussians</b> (DoG) scale space, obtained by subtracting successive scales of the Gaussian scale space:

@f[
\mathrm{DoG}_{\sigma(o,s)} = I_{\sigma(o,s+1)} - I_{\sigma(o,s)}
@f]

At each successive octave, the resolution of the images is halved to save computation. The images composing the Gaussian and DoG scale spaces can then be arranged as in the following figure:

@image html sift-ss.png "GSS and DoG scale space structures."

The black vertical segments represent images of the Gaussian Scale Space (GSS), arranged by increasing scale $