Quickstart Image

Here is a quick tutorial on scattering transform scattering transform with ScatNet for image classification. It contains :

Setup
Fundamentals
Data Structures
Configuration
Display
Classification
Roto-translation Scattering

Setup

Download and unzip ScatNet latest version
Type edit startup.m into MATLAB prompt

Put the following two lines to your startup.m file.

addpath '/path/to/scatnet/';
addpath_scatnet;

Save startup.m and launch it from MATLAB prompt

ScatNet is now set up and will automatically be loaded when starting MATLAB.

Fundamentals

This section describes how to compute the scattering transform of an image with as few lines of MATLAB as possible.

Load an image of reasonable size (e.g. 640x480).

        x = uiuc_sample;
        imagesc(x);
	colormap gray;

Before computing the scattering coefficients of x, we need to precompute the wavelet transform operators that will be applied to the image. Those operators are built by specific built-in factories adapted to different types of signal. For translation-invariant representation of images, we use wavelet_factory_2d.

        Wop = wavelet_factory_2d(size(x));

Now we call the scat function to compute the scattering of x using those Wop operators.

        S = scat(x, Wop);

The variable S is a cell of structure. It is not suited for direct numeric manipulation. You can reformat it in a 3D array using format_scat :

        S_mat = format_scat(Sx);

S_mat is a 417x60x80 matrix. The first dimension in S_mat is the path index, while the second and third dimensions correspond to subsampled spatial coordinates of the original image.

% scattering transform - code example
%
% load an image
x = uiuc_sample;
% compute wavelet operators with factory
Wop = wavelet_factory_2d(size(x));
% compute scattering
S = scat(x, Wop);
% reformat in 3d matrix
S_mat = format_scat(S);

Note: wavelet_factory_2d_pyramid() may be used instead of wavelet_factory_2d(size(x)). It will use a similar yet faster algorithm to compute the scattering coefficients at different scales.

Data Structures

ScatNet relies on three data structures, embedded into each other: wavelet factories, layers and filters. We review their respective structures

Layers

The first three layers of the 2D scattering of an image \(x\) are defined by \[ S_0 x = \{x \star \phi\} \] \[ S_1 x = \{|x \star \psi_{j_1, \theta_1} | \star \phi\} \] \[ S_2 x = \{||x \star \psi_{j_1, \theta_1} |\star \psi_{j_2, \theta_2} | \star \phi\} \] where the symbol \(\star\) denotes the spatial convolution, \(\phi\) is an averaging window and \( \psi_{j,\theta} \) is a wavelet dilated by \(2^j\) and rotated by \(\pi (\theta-1)/L\).

The output variable S is a cell array, whose elements are the different layers of the scattering transform : S{m+1} corresponds to the \(m\)th layer \(S_m x\).

Scattering layer is a MATLAB structure with two fields:

signal is a cell array of 2d matrices, which combines the scattering coefficients, which is indexed by the integer variable \(p\) as follow
```
% the p-th at the m-th layer: 
Sx{m+1}.signal{p}
        
```

meta is a struct with the sequences of \(j\) and \(\theta\) that corresponds to each \(p\).

% the sequence of j for p-th coefficient at the m-th order : 
Sx{m+1}.meta.j(:,p)
% the sequence of theta for p-th coefficient at the m-th order : 
Sx{m+1}.meta.theta(:,p)

For example, let us display the coefficients of second order with \[ j_1 = 0, j_2 = 2, \theta_1 = 1, \theta_2 = 5 \] First, we find the index \(p\) using meta information

j1 = 0;
j2 = 2;
theta1 = 1;
theta2 = 5;
p = find( Sx{3}.meta.j(1,:) == j1 & ...
    Sx{3}.meta.j(2,:) == j2 & ...
    Sx{3}.meta.theta(1,:) == theta1 & ...
    Sx{3}.meta.theta(2,:) == theta2 );

Then we retrieve the \(p\)-th signal of the second layer and display it :

imagesc(Sx{3}.signal{p});

This should result in the figure:

The resolution corresponds to the oversampling we work with the image.

Factories

A factory is the cell array of all linear operators involved in a scattering network represented in MATLAB as function handless. The standard built-in factory has the following prototype: the Wop operators are obtained with

        Wop = wavelet_factory_2d(size(x));

is a cell array of function handles. scat.m applies layer by layer

[S{m+1}, V] = Wop{m+1}(U{m+1});
U{m+2} = modulus_layer(V);

The \(m\)-th operator Wop{m+1} (which is a function handle) is applied to the \(m\)-th internal layer U{m+1}. The operator outputs the \(m\)-th external layer S{m+1} and a residual layer V. The second line applies a complex modulus to layer V and stores the result as the \(m+1\)-th internal layer U{m+2} for further decomposition.

Filters

The filters \(\phi\) and \(\psi_{j,\theta}\) used in the computation of scattering can be obtained as second argument of the wavelet factory.

[Wop, filters] = wavelet_factory_2d(size(x));

filters is a struct containing three fields

filters.phi the low pass filter \(\phi\)
filters.psi all dilated and oriented wavelets \(\psi_{j,\theta}\)
meta global information about the filter bank

The field filters.psi is a structure with two fields :

filters.psi.filter that is a cell array of filter
filters.psi.meta that contains the \(j\) and \(\theta\) corresponding to each filters

The structure of filters.phi and filters.psi are similar with that of a scattering layer, except that the filed name is replaced by filter filter instead of signal.

Finally, fields filters.psi.filter{p} is a structure with two fields :

filters.psi.filter{p}.type is a string that specifies the type of filter used. For 2D scattering, 'fourier_multires' stands for multiresolution of Fourier transform.
filters.psi.filter{p}.coefft that contains the coefficient of the \(p\)-th filter, in a format that depends upon the previous type field. For 2D scattering, it is a cell whose elements are the Fourier transform of the filter at different resolution.

Configuration

From filter design to network architecture, the built-in wavelet factories in ScatNet are fully customizable. We review the available options with the example of wavelet_factory_2d. The same would hold for wave_fact_3d.

Compute the scattering with options

Options must be embedded in filt_opt and scat_opt structures.

filt_opt.J = 5;

Non specified fields will be set to a default value. The filt_opt and scat_opt structures must be passed to the factory :

% compute scattering with non-default options
x = uiuc_sample;
filt_opt.J = 5;
filt_opt.L = 6;
scat_opt.oversampling = 2;
scat_opt.M = 2;
Wop = wavelet_factory_2d(size(x), filt_opt, scat_opt);
Sx = scat(x, Wop);

filt_opt contains filters-related options
- filt_opt.J (default 4) is the number of scale \(j\) in the filter bank \(\psi_{j,\theta}\). Note that increasing J will also increase the range of translation invariance.
- filt_opt.L (default 8) is the number of orientations. Note that increasing L will also increase the angular selectivity of filters.
- filt_opt.Q (default 1) is the number of scale per octave. Not that increasing Q will decrease the range of translation invariance and increase the scale selectivity of filters.
scat_opt contains scattering-related options
- scat_opt.M (default 2) is the maximum scattering order. The scat function will compute scattering coefficient of order 0 to M, that is \[S_0 x, \dotsc, S_M x \]
- scat_opt.oversampling (default 1) : scattering will be oversampled by up to a power of 2.

Display

Scattering display

All the scattering coefficients in a layer can be displayed at once, as a big image of concatenated nodes. The function image_scat iterates over the scattering order, so as to produce one such image per layer. In the following example, we display a second order scattering representation with 5 scales and 6 orientations per layer. You can display all scattering coefficients with image_scat

% compute scattering with 5 scales, 6 orientations
% and an oversampling factor of 2^2
x = uiuc_sample;
filt_opt.J = 5;
filt_opt.L = 6;
scat_opt.oversampling = 2;
Wop = wavelet_factory_2d(size(x), filt_opt, scat_opt);
Sx = scat(x, Wop);
% display scattering coefficients
image_scat(Sx)

which should result in the following three figures.

The meta information that identifies a scattering path are overlaid on each node. Also, each node is renormalized by the norm of its parent. Both these features can be turned off

        image_scat(Sx, false, false);

which should result in the following three figures.

The coefficient can be organized with a more structured layout using display_with_layer_order. For instance, they can be arranged by lexicographic order on \(j\) along rows and lexicographic order on \( \theta \) along columns.

display_with_layer_order(Sx,1);

which should result in the figure :

Filter display

One may display the real and imaginary part of those filters by means of display_filter_bank_2d

% compute the filters with 5 scales and 6 orientations
filt_opt.J = 5;
filt_opt.L = 6;
[Wop, filters] = wavelet_factory_2d([512, 512], filt_opt);
% display all the filters
display_filter_bank_2d(filters);

The above code should result in the following figure :

The top left image corresponds to \(\phi \). The first left half corresponds to the real parts of \(\psi_{j,\theta} \), arranged according scales (rows) and orientations (columns). The right half image corresponds to the imaginary part.

Classification

Scattering transform on a database of images

We need to define a src structure containing the path and label of all the images. We have a specific function for each database. For the uiuc texture database we use

src = uiuc_src;

Then, we need to define the features as a cell of function handle containing all the feature that we want to compute per image. In this example, the only feature is the scattering representation.

filt_opt.J = 5;
scat_opt.oversampling = 0;
Wop = wavelet_factory_2d([480, 640], filt_opt, scat_opt);
features{1} = @(x)(sum(sum(format_scat(scat(x,Wop)),2),3));

We use prepare_database to perform the computation of the feature

options.parallel = 0;
db = prepare_database(src, features, options);

Classifier

The code below performs an affine space training and testing of the database of features computed in the previous subsection.

% proportion of training example
prop = 0.5;
% split between training and testing
[train_set, test_set] = create_partition(src, prop);
% dimension of the affine pca classifier
train_opt.dim = 20;
% training
model = affine_train(db, train_set, train_opt);
% testing
labels = affine_test(db, model, test_set);
% compute the error
error = classif_err(labels, test_set, src);

Roto-translation scattering

Computation

This section explains how to compute the roto-translation scattering introduced in paper [SifreEtMallat,2013].
Roto-translation scattering computes local descriptors that are invariant to small translation and arbitrary local rotations. First, load an image of reasonnable size

        x = uiuc_sample;

As for translation scattering, we need to precompute the wavelet operators. This time we use wavelet_factory_3d instead of wavelet_factory_2d

        Wop = wavelet_factory_3d(size(x));

As explained in [9], the roto-translation wavelet operators are identical to those obtained with wavelet_factory_2d ; yet the third operators is the roto translation wavelet transform :

>> Wop{3}

ans = 

    @(x)(wavelet_layer_3d(x,filters,filters_rot,scat_opt))

Here, we compute roto-translation scattering by calling scat function on x using the Wop operators.

        Sx = scat(x, Wop);

The above instruction lasts on a 2.4 GHz Core i7 about 15 seconds. If lesser resolution is needed, the antialiasing may be set to 0 (default is 1, which mean that scattering will be oversampled by a factor 2^1).

        filt_opt = struct;
        filt_opt_rot = struct;
        scat_opt.oversampling = 0;
        wavelet = wavelet_factory_3d(size(x), ...
            filt_opt, filt_opt_rot, scat_opt);
        Sx = scat(x, Wop);

Without antialiasing, it takes about 6 seconds.

As for translation scattering, one can format the output is a 3d matrix:

        sx = format_scat(Sx);

Display

The same image_scat function works with roto-translation scattering. Calling it

     image_scat(Sx);

results in the three following figures :

One can also arrange the coefficient according to lexicographic order in \( (j_1,j_2) \) along rows and lexicographic order in \( (k_2, \theta_2) \) along columns (see [9] for definition of \( (k_2, \theta_2) \) )

% arrange scattering layer in lexicographic order
var_y{1}.name = 'j';
var_y{1}.index = 1;
var_y{2}.name = 'j';
var_y{2}.index = 2;
var_x{1}.name = 'k2';
var_x{1}.index = 1;
var_x{2}.name = 'theta2';
var_x{2}.index = 1;

big_img = image_scat_layer_order(Sx{3}, var_x, var_y, 1);
imagesc(big_img)
ylabel('j_1, j_2');
xlabel('k_2, \theta_2');

which results in the figure

Note that the spatial filters and the rotation filters may be obtained as second and third argument of wavelet_factory_3d :

    [wavelet, filters, filters_rot] = wavelet_factory_3d(size(x));

Example of invariance

Here, we check that the roto-translation scattering is indeed rotation invariant, by computing the scattering transformation of a rotated image and rotating back its outputs.

% this script demonstrates the rotation invariance 
% of the roto-translation scattering

x = uiuc_sample;
x = x(128:128+255, 128:128+255);

filt_opt = struct;
filt_rot_opt = struct;
% oversampling must be set to infty
% so that scattering of original and rotated
% image will be sampled at exactly the same points
scat_opt.oversampling = 10;

Wop = wavelet_factory_3d(size(x), filt_opt, filt_rot_opt, scat_opt);

% compute scattering of x
Sx = scat(x, Wop);
sx_raw = format_scat(Sx);

% compute scattering of x rotated by pi/2
x_rot = rot90(x,1);
Sx_rot = scat(x_rot, Wop);
sx_rot_raw = format_scat(Sx_rot);

% rotate back every layer of output
for p = 1:size(sx_rot_raw,1)
	 tmp = rot90(squeeze(sx_rot_raw(p,:,:)), -1);
	 sx_rot_raw_back(p,:,:) = permute(tmp, [3, 1, 2]);
end

% compute norm ratio
norm_diff = sqrt(sum((sx_rot_raw_back(:)-sx_raw(:)).^2));
norm_s = sqrt(sum((sx_raw(:)).^2));
norm_ratio = norm_diff/norm_s

This script is in demo/core/demo_scat_3d_pyramid_rotation_invariance.m. Executing it result in a relative norm of

norm_ratio =

  1.2552e-08

This means that every local descriptors sx(u1, u2, :) that describes the patch located in u1,u2 are invariant to rotation.