Quickstart for Audio Processing
Here is a quick tutorial on scattering representations of sounds using ScatNet.
 Setup
 Fundamentals
 Filter Banks
 Exploration of a network by scale paths
 Display and Formatting
 Batch Processing
 Support Vector Machine Classification
Setup
 Download and unzip the latest version of ScatNet.

Type
edit startup.m
into your MATLAB prompt. 
Put the following two lines of code into your
startup.m
file:addpath '/path/to/ScatNet'; addpath_scatnet;
 Save
startup.m
and launch it from the prompt :startup
.
Fundamentals
This section describes how to compute the scattering spectrum of an audio signal with just three instructions to ScatNet.
The sound file we propose to use for this tutorial is an excerpt of Handel's Messiah, available in all
versions of MATLAB. Once loaded the handel.mat
file, we choose a length T
for
the averaging window. Since the signal y
is encoded at the sampling rate of \( 8000 \text{\,Hz} \)
setting T
to 4096
corresponds to about half a second.
load handel; % loads the signal into y N = length(y); T = 2^12; % length of the averaging window
Before computing the scattering transform of y
, we need to define the linear operators that will make up
the layers of the scattering networks. In audio processing, these linear operators are constantQ filter banks.
In most experiments, two layers are sufficient, as they capture almost the whole energy in the signal, along an averaging
window of up to one second. By default, the quality factors of these two layers are \( Q_1 = 8 \) and \( Q_2 = 1 \).
By calling default_filter_options
with the label 'audio'
, these parameters are automatically
integrated into the structure filt_opt
, which contains the options related to the construction of the filter
banks.
filt_opt = default_filter_options('audio', T);
Scattering networks rely on a set of builtin "wavelet factories", which are suited to specific classes of signals. For
translationinvariant representations in one dimension, one may call wavelet_factory_1d
in the following
way :
Wop = wavelet_factory_1d(N, filt_opt);The previous instruction may take a few seconds on a personal computer, as it requires to build many filters at once.
Note : with the new version of ScatNet, the length
N
of the signal y
no longer needs to be a
multiple of the length T
of the averaging window.
Finally, the scattering transform of y
is computed with the generic function scat
, which is
at the core of the ScatNet toolbox.
S = scat(y, Wop);Most often,
scat
requires less computer time than wavelet_factory_1d
. Once the factory has been run,
scattering transforms may be calculated iteratively with the same operators Wop
, provided the input signals have
the same length. More information about the vectorisation of several scattering transforms of onedimensional signals may be
found in the Batch processing section of this tutorial.
Filter banks
The wavelet transforms Wop
provided by wavelet_factory_1d
are merely function handles,
with no actual data in them. Instead, they rely on a cell array named filters
, which can be
extracted from the factory as an optional second output argument :
[Wop,filters] = wavelet_factory_1d(N, filt_opt);
Like a scattering network, each element in the cell array corresponds to a layer ; however, while S
has
\( M+1 \) elements, filters
has only \( M \) elements, as there is no filter bank at the root.
The psi.filter
field of filters{m}
contains the bandpass filters \(\psi_\lambda\).
These filters are expressed in the Fourier domain, and only their nonzero coefficients are stored. For visualization
purposes, one may use realize_filter
to convert them to arrays of length N
by padding them
with zeros.
Here is a short script to display the filter banks at both orders \(M = 1\) and \(M = 2\).
figure; for m = 1:2 subplot(1,2,m); hold on; for k = 1:length(filters{m}.psi.filter) plot(realize_filter(filters{m}.psi.filter{k}, N)); end hold off; ylim([0 1.5]); xlim([1 5*N/8]); endThis should result in the following figure.
Wavelets are built by dilating a mother wavelet \( \psi \) by a factor \( 2^{1/Q} \), for some quality factor \( Q \), so as to obtain the filter bank \[ \psi_j(t) = 2^{j/Q}\psi(2^{j/Q}t),\] or, in terms of Fourier transforms, \[ \hat\psi_j(\omega) = \hat\psi(2^{j/Q}\omega).\] If we suppose that \( \hat\psi \) has central frequency \( \xi_\psi \), \(\hat \psi_j \) will have central frequency \( 2^{j/Q} \xi_psi\). The scale index \( j \), then is a logarithmic measure of frequency, with a linear increase in \( j \) resulting an exponential decrease of \( \omega \).
In fact, the mother wavelet \( \psi \) may only be dilated up to the maximum scale maximum scale \( J1 \), resulting in \( J \) wavelets \( \psi_0 = \psi, \psi_1, \ldots, \psi_{J1} \). The filter \( \psi_{J1} \) has the smallest possible bandwidth in frequency and the largest possible support in time. If \( Q > 1 \), this is not sufficient to cover the frequency spectrum. To address this issue, \( \psi_{J1} \) is translated in frequency to capture the remaining frequencies. That way, the temporal bandwidth will not exceed that of \( \psi_{J1} \). In summary, \( J \) determines the maximum wavelet time support \( T \), which is equal to the averaging length of the scattering transform, and is chosen by the user.
So as to get \( J \) in a practical way, ScatNet provides the utility function T_to_J
, which
takes the integer T
and the structure filt_opt
as arguments. Thus, a customized set of options
for wavelet_factory_1d
may be built in the following way :
filt_opt.Q = [8 8]; filt_opt.J = T_to_J(T, filt_opt);Note that the original prototype
T_to_J(T,Q)
is no longer supported in the new version of ScatNet.
To ensure coverage of the frequency domain without becoming redundant, the mother wavelet \( \psi \) is chosen such that
adjacent wavelets barely overlap in frequency. In filters
, one has \( Q_1 = 8 \) and \( Q_2 = 1 \) by default,
which means that the first order has higher resolution in frequency compared to the second.
This can be confirmed by looking at the Fourier transforms of the filters above.
Exploration of a network by scale paths
You may recall that scattering coefficients are defined by \[ S_1x(t,j_1) = x\star\psi_{j_1}\star\phi(t) \] \[ S_2x(t,j_1,j_2) = x\star\psi_{j_1}\star\psi_{j_2}\star\phi(t) \] and so on, where \( \psi_{j} \) are bandpass filters and \( \phi \) a lowpass filter.
In ScatNet, the scattering representation S
is a cell array, whose elements correspond to respective layers in
the scattering transform. Theoretically, the root of the network has the index \( m=0 \) ; however, since MATLAB uses 1based
indexing, an offset of 1
is required in the cell array : the \(m^{\text{th}}\) layer is stored as
S{1+m}
.
Each layer has two fields :
signal
: the cell array of all nodes in the layer, andmeta
: astruct
containing the metainformation of the layer, such as arrays of scales and resolutions of each node.
The signal
field is a cell array of P
elements, each entry corresponding to a
node in the scattering layer. The \(k^{\text{th}}\) node of the \(m^{\text{th}}\) layer is expressed as :
S{1+m}.signal{k};
An alternative way to identify scattering nodes is by their paths along the scattering network.
For translationinvariant scattering representations, these paths are expressed with the wavelet filter scales
\( j_1 \), \( j_2 \), etc. at each layer. To make the correspondence between the cell array S{m+1}.signal
of coefficients and these scales, the matrix S{m+1}.meta.j
provides the paths of scales of each coefficient
in the layer. Namely, S{m+1}.meta.j(:,p)
provides the scales [j_1 j_2 ... j_m]'
associated with S{m+1}.signal{p}
.
Similarly, the meta
field in each layer contains other properties of the scattering coefficients,
such as resolution, bandwidth, and so forth. Each field is arranged so that the \(k^{\text{th}}\) column is associated
with the \(k^{\text{th}}\) signal in S{m+1}.signal
. To get the node corresponding to a specifi path,
one may use the MATLAB find
function ; but be aware that, by default, only paths of increasing scale
are computed, so find
may return an empty array if the required path does not belong to the network.
filt_opt.boundary
parameter.
For more information, see the documentation of the filter_bank
function.
Display and Formatting
The scattergram
builtin function in ScatNet provides a spectrogramlike visualization of the
secondorder, translationinvariant scattering transform of onedimensional signals.
The top image is made of firstorder coefficients, organized in a timescale matrix. Very low frequencies, i.e. large scales,
appear at the bottom of the image.
For its part, the secondorder coefficients , for a fixed scale j1
, are shown in the bottom image, again in
a timescale matrix.
In the following example, one displays the firstorder coefficients and secondorder coefficients for an
arbitrary j1 = 23
. Note that the wildcard []
means that all paths are gathered at the first order.
j1 = 23; scattergram(S{2},[],S{3},j1);This should give the figure below.
To improve visualization or classification performance, ScatNet offers utility functions
to manipulate a scattering representation, such as renormalisation (renorm_scat
)
and logarithmic transformation (log_scat
).
The former renormalizes secondorder coefficients by dividing them by the corresponding firstorder coefficients
to decorrelate the two : \(m^{\text{th}}\)order coefficients are similarly renormalized by dividing by
\((m1)^{\text{th}}\)order coefficients. The latter computes the logarithm of the coefficients at each point.
Both of these functions use the format described above for the scattering transform for both input and output.
Let's renormalize the coefficients and compute the logarithm.
S = renorm_scat(S); S = log_scat(S); scattergram(S{2},[],S{3},j1);This results in a figure where much more details are noticeable.
Batch Processing
In order to use the scattering coefficients for classification or other tasks,
we need to assemble them together in vector form. To do this, we use the function format_scat
.
[S_table, meta] = format_scat(S);Here,
S_table
will be a P
byN
table, where P
is the total number
of scattering coefficients (all orders combined) and N
is the number of time points. Now,
S_table
can be fed into a classifier for training or testing.
The ScatNet toolbox also contains a pipeline for classification using an affine space classifier or a support vector machine (SVM). To use it, we need to create a database of feature vectors by calculating the scattering coefficients of a collection of audio files. Here, we shall consider the GTZAN dataset for musical genre classification.
First, we create the list of files accompanied by their class labels using the gtzan_src
function.
src = gtzan_src('/path/to/gtzan/dataset');Second, we need to specify the feature transform to use. This is done, as before, using
wavelet_factory_1d
and the scattering transform functions.
N = 5*2^17; T = 8192; filt_opt.Q = [8 1]; filt_opt.J = T_to_J(T, filt_opt); scat_opt.M = 2; Wop = wavelet_factory_1d(N, filt_opt, scat_opt); feature_fun = @(x)(format_scat( ... log_scat(renorm_scat(scat(x, Wop)))));Note that we must specify
N = 5*2^17
, corresponding to 29.72 s, because the files in src
have been truncated to this length. Finally, we can to specify additional options for generating the features. Here,
to speed up training and testing, we decide to only keep every eighth frame of the scattering transform.
database_options.feature_sampling = 8;Now we use the
prepare_database
function to calculate the scattering coefficients of the files
in src
.
database = prepare_database(src, feature_fun, database_options);Depending on the speed of the machine, this can take some time. If you have access to the distributed computing toolbox, invoking
matlabpool open
before calling prepare_database allows the function to use multiple cores
simultaneously.
Support Vector Machine Classification
To perform evaluate classification performance, we split the database into a training and a testing set
using the create_partition
function.
[train_set, test_set] = create_partition(src, 0.8);The second parameter specifies the proportion of the objects in each class that are assigned to the training set. Note that the MATLAB random number generator is used to calculate the partition, so if you want to ensure the same results, please reset it using the following code before creating the partition.
rs = RandStream.getDefaultStream(); rs.reset();
When using the SVM classifier, we can speed up calculations by precomputing the kernel.
This is done using the svm_calc_kernel
function.
database = svm_calc_kernel(database, 'gaussian');Note that the above requires a lot of memory, in this case almost 3 gigabytes. If necessary, this step can be skipped and the code will run fine.
We now prepare parameter ranges for the SVM and conduct a parameter search using the svm_param_search
function.
svm_options.kernel_type = 'gaussian'; svm_options.C = 2.^[0:4:8]; svm_options.gamma = 2.^[16:4:8]; [err,C,gamma] = svm_param_search( ... database, train_set, [], svm_options);The
svm_param_search
function takes four parameters: the feature database, the training set,
the development set, and an options structure. If a validation set is present,
models are trained using the training set and their error is calculated on the validation set.
If a validation set is missing (as in our case), fivefold crossvalidation is performed on the training set
to estimate the error.
Since we've performed crossvalidation, we obtain an error matrix as output err
from
svm_param_search
, the first dimension corresponding to a particular parameter combination and
the second dimension corresponding to the fold index. The C
and gamma
outputs give
the particular used in the different parameter combinations. As a result, we obtain the optimal parameters using
the following code.
[temp,ind] = min(mean(err,2)); C_best = C(ind); gamma_best = gamma(ind);
Once we've estimated the proper parameters, we are ready to calculate the error over the testing set.
To do this, we set the parameters in the svm_options
structure and call svm_train
to train the model and svm_test
to apply it to the testing set.
svm_options.C = C_best; svm_options.gamma = gamma_best; model = svm_train(database, train_set, svm_options); labels = svm_test(database, model, test_set);The error can now be calculated using the
classif_err
function.
test_err = classif_err(labels, test_set, src);
The error should be 0.15 for this split. To further reduce the error, we need to keep more of
the scattering frames and also conduct a more extensive parameter search. This may be achieved by
using a finer grid or resort to svm_adaptive_param_search
.