Signal representations for image classification need to be invariant to transformations that do not affect our ability to recognize image content. Important instances of such transformations are physical transformations, such as translations, dilations or rotations. Besides invariance, signal representations also need to be continuous with respect to signal deformations, and they should capture enough signal information to discriminate between different signal classes.
Scattering transforms build invariant, stable and informative representations through a non-linear, unitary transform, which delocalizes signal information into scattering decomposition paths. They are computed with a cascade of wavelet modulus operators, and correspond to a convolutional network where filter coefficients are given by a wavelet operator.
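The cascade of wavelet modulus operators can be sketched in one dimension. This is a minimal illustration under stated assumptions, not the reference implementation: the filter bank (`gabor_filter_bank`, one analytic Gaussian filter per octave centred at `0.4 * 2**-j` cycles per sample) is hypothetical and is not designed to make the transform unitary, and the low-pass averaging by \( \phi \) is replaced by a global mean, which gives the fully translation-invariant limit. Each order applies a wavelet convolution followed by a modulus to the previous layer, as in a convolutional network whose filters are fixed wavelets.

```python
import numpy as np

def gabor_filter_bank(n, num_scales, q=3.0):
    """Analytic Gabor filters defined in the Fourier domain.

    Hypothetical parameterization: centre frequencies xi_j = 0.4 * 2**-j
    (cycles/sample), bandwidth xi_j / q, one filter per octave.
    """
    freqs = np.fft.fftfreq(n)
    bank = []
    for j in range(num_scales):
        xi = 0.4 * 2.0 ** (-j)
        sigma = xi / q
        psi_hat = np.exp(-((freqs - xi) ** 2) / (2 * sigma ** 2))
        psi_hat[freqs <= 0] = 0.0  # keep positive frequencies only (analytic filter)
        bank.append(psi_hat)
    return bank

def wavelet_modulus(x, psi_hat):
    """|x * psi|, with the circular convolution computed via the FFT."""
    return np.abs(np.fft.ifft(np.fft.fft(x) * psi_hat))

def scattering_1d(x, num_scales=5):
    """Scattering coefficients up to order m_max = 2 with global averaging."""
    bank = gabor_filter_bank(len(x), num_scales)
    coeffs = [np.mean(x)]                        # order 0: average of x
    for j1, psi1 in enumerate(bank):
        u1 = wavelet_modulus(x, psi1)            # |x * psi_j1|
        coeffs.append(np.mean(u1))               # order 1 coefficient
        for j2 in range(j1 + 1, num_scales):     # second wavelet at a coarser scale
            u2 = wavelet_modulus(u1, bank[j2])   # ||x * psi_j1| * psi_j2|
            coeffs.append(np.mean(u2))           # order 2 coefficient
    return np.array(coeffs)

rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
s = scattering_1d(x)
s_shift = scattering_1d(np.roll(x, 37))  # circularly translated input
print(s.shape, np.allclose(s, s_shift))
```

Second-order paths only use coarser scales (`j2 > j1`) because the modulus \( |x \star \psi_{j_1}| \) is smoother than \( x \star \psi_{j_1} \), so most of its energy lies at lower frequencies. With global averaging the coefficients are exactly invariant to circular translations.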
Thanks to their invariance and stability properties, scattering operators linearize deformations. This linearization property can be exploited to build linear generative classifiers in the scattering domain, computed with a simple class-conditional PCA. When applied to stationary textures, scattering transforms provide new texture descriptors that incorporate high-order moments, which can discriminate non-Gaussian properties. As a result, state-of-the-art classification results are obtained on hand-written digit recognition and texture classification.
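A class-conditional PCA classifier can be sketched as follows, assuming an affine-space decision rule: each class is approximated by its mean plus the span of its `d` leading principal directions, and a sample is assigned to the class whose affine space is nearest. The class name `AffinePCAClassifier` and the toy data (two noisy lines in \( \mathbb{R}^6 \) standing in for scattering vectors) are hypothetical.

```python
import numpy as np

class AffinePCAClassifier:
    """Nearest affine space classifier built from class-conditional PCA."""

    def __init__(self, d):
        self.d = d  # number of principal directions kept per class

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.means_, self.bases_ = [], []
        for c in self.classes_:
            Xc = X[y == c]
            mu = Xc.mean(axis=0)
            # Rows of vt are the principal directions of the centred class data.
            _, _, vt = np.linalg.svd(Xc - mu, full_matrices=False)
            self.means_.append(mu)
            self.bases_.append(vt[: self.d])  # (d, n_features), orthonormal rows
        return self

    def distances(self, X):
        dists = []
        for mu, V in zip(self.means_, self.bases_):
            R = X - mu
            residual = R - (R @ V.T) @ V  # remove the component in span(V)
            dists.append(np.linalg.norm(residual, axis=1))
        return np.stack(dists, axis=1)  # (n_samples, n_classes)

    def predict(self, X):
        return self.classes_[np.argmin(self.distances(X), axis=1)]

# Toy data: class 0 varies along e0 through the origin, class 1 along e1 through 3*e2.
rng = np.random.default_rng(0)
t = rng.uniform(-3, 3, size=(200, 1))
e = np.eye(6)
X0 = t * e[0] + 0.01 * rng.standard_normal((200, 6))
X1 = 3 * e[2] + t * e[1] + 0.01 * rng.standard_normal((200, 6))
X = np.vstack([X0, X1])
y = np.array([0] * 200 + [1] * 200)

clf = AffinePCAClassifier(d=1).fit(X, y)
pred = clf.predict(np.array([2 * e[0], 3 * e[2] - 2 * e[1]]))
print(pred)
```

The design relies on the linearization property: if small deformations act approximately linearly on scattering vectors, the deformations within a class span a low-dimensional affine space, which the `d` principal directions are meant to capture.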
The scattering transform shares some properties with the modulus of the Fourier transform: it has good frequency localization, it is translation invariant, and it is unitary:
Figure panels: (a) f, the indicator function of a square; (b) modulus of the Fourier transform of f; (c) scattering transform of f.
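The two Fourier-modulus properties shared with scattering, translation invariance and norm preservation, can be checked numerically. This sketch uses a 1-D interval as a stand-in for the indicator of a square and NumPy's unitary DFT (`norm="ortho"`); the shift of 57 samples is arbitrary.

```python
import numpy as np

n = 256
f = np.zeros(n)
f[100:140] = 1.0  # 1-D stand-in for the indicator of a square

# Unitary DFT: norm="ortho" scales by 1/sqrt(n), so that ||F f|| = ||f||.
F = np.fft.fft(f, norm="ortho")
F_shifted = np.fft.fft(np.roll(f, 57), norm="ortho")

# A circular translation multiplies each Fourier coefficient by a phase,
# so the modulus |F f| is unchanged.
translation_invariant = np.allclose(np.abs(F), np.abs(F_shifted))
# Plancherel: the unitary DFT, and hence its modulus, preserves the L2 norm.
norm_preserved = np.isclose(np.linalg.norm(np.abs(F)), np.linalg.norm(f))
print(translation_invariant, norm_preserved)  # True True
```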
Unlike the Fourier transform modulus, however, the scattering transform is also stable to small deformations. In this example, a Gabor atom is slightly deformed by a dilation and a rotation. Most of the Fourier energy is displaced to other frequencies, whereas most of the scattering energy remains stable:
Figure panels: (a) f, a Gabor atom of varying frequency and direction; (b) modulus of the Fourier transform of f; (c) scattering transform of f.
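A 1-D analogue of this experiment can be sketched by dilating a Gabor atom and comparing the relative change of its Fourier modulus with that of averaged wavelet modulus coefficients (first-order scattering with global averaging). The atom parameters and the constant-Q filter bank are assumptions chosen for illustration, and only the dilation part of the deformation is modelled, not the rotation.

```python
import numpy as np

n = 1024
t = np.arange(n) - n / 2
freqs = np.fft.fftfreq(n)  # cycles per sample

def gabor_atom(scale, xi=0.25, width=20.0):
    """Gabor atom evaluated at scale * t: Gaussian envelope times a cosine."""
    u = scale * t
    x = np.exp(-u ** 2 / (2 * width ** 2)) * np.cos(2 * np.pi * xi * u)
    return x / np.linalg.norm(x)  # unit energy, so only the shape changes

def fourier_modulus(x):
    return np.abs(np.fft.fft(x, norm="ortho"))

def first_order_scattering(x, num_scales=5, q=3.0):
    """mean_t |x * psi_j| for hypothetical analytic Gabor filters, one per octave."""
    x_hat = np.fft.fft(x)
    out = []
    for j in range(num_scales):
        xi = 0.4 * 2.0 ** (-j)
        psi_hat = np.exp(-((freqs - xi) ** 2) / (2 * (xi / q) ** 2))
        psi_hat[freqs <= 0] = 0.0  # analytic filter
        out.append(np.mean(np.abs(np.fft.ifft(x_hat * psi_hat))))
    return np.array(out)

def relative_change(a, b):
    return np.linalg.norm(a - b) / np.linalg.norm(a)

x = gabor_atom(scale=1.0)
x_dilated = gabor_atom(scale=0.9)  # x(0.9 t): stretched by a factor 1/0.9

fourier_change = relative_change(fourier_modulus(x), fourier_modulus(x_dilated))
scattering_change = relative_change(first_order_scattering(x),
                                    first_order_scattering(x_dilated))
print(f"Fourier modulus: {fourier_change:.2f}, scattering: {scattering_change:.2f}")
```

The Fourier modulus of a narrow-band atom is concentrated in a small frequency interval, so a small shift of its centre frequency moves it almost entirely out of that interval. The wavelet bandwidths grow with frequency, so the same shift only redistributes energy slightly between neighbouring octaves.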
Moreover, scattering coefficients capture high-order moments of stationary processes, whereas the Fourier power spectrum only depends on second-order moments:
Figure panels: (a) X1, X2, two realizations of stationary textures; X2 is obtained by filtering white noise so that its power spectrum equals that of X1. (b) Power spectra of X1 and X2. (c) Scattering transforms of X1 and X2: high-order scattering coefficients discriminate between the Gaussian process X2 and the non-Gaussian process X1.
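The following sketch illustrates how a first-order coefficient can separate two processes with the same power spectrum. X1 is a sparse spike train, and X2 is built by randomizing the Fourier phases of X1 (a phase-randomization surrogate, swapped in here for the filtered white noise used in the figure). In the normalized coefficient E|x * psi| / sqrt(E|x * psi|^2), the denominator is fixed by the power spectrum, so any difference comes from E|x * psi|, which depends on higher-order moments. For a Gaussian process, |x * psi| is approximately Rayleigh distributed and the ratio is close to sqrt(pi)/2 ≈ 0.886. The filter centre and bandwidth are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4096
freqs = np.fft.fftfreq(n)

# X1: a sparse, strongly non-Gaussian stationary process (random spikes).
x1 = (rng.random(n) < 0.01).astype(float)

# X2: same Fourier modulus (hence same periodogram) but random phases,
# which yields an approximately Gaussian process.
X1_hat = np.fft.rfft(x1)
phases = np.exp(2j * np.pi * rng.random(X1_hat.shape))
X2_hat = np.abs(X1_hat) * phases
X2_hat[0], X2_hat[-1] = X1_hat[0], X1_hat[-1]  # keep the real DC and Nyquist terms
x2 = np.fft.irfft(X2_hat, n=n)

def normalized_scattering(x, xi=0.25, bandwidth=0.08):
    """E|x * psi| / sqrt(E|x * psi|^2) for one analytic Gabor filter psi."""
    psi_hat = np.exp(-((freqs - xi) ** 2) / (2 * bandwidth ** 2))
    psi_hat[freqs <= 0] = 0.0
    u = np.abs(np.fft.ifft(np.fft.fft(x) * psi_hat))
    return u.mean() / np.sqrt(np.mean(u ** 2))

same_spectrum = np.allclose(np.abs(np.fft.rfft(x2)), np.abs(X1_hat))
ratio_x1 = normalized_scattering(x1)
ratio_x2 = normalized_scattering(x2)
print(same_spectrum, round(ratio_x1, 2), round(ratio_x2, 2))
```

The power spectrum cannot distinguish x1 from x2 by construction, while the normalized coefficient of the spike train is much smaller than the Gaussian value, because |x1 * psi| is concentrated near the spikes.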
Hand-written digit recognition (MNIST): classification error (%) as a function of the training size.

Training size | Raw pixels, PCA | Raw pixels, SVM | Windowed Fourier, PCA | Windowed Fourier, SVM | Scattering \( m_{\max} = 1 \), PCA | Scattering \( m_{\max} = 1 \), SVM | Scattering \( m_{\max} = 2 \), PCA | Scattering \( m_{\max} = 2 \), SVM | Conv. Net. |
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
300 | 14.5 | 15.4 | 7.35 | 7.4 | 5.7 | 8 | \( \mathbf{4.7} \) | 5.6 | 7.18 |
1000 | 7.2 | 8.2 | 3.74 | 3.74 | 2.35 | 4 | \( \mathbf{2.3} \) | 2.6 | 3.21 |
2000 | 5.8 | 6.5 | 2.99 | 2.9 | 1.7 | 2.6 | \( \mathbf{1.3} \) | 1.8 | 2.53 |
5000 | 4.9 | 4 | 2.34 | 2.2 | 1.6 | 1.6 | \( \mathbf{1.03} \) | 1.4 | 1.52 |
10000 | 4.55 | 3.11 | 2.24 | 1.65 | 1.5 | 1.23 | 0.88 | 1 | \( \mathbf{0.85} \) |
20000 | 4.25 | 2.2 | 1.92 | 1.15 | 1.4 | 0.96 | 0.79 | \( \mathbf{0.58} \) | 0.76 |
40000 | 4.1 | 1.7 | 1.85 | 0.9 | 1.36 | 0.75 | 0.74 | \( \mathbf{0.53} \) | 0.65 |
60000 | 4.3 | 1.4 | 1.80 | 0.8 | 1.34 | 0.62 | 0.7 | \( \mathbf{0.43} \) | 0.53 |
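Texture classification (CUReT): classification error (%).

Training size | Raw pixels, PCA | Fourier spectrum, PCA | Scattering \( m_{\max} = 1 \), PCA | Scattering \( m_{\max} = 2 \), PCA | Textons | MRF |
--- | --- | --- | --- | --- | --- | --- |
46 | 17 | 1 | 0.5 | \( \mathbf{0.2} \) | 1.53 | 2.4 |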
Rotation-invariant texture classification: classification rate (%), best result in bold.

Training set | \( \text{LBP}^{riu2} + \text{VAR}_{(8,1)+(16,2)+(24,3)} \) | LBP-HF | RI-LPQ | Combined scattering \( m_{\max} = 1 \), \( \widetilde{m}_{\max} = 2 \) | Combined scattering \( m_{\max} = 2 \), \( \widetilde{m}_{\max} = 0 \) | Combined scattering \( m_{\max} = 2 \), \( \widetilde{m}_{\max} = 1 \) | Combined scattering \( m_{\max} = 2 \), \( \widetilde{m}_{\max} = 2 \) |
--- | --- | --- | --- | --- | --- | --- | --- |
rotation | 97.7 | 96.59 | 98.26 | 96.72 | 97.73 | 98.62 | \( \mathbf{98.75} \) |
rotation + tilt | NC | 67.50 | 78.02 | 81.61 | 89.38 | 92.89 | \( \mathbf{93.07} \) |