Data Team - DI - ENS Paris

Scattering

Scattering for Image Recognition

Signal representations for image classification need to be invariant to the transformations that do not affect recognition. Important examples are physical transformations such as translations, dilations or rotations. Besides invariance, signal representations also need to be continuous with respect to signal deformations, and should capture enough signal information to discriminate between different signal classes.

Scattering transforms build invariant, stable and informative representations through a non-linear, unitary transform, which delocalizes signal information into scattering decomposition paths. They are computed with a cascade of wavelet modulus operators, and correspond to a convolutional network where filter coefficients are given by a wavelet operator.
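The cascade described above can be sketched in a few lines of NumPy. This is only a minimal 1D illustration under several assumptions that are not taken from the text: the band-pass filters are Gaussian bumps in the Fourier domain (the hypothetical `gabor_bank` helper) rather than a specific wavelet family, convolutions are circular, only paths with decreasing centre frequency are kept, and the final low-pass average is taken over the whole signal.

```python
import numpy as np

def gabor_bank(n, num_scales):
    """Band-pass filters given in the Fourier domain (illustrative design).

    Filter j is a Gaussian bump centred at xi_j = 0.4 * 2**-j cycles per
    sample, with a bandwidth proportional to xi_j.
    """
    freqs = np.fft.fftfreq(n)
    bank = []
    for j in range(num_scales):
        xi = 0.4 * 2.0 ** (-j)
        sigma = xi / 3.0
        bank.append(np.exp(-0.5 * ((freqs - xi) / sigma) ** 2))
    return bank

def scattering(x, num_scales=4, max_order=2):
    """Cascade of wavelet modulus operators followed by a global average.

    Returns a dict mapping each path (j1, ..., jm) to one coefficient.
    """
    x = np.asarray(x, dtype=float)
    bank = gabor_bank(len(x), num_scales)
    layer = {(): x}
    coeffs = {(): float(x.mean())}
    for _ in range(max_order):
        next_layer = {}
        for path, u in layer.items():
            u_hat = np.fft.fft(u)
            # Keep only paths whose centre frequency decreases along the cascade.
            first = path[-1] + 1 if path else 0
            for j in range(first, num_scales):
                # Circular convolution with filter j, then complex modulus.
                next_layer[path + (j,)] = np.abs(np.fft.ifft(u_hat * bank[j]))
        for path, u in next_layer.items():
            coeffs[path] = float(u.mean())
        layer = next_layer
    return coeffs
```

Because every step commutes with circular shifts and the last step is a global average, `scattering(np.roll(x, k))` returns the same coefficients as `scattering(x)` up to rounding error.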

Thanks to their invariance and stability properties, scattering operators linearize deformations. This linearization property can be exploited to build linear generative classifiers in the scattering domain, which are computed with a simple class-conditional PCA. When applied to stationary textures, scattering transforms provide new texture descriptors, incorporating high-order moments which can discriminate non-Gaussian properties. As a result, state-of-the-art classification results are obtained on handwritten digit recognition and texture classification.
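One way to read "class-conditional PCA" is as an affine space model per class: each class is summarized by its mean and its leading principal directions, and a sample is assigned to the class whose affine space approximates it best. The sketch below follows that reading; the function names, the `dim` parameter and the residual-energy decision rule are assumptions made for illustration, and `features` may be any feature vectors (scattering coefficients in the setting of this page).

```python
import numpy as np

def fit_affine_pca(features, labels, dim):
    """Fit one affine PCA model (mean, principal directions) per class."""
    models = {}
    for c in np.unique(labels):
        xc = features[labels == c]
        mean = xc.mean(axis=0)
        # Rows of vt are principal directions, sorted by decreasing variance.
        _, _, vt = np.linalg.svd(xc - mean, full_matrices=False)
        models[c.item()] = (mean, vt[:dim])
    return models

def predict(models, x):
    """Return the class whose affine space leaves the smallest residual."""
    best_class, best_error = None, np.inf
    for c, (mean, basis) in models.items():
        centered = x - mean
        # Remove the component lying in the class principal subspace.
        residual = centered - basis.T @ (basis @ centered)
        error = float(residual @ residual)
        if error < best_error:
            best_class, best_error = c, error
    return best_class
```

The only hyperparameter in this sketch is the subspace dimension `dim`, which would typically be chosen by cross-validation.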

The Scattering transform shares some properties with the Fourier modulus transform. It has good frequency localization, it is translation invariant and it is unitary:

f = indicator of a square

Modulus of the Fourier transform of f

Scattering transform of f
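The two Fourier modulus properties mentioned above, translation invariance and energy preservation, are easy to check numerically. The snippet below is a small 1D analogue of the square indicator example, assuming circular translations and the discrete Fourier transform; the signal length and shift are arbitrary choices.

```python
import numpy as np

n = 64
f = np.zeros(n)
f[10:20] = 1.0                  # 1D analogue of the indicator of a square
shifted = np.roll(f, 23)        # circular translation of f

modulus = np.abs(np.fft.fft(f))
modulus_shifted = np.abs(np.fft.fft(shifted))

# A translation only changes the Fourier phase, so the moduli coincide.
max_gap = float(np.max(np.abs(modulus - modulus_shifted)))

# With a 1/sqrt(n) normalization the DFT is unitary (Parseval identity),
# so the energy of f equals the energy of its Fourier modulus.
energy_signal = float(np.sum(f ** 2))
energy_fourier = float(np.sum(modulus ** 2) / n)
```

Here `max_gap` is zero up to rounding error, and `energy_signal` and `energy_fourier` agree.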

However, the Scattering transform is also stable with respect to small deformations. In this example, a Gabor atom is slightly deformed with a dilation and a rotation. Most of the Fourier energy is displaced to other frequencies, whereas most of the scattering energy remains stable:

f = Gabor atom of varying frequency and direction

Modulus of the Fourier transform of f

Scattering transform of f
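The instability of the Fourier modulus can be illustrated in 1D with a small dilation of a Gabor-like atom. The sketch below only covers the Fourier side of the comparison, and its parameters (window width `sigma`, dilation factor `eps`, signal length `n`) are illustrative assumptions: a dilation moves the centre frequency from `xi` to `(1 - eps) * xi`, so the displacement grows with `xi` while the width of the Fourier bump stays fixed.

```python
import numpy as np

def fourier_modulus_distance(xi, eps=0.05, n=4096, sigma=100.0):
    """Relative change of the Fourier modulus under t -> (1 - eps) * t."""
    t = np.arange(n) - n / 2

    def atom(u):
        # Gaussian window times a cosine of frequency xi (cycles per sample).
        return np.exp(-u ** 2 / (2 * sigma ** 2)) * np.cos(2 * np.pi * xi * u)

    original = np.abs(np.fft.fft(atom(t)))
    dilated = np.abs(np.fft.fft(atom((1 - eps) * t)))
    return float(np.linalg.norm(original - dilated) / np.linalg.norm(original))

# Same 5% dilation applied to a low-frequency and a high-frequency atom.
low = fourier_modulus_distance(xi=0.01)
high = fourier_modulus_distance(xi=0.25)
```

For the low-frequency atom the two Fourier bumps still overlap and `low` stays small; for the high-frequency atom they no longer overlap and `high` is much larger, even though the deformation is the same.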

Besides, scattering coefficients capture high-order moments of stationary processes, as opposed to the Fourier power spectrum:



X1, X2 = Two realizations of stationary textures. X2 is obtained by equalizing random white noise according to the spectrum of X1.

Power spectrum of X1, X2.

Scattering transform of X1, X2. High-order scattering coefficients discriminate between the Gaussian process X2 and the non-Gaussian X1.
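A texture with the same power spectrum as a given one can be synthesized as follows. This sketch makes several assumptions: X1 is a toy sparse image rather than a real texture, the white noise contributes only its Fourier phases so that the two spectra match exactly (a phase-randomization variant of the equalization described above), and the kurtosis is used as a simple stand-in for a high-order statistic; it is not a scattering coefficient.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy non-Gaussian "texture": a sparse image of isolated spikes.
x1 = (rng.random((64, 64)) < 0.02).astype(float)

# Keep the Fourier modulus of x1 and take the phases of Gaussian white noise.
noise_hat = np.fft.fft2(rng.standard_normal(x1.shape))
x2_hat = np.abs(np.fft.fft2(x1)) * noise_hat / np.abs(noise_hat)
x2 = np.fft.ifft2(x2_hat).real  # imaginary part is rounding error only

def kurtosis(x):
    """Fourth standardized moment (close to 3 for Gaussian samples)."""
    centered = x - x.mean()
    return float(np.mean(centered ** 4) / np.mean(centered ** 2) ** 2)

# Largest difference between the two Fourier moduli.
spectrum_gap = float(np.max(np.abs(np.abs(np.fft.fft2(x2)) - np.abs(np.fft.fft2(x1)))))
kurt_x1, kurt_x2 = kurtosis(x1), kurtosis(x2)
```

Both images have the same power spectrum by construction, so second-order statistics cannot separate them, whereas the fourth-order statistic differs strongly between the sparse X1 and the nearly Gaussian X2.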

Classification Results

MNIST Digit

The MNIST digit database consists of 60,000 training images and 10,000 test images of handwritten digits. The following results, given as percentages of classification errors, are reported in the paper , , .

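Training Size	Raw Pixels (PCA)	Raw Pixels (SVM)	Windowed Fourier (PCA)	Windowed Fourier (SVM)	Scatt. \( m_{\max} = 1 \) (PCA)	Scatt. \( m_{\max} = 1 \) (SVM)	Scatt. \( m_{\max} = 2 \) (PCA)	Scatt. \( m_{\max} = 2 \) (SVM)	Conv. Net.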
300	14.5	15.4	7.35	7.4	5.7	8	\( \bf{4.7} \)	5.6	7.18
1000	7.2	8.2	3.74	3.74	2.35	4	\( \bf{2.3} \)	2.6	3.21
2000	5.8	6.5	2.99	2.9	1.7	2.6	\( \bf{1.3} \)	1.8	2.53
5000	4.9	4	2.34	2.2	1.6	1.6	\( \bf{1.03} \)	1.4	1.52
10000	4.55	3.11	2.24	1.65	1.5	1.23	0.88	1	\( \bf{0.85} \)
20000	4.25	2.2	1.92	1.15	1.4	0.96	0.79	\(\bf{0.58}\)	0.76
40000	4.1	1.7	1.85	0.9	1.36	0.75	0.74	\( \bf{0.53} \)	0.65
60000	4.3	1.4	1.80	0.8	1.34	0.62	0.7	\( \bf{0.43} \)	0.53

CUReT

CUReT is a standard database of texture images. The following results, given as percentages of classification errors, are reported in the paper , ,

Training Size	Raw Pixels (PCA)	Fourier Spectrum (PCA)	Scat. \(m_{\max}=1\) (PCA)	Scat. \(m_{\max}=2\) (PCA)	Textons	MRF
46	17	1	0.5	\( \bf{0.2} \)	1.53	2.4

OUTex TC10

OUTex TC10 is a texture database with no rotations in the training set but all possible rotations in the test set, and thus requires built-in rotation invariance. The following results, given as classification accuracies in percent, are reported in the paper Combined Scattering for Rotation Invariant Texture Analysis, Sifre L. and Mallat S., Proceedings of the ESANN 2012 conference, Apr. 2012.

Training set	\( \text{LBP}^{riu2} + \) \( \text{VAR}_{(8,1) + (16,2) + (24,3)} \)	LBP-HF	RI-LPQ	Combined scattering \( m_{\max} = 1 \) \( \widetilde{m}_{\max}=2 \)	Combined scattering \( m_{\max}= 2 \) \( \widetilde{m}_{\max}=0 \)	Combined scattering \( m_{\max}= 2 \) \( \widetilde{m}_{\max}=1 \)	Combined scattering \( m_{\max}= 2 \) \( \widetilde{m}_{\max}=2 \)
rotation	97.7	96.59	98.26	96.72	97.73	98.62	\(\textbf{98.75} \)
rotation + tilt	NC	67.50	78.02	81.61	89.38	92.89	\(\textbf{93.07}\)