FrequentNet: A New Deep Learning Baseline for Image Classification
for image classification. Instead of using principal component vectors as the filter vectors in "PCANet", we use basis vectors from discrete Fourier analysis and wavelet analysis as our filter vectors. Both achieve performance comparable to "PCANet" on benchmark datasets. Notably, our algorithms do not require any optimization techniques to obtain these bases.

1. Introduction

... Transformation and wavelets analysis.

1.1. Discrete Fourier Transformation

The Discrete Fourier Transformation (DFT) (Beerends et al., 2003) can represent the information in an image at different frequencies. Mathematically, given a vectorized image $x$ of length $n$, the 1D DFT transforms it as
$$d(\omega_k) = \langle x, \, C(\omega_k) - i\,S(\omega_k) \rangle, \qquad \omega_k = 2\pi k/n, \quad k \in F_n,$$
the set of Fourier frequencies. To be precise, $F_n$ denotes the set $\{-[\tfrac{n-1}{2}], \ldots, [\tfrac{n}{2}]\}$, where $[x]$ is the integer part of $x$, and where $C(\omega_k)$ and $S(\omega_k)$ are the cosine and sine vectors $(\cos(\omega_k j))_{j=0}^{n-1}$ and $(\sin(\omega_k j))_{j=0}^{n-1}$.
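For concreteness, the coefficients $d(\omega_k)$ can be computed directly as inner products with these cosine and sine vectors; the following NumPy sketch (ours, not from the paper) checks the result against `np.fft.fft` on the non-negative frequencies:

```python
import numpy as np

def dft_inner_products(x):
    """Compute d(w_k) = <x, C(w_k) - i S(w_k)> for every k in F_n,
    with C(w_k)_j = cos(w_k j) and S(w_k)_j = sin(w_k j)."""
    n = len(x)
    ks = np.arange(-((n - 1) // 2), n // 2 + 1)  # F_n = {-[(n-1)/2], ..., [n/2]}
    j = np.arange(n)
    d = {}
    for k in ks:
        w = 2 * np.pi * k / n
        d[k] = x @ np.cos(w * j) - 1j * (x @ np.sin(w * j))
    return d

x = np.random.randn(8)
d = dft_inner_products(x)
# For k >= 0 this matches NumPy's FFT convention X[k] = sum_j x_j e^{-i w_k j}.
assert np.allclose([d[k] for k in range(5)], np.fft.fft(x)[:5])
```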
... DB4) in our paper. In DB4, the first layer of the wavelet analysis can be viewed as projecting the image vector $x$ onto filter vectors formed from progressively shifted local basis vectors:
$$h = \left[ \frac{1+\sqrt{3}}{4}, \frac{3+\sqrt{3}}{4}, \frac{3-\sqrt{3}}{4}, \frac{1-\sqrt{3}}{4} \right], \qquad g = \left[ \frac{1-\sqrt{3}}{4}, \frac{\sqrt{3}-3}{4}, \frac{3+\sqrt{3}}{4}, \frac{-1-\sqrt{3}}{4} \right]. \quad (2)$$
Here $h$ computes a moving average, acting as the low-pass filter of the section above, while $g$ captures local contrasts, acting as the high-pass filter. Then, for a vectorized image $x$, the first layer of the wavelet transform is the linear transformation
$$\begin{bmatrix} h_0 & h_1 & h_2 & h_3 & 0 & 0 & \cdots \\ g_0 & g_1 & g_2 & g_3 & 0 & 0 & \cdots \\ 0 & 0 & h_0 & h_1 & h_2 & h_3 & \cdots \\ 0 & 0 & g_0 & g_1 & g_2 & g_3 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots \end{bmatrix} x. \quad (3)$$
In this case, we can treat each row of the matrix in (3) as the pool of our potential filter vectors.
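The banded matrix in (3) is easy to materialize; a small NumPy sketch of our own (assuming periodic wrap-around at the boundary, which the text does not specify):

```python
import numpy as np

def db4_first_layer(n):
    """Build the first-layer DB4 analysis matrix of (3): each row pair
    holds h and g shifted right by two positions, wrapping periodically
    at the boundary."""
    s3 = np.sqrt(3.0)
    h = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / 4   # low pass: moving average
    g = np.array([1 - s3, s3 - 3, 3 + s3, -1 - s3]) / 4  # high pass: local contrast
    W = np.zeros((n, n))
    for r in range(n // 2):
        for j in range(4):
            W[2 * r, (2 * r + j) % n] = h[j]
            W[2 * r + 1, (2 * r + j) % n] = g[j]
    return W

W = db4_first_layer(8)
coeffs = W @ np.random.randn(8)             # first-layer wavelet coefficients
assert np.allclose(W @ W.T, 2 * np.eye(8))  # rows are orthogonal, squared norm 2
```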
2. FrequentNet

2.1. Problem Setup

In this section we mainly follow the settings in (Chan et al., 2015). We are provided with $N$ input training images $\{I_i\}_{i=1}^N$ of size $m \times n$, and we set the patch size (or 2D filter size) to $k_1 \times k_2$ at all stages. We denote the vectorized patches by $x_{i,1}, \cdots, x_{i,mn}$, where the first index refers to the image and the second to the patch. We then subtract the patch mean from each patch and obtain
$$\bar{X}_i = [x_{i,1}, \cdots, x_{i,j}, \cdots, x_{i,mn}], \quad 1 \le j \le mn, \quad (4)$$
of size $k_1 k_2 \times mn$. Then we stack the $\bar{X}_i$ again to get
$$\bar{X} = [\bar{X}_1, \cdots, \bar{X}_i, \cdots, \bar{X}_N], \quad 1 \le i \le N, \quad (5)$$
of size $k_1 k_2 \times mnN$. The filter vectors then aim to capture patterns that represent the information in the columns $x_{i,j}$ effectively. PCANet chooses the filter vectors to be the leading left singular vectors of $\bar{X}$. In this paper, we propose instead to use bases from the DFT and from wavelet analysis. To keep the image size unchanged, we set the convolution stride to 1 and zero-pad each image before convolving it with the learned frequent filters. The overall pipeline is the same as in PCANet, where a simple strategy of hashing and histogramming is applied to obtain the final representation features.
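A minimal sketch of this patch-collection step (the helper name, padding choice, and loop structure are our own):

```python
import numpy as np

def patch_matrix(images, k1, k2):
    """Collect every overlapping k1 x k2 patch of each image, vectorize
    it, subtract its mean, and stack the results as the columns of the
    matrix X_bar in (5)."""
    cols = []
    for I in images:                                      # each I is (m, n)
        m, n = I.shape
        Ip = np.pad(I, ((k1 // 2,) * 2, (k2 // 2,) * 2))  # zero-pad: one patch per pixel
        for r in range(m):
            for c in range(n):
                p = Ip[r : r + k1, c : c + k2].reshape(-1)
                cols.append(p - p.mean())                 # remove the patch mean
    return np.stack(cols, axis=1)                         # shape (k1*k2, mn*N)

X_bar = patch_matrix([np.random.randn(28, 28) for _ in range(4)], 7, 7)
print(X_bar.shape)                                        # (49, 3136)
```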
2.2. FourierNet

The First Stage: To avoid duplicates in the Fourier basis, we restrict the index $k$ in $\omega_k$ to $k \in F_n^+$, where $F_n^+$ contains only the non-negative indices in $F_n$, and we choose $\{\cos(\omega_k \cdot), \sin(\omega_k \cdot)\}$ to be our candidate orthogonal basis. We then select filters of different frequencies based on the magnitude of the inner products between the vectorized patches $x_{i,j}$ and the candidate filters, as summarized in Algorithm 1. With the obtained $L_1$ filters $v_1, \cdots, v_k, \cdots, v_{L_1}$, every input image $I_i$ is mapped to $L_1$ new feature maps:
$$I_i^k = I_i * \mathrm{mat}_{k_1,k_2}(v_k), \quad (6)$$
where $*$ is the two-dimensional convolution. For later convenience, we rank the $v \in D_{L_1}$ in decreasing order of $\|\langle v, \bar{X} \rangle\|_1$ and index them accordingly, i.e., $\|\langle v_1, \bar{X} \rangle\|_1 \ge \cdots \ge \|\langle v_{L_1}, \bar{X} \rangle\|_1$.

Algorithm 1 Select top $K$ Fourier Basis
Input: $\bar{X}$, $L_1$
for $k$ in $F_{k_1 k_2}^+$ do
  $c_k \leftarrow \|\langle C(\omega_k), \bar{X} \rangle\|_1$
  $s_k \leftarrow \|\langle S(\omega_k), \bar{X} \rangle\|_1$
end for
Select the $C(\omega_k)$ or $S(\omega_k)$ with the $L_1$ largest values in $\{s_k, c_k\}$ and call this set $D_{L_1}$
Output: $D_{L_1}$
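A direct NumPy rendering of Algorithm 1 (the function name and the handling of the identically-zero sine candidate are our choices):

```python
import numpy as np

def select_fourier_basis(X_bar, L1):
    """Algorithm 1: score each candidate cosine/sine vector by the l1
    norm of its inner products with the columns of X_bar, then keep
    the L1 highest-scoring candidates as the filter set D_L1."""
    p = X_bar.shape[0]                       # p = k1 * k2
    j = np.arange(p)
    candidates, scores = [], []
    for k in range(p // 2 + 1):              # k in F_p^+, the non-negative indices
        w = 2 * np.pi * k / p
        for v in (np.cos(w * j), np.sin(w * j)):
            if np.allclose(v, 0.0):          # sin is identically zero at k = 0; skip
                continue
            candidates.append(v)
            scores.append(np.abs(v @ X_bar).sum())  # || <v, X_bar> ||_1
    top = np.argsort(scores)[::-1][:L1]      # decreasing score, matching the ranking of v
    return [candidates[t] for t in top]      # D_L1

D_L1 = select_fourier_basis(np.random.randn(49, 3136), L1=8)
```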
The Second Stage: After the first stage, for each basis in $D_{L_1}$ we obtain a new set of feature maps of the same size as the original images. For the new $L_1 N$ feature maps $I_i^k$, $i = 1, \cdots, N$, $k = 1, \cdots, L_1$, we again collect all overlapping patches and subtract the mean from each. Define
$$\bar{Y}_i^k = [y_{i,1}^k, \cdots, y_{i,mn}^k], \quad (7)$$
then we concatenate all the $\bar{Y}_i^k$ to get
$$\bar{Y} = [\bar{Y}_1^1, \cdots, \bar{Y}_N^{L_1}] \quad (8)$$
of size $k_1 k_2 \times L_1 N mn$. For $\bar{Y}$, we run Algorithm 1 again to select the top $L_2$ Fourier bases and, following the definitions above, call them $u_1, \cdots, u_{L_2}$, again ranked by the magnitude of the inner product.
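Putting the two stages together, each input image yields $L_1 L_2$ second-stage maps (a sketch reusing the filters above; `convolve2d` with `mode="same"` stands in for the paper's stride-1, zero-padded convolution):

```python
import numpy as np
from scipy.signal import convolve2d

def two_stage_maps(images, V, U, k1, k2):
    """Apply the L1 first-stage filters V and then the L2 second-stage
    filters U (all vectors of length k1*k2, reshaped to k1 x k2 kernels).
    mode="same" keeps every map the same size as the input image."""
    out = []
    for I in images:
        maps1 = [convolve2d(I, v.reshape(k1, k2), mode="same") for v in V]
        out.append([[convolve2d(M, u.reshape(k1, k2), mode="same") for u in U]
                    for M in maps1])         # out[i][k][l] = (I_i * v_k) * u_l
    return out
```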
Output Stage: At the output stage, we use simple hashing and histogramming to obtain the final feature vectors. Generally, we first binarize all feature maps and then group them by their parent feature maps. For example, $I_i$ is the parent feature map of $I_i^k$, and $I_i^k$ is the parent feature map of $\{I_i^k * u_l\}$. Then, within each group, we pool the corresponding feature maps channel-wise through an exponential weighting. This hashing and pooling operation reduces the dimension of the feature representation while preserving significant discriminative information. Finally, in the histogram stage, we again extract blocks with a sliding window and compute the histogram of each block. Then we simply concatenate these block histograms to form the final feature vector.
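One way to realize this binarize-hash-histogram step, in the spirit of PCANet's output layer (the weights $2^l$, the threshold at zero, and the non-overlapping blocks are our assumptions; the paper defers these details to the PCANet pipeline):

```python
import numpy as np

def hash_and_histogram(group, block=7):
    """Binarize a group of sibling maps (maps sharing one parent),
    hash them channel-wise into a single integer map with weights 2^l
    (an exponential weighting), then histogram the hashed values over
    block x block windows and concatenate the block histograms."""
    L2 = len(group)
    hashed = sum((m > 0).astype(int) << l for l, m in enumerate(group))
    feats = []
    H, W = hashed.shape
    for r in range(0, H - block + 1, block):
        for c in range(0, W - block + 1, block):
            blk = hashed[r : r + block, c : c + block]
            feats.append(np.bincount(blk.reshape(-1), minlength=2 ** L2))
    return np.concatenate(feats)

f = hash_and_histogram([np.random.randn(28, 28) for _ in range(4)])
```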
Figure 1. The Fourier filters learned from the bg-rand dataset. Top: the first-stage filters. Bottom: the second-stage filters.

Figure 3. The two selected MNIST samples used for low-rank approximation.
References

Beerends, R. J., ter Morsche, H. G., van den Berg, J. C., and van de Vrie, E. M. Fourier and Laplace Transforms. Cambridge University Press, Cambridge, UK, 2003.

Canny, J. A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, (6):679–698, 1986.

Chan, T.-H., Jia, K., Gao, S., Lu, J., Zeng, Z., and Ma, Y. PCANet: A simple deep learning baseline for image classification? IEEE Transactions on Image Processing, 24(12):5017–5032, 2015.

Costen, N. P., Parker, D. M., and Craw, I. Effects of high-pass and low-pass spatial filtering on face identification. Perception & Psychophysics, 58(4):602–612, 1996.

Fan, C., Hong, X., Tian, L., Ming, Y., Pietikäinen, M., and Zhao, G. PCANet-II: When PCANet meets the second order pooling. IEICE Transactions on Information and Systems, 101(8):2159–2162, 2018.

LeCun, Y., Bottou, L., Bengio, Y., Haffner, P., et al. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.

Mallat, S. Wavelets for a vision. Proceedings of the IEEE, 84(4):604–614, 1996.

Nordberg, K. Fourier transforms. 1995.

Strang, G. and Nguyen, T. Wavelets and Filter Banks. SIAM, 1996.

Wang, X. Laplacian operator-based edge detectors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(5):886–890, 2007.

Yang, L., Wu, X., Zhao, D., Li, H., and Zhai, J. An improved Prewitt algorithm for edge detection based on noised image. In 2011 4th International Congress on Image and Signal Processing, volume 3, pp. 1197–1200. IEEE, 2011.

Zhang, Z., Cui, P., and Zhu, W. Deep learning on graphs: A survey. arXiv preprint arXiv:1812.04202, 2018.
5. Appendix

5.1. More low rank approximations

We present more low-rank approximations here; each image in Figure 5 shows one low-rank reconstructed MNIST sample.
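For reference, such reconstructions can be produced with a truncated SVD of each image (a generic sketch; the paper does not spell out its exact procedure):

```python
import numpy as np

def low_rank(image, r):
    """Best rank-r approximation of a 2D image in the Frobenius norm,
    via the truncated singular value decomposition."""
    U, s, Vt = np.linalg.svd(image, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

approx = low_rank(np.random.rand(28, 28), r=5)   # e.g., a rank-5 MNIST digit
```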