
FrequentNet : A New Deep Learning Baseline for Image Classification

Yifei Li 1 Zheng Wang 2 Kuangyan Song 3 Yiming Sun 4

1 Department of Computer Science, Zhejiang University. 2 Department of Computer Science, Tongji University. 3 Zshield Inc. 4 AWS AI, Amazon. Correspondence to: Yiming Sun <ys784@cornell.edu>.

Abstract

In this paper, we generalize the idea behind the method called "PCANet" (Chan et al., 2015) to obtain a new baseline deep learning model for image classification. Instead of using principal component vectors as the filter vectors, as PCANet does, we use basis vectors from discrete Fourier analysis and wavelets analysis as our filter vectors. Both achieve performance comparable to PCANet on benchmark datasets. Notably, our algorithms do not require any optimization techniques to obtain these basis vectors.

1. Introduction

Convolutional Neural Networks (CNNs) (Jarrett et al., 2009) have achieved tremendous success in image classification (Krizhevsky et al., 2012), and the filter vectors in such networks aim to capture different patterns in images. Obtaining those filter vectors, however, requires back-propagation to solve a complicated optimization problem. Chan et al. (2015) proposed a baseline model for image classification that does not require any back-propagation to learn the filter vectors. Instead, they use the left eigenvectors of the stacked images, obtained through principal component analysis (PCA), as the filter vectors. This stems from the eigendecomposition: the target is decomposed onto the orthogonal basis (the eigenvectors) from PCA, and the projection along each orthogonal basis vector represents a certain pattern in the image. However, computing those eigenvectors is time consuming, especially for large datasets, even when randomized algorithms (Halko et al., 2011) are applied. In the classical computer vision literature, researchers have developed multi-scale representations of images without resorting to optimization; the two most widely used are the Discrete Fourier Transformation (DFT) (Nordberg, 1995) and wavelets analysis (Mallat, 1996). Fan et al. (2018) also extended the PCANet framework with a second-order pooling strategy. In this paper, we explore the possibility of using bases from the DFT and from wavelets analysis as candidates for the filter vectors. Before presenting our algorithms, we briefly review both the Discrete Fourier Transformation and wavelets analysis.

1.1. Discrete Fourier Transformation

The Discrete Fourier Transformation (DFT) (Beerends et al., 2003) represents the information in an image at different frequencies. Mathematically, given a vectorized image x of length n, the 1D DFT transforms it as d(ω_k) = ⟨x, C(ω_k) − i S(ω_k)⟩ with ω_k = 2πk/n and k ∈ F_n, the set of Fourier frequencies. To be precise, F_n denotes the set {−[(n−1)/2], ..., [n/2]}, where [x] is the integer part of x, and

    C(ω_k) = (1/√n) (1, cos ω_k, ..., cos((n−1)ω_k))^T,
    S(ω_k) = (1/√n) (1, sin ω_k, ..., sin((n−1)ω_k))^T.                    (1)

Researchers have found that different frequencies capture different levels of information in an image. For example, a high-pass filter selects only high-frequency signals and retains structural information such as edges, while a low-pass filter selects low-frequency signals and thus produces an over-smoothed, blurry image (Costen et al., 1996). Many traditional models focus on detecting high-frequency information: typical gradient-based methods such as the Sobel operator (Gao et al., 2010), the Prewitt operator (Yang et al., 2011), and the Canny operator (Canny, 1986) detect high-frequency information in the first-order gradient domain, while the Laplacian operator (Wang, 2007) works on the second-order gradient and is widely used in image processing to sharpen images. We refer interested readers to (Kumar et al., 2013) for a comprehensive overview of edge detectors in image processing. In this work, we mainly focus on the discrete Fourier transformation, since it has a simpler form and can easily be extended to convolutional filters.
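As a concrete illustration of (1), the short NumPy sketch below builds the normalized cosine and sine basis vectors C(ω_k) and S(ω_k) for a length-n signal and projects a vectorized image onto them; the function name and the dictionary output are our own choices for exposition, not part of the original implementation.

    import numpy as np

    def fourier_basis(n):
        """Basis vectors C(w_k), S(w_k) of Eq. (1), one pair for every k in F_n."""
        t = np.arange(n)
        k_min = -((n - 1) // 2)                  # F_n = {-[(n-1)/2], ..., [n/2]}
        basis = {}
        for k in range(k_min, n // 2 + 1):
            w = 2 * np.pi * k / n
            basis[k] = (np.cos(w * t) / np.sqrt(n),   # C(w_k)
                        np.sin(w * t) / np.sqrt(n))   # S(w_k)
        return basis

    # Projections of a vectorized image onto the cosine/sine candidates;
    # such inner products are what the filter selection in Section 2.2 is based on.
    x = np.random.rand(16)
    projections = {k: (x @ C, x @ S) for k, (C, S) in fourier_basis(16).items()}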
1.2. Wavelets Analysis

Different from the DFT, wavelets aim to implement spectral analysis locally in the image. We apply the Daubechies D4 wavelet transform (Strang & Nguyen, 1996), which we refer to as DB4, in this paper.

In DB4, the first-layer wavelets analysis can be viewed as projecting the image vector x onto filter vectors formed from the progressive local basis vectors

    h = [ (1+√3)/4,  (3+√3)/4,  (3−√3)/4,  (1−√3)/4 ],
    g = [ (1−√3)/4,  (3−√3)/4,  (3+√3)/4,  (−1−√3)/4 ].                    (2)

Here h computes a moving average and acts as the low-pass filter of the previous section, while g captures local differences and acts as the high-pass filter. For a vectorized image x, the first-layer wavelet transform is then the linear transformation

    [ h0  h1  h2  h3   0   0  ...
      g0  g1  g2  g3   0   0  ...
       0   0  h0  h1  h2  h3  ...
       0   0  g0  g1  g2  g3  ...
       .   .   .   .   .   .  ... ]  x.                                    (3)

In this case, we can treat each row of the matrix on the left of (3) as a member of the pool of potential filter vectors.

2. FrequentNet

2.1. Problem Setup

In this section we mainly follow the setting of (Chan et al., 2015). We are given N input training images {I_i}, i = 1, ..., N, of size m × n, and we set the patch size (or 2D filter size) to k1 × k2 at all stages. We denote the vectorized patches of image I_i by x_{i,1}, ..., x_{i,mn}, where the first index runs over images and the second over patches. We then subtract the patch mean from each patch and obtain

    X̄_i = [x̄_{i,1}, ..., x̄_{i,j}, ..., x̄_{i,mn}],  1 ≤ j ≤ mn,            (4)

of size k1 k2 × mn. Stacking the X̄_i gives

    X̄ = [X̄_1, ..., X̄_i, ..., X̄_N],  1 ≤ i ≤ N,                            (5)

of size k1 k2 × mnN. The filter vectors should then capture patterns that represent the information in the columns x̄_{i,j} effectively. PCANet chooses the filter vectors to be the top left eigenvectors of X̄; in this paper, we propose to use bases from the DFT and from wavelets instead. To keep the image size unchanged, we set the convolution stride to 1 and zero-pad each image before convolving it with the selected frequency filters. The overall pipeline is the same as in PCANet, where a simple strategy of hashing and histogramming is applied to obtain the final representation features.

2.2. FourierNet

The First Stage: To avoid duplicates in the Fourier basis, we restrict the index k of ω_k to k ∈ F_n^+, where F_n^+ contains only the non-negative indices in F_n, and we choose {C(ω_k), S(ω_k)} as our candidate orthogonal basis. We then select filters of different frequencies based on the magnitude of the inner products between the vectorized patches x̄_{i,j} and the candidate filters, as summarized in Algorithm 1.

Algorithm 1  Select top L1 Fourier basis
    Input: X̄, L1
    for k in F^+_{k1 k2} do
        c_k ← ‖⟨C(ω_k), X̄⟩‖_1
        s_k ← ‖⟨S(ω_k), X̄⟩‖_1
    end for
    Select the C(ω_k) or S(ω_k) with the top L1 largest values in {c_k, s_k} and call this set D_{L1}
    Output: D_{L1}

With the obtained L1 filters v_1, ..., v_k, ..., v_{L1}, every input image I_i is mapped to L1 new feature maps

    I_i^k = I_i ∗ mat_{k1,k2}(v_k),                                        (6)

where ∗ denotes two-dimensional convolution and mat_{k1,k2}(·) reshapes a vector into a k1 × k2 patch. For later convenience, we rank the v ∈ D_{L1} in decreasing order of ‖⟨v, X̄⟩‖_1 and index them accordingly, i.e., ‖⟨v_1, X̄⟩‖_1 ≥ ... ≥ ‖⟨v_{L1}, X̄⟩‖_1.

The Second Stage: After the first stage, each basis vector in D_{L1} yields a new set of feature maps of the same size as the original images. For the L1·N new feature maps I_i^k, i = 1, ..., N, k = 1, ..., L1, we again collect all overlapping patches and subtract their means. Define

    Ȳ_i^k = [ȳ^k_{i,1}, ..., ȳ^k_{i,mn}],                                  (7)

and concatenate all the Ȳ_i^k to obtain

    Ȳ = [Ȳ_1^1, ..., Ȳ_N^{L1}]                                             (8)

of size k1 k2 × L1·N·mn. On Ȳ we run Algorithm 1 again to select the top L2 Fourier basis vectors which, following the definition above, we call u_1, ..., u_{L2}, ordered by the magnitude of the inner product.

Output Stage: In the output stage, we use simple hashing and histogramming to obtain the final feature vectors. Generally, we first binarize all feature maps and then group them by their parent feature maps; for example, I_i is the parent of the feature maps I_i^k, and I_i^k is the parent of the feature maps {I_i^k ∗ u_l}. In each group, we pool the corresponding feature maps channel-wise with an exponential weighting. This hashing and pooling operation reduces the dimension of the feature representation while preserving significant discriminative information. Finally, in the histogram stage, we extract blocks with a sliding window, compute the histogram of each block, and concatenate the histograms originating from one image to form its final feature vector.
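To make the FourierNet pipeline concrete, the following NumPy/SciPy sketch collects the mean-removed patches of (4)-(5), scores the candidate Fourier basis vectors as in Algorithm 1, and maps an image to its L1 first-stage feature maps as in (6). The function names and the use of scipy.signal.convolve2d with "same" padding are our own illustrative choices under the stated stride-1, zero-padding convention; this is a sketch of the procedure described above, not the authors' implementation.

    import numpy as np
    from scipy.signal import convolve2d

    def mean_removed_patches(img, k1, k2):
        """Columns of X_bar_i in Eq. (4): all overlapping k1 x k2 patches,
        vectorized, with the patch mean subtracted."""
        m, n = img.shape
        cols = []
        for r in range(m - k1 + 1):
            for c in range(n - k2 + 1):
                p = img[r:r + k1, c:c + k2].ravel()
                cols.append(p - p.mean())
        return np.stack(cols, axis=1)                  # shape (k1*k2, num_patches)

    def select_fourier_filters(Xbar, L1):
        """Algorithm 1: rank cosine/sine candidates of length d = k1*k2 by the
        l1-norm of their inner products with the columns of Xbar."""
        d = Xbar.shape[0]
        t = np.arange(d)
        candidates, scores = [], []
        for k in range(d // 2 + 1):                    # non-negative frequencies
            w = 2 * np.pi * k / d
            for v in (np.cos(w * t) / np.sqrt(d), np.sin(w * t) / np.sqrt(d)):
                candidates.append(v)
                scores.append(np.abs(v @ Xbar).sum())  # ||<v, Xbar>||_1
        top = np.argsort(scores)[::-1][:L1]            # v_1 has the largest score
        return [candidates[i] for i in top]

    def fourier_first_stage(images, k1=7, k2=7, L1=6):
        """Eq. (6): map every image to L1 feature maps of the same size."""
        Xbar = np.concatenate([mean_removed_patches(I, k1, k2) for I in images], axis=1)
        filters = select_fourier_filters(Xbar, L1)
        return [[convolve2d(I, v.reshape(k1, k2), mode="same") for v in filters]
                for I in images]

Running Algorithm 1 a second time on the stacked mean-removed patches of the first-stage feature maps yields the second-stage filters u_1, ..., u_{L2} in exactly the same way.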

Specifically, for the 1-stage FourierNet we extract N·L1 feature maps, denoted I_i^k, after the first stage. These feature maps are then binarized and grouped by image index i, e.g., {I_i^1, ..., I_i^{L1}}, and each group is hashed into a single map

    T_i = Σ_{k=1}^{L1} 2^{k−1} B(I_i ∗ v_k).                               (9)

For the 2-stage FourierNet, we extract N·L1·L2 feature maps, denoted {I_i^k ∗ u_l}. These feature maps are binarized and grouped by image index i and stage-1 filter index k:

    T_i^k = Σ_{l=1}^{L2} 2^{l−1} B(I_i^k ∗ u_l).                           (10)

After grouping, we pool the feature maps channel-wise and concatenate the histogram vectors as described above. In the CIFAR10 experiments, we also use a Spatial Pyramid Pooling (SPP) layer (He et al., 2015) to reduce the length of the feature vector and to extract features that are more robust to variations in object pose, scale, color, etc.
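A minimal sketch of this output stage for the one-stage case: the L1 binarized feature maps of one image are combined into the integer hash map of Eq. (9), and the hash values are then histogrammed over sliding blocks. The binarization by sign and the simple block loop are our assumptions for illustration; the block size and stride correspond to the experiment settings reported later.

    import numpy as np

    def hash_map(feature_maps):
        """Eq. (9): T_i = sum_k 2^(k-1) B(I_i * v_k), with B(x) = 1 if x > 0 else 0.
        feature_maps is the list of L1 convolution outputs of one image."""
        T = np.zeros(feature_maps[0].shape, dtype=np.int64)
        for k, F in enumerate(feature_maps):       # k = 0, ..., L1-1, i.e. weight 2**k
            T += (F > 0).astype(np.int64) * (2 ** k)
        return T

    def block_histograms(T, L1, block=(7, 7), stride=3):
        """Histogram the hash values over sliding blocks and concatenate them
        into the final feature vector of the image."""
        n_bins = 2 ** L1                           # hash values lie in [0, 2**L1)
        h, w = T.shape
        feats = []
        for r in range(0, h - block[0] + 1, stride):
            for c in range(0, w - block[1] + 1, stride):
                patch = T[r:r + block[0], c:c + block[1]].ravel()
                feats.append(np.bincount(patch, minlength=n_bins))
        return np.concatenate(feats)

For the two-stage network, the same two steps are applied to each group {I_i^k ∗ u_l} with weights 2^(l−1), as in Eq. (10).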

2.3. WaveletsNet

For WaveletsNet, the whole process is very similar to FourierNet; the only difference is that the pool of candidate filter vectors now consists of all rows of the matrix in (3). Again, for stage one we select L1 filter vectors based on the magnitude of the inner products between each candidate filter vector and the vectorized image patches. At this point, we can either go directly to the output stage, or repeat the selection procedure to pick L2 second-stage filters and then go to the output stage.

3. Experiments

We evaluated and compared the performance of FourierNet, PCANet, and RandNet on two tasks: hand-written digit recognition and object recognition.

3.1. Hand-written Digits Recognition

MNIST (LeCun et al., 1998) and the MNIST variations (Larochelle et al., 2007) are common benchmarks for testing hierarchical representations (Chan et al., 2015). We pick a subset of MNIST and its variations to experiment on, as listed in Table 1.

Table 1. Descriptions of MNIST and the MNIST variations picked for the experiments.
    Dataset       Description
    MNIST         Standard MNIST
    basic         A smaller subset of standard MNIST
    bg-rand       MNIST with noise background
    rot           MNIST with rotation
    bg-img        MNIST with image background
    bg-img-rot    MNIST with rotation and image background

3.1.1. Experiment Setup

We investigated the impact of the number of filters L1 on the proposed one-stage structure. We fixed the patch size to 7 × 7 and the patch stride to 1; the other settings for the one-stage structure are listed in Table 2. For the two-stage structure, we mainly compared models based on the different basis vectors, following the configurations recommended for each dataset in the original PCANet paper; they are listed in Table 3.

Table 2. Experiment setup for the one-stage structures.
    Dataset       Patch stride    Block size    Block stride
    basic         1               7 × 7         3
    bg-rand       1               4 × 4         2
    rot           1               4 × 4         2
    bg-img        1               4 × 4         2
    bg-img-rot    1               4 × 4         2

Table 3. Experiment setup for the two-stage structures.
    Dataset       L1    L2    Patch size    Patch stride    Block size    Block stride
    basic         6     8     7 × 7         1               7 × 7         3
    bg-rot        6     8     7 × 7         1               4 × 4         2
    bg-rand       6     8     7 × 7         1               4 × 4         2
    bg-img        6     8     7 × 7         1               4 × 4         2
    bg-img-rot    6     8     7 × 7         1               4 × 4         2
    cifar10       40    8     5 × 5         1               8 × 8         4

3.1.2. Experiment Results

The testing accuracy of the one-stage models on the selected datasets, with the number of filters varying from 2 to 8, is shown in Figure 2; the testing accuracy increases as the number of filters grows. The testing results for the two-stage models, with the key setup of Table 3, are listed in Table 4. One can see that FourierNet-2 and WaveNet-2 achieve similar testing accuracy on these datasets. We also show the learned first- and second-stage Fourier filters from the bg-rand dataset in Figure 1. To visualize the features captured by the learned filters, we selected two samples from the MNIST dataset and performed low-rank approximation through convolution. Figure 3 shows the selected MNIST samples, and Figure 4 shows the resulting low-rank approximations using filters from FourierNet, PCANet, and RandNet. More low-rank approximations can be found in the Appendix.
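For reference, the visualization step behind Figures 3 and 4 amounts to convolving a 28 × 28 sample with one selected filter per basis; the sketch below is our reading of that step, with the sample loading and filter selection assumed to come from the pipeline above and "same"-mode convolution assumed to keep the output at 28 × 28.

    import numpy as np
    from scipy.signal import convolve2d

    def filter_visualization(sample, v, k1=7, k2=7):
        """Convolve one 28 x 28 MNIST sample with a single selected filter vector v
        (reshaped to k1 x k2), giving the filtered view shown in Figure 4."""
        assert sample.shape == (28, 28)
        return convolve2d(sample, v.reshape(k1, k2), mode="same")

    # e.g. with the top-ranked Fourier filter from select_fourier_filters(...):
    # view = filter_visualization(mnist_sample, filters[0])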

Table 4. Testing accuracies (%) of the different methods on MNIST and its variations.
    Methods         basic    bg-rand    rot      bg-img    bg-img-rot
    FourierNet-2    98.05    90.50      89.45    86.55     60.25
    FourierNet-1    98.75    89.95      85.15    85.45     49.65
    WaveNet-2       98.55    88.05      83.60    84.45     49.25
    WaveNet-1       97.55    83.60      83.60    78.05     42.20
    PCANet-2        98.15    91.55      89.50    87.20     62.50
    PCANet-1        98.65    91.80      87.35    86.65     54.35
    RandNet-2       97.55    82.70      86.00    83.40     40.00
    RandNet-1       97.95    70.90      77.55    67.55     29.90

Figure 1. The Fourier filters learned from the bg-rand dataset. Top: the first-stage filters. Bottom: the second-stage filters.

Figure 2. Test accuracy (%) of FourierNet-1 and PCANet-1 on the MNIST basic and rot test sets for a varying number of filters (L1), tested for L1 from 2 to 8.

Figure 3. The two selected MNIST samples used for low-rank approximation: (a) Sample 1; (b) Sample 2.

3.2. CIFAR10 Object Recognition

CIFAR10 contains 10 classes with 50000 training samples and 10000 test samples, which vary in object position, scale, color, and texture (Chan et al., 2015). We fix the number of filters in the first stage to 40 and the number of filters in the second stage to 5. We also fix the patch size to 5 × 5, the block size to 8 × 8, and the block overlap to 4. We tried different combinations of methods for the two-stage structure: Fourier-Fourier is simply FourierNet-2, where Fourier filters are used in both stages, while Fourier-PCA uses Fourier filters in the first stage and PCA filters in the second stage; the remaining combinations are defined analogously. The testing accuracies of the different combinations are listed in Table 5. We also show the learned filters of FourierNet-2 and PCANet-2 in Figure 6 and Figure 7, respectively.

Figure 4. Low-rank approximations of the two MNIST samples for each type of selected filter, from left to right: (a) selected Fourier filter; (b) selected PCA filter; (c) selected random filter. We picked one filter for each basis and then performed convolution over the 28 × 28 image samples.

4. Conclusion

In this paper, we proposed to use bases from the Discrete Fourier Transformation and from wavelets analysis as the pool of potential filter vectors for selection. This procedure does not require any optimization and achieves prediction accuracy comparable to PCANet. In the future, we will extend the basis to the two-dimensional Discrete Fourier Transformation and two-dimensional wavelets analysis (Alleyne & Cawley, 1991; Antoine et al., 2008). Much progress has also been made on graph neural networks (GNNs) (Zhang et al., 2018), and we plan to extend our work to graph-structured data as well.

Table 5. Testing accuracy (%) of different filter combinations on CIFAR10.
    Methods            Accuracy
    Fourier-Fourier    67.70
    Fourier-PCA        68.30
    PCA-Fourier        69.75
    PCA-PCA            70.95

References

Alleyne, D. and Cawley, P. A two-dimensional Fourier transform method for the measurement of propagating multimode signals. The Journal of the Acoustical Society of America, 89(3):1159-1168, 1991.

Antoine, J.-P., Murenzi, R., Vandergheynst, P., and Ali, S. T. Two-Dimensional Wavelets and their Relatives. Cambridge University Press, 2008.

Beerends, R. J., ter Morsche, H. G., van den Berg, J. C., and van de Vrie, E. M. Fourier and Laplace Transforms. Cambridge University Press, 2003.

Canny, J. A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, (6):679-698, 1986.

Chan, T.-H., Jia, K., Gao, S., Lu, J., Zeng, Z., and Ma, Y. PCANet: A simple deep learning baseline for image classification? IEEE Transactions on Image Processing, 24(12):5017-5032, 2015.

Costen, N. P., Parker, D. M., and Craw, I. Effects of high-pass and low-pass spatial filtering on face identification. Perception & Psychophysics, 58(4):602-612, 1996.

Fan, C., Hong, X., Tian, L., Ming, Y., Pietikäinen, M., and Zhao, G. PCANet-II: When PCANet meets the second order pooling. IEICE Transactions on Information and Systems, 101(8):2159-2162, 2018.

Gao, W., Zhang, X., Yang, L., and Liu, H. An improved Sobel edge detection. In 2010 3rd International Conference on Computer Science and Information Technology, volume 5, pp. 67-71. IEEE, 2010.

Halko, N., Martinsson, P.-G., and Tropp, J. A. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Review, 53(2):217-288, 2011.

He, K., Zhang, X., Ren, S., and Sun, J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(9):1904-1916, 2015.

Jarrett, K., Kavukcuoglu, K., Ranzato, M., and LeCun, Y. What is the best multi-stage architecture for object recognition? In 2009 IEEE 12th International Conference on Computer Vision, pp. 2146-2153. IEEE, 2009.

Krizhevsky, A., Sutskever, I., and Hinton, G. E. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pp. 1097-1105, 2012.

Kumar, M., Saxena, R., et al. Algorithm and technique on various edge detection: A survey. Signal & Image Processing, 4(3):65, 2013.

Larochelle, H., Erhan, D., Courville, A., Bergstra, J., and Bengio, Y. An empirical evaluation of deep architectures on problems with many factors of variation. In Proceedings of the 24th International Conference on Machine Learning, pp. 473-480. ACM, 2007.

LeCun, Y., Bottou, L., Bengio, Y., Haffner, P., et al. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278-2324, 1998.

Mallat, S. Wavelets for a vision. Proceedings of the IEEE, 84(4):604-614, 1996.

Nordberg, K. Fourier transforms. 1995.

Strang, G. and Nguyen, T. Wavelets and Filter Banks. SIAM, 1996.

Wang, X. Laplacian operator-based edge detectors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(5):886-890, 2007.

Yang, L., Wu, X., Zhao, D., Li, H., and Zhai, J. An improved Prewitt algorithm for edge detection based on noised image. In 2011 4th International Congress on Image and Signal Processing, volume 3, pp. 1197-1200. IEEE, 2011.

Zhang, Z., Cui, P., and Zhu, W. Deep learning on graphs: A survey. arXiv preprint arXiv:1812.04202, 2018.

5. Appendix

5.1. More low rank approximations

We present more low-rank approximations here; each image in Figure 5 represents one low-rank reconstructed MNIST sample.

Figure 5. More reconstructed MNIST samples for the selected filters, from left to right: (a) low-rank approximations using Fourier filters; (b) low-rank approximations using PCA filters; (c) low-rank approximations using random filters.

5.2. Visualization of filters of FourierNet and PCANet learned from the CIFAR10 dataset

The learned filters of FourierNet-2 and PCANet-2 on CIFAR10 are visualized in Figure 6 and Figure 7, respectively.

Figure 6. The Fourier filters learned from the CIFAR10 dataset. Top: the first-stage filters, with the number of filters for each channel set to 40. Bottom: the second-stage filters, with the number of filters set to 8.

Figure 7. The PCA filters learned from the CIFAR10 dataset. Top: the first-stage filters, with 40 filters for each channel. Bottom: the second-stage filters, with 8 filters.