Corresponding Author:
Achmad Lukman, +6281342352282,
Faculty of Informatics, Department of Information Technology,
Telkom University, Bandung, Indonesia,
Email: alukman@telkomuniversity.ac.id
How to Cite:
A. Lukman, W. Saputro, and E. Seniwati, "Improving Convolutional Neural Networks Performance Using Modified Pooling Function", MATRIK: Jurnal Manajemen, Teknik Informatika, dan Rekayasa Komputer, Vol. 23, No. 2, pp. 343-352, Mar. 2024.
This is an open access article under the CC BY-SA license (https://creativecommons.org/licenses/by-sa/4.0/)
1. INTRODUCTION
Many studies have proposed ways to improve the performance of convolutional neural networks. One line of work modifies the network architecture [1], searching for the best architectural model to improve accuracy in large-scale image classification; this has produced new architectures such as the VGG networks, as well as related designs including the Inception network [2], DenseNet [3], and the residual network (ResNet) [4]. Other research on performance improvement concerns regularization: the dropout approach, which is very effective in reducing overfitting in neural networks [5], and generalized dropout, which improves on dropout by imposing sparsity on the weights rather than on the output vector of a layer [6]. In terms of pooling-function modification, a new pooling method called max-min pooling has been proposed [7], which exploits both the positive and the negative responses of the convolution process. In another approach, the mixed pooling operation [8], the pooled map response is selected probabilistically by sampling from a multinomial distribution formed from the activations of each pooling region. Traditionally, convolutional neural networks use one of the two well-known pooling functions to improve accuracy, but research [9] showed that the choice between the two should depend on the problem at hand. For this reason, the authors of [9] proposed a linear combination of the two pooling functions in a convolutional neural network architecture called "CombPool." Their results show that the proposed method can outperform existing pooling methods, including the traditional max pooling and average pooling functions. Several existing studies combine the traditional pooling functions because each has a shortcoming: max pooling relies only on the maximum value in a feature-map region, which loses salient information when most of the region has high magnitude, whereas average pooling performs poorly when most of the region is zero, degrading the convolution results [10]. This study aims to overcome the shortcomings pointed out by previous studies by modifying the existing pooling functions.
To address this problem, we developed double max pooling and double average pooling methods inspired by the mixed pooling function of [11] and the similar approach of [8]. We also developed a method related to [9], but in a different way: it combines max pooling with average pooling so that it achieves a lower error rate than conventional pooling (e.g., max and average pooling) on the CIFAR and SVHN datasets. In addition, we investigate a network architecture model modified from the VGG16 network [12], which serves as the baseline, and we use five network architectures derived from this baseline to find the best match for our method. Our contributions can be summarized as follows.
a. We modify the VGG16-based network architecture in several variations, compare their classification performance, and choose the best architecture to pair with our pooling function method.
b. We propose the double max pooling and double average pooling function methods, as well as combinations of our methods with max and average pooling, and examine their effect on the performance of convolutional neural networks.
c. We investigate the best network architecture among the offered variations by testing on the Cifar10 and Cifar100 datasets and comparing the results with the baseline (VGG16) and existing state-of-the-art methods.
d. We compare the accuracy of the selected network architecture with the Cifar-VGG architecture and combine the network architecture with the pooling function method, also involving the TinyImageNet and SVHN datasets as benchmarks to show the performance of our pooling technique and network architecture.
The max pooling method selects the largest element in each pooling region $R_{ij}$, where $R_{ij}$ denotes the local neighborhood around position $(i, j)$; average pooling instead takes the mean over the region, as shown in equations (1) and (2):

$$s_{cij} = \max_{(p,q) \in R_{ij}} x_{cpq} \qquad (1)$$

$$s_{cij} = \frac{1}{|R_{ij}|} \sum_{(p,q) \in R_{ij}} x_{cpq} \qquad (2)$$

where $|R_{ij}|$ is the size of the pooling region $R_{ij}$ and $x_{cpq}$ is the element at position $(p, q)$ of channel $c$ within $R_{ij}$. In gated pooling [11], a gating mask determines the mixing proportion of max and average pooling in order to improve the learning response to the processed image features and thus the accuracy of the convolutional neural network. Our work differs from [11]: we offer double max pooling and double average pooling methods with formulas similar to gated pooling. The double max pooling method combines two max pooling operations whose results are summed to obtain the Qmax pooling result; likewise, the Qavg pooling value is obtained from a combination of two average pooling functions. Both schemes are illustrated in Figure 1.
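To make the scheme in Figure 1 concrete, the sketch below implements equations (1) and (2) and a summed two-branch pooling combination in PyTorch. This is a minimal illustration rather than the authors' reference code: the choice of PyTorch, the learnable mixing weight, and the differing kernel sizes of the two branches are assumptions, and the exact weighting/gating used in the paper may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DoublePool2d(nn.Module):
    """Illustrative double pooling: two pooling branches of the same type
    (max+max -> Qmax, avg+avg -> Qavg) whose weighted outputs are summed,
    loosely following the mixed/gated pooling formulation of Lee et al. [11].
    The branch kernel sizes and the learnable weight are assumptions."""

    def __init__(self, mode: str = "max"):
        super().__init__()
        assert mode in ("max", "avg")
        self.mode = mode
        # Learnable mixing weight between the two branches (assumption).
        self.alpha = nn.Parameter(torch.tensor(0.0))

    def _pool(self, x, kernel, stride, padding):
        if self.mode == "max":
            return F.max_pool2d(x, kernel, stride, padding)   # equation (1)
        return F.avg_pool2d(x, kernel, stride, padding)       # equation (2)

    def forward(self, x):
        a = torch.sigmoid(self.alpha)                   # keep weight in (0, 1)
        p1 = self._pool(x, kernel=2, stride=2, padding=0)   # branch 1
        p2 = self._pool(x, kernel=3, stride=2, padding=1)   # branch 2 (same output size)
        return a * p1 + (1.0 - a) * p2                  # summed combination

# Usage: Qmax and Qavg pooling on a dummy feature map.
qmax, qavg = DoublePool2d("max"), DoublePool2d("avg")
feat = torch.randn(8, 64, 32, 32)                       # (batch, channels, H, W)
print(qmax(feat).shape, qavg(feat).shape)               # both -> [8, 64, 16, 16]
```

Both branches halve the spatial resolution, so the module is a drop-in replacement for a standard 2x2, stride-2 pooling layer.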
The remainder of this paper is organized as follows. Section 2 presents related work on CNN configuration development, which we based on the VGG16 network architecture. Section 3 presents our proposed method in detail, including our network architecture, while the experimental setup and results, including datasets, training details, and evaluation, are discussed in Section 4. Finally, Section 5 summarizes our experimental results and possible future developments.
2. RESEARCH METHOD
The stages of the method used in this research are the double max and double average pooling functions, the combination of the proposed methods with existing pooling functions, and the proposed network architectures.
Figure 1. Pooling configurations: (a) the combination of max and average pooling called "mixed pooling" [11], and (b) the modification of mixed pooling by splitting it into two, namely the combination of two max pooling operations, f_Qmax, called double max pooling, and the combination of two average pooling operations, f_Qavg, called double average pooling.
2.2. Combining the Proposed Method with the Existing Pooling Function
The second scenario combines the double max pooling function with the existing max pooling, and the double average pooling function with the existing average pooling, respectively. The scheme is shown in Table 1.
Based on Table 1, the pooling layers marked PL1, PL2, PL3, PL4, and PL5 in Table 2 are filled with these combinations to examine the effect of our pooling functions on the performance of the several convolutional neural network architectures we defined. The results of these combinations are presented in the experiments and results section.
Table 2. Convolutional neural network configurations, with Net A as the VGG-16 baseline. The layers marked Pooling* are filled with the pooling function being varied.
Net A Net B Net C Net D Net E Net F
Input (32 x 32 x 3)
Conv3 64 Conv3 64 Conv3 64 Conv3 64 Conv3 64
Conv3 64
Conv3 64 Conv3 64 Conv3 64 Conv3 64 Conv3 64
Pooling*(PL1)
Conv3 128 Conv3 128 Conv3 128 Conv3 128
Conv3 128 Conv3 128
Conv3 128 Conv3 128 Conv3 128 Conv3 128
Pooling*(PL2)
Conv3 256 Conv3 256
Conv3 256 Conv3 256 Conv3 256 Conv3 256
Conv3 256 Conv3 256
Conv3 256 Conv3 256 Conv3 256 Conv3 256
Conv3 256 Conv3 256
Pooling*(PL3)
Conv3 512 Conv3 512 Conv3 512 Conv3 512 Conv3 512
Conv3 512
Conv3 512 Conv3 512 Conv3 512 Conv3 512 Conv3 512
Conv3 512
Conv3 512 Conv3 512 Conv3 512 Conv1 512 Conv1 512
Pooling*(PL4)
Conv3 512 Conv3 512 Conv3 512
Conv3 512 Conv3 512 Conv3 512
Conv3 512 Conv1 512 Conv3 512
Conv3 512 Conv3 512 Conv3 512
Conv3 512 Conv3 512 Conv1 512
Pooling*(PL5)
FC-512
FC-512
FC-10/ FC-100
soft-max
Based on Table 2, we use a 32 × 32 input with three channels (R, G, and B). We then vary the pooling function at the Pooling* layers among max pooling, average pooling, Qmax pooling (ours), and Qavg pooling (ours) to assess the effectiveness of the VGG architecture variations against the VGG-16 baseline in terms of the resulting accuracy. Among the architectures, Net B, Net C, Net D, Net E, and Net F differ in the number of layers removed from each convolutional block, resulting in the different parameter counts shown in Table 4.
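As an illustration of how the Pooling* slots (PL1-PL5) in Table 2 can be filled with different pooling functions, the sketch below builds a VGG-style network from a layer configuration list. It is a hedged sketch under stated assumptions, not the authors' code: the PyTorch framework, the exact layer list for Net A, and the absence of batch normalization and dropout are assumptions, and the 'qmax'/'qavg' slots could be wired to the DoublePool2d sketch above, whereas here only standard max and average pooling are included to keep the block self-contained.

```python
import torch
import torch.nn as nn

# Layer configuration in the style of Table 2: integers are 3x3 conv output
# channels, "PL" marks a Pooling* slot.  The Net A list below follows the
# standard VGG-16 layout and is an assumption for illustration.
NET_A = [64, 64, "PL", 128, 128, "PL", 256, 256, 256, "PL",
         512, 512, 512, "PL", 512, 512, 512, "PL"]

def make_pool(name: str) -> nn.Module:
    """Map a pooling-function name to a 2x2, stride-2 pooling layer."""
    if name == "max":
        return nn.MaxPool2d(2, 2)
    if name == "avg":
        return nn.AvgPool2d(2, 2)
    raise ValueError(f"unknown pooling function: {name}")

def build_net(cfg, pooling=("max",) * 5, num_classes=10):
    """Assemble the convolutional part plus the FC-512/FC-512/FC-C head of
    Table 2, filling the slots PL1..PL5 with the given pooling functions."""
    layers, in_ch, pl = [], 3, 0
    for item in cfg:
        if item == "PL":
            layers.append(make_pool(pooling[pl]))   # fill slot PL1..PL5
            pl += 1
        else:
            layers += [nn.Conv2d(in_ch, item, 3, padding=1), nn.ReLU(inplace=True)]
            in_ch = item
    features = nn.Sequential(*layers)
    classifier = nn.Sequential(                     # 32x32 input -> 1x1x512 features
        nn.Flatten(), nn.Linear(512, 512), nn.ReLU(inplace=True),
        nn.Linear(512, 512), nn.ReLU(inplace=True),
        nn.Linear(512, num_classes))                # soft-max is applied in the loss
    return nn.Sequential(features, classifier)

# Example: Net A with max pooling in every slot, for Cifar10.
model = build_net(NET_A, pooling=("max",) * 5, num_classes=10)
print(model(torch.randn(2, 3, 32, 32)).shape)       # torch.Size([2, 10])
```

Defining Net B through Net F then amounts to supplying shorter configuration lists, while the pooling tuple selects what each PL1-PL5 slot contains.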
1. Dataset
In this study, we used the Cifar dataset [13] with 32 × 32 images, consisting of Cifar10, which has ten classes with a total of 60,000 images, and Cifar100, which consists of 100 classes with 50,000 images for training and 10,000 for testing. In addition, we used the TinyImageNet dataset [14], which contains 200 classes with a total of 120,000 images of size 64 × 64, and we also involved the SVHN dataset, which consists of 73,257 digit images of size 32 × 32, to assess the reliability of our method and network. We conducted a systematic testing process in which we searched for the parameters that best improve the performance of our network.
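For reference, the sketch below shows one way to load these benchmark datasets with torchvision. The use of torchvision and the normalization values are assumptions, and since TinyImageNet is not bundled with torchvision it is read here from a local folder via ImageFolder (the path is hypothetical).

```python
import torchvision
import torchvision.transforms as T
from torch.utils.data import DataLoader

# Simple normalization; the exact preprocessing used in the paper is not stated.
tf = T.Compose([T.ToTensor(),
                T.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

cifar10_train = torchvision.datasets.CIFAR10("data", train=True, download=True, transform=tf)
cifar100_train = torchvision.datasets.CIFAR100("data", train=True, download=True, transform=tf)
svhn_train = torchvision.datasets.SVHN("data", split="train", download=True, transform=tf)

# TinyImageNet (200 classes, 64x64) is not in torchvision; assuming the standard
# folder layout, it can be read with ImageFolder (hypothetical local path).
tiny_train = torchvision.datasets.ImageFolder("data/tiny-imagenet-200/train", transform=tf)

loader = DataLoader(cifar10_train, batch_size=128, shuffle=True, num_workers=2)
images, labels = next(iter(loader))
print(images.shape)   # torch.Size([128, 3, 32, 32])
```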
With a batch size of 64, the best Cifar100 testing accuracy reached 66.52%. We then increased the batch size to 128, 256, 512, and 1024. As Table 3 shows, the best accuracy is obtained at a batch size of 128: with Qmax pooling, the Cifar10 accuracy is 89.13% and the Cifar100 accuracy reaches 66.57%, while the Qavg pooling function at the same batch size achieves 89.89% on Cifar10 and 67.89% on Cifar100.
Table 3. Top-1 accuracy (%) with different batch-size configurations using the proposed methods (200 epochs). Each cell gives training / testing accuracy.

Batch size | Qmax pooling, Cifar10 | Qmax pooling, Cifar100 | Qavg pooling, Cifar10 | Qavg pooling, Cifar100
64 | 99.91 / 88.93 | 99.04 / 66.06 | 99.89 / 89.24 | 99.04 / 66.52
128 | 99.94 / 89.13 | 99.53 / 66.57 | 99.95 / 89.89 | 99.58 / 67.89
256 | 99.89 / 85.07 | 99.21 / 61.68 | 99.95 / 89.28 | 99.51 / 63.51
512 | 99.95 / 88.76 | 98.81 / 62.28 | 99.93 / 88.20 | 98.67 / 60.04
1024 | 99.90 / 87.79 | 99.04 / 62.39 | 99.96 / 88.17 | 99.44 / 63.81
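The batch-size comparison in Table 3 can be reproduced with a standard training loop. The sketch below is a minimal, hedged version: SGD with momentum is mentioned in the conclusion, but the learning rate, momentum value, and the top-1 accuracy helper are assumptions, and the model and datasets are meant to come from the earlier sketches.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

def top1_accuracy(model, loader, device):
    """Fraction of samples whose highest-scoring class matches the label."""
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            correct += (model(x).argmax(dim=1) == y).sum().item()
            total += y.numel()
    return 100.0 * correct / total

def train(model, train_set, test_set, batch_size, epochs=200, lr=0.01):
    """Plain SGD training loop used to compare batch sizes as in Table 3.
    Learning rate and momentum are assumptions."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device)
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()
    train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    test_loader = DataLoader(test_set, batch_size=256)
    for _ in range(epochs):
        model.train()
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return top1_accuracy(model, train_loader, device), top1_accuracy(model, test_loader, device)

# Example sweep over the batch sizes of Table 3 (model and datasets from the
# earlier sketches, e.g. model = build_net(NET_A, pooling=("max",) * 5)):
# for bs in (64, 128, 256, 512, 1024):
#     print(bs, train(model, cifar10_train, cifar10_test, batch_size=bs))
```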
Table 4. Top-1 accuracy (%): network performance compared with other pooling methods, trained for 200 epochs
Network (Table 2) | #Params (Cifar10/Cifar100) | Method | Training Cifar10 | Training Cifar100 | Testing Cifar10 | Testing Cifar100
Net A | 15,001,418 / 15,047,588 | max pooling (baseline) | 99.82 | 98.84 | 80.15 | 59.33
Net A | | Qmax pooling (ours) | 99.89 | 98.58 | 80.64 | 59.27
Net A | | Mixed pooling [11] | 99.87 | 98.42 | 88.93 | 59.35
Net A | | Gated pooling [11] | 99.39 | 92.75 | 84.41 | 53.22
Net A | | average pooling | 99.92 | 98.61 | 81.33 | 59.42
Net A | | Qavg pooling (ours) | 99.88 | 99.06 | 80.94 | 67.23
Net B | 12,639,562 / 12,685,732 | max pooling | 99.86 | 98.94 | 87.74 | 59.83
Net B | | Qmax pooling (ours) | 99.89 | 99 | 88.79 | 60.7
Net B | | Mixed pooling [11] | 99.89 | 98.16 | 89.9 | 55.21
Net B | | Gated pooling [11] | 99.57 | 94.2 | 87.44 | 60.57
Net B | | average pooling | 99.88 | 98.83 | 89.1 | 60.87
Net B | | Qavg pooling (ours) | 99.91 | 98.76 | 89.43 | 55.28
Net C | 12,048,458 / 12,094,628 | max pooling | 99.91 | 99.35 | 88.15 | 65.54
Net C | | Qmax pooling (ours) | 99.91 | 99.37 | 89.32 | 65.84
Net C | | Mixed pooling [11] | 99.93 | 99.35 | 89.06 | 66.78
Net C | | Gated pooling [11] | 99.66 | 94.36 | 86.23 | 61.13
Net C | | average pooling | 99.92 | 99.48 | 90.05 | 67.79
Net C | | Qavg pooling (ours) | 99.92 | 99.39 | 89.86 | 65.7
Net D | 9,686,602 / 9,732,772 | max pooling | 99.96 | 99.64 | 90.35 | 64.76
Net D | | Qmax pooling (ours) | 99.95 | 99.66 | 89.59 | 65.74
Net D | | Mixed pooling [11] | 99.92 | 99.65 | 85.58 | 65.59
Net D | | Gated pooling [11] | 99.62 | 95.65 | 87.1 | 61.33
Net D | | average pooling | 99.94 | 99.69 | 89.5 | 66.06
Net D | | Qavg pooling (ours) | 99.93 | 99.69 | 85.8 | 66.02
Net E | 14,262,218 / 14,308,388 | max pooling | 99.96 | 99.5 | 89.36 | 66.82
Net E | | Qmax pooling (ours) | 99.94 | 99.53 | 89.13 | 66.57
Net E | | Mixed pooling [11] | 99.94 | 99.22 | 89.6 | 64.18
Net E | | Gated pooling [11] | 99.59 | 96.5 | 86.46 | 61.83
Net E | | average pooling | 99.95 | 99.55 | 89.55 | 67.75
Net E | | Qavg pooling (ours) | 99.95 | 99.58 | 89.89 | 67.89
Net F | 14,225,034 / 14,271,204 | max pooling | 99.95 | 99.53 | 87.97 | 64.73
Net F | | Qmax pooling (ours) | 99.96 | 99.57 | 88.06 | 64.17
Net F | | Mixed pooling [11] | 99.94 | 99.59 | 87.58 | 62.87
Net F | | Gated pooling [11] | 99.76 | 97.16 | 86.03 | 61.07
Net F | | average pooling | 99.93 | 99.52 | 87.61 | 63.99
Net F | | Qavg pooling (ours) | 99.93 | 99.55 | 86.86 | 63.27
Furthermore, as the batch size increases, the accuracy tends to decrease. This indicates that a larger batch size increases the loss value and thus lowers the resulting accuracy, although it should be noted that the choice of batch size must also be matched to the size of the dataset used. Next, we show the test results for several pooling configurations in Tables 6 and 7.
Table 6. Top-1 accuracy (%) of the proposed methods compared with the baseline on the Cifar10 dataset (testing)

Network config. | #Params | Max pooling | Qavg pooling | Qmax pooling | LP1 | LP2
Net A (baseline) | 14M | 86.94 | - | - | - | -
Net B | 12.9M | - | 86.23 | 85.90 | 86.49 | 85.71
Net C | 12.3M | - | 87.59 | 86.57 | 87.16 | 85.33
Net D | 9.9M | - | 86.55 | 87.32 | 87.27 | 86.20
Net E | 14M | - | 88.10 | 87.74 | 86.24 | 88.16
Net F | 14M | - | 81.79 | 84.49 | 83.52 | 84.19
The evaluation in Table 6 shows that the accuracy of Net B cannot outperform the baseline, which uses the max pooling function as the default VGG16 configuration (Net A). In contrast, Net C achieves better accuracy than the Net A baseline, performing best with the Qavg pooling function and the LP1 pooling combination. Net D also shows a good accuracy trend, especially with the Qmax pooling function and the LP1 combination at 87.32% and 87.27%, respectively, but it still tends to be lower than Net C with the Qavg pooling function, which reaches 87.59%. A very good accuracy trend is shown by Net E, which outperforms all network configurations considered in this section, reaching 88.16% with the LP2 combination, followed by 88.10% with the Qavg pooling function and 87.74% with the Qmax pooling function. Finally, the accuracy of Net F cannot outperform the baseline.
Table 7. Top-1 accuracy (%) of the proposed methods compared with the baseline on the Cifar100 dataset (testing)

Network config. | #Params | Max pooling | Qavg pooling | Qmax pooling | LP1 | LP2
Net A (baseline) | 14M | 59.18 | - | - | - | -
Net B | 12.9M | - | 58.88 | 55.82 | 60.21 | 57.79
Net C | 12.3M | - | 55.58 | 58.08 | 58.58 | 58.03
Net D | 9.9M | - | 57.36 | 58.78 | 55.74 | 58.95
Net E | 14M | - | 58.76 | 55.78 | 60.38 | 56.93
Net F | 14.5M | - | 48.31 | 55.39 | 46.99 | 53.37
The accuracy comparison on the Cifar100 dataset is shown in Table 7. In general, the configurations that beat the performance of the Net A baseline are Net B with the LP1 combination, at 60.21%, and Net E, which achieves the highest accuracy of 60.38%, also with the LP1 combination.
Table 8. Comparison of testing accuracy (%) with several state-of-the-art methods on the Cifar10 and Cifar100 datasets, using data augmentation
Method Cifar 10 Cifar 100
Baseline 93.58 68.75
Net E + Qavg pooling(ours) 93.6 69.81
Net E + Qmax pooling(ours) 93.28 69.73
Net E + LP1 (ours) 93.9 71.1
Net E + LP2 (ours) 93.63 69.76
Mixed pooling [11] 90.74 69.1
Gated pooling [11] 92.38 65.79
Network in Network (NiN) [16] 91.19 64.32
All-CNN [15] 92.75 66.29
Table 9. Comparison of the proposed methods' testing accuracy (%) with state-of-the-art models on the Cifar10, Cifar100, SVHN, and TinyImageNet datasets
No. Method Cifar 10 Cifar 100 Tiny Imagenet (200) SVHN
1 Cifar-VGG [2] 90.94 67.02 51.83 96.18
2 Cifar-VGG [2] + Qmax (ours) 91.72 67.45 51.69 96.38
3 Cifar-VGG [2] + Qavg (ours) 91.81 67.75 52.03 95.21
4 Net E + LP1 (ours) 93.9 71.1 52.84 95.95
The results of this study show that the proposed techniques, the Qavg and Qmax pooling functions, can compensate for the shortcomings of the existing pooling functions, namely max pooling and average pooling, especially when we use pooling configurations that combine our pooling functions with the existing ones. The results in Tables 6, 7, 8, and 9 indicate that our techniques are superior to the others.
4. CONCLUSION
This paper introduces two new pooling function techniques and five network architecture variations derived from the VGG16 architecture. Among the five architectures offered, one option, Net E, a network configuration modified from VGG16 (Net A), adapts particularly well to the proposed pooling techniques. In addition, testing the optimization algorithms that can be combined with one of the proposed pooling functions shows that the SGD optimization algorithm gives the highest accuracy, 89.76% for Cifar10 and 89.06% for Cifar100. The comparison of network accuracy using the two proposed pooling functions and the two proposed pooling combinations against the baseline shows that Net C, Net D, and Net E adapt better to the Cifar10 dataset, with higher accuracy than the baseline and the other network configurations, while on the Cifar100 dataset Net B and Net E show a better accuracy trend when using the LP1 pooling combination. Furthermore, to assess the reliability of the proposed method, we ran two scenarios: first, comparing it with existing pooling function methods using data augmentation, where the Net E + LP1 combination outperforms the other methods' accuracy; and second, comparing performance against the Cifar-VGG network architecture, where Net E + LP1 is superior on three benchmark datasets, namely Cifar10, Cifar100, and TinyImageNet, but tends to have lower accuracy on the SVHN dataset. Based on the experiments in the results and analysis section, the accuracy obtained is not much higher than that of the baseline and the other comparison models. For this reason, future work can still develop this research with several combinations of existing methods, such as stochastic pooling functions, whose reported accuracy is promising.
5. ACKNOWLEDGEMENTS
This work was supported by Research and Community Service, Telkom University, Bandung.
6. DECLARATIONS
AUTHOR CONTRIBUTION
Conceptualization and methodology, Achmad Lukman and Erni Seniwati; implementation, Wahju Tjahjo Saputro; writing-original draft preparation, Achmad Lukman; formal analysis, Achmad Lukman; investigation, Erni Seniwati. All authors have read and agreed to the published version of the manuscript.
FUNDING STATEMENT
This research has not been funded.
COMPETING INTEREST
The authors declare no conflict of interest.
REFERENCES
[1] X. Ding, G. Ding, Y. Guo, and J. Han, “Centripetal SGD for Pruning Very Deep Convolutional Networks With Complicated
Structure,” in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), vol. June. IEEE, jun 2019,
pp. 4938–4948, https://doi.org/10.1109/CVPR.2019.00508.
[2] X. Soria, E. Riba, and A. Sappa, “Dense Extreme Inception Network: Towards a Robust CNN Model for Edge Detection,”
in 2020 IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE, mar 2020, pp. 1912–1921, https:
//doi.org/10.1109/WACV45572.2020.9093290.
[3] J. Hemalatha, S. Roseline, S. Geetha, S. Kadry, and R. Damaševičius, “An Efficient DenseNet-Based Deep Learning Model for
Malware Detection,” Entropy, vol. 23, no. 3, pp. 1–23, mar 2021, https://doi.org/10.3390/e23030344.
[4] F. He, T. Liu, and D. Tao, “Why ResNet Works? Residuals Generalize,” IEEE Transactions on Neural Networks and Learning
Systems, vol. 31, no. 12, pp. 5349–5362, dec 2020, https://doi.org/10.1109/TNNLS.2020.2966319.
[5] C. Wei, S. Kakade, and T. Ma, “The implicit and explicit regularization effects of dropout,” in 37th International Conference on Machine Learning, ICML 2020, 2020, pp. 10181–10192.
[6] X. Liang, L. Wu, J. Li, Y. Wang, Q. Meng, T. Qin, W. Chen, M. Zhang, and T. Y. Liu, “R-Drop: Regularized Dropout for Neural
Networks,” in Advances in Neural Information Processing Systems, vol. 13, 2021, pp. 1–16.
[7] S. K. Roy, M. E. Paoletti, J. M. Haut, E. M. T. Hendrix, and A. Plaza, “A New Max-Min Convolutional Network for Hyper-
spectral Image Classification,” in 2021 11th Workshop on Hyperspectral Imaging and Signal Processing: Evolution in Remote
Sensing (WHISPERS). IEEE, mar 2021, pp. 1–5, https://doi.org/10.1109/WHISPERS52202.2021.9483983.
[8] Q. Zhou, Z. Qu, and C. Cao, “Mixed pooling and richer attention feature fusion for crack detection,” Pattern Recognition
Letters, vol. 145, no. May, pp. 96–102, may 2021, https://doi.org/10.1016/j.patrec.2021.02.005.
[9] I. Rodriguez-Martinez, J. Lafuente, R. H. Santiago, G. P. Dimuro, F. Herrera, and H. Bustince, “Replacing pooling functions
in Convolutional Neural Networks by linear combinations of increasing functions,” Neural Networks, vol. 152, no. August, pp.
380–393, aug 2022, https://doi.org/10.1016/j.neunet.2022.04.028.
[10] A. Zafar, M. Aamir, N. Mohd Nawi, A. Arshad, S. Riaz, A. Alruban, A. K. Dutta, and S. Almotairi, “A Comparison of Pooling
Methods for Convolutional Neural Networks,” Applied Sciences, vol. 12, no. 17, pp. 1–21, aug 2022, https://doi.org/10.3390/
app12178643.
[11] C.-Y. Lee, P. Gallagher, and Z. Tu, “Generalizing Pooling Functions in CNNs: Mixed, Gated, and Tree,” IEEE Transactions
on Pattern Analysis and Machine Intelligence, vol. 40, no. 4, pp. 863–875, apr 2018, https://doi.org/10.1109/TPAMI.2017.
2703082.
[12] H. Yang, J. Ni, J. Gao, Z. Han, and T. Luan, “A novel method for peanut variety identification and classification by Improved
VGG16,” Scientific Reports, vol. 11, no. 1, pp. 1–17, aug 2021, https://doi.org/10.1038/s41598-021-95240-y.
[13] M. S and E. Karthikeyan, “Classification of Image using Deep Neural Networks and SoftMax Classifier with CIFAR datasets,” in
2022 6th International Conference on Intelligent Computing and Control Systems (ICICCS). IEEE, may 2022, pp. 1132–1135,
https://doi.org/10.1109/ICICCS53718.2022.9788359.
[14] Y. Le and X. Yang, “Tiny imagenet visual recognition challenge,” CS 231N, vol. 7, no. 7, pp. 1–6, 2015.
[15] M. R. Islam, D. Massicotte, and W.-P. Zhu, “All-ConvNet: A Lightweight All CNN for Neuromuscular Activity Recogni-
tion Using Instantaneous High-Density Surface EMG Images,” in 2020 IEEE International Instrumentation and Measurement
Technology Conference (I2MTC). IEEE, may 2020, pp. 1–6, https://doi.org/10.1109/I2MTC43012.2020.9129362.
[16] J. Yamanaka, S. Kuwashima, and T. Kurita, “Fast and Accurate Image Super Resolution by Deep CNN with Skip Connection and Network in Network,” in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2017, pp. 217–225, https://doi.org/10.1007/978-3-319-70096-0_23.