Abstract
Capsule networks are a distinct and promising branch of the neural network family, notable for their ability to maintain equivariance by preserving spatial relationships among features. By encoding characteristic features into capsules and building a parse-tree structure, the capsule network has achieved unprecedented success in image classification on datasets such as MNIST and affNIST. However, on datasets with complex foreground and background regions, such as CIFAR-10 and CIFAR-100, its performance is suboptimal owing to a naive routing policy and a limited capacity for extracting complex features. This paper proposes a new design strategy for capsule network architectures that deals with complex images efficiently. The proposed method incorporates the optimal placement of a novel wide bottleneck residual block and squeeze-and-excitation attention blocks into the capsule network, supported by a modified factorization machines (FM) routing algorithm, to address this problem. This setup captures channel interdependencies at almost no computational cost, thereby enhancing the representation ability of capsules on complex images. We extensively evaluate the proposed model on five publicly available datasets: CIFAR-10, Fashion MNIST, Brain Tumor, SVHN, and CIFAR-100. The proposed method outperforms the top five capsule network-based methods on Fashion MNIST, CIFAR-10, SVHN, and Brain Tumor, and delivers highly competitive performance on CIFAR-100.
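The squeeze-and-excitation mechanism mentioned above recalibrates feature channels by squeezing each channel to a single descriptor and gating channels through a small bottleneck network. The following is a minimal NumPy sketch of that general mechanism, not the paper's implementation; the weight shapes, reduction ratio, and function name are illustrative assumptions.

```python
import numpy as np

def squeeze_excitation(x, w1, w2):
    """Illustrative squeeze-and-excitation recalibration (assumed shapes).

    x:  feature map of shape (H, W, C)
    w1: bottleneck weights of shape (C, C // r), r = reduction ratio
    w2: expansion weights of shape (C // r, C)
    """
    # Squeeze: global average pooling over spatial dimensions -> (C,)
    z = x.mean(axis=(0, 1))
    # Excitation: bottleneck MLP, ReLU then sigmoid gate in (0, 1) per channel
    s = np.maximum(z @ w1, 0.0)
    gate = 1.0 / (1.0 + np.exp(-(s @ w2)))
    # Scale: reweight every channel of the feature map by its gate value
    return x * gate
```

Because the gate is a per-channel scalar, the extra cost is just one small two-layer MLP per feature map, which is why channel interdependencies can be modeled at almost no computational overhead.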
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Cite this article
Pawan, S.J., Sharma, R., Reddy, H. et al. WideCaps: a wide attention-based capsule network for image classification. Machine Vision and Applications 34, 52 (2023). https://doi.org/10.1007/s00138-023-01401-6