
WideCaps: a wide attention-based capsule network for image classification

  • Original Paper
  • Published:
Machine Vision and Applications

Abstract

The capsule network is a distinct and promising member of the neural network family, notable for its ability to maintain equivariance by preserving spatial relationships among features. It has achieved unprecedented success in image classification on datasets such as MNIST and affNIST by encoding characteristic features into capsules and building a parse-tree structure. However, on datasets with complex foreground and background regions, such as CIFAR-10 and CIFAR-100, its performance is suboptimal due to a naive data routing policy and an inability to extract complex features. This paper proposes a new design strategy for capsule network architectures that deals efficiently with complex images. The proposed method incorporates the optimal placement of a novel wide bottleneck residual block and squeeze-and-excitation attention blocks into the capsule network, supported by a modified factorized-machines routing algorithm. This setup models channel interdependencies at almost no additional computational cost, thereby enhancing the representational ability of capsules on complex images. We extensively evaluate the proposed model on five publicly available datasets: CIFAR-10, Fashion MNIST, Brain Tumor, SVHN, and CIFAR-100. The proposed method outperforms the top five capsule network-based methods on Fashion MNIST, CIFAR-10, SVHN, and Brain Tumor, and gives highly competitive performance on CIFAR-100.
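The squeeze-and-excitation attention mentioned above can be illustrated with a minimal sketch. The NumPy code below shows the standard SE gating pattern (global average pooling, a bottleneck MLP, per-channel rescaling), not the authors' exact layer; the shapes, reduction ratio `r`, and random weights are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def squeeze_and_excite(x, w1, w2):
    """Apply a squeeze-and-excitation (SE) gate to one feature map.

    x  : feature map of shape (H, W, C)
    w1 : reduction FC weights, shape (C, C // r)
    w2 : expansion FC weights, shape (C // r, C)
    """
    # Squeeze: global average pooling collapses the spatial dims
    # into a per-channel descriptor of shape (C,).
    z = x.mean(axis=(0, 1))
    # Excitation: bottleneck MLP (ReLU, then sigmoid) yields one
    # gate value in (0, 1) per channel.
    s = np.maximum(z @ w1, 0.0)          # ReLU
    s = 1.0 / (1.0 + np.exp(-(s @ w2)))  # sigmoid
    # Scale: recalibrate each channel of the input feature map.
    return x * s

# Toy example: a 8x8 feature map with 16 channels, reduction r = 4.
H, W, C, r = 8, 8, 16, 4
x = rng.standard_normal((H, W, C))
w1 = rng.standard_normal((C, C // r)) * 0.1
w2 = rng.standard_normal((C // r, C)) * 0.1
y = squeeze_and_excite(x, w1, w2)
```

Because the gate is a sigmoid, each output channel is the input channel scaled by a factor in (0, 1); the extra cost is only two small matrix products per feature map, which is why the abstract can claim channel interdependencies "at almost no computational cost".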



Notes

  1. http://www.cs.toronto.edu/tijmen/affNIST/.


Author information


Corresponding author

Correspondence to S. J. Pawan.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Pawan, S.J., Sharma, R., Reddy, H. et al. WideCaps: a wide attention-based capsule network for image classification. Machine Vision and Applications 34, 52 (2023). https://doi.org/10.1007/s00138-023-01401-6

