Research article
DOI: 10.1145/3495018.3495058

Scale-invariant Convolutional Capsule Network

Published: 14 March 2022

Abstract

Scale-invariant feature detection plays an important role in computer vision. Inspired by capsule networks and the Hough transform, we present a scale-invariant feature detection and extraction module that is streamlined, lightweight, and interpretable. We reduce overhead by combining a learnable feature detector with an efficient scale-aware parameter voting mechanism; a routing mechanism can also be added to further refine the extracted features and boost performance. Compared with popular multi-column or feature-pyramid-based methods, the proposed module is more lightweight in both parameter count and architecture, while maintaining good multi-scale performance, especially under heavy scale transformations. Its streamlined sequential pipeline also makes it easy to integrate into other models in a plug-and-play manner.
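
Because the abstract describes the architecture only at a high level, the PyTorch sketch below is an illustration rather than the authors' implementation: a single shared, learnable detector is run over a few rescaled views of the feature map, and a softmax over per-scale vote strengths aggregates the responses, loosely mimicking Hough-style scale voting. All names here (ScaleVotingConv, detector, vote, the scales tuple) are hypothetical, and the optional routing refinement step is omitted.

```python
# Illustrative sketch only (not the paper's code): shared detector + scale-aware voting.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ScaleVotingConv(nn.Module):
    """One learnable detector applied at several scales, aggregated by a softmax vote."""

    def __init__(self, in_ch, out_ch, scales=(0.5, 1.0, 2.0)):
        super().__init__()
        self.scales = scales
        # A single learnable feature detector reused at every scale
        # (no per-scale columns or pyramid branches).
        self.detector = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        # Small head scoring how strongly each scale's response "votes".
        self.vote = nn.Conv2d(out_ch, 1, kernel_size=1)

    def forward(self, x):
        h, w = x.shape[-2:]
        responses, vote_logits = [], []
        for s in self.scales:
            # Rescale the input, run the shared detector, rescale back.
            xs = F.interpolate(x, scale_factor=s, mode="bilinear",
                               align_corners=False)
            r = self.detector(xs)
            r = F.interpolate(r, size=(h, w), mode="bilinear",
                              align_corners=False)
            responses.append(r)
            # Accumulate a global vote strength for this scale.
            vote_logits.append(self.vote(r).mean(dim=(2, 3)))
        resp = torch.stack(responses, dim=1)                             # (B, S, C, H, W)
        weights = torch.softmax(torch.stack(vote_logits, dim=1), dim=1)  # (B, S, 1)
        # Scale-weighted aggregation: the winning scale dominates the output.
        return (resp * weights[..., None, None]).sum(dim=1)


if __name__ == "__main__":
    block = ScaleVotingConv(in_ch=16, out_ch=32)
    feats = torch.randn(2, 16, 64, 64)
    print(block(feats).shape)  # torch.Size([2, 32, 64, 64])
```

Because the detector weights are shared across all scales, the parameter count stays that of a single convolution plus a 1x1 voting head, which reflects the lightweight, plug-and-play behavior the abstract claims, though the paper's actual voting and routing mechanisms may differ.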

Published In

AIAM2021: 2021 3rd International Conference on Artificial Intelligence and Advanced Manufacture
October 2021, 3136 pages
ISBN: 9781450385046
DOI: 10.1145/3495018

Publisher: Association for Computing Machinery, New York, NY, United States


Conference: AIAM2021
Overall acceptance rate: 100 of 285 submissions (35%)
