Research article
DOI: 10.1145/3495018.3495058

Scale-invariant Convolutional Capsule Network

Published: 14 March 2022

Abstract

Scale-invariant feature detection plays an important role in computer vision. Inspired by capsule networks and the Hough transform, we present a scale-invariant feature detection and extraction module that is streamlined, lightweight, and interpretable. We reduce overhead by combining a learnable feature detector with an efficient scale-aware parameter voting mechanism; a routing mechanism can also be added to further refine the extracted features and boost performance. Compared with popular multi-column or feature-pyramid-based methods, the proposed module is more lightweight in both parameter count and architecture, while maintaining good multi-scale performance, especially under heavy scale transformations. Its streamlined sequential pipeline also makes it easy to integrate into other models in a plug-and-play manner.
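
Because the abstract describes the architecture only at a high level, the PyTorch sketch below is an illustration rather than the authors' implementation: a single shared, learnable detector is run over a few rescaled views of the feature map, and a softmax over per-scale vote strengths aggregates the responses, loosely mimicking Hough-style scale voting. All names here (ScaleVotingConv, detector, vote, the scales tuple) are hypothetical, and the optional routing refinement step is omitted.

```python
# Illustrative sketch only (not the paper's code): shared detector + scale-aware voting.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ScaleVotingConv(nn.Module):
    """One learnable detector applied at several scales, aggregated by a softmax vote."""

    def __init__(self, in_ch, out_ch, scales=(0.5, 1.0, 2.0)):
        super().__init__()
        self.scales = scales
        # A single learnable feature detector reused at every scale
        # (no per-scale columns or pyramid branches).
        self.detector = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        # Small head scoring how strongly each scale's response "votes".
        self.vote = nn.Conv2d(out_ch, 1, kernel_size=1)

    def forward(self, x):
        h, w = x.shape[-2:]
        responses, vote_logits = [], []
        for s in self.scales:
            # Rescale the input, run the shared detector, rescale back.
            xs = F.interpolate(x, scale_factor=s, mode="bilinear",
                               align_corners=False)
            r = self.detector(xs)
            r = F.interpolate(r, size=(h, w), mode="bilinear",
                              align_corners=False)
            responses.append(r)
            # Accumulate a global vote strength for this scale.
            vote_logits.append(self.vote(r).mean(dim=(2, 3)))
        resp = torch.stack(responses, dim=1)                             # (B, S, C, H, W)
        weights = torch.softmax(torch.stack(vote_logits, dim=1), dim=1)  # (B, S, 1)
        # Scale-weighted aggregation: the winning scale dominates the output.
        return (resp * weights[..., None, None]).sum(dim=1)


if __name__ == "__main__":
    block = ScaleVotingConv(in_ch=16, out_ch=32)
    feats = torch.randn(2, 16, 64, 64)
    print(block(feats).shape)  # torch.Size([2, 32, 64, 64])
```

Because the detector weights are shared across all scales, the parameter count stays that of a single convolution plus a 1x1 voting head, which reflects the lightweight, plug-and-play behavior the abstract claims, though the paper's actual voting and routing mechanisms may differ.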

Published In

AIAM2021: 2021 3rd International Conference on Artificial Intelligence and Advanced Manufacture
October 2021, 3136 pages
ISBN: 9781450385046
DOI: 10.1145/3495018

Publisher: Association for Computing Machinery, New York, NY, United States


Conference: AIAM2021
Overall acceptance rate: 100 of 285 submissions (35%)
