
Transformer-Convolution Network for Arbitrary Shape Text Detection

Published: 13 April 2022

Abstract

Arbitrary-shape text detection is a prevalent topic in computer vision. Text instances in natural scenes vary in size and shape and may appear against complex background textures, so the ability to extract accurate text features is critical for subsequent detection. This paper proposes a novel Transformer-Convolution Network (TCNet) for the scene text detection task. TCNet contains two major modules: a CNN module and a Transformer module. The CNN module extracts local features from the input images, while the Transformer module establishes connections among those local features. The two structures complement each other; in particular, combining local features with their relative locations yields more precise detection, promotes convergence, and reduces the number of parameters. Extensive experiments on public datasets demonstrate excellent performance when training data are sufficient. Moreover, in the small-data setting, our method achieves state-of-the-art performance on both quadrilateral and arbitrary-shape text datasets.
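The two-stage idea described in the abstract (a convolution extracts local features, then attention relates every local feature to all the others) can be sketched in miniature. This is a hypothetical NumPy toy under assumed shapes, not the paper's actual TCNet: the single conv kernel, the 2-dimensional tokens, and the single-head attention are illustrative simplifications.

```python
import numpy as np

def conv2d_local(x, kernel):
    """Valid 2D convolution: the CNN stage, extracting local features."""
    kh, kw = kernel.shape
    H, W = x.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out

def self_attention(tokens):
    """Single-head self-attention: the Transformer stage, connecting
    every local feature to every other one."""
    scores = tokens @ tokens.T / np.sqrt(tokens.shape[1])
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ tokens

# Toy pipeline: conv -> flatten the feature map into tokens -> attention.
rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))
features = conv2d_local(image, rng.standard_normal((3, 3)))  # 6x6 local features
tokens = features.reshape(-1, 1)                 # one scalar feature per position
tokens = np.hstack([tokens, np.ones_like(tokens)])  # pad to 2-dim tokens
out = self_attention(tokens)
print(out.shape)  # (36, 2): each position now mixes information from all others
```

The complementarity claimed in the abstract is visible even here: the conv stage only ever sees a 3x3 neighborhood, while the attention stage mixes all 36 positions in one step, which is how long-range relations between distant text fragments can be captured.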

Supplementary Material

p120-hu-supplement (p120-hu-supplement.pptx)
Presentation slides




Published In

ICMLSC '22: Proceedings of the 2022 6th International Conference on Machine Learning and Soft Computing
January 2022
185 pages
ISBN:9781450387477
DOI:10.1145/3523150

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. CNN
  2. Transformer
  3. arbitrary shape text detection

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • Major science and technology projects in Anhui Province

Conference

ICMLSC 2022
