
Transformer-Convolution Network for Arbitrary Shape Text Detection

Published: 13 April 2022

Abstract

Arbitrary-shape text detection is a prevalent topic in computer vision. Text instances in natural scenes vary in size and shape and may appear against complex background textures, so the ability to extract accurate text features is critical for subsequent detection. This paper proposes a novel Transformer-Convolution Network (TCNet) for the scene text detection task. TCNet contains two major modules: a CNN module and a Transformer module. The CNN module extracts local features from the input images, while the Transformer module establishes connections among those local features. The two structures complement each other; in particular, combining local features with their relative locations yields more precise detection, promotes convergence, and reduces the number of parameters. Extensive experiments on public datasets demonstrate excellent performance when training data are sufficient. Moreover, in the small-data setting, our method achieves state-of-the-art performance on both quadrilateral and arbitrary-shape text datasets.
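The two-stage idea described in the abstract (a convolution extracts local features, then attention relates every local feature to all the others) can be sketched in miniature. This is a hypothetical NumPy toy under assumed shapes, not the paper's actual TCNet: the single conv kernel, the 2-dimensional tokens, and the single-head attention are illustrative simplifications.

```python
import numpy as np

def conv2d_local(x, kernel):
    """Valid 2D convolution: the CNN stage, extracting local features."""
    kh, kw = kernel.shape
    H, W = x.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out

def self_attention(tokens):
    """Single-head self-attention: the Transformer stage, connecting
    every local feature to every other one."""
    scores = tokens @ tokens.T / np.sqrt(tokens.shape[1])
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ tokens

# Toy pipeline: conv -> flatten the feature map into tokens -> attention.
rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))
features = conv2d_local(image, rng.standard_normal((3, 3)))  # 6x6 local features
tokens = features.reshape(-1, 1)                 # one scalar feature per position
tokens = np.hstack([tokens, np.ones_like(tokens)])  # pad to 2-dim tokens
out = self_attention(tokens)
print(out.shape)  # (36, 2): each position now mixes information from all others
```

The complementarity claimed in the abstract is visible even here: the conv stage only ever sees a 3x3 neighborhood, while the attention stage mixes all 36 positions in one step, which is how long-range relations between distant text fragments can be captured.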

Supplementary Material

p120-hu-supplement (p120-hu-supplement.pptx)
Presentation slides




Published In

ICMLSC '22: Proceedings of the 2022 6th International Conference on Machine Learning and Soft Computing
January 2022
185 pages
ISBN:9781450387477
DOI:10.1145/3523150

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. CNN
  2. Transformer
  3. arbitrary shape text detection

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • Major science and technology projects in Anhui Province

Conference

ICMLSC 2022
