research-article

Detection and rectification of arbitrary shaped scene texts by using text keypoints and links

Authors:

Shijian Lu, and

Steven Hoi

Volume 124, Issue C

https://doi.org/10.1016/j.patcog.2021.108494

Published: 01 April 2022

Highlights

• We propose a robust scene text detection and rectification technique that is capable of detecting and rectifying scene texts of arbitrary shapes almost simultaneously.

• We formulate scene text detection and rectification as a text keypoint and link detection problem and propose a mask-guided multi-task network that is capable of detecting text keypoints and keypoint links accurately.

• We develop an efficient, end-to-end trainable system that achieves superior scene text detection and rectification performance compared with state-of-the-art methods.

Abstract

Detecting and recognizing scene texts of arbitrary shapes remains a grand challenge because text lines vary widely in orientation, length, curvature, etc. This paper presents a mask-guided multi-task network that detects and rectifies scene texts of arbitrary shapes reliably. Three types of keypoints are detected, which accurately specify the centre line, and hence the shape, of each text instance. In addition, four types of keypoint links are detected, among which the horizontal links associate the detected keypoints of each text instance and the vertical links predict, for each keypoint, a pair of landmark points along the upper and lower text boundaries, respectively. Scene texts are then located by linking up the associated landmark points, which gives localization polygon boxes, and rectified by transforming the polygon boxes via thin plate spline. Extensive experiments over several public datasets show that the use of text keypoints is tolerant to variation in text orientations, lengths, and curvatures, and that it achieves competitive scene text detection and rectification performance compared with state-of-the-art methods.
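The localization and rectification steps described in the abstract can be illustrated with a minimal sketch. It assumes the network has already produced, for one text instance, an ordered list of landmark points along the upper boundary and a matching list along the lower boundary. The function names (`polygon_from_landmarks`, `target_landmarks`, `fit_tps`, `apply_tps`) and the sample coordinates are hypothetical and are not taken from the paper. The thin plate spline fit uses the standard kernel U(r) = r² log r. A practical rectifier would probably fit the mapping in the opposite direction (rectified box to source image) and resample pixels, which is omitted here.

```python
import numpy as np

def polygon_from_landmarks(upper, lower):
    """Link upper landmarks left to right, then lower landmarks right to left."""
    upper = np.asarray(upper, dtype=float)
    lower = np.asarray(lower, dtype=float)
    return np.vstack([upper, lower[::-1]])

def target_landmarks(n, width, height):
    """Evenly spaced points on the top and bottom edges of the rectified box."""
    xs = np.linspace(0.0, width, n)
    top = np.stack([xs, np.zeros(n)], axis=1)
    bottom = np.stack([xs, np.full(n, height)], axis=1)
    return np.vstack([top, bottom])

def _tps_kernel(a, b):
    """U(r) = r^2 log r between point sets a (m, 2) and b (n, 2), with U(0) = 0."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    # r^2 log r equals 0.5 * d2 * log(d2); the where() avoids log(0).
    return 0.5 * d2 * np.log(np.where(d2 > 0, d2, 1.0))

def fit_tps(src, dst):
    """Solve the thin plate spline system that maps src control points to dst."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = len(src)
    P = np.hstack([np.ones((n, 1)), src])   # affine part: 1, x, y
    A = np.zeros((n + 3, n + 3))
    A[:n, :n] = _tps_kernel(src, src)
    A[:n, n:] = P
    A[n:, :n] = P.T
    b = np.zeros((n + 3, 2))
    b[:n] = dst
    return src, np.linalg.solve(A, b)

def apply_tps(model, pts):
    """Map arbitrary points through a fitted thin plate spline."""
    src, params = model
    pts = np.asarray(pts, dtype=float)
    n = len(src)
    P = np.hstack([np.ones((len(pts), 1)), pts])
    return _tps_kernel(pts, src) @ params[:n] + P @ params[n:]

# Made-up landmarks for one curved text instance (image coordinates, y grows downward).
upper = [(0, 10), (10, 4), (20, 2), (30, 4), (40, 10)]
lower = [(0, 22), (10, 16), (20, 14), (30, 16), (40, 22)]

polygon = polygon_from_landmarks(upper, lower)   # localization polygon, 10 vertices
src = np.vstack([upper, lower]).astype(float)    # control points on the curved text
dst = target_landmarks(len(upper), 40.0, 12.0)   # same points on a straight box
model = fit_tps(src, dst)
rectified = apply_tps(model, src)                # control points land on the box edges
```

Because a thin plate spline interpolates its control points, `rectified` matches `dst` up to floating-point error. Points between the landmarks are mapped smoothly, which is what straightens the curved text line.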

Cited By

Kaur J, Singh W (2024) A systematic review of object detection from images using deep learning. Multimedia Tools and Applications 83:4 (12253-12338). doi:10.1007/s11042-023-15981-y. Online publication date: 1-Jan-2024
https://dl.acm.org/doi/10.1007/s11042-023-15981-y
Zhou W, Song W (2023) Real-Time Accurate Text Detection with Adaptive Double Pyramid Network. Neural Processing Letters 55:4 (5055-5067). doi:10.1007/s11063-022-11080-5. Online publication date: 1-Aug-2023
https://dl.acm.org/doi/10.1007/s11063-022-11080-5
Kaur J, Singh W (2022) Tools, techniques, datasets and application areas for object detection in an image: a review. Multimedia Tools and Applications 81:27 (38297-38351). doi:10.1007/s11042-022-13153-y. Online publication date: 1-Nov-2022
https://dl.acm.org/doi/10.1007/s11042-022-13153-y

Index Terms

Detection and rectification of arbitrary shaped scene texts by using text keypoints and links
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
  2. Machine learning

Index terms have been assigned to the content through auto-classification.

Published In

Pattern Recognition Volume 124, Issue C

Apr 2022

951 pages

ISSN:0031-3203

Copyright © 2021.

Publisher

Elsevier Science Inc.

United States
