research-article

Detection and rectification of arbitrary shaped scene texts by using text keypoints and links

Published: 01 April 2022
    Highlights

    We propose a robust scene text detection and rectification technique that is capable of detecting and rectifying scene texts of arbitrary shapes almost simultaneously.
    We formulate scene text detection and rectification as a text keypoint and link detection problem and propose a mask-guided multi-task network that is capable of detecting text keypoints and keypoint links accurately.
    We develop an efficient and end-to-end trainable system that achieves superior scene text detection and rectification performance as compared with the state-of-the-art.

    Abstract

    Detection and recognition of scene texts of arbitrary shapes remain a grand challenge because text lines vary widely in orientation, length, curvature, etc. This paper presents a mask-guided multi-task network that detects and rectifies scene texts of arbitrary shapes reliably. Three types of keypoints are detected, which specify the centre line, and hence the shape, of each text instance accurately. In addition, four types of keypoint links are detected, of which the horizontal links associate the detected keypoints of each text instance and the vertical links predict, for each keypoint, a pair of landmark points along the upper and lower text boundaries, respectively. Scene texts are located by linking up the associated landmark points, which gives localization polygon boxes, and rectified by transforming the polygon boxes via thin plate spline. Extensive experiments over several public datasets show that the use of text keypoints is tolerant to variation in text orientations, lengths, and curvatures, and that it achieves competitive scene text detection and rectification performance compared with state-of-the-art methods.
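The last two steps of the abstract (linking landmark points into a polygon, then rectifying via thin plate spline) can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the function names (`polygon_from_landmarks`, `rectified_targets`, `fit_tps`, `apply_tps`), the toy coordinates, and the choice of an axis-aligned rectangle as the rectification target are all assumptions made for illustration. The spline uses the kernel U(r) = r^2 log r^2 from Bookstein's formulation, and the landmark points are assumed to be already predicted by some detector.

```python
import numpy as np

def polygon_from_landmarks(upper, lower):
    """Order landmarks as a closed outline: upper boundary left to right,
    then lower boundary right to left (hypothetical helper)."""
    upper = np.asarray(upper, dtype=float)
    lower = np.asarray(lower, dtype=float)
    return np.vstack([upper, lower[::-1]])

def rectified_targets(n, width, height):
    """Assumed target layout: n evenly spaced points on the top edge (y = 0)
    and on the bottom edge (y = height), in the same order as the polygon."""
    xs = np.linspace(0.0, width, n)
    top = np.stack([xs, np.zeros(n)], axis=1)
    bottom = np.stack([xs, np.full(n, float(height))], axis=1)
    return np.vstack([top, bottom[::-1]])

def tps_kernel(r2):
    """Thin plate spline radial basis U = r^2 * log(r^2), with U(0) = 0."""
    out = np.zeros_like(r2)
    mask = r2 > 0
    out[mask] = r2[mask] * np.log(r2[mask])
    return out

def fit_tps(src, dst):
    """Solve the standard TPS linear system so that src control points
    map exactly onto dst control points."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = len(src)
    d2 = ((src[:, None, :] - src[None, :, :]) ** 2).sum(axis=-1)
    P = np.hstack([np.ones((n, 1)), src])   # affine part: 1, x, y
    A = np.zeros((n + 3, n + 3))
    A[:n, :n] = tps_kernel(d2)
    A[:n, n:] = P
    A[n:, :n] = P.T
    b = np.zeros((n + 3, 2))
    b[:n] = dst
    return src, np.linalg.solve(A, b)

def apply_tps(model, pts):
    """Map arbitrary points through a fitted TPS model."""
    src, params = model
    pts = np.asarray(pts, dtype=float)
    d2 = ((pts[:, None, :] - src[None, :, :]) ** 2).sum(axis=-1)
    P = np.hstack([np.ones((len(pts), 1)), pts])
    return tps_kernel(d2) @ params[:len(src)] + P @ params[len(src):]

# Toy curved text instance: three keypoints, each with an upper/lower landmark.
upper = [(0, 10), (10, 4), (20, 10)]
lower = [(0, 20), (10, 14), (20, 20)]
polygon = polygon_from_landmarks(upper, lower)
targets = rectified_targets(n=3, width=20, height=10)
model = fit_tps(polygon, targets)
print(np.round(apply_tps(model, polygon), 3))
```

This sketch maps points from the curved polygon to the rectified rectangle. To warp an actual image, one would usually fit the spline in the opposite direction (rectangle to polygon) and sample the source image at the mapped grid coordinates.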

    Cited By

    • (2024) A systematic review of object detection from images using deep learning, Multimedia Tools and Applications 83:4 (12253-12338), doi:10.1007/s11042-023-15981-y. Online publication date: 1-Jan-2024
    • (2023) Real-Time Accurate Text Detection with Adaptive Double Pyramid Network, Neural Processing Letters 55:4 (5055-5067), doi:10.1007/s11063-022-11080-5. Online publication date: 1-Aug-2023
    • (2022) Tools, techniques, datasets and application areas for object detection in an image: a review, Multimedia Tools and Applications 81:27 (38297-38351), doi:10.1007/s11042-022-13153-y. Online publication date: 1-Nov-2022

        Published In

        Pattern Recognition  Volume 124, Issue C
        Apr 2022
        951 pages

        Publisher

        Elsevier Science Inc.

        United States

        Author Tags

        1. Scene text detection
        2. Scene text recognition
        3. Deep learning
        4. Neural network
