
DPGS: Cross-cooperation guided dynamic points generation for scene text spotting

Published: 25 October 2024

Abstract

End-to-end text spotting aims to combine scene text detection and recognition into a unified framework. Handling the relationship between the two sub-tasks plays a pivotal role in designing effective spotters. While polygon- or segmentation-based methods eliminate heuristic post-processing, they still face challenges such as background noise and a high computational burden. In this study, we introduce DPGS, a coarse-to-fine learning framework that leverages Dynamic Points Generation for text Spotting. DPGS simultaneously learns character representations for both the detection and recognition tasks. Specifically, for each text instance, we represent the character sequence as ordered points and model them with learnable point queries. This approach progressively selects appropriate key points covering the characters and leverages group attention to associate similar information from different positions, improving detection accuracy. After passing through a single decoder, the point queries encode text semantics and locations, facilitating the decoding of the central line, boundary, script, and confidence of the text through simple prediction heads. Additionally, we introduce an adaptive cooperative criterion that combines more useful feature knowledge, improving training efficiency. Extensive experiments demonstrate the superiority of DPGS on scene text detection and recognition tasks. Compared with the respective top-1 methods, DPGS improves the average recognition accuracy by 3.7%, 1.9%, and 0.7% on the Total-Text, ICDAR15, and CTW1500 datasets, respectively.
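The abstract's core idea, representing each text instance as an ordered sequence of points along its center line, can be made concrete with a small sketch. The NumPy snippet below is an illustrative assumption on our part, not the authors' implementation: it samples n ordered points from a cubic Bezier center line via the degree-3 Bernstein basis, the curve family popularized by earlier Bezier-curve spotters such as ABCNet.

```python
import numpy as np

def bezier_points(ctrl, n):
    """Sample n ordered points along a cubic Bezier center line.

    ctrl: (4, 2) array of control points; each sampled point is a weighted
    sum of the control points under the degree-3 Bernstein basis.
    """
    t = np.linspace(0.0, 1.0, n)
    basis = np.stack([(1 - t) ** 3,           # B0
                      3 * t * (1 - t) ** 2,   # B1
                      3 * t ** 2 * (1 - t),   # B2
                      t ** 3], axis=1)        # (n, 4) Bernstein weights
    return basis @ ctrl                       # (n, 2) ordered points

# A gently arched text instance: endpoints at y=0, interior controls raised.
ctrl = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 1.0], [3.0, 0.0]])
pts = bezier_points(ctrl, 5)
# pts[0] is the start point (0, 0), pts[-1] the end point (3, 0), and the
# middle sample sits on the arch at (1.5, 0.75).
```

In a DETR-style spotter of the kind described above, each sampled position would seed a learnable point query that the decoder refines before simple heads read off the center line, boundary, script, and confidence.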



            Published In

            Knowledge-Based Systems, Volume 302, Issue C, October 2024, 670 pages

            Publisher

            Elsevier Science Publishers B. V.

            Netherlands


            Author Tags

            1. Scene text detection and recognition
            2. Coarse-to-fine
            3. Cross-cooperative learning
            4. K-NN search

            Qualifiers

            • Research-article
