Feature Enhancement with Text-Specific Region Contrast for Scene Text Detection

Sun, Xurui; Lyu, Jiahao; Zhang, Yifei; Zeng, Gangyan; Fang, Bo; Zhou, Yu; Xie, Enze; Ma, Can

doi:10.1007/978-981-99-8540-1_1

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14431)

Included in the following conference series:

Chinese Conference on Pattern Recognition and Computer Vision (PRCV)

Abstract

As a fundamental step in most visual text-related tasks, scene text detection has been widely studied for a long time. However, due to the diversity in the foreground, such as aspect ratios, colors, shapes, etc., as well as the complexity of the background, scene text detection still faces many challenges. It is often difficult to obtain discriminative text-level features when dealing with overlapping text regions or ambiguous regions of adjacency, resulting in suboptimal detection performance. In this paper, we propose Text-specific Region Contrast (TRC) based on contrastive learning to enhance the features of text regions. Specifically, to formulate positive and negative sample pairs for contrast-based training, we divide regions in scene text images into three categories, i.e., text regions, backgrounds, and text-adjacent regions. Furthermore, we design a Text Multi-scale Strip Convolutional Attention module, called TextMSCA, to refine embedding features for precise contrast. We find that the learned features can focus on complete text regions and effectively tackle the ambiguity problem. Additionally, our method is lightweight and can be implemented in a plug-and-play manner while maintaining a high inference speed. Extensive experiments conducted on multiple benchmarks verify that the proposed method consistently improves the baseline with significant margins.
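
The region-contrast idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the prototype-based InfoNCE loss, the temperature value, and every name below (region_prototypes, info_nce, text_region_contrast, the TEXT/BACKGROUND/ADJACENT labels) are assumptions made for illustration. Each text-pixel embedding is pulled toward the mean text embedding and pushed away from the mean background and text-adjacent embeddings.

```python
import math

# Hypothetical region labels; the abstract names the three categories only.
TEXT, BACKGROUND, ADJACENT = 0, 1, 2


def normalize(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def region_prototypes(embeddings, labels):
    """Mean embedding per region, L2-normalized. Assumed design choice."""
    protos = {}
    for region in (TEXT, BACKGROUND, ADJACENT):
        members = [e for e, l in zip(embeddings, labels) if l == region]
        if members:
            dim = len(members[0])
            mean = [sum(m[d] for m in members) / len(members) for d in range(dim)]
            protos[region] = normalize(mean)
    return protos


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor: -log softmax of the positive logit."""
    anchor = normalize(anchor)
    logits = [dot(anchor, positive) / temperature]
    logits += [dot(anchor, n) / temperature for n in negatives]
    peak = max(logits)  # subtract the max for numerical stability
    log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_denominator - logits[0]


def text_region_contrast(embeddings, labels, temperature=0.1):
    """Average InfoNCE loss over text pixels; 0.0 if no contrast is possible."""
    protos = region_prototypes(embeddings, labels)
    negatives = [protos[r] for r in (BACKGROUND, ADJACENT) if r in protos]
    if TEXT not in protos or not negatives:
        return 0.0
    text_pixels = [e for e, l in zip(embeddings, labels) if l == TEXT]
    losses = [info_nce(e, protos[TEXT], negatives, temperature) for e in text_pixels]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    labels = [TEXT, TEXT, BACKGROUND, BACKGROUND, ADJACENT, ADJACENT]
    separated = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.0, 0.9], [-1.0, 0.0], [-1.0, 0.1]]
    print(text_region_contrast(separated, labels))
```

In this sketch, treating text-adjacent regions as a separate negative prototype is what distinguishes the setup from a plain text-versus-background contrast; how the paper actually samples and weights the pairs is not stated in the abstract.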

Supported by the Natural Science Foundation of China (Grant No. 62376266) and by the Key Research Program of Frontier Sciences, CAS (Grant No. ZDBS-LY-7024).

Author information

Authors and Affiliations

Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China
Xurui Sun, Jiahao Lyu, Yifei Zhang, Bo Fang, Yu Zhou, Enze Xie & Can Ma
School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China
Xurui Sun, Jiahao Lyu, Yifei Zhang, Bo Fang & Enze Xie
School of Information and Communication Engineering, Communication University of China, Beijing, China
Gangyan Zeng

Corresponding author

Correspondence to Yu Zhou.

Editor information

Editors and Affiliations

Nanjing University of Information Science and Technology, Nanjing, China
Qingshan Liu
Xiamen University, Xiamen, China
Hanzi Wang
Beijing University of Posts and Telecommunications, Beijing, China
Zhanyu Ma
Sun Yat-sen University, Guangzhou, China
Weishi Zheng
Peking University, Beijing, China
Hongbin Zha
Chinese Academy of Sciences, Beijing, China
Xilin Chen
Chinese Academy of Sciences, Beijing, China
Liang Wang
Xiamen University, Xiamen, China
Rongrong Ji

About this paper

Cite this paper

Sun, X. et al. (2024). Feature Enhancement with Text-Specific Region Contrast for Scene Text Detection. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14431. Springer, Singapore. https://doi.org/10.1007/978-981-99-8540-1_1

DOI: https://doi.org/10.1007/978-981-99-8540-1_1
Published: 25 December 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8539-5
Online ISBN: 978-981-99-8540-1
eBook Packages: Computer Science, Computer Science (R0)
