Efficient and Accurate Text Detection Combining Differentiable Binarization with Semantic Segmentation

Liu, Yue; Shi, Ying; Lin, Chaojun; Hua, Jie; Huang, Ziqi

doi:10.1007/978-3-031-15934-3_52

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13531)

Included in the following conference series:

International Conference on Artificial Neural Networks

Abstract

Segmentation-based methods have recently become the mainstream approach to scene text detection because they describe arbitrary-shaped text precisely. However, their slow inference hinders practical application. In this paper, we propose an efficient and accurate arbitrary-shaped text detector named ViT-Bilateral DBNet, which processes features more efficiently to achieve a good trade-off between accuracy and real-time performance. Specifically, we first combine Differentiable Binarization (DB) with the real-time semantic segmentation network BiSeNet V2, which is better suited to processing features for segmentation-based methods. We then propose three improvements to this initial integrated network. The ViT-Bilateral Network strengthens the feature extraction capability of the network. The Attention-driven Aggregation Layer (AAL) adaptively fuses the details and the semantics produced by the ViT-Bilateral Network. In addition, an auxiliary loss is added to make training more thorough. Compared with the original DBNet, our method not only gains improvements of 1.17% (on IC15) and 1.34% (on CTW1500), but also runs 1.38 times and 1.34 times faster, respectively. Notably, our detector surpasses the previous best record while maintaining a high inference speed.
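For readers unfamiliar with Differentiable Binarization, the DB paper listed in the references (Liao et al., 2020) replaces hard thresholding of a probability map with a steep sigmoid, B = 1 / (1 + exp(-k (P - T))), where P is the predicted probability, T is a learned per-pixel threshold, and k is an amplifying factor (set to 50 in that paper). The following is a minimal sketch of that formula only, not of ViT-Bilateral DBNet; the function names and the toy 2x2 maps are hypothetical and chosen purely for illustration.

```python
import math


def differentiable_binarization(prob, thresh, k=50.0):
    """Approximate step function: 1 / (1 + exp(-k * (prob - thresh)))."""
    z = k * (prob - thresh)
    # Evaluate the logistic function in a numerically stable way.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def binarize_map(prob_map, thresh_map, k=50.0):
    """Apply the approximate binarization element-wise to two 2D lists."""
    return [
        [differentiable_binarization(p, t, k) for p, t in zip(p_row, t_row)]
        for p_row, t_row in zip(prob_map, thresh_map)
    ]


# Toy inputs (hypothetical values, not taken from the paper).
prob_map = [[0.9, 0.2], [0.55, 0.45]]
thresh_map = [[0.5, 0.5], [0.5, 0.5]]
approx = binarize_map(prob_map, thresh_map)
```

Pixels well above the threshold map to values near 1 and pixels well below it to values near 0, while the function stays differentiable, which is what allows the threshold to be learned jointly with the segmentation network.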

References

Long, S., He, X., Yao, C.: Scene text detection and recognition: the deep learning era. Int. J. Comput. Vis. (IJCV) 129, 161–184 (2021). https://doi.org/10.1007/s11263-020-01369-0
Bonechi, S., Andreini, P., Bianchini, M., Scarselli, F.: COCO_TS dataset: pixel-level annotations based on weak supervision for scene text segmentation. In: Tetko, I.V., Kůrková, V., Karpov, P., Theis, F. (eds.) ICANN 2019. LNCS, vol. 11729, pp. 238–250. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-30508-6_20
Liao, M., Wan, Z., Yao, C., et al.: Real-time scene text detection with differentiable binarization. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp. 11474–11481 (2020)
He, K., Zhang, X., Ren, S., et al.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
Lin, T.Y., Dollár, P., Girshick, R., et al.: Feature pyramid networks for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2117–2125 (2017)
Chen, Q., Wang, Y., Yang, T., et al.: You only look one-level feature. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2021)
Yu, C., Gao, C., Wang, J., et al.: BiSeNet V2: bilateral network with guided aggregation for real-time semantic segmentation. Int. J. Comput. Vis. (IJCV) 129, 3051–3068 (2021). https://doi.org/10.1007/s11263-021-01515-2
Tian, Z., Huang, W., He, T., He, P., Qiao, Y.: Detecting text in natural image with connectionist text proposal network. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9912, pp. 56–72. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46484-8_4
Liao, M., Shi, B., Bai, X., et al.: TextBoxes: a fast text detector with a single deep neural network. In: Thirty-First AAAI Conference on Artificial Intelligence (2017)
Zhou, X., Yao, C., Wen, H., et al.: EAST: an efficient and accurate scene text detector. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5551–5560 (2017)
Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2550–2558 (2017)
Wang, W., Xie, E., Li, X., et al.: Shape robust text detection with progressive scale expansion network. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 9336–9345 (2019)
Wang, W., Xie, E., Song, X., et al.: Efficient and accurate arbitrary-shaped text detection with pixel aggregation network. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 8440–8449 (2019)
Paszke, A., Chaurasia, A., Kim, S., et al.: ENet: a deep neural network architecture for real-time semantic segmentation. arXiv preprint arXiv:1606.02147 (2016)
Zhao, H., Qi, X., Shen, X., Shi, J., Jia, J.: ICNet for real-time semantic segmentation on high-resolution images. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11207, pp. 418–434. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01219-9_25
Li, H., Xiong, P., Fan, H., et al.: DFANet: deep feature aggregation for real-time semantic segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 9522–9531 (2019)
Yu, C., Wang, J., Peng, C., Gao, C., Yu, G., Sang, N.: BiSeNet: bilateral segmentation network for real-time semantic segmentation. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11217, pp. 334–349. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01261-8_20
Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16×16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)
Liu, Z., Lin, Y., Cao, Y., et al.: Swin transformer: hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 10012–10022. IEEE (2021)
Zhou, J., Wang, P., Wang, F., et al.: ELSA: enhanced local self-attention for vision transformer. arXiv preprint arXiv:2112.12786 (2021)
Liu, Z., Mao, H., Wu, C.Y., et al.: A ConvNet for the 2020s. arXiv preprint arXiv:2201.03545 (2022)
Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7132–7141 (2018)
Woo, S., Park, J., Lee, J.-Y., Kweon, I.S.: CBAM: convolutional block attention module. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11211, pp. 3–19. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01234-2_1
Gupta, A., Vedaldi, A., Zisserman, A.: Synthetic data for text localisation in natural images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2315–2324 (2016)
Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., et al.: ICDAR 2015 competition on robust reading. In: 2015 13th International Conference on Document Analysis and Recognition (ICDAR), pp. 1156–1160. IEEE (2015)
Yuliang, L., Lianwen, J., Shuaitao, Z., et al.: Detecting curve text in the wild: new dataset and new solution. arXiv preprint arXiv:1712.02170 (2017)

Acknowledgments

This work is supported by the National Natural Science Foundation of China (No. 52105528).

Author information

Authors and Affiliations

School of Automation, Wuhan University of Technology, Wuhan, China
Yue Liu, Ying Shi, Chaojun Lin, Jie Hua & Ziqi Huang

Corresponding author

Correspondence to Yue Liu.

Editor information

Editors and Affiliations

University of the West of England, Bristol, UK
Elias Pimenidis
Lancaster University, Lancaster, UK
Plamen Angelov
Digital Innovation, Teesside University, Middlesbrough, UK
Chrisina Jayne
Democritus University of Thrace, Xanthi, Greece
Antonios Papaleonidas
The University of the West of England, Bristol, UK
Mehmet Aydin

About this paper

Cite this paper

Liu, Y., Shi, Y., Lin, C., Hua, J., Huang, Z. (2022). Efficient and Accurate Text Detection Combining Differentiable Binarization with Semantic Segmentation. In: Pimenidis, E., Angelov, P., Jayne, C., Papaleonidas, A., Aydin, M. (eds) Artificial Neural Networks and Machine Learning – ICANN 2022. ICANN 2022. Lecture Notes in Computer Science, vol 13531. Springer, Cham. https://doi.org/10.1007/978-3-031-15934-3_52

DOI: https://doi.org/10.1007/978-3-031-15934-3_52
Published: 15 September 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-15933-6
Online ISBN: 978-3-031-15934-3
eBook Packages: Computer Science, Computer Science (R0)
