DC-PSENet: a novel scene text detection method integrating double ResNet-based and changed channels recursive feature pyramid

Authors:

Shujiao Liao, and

Wenyuan Yang

The Visual Computer, Volume 40, Issue 6

Pages 4473 - 4491

https://doi.org/10.1007/s00371-023-03093-5

Published: 27 September 2023

Abstract

With the emergence and advancement of deep learning, scene text detection is being applied in a growing range of fields. However, because of the variety of distances, angles and backgrounds in scene images, the detection boxes for adjacent texts often lie far from the texts themselves, i.e., localization is not accurate enough. In this paper, we propose a text detection method built on a double ResNet-based backbone and a changed channels recursive feature pyramid: it integrates ResNet50-Mish and Res2Net50-Mish and uses a recursive feature pyramid with changed channels. First, scene images are fed into the ResNet50-Mish and Res2Net50-Mish branches of the double ResNet-based backbone, and their outputs are combined by a weight-based addition step to generate fused feature maps. Second, the fused feature maps are sent into the changed channels recursive feature pyramid to obtain feature maps with enhanced feature information. Third, the segmentation results are obtained by concatenation and convolution. Finally, the segmentation results are passed to the progressive scale expansion algorithm, which outputs the locations of texts in the images. The proposed model is trained and tested on the ICDAR15 and CTW1500 benchmark datasets. In terms of precision, our method outperforms or is comparable to state-of-the-art methods. In particular, it achieves 91.53% precision on ICDAR15 and 84.89% precision on CTW1500.
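Two steps of the pipeline described in the abstract, the weight-based addition and the progressive scale expansion, can be illustrated with a small pure-Python sketch. It is not the authors' implementation. The abstract does not say how the fusion weights are chosen, so the `weight_a` parameter is an assumption, as are all function names. The expansion routine follows the usual breadth-first reading of progressive scale expansion: label connected components in the smallest segmentation kernel, then grow those labels into each larger kernel, with contested pixels going to whichever label reaches them first.

```python
from collections import deque

NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # 4-connectivity


def weighted_fuse(feat_a, feat_b, weight_a=0.5):
    """Element-wise weighted addition of two same-shaped 2D feature maps.

    weight_a is a hypothetical parameter; the abstract only states that
    the two backbone outputs are combined by a weight-based addition.
    """
    weight_b = 1.0 - weight_a
    return [[weight_a * a + weight_b * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(feat_a, feat_b)]


def label_components(mask):
    """Label 4-connected foreground regions of a binary mask as 1, 2, ..."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    next_label = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and labels[y][x] == 0:
                next_label += 1
                labels[y][x] = next_label
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for dy, dx in NEIGHBOURS:
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and labels[ny][nx] == 0):
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
    return labels


def progressive_scale_expansion(kernels):
    """Grow labels from the smallest kernel outwards.

    kernels: list of binary masks ordered from smallest to largest.
    Contested pixels keep the label that reaches them first.
    """
    labels = label_components(kernels[0])
    h, w = len(labels), len(labels[0])
    for kernel in kernels[1:]:
        queue = deque((y, x) for y in range(h) for x in range(w)
                      if labels[y][x])
        while queue:
            cy, cx = queue.popleft()
            for dy, dx in NEIGHBOURS:
                ny, nx = cy + dy, cx + dx
                if (0 <= ny < h and 0 <= nx < w
                        and kernel[ny][nx] and labels[ny][nx] == 0):
                    labels[ny][nx] = labels[cy][cx]
                    queue.append((ny, nx))
    return labels


# Two text instances that touch in the largest kernel but are
# separated in the smallest one.
small = [[0, 0, 0, 0, 0, 0],
         [0, 1, 0, 0, 1, 0],
         [0, 0, 0, 0, 0, 0]]
large = [[0, 0, 0, 0, 0, 0],
         [1, 1, 1, 1, 1, 1],
         [0, 0, 0, 0, 0, 0]]
expanded = progressive_scale_expansion([small, large])
fused = weighted_fuse([[1.0, 2.0]], [[3.0, 4.0]], weight_a=0.5)
```

In the toy example the two seeds keep separate labels even though the largest kernel merges them into one strip, which is the property that lets this kind of expansion separate adjacent texts.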

Published In

The Visual Computer: International Journal of Computer Graphics Volume 40, Issue 6

Jun 2024

697 pages

ISSN:0178-2789

© The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2023. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 27 September 2023

Accepted: 08 September 2023
