TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes

Authors:

Shangbang Long,

Cong Yao

Computer Vision – ECCV 2018: 15th European Conference, Munich, Germany, September 8-14, 2018, Proceedings, Part II

Pages 19 - 35

https://doi.org/10.1007/978-3-030-01216-8_2

Published: 08 September 2018

Abstract

Driven by deep neural networks and large-scale datasets, scene text detection methods have progressed substantially in recent years, continuously refreshing the performance records on various standard benchmarks. However, limited by the representations (axis-aligned rectangles, rotated rectangles or quadrangles) adopted to describe text, existing methods may fall short when dealing with more free-form text instances, such as curved text, which are actually very common in real-world scenarios. To tackle this problem, we propose a more flexible representation for scene text, termed TextSnake, which can effectively represent text instances in horizontal, oriented and curved forms. In TextSnake, a text instance is described as a sequence of ordered, overlapping disks centered at symmetric axes, each of which is associated with a potentially variable radius and orientation. These geometry attributes are estimated via a Fully Convolutional Network (FCN) model. In experiments, the text detector based on TextSnake achieves state-of-the-art or comparable performance on Total-Text and SCUT-CTW1500, two newly published benchmarks with special emphasis on curved text in natural images, as well as on the widely used datasets ICDAR 2015 and MSRA-TD500. Specifically, TextSnake outperforms the baseline on Total-Text by more than 40% in F-measure.
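The disk-sequence representation described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the `Disk` class, the `disks_along_arc` and `rasterize` helpers, and all numeric values are hypothetical, chosen only to show how an ordered sequence of overlapping disks with a center, radius and orientation can cover a curved text region. The FCN that estimates these attributes is not modeled here.

```python
import math
from dataclasses import dataclass

import numpy as np


@dataclass
class Disk:
    """One element of the ordered sequence (illustrative, not the paper's code)."""
    cx: float      # x coordinate of the disk center on the text center line
    cy: float      # y coordinate of the disk center on the text center line
    radius: float  # disk radius, here half of the local text height
    theta: float   # local orientation of the center line, in radians


def disks_along_arc(n, arc_radius=40.0, text_height=12.0, center=(64.0, 64.0)):
    """Hypothetical helper: place n ordered disks along a half circle."""
    disks = []
    for i in range(n):
        a = math.pi * i / (n - 1)  # sweep the arc angle from 0 to pi
        cx = center[0] + arc_radius * math.cos(a)
        cy = center[1] - arc_radius * math.sin(a)
        # Direction of travel along the arc: derivative of (cx, cy) w.r.t. a.
        theta = math.atan2(-math.cos(a), -math.sin(a))
        disks.append(Disk(cx, cy, text_height / 2, theta))
    return disks


def rasterize(disks, height, width):
    """Return the union of all disks as a boolean mask of shape (height, width)."""
    ys, xs = np.mgrid[0:height, 0:width]
    mask = np.zeros((height, width), dtype=bool)
    for d in disks:
        mask |= (xs - d.cx) ** 2 + (ys - d.cy) ** 2 <= d.radius ** 2
    return mask


disks = disks_along_arc(n=30)
mask = rasterize(disks, 128, 128)
```

Because each disk carries its own radius and orientation, the same structure can describe horizontal, oriented and curved text simply by changing where the centers lie and how the radii vary along the sequence.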

Index Terms

TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
  2. Machine learning
    1. Machine learning approaches
      1. Neural networks

Index terms have been assigned to the content through auto-classification.

Published In

Computer Vision – ECCV 2018: 15th European Conference, Munich, Germany, September 8-14, 2018, Proceedings, Part II

Sep 2018

778 pages

ISBN: 978-3-030-01215-1

DOI: 10.1007/978-3-030-01216-8

Editors:
Vittorio Ferrari (Google Research, Zurich, Switzerland),
Martial Hebert (Carnegie Mellon University, Pittsburgh, PA, USA),
Cristian Sminchisescu (Google Research, Zurich, Switzerland),
Yair Weiss (Hebrew University of Jerusalem, Jerusalem, Israel)

© Springer Nature Switzerland AG 2018.

Publisher

Springer-Verlag

Berlin, Heidelberg
