Article

Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes

Authors:

Xiang Bai

Computer Vision – ECCV 2018: 15th European Conference, Munich, Germany, September 8–14, 2018, Proceedings, Part XIV

Pages 71–88

https://doi.org/10.1007/978-3-030-01264-9_5

Published: 08 September 2018

Abstract

Recently, models based on deep neural networks have dominated the fields of scene text detection and recognition. In this paper, we investigate the problem of scene text spotting, which aims at simultaneous text detection and recognition in natural images. We propose an end-to-end trainable neural network model for scene text spotting. The proposed model, named Mask TextSpotter, is inspired by the recently published Mask R-CNN. Unlike previous methods that also accomplish text spotting with end-to-end trainable deep neural networks, Mask TextSpotter benefits from a simple and smooth end-to-end learning procedure, in which precise text detection and recognition are obtained via semantic segmentation. Moreover, it is superior to previous methods in handling text instances of irregular shapes, for example, curved text. Experiments on ICDAR2013, ICDAR2015 and Total-Text demonstrate that the proposed method achieves state-of-the-art results in both scene text detection and end-to-end text recognition tasks.

References

[1]

Bengio, Y., Louradour, J., Collobert, R., Weston, J.: Curriculum learning. In: Proceedings of ICML, pp. 41–48 (2009)

[2]

Bissacco, A., Cummins, M., Netzer, Y., Neven, H.: PhotoOCR: reading text in uncontrolled conditions. In: Proceedings of ICCV, pp. 785–792 (2013)

[3]

Busta, M., Neumann, L., Matas, J.: Deep TextSpotter: an end-to-end trainable scene text localization and recognition framework. In: Proceedings of ICCV, pp. 2223–2231 (2017)

[4]

Chng, C.K., Chan, C.S.: Total-Text: a comprehensive dataset for scene text detection and recognition. In: Proceedings of ICDAR, pp. 935–942 (2017)

[5]

Dai, J., He, K., Li, Y., Ren, S., Sun, J.: Instance-sensitive fully convolutional networks. In: Proceedings of ECCV, pp. 534–549 (2016)

[6]

Dai, J., Li, Y., He, K., Sun, J.: R-FCN: object detection via region-based fully convolutional networks. In: Proceedings of NIPS, pp. 379–387 (2016)

[7]

Epshtein, B., Ofek, E., Wexler, Y.: Detecting text in natural scenes with stroke width transform. In: Proceedings of CVPR, pp. 2963–2970 (2010)

[8]

Girshick, R.B.: Fast R-CNN. In: Proceedings of ICCV, pp. 1440–1448 (2015)

[9]

Girshick, R.B., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of CVPR, pp. 580–587 (2014)

[10]

Gómez, L., Karatzas, D.: TextProposals: a text-specific selective search algorithm for word spotting in the wild. Pattern Recognit. 70, 60–74 (2017)

[11]

Graves, A., Fernández, S., Gomez, F.J., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: Proceedings of ICML, pp. 369–376 (2006)

[12]

Gupta, A., Vedaldi, A., Zisserman, A.: Synthetic data for text localisation in natural images. In: Proceedings of CVPR, pp. 2315–2324 (2016)

[13]

He, K., Gkioxari, G., Dollár, P., Girshick, R.B.: Mask R-CNN. In: Proceedings of ICCV, pp. 2980–2988 (2017)

[14]

He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of CVPR, pp. 770–778 (2016)

[15]

He, P., Huang, W., He, T., Zhu, Q., Qiao, Y., Li, X.: Single shot text detector with regional attention. In: Proceedings of ICCV, pp. 3066–3074 (2017)

[16]

He, W., Zhang, X., Yin, F., Liu, C.: Deep direct regression for multi-oriented scene text detection. In: Proceedings of ICCV, pp. 745–753 (2017)

[17]

Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)

[18]

Hu, H., Zhang, C., Luo, Y., Wang, Y., Han, J., Ding, E.: WordSup: exploiting word annotations for character based text detection. In: Proceedings of ICCV, pp. 4950–4959 (2017)

[19]

Huang, W., Qiao, Y., Tang, X.: Robust scene text detection with convolution neural network induced MSER trees. In: Proceedings of ECCV, pp. 497–511 (2014)

[20]

Jaderberg, M., Simonyan, K., Vedaldi, A., Zisserman, A.: Synthetic data and artificial neural networks for natural scene text recognition. CoRR abs/1406.2227 (2014)

[21]

Jaderberg, M., Simonyan, K., Vedaldi, A., Zisserman, A.: Reading text in the wild with convolutional neural networks. Int. J. Comput. Vis. 116(1), 1–20 (2016)

[22]

Jaderberg, M., Vedaldi, A., Zisserman, A.: Deep features for text spotting. In: Proceedings of ECCV, pp. 512–528 (2014)

[23]

Kang, L., Li, Y., Doermann, D.S.: Orientation robust text line detection in natural images. In: Proceedings of CVPR, pp. 4034–4041 (2014)

[24]

Karatzas, D., et al.: ICDAR 2015 competition on robust reading. In: Proceedings of ICDAR, pp. 1156–1160 (2015)

[25]

Karatzas, D., et al.: ICDAR 2013 robust reading competition. In: Proceedings of ICDAR, pp. 1484–1493 (2013)

[26]

Lee, C., Osindero, S.: Recursive recurrent nets with attention modeling for OCR in the wild. In: Proceedings of CVPR, pp. 2231–2239 (2016)

[27]

Li, H., Wang, P., Shen, C.: Towards end-to-end text spotting with convolutional recurrent neural networks. In: Proceedings of ICCV, pp. 5248–5256 (2017)

[28]

Li, Y., Qi, H., Dai, J., Ji, X., Wei, Y.: Fully convolutional instance-aware semantic segmentation. In: Proceedings of CVPR, pp. 4438–4446 (2017)

[29]

Liao, M., Shi, B., Bai, X.: TextBoxes++: a single-shot oriented scene text detector. IEEE Trans. Image Process. 27(8), 3676–3690 (2018)

[30]

Liao, M., Shi, B., Bai, X., Wang, X., Liu, W.: TextBoxes: a fast text detector with a single deep neural network. In: Proceedings of AAAI, pp. 4161–4167 (2017)

[31]

Liao, M., Zhu, Z., Shi, B., Xia, G.S., Bai, X.: Rotation-sensitive regression for oriented scene text detection. In: Proceedings of CVPR, pp. 5909–5918 (2018)

[32]

Lin, T., Dollár, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: Proceedings of CVPR, pp. 936–944 (2017)

[33]

Liu, W., et al.: SSD: single shot multibox detector. In: Proceedings of ECCV, pp. 21–37 (2016)

[34]

Liu, Y., Jin, L.: Deep matching prior network: toward tighter multi-oriented text detection. In: Proceedings of CVPR, pp. 3454–3461 (2017)

[35]

Lyu, P., Yao, C., Wu, W., Yan, S., Bai, X.: Multi-oriented scene text detection via corner localization and region segmentation. In: Proceedings of CVPR, pp. 7553–7563 (2018)

[36]

Neumann, L., Matas, J.: A method for text localization and recognition in real-world images. In: Proceedings of ACCV 2010, pp. 770–783 (2011)

[37]

Neumann, L., Matas, J.: Real-time scene text localization and recognition. In: Proceedings of CVPR, pp. 3538–3545 (2012)

[38]

Neumann, L., Matas, J.: Real-time lexicon-free scene text localization and recognition. IEEE Trans. Pattern Anal. Mach. Intell. 38(9), 1872–1885 (2016)

[39]

Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: unified, real-time object detection. In: Proceedings of CVPR, pp. 779–788 (2016)

[40]

Ren, S., He, K., Girshick, R.B., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39(6), 1137–1149 (2017)

[41]

Risnumawan, A., Shivakumara, P., Chan, C.S., Tan, C.L.: A robust arbitrary text detection system for natural scene images. Expert Syst. Appl. 41(18), 8027–8048 (2014)

[42]

Shelhamer, E., Long, J., Darrell, T.: Fully convolutional networks for semantic segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 39(4), 640–651 (2017)

[43]

Shi, B., Bai, X., Belongie, S.J.: Detecting oriented text in natural images by linking segments. In: Proceedings of CVPR, pp. 3482–3490 (2017)

[44]

Shi, B., Bai, X., Yao, C.: An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE Trans. Pattern Anal. Mach. Intell. 39(11), 2298–2304 (2017)

[45]

Shi, B., Wang, X., Lyu, P., Yao, C., Bai, X.: Robust scene text recognition with automatic rectification. In: Proceedings of CVPR, pp. 4168–4176 (2016)

[46]

Shi, B., Yang, M., Wang, X., Lyu, P., Yao, C., Bai, X.: ASTER: an attentional scene text recognizer with flexible rectification. IEEE Trans. Pattern Anal. Mach. Intell. (2018)

[47]

Tian, S., Pan, Y., Huang, C., Lu, S., Yu, K., Tan, C.L.: Text flow: a unified text detection system in natural scene images. In: Proceedings of ICCV, pp. 4651–4659 (2015)

[48]

Tian, Z., Huang, W., He, T., He, P., Qiao, Y.: Detecting text in natural image with connectionist text proposal network. In: Proceedings of ECCV, pp. 56–72 (2016)

[49]

Wang, K., Babenko, B., Belongie, S.: End-to-end scene text recognition. In: Proceedings of ICCV, pp. 1457–1464 (2011)

[50]

Yao, C., Bai, X., Liu, W., Ma, Y., Tu, Z.: Detecting texts of arbitrary orientations in natural images. In: Proceedings of CVPR, pp. 1083–1090 (2012)

[51]

Yao, C., Bai, X., Liu, W.: A unified framework for multioriented text detection and recognition. IEEE Trans. Image Process. 23(11), 4737–4749 (2014)

[52]

Yao, C., Bai, X., Sang, N., Zhou, X., Zhou, S., Cao, Z.: Scene text detection via holistic, multi-channel prediction. CoRR abs/1606.09002 (2016)

[53]

Yao, C., Bai, X., Shi, B., Liu, W.: Strokelets: a learned multi-scale representation for scene text recognition. In: Proceedings of CVPR, pp. 4042–4049 (2014)

[54]

Zhang, Z., Shen, W., Yao, C., Bai, X.: Symmetry-based text line detection in natural scenes. In: Proceedings of CVPR, pp. 2558–2567 (2015)

[55]

Zhang, Z., Zhang, C., Shen, W., Yao, C., Liu, W., Bai, X.: Multi-oriented text detection with fully convolutional networks. In: Proceedings of CVPR, pp. 4159–4167 (2016)

[56]

Zhong, Z., Jin, L., Zhang, S., Feng, Z.: DeepText: a unified framework for text proposal generation and text detection in natural images. CoRR abs/1605.07314 (2016)

[57]

Zhou, X., Yao, C., Wen, H., Wang, Y., Zhou, S., He, W., Liang, J.: EAST: an efficient and accurate scene text detector. In: Proceedings of CVPR, pp. 2642–2651 (2017)

[58]

Zhu, Y., Liao, M., Yang, M., Liu, W.: Cascaded segmentation-detection networks for text-based traffic sign detection. IEEE Trans. Intell. Transport. Syst. 19(1), 209–219 (2018)

[59]

Zhu, Y., Yao, C., Bai, X.: Scene text detection and recognition: recent advances and future trends. Front. Comput. Sci. 10(1), 19–36 (2016)

[60]

Zitnick, C.L., Dollár, P.: Edge boxes: locating object proposals from edges. In: Proceedings of ECCV, pp. 391–405 (2014)

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
  2. Machine learning
    1. Machine learning approaches
      1. Neural networks

Index terms have been assigned to the content through auto-classification.

Information

Published In

Guide Proceedings

Computer Vision – ECCV 2018: 15th European Conference, Munich, Germany, September 8–14, 2018, Proceedings, Part XIV

Sep 2018

844 pages

ISBN:978-3-030-01263-2

DOI:10.1007/978-3-030-01264-9

Editors:
Vittorio Ferrari, Google Research, Zurich, Switzerland
Martial Hebert, Carnegie Mellon University, Pittsburgh, PA, USA
Cristian Sminchisescu, Google Research, Zurich, Switzerland
Yair Weiss, Hebrew University of Jerusalem, Jerusalem, Israel

© Springer Nature Switzerland AG 2018.

Publisher

Springer-Verlag

Berlin, Heidelberg

Qualifiers

Article

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 23
Total Downloads: 0
Downloads (last 12 months): 0
Downloads (last 6 weeks): 0

Reflects downloads up to 14 Aug 2024.

Citations

Cited By

Gao, X., Pang, Y., Liu, Y., Han, M., Yu, J., Wang, W., Chen, Y. (2024). Multimodal Visual-Semantic Representations Learning for Scene Text Recognition. ACM Transactions on Multimedia Computing, Communications, and Applications 20(7), 1–18. DOI: 10.1145/3646551. Online publication date: 27-Mar-2024.
https://dl.acm.org/doi/10.1145/3646551
Fei, N., Jiang, H., Lu, H., Long, J., Dai, Y., Fan, T., Cao, Z., Lu, Z. (2024). VEMO: A Versatile Elastic Multi-modal Model for Search-Oriented Multi-task Learning. Advances in Information Retrieval, pp. 56–72. DOI: 10.1007/978-3-031-56027-9_4. Online publication date: 24-Mar-2024.
https://dl.acm.org/doi/10.1007/978-3-031-56027-9_4
Wang, Z., Wang, Y., Zhou, Z., Yuan, H., Jin, C., Jin, C., He, L., Song, M., Wang, R. (2023). Exploring Anchor-Free Approach for Reading Chinese Characters. Proceedings of the 1st International Workshop on Multimedia Content Generation and Evaluation: New Methods and Practice, pp. 23–28. DOI: 10.1145/3607541.3616813. Online publication date: 29-Oct-2023.
https://dl.acm.org/doi/10.1145/3607541.3616813
Fu, Z., Xie, H., Fang, S., Wang, Y., Xing, M., Zhang, Y. (2023). Learning Pixel Affinity Pyramid for Arbitrary-Shaped Text Detection. ACM Transactions on Multimedia Computing, Communications, and Applications 19(1s), 1–24. DOI: 10.1145/3524617. Online publication date: 3-Feb-2023.
https://dl.acm.org/doi/10.1145/3524617
Zhao, M., Yin, F., Liu, C. (2023). Scene Text Detection with Box Supervision and Level Set Evolution. Pattern Recognition, pp. 179–193. DOI: 10.1007/978-3-031-47634-1_14. Online publication date: 5-Nov-2023.
https://dl.acm.org/doi/10.1007/978-3-031-47634-1_14
Wei, J., Zhan, H., Tu, X., Lu, Y., Pal, U. (2023). Scene Text Recognition with Image-Text Matching-Guided Dictionary. Document Analysis and Recognition - ICDAR 2023, pp. 54–69. DOI: 10.1007/978-3-031-41731-3_4. Online publication date: 21-Aug-2023.
https://dl.acm.org/doi/10.1007/978-3-031-41731-3_4
Gao, C., Yang, B., Wang, H., Yang, M., Yu, W., Liu, Y., Bai, X. (2023). TextREC: A Dataset for Referring Expression Comprehension with Reading Comprehension. Document Analysis and Recognition - ICDAR 2023, pp. 402–420. DOI: 10.1007/978-3-031-41682-8_25. Online publication date: 21-Aug-2023.
https://dl.acm.org/doi/10.1007/978-3-031-41682-8_25
Ma, C., Sun, L., Wang, J., Huo, Q. (2023). DQ-DETR: Dynamic Queries Enhanced Detection Transformer for Arbitrary Shape Text Detection. Document Analysis and Recognition - ICDAR 2023, pp. 243–260. DOI: 10.1007/978-3-031-41679-8_14. Online publication date: 21-Aug-2023.
https://dl.acm.org/doi/10.1007/978-3-031-41679-8_14
Pan, Z., Ji, Z., Liu, X., Bai, J., Liu, C. (2023). ViSA: Visual and Semantic Alignment for Robust Scene Text Recognition. Document Analysis and Recognition - ICDAR 2023, pp. 223–242. DOI: 10.1007/978-3-031-41679-8_13. Online publication date: 21-Aug-2023.
https://dl.acm.org/doi/10.1007/978-3-031-41679-8_13
Li, H., Liu, C., Wang, J., Huang, M., Zhou, W., Jin, L. (2023). DTDT: Highly Accurate Dense Text Line Detection in Historical Documents via Dynamic Transformer. Document Analysis and Recognition - ICDAR 2023, pp. 381–396. DOI: 10.1007/978-3-031-41676-7_22. Online publication date: 21-Aug-2023.
https://dl.acm.org/doi/10.1007/978-3-031-41676-7_22