Scene text detection with fully convolutional neural networks

Authors:

Houqiang Li

Multimedia Tools and Applications, Volume 78, Issue 13

Pages 18205 - 18227

https://doi.org/10.1007/s11042-019-7177-4

Published: 01 July 2019

Abstract

Text detection in scene images has become a hot topic in computer vision and artificial intelligence research because of its wide range of applications and its challenges. Most state-of-the-art deep learning methods for text detection rely on text bounding box regression, and these methods cannot handle curved scene text well. In this paper, we propose a new framework for arbitrarily oriented text detection in natural images based on fully convolutional neural networks. The main idea is to represent a text instance in two forms: a text center block and a word stroke region. These two elements are detected by two separate fully convolutional networks, and the final detections are produced by the word region surrounding box algorithm. The proposed method does not need to regress the extant bounding box of the text instance, mainly because the predicted text block region itself implicitly contains position and orientation information. In addition, our method handles text in different languages, arbitrary orientations, curved shapes, and various fonts. To validate the effectiveness of the proposed method, we perform experiments on three public datasets, MSRA-TD500, USTB-SV1K, and ICDAR2013, and compare the method with other state-of-the-art methods. Experimental results demonstrate that the proposed method is competitive: based on VGG-16, it achieves an F-measure of 78.84% on MSRA-TD500, 59.34% on USTB-SV1K, and 88.21% on ICDAR2013.
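
To illustrate the claim that a predicted region implicitly carries position and orientation, the minimal sketch below recovers a centroid and a principal-axis angle from a binary mask using second-order image moments. It is not the word region surrounding box algorithm, which the abstract does not specify; the function name `region_pose` and the toy masks are hypothetical and introduced only for this example.

```python
import math


def region_pose(mask):
    """Return (cx, cy, angle_deg) for the foreground pixels of a binary mask.

    mask is a list of rows; any nonzero entry counts as foreground.
    angle_deg is the principal-axis angle in (-90, 90], measured from the
    x-axis towards the y-axis (y grows downward in image coordinates).
    """
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not pts:
        raise ValueError("mask has no foreground pixels")
    n = len(pts)

    # First-order moments give the position (centroid) of the region.
    cx = sum(x for x, _ in pts) / n
    cy = sum(y for _, y in pts) / n

    # Central second-order moments describe how the region is spread.
    mu20 = sum((x - cx) ** 2 for x, _ in pts) / n
    mu02 = sum((y - cy) ** 2 for _, y in pts) / n
    mu11 = sum((x - cx) * (y - cy) for x, y in pts) / n

    # Orientation of the axis along which the region is most elongated.
    angle = 0.5 * math.atan2(2 * mu11, mu20 - mu02)
    return cx, cy, math.degrees(angle)


# Toy masks standing in for a predicted text block (assumed data).
horizontal = [[0] * 7, [0, 1, 1, 1, 1, 1, 0], [0] * 7]
diagonal = [[1 if x == y else 0 for x in range(5)] for y in range(5)]

print(region_pose(horizontal))
print(region_pose(diagonal))
```

In this sketch no box coordinates are regressed: the centre and angle are read directly from the pixels of the region, which is the property the abstract relies on.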

Index Terms

Scene text detection with fully convolutional neural networks
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision

Index terms have been assigned to the content through auto-classification.

Published In

Multimedia Tools and Applications Volume 78, Issue 13

July 2019

1595 pages

ISSN:1380-7501

Copyright © 2019 Springer Science+Business Media, LLC, part of Springer Nature.

Publisher

Kluwer Academic Publishers

United States

Cited By

Li K, Liu Z (2023) MCANet: multi-scale contextual feature fusion network based on Atrous convolution. Multimedia Tools and Applications 82(22):34679-34702. doi:10.1007/s11042-023-14800-8. Online publication date: 1-Sep-2023
https://dl.acm.org/doi/10.1007/s11042-023-14800-8
Larbi G (2023) Two-step text detection framework in natural scenes based on Pseudo-Zernike moments and CNN. Multimedia Tools and Applications 82(7):10595-10616. doi:10.1007/s11042-022-13690-6. Online publication date: 1-Mar-2023
https://dl.acm.org/doi/10.1007/s11042-022-13690-6
Chakraborty N, Mitra A, Choudhury A, Mollah A, Basu S, Sarkar R (2022) How to handle bi/tri-lingual Indic texts in a single image? A new dataset of natural scene and born-digital images. Multimedia Tools and Applications 81(11):15367-15394. doi:10.1007/s11042-022-12596-7. Online publication date: 1-May-2022
https://dl.acm.org/doi/10.1007/s11042-022-12596-7
Qin S, Chen L (2022) Arbitrary-shaped scene text detection with keypoint-based shape representation. International Journal on Document Analysis and Recognition 25(2):115-127. doi:10.1007/s10032-022-00396-6. Online publication date: 1-Jun-2022
https://dl.acm.org/doi/10.1007/s10032-022-00396-6
Liu Z, Zhou W, Li H (2021) MFECN: Multi-level Feature Enhanced Cumulative Network for Scene Text Detection. ACM Transactions on Multimedia Computing, Communications, and Applications 17(3):1-22. doi:10.1145/3440087. Online publication date: 22-Jul-2021
https://dl.acm.org/doi/10.1145/3440087
Chowdhury P, Shivakumara P, Pal U, Lu T, Blumenstein M (2020) A new augmentation-based method for text detection in night and day license plate images. Multimedia Tools and Applications 79(43-44):33303-33330. doi:10.1007/s11042-020-09681-0. Online publication date: 1-Sep-2020
https://dl.acm.org/doi/10.1007/s11042-020-09681-0
Liu Z, Zhou W, Li H (2019) AB-LSTM. ACM Transactions on Multimedia Computing, Communications, and Applications 15(4):1-23. doi:10.1145/3356728. Online publication date: 16-Dec-2019
https://dl.acm.org/doi/10.1145/3356728
