Scene text detection with fully convolutional neural networks

Published: 01 July 2019

    Abstract

    Text detection in scene images has become a hot topic in computer vision and artificial intelligence research because of its wide range of applications and the challenges it poses. Most state-of-the-art deep learning methods for text detection rely on text bounding box regression. These methods cannot handle curved scene text well. In this paper, we propose a new framework for arbitrarily oriented text detection in natural images based on fully convolutional neural networks. The main idea is to represent a text instance in two forms: a text center block and a word stroke region. These two elements are detected by two separate fully convolutional networks. Final detections are produced by the word region surrounding box algorithm. The proposed method does not need to regress the bounding box of the text instance, mainly because the predicted text block region itself implicitly contains position and orientation information. In addition, our method handles text in different languages, arbitrary orientations, curved shapes, and various fonts. To validate the effectiveness of the proposed method, we perform experiments on three public datasets, MSRA-TD500, USTB-SV1K, and ICDAR2013, and compare it with other state-of-the-art methods. Experimental results demonstrate that the proposed method achieves competitive results. Based on VGG-16, our method achieves an F-measure of 78.84% on MSRA-TD500, 59.34% on USTB-SV1K, and 88.21% on ICDAR2013.
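
    The remark that a predicted block region implicitly carries position and orientation can be illustrated with second-order image moments. What follows is a minimal sketch under stated assumptions, not the paper's word region surrounding box algorithm: `block_pose` is a hypothetical helper name, and the masks are toy inputs standing in for a thresholded network output.

```python
import math


def block_pose(mask):
    """Estimate the centre and orientation of the foreground in a binary mask.

    mask is a list of rows; truthy entries are foreground pixels.
    Returns (cx, cy, angle_degrees), with the angle measured from the
    x-axis in image coordinates (y grows downward).
    Hypothetical helper for illustration only.
    """
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not pts:
        raise ValueError("mask has no foreground pixels")
    n = len(pts)

    # Position: centroid of the foreground pixels.
    cx = sum(x for x, _ in pts) / n
    cy = sum(y for _, y in pts) / n

    # Orientation: principal axis from second-order central moments.
    sxx = sum((x - cx) ** 2 for x, _ in pts) / n
    syy = sum((y - cy) ** 2 for _, y in pts) / n
    sxy = sum((x - cx) * (y - cy) for x, y in pts) / n
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return cx, cy, math.degrees(angle)


if __name__ == "__main__":
    # Toy mask: a diagonal stroke in a 5x5 image.
    diagonal = [[1 if x == y else 0 for x in range(5)] for y in range(5)]
    print(block_pose(diagonal))
```

    No box coordinates are regressed here: both the centre and the angle are read directly from the pixels of the region, which is the property the abstract relies on. Curved text would need a richer description than a single angle.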

      Published In

      Multimedia Tools and Applications  Volume 78, Issue 13
      July 2019
      1595 pages

      Publisher

      Kluwer Academic Publishers

      United States

      Author Tags

      1. Multi-orientation
      2. Scene text detection
      3. Semantic segmentation
      4. Text center block
      5. Word stroke region

      Qualifiers

      • Article

      Cited By

      • (2023) MCANet: multi-scale contextual feature fusion network based on Atrous convolution. Multimedia Tools and Applications 82(22):34679-34702. doi:10.1007/s11042-023-14800-8. Online publication date: 1-Sep-2023
      • (2023) Two-step text detection framework in natural scenes based on Pseudo-Zernike moments and CNN. Multimedia Tools and Applications 82(7):10595-10616. doi:10.1007/s11042-022-13690-6. Online publication date: 1-Mar-2023
      • (2022) How to handle bi/tri-lingual Indic texts in a single image? A new dataset of natural scene and born-digital images. Multimedia Tools and Applications 81(11):15367-15394. doi:10.1007/s11042-022-12596-7. Online publication date: 1-May-2022
      • (2022) Arbitrary-shaped scene text detection with keypoint-based shape representation. International Journal on Document Analysis and Recognition 25(2):115-127. doi:10.1007/s10032-022-00396-6. Online publication date: 1-Jun-2022
      • (2021) MFECN: Multi-level Feature Enhanced Cumulative Network for Scene Text Detection. ACM Transactions on Multimedia Computing, Communications, and Applications 17(3):1-22. doi:10.1145/3440087. Online publication date: 22-Jul-2021
      • (2020) A new augmentation-based method for text detection in night and day license plate images. Multimedia Tools and Applications 79(43-44):33303-33330. doi:10.1007/s11042-020-09681-0. Online publication date: 1-Sep-2020
      • (2019) AB-LSTM. ACM Transactions on Multimedia Computing, Communications, and Applications 15(4):1-23. doi:10.1145/3356728. Online publication date: 16-Dec-2019
