research-article

TextProposals

Published: 01 October 2017

Abstract

We present a text-specific object proposals algorithm. Our algorithm reaches high recall rates with a few thousand proposals on different standard datasets. Our method generates word proposals without an explicit character segmentation. The combination of our object proposals with existing whole-word recognizers shows competitive performance in end-to-end word spotting and, in some benchmarks, outperforms previously published results.

Motivated by the success of powerful but expensive techniques that recognize words holistically (Goel et al., 2013; Almazán et al., 2014; Jaderberg et al., 2016), object proposals techniques have emerged as an alternative to traditional text detectors. In this paper we introduce a novel object proposals method that is specifically designed for text. We rely on a similarity-based region grouping algorithm that generates a hierarchy of word hypotheses. A holistic word recognition method can then be applied efficiently over the nodes of this hierarchy. Our experiments demonstrate that the presented method produces better-quality word proposals than class-independent algorithms. We show high recall rates with a few thousand proposals on different standard benchmarks, including focused and incidental text datasets and multi-language scenarios. Moreover, the combination of our object proposals with existing whole-word recognizers (Almazán et al., 2014; Jaderberg et al., 2016) shows competitive performance in end-to-end word spotting and, in some benchmarks, outperforms previously published results. Concretely, on the challenging ICDAR2015 Incidental Text dataset, we outperform by more than 10% in F-score the best-performing method in the last ICDAR Robust Reading Competition (Karatzas, 2015). Source code of the complete end-to-end system is available at https://github.com/lluisgomez/TextProposals.
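The grouping idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `distance` function, the single-linkage merge rule, and the toy boxes are all assumptions chosen for illustration. Regions are merged bottom-up, every merge creates a node in the hierarchy, and the bounding box of each node is emitted as a word proposal. A small helper then measures recall at an IoU threshold, the usual way of scoring object proposals.

```python
from itertools import combinations

# A box is (x0, y0, x1, y1). All names below are hypothetical, for illustration only.

def union(a, b):
    """Smallest box enclosing both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def distance(a, b):
    """Assumed dissimilarity: center offset plus height difference."""
    dx = abs((a[0] + a[2]) / 2 - (b[0] + b[2]) / 2)
    dy = abs((a[1] + a[3]) / 2 - (b[1] + b[3]) / 2)
    dh = abs((a[3] - a[1]) - (b[3] - b[1]))
    return dx + dy + dh

def group_proposals(regions):
    """Agglomerative grouping; each hierarchy node contributes one proposal."""
    clusters = [[r] for r in regions]   # leaves of the hierarchy
    proposals = list(regions)           # leaf boxes are proposals too
    while len(clusters) > 1:
        # Single linkage (an assumption): cluster distance is the closest member pair.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: min(distance(a, b)
                              for a in clusters[p[0]] for b in clusters[p[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        box = merged[0]
        for member in merged[1:]:
            box = union(box, member)
        proposals.append(box)           # bounding box of the new node
    return proposals

def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])

def iou(a, b):
    """Intersection over union of two boxes with non-zero area."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    return inter / (area(a) + area(b) - inter)

def recall(ground_truth, proposals, thr=0.5):
    """Fraction of ground-truth boxes matched by at least one proposal."""
    hits = sum(any(iou(g, p) >= thr for p in proposals) for g in ground_truth)
    return hits / len(ground_truth)

# Toy input: three character-like regions forming a word, plus one distant region.
regions = [(0, 0, 10, 20), (12, 0, 22, 20), (24, 0, 34, 20), (200, 100, 210, 120)]
proposals = group_proposals(regions)
word_recall = recall([(0, 0, 34, 20)], proposals)
```

Because every node of the hierarchy yields a box, the word-level box is proposed without ever deciding which regions are individual characters. A whole-word recognizer would then score each proposal; that step is outside this sketch.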

References

[1]
V. Goel, A. Mishra, K. Alahari, C. Jawahar, Whole is greater than sum of parts: Recognizing scene text words, IEEE, 2013.
[2]
J. Almazán, A. Gordo, A. Fornés, E. Valveny, Word spotting and recognition with embedded attributes, IEEE Trans. Pattern Anal. Mach. Intell., 36 (2014) 2552-2566.
[3]
M. Jaderberg, K. Simonyan, A. Vedaldi, A. Zisserman, Reading text in the wild with convolutional neural networks, Int. J. Comput. Vis., 116 (2016) 1-20.
[4]
D. Karatzas, ICDAR 2015 competition on robust reading, IEEE, 2015.
[5]
A. Coates, B. Carpenter, C. Case, S. Satheesh, B. Suresh, T. Wang, D.J. Wu, A.Y. Ng, Text detection and character recognition in scene images with unsupervised feature learning, IEEE, 2011.
[6]
T. Wang, D.J. Wu, A. Coates, A.Y. Ng, End-to-end text recognition with convolutional neural networks, IEEE, 2012.
[7]
M. Jaderberg, A. Vedaldi, A. Zisserman, Deep features for text spotting, Springer, 2014.
[8]
L. Neumann, J. Matas, Real-time scene text localization and recognition, IEEE, 2012.
[9]
L. Neumann, J. Matas, Scene text localization and recognition with oriented stroke detection, 2013.
[10]
R. Girshick, J. Donahue, T. Darrell, J. Malik, Rich feature hierarchies for accurate object detection and semantic segmentation, 2014.
[11]
J.R. Uijlings, K.E. van de Sande, T. Gevers, A.W. Smeulders, Selective search for object recognition, Int. J. Comput. Vis., 104 (2013) 154-171.
[12]
P. Dollár, R. Appel, S. Belongie, P. Perona, Fast feature pyramids for object detection, IEEE Trans. Pattern Anal. Mach. Intell., 36 (2014) 1532-1545.
[13]
C.L. Zitnick, P. Dollár, Edge boxes: locating object proposals from edges, Springer, 2014.
[14]
K. Kim, H. Byun, Y. Song, Y.-W. Choi, S. Chi, K.K. Kim, Y. Chung, Scene text extraction in natural scene images using hierarchical feature combining and verification, IEEE, 2004.
[15]
S. Lee, J. Seok, K. Min, J. Kim, Scene text extraction using image intensity and color information, IEEE, 2009.
[16]
C. Yao, X. Bai, W. Liu, Y. Ma, Z. Tu, Detecting texts of arbitrary orientations in natural images, IEEE, 2012.
[17]
H.I. Koo, D.H. Kim, Scene text detection via connected component clustering and nontext filtering, IEEE Trans. Image Process., 22 (2013) 2296-2305.
[18]
X.-C. Yin, X. Yin, K. Huang, H.-W. Hao, Robust text detection in natural scene images, IEEE Trans. Pattern Anal. Mach. Intell., 36 (2014) 970-983.
[19]
L. Gomez, D. Karatzas, A fast hierarchical method for multi-script and arbitrary oriented scene text extraction, Int. J. Doc. Anal. Recognit. (IJDAR), 19 (2016) 335-349.
[20]
Q. Ye, D. Doermann, Text detection and recognition in imagery: a survey, IEEE Trans. Pattern Anal. Mach. Intell., 37 (2015) 1480-1500.
[21]
Y. Zhu, C. Yao, X. Bai, Scene text detection and recognition: recent advances and future trends, Front. Comput. Sci., 10 (2016) 19-36.
[22]
K. Jung, K.I. Kim, A.K. Jain, Text information extraction in images and video: a survey, Pattern Recognit., 37 (2004) 977-997.
[23]
J. Liang, D. Doermann, H. Li, Camera-based analysis of text and documents: a survey, Int. J. Doc. Anal. Recognit. (IJDAR), 7 (2005) 84-104.
[24]
K. Wang, B. Babenko, S. Belongie, End-to-end scene text recognition, IEEE, 2011.
[25]
A. Mishra, K. Alahari, C. Jawahar, Top-down and bottom-up cues for scene text recognition, IEEE, 2012.
[26]
R. Minetto, N. Thome, M. Cord, N.J. Leite, J. Stolfi, T-HOG: an effective gradient-based descriptor for single line text regions, Pattern Recognit., 46 (2013) 1078-1090.
[27]
C. Yao, X. Bai, W. Liu, A unified framework for multioriented text detection and recognition, IEEE Trans. Image Process., 23 (2014) 4737-4749.
[28]
B. Epshtein, E. Ofek, Y. Wexler, Detecting text in natural scenes with stroke width transform, IEEE, 2010.
[29]
A. Mosleh, N. Bouguila, A.B. Hamza, Image text detection using a bandlet-based edge detector and stroke width transform, 2012.
[30]
H. Xu, L. Xue, F. Su, Scene text detection based on robust stroke width transform and deep belief network, Springer, 2014.
[31]
J. Matas, O. Chum, M. Urban, T. Pajdla, Robust wide-baseline stereo from maximally stable extremal regions, Image Vis. Comput., 22 (2004) 761-767.
[32]
L. Neumann, J. Matas, A method for text localization and recognition in real-world images, Springer, 2010.
[33]
H. Chen, S.S. Tsai, G. Schroth, D.M. Chen, R. Grzeszczuk, B. Girod, Robust text detection in natural images with edge-enhanced maximally stable extremal regions, IEEE, 2011.
[34]
T. Novikova, O. Barinova, P. Kohli, V. Lempitsky, Large-lexicon attribute-consistent text recognition in natural images, Springer, 2012.
[35]
C. Shi, C. Wang, B. Xiao, Y. Zhang, S. Gao, Scene text detection using graph model built upon maximally stable extremal regions, Pattern Recognit. Lett., 34 (2013) 107-116.
[36]
C. Shi, C. Wang, B. Xiao, S. Gao, J. Hu, End-to-end scene text recognition using tree-structured models, Pattern Recognit., 47 (2014) 2853-2866.
[37]
O. Alsharif, J. Pineau, End-to-end text recognition with hybrid HMM maxout models, 2014.
[38]
L. Sun, Q. Huo, W. Jia, K. Chen, A robust approach for text detection from natural scene images, Pattern Recognit., 48 (2015) 2906-2920.
[39]
W. Huang, Y. Qiao, X. Tang, Robust scene text detection with convolution neural network induced MSER trees, Springer, 2014.
[40]
J. Hosang, R. Benenson, P. Dollár, B. Schiele, What makes for effective detection proposals?, IEEE Trans. Pattern Anal. Mach. Intell., 38 (2016) 814-830.
[41]
L. Huo, L. Jiao, S. Wang, S. Yang, Object-level saliency detection with color attributes, Pattern Recognit., 49 (2016) 162-173.
[42]
I. González-Díaz, V. Buso, J. Benois-Pineau, Perceptual modeling in the problem of active object recognition in visual scenes, Pattern Recognit., 56 (2016) 129-141.
[43]
M.-M. Cheng, Z. Zhang, W.-Y. Lin, P. Torr, BING: binarized normed gradients for objectness estimation at 300fps, 2014.
[44]
S. Manen, M. Guillaumin, L. Van Gool, Prime object proposals with randomized Prim's algorithm, 2013.
[45]
P. Krähenbühl, V. Koltun, Geodesic object proposals, Springer, 2014.
[46]
G. Borgefors, Distance transformations in digital images, Comput. Vis. Graph. Image Process., 34 (1986) 344-371.
[47]
A. Fitzgibbon, R.B. Fisher, A buyer's guide to conic fitting, 1995.
[48]
S.M. Lucas, A. Panaretos, L. Sosa, A. Tang, S. Wong, R. Young, K. Ashida, H. Nagai, M. Okamoto, H. Yamamoto, ICDAR 2003 robust reading competitions: entries, results, and future directions, Int. J. Doc. Anal. Recognit. (IJDAR), 7 (2005) 105-122.
[49]
D. Karatzas, ICDAR 2013 robust reading competition, IEEE, 2013.
[50]
K. Wang, S. Belongie, Word spotting in the wild, Springer-Verlag, 2010.
[51]
L. Gomez, D. Karatzas, A fine-grained approach to scene text script identification, IEEE, 2016.
[52]
F. Cao, J. Delon, A. Desolneux, P. Musé, F. Sur, An a contrario approach to hierarchical clustering validity assessment, INRIA, 2004.
[53]
L. Gómez, D. Karatzas, Scene text recognition: no country for old men?, Springer, 2014.
[54]
X. Liu, T. Lu, Natural scene character recognition using Markov random field, IEEE, 2015.

Published In

Pattern Recognition, Volume 70, Issue C
October 2017
152 pages

Publisher

Elsevier Science Inc.

United States

Author Tags

  1. Grouping
  2. Object proposals
  3. Perceptual organization
  4. Scene text

Cited By

  • (2022) Towards End-to-End Text Spotting in Natural Scenes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44:10_Part_2 (7266-7281). DOI: 10.1109/TPAMI.2021.3095916. Online publication date: 1-Oct-2022
  • (2022) MORAN. Pattern Recognition, 90:C (109-118). DOI: 10.1016/j.patcog.2019.01.020. Online publication date: 18-Apr-2022
  • (2022) Word spotting and recognition via a joint deep embedding of image and text. Pattern Recognition, 88:C (312-320). DOI: 10.1016/j.patcog.2018.11.017. Online publication date: 18-Apr-2022
  • (2022) Deep-learning based end-to-end system for text reading in the wild. Multimedia Tools and Applications, 81:17 (24691-24719). DOI: 10.1007/s11042-022-11998-x. Online publication date: 21-Mar-2022
  • (2022) Traditional to transfer learning progression on scene text detection and recognition: a survey. Artificial Intelligence Review, 55:4 (3457-3502). DOI: 10.1007/s10462-021-10091-3. Online publication date: 1-Apr-2022
  • (2021) Scene text spotting based on end-to-end. Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology, 40:5 (8871-8881). DOI: 10.3233/JIFS-200903. Online publication date: 1-Jan-2021
  • (2021) Research on Methods of English Text Detection and Recognition Based on Neural Network Detection Model. Scientific Programming, 2021. DOI: 10.1155/2021/6406856. Online publication date: 1-Jan-2021
  • (2021) A Multimodal Deep Framework for Derogatory Social Media Post Identification of a Recognized Person. ACM Transactions on Asian and Low-Resource Language Information Processing, 21:1 (1-19). DOI: 10.1145/3447651. Online publication date: 2-Nov-2021
  • (2021) A decade. Computer Science Review, 42:C. DOI: 10.1016/j.cosrev.2021.100434. Online publication date: 1-Nov-2021
  • (2021) Residual Dual Scale Scene Text Spotting by Fusing Bottom-Up and Top-Down Processing. International Journal of Computer Vision, 129:3 (619-637). DOI: 10.1007/s11263-020-01388-x. Online publication date: 1-Mar-2021