TextProposals

Authors:

Dimosthenis Karatzas

Pattern Recognition, Volume 70, Issue C

Pages 60 - 74

https://doi.org/10.1016/j.patcog.2017.04.027

Published: 01 October 2017

Abstract

We present a text-specific object proposals algorithm. Our algorithm reaches high recall rates with a few thousand proposals on different standard datasets. Our method generates word proposals without an explicit character segmentation. The combination of our object proposals with existing whole-word recognizers shows competitive performance in end-to-end word spotting and, in some benchmarks, outperforms previously published results.

Motivated by the success of powerful but expensive techniques that recognize words holistically (Goel et al., 2013; Almazán et al., 2014; Jaderberg et al., 2016), object proposals techniques have emerged as an alternative to traditional text detectors. In this paper we introduce a novel object proposals method that is specifically designed for text. We rely on a similarity-based region grouping algorithm that generates a hierarchy of word hypotheses. A holistic word recognition method can then be applied efficiently over the nodes of this hierarchy. Our experiments demonstrate that the presented method produces better-quality word proposals than class-independent algorithms. We show high recall rates with a few thousand proposals on different standard benchmarks, including focused and incidental text datasets and multi-language scenarios. Moreover, the combination of our object proposals with existing whole-word recognizers (Almazán et al., 2014; Jaderberg et al., 2016) shows competitive performance in end-to-end word spotting and, in some benchmarks, outperforms previously published results. Concretely, on the challenging ICDAR2015 Incidental Text dataset, we outperform the best-performing method in the last ICDAR Robust Reading Competition (Karatzas, 2015) by more than 10% in F-score. Source code of the complete end-to-end system is available at https://github.com/lluisgomez/TextProposals.
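
The idea of grouping regions into a hierarchy whose nodes act as word proposals can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that). It assumes regions are given as axis-aligned boxes, uses a made-up dissimilarity (centre distance plus height difference), and merges greedily bottom-up; the names `group_regions`, `distance`, `iou`, and `recall` are hypothetical. The `recall` helper shows one common way to score proposals against ground-truth word boxes at an IoU threshold, which is also an assumption about the evaluation protocol.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), assumed axis-aligned


def union(a: Box, b: Box) -> Box:
    """Smallest box enclosing both inputs."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def distance(a: Box, b: Box) -> float:
    """Hypothetical dissimilarity: centre distance plus height difference."""
    ax, ay = (a[0] + a[2]) / 2, (a[1] + a[3]) / 2
    bx, by = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
    height_diff = abs((a[3] - a[1]) - (b[3] - b[1]))
    return ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5 + height_diff


def group_regions(regions: List[Box]) -> List[Box]:
    """Greedy bottom-up grouping: merge the two closest clusters until one
    remains. Every node of the resulting hierarchy (leaves included) is
    returned as a proposal, so N regions yield 2N - 1 proposals."""
    clusters = list(regions)
    proposals = list(regions)
    while len(clusters) > 1:
        _, i, j = min(
            (distance(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        merged = union(clusters[i], clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        proposals.append(merged)
    return proposals


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def recall(proposals: List[Box], ground_truth: List[Box], thr: float = 0.5) -> float:
    """Fraction of ground-truth boxes matched by at least one proposal."""
    hits = sum(any(iou(p, g) >= thr for p in proposals) for g in ground_truth)
    return hits / len(ground_truth)


# Toy input: four character-sized regions forming two well-separated words.
chars = [(0, 0, 10, 10), (12, 0, 22, 10), (100, 0, 110, 10), (112, 0, 122, 10)]
words = [(0, 0, 22, 10), (100, 0, 122, 10)]
proposals = group_regions(chars)
print(len(proposals), recall(proposals, words))
```

In this toy case the intermediate nodes of the hierarchy coincide with the two word boxes, which is why a recognizer applied to each node could read whole words without a separate character segmentation step.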

References

[1]

V. Goel, A. Mishra, K. Alahari, C. Jawahar, Whole is greater than sum of parts: Recognizing scene text words, IEEE, 2013.

[2]

J. Almazán, A. Gordo, A. Fornés, E. Valveny, Word spotting and recognition with embedded attributes, IEEE Trans. Pattern Anal. Mach. Intell., 36 (2014) 2552-2566.

[3]

M. Jaderberg, K. Simonyan, A. Vedaldi, A. Zisserman, Reading text in the wild with convolutional neural networks, Int. J. Comput. Vis., 116 (2016) 1-20.

[4]

D. Karatzas, ICDAR 2015 competition on robust reading, IEEE, 2015.

[5]

A. Coates, B. Carpenter, C. Case, S. Satheesh, B. Suresh, T. Wang, D.J. Wu, A.Y. Ng, Text detection and character recognition in scene images with unsupervised feature learning, IEEE, 2011.

[6]

T. Wang, D.J. Wu, A. Coates, A.Y. Ng, End-to-end text recognition with convolutional neural networks, IEEE, 2012.

[7]

M. Jaderberg, A. Vedaldi, A. Zisserman, Deep features for text spotting, Springer, 2014.

[8]

L. Neumann, J. Matas, Real-time scene text localization and recognition, IEEE, 2012.

[9]

L. Neumann, J. Matas, Scene text localization and recognition with oriented stroke detection, 2013.

[10]

R. Girshick, J. Donahue, T. Darrell, J. Malik, Rich feature hierarchies for accurate object detection and semantic segmentation, 2014.

[11]

J.R. Uijlings, K.E. van de Sande, T. Gevers, A.W. Smeulders, Selective search for object recognition, Int. J. Comput. Vis., 104 (2013) 154-171.

[12]

P. Dollár, R. Appel, S. Belongie, P. Perona, Fast feature pyramids for object detection, IEEE Trans. Pattern Anal. Mach. Intell., 36 (2014) 1532-1545.

[13]

C.L. Zitnick, P. Dollár, Edge boxes: locating object proposals from edges, Springer, 2014.

[14]

K. Kim, H. Byun, Y. Song, Y.-W. Choi, S. Chi, K.K. Kim, Y. Chung, Scene text extraction in natural scene images using hierarchical feature combining and verification, IEEE, 2004.

[15]

S. Lee, J. Seok, K. Min, J. Kim, Scene text extraction using image intensity and color information, IEEE, 2009.

[16]

C. Yao, X. Bai, W. Liu, Y. Ma, Z. Tu, Detecting texts of arbitrary orientations in natural images, IEEE, 2012.

[17]

H.I. Koo, D.H. Kim, Scene text detection via connected component clustering and nontext filtering, IEEE Trans. Image Process., 22 (2013) 2296-2305.

[18]

X.-C. Yin, X. Yin, K. Huang, H.-W. Hao, Robust text detection in natural scene images, IEEE Trans. Pattern Anal. Mach. Intell., 36 (2014) 970-983.

[19]

L. Gomez, D. Karatzas, A fast hierarchical method for multi-script and arbitrary oriented scene text extraction, Int. J. Doc. Anal. Recognit. (IJDAR), 19 (2016) 335-349.

[20]

Q. Ye, D. Doermann, Text detection and recognition in imagery: a survey, IEEE Trans. Pattern Anal. Mach. Intell., 37 (2015) 1480-1500.

[21]

Y. Zhu, C. Yao, X. Bai, Scene text detection and recognition: recent advances and future trends, Front. Comput. Sci., 10 (2016) 19-36.

[22]

K. Jung, K.I. Kim, A.K. Jain, Text information extraction in images and video: a survey, Pattern Recognit., 37 (2004) 977-997.

[23]

J. Liang, D. Doermann, H. Li, Camera-based analysis of text and documents: a survey, Int. J. Doc. Anal. Recognit. (IJDAR), 7 (2005) 84-104.

[24]

K. Wang, B. Babenko, S. Belongie, End-to-end scene text recognition, IEEE, 2011.

[25]

A. Mishra, K. Alahari, C. Jawahar, Top-down and bottom-up cues for scene text recognition, IEEE, 2012.

[26]

R. Minetto, N. Thome, M. Cord, N.J. Leite, J. Stolfi, T-HOG: an effective gradient-based descriptor for single line text regions, Pattern Recognit., 46 (2013) 1078-1090.

[27]

C. Yao, X. Bai, W. Liu, A unified framework for multioriented text detection and recognition, IEEE Trans. Image Process., 23 (2014) 4737-4749.

[28]

B. Epshtein, E. Ofek, Y. Wexler, Detecting text in natural scenes with stroke width transform, IEEE, 2010.

[29]

A. Mosleh, N. Bouguila, A.B. Hamza, Image text detection using a bandlet-based edge detector and stroke width transform, 2012.

[30]

H. Xu, L. Xue, F. Su, Scene text detection based on robust stroke width transform and deep belief network, Springer, 2014.

[31]

J. Matas, O. Chum, M. Urban, T. Pajdla, Robust wide-baseline stereo from maximally stable extremal regions, Image Vis. Comput., 22 (2004) 761-767.

[32]

L. Neumann, J. Matas, A method for text localization and recognition in real-world images, Springer, 2010.

[33]

H. Chen, S.S. Tsai, G. Schroth, D.M. Chen, R. Grzeszczuk, B. Girod, Robust text detection in natural images with edge-enhanced maximally stable extremal regions, IEEE, 2011.

[34]

T. Novikova, O. Barinova, P. Kohli, V. Lempitsky, Large-lexicon attribute-consistent text recognition in natural images, Springer, 2012.

[35]

C. Shi, C. Wang, B. Xiao, Y. Zhang, S. Gao, Scene text detection using graph model built upon maximally stable extremal regions, Pattern Recognit. Lett., 34 (2013) 107-116.

[36]

C. Shi, C. Wang, B. Xiao, S. Gao, J. Hu, End-to-end scene text recognition using tree-structured models, Pattern Recognit., 47 (2014) 2853-2866.

[37]

O. Alsharif, J. Pineau, End-to-end text recognition with hybrid HMM maxout models, 2014.

[38]

L. Sun, Q. Huo, W. Jia, K. Chen, A robust approach for text detection from natural scene images, Pattern Recognit., 48 (2015) 2906-2920.

[39]

W. Huang, Y. Qiao, X. Tang, Robust scene text detection with convolution neural network induced mser trees, Springer, 2014.

[40]

J. Hosang, R. Benenson, P. Dollár, B. Schiele, What makes for effective detection proposals?, IEEE Trans. Pattern Anal. Mach. Intell., 38 (2016) 814-830.

[41]

L. Huo, L. Jiao, S. Wang, S. Yang, Object-level saliency detection with color attributes, Pattern Recognit., 49 (2016) 162-173.

[42]

I. González-Díaz, V. Buso, J. Benois-Pineau, Perceptual modeling in the problem of active object recognition in visual scenes, Pattern Recognit., 56 (2016) 129-141.

[43]

M.-M. Cheng, Z. Zhang, W.-Y. Lin, P. Torr, BING: binarized normed gradients for objectness estimation at 300fps, 2014.

[44]

S. Manen, M. Guillaumin, L. Van Gool, Prime object proposals with randomized Prim's algorithm, 2013.

[45]

P. Krähenbühl, V. Koltun, Geodesic object proposals, Springer, 2014.

[46]

G. Borgefors, Distance transformations in digital images, Comput. Vis. Graph. Image Process., 34 (1986) 344-371.

[47]

A. Fitzgibbon, R.B. Fisher, A buyer's guide to conic fitting, 1995.

[48]

S.M. Lucas, A. Panaretos, L. Sosa, A. Tang, S. Wong, R. Young, K. Ashida, H. Nagai, M. Okamoto, H. Yamamoto, ICDAR 2003 robust reading competitions: entries, results, and future directions, Int. J. Doc. Anal. Recognit. (IJDAR), 7 (2005) 105-122.

[49]

D. Karatzas, ICDAR 2013 robust reading competition, IEEE, 2013.

[50]

K. Wang, S. Belongie, Word spotting in the wild, Springer-Verlag, 2010.

[51]

L. Gomez, D. Karatzas, A fine-grained approach to scene text script identification, IEEE, 2016.

[52]

F. Cao, J. Delon, A. Desolneux, P. Musé, F. Sur, An a contrario approach to hierarchical clustering validity assessment, INRIA, 2004.

[53]

L. Gómez, D. Karatzas, Scene text recognition: no country for old men?, Springer, 2014.

[54]

X. Liu, T. Lu, Natural scene character recognition using Markov random field, IEEE, 2015.

Published In

Pattern Recognition Volume 70, Issue C

October 2017

152 pages

ISSN:0031-3203

Copyright © Elsevier Ltd.

Publisher

Elsevier Science Inc.

United States
