Word spotting in the wild

Authors:

Serge Belongie

ECCV'10: Proceedings of the 11th European conference on Computer vision: Part I

Pages 591 - 604

Published: 05 September 2010

Abstract

We present a method for spotting words in the wild, i.e., in real images taken in unconstrained environments. Text found in the wild has a surprising range of difficulty. At one end of the spectrum, Optical Character Recognition (OCR) applied to scanned pages of well-formatted printed text is one of the most successful applications of computer vision to date. At the other extreme lie visual CAPTCHAs: text that is constructed explicitly to fool computer vision algorithms. Both tasks involve recognizing text, yet one is nearly solved while the other remains extremely challenging. In this work, we argue that the appearance of words in the wild spans this range of difficulties, and we propose a new word recognition approach based on state-of-the-art methods from generic object recognition, in which we consider object categories to be the words themselves. We compare the performance of leading OCR engines (one open source and one proprietary) with our new approach on the ICDAR Robust Reading data set and on a new word spotting data set we introduce in this paper: the Street View Text data set. We show improvements of up to 16% on the data sets, demonstrating the feasibility of a new approach to a seemingly old problem.

Published In

ECCV'10: Proceedings of the 11th European conference on Computer vision: Part I

September 2010

810 pages

ISBN: 3642155480

Editors:
Kostas Daniilidis, GRASP Laboratory, University of Pennsylvania, Philadelphia, PA
Petros Maragos, National Technical University of Athens, School of Electrical and Computer Engineering, Athens, Greece
Nikos Paragios, Ecole Centrale de Paris, Department of Applied Mathematics, Chatenay-Malabry, France

Sponsors

Adobe
Google Inc.
Microsoft Research
Technicolor
INRIA: Institut National de Recherche en Informatique et en Automatique

Publisher

Springer-Verlag

Berlin, Heidelberg
