DOI: 10.1145/2964284.2967282

Detecting Arbitrary Oriented Text in the Wild with a Visual Attention Model

Published: 01 October 2016

Abstract

Text embedded in images provides important semantic information about a scene and its content. Detecting text in an unconstrained environment is challenging because characters vary widely in font, size, background, and alignment. We present a novel attention model for detecting arbitrarily oriented and curved scene text. Inspired by the attention mechanisms of the human visual system, our model uses a spatial glimpse network to process the attended area and a recurrent neural network that aggregates information over time to determine where attention moves next. Combined with an off-the-shelf region proposal method, the model achieves state-of-the-art performance on the highly cited ICDAR2013 dataset and on the MSRA-TD500 dataset, which contains arbitrarily oriented text.
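
To make the glimpse-then-move loop described in the abstract concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the class `AttentionSketch`, the helper `extract_glimpse`, the layer sizes, the single-layer tanh networks, and the random untrained weights are all hypothetical, since the abstract does not specify the architecture or the training procedure.

```python
import math
import random


def extract_glimpse(image, center, size):
    """Crop a size x size patch around `center` (row, col), zero-padded at borders."""
    h, w = len(image), len(image[0])
    r0 = center[0] - size // 2
    c0 = center[1] - size // 2
    patch = []
    for r in range(r0, r0 + size):
        for c in range(c0, c0 + size):
            inside = 0 <= r < h and 0 <= c < w
            patch.append(image[r][c] if inside else 0.0)
    return patch


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; a real model would learn these (assumption)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class AttentionSketch:
    """Hypothetical glimpse network plus recurrent state that emits the next location."""

    def __init__(self, glimpse_size=4, hidden_size=8, seed=0):
        rng = random.Random(seed)
        self.glimpse_size = glimpse_size
        self.hidden_size = hidden_size
        n_in = glimpse_size * glimpse_size + 2  # patch pixels plus (row, col)
        self.w_glimpse = random_matrix(hidden_size, n_in, rng)
        self.w_hidden = random_matrix(hidden_size, hidden_size, rng)
        self.w_move = random_matrix(2, hidden_size, rng)

    def step(self, image, location, hidden):
        h, w = len(image), len(image[0])
        # Glimpse network: encode what is seen (patch) and where (normalised location).
        patch = extract_glimpse(image, location, self.glimpse_size)
        features = patch + [location[0] / h, location[1] / w]
        glimpse_code = matvec(self.w_glimpse, features)
        # Recurrent update: combine the new glimpse with the state from earlier steps.
        recurrent = matvec(self.w_hidden, hidden)
        hidden = [math.tanh(g + r) for g, r in zip(glimpse_code, recurrent)]
        # Attention movement: a bounded pixel offset, clamped to the image.
        move = [math.tanh(m) for m in matvec(self.w_move, hidden)]
        new_r = min(max(location[0] + round(move[0] * self.glimpse_size), 0), h - 1)
        new_c = min(max(location[1] + round(move[1] * self.glimpse_size), 0), w - 1)
        return (new_r, new_c), hidden

    def run(self, image, start, n_steps):
        """Return the sequence of attended locations and the final hidden state."""
        hidden = [0.0] * self.hidden_size
        location = start
        trajectory = [location]
        for _ in range(n_steps):
            location, hidden = self.step(image, location, hidden)
            trajectory.append(location)
        return trajectory, hidden
```

Calling `AttentionSketch().run(image, start, n_steps)` on a 2-D list of pixel intensities returns the visited locations and the final hidden state. With untrained weights the trajectory is arbitrary; the sketch only shows how each step feeds the attended patch through the glimpse encoder, folds it into the recurrent state, and derives the next location from that state.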

Information

Published In

MM '16: Proceedings of the 24th ACM international conference on Multimedia
October 2016
1542 pages
ISBN:9781450336031
DOI:10.1145/2964284
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. deep learning
  2. scene text detection
  3. visual attention

Qualifiers

  • Short-paper

Conference

MM '16
MM '16: ACM Multimedia Conference
October 15 - 19, 2016
Amsterdam, The Netherlands

Acceptance Rates

MM '16 Paper Acceptance Rate 52 of 237 submissions, 22%;
Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Article Metrics

  • Downloads (last 12 months): 53
  • Downloads (last 6 weeks): 7
Reflects downloads up to 11 Jan 2025

Cited By

  • (2021) Morphable Convolutional Neural Network for Biomedical Image Segmentation. 2021 Design, Automation & Test in Europe Conference & Exhibition (DATE), pages 1522-1525. DOI: 10.23919/DATE51398.2021.9474153. Online publication date: 1-Feb-2021
  • (2020) Deep Multi-Scale Context Aware Feature Aggregation for Curved Scene Text Detection. IEEE Transactions on Multimedia, 22:8, pages 1969-1984. DOI: 10.1109/TMM.2019.2952978. Online publication date: Aug-2020
  • (2020) Evaluating the Potential and Challenges of an Uncertainty Quantification Method for Long Short‐Term Memory Models for Soil Moisture Predictions. Water Resources Research, 56:12. DOI: 10.1029/2020WR028095. Online publication date: Dec-2020
  • (2019) Look More Than Once: An Accurate Detector for Text of Arbitrary Shapes. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10544-10553. DOI: 10.1109/CVPR.2019.01080. Online publication date: Jun-2019
  • (2019) Classification and Diagnosis of Thyroid Carcinoma Using Reinforcement Residual Network with Visual Attention Mechanisms in Ultrasound Images. Journal of Medical Systems, 43:11. DOI: 10.1007/s10916-019-1448-5. Online publication date: 14-Oct-2019
  • (2018) HESS Opinions: Incubating deep-learning-powered hydrologic science advances as a community. Hydrology and Earth System Sciences, 22:11, pages 5639-5656. DOI: 10.5194/hess-22-5639-2018. Online publication date: 1-Nov-2018
  • (2018) Multi Orientation Text Detection in Natural Imagery. International Journal of Computer Vision and Image Processing, 8:4, pages 41-56. DOI: 10.4018/IJCVIP.2018100104. Online publication date: 1-Oct-2018
  • (2018) Text extraction and retrieval from smartphone screenshots. Proceedings of the 33rd Annual ACM Symposium on Applied Computing, pages 948-955. DOI: 10.1145/3167132.3167236. Online publication date: 9-Apr-2018
  • (2018) Multi-oriented text detection from natural scene images based on a CNN and pruning non-adjacent graph edges. Signal Processing: Image Communication, 64, pages 89-98. DOI: 10.1016/j.image.2018.02.016. Online publication date: May-2018
