short-paper

Detecting Arbitrary Oriented Text in the Wild with a Visual Attention Model

Authors:

C. Lee Giles

MM '16: Proceedings of the 24th ACM international conference on Multimedia

Pages 551 - 555

https://doi.org/10.1145/2964284.2967282

Published: 01 October 2016

Abstract

Text embedded in images provides important semantic information about a scene and its content. Detecting text in an unconstrained environment is challenging because characters vary widely in font, size, background, and alignment. We present a novel attention model for detecting arbitrarily oriented and curved scene text. Inspired by the attention mechanisms of the human visual system, our model uses a spatial glimpse network to process the attended area and a recurrent neural network that aggregates information over time to determine the attention movement. Combined with an off-the-shelf region proposal method, the model achieves state-of-the-art performance on the highly cited ICDAR2013 dataset and on the MSRA-TD500 dataset, which contains arbitrarily oriented text.
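The glimpse-then-recur loop described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' architecture: the class name `GlimpseAttention`, the helper `extract_glimpse`, the layer sizes, and the random untrained weights are all assumptions made for illustration, and training is omitted entirely. It only shows how a recurrent state can aggregate a sequence of local crops and emit the next attention location.

```python
import numpy as np


def extract_glimpse(image, center, size):
    """Crop a size x size patch around `center`, given as (row, col) in [-1, 1]."""
    h, w = image.shape
    # Map normalised coordinates to pixel indices.
    row = int(round((center[0] + 1.0) / 2.0 * (h - 1)))
    col = int(round((center[1] + 1.0) / 2.0 * (w - 1)))
    half = size // 2
    # Zero-pad so glimpses near the border keep a fixed shape.
    padded = np.pad(image, half, mode="constant")
    return padded[row:row + size, col:col + size]


class GlimpseAttention:
    """Hypothetical, untrained glimpse encoder plus a plain tanh RNN."""

    def __init__(self, glimpse_size=8, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        self.size = glimpse_size
        in_dim = glimpse_size * glimpse_size + 2  # flattened patch + location
        self.w_in = rng.normal(0.0, 0.1, (in_dim, hidden))
        self.w_h = rng.normal(0.0, 0.1, (hidden, hidden))
        self.w_loc = rng.normal(0.0, 0.1, (hidden, 2))

    def run(self, image, steps=4):
        state = np.zeros(self.w_h.shape[0])
        loc = np.zeros(2)  # start at the image centre
        trajectory = [loc.copy()]
        for _ in range(steps):
            patch = extract_glimpse(image, loc, self.size)
            x = np.concatenate([patch.ravel(), loc])
            # The recurrent state aggregates information across glimpses.
            state = np.tanh(x @ self.w_in + state @ self.w_h)
            # The state determines where to attend next, kept inside [-1, 1].
            loc = np.tanh(state @ self.w_loc)
            trajectory.append(loc.copy())
        return state, np.array(trajectory)


if __name__ == "__main__":
    image = np.random.default_rng(1).random((32, 48))
    state, trajectory = GlimpseAttention().run(image, steps=4)
    print(trajectory)
```

In a detector built on this idea, the final state would feed a text/non-text decision for each region proposal; that step, and how the weights are learned, are left out here because the abstract does not specify them.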

Index Terms

Detecting Arbitrary Oriented Text in the Wild with a Visual Attention Model
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Interest point and salient region detections
        Object detection
  2. Machine learning
    1. Machine learning approaches
      1. Neural networks

Information

Published In

MM '16: Proceedings of the 24th ACM international conference on Multimedia

October 2016

1542 pages

ISBN:9781450336031

DOI:10.1145/2964284

General Chairs: Alan Hanjalic (Delft University of Technology), Cees Snoek (Qualcomm Research Netherlands / University of Amsterdam), Marcel Worring (University of Amsterdam)

Moderator: Dick Bulterman (CWI / VU University Amsterdam)

Program Chairs: Benoit Huet (EURECOM), Aisling Kelliher (Virginia Tech), Yiannis Kompatsiaris (CERTH-ITI), Jin Li (Microsoft)

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Short-paper

Funding Sources

National Science Foundation

Conference

MM '16

Sponsor:

SIGMM

MM '16: ACM Multimedia Conference

October 15 - 19, 2016

Amsterdam, The Netherlands

Acceptance Rates

MM '16 Paper Acceptance Rate 52 of 237 submissions, 22%;

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Cited By

Jiang H, Sarma A, Fan M, Ryoo J, Arunachalam M, Naveen S, Kandemir M (2021). Morphable Convolutional Neural Network for Biomedical Image Segmentation. 2021 Design, Automation & Test in Europe Conference & Exhibition (DATE), pages 1522-1525. Online publication date: 1-Feb-2021.
https://doi.org/10.23919/DATE51398.2021.9474153

Dai P, Zhang H, Cao X (2020). Deep Multi-Scale Context Aware Feature Aggregation for Curved Scene Text Detection. IEEE Transactions on Multimedia, 22:8, pages 1969-1984. Online publication date: Aug-2020.
https://doi.org/10.1109/TMM.2019.2952978

Fang K, Kifer D, Lawson K, Shen C (2020). Evaluating the Potential and Challenges of an Uncertainty Quantification Method for Long Short‐Term Memory Models for Soil Moisture Predictions. Water Resources Research, 56:12. Online publication date: Dec-2020.
https://doi.org/10.1029/2020WR028095

Zhang C, Liang B, Huang Z, En M, Han J, Ding E, Ding X (2019). Look More Than Once: An Accurate Detector for Text of Arbitrary Shapes. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10544-10553. Online publication date: Jun-2019.
https://doi.org/10.1109/CVPR.2019.01080

Zhang Y (2019). Classification and Diagnosis of Thyroid Carcinoma Using Reinforcement Residual Network with Visual Attention Mechanisms in Ultrasound Images. Journal of Medical Systems, 43:11. Online publication date: 14-Oct-2019.
https://doi.org/10.1007/s10916-019-1448-5

Shen C, Laloy E, Elshorbagy A, Albert A, Bales J, Chang F, Ganguly S, Hsu K, Kifer D, Fang Z, Fang K, Li D, Li X, Tsai W (2018). HESS Opinions: Incubating deep-learning-powered hydrologic science advances as a community. Hydrology and Earth System Sciences, 22:11, pages 5639-5656. Online publication date: 1-Nov-2018.
https://doi.org/10.5194/hess-22-5639-2018

Kumar D, Singh R (2018). Multi Orientation Text Detection in Natural Imagery. International Journal of Computer Vision and Image Processing, 8:4, pages 41-56. Online publication date: 1-Oct-2018.
https://dl.acm.org/doi/10.4018/IJCVIP.2018100104

Chiatti A, Cho M, Gagneja A, Yang X, Brinberg M, Roehrick K, Choudhury S, Ram N, Reeves B, Giles C, Haddad H, Wainwright R, Chbeir R (2018). Text extraction and retrieval from smartphone screenshots. Proceedings of the 33rd Annual ACM Symposium on Applied Computing, pages 948-955. Online publication date: 9-Apr-2018.
https://dl.acm.org/doi/10.1145/3167132.3167236

Wei Y, Shen W, Zeng D, Ye L, Zhang Z (2018). Multi-oriented text detection from natural scene images based on a CNN and pruning non-adjacent graph edges. Signal Processing: Image Communication, 64, pages 89-98. Online publication date: May-2018.
https://doi.org/10.1016/j.image.2018.02.016
