research-article

Arbitrary-Oriented Scene Text Detection via Rotation Proposals

Authors:

Xiangyang Xue

IEEE Transactions on Multimedia, Volume 20, Issue 11

Pages 3111 - 3122

https://doi.org/10.1109/TMM.2018.2818020

Published: 01 November 2018

Abstract

This paper introduces a novel rotation-based framework for arbitrary-oriented text detection in natural scene images. We present the Rotation Region Proposal Networks, which are designed to generate inclined proposals with text orientation angle information. The angle information is then adapted for bounding box regression to make the proposals more accurately fit into the text region in terms of the orientation. The Rotation Region-of-Interest pooling layer is proposed to project arbitrary-oriented proposals to a feature map for a text region classifier. The whole framework is built upon a region-proposal-based architecture, which ensures the computational efficiency of the arbitrary-oriented text detection compared with previous text detection systems. We conduct experiments using the rotation-based framework on three real-world scene text detection datasets and demonstrate its superiority in terms of effectiveness and efficiency over previous approaches.
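To make the idea of an inclined proposal concrete, the following minimal sketch assumes a box is described by a centre, a width, a height, and an angle in degrees. This parametrisation, the function names `rotated_box_corners` and `rotated_anchors`, and all numeric values are illustrative assumptions; the abstract does not specify the authors' exact representation or anchor settings.

```python
import math


def rotated_box_corners(cx, cy, w, h, angle_deg):
    """Return the four corners of a w-by-h box centred at (cx, cy),
    rotated by angle_deg about its centre (hypothetical parametrisation)."""
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Corner offsets of the box before rotation, relative to the centre.
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    # Apply a standard 2D rotation to each offset, then translate to the centre.
    return [
        (cx + dx * cos_t - dy * sin_t, cy + dx * sin_t + dy * cos_t)
        for dx, dy in offsets
    ]


def rotated_anchors(cx, cy, scales, ratios, angles_deg):
    """Enumerate (cx, cy, w, h, angle) anchors at one location.

    Each scale s fixes the box area at s * s, each ratio r is width / height,
    and each angle adds an orientation. All values are assumed for illustration.
    """
    anchors = []
    for s in scales:
        for r in ratios:
            w = s * math.sqrt(r)
            h = s / math.sqrt(r)
            for a in angles_deg:
                anchors.append((cx, cy, w, h, a))
    return anchors


if __name__ == "__main__":
    # One location, 2 scales x 2 ratios x 3 angles = 12 candidate boxes.
    for anchor in rotated_anchors(0, 0, [8, 16], [2, 5], [0, 45, 90]):
        print(anchor, rotated_box_corners(*anchor))
```

With an angle of 0 the corners reduce to those of an ordinary axis-aligned box, so the angle term can be read as an extension of horizontal proposals rather than a separate representation.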

Index Terms

Arbitrary-Oriented Scene Text Detection via Rotation Proposals
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Object recognition
  2. Machine learning

Index terms have been assigned to the content through auto-classification.

Published In

IEEE Transactions on Multimedia, Volume 20, Issue 11

Nov. 2018

312 pages

ISSN: 1520-9210

1520-9210 © 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

Publisher

IEEE Press
