
Arbitrary-Oriented Scene Text Detection via Rotation Proposals

Published: 01 November 2018 Publication History


This paper introduces a novel rotation-based framework for arbitrary-oriented text detection in natural scene images. We present the Rotation Region Proposal Networks, which are designed to generate inclined proposals with text orientation angle information. The angle information is then adapted for bounding box regression to make the proposals more accurately fit into the text region in terms of the orientation. The Rotation Region-of-Interest pooling layer is proposed to project arbitrary-oriented proposals to a feature map for a text region classifier. The whole framework is built upon a region-proposal-based architecture, which ensures the computational efficiency of the arbitrary-oriented text detection compared with previous text detection systems. We conduct experiments using the rotation-based framework on three real-world scene text detection datasets and demonstrate its superiority in terms of effectiveness and efficiency over previous approaches.
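To make the idea of an inclined proposal concrete, the following is a minimal sketch of one way a rotated box could be represented and converted to its four corner points. The five-parameter form (center, width, height, angle) and the function name `rotated_box_corners` are assumptions made for illustration; the abstract does not specify the paper's exact parameterization or angle convention.

```python
import math


def rotated_box_corners(cx, cy, w, h, theta_deg):
    """Return the four corners of a rotated rectangle.

    Hypothetical helper for illustration only. The box is centered at
    (cx, cy) with width w and height h, and is rotated by theta_deg
    degrees about its center (counter-clockwise in a standard
    x-right, y-up coordinate system).
    """
    t = math.radians(theta_deg)
    cos_t, sin_t = math.cos(t), math.sin(t)
    # Corner offsets of the axis-aligned box, relative to the center.
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    # Apply a 2D rotation to each offset, then translate to the center.
    return [
        (cx + dx * cos_t - dy * sin_t, cy + dx * sin_t + dy * cos_t)
        for dx, dy in offsets
    ]


if __name__ == "__main__":
    # A 4 x 2 box at the origin, rotated by 30 degrees.
    for x, y in rotated_box_corners(0.0, 0.0, 4.0, 2.0, 30.0):
        print(f"({x:.3f}, {y:.3f})")
```

With an angle of zero the function returns the corners of an ordinary axis-aligned box, so the horizontal case is simply a special case of the rotated one.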






Published In

IEEE Transactions on Multimedia, Volume 20, Issue 11, Nov. 2018 (312 pages)
IEEE Press

Research article

