DOI: 10.1145/3474085.3478328
short-paper

MMOCR: A Comprehensive Toolbox for Text Detection, Recognition and Understanding

Published: 17 October 2021

    Abstract

    We present MMOCR, an open-source toolbox that provides a comprehensive pipeline for text detection and recognition, as well as downstream tasks such as named entity recognition and key information extraction. MMOCR implements 14 state-of-the-art algorithms, significantly more than any other open-source OCR project we are aware of to date. To facilitate future research and industrial applications of text-recognition-related problems, we also provide a large number of trained models and detailed benchmarks that give insight into the performance of text detection, recognition and understanding. MMOCR is publicly released at https://github.com/open-mmlab/mmocr.


      Published In

      MM '21: Proceedings of the 29th ACM International Conference on Multimedia
      October 2021
      5796 pages
      ISBN:9781450386517
      DOI:10.1145/3474085

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. key information extraction
      2. named entity recognition
      3. open source
      4. text detection
      5. text recognition

      Qualifiers

      • Short-paper

      Funding Sources

      • the Shanghai Committee of Science and Technology

      Conference

      MM '21
      MM '21: ACM Multimedia Conference
      October 20 - 24, 2021
      Virtual Event, China

      Acceptance Rates

      Overall Acceptance Rate 995 of 4,171 submissions, 24%

      Cited By

      • (2024) The Industrial Application of Artificial Intelligence-Based Optical Character Recognition in Modern Manufacturing Innovations. Sustainability 16:5 (2161). DOI: 10.3390/su16052161. Online publication date: 5-Mar-2024
      • (2024) Exploring Style-Robust Scene Text Detection via Style-Aware Learning. Electronics 13:2 (243). DOI: 10.3390/electronics13020243. Online publication date: 5-Jan-2024
      • (2024) LCSTR: Scene Text Recognition with Large Convolutional Kernels. International Journal of Pattern Recognition and Artificial Intelligence 38:01. DOI: 10.1142/S021800142353004X. Online publication date: 9-Feb-2024
      • (2024) Detecting and recognizing seven segment digits using a deep learning approach. ITM Web of Conferences 63 (01007). DOI: 10.1051/itmconf/20246301007. Online publication date: 13-Feb-2024
      • (2024) Scene text visual question answering by using YOLO and STN. International Journal of Speech Technology 27:1 (69-76). DOI: 10.1007/s10772-023-10081-6. Online publication date: 3-Jan-2024
      • (2024) Can AI Assistance Aid in the Grading of Handwritten Answer Sheets? Artificial Intelligence in Education. Posters and Late Breaking Results, Workshops and Tutorials, Industry and Innovation Tracks, Practitioners, Doctoral Consortium and Blue Sky (291-298). DOI: 10.1007/978-3-031-64312-5_35. Online publication date: 2-Jul-2024
      • (2023) Optical character recognition on engineering drawings to achieve automation in production quality control. Frontiers in Manufacturing Technology 3. DOI: 10.3389/fmtec.2023.1154132. Online publication date: 20-Mar-2023
      • (2023) HFENet: Hybrid Feature Enhancement Network for Detecting Texts in Scenes and Traffic Panels. IEEE Transactions on Intelligent Transportation Systems 24:12 (14200-14212). DOI: 10.1109/TITS.2023.3305686. Online publication date: 25-Aug-2023
      • (2023) Text-Enhanced Scene Image Super-Resolution via Stroke Mask and Orthogonal Attention. IEEE Transactions on Circuits and Systems for Video Technology 33:11 (6317-6330). DOI: 10.1109/TCSVT.2023.3267133. Online publication date: 14-Apr-2023
      • (2023) M2SH: A Hybrid Approach to Table Structure Recognition using Two-Stage Multi-Modality Feature Fusion. 2023 IEEE International Conference on Systems, Man, and Cybernetics (SMC) (791-798). DOI: 10.1109/SMC53992.2023.10394093. Online publication date: 1-Oct-2023
