PP-OCR: A Practical Ultra Lightweight OCR System

Du, Yuning; Li, Chenxia; Guo, Ruoyu; Yin, Xiaoting; Liu, Weiwei; Zhou, Jun; Bai, Yifan; Yu, Zilin; Yang, Yehua; Dang, Qingqing; Wang, Haoshuang

Computer Science > Computer Vision and Pattern Recognition

arXiv:2009.09941 (cs)

[Submitted on 21 Sep 2020 (v1), last revised 15 Oct 2020 (this version, v3)]

Abstract: Optical Character Recognition (OCR) systems have been widely used in various application scenarios, such as office automation (OA) systems, factory automation, online education, and map production. However, OCR remains a challenging task due to the variety of text appearances and the demand for computational efficiency. In this paper, we propose a practical ultra lightweight OCR system, i.e., PP-OCR. The overall model size of PP-OCR is only 3.5M for recognizing 6622 Chinese characters and 2.8M for recognizing 63 alphanumeric symbols. We introduce a bag of strategies that either enhance the model ability or reduce the model size, and we provide the corresponding ablation experiments on real data. Meanwhile, several pre-trained models for Chinese and English recognition are released, including a text detector (97K images used), a direction classifier (600K images used), and a text recognizer (17.9M images used). In addition, PP-OCR is verified on several other language recognition tasks, including French, Korean, Japanese, and German. All of the above-mentioned models are open-sourced, and the code is available in the GitHub repository, i.e., this https URL.
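The abstract names three released components: a text detector, a direction classifier, and a text recognizer. The following is a minimal sketch, under simplifying assumptions, of how such components can be chained: each detected region is cropped, rotated by 180 degrees if the direction classifier flags it as upside down, and then passed to the recognizer. Everything here is hypothetical and for illustration only: `run_pipeline`, `crop`, `rotate_180`, the string-based "image", and the axis-aligned `(top, left, bottom, right)` boxes are stand-ins, not the PaddleOCR API, and the lambdas in the usage example take the place of trained models.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical toy representation: an "image" is a list of equal-length strings
# (one character per pixel) and a box is an axis-aligned (top, left, bottom, right)
# rectangle. Real detectors output quadrilaterals; this is a simplification.
Image = List[str]
Box = Tuple[int, int, int, int]


@dataclass
class OcrResult:
    box: Box   # where the text was found
    text: str  # what the recognizer read


def crop(image: Image, box: Box) -> Image:
    """Cut the boxed region out of the image (bottom/right are exclusive)."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def rotate_180(image: Image) -> Image:
    """Rotate a region by 180 degrees: reverse row order and each row."""
    return [row[::-1] for row in reversed(image)]


def run_pipeline(
    image: Image,
    detect: Callable[[Image], List[Box]],       # stand-in for the text detector
    is_upside_down: Callable[[Image], bool],    # stand-in for the direction classifier
    recognize: Callable[[Image], str],          # stand-in for the text recognizer
) -> List[OcrResult]:
    """Hypothetical detect -> classify direction -> recognize chain."""
    results = []
    for box in detect(image):
        region = crop(image, box)
        # Assumption: the classifier only decides between 0 and 180 degrees.
        if is_upside_down(region):
            region = rotate_180(region)
        results.append(OcrResult(box, recognize(region)))
    return results


# Toy usage: text drawn with characters on a dotted background.
page = [
    "HELLO.....",
    "..........",
    ".....DLROW",  # this line is upside down (rotated 180 degrees)
]
results = run_pipeline(
    page,
    detect=lambda img: [(0, 0, 1, 5), (2, 5, 3, 10)],   # fixed boxes as a stub detector
    is_upside_down=lambda region: region == ["DLROW"],  # stub classifier
    recognize=lambda region: "".join(region),           # stub recognizer
)
print([r.text for r in results])  # ['HELLO', 'WORLD']
```

Keeping the three stages as separate callables mirrors the separately released models: each stage can be swapped or retrained without touching the others, and the direction classifier is a cheap step inserted only to normalize orientation before recognition.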

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2009.09941 [cs.CV]
	(or arXiv:2009.09941v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2009.09941

Submission history

From: Ruoyu Guo
[v1] Mon, 21 Sep 2020 14:57:18 UTC (4,515 KB)
[v2] Tue, 22 Sep 2020 08:57:29 UTC (4,515 KB)
[v3] Thu, 15 Oct 2020 14:21:53 UTC (4,515 KB)