Abstract
This paper presents a novel auto-thresholding method for character segmentation and restoration of historical Chinese documents. The objective was to segment and restore the characters of Qin-Han bamboo slips effectively in the presence of complex background noise. To that end, given a whole-page image containing several bamboo slips, the proposed method first extracted and straightened every single slip by connected component analysis. Then, for every straightened slip, a horizontal histogram projection method was used to segment all character regions. After that, a novel auto-thresholding method, motivated by the auto-focus process of a camera, was used to find the optimal threshold for every character region. In this method, the algorithm traversed all thresholds in a certain range and generated an Effective Character Contour Length (ECCL) value for each threshold. A multi-Gaussian model was then used to fit the ECCL curve, and the position of the global peak of the fitted curve was taken as the optimal threshold for the character region. Experimental results showed that the proposed method was effective for historical character segmentation and restoration under complex background noise. Compared with five existing state-of-the-art algorithms (Otsu, the integral image adaptive thresholding method, Sauvola, GAN denoising, and the SAE algorithm), the proposed method not only restored whole characters more completely but also suppressed noise better.
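The projection and threshold-sweep steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define ECCL, so `contour_length` below is a stand-in measure (foreground pixels that touch background), the multi-Gaussian fit is replaced by a plain arg-max over the raw curve, and all function names, the `min_ink` parameter, and the assumption of dark ink on a lighter background are hypothetical.

```python
import numpy as np


def segment_by_projection(binary, min_ink=1):
    """Split a straightened slip into character regions along its length.

    binary: 2-D array, nonzero = ink. Returns (start, end) row ranges,
    end exclusive, one per run of rows holding at least min_ink ink pixels.
    """
    profile = (np.asarray(binary) > 0).sum(axis=1)  # ink pixels per row
    inked = profile >= min_ink
    regions, start = [], None
    for row, has_ink in enumerate(inked):
        if has_ink and start is None:
            start = row                    # a character region begins
        elif not has_ink and start is not None:
            regions.append((start, row))   # the region ended on the previous row
            start = None
    if start is not None:                  # region runs to the bottom edge
        regions.append((start, len(inked)))
    return regions


def contour_length(mask):
    """Stand-in for ECCL (assumption): count foreground pixels that have at
    least one background 4-neighbour. Pixels outside the image are treated
    as foreground so the image border itself is not counted as contour."""
    padded = np.pad(mask, 1, constant_values=True)
    all_neighbours_fg = (padded[:-2, 1:-1] & padded[2:, 1:-1]
                         & padded[1:-1, :-2] & padded[1:-1, 2:])
    return int((mask & ~all_neighbours_fg).sum())


def sweep_threshold(gray, thresholds, score=contour_length):
    """Score every candidate threshold and return (best_threshold, curve).

    Assumes dark ink on a lighter background, so ink = gray < threshold.
    The raw arg-max is used here instead of a fitted multi-Gaussian peak.
    """
    gray = np.asarray(gray)
    curve = np.array([score(gray < t) for t in thresholds])
    return thresholds[int(np.argmax(curve))], curve
```

In a fuller version, the curve returned by `sweep_threshold` would be smoothed or fitted before the peak is taken, since a raw arg-max can be pulled toward thresholds at which background noise adds spurious contour pixels.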
Data availability
The dataset is available at https://gitee.com/cramkl_cjlu/auto-focus-threshold-character-segment.
Code availability
The source code is available at https://gitee.com/cramkl_cjlu/auto-focus-threshold-character-segment.
References
Babu NSA (2019) Character recognition in historical handwritten documents – A survey. Proceedings of the 2019 International Conference on Communication and Signal Processing (ICCSP), 4-6 April 2019
Calvo-Zaragoza J, Gallego A-J (2019) A selectional auto-encoder approach for document image binarization. Pattern Recogn 86:37–47
Huang Z-K, Ma Y-L, Lu L et al (2016) Chinese historic image threshold using adaptive K-means cluster and Bradley’s. 9773:171–179
Kehtarnavaz N, Oh HJ (2003) Development and real-time implementation of a rule-based auto-focus algorithm. Real-Time Imaging 9(3):197–203
Liu CL, Koga M, Fujisawa H (2002) Lexicon-driven segmentation and recognition of handwritten character strings for Japanese address reading. IEEE Trans Pattern Anal Mach Intell 24(11):1425–1437
Liu S, Liu M, Yang Z (2016) An image auto-focusing algorithm for industrial image measurement. EURASIP J Adv Signal Process 2016(1):70. https://doi.org/10.1186/s13634-016-0368-5
Messina R, Louradour J (2015) Segmentation-free handwritten Chinese text recognition with LSTM-RNN. Proceedings of the 2015 13th International Conference on Document Analysis and Recognition (ICDAR), 23-26 2015
Nguyen K, Nguyen C, Nakagawa M (2017) A segmentation method of single- and multiple-touching characters in offline handwritten Japanese text recognition. IEICE Trans Inf Syst E100.D:2962–2972
Panichkriangkrai C, Li L, Hachimura K (2013) Character segmentation and retrieval for learning support system of Japanese historical books. In Proceedings of the 2nd international workshop on historical document imaging and processing. Association for Computing Machinery: Washington, District of Columbia, USA. pp 118–122. https://doi.org/10.1145/2501115.2501129
Santos R et al (2009) Text line segmentation based on morphology and histogram projection. 2009 International conference on document analysis and recognition. pp 651–655. https://doi.org/10.1109/ICDAR.2009.183
Sauvola J, Pietikainen M (2000) Adaptive document image binarization. Pattern Recogn 33(2):225–236
Shirai K et al (2013) Character shape restoration of binarized historical documents by smoothing via geodesic morphology. In 2013 12th International conference on document analysis and recognition. https://doi.org/10.1109/ICDAR.2013.260
Wang QF, Yin F, Liu CL (2012) Handwritten Chinese text recognition by integrating multiple contexts. IEEE Trans Pattern Anal Mach Intell 34(8):1469–1481
Watanabe K, Takahashi S, Kamaya Y et al (2019) Japanese character segmentation for historical handwritten official documents using fully convolutional networks. Proceedings of the 2019 International Conference on Document Analysis and Recognition (ICDAR), 20-25 2019
Wu YC, Yin F, Chen Z et al (2018) Handwritten Chinese text recognition using separable multi-dimensional recurrent neural network. Proceedings of the 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)
Xu X, Wang Y, Tang J et al (2011) Robust automatic focus algorithm for low contrast images using a new contrast measure. Sensors (Basel) 11(9):8281–8294
Yang H, Jin L, Huang W et al (2018) Dense and tight detection of Chinese characters in historical documents: datasets and a recognition guided detector. IEEE Access 6:30174–30183
Zecheng X, Zenghui S, Lianwen J et al (2016) Fully convolutional recurrent network for handwritten Chinese text recognition. Proceedings of the 2016 23rd International Conference on Pattern Recognition (ICPR), 4-8
Zhang J, Guo M (2019) A novel generative adversarial net for calligraphic tablet images denoising. Multimed Tools Appl 79(1–2):119–140
Funding
This work was supported by the Natural Science Foundation of Zhejiang Province, China (No. LY18E050009 and No. Q19E060008).
Ethics declarations
Conflict of interest
The authors hereby declare that they have no competing interests.
About this article
Cite this article
Cao, S., Shu, Z., Xu, Z. et al. Character segmentation and restoration of Qin-Han bamboo slips using local auto-focus thresholding method. Multimed Tools Appl 81, 8199–8213 (2022). https://doi.org/10.1007/s11042-022-11988-z
DOI: https://doi.org/10.1007/s11042-022-11988-z