Abstract
Skeletons of characters provide vital information to support a variety of tasks, e.g., optical character recognition, image restoration, stroke segmentation and extraction, and style learning and transfer. However, automatically skeletonizing Chinese characters poses a steep computational challenge because of the large number of Chinese characters and their versatile styles, for which traditional image analysis approaches are error-prone and fragile. Current deep learning based approaches require a large amount of manual labeling effort, which seriously limits their precision, robustness, scalability, and generalizability. To tackle this challenge, this paper introduces a novel three-staged deep generative model, developed as an image-to-image translation approach, which significantly reduces the model’s demand for labeled training samples. The new model is built upon an improved G-net, an enhanced X-net, and a newly proposed F-net. Comprehensive experimental results show that the new model iteratively extracts high-quality skeletons of Chinese characters in versatile styles, and that it noticeably outperforms two state-of-the-art peer deep learning methods and a classical thinning algorithm in terms of F-measure, Hausdorff distance, and average Hausdorff distance.
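The three evaluation measures named in the abstract can be illustrated with a minimal sketch. The abstract does not state how the paper computes them, so everything here is an assumption: the F-measure is taken pixel-wise with exact matching (no distance tolerance), the Hausdorff distance is the symmetric maximum of nearest-neighbour distances, and the average Hausdorff distance is the mean of the two directed average distances. The function names are hypothetical, and both masks are assumed to contain at least one skeleton pixel.

```python
import numpy as np


def skeleton_points(mask):
    """Return (row, col) coordinates of foreground pixels as an (N, 2) float array."""
    return np.argwhere(np.asarray(mask) > 0).astype(float)


def f_measure(pred, truth):
    """Pixel-wise F-measure with exact matching (assumed variant, no tolerance)."""
    pred = np.asarray(pred) > 0
    truth = np.asarray(truth) > 0
    tp = np.logical_and(pred, truth).sum()
    if tp == 0:
        return 0.0
    precision = tp / pred.sum()
    recall = tp / truth.sum()
    return float(2 * precision * recall / (precision + recall))


def _pairwise_distances(a, b):
    """Euclidean distance between every point in a and every point in b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


def hausdorff(pred, truth):
    """Symmetric Hausdorff distance: the worst nearest-neighbour distance."""
    d = _pairwise_distances(skeleton_points(pred), skeleton_points(truth))
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))


def average_hausdorff(pred, truth):
    """Mean of the two directed average nearest-neighbour distances (assumed variant)."""
    d = _pairwise_distances(skeleton_points(pred), skeleton_points(truth))
    return float(0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean()))


# Toy example: a vertical stroke, and a prediction whose last pixel is off by one column.
truth = np.zeros((5, 5), dtype=int)
truth[:, 2] = 1
pred = np.zeros((5, 5), dtype=int)
pred[:4, 2] = 1
pred[4, 3] = 1

print(f_measure(pred, truth), hausdorff(pred, truth), average_hausdorff(pred, truth))
```

In this toy case four of the five skeleton pixels coincide, so the F-measure penalizes the single misplaced pixel heavily, while the two distance measures stay small because the misplaced pixel is only one pixel away from the true stroke.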
Cite this article
Tian, YC., Xu, SH. & Sylla, C. A Novel Three-Staged Generative Model for Skeletonizing Chinese Characters with Versatile Styles. J. Comput. Sci. Technol. 38, 1250–1271 (2023). https://doi.org/10.1007/s11390-023-1337-8
DOI: https://doi.org/10.1007/s11390-023-1337-8