Abstract
Reading Uyghur text in biomedical graphic images is a challenging problem due to the complex layouts of such images and the cursive writing of Uyghur. In this paper, we propose a system that extracts text from Uyghur biomedical images and matches it against a specific lexicon for semantic analysis. The proposed system has the following distinctive properties. First, it is an integrated system: a single fully convolutional neural network detects and crops the Uyghur text lines, and keywords from the lexicon are then matched by a well-designed matching network. Second, to train the matching network effectively, an online sampling method is applied that continually generates synthetic data. Finally, we propose a GPU acceleration scheme that allows the matching network to process a complete Uyghur text line directly rather than a single window at a time. Experimental results on a benchmark dataset show that our method achieves an F-measure of 74.5%. Moreover, thanks to the GPU acceleration scheme, the system remains highly efficient, with a running time of 0.5 s per image.
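The detect-then-match pipeline described above can be sketched as follows. This is an illustrative stub only: the function names (`detect_text_lines`, `match_keywords`, `analyze`), the dummy box proposals, and the toy similarity score are all hypothetical stand-ins for the paper's neural detector and matching network.

```python
# Hypothetical sketch of the pipeline: (1) a fully convolutional detector
# proposes text-line boxes, (2) each cropped line is scored against a lexicon
# by a matching network. Both models are stubbed with dummy logic here.

def detect_text_lines(image):
    """Stand-in for the FCN detector: return text-line boxes (x0, y0, x1, y1)."""
    h, w = image["height"], image["width"]
    return [(0, 0, w, h // 2), (0, h // 2, w, h)]  # two dummy line boxes

def match_keywords(line_box, lexicon):
    """Stand-in for the matching network: score each lexicon keyword."""
    return {kw: 1.0 / (1 + len(kw)) for kw in lexicon}  # dummy similarity

def analyze(image, lexicon, threshold=0.2):
    """Run detection, then keep keywords whose matching score clears a threshold."""
    results = []
    for box in detect_text_lines(image):
        scores = match_keywords(box, lexicon)
        hits = [kw for kw, s in scores.items() if s >= threshold]
        results.append({"box": box, "keywords": hits})
    return results

image = {"height": 100, "width": 200}
print(analyze(image, ["gene", "protein"]))
```

The real system replaces both stubs with trained networks; the surrounding control flow (detect, crop, match per line, threshold) is the part the sketch is meant to convey.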
Notes
512 3 × 3 convolutional filters + ReLU, 512 3 × 3 convolutional filters + ReLU, and 6 3 × 3 convolutional filters, where the 6 output channels consist of two text/non-text prediction scores and four predicted coordinate offsets.
https://github.com/tesseract-ocr/tesseract, version 4.0
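The prediction-head configuration in the first note above (two 512-channel 3 × 3 conv + ReLU layers followed by a 6-channel 3 × 3 conv) can be sketched in plain NumPy. This is a minimal shape-level sketch with random weights, not the paper's trained model; the toy 32-channel input and the helper names are assumptions for illustration.

```python
import numpy as np

def conv3x3(x, out_channels, rng):
    """'Same'-padded 3x3 convolution over an (H, W, C_in) feature map."""
    h, w, c_in = x.shape
    k = rng.standard_normal((3, 3, c_in, out_channels)) * 0.01
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros((h, w, out_channels))
    for i in range(h):
        for j in range(w):
            patch = padded[i:i + 3, j:j + 3, :]      # (3, 3, C_in) window
            out[i, j] = np.tensordot(patch, k, axes=3)
    return out

def prediction_head(features, rng):
    """512 -> 512 -> 6 channels: 2 text/non-text scores + 4 coordinate offsets."""
    x = np.maximum(conv3x3(features, 512, rng), 0)  # 3x3 conv + ReLU
    x = np.maximum(conv3x3(x, 512, rng), 0)         # 3x3 conv + ReLU
    return conv3x3(x, 6, rng)                       # scores + offsets

rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 4, 32))  # toy feature map (smaller than the real 512 channels)
out = prediction_head(feat, rng)
scores, offsets = out[..., :2], out[..., 2:]
print(scores.shape, offsets.shape)
```

Splitting the 6-channel output into a 2-channel score map and a 4-channel offset map matches the note's decomposition; a real implementation would use a deep-learning framework's convolution rather than this explicit loop.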
Acknowledgements
This work is supported by the National Natural Science Foundation of China (Grants 61771468 and 61772526) and the Youth Innovation Promotion Association of the Chinese Academy of Sciences (2017209).
Cite this article
Fang, S., Xie, H., Chen, Z. et al. Uyghur Text Matching in Graphic Images for Biomedical Semantic Analysis. Neuroinform 16, 445–455 (2018). https://doi.org/10.1007/s12021-017-9350-0