Abstract
We present PWDB_13, a word-level printed document image corpus covering thirteen official Indic scripts. It consists of 26,000 words distributed equally across the thirteen scripts (2,000 words per script), collected by an automated process. We propose a realistic classification framework based on four major regions of India, which makes this work unique. Benchmarking is carried out on the printed script identification (PSI) problem, which is highly relevant in multi-script scenarios. The results are impressive considering the volume of the corpus and the intrinsic complexities of Indic scripts. PWDB_13 addresses the lack of a complete document image dataset covering all official Indic scripts and is freely available to researchers for noncommercial use.
Copyright information
© 2016 Springer India
About this paper
Cite this paper
Obaidullah, S.M., Halder, C., Das, N., Roy, K. (2016). PWDB_13: A Corpus of Word-Level Printed Document Images from Thirteen Official Indic Scripts. In: Das, S., Pal, T., Kar, S., Satapathy, S., Mandal, J. (eds) Proceedings of the 4th International Conference on Frontiers in Intelligent Computing: Theory and Applications (FICTA) 2015. Advances in Intelligent Systems and Computing, vol 404. Springer, New Delhi. https://doi.org/10.1007/978-81-322-2695-6_21
DOI: https://doi.org/10.1007/978-81-322-2695-6_21
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-322-2693-2
Online ISBN: 978-81-322-2695-6
eBook Packages: Engineering, Engineering (R0)