Abstract
Here, we present a new technique for separating Indic scripts based on matra (or shirorekha), in which an optimized fractal geometry analysis (FGA) is used as the sole feature. Separating scripts that have a matra from those that do not can serve as a precursor that eases the subsequent script identification process. In our work, we consider two matra-based scripts, Bangla and Devanagari, as positive samples, and we take the counter samples from two scripts without matra, Roman and Urdu. Altogether, we use 1204 document images: 525 from matra-based scripts (325 Bangla and 200 Devanagari) and 679 from scripts without matra (370 Roman and 309 Urdu). For experimentation, we use three classifiers, multilayer perceptron (MLP), random forest (RF), and BayesNet (BN), with the aim of selecting the best performer. Over a series of tests, the MLP classifier achieved an average accuracy of 96.44%.
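The abstract does not say how the optimized FGA feature is computed. One common way to estimate the fractal dimension of a binarized document image is box counting. You cover the image with a grid of boxes of side s, count the boxes N(s) that contain ink, and take the slope of log N(s) against log(1/s). The sketch below assumes that approach. It is not the authors' optimized FGA, and the function names `box_count` and `box_counting_dimension` are hypothetical.

```python
import math
from typing import List, Sequence


def box_count(img: Sequence[Sequence[int]], box: int) -> int:
    """Count box x box grid cells that contain at least one foreground (1) pixel.

    Assumes a non-empty, rectangular binary image given as rows of 0/1 values.
    """
    rows, cols = len(img), len(img[0])
    count = 0
    for r0 in range(0, rows, box):
        for c0 in range(0, cols, box):
            # A cell counts once if any pixel inside it is ink.
            if any(
                img[r][c]
                for r in range(r0, min(r0 + box, rows))
                for c in range(c0, min(c0 + box, cols))
            ):
                count += 1
    return count


def box_counting_dimension(img: Sequence[Sequence[int]], sizes: List[int]) -> float:
    """Estimate the box-counting (fractal) dimension as the least-squares slope
    of log N(s) versus log(1/s) over the given box sizes."""
    xs, ys = [], []
    for s in sizes:
        n = box_count(img, s)
        if n > 0:  # log(0) is undefined; skip sizes with no ink at all
            xs.append(math.log(1.0 / s))
            ys.append(math.log(n))
    if len(xs) < 2:
        raise ValueError("need at least two box sizes with non-empty counts")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den
```

As a sanity check, a single horizontal stroke spanning a 64 x 64 image gives a dimension of 1. You can think of that stroke as a crude stand-in for a matra line. A fully inked 64 x 64 image gives a dimension of 2. In a pipeline like the one described, a value of this kind would be the input feature passed to the MLP, RF, and BN classifiers.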
Copyright information
© 2017 Springer Science+Business Media Singapore
About this paper
Cite this paper
Obaidullah, S., Goswami, C., Santosh, K.C., Halder, C., Das, N., Roy, K. (2017). Separating Indic Scripts with ‘matra’—A Precursor to Script Identification in Multi-script Documents. In: Raman, B., Kumar, S., Roy, P., Sen, D. (eds) Proceedings of International Conference on Computer Vision and Image Processing. Advances in Intelligent Systems and Computing, vol 459. Springer, Singapore. https://doi.org/10.1007/978-981-10-2104-6_19
DOI: https://doi.org/10.1007/978-981-10-2104-6_19
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-2103-9
Online ISBN: 978-981-10-2104-6
eBook Packages: Engineering, Engineering (R0)