research-article

A deep learning based system for writer identification in handwritten Arabic historical manuscripts

Published: 01 September 2022

Abstract

Determining the writer or transcriber of a historical Arabic manuscript has long been a major challenge for researchers in the humanities. Advances in pattern recognition and machine learning have made it possible to automate the extraction of paleographical features to address this problem. This paper presents a baseline system for writer identification, tested on a historical Arabic dataset of 11610 single- and double-folio images. These texts were extracted from a unique collection of 567 historical Arabic manuscripts available at the Balamand Digital Humanities Center. The paper also surveys the available Arabic datasets and previously proposed techniques and algorithms. The Balamand dataset is particularly challenging because of the geo-historical identity of the manuscripts and their physical condition. A deep learning system was developed and tested on three Latin and Arabic datasets (ICDAR19, ICFHR20 and KHATT) before being tested on the Balamand dataset. Compared with other systems, it achieved state-of-the-art performance on the new, challenging images, with 95.2% mean Average Precision (mAP) and 98.1% accuracy.
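The abstract reports results as mean Average Precision (mAP) and accuracy. The sketch below shows how these two figures are commonly computed in a retrieval-style writer identification setup. It is an illustration under assumptions, not the authors' evaluation code: it assumes each query image yields a ranked list of the other images, that an entry is marked relevant when it shares the query's writer, and that "accuracy" means top-1 accuracy. The function names are hypothetical.

```python
from typing import Sequence


def average_precision(ranked_relevance: Sequence[bool]) -> float:
    """Average precision for one query.

    ranked_relevance[i] is True when the item at rank i + 1 has the same
    writer as the query. Assumes the ranking covers all relevant items.
    """
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings: Sequence[Sequence[bool]]) -> float:
    """Mean of the per-query average precisions."""
    return sum(average_precision(r) for r in rankings) / len(rankings)


def top1_accuracy(rankings: Sequence[Sequence[bool]]) -> float:
    """Fraction of queries whose first retrieved item has the same writer."""
    return sum(1 for r in rankings if r and r[0]) / len(rankings)


# Two toy queries (made-up data, not from the paper).
rankings = [
    [True, False, True],  # AP = (1/1 + 2/3) / 2
    [False, True],        # AP = 1/2
]
print(mean_average_precision(rankings))
print(top1_accuracy(rankings))
```

Under these assumptions, mAP rewards placing every page by the same writer near the top of the ranking, while top-1 accuracy only checks the single best match, which is why the two reported figures can differ.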


Published In

Multimedia Tools and Applications, Volume 81, Issue 21
Sep 2022
1489 pages

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 01 September 2022
Accepted: 21 February 2022
Revision received: 12 January 2022
Received: 04 May 2021

Author Tags

  1. Writer identification
  2. Historical documents
  3. Artificial intelligence
  4. Document analysis
  5. Arabic manuscripts

Qualifiers

  • Research-article
