A database of Arabic historical subwords, released under LICENSE-CC-BY-NC-ND-4.0. Original paper: https://doi.org/10.1016/j.patrec.2022.04.040
The proposed database contains 560,000 subwords distributed across 5,600 different classes. It was built from 64 pages extracted from 10 books written in the 16th and 17th centuries. The MOJ-DB database is divided into three sets: 70% for training, 20% for testing, and 10% for validation. The ground truth was established iteratively to keep errors to a minimum. For each subword, it records the source book and page. We conducted several experiments to verify the robustness of the proposed database as well as the validity of the segmentation process. The database is freely available to the research community. It can be used for word and subword recognition, word spotting, subword extraction, and database construction.
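As a rough illustration of the 70%/20%/10% partition described above, the sketch below splits a labeled collection class by class. It is only a sketch under stated assumptions: the function name `split_by_class`, the `(item, label)` tuple layout, and the choice to split within each class are hypothetical and not taken from the paper or from the actual MOJ-DB file layout. If the percentages apply to the total count, they would correspond to roughly 392,000 training, 112,000 testing, and 56,000 validation subwords.

```python
import random
from collections import defaultdict


def split_by_class(samples, ratios=(0.7, 0.2, 0.1), seed=0):
    """Split (item, label) pairs into train/test/validation lists.

    Hypothetical helper: the per-class split and the tuple layout are
    assumptions for illustration, not the authors' procedure.
    """
    # Group items by their class label.
    by_class = defaultdict(list)
    for item, label in samples:
        by_class[label].append(item)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test, val = [], [], []
    for label in sorted(by_class):
        items = by_class[label]
        rng.shuffle(items)
        n_train = round(len(items) * ratios[0])
        n_test = round(len(items) * ratios[1])
        train += [(x, label) for x in items[:n_train]]
        test += [(x, label) for x in items[n_train:n_train + n_test]]
        # Whatever remains goes to validation.
        val += [(x, label) for x in items[n_train + n_test:]]
    return train, test, val


if __name__ == "__main__":
    # Synthetic placeholder data: 3 classes with 100 samples each.
    toy = [(f"img_{c}_{i}", c) for c in range(3) for i in range(100)]
    train, test, val = split_by_class(toy)
    print(len(train), len(test), len(val))
```

Splitting within each class keeps every class represented in all three sets, which matters when each class has only around 100 samples on average (560,000 / 5,600). The released database already ships with its own partition, so this is useful mainly for experimenting with alternative splits.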
To request access to this material, please contact the author at abdelhay.zoizou@usmba.ac.ma or zoizou.abdelhay@gmail.com.