Authors:
Faten Ziadi (1, 2); Imen Ben Cheikh (2) and Mohamed Jemni (2)
Affiliations:
(1) Latice Laboratory, ENSIT, University of Tunis, Tunis, Tunisia
(2) University of Sousse, ISITCom, 4011, Sousse, Tunisia
Keyword(s):
CNN, LSTM, Arabic Writing, Large Vocabulary, Linguistic Knowledge, APTI Dataset.
Abstract:
In this paper, we propose a convolutional recurrent approach for Arabic word recognition. We handle a large vocabulary of decomposable Arabic words, which are factored according to their roots and schemes. Exploiting derivational morphology, we first design a convolutional neural network (CNN) that classifies Arabic roots extracted from a set of word samples in the APTI database. To further exploit linguistic knowledge, we complete the word recognition process with a recurrent network, specifically a long short-term memory (LSTM) network. Thanks to its recurrence and memory capability, the LSTM model focuses not only on the prefixes, infixes and suffixes, taken in sequential order, but also on the relations between them, in order to recognize word patterns and inflectional details such as gender, number and tense.
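
As a rough illustration of what factoring words by root and scheme means, the following Python sketch fills the consonant slots of a scheme with the letters of a triliteral root. It is not the authors' implementation: the recognition system itself works on word images with a CNN and an LSTM, whereas this sketch only shows how a small set of roots and schemes can span a larger vocabulary. The function names, the slot notation (`1`, `2`, `3`) and the transliterated roots and schemes are all assumptions chosen for the example.

```python
# Illustrative sketch only: the roots, schemes and slot notation below are
# hypothetical transliterations, not data or code from the paper.

def apply_scheme(root, scheme):
    """Fill slots 1, 2, 3 of a scheme with the three consonants of a root."""
    if len(root) != 3:
        raise ValueError("this sketch handles triliteral roots only")
    slots = {"1": root[0], "2": root[1], "3": root[2]}
    # Characters that are not slot markers (vowels, affixes) are kept as-is.
    return "".join(slots.get(ch, ch) for ch in scheme)


def build_vocabulary(roots, schemes):
    """Map every derived word to the (root, scheme) pair that produced it."""
    return {apply_scheme(r, s): (r, s) for r in roots for s in schemes}


ROOTS = [("k", "t", "b"), ("ʿ", "m", "l")]
SCHEMES = ["1ā2i3", "ma12ū3", "ma12a3"]

VOCAB = build_vocabulary(ROOTS, SCHEMES)

if __name__ == "__main__":
    for word, (root, scheme) in VOCAB.items():
        print(word, "=", "-".join(root), "+", scheme)
```

With two roots and three schemes the sketch covers six words, so the vocabulary grows as the product of the two inventories while each classifier only has to discriminate within one of them.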