Abstract
One method for reducing the size of the term vocabulary in Arabic text categorization is to replace the different variants (forms) of a word with their common root. Root extraction is harder in Arabic than in other languages because Arabic is a very rich language with a complex morphology. Many algorithms have been proposed for this task. Some are based on morphological rules and grammatical patterns, so they are difficult to build and require deep linguistic knowledge. Others are statistical: they are simpler and rely only on computation. In this paper we propose a new statistical algorithm that extracts the roots of Arabic words using character n-grams, without any morphological rules or grammatical patterns.
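The abstract does not give the details of the proposed algorithm, so the following is only a minimal sketch of the general idea of n-gram-based root matching, not the authors' method. It assumes contiguous character bigrams, the Dice coefficient as the similarity measure, and a small hypothetical list of candidate roots (`CANDIDATE_ROOTS`); all function names are illustrative.

```python
def char_ngrams(word, n=2):
    """Return the set of contiguous character n-grams of `word`."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def dice(a, b):
    """Dice coefficient between two sets: 2|A ∩ B| / (|A| + |B|)."""
    if not a or not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def closest_root(word, roots, n=2):
    """Pick the candidate root whose n-grams overlap most with the word's."""
    word_grams = char_ngrams(word, n)
    return max(roots, key=lambda root: dice(word_grams, char_ngrams(root, n)))


# Hypothetical candidate roots (assumption, for illustration only):
# k-t-b (write), d-r-s (study), ʿ-l-m (know).
CANDIDATE_ROOTS = ["كتب", "درس", "علم"]

if __name__ == "__main__":
    # maktub (written), madrasa (school), muʿallim (teacher)
    for word in ["مكتوب", "مدرسة", "معلم"]:
        print(word, "->", closest_root(word, CANDIDATE_ROOTS))
```

No morphological rule or pattern appears anywhere in the sketch; the match is driven purely by shared character sequences. Contiguous bigrams are a simplification: Arabic infixes can split root letters apart, so a practical system would probably need a more tolerant n-gram scheme and a much larger root lexicon.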
Copyright information
© 2015 IFIP International Federation for Information Processing
About this paper
Cite this paper
Gadri, S., Moussaoui, A. (2015). Arabic Texts Categorization: Features Selection Based on the Extraction of Words’ Roots. In: Amine, A., Bellatreche, L., Elberrichi, Z., Neuhold, E., Wrembel, R. (eds) Computer Science and Its Applications. CIIA 2015. IFIP Advances in Information and Communication Technology, vol 456. Springer, Cham. https://doi.org/10.1007/978-3-319-19578-0_14
DOI: https://doi.org/10.1007/978-3-319-19578-0_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19577-3
Online ISBN: 978-3-319-19578-0
eBook Packages: Computer Science (R0)