DOI: 10.1145/3206098.3206109
research-article

Do Words with Certain Part of Speech Tags Improve the Performance of Arabic Text Classification?

Published: 09 April 2018

    Abstract

    Feature extraction, the process of choosing feature types that can represent and discriminate between dataset topics, is one of the critical steps in text classification and varies with the language of the texts. Different feature types have been proposed for Arabic text classification, ranging from features based on word orthography (single words, character N-grams and word N-grams) to features based on linguistic analysis (roots and stems). To the best of our knowledge, little attention has been paid to how Arabic text classification performs when Part of Speech (POS) tagging information is used to extract features. In this study, we used a corpus of 4900 newspaper texts distributed evenly over seven topics (700 texts per topic) to investigate the effect on Arabic text classification of using the POS tag distribution and words that belong to certain POS tags, namely nouns, verbs and adjectives. We used Chi-squared for feature selection, Log-Weighted Term Frequency Inverse Document Frequency with Cosine Normalization (LTC) for feature representation, and a support vector machine (SVM) for text classification. We measured classification performance with four metrics: accuracy, precision, recall and F-measure. The experimental data suggest that using words as features achieved the best classification performance when the number of features was low; however, classification performance can be marginally increased by using nouns, verbs and adjectives as features, provided that the number of features is increased.
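
    The feature steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the coarse tag names (NOUN, VERB, ADJ), the toy English tokens, the natural-log base and all function names are assumptions made for illustration, and the SVM step is omitted. The sketch filters pre-tagged tokens by POS tag, scores a term against a topic with the standard 2x2 Chi-squared statistic, and builds LTC vectors as (1 + log tf) * log(N / df) followed by cosine normalization.

```python
import math
from collections import Counter

# Hypothetical coarse tag names; the abstract does not specify the tag set.
KEEP_TAGS = {"NOUN", "VERB", "ADJ"}


def filter_by_pos(tagged_tokens, keep=KEEP_TAGS):
    """Keep only the words whose POS tag is in `keep`."""
    return [word for word, tag in tagged_tokens if tag in keep]


def chi_squared(docs, labels, term, topic):
    """Chi-squared score of `term` for `topic` from a 2x2 contingency table."""
    a = b = c = d = 0
    for doc, label in zip(docs, labels):
        has_term = term in doc
        in_topic = label == topic
        if has_term and in_topic:
            a += 1  # term present, document in topic
        elif has_term:
            b += 1  # term present, document in another topic
        elif in_topic:
            c += 1  # term absent, document in topic
        else:
            d += 1  # term absent, document in another topic
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def ltc_vectors(docs):
    """LTC weighting: (1 + log tf) * log(N / df), then cosine normalization."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        raw = {t: (1.0 + math.log(f)) * math.log(n_docs / df[t])
               for t, f in tf.items()}
        norm = math.sqrt(sum(w * w for w in raw.values()))
        vectors.append({t: (w / norm if norm else 0.0) for t, w in raw.items()})
    return vectors


# Toy pre-tagged documents (illustrative data, not from the paper's corpus).
tagged_docs = [
    [("match", "NOUN"), ("won", "VERB"), ("the", "DET"), ("team", "NOUN")],
    [("team", "NOUN"), ("played", "VERB"), ("well", "ADV")],
    [("market", "NOUN"), ("rose", "VERB"), ("sharply", "ADV")],
    [("bank", "NOUN"), ("market", "NOUN"), ("strong", "ADJ")],
]
labels = ["sport", "sport", "economy", "economy"]

docs = [filter_by_pos(doc) for doc in tagged_docs]
score = chi_squared(docs, labels, "team", "sport")
vectors = ltc_vectors(docs)
```

    In this toy data, "team" occurs in both sport documents and in neither economy document, so it receives the highest possible Chi-squared score for a four-document collection, and each LTC vector has unit length, so a dot product between two vectors is their cosine similarity.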

    Cited By

    • (2021) Part-of-Speech Tagging Enhancement to Natural Language Processing for Thai Wh-Question Classification with Deep Learning. Heliyon. DOI: 10.1016/j.heliyon.2021.e08216 (e08216). Online publication date: Oct-2021.
    • (2020) Impact of Stemming and Word Embedding on Deep Learning-Based Arabic Text Categorization. IEEE Access 8, 127913-127928. DOI: 10.1109/ACCESS.2020.3009217. Online publication date: 2020.
    • (2019) Exploring the Performance of Tagging for the Classical and the Modern Standard Arabic. Advances in Fuzzy Systems 2019. DOI: 10.1155/2019/6254649. Online publication date: 23-Jan-2019.

      Published In

      ICISDM '18: Proceedings of the 2nd International Conference on Information System and Data Mining
      April 2018
      169 pages
      ISBN:9781450363549
      DOI:10.1145/3206098
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. Arabic text classification
      2. POS tagging
      3. classification performance
      4. feature selection

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      ICISDM '18

      Article Metrics

      • Downloads (Last 12 months): 5
      • Downloads (Last 6 weeks): 0
      Reflects downloads up to 09 Aug 2024
