Sentiment Analysis for a Resource Poor Language—Roman Urdu

Published: 16 August 2019

Abstract

Sentiment analysis is an important sub-task of natural language processing that aims to determine the polarity of a review. Most work on sentiment analysis targets the resource-rich languages of the world, and very little has been done for resource-poor languages. In this work, we focus on developing a sentiment analysis system for Roman Urdu, a resource-poor language. To this end, we gathered a dataset of 11,000 reviews from six different domains, defined comprehensive annotation guidelines, and annotated the dataset using a multi-annotator methodology. Using the annotated dataset, we built a sentiment analysis system with state-of-the-art algorithms. To improve the results of these algorithms, four different studies were carried out based on word-level features, character-level features, and feature union. The best results showed that we could reduce the error rate by 12% from the baseline (80.07%). To check whether the improvements are statistically significant, we applied a t-test and confidence intervals to the obtained results and found that the best result of each study is a statistically significant improvement over the baseline.
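
The three feature families named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the "w:" and "c:" key prefixes, the n-gram sizes, and the example review are all assumptions made for illustration. It only shows how word-level and character-level n-gram counts might be combined into a single feature map, which is the general idea behind a feature union.

```python
from collections import Counter


def word_ngrams(text, n=1):
    """Return word n-grams from a lower-cased, whitespace-tokenised review."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n=3):
    """Return character n-grams over the whitespace-normalised review."""
    s = " ".join(text.lower().split())
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def feature_union(text, word_n=1, char_n=3):
    """Combine word-level and character-level counts into one feature map.

    Keys are prefixed ("w:" / "c:") so the two feature spaces cannot collide.
    """
    features = Counter()
    for gram in word_ngrams(text, word_n):
        features["w:" + gram] += 1
    for gram in char_ngrams(text, char_n):
        features["c:" + gram] += 1
    return features


# Hypothetical example review (not taken from the paper's dataset).
review = "yeh phone acha hai"
feats = feature_union(review)
```

Character n-grams may help with Roman Urdu because spelling is not standardised: variants such as "acha" and "achha" share no word-level feature but do share the trigram "ach". The resulting feature map could then be passed to any classifier that accepts sparse count features.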

    Published In

    ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 19, Issue 1
    January 2020
    345 pages
    ISSN: 2375-4699
    EISSN: 2375-4702
    DOI: 10.1145/3338846

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 16 August 2019
    Accepted: 01 April 2019
    Revised: 01 February 2019
    Received: 01 February 2018
    Published in TALLIP Volume 19, Issue 1

    Author Tags

    1. Resource poor language
    2. Roman Urdu
    3. Roman Urdu sentiment analysis

    Qualifiers

    • Short-paper
    • Research
    • Refereed
