Automatic Indonesian Sentiment Lexicon Curation with Sentiment Valence Tuning for Social Media Sentiment Analysis

Published: 02 March 2021

Abstract

A novel Indonesian sentiment lexicon (SentIL: Sentiment Indonesian Lexicon) is created with an automatic pipeline that covers creating sentiment seed words, adding new words from slang, emoticons, a given dictionary, and a sentiment corpus, and finally tuning sentiment values with a tagged sentiment corpus. The pipeline begins by taking seed words from WordNet Bahasa that are mapped to sentiment values from English SentiWordNet. The seed words are enriched by combining a dictionary-based method, which uses the words' synonyms and antonyms, with a corpus-based method, which uses word embeddings for word similarity trained on positive and negative sentiment corpora from online marketplace reviews and Twitter data. The valence score of each lexicon entry is recalculated based on its relative occurrence in the corpus. We also add some popular slang words and emoticons to enrich the lexicon. Our experiments show that the proposed method yields a 3.5-fold increase in the number of lexicon entries and improves accuracy to 80.9% for online reviews and 95.7% for Twitter data, outperforming other published and available Indonesian sentiment lexicons.
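
The valence tuning step can be illustrated with a minimal sketch. The abstract does not give the formula, so the scoring rule here (a normalized difference of relative frequencies in the positive and negative corpora, blended with the prior valence) is an assumption, not the authors' method. The function names `tune_valence` and `classify`, the `weight` parameter, and whitespace tokenization are likewise hypothetical.

```python
from collections import Counter


def tune_valence(lexicon, pos_docs, neg_docs, weight=0.5):
    """Blend each word's prior valence with a corpus-derived score.

    ASSUMPTION: the corpus score is (p - n) / (p + n), where p and n are
    the word's relative frequencies in the positive and negative corpora.
    This is an illustrative rule, not the formula from the paper.
    """
    pos_counts = Counter(tok for doc in pos_docs for tok in doc.lower().split())
    neg_counts = Counter(tok for doc in neg_docs for tok in doc.lower().split())
    pos_total = sum(pos_counts.values()) or 1  # avoid division by zero
    neg_total = sum(neg_counts.values()) or 1

    tuned = {}
    for word, prior in lexicon.items():
        p = pos_counts[word] / pos_total
        n = neg_counts[word] / neg_total
        if p + n == 0:
            # Word never occurs in the tagged corpus: keep the prior value.
            tuned[word] = prior
        else:
            corpus_score = (p - n) / (p + n)  # in [-1, 1]
            tuned[word] = (1 - weight) * prior + weight * corpus_score
    return tuned


def classify(text, lexicon):
    """Label a text by the sign of its summed word valences."""
    score = sum(lexicon.get(tok, 0.0) for tok in text.lower().split())
    return "positive" if score >= 0 else "negative"
```

In this sketch, a word seen only in positive documents is pulled toward +1 and a word seen only in negative documents toward -1, while words absent from the tagged corpus keep their seed valence. A real pipeline would also need proper tokenization and normalization of Indonesian social media text, which the sketch omits.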

Published In

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 20, Issue 1
Special issue on Deep Learning for Low-Resource Natural Language Processing, Part 1 and Regular Papers
January 2021
332 pages
ISSN:2375-4699
EISSN:2375-4702
DOI:10.1145/3439335
Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 02 March 2021
Accepted: 01 September 2020
Revised: 01 June 2020
Received: 01 July 2019
Published in TALLIP Volume 20, Issue 1

Author Tags

  1. Sentiment analysis
  2. lexicon
  3. sentiment valence
  4. social media
  5. word embedding

Qualifiers

  • Research-article
  • Research
  • Refereed

Cited By

  • (2023) Image and Text Aspect Level Multimodal Sentiment Classification Model Using Transformer and Multilayer Attention Interaction. International Journal of Data Warehousing and Mining 19:1 (1-20). DOI: 10.4018/IJDWM.333854. Online publication date: 15-Nov-2023
  • (2023) An Analysis Framework to Reveal Automobile Users’ Preferences from Online User-Generated Content. Sustainability 15:18 (13336). DOI: 10.3390/su151813336. Online publication date: 6-Sep-2023
  • (2023) Framework to Create Sentiment Lexicons For Extremism Detection in Text Documents. 2023 International Conference on IoT, Communication and Automation Technology (ICICAT) (1-6). DOI: 10.1109/ICICAT57735.2023.10263761. Online publication date: 23-Jun-2023
  • (2023) Sentiment Analysis and Topic Modeling of E-Grocery Application Reviews Using Naive Bayes and Support Vector Machine: A Case Study of Segari Data Review on the Google Play Store. 2023 3rd International Conference on Electronic and Electrical Engineering and Intelligent System (ICE3IS) (13-18). DOI: 10.1109/ICE3IS59323.2023.10335206. Online publication date: 9-Aug-2023
  • (2023) Advancing Automated Content Analysis for a New Era of Media Effects Research: The Key Role of Transfer Learning. Communication Methods and Measures 18:2 (142-162). DOI: 10.1080/19312458.2023.2261372. Online publication date: 4-Oct-2023
  • (2023) Aspect-based sentiment analysis: an overview in the use of Arabic language. Artificial Intelligence Review 56:3 (2325-2363). DOI: 10.1007/s10462-022-10215-3. Online publication date: 1-Mar-2023
  • (2022) MELex: The Construction of Malay-English Sentiment Lexicon. Computers, Materials & Continua 71:1 (1789-1805). DOI: 10.32604/cmc.2022.021131. Online publication date: 2022
  • (2022) Sentiment lexicon construction for Chinese book reviews based on ultrashort reviews. The Electronic Library 40:3 (221-236). DOI: 10.1108/EL-07-2021-0147. Online publication date: 12-Apr-2022
  • (2022) CNN-LSTM neural network model for fine-grained negative emotion computing in emergencies. Alexandria Engineering Journal 61:9 (6755-6767). DOI: 10.1016/j.aej.2021.12.022. Online publication date: Sep-2022
