Toward Integrated CNN-based Sentiment Analysis of Tweets for Scarce-resource Language—Hindi

Authors:

Shubham Shubham,

Ankit Chaudhary,

Qin Xin

Transactions on Asian and Low-Resource Language Information Processing, Volume 20, Issue 5

Article No.: 80, Pages 1 - 23

https://doi.org/10.1145/3450447

Published: 30 June 2021

Abstract

Linguistic resources for widely used languages such as English and Mandarin Chinese are abundant, which explains the large body of existing research in these languages. However, there are languages for which linguistic resources are scarce, and Hindi is one of them. Despite being the fourth-most popular language, Hindi still lacks richly populated linguistic resources, owing to the challenges involved in processing the language. This article first explores machine learning-based approaches (Naïve Bayes, Support Vector Machine, Decision Tree, and Logistic Regression) to analyze the sentiment contained in Hindi-language text derived from Twitter.
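As a rough illustration of the simplest of these baselines, the sketch below trains a multinomial Naïve Bayes classifier with add-one (Laplace) smoothing on whitespace-tokenized text. It is not the authors' implementation: the function names `train_nb` and `predict_nb`, the tokenization, and the four toy sentences are assumptions made purely for illustration, and the article's actual features and data are not reproduced here.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training data (NOT from the article's tweet corpus).
TOY_SAMPLES = [
    ("खाना अच्छा है", "positive"),   # "the food is good"
    ("फोन शानदार है", "positive"),   # "the phone is great"
    ("खाना खराब है", "negative"),    # "the food is bad"
    ("फोन बेकार है", "negative"),    # "the phone is useless"
]


def train_nb(samples):
    """Count class and per-class token frequencies from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in samples:
        class_counts[label] += 1
        for token in text.split():
            word_counts[label][token] += 1
            vocab.add(token)
    return class_counts, word_counts, vocab


def predict_nb(model, text):
    """Return the label with the highest log-posterior under add-one smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_logp = None, -math.inf
    for label, n_docs in class_counts.items():
        logp = math.log(n_docs / total_docs)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for token in text.split():
            if token in vocab:  # tokens never seen in training are ignored
                logp += math.log((word_counts[label][token] + 1) / denom)
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label


model = train_nb(TOY_SAMPLES)
prediction = predict_nb(model, "खाना शानदार है")
```

With balanced classes, the prediction is driven entirely by which class has seen the sentiment-bearing token more often; a realistic setup would add a neutral class, proper tokenization, and a held-out evaluation split.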

Further, the article presents lexicon-based approaches (Hindi Senti-WordNet, NRC Emotion Lexicon) for sentiment analysis in Hindi and proposes a Domain-specific Sentiment Dictionary. Finally, an integrated convolutional neural network (CNN), integrated with a Recurrent Neural Network and Long Short-term Memory, is proposed to analyze sentiment in Hindi-language tweets; the dataset comprises 23,767 tweets classified as positive, negative, or neutral. The proposed CNN approach achieves an accuracy of 85%.
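The general idea behind a lexicon-based approach can be sketched as follows: each token is looked up in a polarity dictionary, the scores are summed, and the sign of the total decides the class. This is a minimal sketch under stated assumptions, not the article's method: `TOY_LEXICON`, its four entries, the punctuation set, and the function name `lexicon_sentiment` are hypothetical and are not taken from Hindi Senti-WordNet, the NRC Emotion Lexicon, or the proposed Domain-specific Sentiment Dictionary.

```python
# Hypothetical toy lexicon (word -> polarity score). Illustrative only; these
# entries are NOT taken from any of the lexicons named in the abstract.
TOY_LEXICON = {
    "अच्छा": 1.0,    # good
    "शानदार": 1.0,   # great
    "बुरा": -1.0,    # bad
    "खराब": -1.0,    # poor / spoiled
}

# Characters stripped from token edges (danda plus common ASCII punctuation).
PUNCTUATION = "।,.!?"


def lexicon_sentiment(tweet, lexicon=TOY_LEXICON):
    """Classify a tweet as positive, negative, or neutral by summed polarity."""
    score = 0.0
    for token in tweet.split():
        # Words missing from the lexicon contribute nothing to the score.
        score += lexicon.get(token.strip(PUNCTUATION), 0.0)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


label = lexicon_sentiment("यह फोन अच्छा है।")
```

A scorer this simple ignores negation, inflected word forms, and transliterated text, which is one reason a lexicon alone tends to fall short on tweets.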

Index Terms

Toward Integrated CNN-based Sentiment Analysis of Tweets for Scarce-resource Language—Hindi
1. Information systems
  1. Information retrieval
    1. Evaluation of retrieval results
      1. Presentation of retrieval results

Published In

ACM Transactions on Asian and Low-Resource Language Information Processing Volume 20, Issue 5

September 2021

320 pages

ISSN:2375-4699

EISSN:2375-4702

DOI:10.1145/3467024

Editor:
Imed Zitouni
Google, USA

Copyright © 2021 Association for Computing Machinery.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 June 2021

Accepted: 01 February 2021

Revised: 01 November 2020

Received: 01 August 2020

Published in TALLIP Volume 20, Issue 5

Qualifiers

Research-article
Refereed

Bibliometrics

Total Citations: 28
Total Downloads: 317
Downloads (Last 12 months): 64
Downloads (Last 6 weeks): 9

Reflects downloads up to 04 Oct 2024

Cited By

Shifa H, Mojumdar M, Rahman M, Chakraborty N, Gupta V. 2024. Machine Learning Models for Maternal Health Risk Prediction based on Clinical Data. 2024 11th International Conference on Computing for Sustainable Global Development (INDIACom), 1312-1318. Online publication date: 28-Feb-2024. https://doi.org/10.23919/INDIACom61295.2024.10498822
Kumar A, Bhatia S, Khosravi M, Mashat A, Agarwal P. 2024. Semantic and Context Understanding for Sentiment Analysis in Hindi Handwritten Character Recognition Using a Multiresolution Technique. ACM Transactions on Asian and Low-Resource Language Information Processing 23:1, 1-22. Online publication date: 15-Jan-2024. https://dl.acm.org/doi/10.1145/3557895
Aliyu Y, Sarlan A, Usman Danyaro K, Rahman A, Abdullahi M. 2024. Sentiment Analysis in Low-Resource Settings: A Comprehensive Review of Approaches, Languages, and Data Sources. IEEE Access 12, 66883-66909. Online publication date: 2024. https://doi.org/10.1109/ACCESS.2024.3398635
Neog M, Baruah N. 2024. A hybrid deep learning approach for Assamese toxic comment detection in social media. Procedia Computer Science 235, 2297-2306. Online publication date: 2024. https://doi.org/10.1016/j.procs.2024.04.218
Mhaske N, Patil A. 2024. Sentence Annotation for Aspect-oriented Sentiment Analysis: A Lexicon based Approach with Marathi Movie Reviews. Journal of The Institution of Engineers (India): Series B. Online publication date: 18-May-2024. https://doi.org/10.1007/s40031-024-01072-5
Arora S, Agrawal V, Kumar D, Arora S, Banshal S. 2024. Sentimental impact of fake news on social media using an integrated ensemble framework. Social Network Analysis and Mining 14:1. Online publication date: 9-Sep-2024. https://doi.org/10.1007/s13278-024-01334-6
Guleria A, Varshney K, Pahwa G, Singhal S, Sharma N. 2024. Multimodal sentiment analysis of english and hinglish memes. Multimedia Tools and Applications. Online publication date: 20-Jun-2024. https://doi.org/10.1007/s11042-024-19640-8
Das R, Singh T. 2024. Which words are important?: an empirical study of Assamese sentiment analysis. Language Resources and Evaluation. Online publication date: 19-Jun-2024. https://doi.org/10.1007/s10579-024-09756-6
Jain M, Jindal R, Jain A. 2024. DoSLex: automatic generation of all domain semantically rich sentiment lexicon. Language Resources and Evaluation. Online publication date: 18-Jul-2024. https://doi.org/10.1007/s10579-024-09753-9
Jain A, Jain G, Tewari D. 2024. KNetwork: advancing cross-lingual sentiment analysis for enhanced decision-making in linguistically diverse environments. Knowledge and Information Systems 66:5, 2925-2943. Online publication date: 26-Jan-2024. https://dl.acm.org/doi/10.1007/s10115-023-02051-w
