research-article

Toward Integrated CNN-based Sentiment Analysis of Tweets for Scarce-resource Language—Hindi

Published: 30 June 2021

Abstract

Linguistic resources for commonly used languages such as English and Mandarin Chinese are available in abundance, which has enabled extensive research in these languages. For some languages, however, linguistic resources are scarce, and Hindi is one of them. Although Hindi is the fourth-most popular language, it still lacks richly populated linguistic resources, owing to the challenges involved in processing Hindi text. This article first explores machine learning-based approaches (Naïve Bayes, Support Vector Machine, Decision Tree, and Logistic Regression) to analyze the sentiment contained in Hindi-language text collected from Twitter.
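As a rough illustration of the first family of baselines, the sketch below is a minimal multinomial Naive Bayes classifier written from scratch, assuming whitespace-tokenized Hindi tweets and add-one (Laplace) smoothing. The function names (`train_nb`, `predict_nb`), the smoothing choice, and the toy tweets are hypothetical; the abstract does not describe the article's preprocessing or features, and the SVM, Decision Tree, and Logistic Regression baselines are not shown.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model with add-alpha (Laplace) smoothing.

    docs   -- list of token lists (one list per tweet)
    labels -- list of class names, e.g. "positive", "negative", "neutral"
    """
    vocab = {tok for doc in docs for tok in doc}
    class_counts = Counter(labels)
    token_counts = defaultdict(Counter)  # class -> token -> frequency
    for doc, label in zip(docs, labels):
        token_counts[label].update(doc)

    # Log prior: fraction of training tweets in each class.
    log_priors = {c: math.log(n / len(docs)) for c, n in class_counts.items()}
    log_likelihoods = {}
    for c in class_counts:
        denom = sum(token_counts[c].values()) + alpha * len(vocab)
        log_likelihoods[c] = {
            tok: math.log((token_counts[c][tok] + alpha) / denom) for tok in vocab
        }
    return log_priors, log_likelihoods, vocab


def predict_nb(model, doc):
    """Return the class with the highest log-posterior; unseen tokens are skipped."""
    log_priors, log_likelihoods, vocab = model
    scores = {
        c: prior + sum(log_likelihoods[c][tok] for tok in doc if tok in vocab)
        for c, prior in log_priors.items()
    }
    return max(scores, key=scores.get)


# Hypothetical toy tweets, whitespace-tokenized (not from the article's dataset).
train = [
    ("फिल्म बहुत अच्छी है", "positive"),
    ("गाना अच्छा है", "positive"),
    ("सेवा बहुत खराब है", "negative"),
    ("फोन खराब है", "negative"),
    ("कल बैठक होगी", "neutral"),
    ("आज मौसम सामान्य", "neutral"),
]
model = train_nb([t.split() for t, _ in train], [y for _, y in train])
print(predict_nb(model, "अच्छी फिल्म".split()))  # positive
```

Scoring in log space avoids numerical underflow when many small per-token probabilities are multiplied over a long tweet.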
Further, the article presents lexicon-based approaches (Hindi Senti-WordNet and the NRC Emotion Lexicon) for sentiment analysis in Hindi and proposes a domain-specific sentiment dictionary. Finally, it proposes an integrated model that combines a convolutional neural network (CNN) with a Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) to analyze sentiment in Hindi-language tweets, using a total of 23,767 tweets classified as positive, negative, or neutral. The proposed CNN approach achieves an accuracy of 85%.
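The abstract does not specify how the CNN, RNN, and LSTM are integrated, so the following is only a minimal sketch of the convolution stage of a generic text CNN, assuming each token has already been mapped to an embedding vector. One-dimensional filters slide over the token sequence, max-over-time pooling keeps one feature per filter, and a dense layer with softmax scores the three sentiment classes. The RNN/LSTM components and training are omitted, and all names (`conv1d_max_pool`, `classify`) and the hand-set weights are hypothetical, not taken from the article.

```python
import math


def conv1d_max_pool(embeddings, filters, biases):
    """Apply each 1-D filter across the token sequence (ReLU), then max-over-time pool.

    embeddings -- seq_len x emb_dim list of token vectors
    filters    -- each filter is a list of `width` weight vectors of size emb_dim
    biases     -- one bias per filter
    Returns one pooled feature per filter.
    """
    pooled = []
    for filt, bias in zip(filters, biases):
        width = len(filt)
        if len(embeddings) < width:
            raise ValueError("sequence shorter than filter width; pad the input")
        responses = []
        for start in range(len(embeddings) - width + 1):
            s = bias
            for k in range(width):
                s += sum(w * x for w, x in zip(filt[k], embeddings[start + k]))
            responses.append(max(0.0, s))  # ReLU activation
        pooled.append(max(responses))  # keep the strongest n-gram response
    return pooled


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(embeddings, filters, biases, dense_w, dense_b, labels):
    """Pooled CNN features -> dense layer -> softmax over the sentiment classes."""
    features = conv1d_max_pool(embeddings, filters, biases)
    logits = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(dense_w, dense_b)]
    probs = softmax(logits)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs


# Hypothetical hand-set weights for a 4-token tweet with 2-dimensional embeddings;
# a trained model would learn these values.
tweet = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
filters = [
    [[1.0, 0.0], [0.0, 1.0]],    # width-2 (bigram) filter
    [[-1.0, 0.0], [0.0, -1.0]],  # width-2 filter; ReLU clips all its responses to 0 here
]
biases = [0.0, 0.0]
dense_w = [[1.0, 0.0], [-1.0, 0.0], [0.0, 0.0]]
dense_b = [0.0, 0.0, 0.0]
labels = ["positive", "negative", "neutral"]

label, probs = classify(tweet, filters, biases, dense_w, dense_b, labels)
print(conv1d_max_pool(tweet, filters, biases))  # [2.0, 0.0]
print(label)  # positive
```

Max-over-time pooling makes the feature vector length equal to the number of filters regardless of tweet length, which is why the dense layer can keep a fixed shape.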




    Published In

    ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 20, Issue 5
    September 2021
    320 pages
    ISSN:2375-4699
    EISSN:2375-4702
    DOI:10.1145/3467024
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 30 June 2021
    Accepted: 01 February 2021
    Revised: 01 November 2020
    Received: 01 August 2020
    Published in TALLIP Volume 20, Issue 5


    Author Tags

    1. Convolutional neural network
    2. Hindi
    3. lexicon
    4. linguistic
    5. scarce-resource language (SRL)
    6. sentiment analysis
    7. Twitter

    Qualifiers

    • Research-article
    • Refereed

    Bibliometrics

    • Downloads (last 12 months): 64
    • Downloads (last 6 weeks): 9
    Reflects downloads up to 04 Oct 2024

    Cited By

    • (2024) Machine Learning Models for Maternal Health Risk Prediction based on Clinical Data. 2024 11th International Conference on Computing for Sustainable Global Development (INDIACom), 1312-1318. DOI: 10.23919/INDIACom61295.2024.10498822. Online publication date: 28-Feb-2024
    • (2024) Semantic and Context Understanding for Sentiment Analysis in Hindi Handwritten Character Recognition Using a Multiresolution Technique. ACM Transactions on Asian and Low-Resource Language Information Processing 23, 1, 1-22. DOI: 10.1145/3557895. Online publication date: 15-Jan-2024
    • (2024) Sentiment Analysis in Low-Resource Settings: A Comprehensive Review of Approaches, Languages, and Data Sources. IEEE Access 12, 66883-66909. DOI: 10.1109/ACCESS.2024.3398635. Online publication date: 2024
    • (2024) A hybrid deep learning approach for Assamese toxic comment detection in social media. Procedia Computer Science 235, 2297-2306. DOI: 10.1016/j.procs.2024.04.218. Online publication date: 2024
    • (2024) Sentence Annotation for Aspect-oriented Sentiment Analysis: A Lexicon based Approach with Marathi Movie Reviews. Journal of The Institution of Engineers (India): Series B. DOI: 10.1007/s40031-024-01072-5. Online publication date: 18-May-2024
    • (2024) Sentimental impact of fake news on social media using an integrated ensemble framework. Social Network Analysis and Mining 14, 1. DOI: 10.1007/s13278-024-01334-6. Online publication date: 9-Sep-2024
    • (2024) Multimodal sentiment analysis of english and hinglish memes. Multimedia Tools and Applications. DOI: 10.1007/s11042-024-19640-8. Online publication date: 20-Jun-2024
    • (2024) Which words are important?: an empirical study of Assamese sentiment analysis. Language Resources and Evaluation. DOI: 10.1007/s10579-024-09756-6. Online publication date: 19-Jun-2024
    • (2024) DoSLex: automatic generation of all domain semantically rich sentiment lexicon. Language Resources and Evaluation. DOI: 10.1007/s10579-024-09753-9. Online publication date: 18-Jul-2024
    • (2024) KNetwork: advancing cross-lingual sentiment analysis for enhanced decision-making in linguistically diverse environments. Knowledge and Information Systems 66, 5, 2925-2943. DOI: 10.1007/s10115-023-02051-w. Online publication date: 26-Jan-2024
