Abstract
Sentiment analysis is a classification task that identifies the polarity of textual data, i.e., whether a sentence or document expresses negative, positive or neutral sentiment. Manipuri is a less privileged, highly agglutinative, tonal language. Despite being a scheduled language of the Indian Constitution, it remains a resource-constrained language. In this work, we report on sentiment analysis for Manipuri using several machine learning based approaches. The dataset used in our work is collected from a local daily newspaper. The novelty of this work is that we carry out language-specific pre-processing tasks such as transliteration, building a negative morpheme based lexicon and filtering noisy words. Using these as additional linguistic features in our models improves the classification results in terms of precision, recall and F-score. An ensemble voting of the best three TF-IDF based classifiers performs better than BM25 based classifiers and other stand-alone classifiers. Based on this result, we attempt to classify the sentiment of news articles over a certain period of time. Further, we report the findings of deep learning based approaches on the same dataset.
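As a rough illustration of the techniques named in the abstract, the following minimal sketch contrasts TF-IDF and BM25 term weighting and shows hard (majority) voting over three classifiers' predictions. It is not the authors' implementation: the function names, the smoothed IDF variants, the BM25 parameters `k1=1.5` and `b=0.75`, and the English placeholder tokens are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: raw term frequency times a smoothed IDF.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in df}
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]


def bm25_weights(docs, k1=1.5, b=0.75):
    """Return one {term: weight} dict per tokenized document (Okapi BM25 style).

    k1 controls term-frequency saturation, b controls length normalization;
    both defaults are assumptions, not values taken from the paper.
    """
    n = len(docs)
    avgdl = sum(len(doc) for doc in docs) / n
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((n - df[t] + 0.5) / (df[t] + 0.5) + 1) for t in df}
    weights = []
    for doc in docs:
        norm = k1 * (1 - b + b * len(doc) / avgdl)
        weights.append(
            {t: idf[t] * tf * (k1 + 1) / (tf + norm) for t, tf in Counter(doc).items()}
        )
    return weights


def majority_vote(predictions):
    """Hard voting: `predictions` holds one list of labels per classifier.

    On a tie, the label seen first (i.e. from the earliest classifier) wins.
    """
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*predictions)]


# Placeholder tokens standing in for transliterated, pre-processed text.
docs = [["good", "good", "news"], ["bad", "news"]]
tfidf = tfidf_weights(docs)
bm25 = bm25_weights(docs)

# Hypothetical outputs of three stand-alone classifiers on three articles.
votes = majority_vote([
    ["pos", "neg", "neu"],
    ["pos", "pos", "neg"],
    ["neg", "neg", "neg"],
])
print(votes)
```

Under both weightings the term that occurs in only one document ("good") outweighs the term shared by both ("news"); BM25 additionally caps the contribution of repeated terms, which is one reason the two schemes can rank features differently.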







Acknowledgements
We would like to express our appreciation to L. Arunkumar's team and Preety Q. Sinam, without whose assistance the compilation of this corpus would not have been possible. We also thank the anonymous reviewers for their careful reading and their many insightful comments, which helped us improve our manuscript.
Cite this article
Meetei, L.S., Singh, T.D., Borgohain, S.K. et al. Low resource language specific pre-processing and features for sentiment analysis task. Lang Resources & Evaluation 55, 947–969 (2021). https://doi.org/10.1007/s10579-021-09541-9
DOI: https://doi.org/10.1007/s10579-021-09541-9