DOI: 10.1145/3297662.3365808
research-article

CRD-SentEnse: Cross-domain Sentiment Analysis using an Ensemble Model

Published: 10 January 2020

Abstract

Microblogs and comments on social media contain valuable information about people's emotions and opinions towards products, political and social topics, and so forth. Unfortunately, due to the large volume of data, it is infeasible to label all of these comments and reviews. Manual labelling by human experts is expensive and time-consuming, and is practical only for small amounts of data. As a result, a more scalable solution is needed. Cross-domain sentiment analysis addresses the problem of training a model to classify a text by its sentiment polarity (negative, positive, and/or neutral) using data from one domain (the source domain) and then testing the same model on data from a different, unlabelled domain (the target domain). Cross-domain sentiment analysis remains an open research issue: although the proposed approaches have improved significantly, classification performance is still lower than in in-domain sentiment analysis. In this paper, we propose a framework for cross-domain sentiment analysis that applies the chi-square test to the data in the source domain. Firstly, we eliminate domain-related words from the source domain that do not bear transferable knowledge to the target domain. Secondly, the chi-square test is used to find the words that are important with respect to sentiment polarity. Subsequently, we develop a second model that drops the nouns from both the source and target domains and uses TF-IDF weights to find the important words in both domains. Finally, we use a stacking ensemble model that combines the two proposed models to enhance the performance of the proposed framework.
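As an illustration of the chi-square step mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes binary polarity labels, whitespace tokenization, and the standard Pearson chi-square statistic on a 2x2 table of term presence against class; the abstract does not specify any of these details. The function names and the toy reviews are hypothetical.

```python
from collections import Counter


def chi_square_scores(docs, labels):
    """Score each term by the 2x2 chi-square statistic of
    (term present / absent) against (positive / negative) documents."""
    n = len(docs)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos

    # Document frequency per class: each term counts once per document.
    df_pos, df_neg = Counter(), Counter()
    for text, y in zip(docs, labels):
        terms = set(text.lower().split())  # assumed tokenization
        (df_pos if y == 1 else df_neg).update(terms)

    scores = {}
    for term in set(df_pos) | set(df_neg):
        a = df_pos[term]   # positive documents containing the term
        b = df_neg[term]   # negative documents containing the term
        c = n_pos - a      # positive documents without the term
        d = n_neg - b      # negative documents without the term
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def top_terms(scores, k):
    """Return the k highest-scoring terms, ties broken alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Hypothetical toy source-domain reviews (1 = positive, 0 = negative).
docs = [
    "great battery great screen",
    "awful battery",
    "great value",
    "awful screen awful value",
]
labels = [1, 0, 1, 0]

scores = chi_square_scores(docs, labels)
selected = top_terms(scores, 2)
```

In this toy data the polarity words "great" and "awful" each appear in only one class and receive the highest score, while "battery", "screen" and "value" appear equally often in both classes and score zero. A low score here only means the word does not separate the polarity classes in this sample; whether a word is domain-related is a separate question that the abstract handles in its first step.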

Published In

MEDES '19: Proceedings of the 11th International Conference on Management of Digital EcoSystems
November 2019
350 pages
ISBN: 9781450362382
DOI: 10.1145/3297662

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Cross-domain sentiment analysis
  2. FastText
  3. chi-square test
  4. stacking ensemble

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

MEDES '19

Acceptance Rates

MEDES '19 Paper Acceptance Rate 41 of 102 submissions, 40%;
Overall Acceptance Rate 267 of 682 submissions, 39%
