DOI: 10.5555/2655780.2655861
research-article

Semi-supervised probabilistic sentiment analysis: merging labeled sentences with unlabeled reviews to identify sentiment

Published: 01 November 2013

Abstract

Document-level sentiment analysis, the task of determining whether the sentiment expressed in a document is positive or negative, is commonly performed by supervised methods. As with all supervised tasks, obtaining training data for these methods can be expensive and time-consuming. Some semi-supervised approaches have been proposed that rely on sentiment lexicons. We propose a novel supervised method and a novel semi-supervised method for sentiment analysis; both are based on a probabilistic graphical model, and neither requires a lexicon. Our semi-supervised method takes advantage of the numerical ratings that are often included in online reviews (e.g., 4 out of 5 stars). While these numerical ratings are related to sentiment, they are noisy and hence, by themselves, an imperfect indicator of a review's sentiment. We incorporate unlabeled user reviews as training data by treating the reviews' numerical ratings as sentiment labels while modeling the ratings' noisy nature. Our empirical results, obtained on a corpus of labeled sentences from hotel reviews and unlabeled hotel reviews with numerical ratings, show that treating review ratings as noisy and using them to augment a small number of labeled sentences outperforms strong existing supervised and semi-supervised approaches, both classification-based and lexicon-based.
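
The idea of treating a rating as a noisy sentiment label can be illustrated with a small sketch. This is not the paper's graphical model, which the abstract does not specify; it assumes a simple symmetric noise channel in which the rating-derived label disagrees with the true sentiment with probability `flip_prob`. The function names, the rating thresholds, and the `flip_prob` parameter are all hypothetical and chosen only for illustration.

```python
from typing import Optional


def rating_to_label(stars: int, max_stars: int = 5) -> Optional[int]:
    """Map a star rating to a tentative label: 1 = positive, 0 = negative.

    Assumed thresholds (not from the paper): the top two ratings count as
    positive, ratings of 2 or below as negative, anything else is None.
    """
    if stars >= max_stars - 1:
        return 1
    if stars <= 2:
        return 0
    return None


def posterior_positive(p_text_pos: float, rating_label: int, flip_prob: float) -> float:
    """Return P(sentiment = positive | text, rating label) via Bayes' rule.

    p_text_pos   -- P(positive | text) from a text-only classifier
    rating_label -- 1 or 0, the label derived from the numerical rating
    flip_prob    -- assumed probability that the rating label is wrong
    """
    if not 0.0 <= p_text_pos <= 1.0:
        raise ValueError("p_text_pos must lie in [0, 1]")
    if not 0.0 < flip_prob < 1.0:
        raise ValueError("flip_prob must lie strictly between 0 and 1")
    if rating_label not in (0, 1):
        raise ValueError("rating_label must be 0 or 1")

    # Likelihood of the observed rating label under each true sentiment.
    like_if_pos = 1.0 - flip_prob if rating_label == 1 else flip_prob
    like_if_neg = flip_prob if rating_label == 1 else 1.0 - flip_prob

    numerator = p_text_pos * like_if_pos
    denominator = numerator + (1.0 - p_text_pos) * like_if_neg
    return numerator / denominator


if __name__ == "__main__":
    label = rating_to_label(4)
    print(posterior_positive(0.5, label, flip_prob=0.2))
```

Under these assumptions, a 4-star rating with `flip_prob=0.2` moves a neutral text-only estimate of 0.5 to 0.8, while `flip_prob=0.5` makes the rating uninformative and leaves the text-only estimate unchanged.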


Published In

ASIST '13: Proceedings of the 76th ASIS&T Annual Meeting: Beyond the Cloud: Rethinking Information Boundaries
November 2013
1065 pages
ISBN: 0877155453

Publisher

American Society for Information Science

United States

Author Tags

1. semi-supervised classification
2. sentiment analysis

Qualifiers

• Research-article

Conference

ASIST '13
ASIST '13: Rethinking Information Boundaries
November 1 - 5, 2013
Montreal, Quebec, Canada

Acceptance Rates

ASIST '13 Paper Acceptance Rate: 83 of 128 submissions, 65%
Overall Acceptance Rate: 135 of 277 submissions, 49%
