Abstract
This paper identifies a widespread phenomenon in social media content, which we call the “words of few mouths” phenomenon. It challenges the development of recommender systems based on users’ online opinions by introducing additional sources of uncertainty. In the context of predicting the “helpfulness” of a review document from users’ online votes on other reviews (where a user’s vote on a review is either HELPFUL or UNHELPFUL), the “words of few mouths” phenomenon corresponds to the case where a large fraction of the reviews have each been voted on by only a few users. Focusing on the “review helpfulness prediction” problem, we illustrate the challenges that this phenomenon poses for training a review helpfulness predictor. We advocate probabilistic approaches to recommender system development in the presence of “words of few mouths”. More concretely, we propose a probabilistic metric as the training target for conventional machine-learning-based predictors. Our empirical study using Support Vector Regression (SVR) augmented with the proposed probabilistic metric demonstrates the advantages of incorporating probabilistic methods in the training of such predictors. In addition to this “partially probabilistic” approach, we also develop a logistic-regression-based probabilistic model and a corresponding learning algorithm for review helpfulness prediction. We demonstrate experimentally that the logistic regression method outperforms SVR, the prior art in review helpfulness prediction.
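To make the uncertainty concrete, the following minimal sketch contrasts the raw fraction of HELPFUL votes with a smoothed estimate for a review with one vote and a review with one hundred votes. The smoothed estimate here is the posterior mean under a uniform Beta(1, 1) prior, which is an assumption chosen for illustration and is not necessarily the probabilistic metric proposed in the paper; the function names and the vote counts are likewise hypothetical.

```python
from fractions import Fraction


def raw_fraction(helpful: int, total: int) -> Fraction:
    """Fraction of HELPFUL votes among all votes cast on a review."""
    if total <= 0:
        raise ValueError("a review with no votes has no raw fraction")
    return Fraction(helpful, total)


def smoothed_estimate(helpful: int, total: int,
                      alpha: int = 1, beta: int = 1) -> Fraction:
    """Posterior mean of helpfulness under a Beta(alpha, beta) prior.

    Assumption for illustration only: a uniform prior (alpha = beta = 1),
    not the metric defined in the paper.
    """
    return Fraction(helpful + alpha, total + alpha + beta)


# Hypothetical vote counts: (HELPFUL votes, total votes).
reviews = {
    "few_mouths": (1, 1),      # a single HELPFUL vote
    "many_mouths": (90, 100),  # 90 HELPFUL votes out of 100
}

for name, (helpful, total) in reviews.items():
    raw = raw_fraction(helpful, total)
    smooth = smoothed_estimate(helpful, total)
    print(f"{name}: raw={float(raw):.3f} smoothed={float(smooth):.3f}")
```

Under these assumptions the raw fraction ranks the single-vote review above the hundred-vote review, while the smoothed estimate reverses that order, which is one way to see why a training target built directly on raw vote fractions can be unreliable when most reviews have few votes.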
Cite this article
Zhang, R., Tran, T. & Mao, Y. Opinion helpfulness prediction in the presence of “words of few mouths”. World Wide Web 15, 117–138 (2012). https://doi.org/10.1007/s11280-011-0127-3
DOI: https://doi.org/10.1007/s11280-011-0127-3