DOI: 10.1145/2487575.2487580
poster

Spotting opinion spammers using behavioral footprints

Published: 11 August 2013
  • Abstract

    Opinionated social media such as product reviews are now widely used by individuals and organizations for decision making. However, for profit or fame, people try to game the system by opinion spamming (e.g., writing fake reviews) to promote or demote target products. In recent years, fake review detection has attracted significant attention from both the business and research communities. However, because the human labeling needed for supervised learning and evaluation is difficult, the problem remains highly challenging. This work proposes a novel angle on the problem by modeling spamicity as latent. An unsupervised model, called the Author Spamicity Model (ASM), is proposed. It works in the Bayesian setting, which facilitates modeling the spamicity of authors as latent and allows us to exploit various observed behavioral footprints of reviewers. The intuition is that opinion spammers have different behavioral distributions than non-spammers. This creates a distributional divergence between the latent population distributions of two clusters: spammers and non-spammers. Model inference results in learning the population distributions of the two clusters. Several extensions of ASM that leverage different priors are also considered. Experiments on a real-life Amazon review dataset demonstrate the effectiveness of the proposed models, which significantly outperform state-of-the-art competitors.
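
The two-cluster intuition in the abstract can be illustrated with a minimal sketch. This is not the paper's ASM, whose features, priors, and inference procedure are not given on this page. It is a simplified stand-in that assumes binary behavior flags per author and fits a two-cluster Bernoulli mixture with plain EM. The function names, the synthetic flag rates, and the choice of EM are all assumptions made for illustration.

```python
import math
import random


def fit_two_cluster_mixture(X, n_iter=50):
    """EM for a two-cluster Bernoulli mixture over binary behavior flags.

    X is a list of authors, each a list of 0/1 flags (hypothetical features).
    Returns (pi, theta, resp): pi is the estimated share of cluster 1,
    theta[k][f] = P(flag f = 1 | cluster k), resp[i] = P(cluster 1 | author i).
    """
    n_feat = len(X[0])
    pi = 0.5
    # Asymmetric start so that cluster 1 becomes the "more flags" cluster.
    theta = [[0.3] * n_feat, [0.7] * n_feat]
    resp = [0.5] * len(X)
    for _ in range(n_iter):
        # E-step: posterior probability that each author is in cluster 1.
        for i, x in enumerate(X):
            log_p = []
            for k, prior in enumerate((1.0 - pi, pi)):
                lp = math.log(prior)
                for f, v in enumerate(x):
                    p = theta[k][f]
                    lp += math.log(p if v else 1.0 - p)
                log_p.append(lp)
            m = max(log_p)  # subtract the max for numerical stability
            w0, w1 = math.exp(log_p[0] - m), math.exp(log_p[1] - m)
            resp[i] = w1 / (w0 + w1)
        # M-step: re-estimate the cluster share and per-cluster flag rates,
        # with add-one smoothing to keep probabilities away from 0 and 1.
        n1 = sum(resp)
        n0 = len(X) - n1
        pi = (n1 + 1.0) / (len(X) + 2.0)
        for f in range(n_feat):
            s1 = sum(r * x[f] for r, x in zip(resp, X))
            s0 = sum((1.0 - r) * x[f] for r, x in zip(resp, X))
            theta[1][f] = (s1 + 1.0) / (n1 + 2.0)
            theta[0][f] = (s0 + 1.0) / (n0 + 2.0)
    return pi, theta, resp


def make_synthetic_authors(n_normal=200, n_spam=50, seed=0):
    """Generate toy authors; the flag rates below are invented for the demo."""
    rng = random.Random(seed)
    normal_rates = [0.1, 0.2, 0.1, 0.15, 0.1]
    spam_rates = [0.8, 0.9, 0.7, 0.8, 0.85]
    X = [[int(rng.random() < p) for p in normal_rates] for _ in range(n_normal)]
    X += [[int(rng.random() < p) for p in spam_rates] for _ in range(n_spam)]
    labels = [0] * n_normal + [1] * n_spam
    return X, labels


if __name__ == "__main__":
    X, labels = make_synthetic_authors()
    pi, theta, resp = fit_two_cluster_mixture(X)
    print("estimated share of cluster 1:", round(pi, 3))
    print("flag rates, cluster 0:", [round(t, 2) for t in theta[0]])
    print("flag rates, cluster 1:", [round(t, 2) for t in theta[1]])
```

In this sketch, `resp[i]` plays the role of a per-author spamicity score. The labels are used only to generate and check the toy data, never during fitting, which mirrors the unsupervised setting the abstract describes.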

    Published In

    KDD '13: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining
    August 2013
    1534 pages
    ISBN:9781450321747
    DOI:10.1145/2487575
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. abuse
    2. deceptive and fake reviewer detection
    3. opinion spam

    Qualifiers

    • Poster

    Conference

    KDD '13

    Acceptance Rates

    KDD '13 Paper Acceptance Rate 125 of 726 submissions, 17%;
    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

    Article Metrics

    • Downloads (Last 12 months): 69
    • Downloads (Last 6 weeks): 9
    Reflects downloads up to 12 Aug 2024

    Cited By

    • (2024) Multi-Layer Ranking with Large Language Models for News Source Recommendation. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2537-2542. DOI: 10.1145/3626772.3657966. Online publication date: 10-Jul-2024
    • (2024) Spatio-Temporal Graph Representation Learning for Fraudster Group Detection. IEEE Transactions on Neural Networks and Learning Systems 35:5, 6628-6642. DOI: 10.1109/TNNLS.2022.3212001. Online publication date: May-2024
    • (2024) Predicting Credibility of Online Reviews: An Integrated Approach. IEEE Access 12, 49050-49061. DOI: 10.1109/ACCESS.2024.3383846. Online publication date: 2024
    • (2024) Detecting Fake Online Reviews: An Unsupervised Detection Method With a Novel Performance Evaluation. International Journal of Electronic Commerce, 1-24. DOI: 10.1080/10864415.2023.2295067. Online publication date: 3-Jan-2024
    • (2024) Stacking GA2M for inherently interpretable fraudulent reviewer identification by fusing target and non-target features. International Journal of General Systems, 1-36. DOI: 10.1080/03081079.2024.2384404. Online publication date: 28-Jul-2024
    • (2024) Collusive spam detection from Chinese community question answering sites. Information Sciences: an International Journal 667:C. DOI: 10.1016/j.ins.2024.120379. Online publication date: 1-May-2024
    • (2024) SUH-AIFRD: A self-training-based hybrid approach for individual fake reviewer detection. Multimedia Tools and Applications 83:26, 67643-67671. DOI: 10.1007/s11042-024-18192-1. Online publication date: 26-Jan-2024
    • (2024) DHMFRD – TER: a deep hybrid model for fake review detection incorporating review texts, emotions, and ratings. Multimedia Tools and Applications 83:2, 4533-4549. DOI: 10.1007/s11042-023-15193-4. Online publication date: 1-Jan-2024
    • (2024) Fake review detection techniques, issues, and future research directions: a literature review. Knowledge and Information Systems 66:9, 5071-5112. DOI: 10.1007/s10115-024-02118-2. Online publication date: 17-May-2024
    • (2023) Detecting fake reviewers in heterogeneous networks of buyers and sellers: a collaborative training-based spammer group algorithm. Cybersecurity 6:1. DOI: 10.1186/s42400-023-00159-8. Online publication date: 2-Oct-2023
