DOI: 10.1145/3357384.3358034

Wide-Ranging Review Manipulation Attacks: Model, Empirical Study, and Countermeasures

Published: 03 November 2019

Abstract

User reviews have become a cornerstone of how we make decisions. However, this user-based feedback is susceptible to manipulation, as recent research has shown the feasibility of automatically generating fake reviews. Previous investigations have focused on generative fake review approaches that are (i) domain dependent and cannot be extended to other domains without replicating the whole process from scratch; and (ii) character-level based, and therefore known to generate poor-quality reviews that are easily detected by anti-spam detectors and by end users. In this work, we propose and evaluate a new class of attacks on online review platforms based on neural language models at word-level granularity in an inductive transfer-learning framework, wherein a universal model is refined to handle domain shift, leading to potentially wide-ranging attacks on review systems. Through extensive evaluation, we show that such model-generated reviews can bypass powerful anti-spam detectors and fool end users. Paired with this troubling attack vector, we propose a new defense mechanism that exploits the distributed representation of these reviews to detect model-generated reviews. We conclude that, despite the success of neural models in generating realistic reviews, our proposed RNN-based discriminator can combat this type of attack effectively (90% accuracy).
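
To make the defense idea concrete, the following is a minimal, untrained sketch of what an RNN-based discriminator over word embeddings might look like. It is not the authors' model: the abstract does not specify the architecture, so the Elman-style recurrence, the layer sizes, the toy vocabulary, and the class name `TinyRNNDiscriminator` are all assumptions made for illustration. With random weights the score carries no meaning; a real detector would be trained on labeled human-written and model-generated reviews.

```python
import math
import random


def make_matrix(rows, cols, rng, scale=0.1):
    """Return a rows x cols matrix of small random weights."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class TinyRNNDiscriminator:
    """Hypothetical, untrained Elman-style RNN that scores a review."""

    def __init__(self, vocab, embed_dim=8, hidden_dim=16, seed=0):
        rng = random.Random(seed)
        # Index 0 is reserved for out-of-vocabulary words.
        self.word_to_id = {w: i + 1 for i, w in enumerate(vocab)}
        # One dense vector per word: the distributed representation.
        self.embeddings = make_matrix(len(vocab) + 1, embed_dim, rng)
        self.w_in = make_matrix(hidden_dim, embed_dim, rng)
        self.w_rec = make_matrix(hidden_dim, hidden_dim, rng)
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]
        self.hidden_dim = hidden_dim

    def encode(self, review):
        """Run the recurrence over the words and return the final hidden state."""
        hidden = [0.0] * self.hidden_dim
        for word in review.lower().split():
            x = self.embeddings[self.word_to_id.get(word, 0)]
            # h_t = tanh(W_in x_t + W_rec h_{t-1})
            hidden = [
                math.tanh(
                    sum(wi * xi for wi, xi in zip(self.w_in[j], x))
                    + sum(wr * hr for wr, hr in zip(self.w_rec[j], hidden))
                )
                for j in range(self.hidden_dim)
            ]
        return hidden

    def score(self, review):
        """Return a value in (0, 1), read as P(model-generated) once trained."""
        hidden = self.encode(review)
        logit = sum(w * h for w, h in zip(self.w_out, hidden))
        return 1.0 / (1.0 + math.exp(-logit))


# Toy vocabulary, assumed for illustration only.
vocab = ["the", "food", "was", "great", "service", "slow"]
model = TinyRNNDiscriminator(vocab)
probability = model.score("The food was great")
```

In this sketch the final hidden state serves as the review's representation and a single sigmoid unit turns it into a score; training (for example, with a cross-entropy loss and backpropagation through time) is deliberately left out.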

Published In

CIKM '19: Proceedings of the 28th ACM International Conference on Information and Knowledge Management
November 2019
3373 pages
ISBN:9781450369763
DOI:10.1145/3357384

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. crowd attacks
  2. discriminator
  3. fake review detection
  4. fake reviews
  5. model-generated reviews
  6. online attacks
  7. spam detector

Qualifiers

  • Research-article

Conference

CIKM '19

Acceptance Rates

CIKM '19 paper acceptance rate: 202 of 1,031 submissions (20%)
Overall acceptance rate: 1,861 of 8,427 submissions (22%)

Article Metrics

  • Downloads (last 12 months): 105
  • Downloads (last 6 weeks): 10
Reflects downloads up to 07 Mar 2025.

Cited By

  • (2025) Evaluating Language Model Vulnerability to Poisoning Attacks in Low-Resource Settings. IEEE Transactions on Audio, Speech and Language Processing 33 (54-67). DOI: 10.1109/TASLP.2024.3507565. Online publication date: 2025
  • (2025) Multiplex graph fusion network with reinforcement structure learning for fraud detection in online e-commerce platforms. Expert Systems with Applications 262 (125598). DOI: 10.1016/j.eswa.2024.125598. Online publication date: Mar-2025
  • (2024) ChatGPT paraphrased product reviews can confuse consumers and undermine their trust in genuine reviews. Can you tell the difference? Information Processing and Management: an International Journal 61:6. DOI: 10.1016/j.ipm.2024.103842. Online publication date: 1-Nov-2024
  • (2023) Improving fraud detection via imbalanced graph structure learning. Machine Learning 113:3 (1069-1090). DOI: 10.1007/s10994-023-06464-0. Online publication date: 29-Nov-2023
  • (2022) IDGL: An Imbalanced Disassortative Graph Learning Framework for Fraud Detection. Service-Oriented Computing (616-631). DOI: 10.1007/978-3-031-20984-0_44. Online publication date: 22-Nov-2022
