DOI: 10.1145/3133956.3133990

Automated Crowdturfing Attacks and Defenses in Online Review Systems

Published: 30 October 2017

Abstract

Malicious crowdsourcing forums are gaining traction as sources of online misinformation, but they are limited by the costs of hiring and managing human workers. In this paper, we identify a new class of attacks that leverage deep learning language models (Recurrent Neural Networks, or RNNs) to automate the generation of fake online reviews for products and services. Not only are these attacks cheap and therefore more scalable, but they can also control the rate of content output to eliminate the signature burstiness that makes crowdsourced campaigns easy to detect.
Using Yelp reviews as an example platform, we show how a two-phased review generation and customization attack can produce reviews that state-of-the-art statistical detectors cannot distinguish from real ones. We conduct a survey-based user study to show that these reviews not only evade human detection, but also score high on "usefulness" metrics by users. Finally, we develop novel automated defenses against these attacks by leveraging the lossy transformation introduced by the RNN training and generation cycle. We consider countermeasures against our mechanisms, show that they produce unattractive cost-benefit tradeoffs for attackers, and show that they can be further curtailed by simple constraints imposed by online service providers.

Published In

CCS '17: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security
October 2017
2682 pages
ISBN:9781450349468
DOI:10.1145/3133956
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 October 2017

Author Tags

  1. crowdturfing
  2. fake review
  3. opinion spam
  4. web security

Qualifiers

  • Research-article

Conference

CCS '17

Acceptance Rates

CCS '17 Paper Acceptance Rate 151 of 836 submissions, 18%;
Overall Acceptance Rate 1,261 of 6,999 submissions, 18%

Cited By

  • (2024) A Sampling-Based Method for Detecting Data Poisoning Attacks in Recommendation Systems. Mathematics 12:2 (247). DOI: 10.3390/math12020247. Online publication date: 12-Jan-2024
  • (2024) A multiview clustering framework for detecting deceptive reviews. Journal of Computer Security 32:1 (31-52). DOI: 10.3233/JCS-220001. Online publication date: 2-Feb-2024
  • (2024) Engineered Features for Personal Textual Language Style Clusters. Proceedings of the 17th International Conference on PErvasive Technologies Related to Assistive Environments (273-276). DOI: 10.1145/3652037.3663954. Online publication date: 26-Jun-2024
  • (2024) Understanding Underground Incentivized Review Services. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (1-18). DOI: 10.1145/3613904.3642342. Online publication date: 11-May-2024
  • (2024) SockDef: A Dynamically Adaptive Defense to a Novel Attack on Review Fraud Detection Engines. IEEE Transactions on Computational Social Systems 11:4 (5253-5265). DOI: 10.1109/TCSS.2023.3321345. Online publication date: Aug-2024
  • (2024) AI-Driven Solutions for Social Engineering Attacks: Detection, Prevention, and Response. 2024 2nd International Conference on Cyber Resilience (ICCR) (1-8). DOI: 10.1109/ICCR61006.2024.10533010. Online publication date: 26-Feb-2024
  • (2023) Measuring and Understanding Crowdturfing in the App Store. Information 14:7 (393). DOI: 10.3390/info14070393. Online publication date: 11-Jul-2023
  • (2023) STUDY OF ARTIFICIAL INTELLIGENCE IN CYBER SECURITY AND THE EMERGING THREAT OF AI-DRIVEN CYBER ATTACKS AND CHALLENGE. SSRN Electronic Journal. DOI: 10.2139/ssrn.4652028. Online publication date: 2023
  • (2023) Unethical but not illegal! A critical look at two-sided disinformation platforms: Justifications, critique, and a way forward. Journal of Information Technology 39:3 (441-476). DOI: 10.1177/02683962231181145. Online publication date: 20-Jun-2023
  • (2023) A Novel Review Helpfulness Measure Based on the User-Review-Item Paradigm. ACM Transactions on the Web 17:4 (1-31). DOI: 10.1145/3585280. Online publication date: 11-Jul-2023
