Automated Crowdturfing Attacks and Defenses in Online Review Systems

Authors:

Bimal Viswanath,

Ben Y. Zhao

CCS '17: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security

Pages 1143 - 1158

https://doi.org/10.1145/3133956.3133990

Published: 30 October 2017

Abstract

Malicious crowdsourcing forums are gaining traction as sources of misinformation online, but they are limited by the costs of hiring and managing human workers. In this paper, we identify a new class of attacks that leverage deep learning language models (Recurrent Neural Networks, or RNNs) to automate the generation of fake online reviews for products and services. Not only are these attacks cheap and therefore more scalable, but they can also control the rate of content output to eliminate the signature burstiness that makes crowdsourced campaigns easy to detect.

Using Yelp as an example platform, we show how a two-phased review generation and customization attack can produce reviews that state-of-the-art statistical detectors cannot distinguish from real ones. We conduct a survey-based user study to show that these reviews not only evade human detection but also score high on "usefulness" metrics rated by users. Finally, we develop novel automated defenses against these attacks by leveraging the lossy transformation introduced by the RNN training and generation cycle. We consider countermeasures against our mechanisms, show that they produce unattractive cost-benefit tradeoffs for attackers, and show that they can be further curtailed by simple constraints imposed by online service providers.
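
The abstract does not spell out how the defense exploits the "lossy transformation" of the RNN cycle. As a rough illustration only, the sketch below shows one way such a loss could surface statistically: it compares the character frequency distribution of a candidate text against a reference corpus using KL divergence. The function names, the alphabet, the smoothing constant, and the sample strings are all assumptions made for this example and are not taken from the paper; this is not the authors' detector.

```python
import math
from collections import Counter

# Hypothetical alphabet for the illustration: lowercase letters plus space.
ALPHABET = "abcdefghijklmnopqrstuvwxyz "


def char_distribution(texts, alphabet=ALPHABET, smoothing=1.0):
    """Return a smoothed character probability distribution over `alphabet`.

    Characters outside the alphabet are ignored. Additive (Laplace)
    smoothing keeps every probability strictly positive.
    """
    counts = Counter(
        ch for text in texts for ch in text.lower() if ch in alphabet
    )
    total = sum(counts.values()) + smoothing * len(alphabet)
    return {ch: (counts[ch] + smoothing) / total for ch in alphabet}


def kl_divergence(p, q):
    """KL divergence D(p || q) for two distributions over the same keys."""
    return sum(p[ch] * math.log(p[ch] / q[ch]) for ch in p)


if __name__ == "__main__":
    # Made-up sample strings; a real reference set would be a large corpus.
    reference = char_distribution(
        ["the food was great and the service was friendly"]
    )
    candidate = char_distribution(["great great great food food food"])
    # A larger score means the candidate's character statistics deviate
    # more from the reference distribution.
    print(kl_divergence(candidate, reference))
```

A single divergence score like this is only a toy signal; deciding where to set a threshold, and whether character statistics are the right feature at all, would need the evaluation described in the full paper.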

Index Terms

Automated Crowdturfing Attacks and Defenses in Online Review Systems
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Natural language generation
  2. Machine learning
    1. Machine learning approaches
      1. Neural networks
2. Security and privacy
  1. Human and societal aspects of security and privacy
    1. Social aspects of security and privacy

Published In

CCS '17: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security

October 2017

2682 pages

ISBN: 9781450349468

DOI: 10.1145/3133956

General Chair: Bhavani Thuraisingham (The University of Texas at Dallas, USA)

Program Chairs: David Evans (University of Virginia), Tal Malkin (Columbia University), Dongyan Xu (Purdue University)

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGSAC: ACM Special Interest Group on Security, Audit, and Control

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

NSF

Conference

CCS '17

Sponsor:

SIGSAC

CCS '17: 2017 ACM SIGSAC Conference on Computer and Communications Security

October 30 - November 3, 2017

Dallas, Texas, USA

Acceptance Rates

CCS '17 paper acceptance rate: 151 of 836 submissions (18%)

Overall acceptance rate: 1,261 of 6,999 submissions (18%)
