Research article. DOI: 10.1145/3366423.3380055

Detecting Undisclosed Paid Editing in Wikipedia

Published: 20 April 2020

Abstract

Wikipedia, the free online encyclopedia built on open collaboration, has millions of pages maintained by thousands of volunteer editors. Under Wikipedia’s fundamental principles, pages are written from a neutral point of view and maintained by unpaid volunteers, with well-defined guidelines for avoiding or disclosing any conflict of interest. However, there have been several known incidents in which editors intentionally violated these guidelines to get paid (or even to extort money) for maintaining promotional spam articles without disclosing the payment.
In this paper, we address for the first time the problem of identifying undisclosed paid articles in Wikipedia. We propose a machine learning-based framework that uses features based on both the content of the articles and the edit-history patterns of the users who create them. To test our approach, we collected and curated a new dataset from English Wikipedia with ground truth on undisclosed paid articles. Our experimental evaluation shows that we can identify undisclosed paid articles with an AUROC of 0.98 and an average precision of 0.91. Moreover, our approach outperforms ORES, the scoring system Wikipedia currently uses to automatically detect damaging content, in identifying undisclosed paid articles. Finally, we show that our user-based features can also detect undisclosed paid editors with an AUROC of 0.94 and an average precision of 0.92, outperforming existing approaches.
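
For readers unfamiliar with the two metrics reported in the abstract, the following is a minimal sketch of how AUROC and average precision can be computed from classifier scores. It is not the authors' evaluation code: the function names, the toy labels, and the toy scores are all invented for illustration, and ties between scores are handled in a simplified way.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive is scored above a random negative.

    Tied scores count as half a win. Labels are 1 (e.g. undisclosed paid)
    or 0 (benign); this encoding is an assumption of the sketch.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of precision@k over every rank k that holds a positive item.

    Items are ranked by descending score; tied scores keep their input
    order (a simplification of this sketch).
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    precision_sum = 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precision_sum += hits / k
    if hits == 0:
        raise ValueError("need at least one positive")
    return precision_sum / hits


# Hypothetical toy data: 1 = positive class, 0 = negative class.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]

print(round(auroc(toy_labels, toy_scores), 3))              # 0.889
print(round(average_precision(toy_labels, toy_scores), 3))  # 0.917
```

AUROC depends only on how positives rank relative to negatives, while average precision rewards placing positives near the top of the ranking, which is why the two are often reported together when one class is rare.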



Published In

WWW '20: Proceedings of The Web Conference 2020
April 2020, 3143 pages
ISBN: 9781450370233
DOI: 10.1145/3366423
Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

1. Detection of abusive content
2. Malicious editors
3. Sockpuppet accounts
4. Wikipedia

Qualifiers

• Research-article
• Research
• Refereed limited

Conference

WWW '20: The Web Conference 2020
April 20 - 24, 2020
Taipei, Taiwan

Acceptance Rates

Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%
