DOI: 10.1145/1645953.1646235
poster

A co-classification framework for detecting web spam and spammers in social media web sites

Published: 02 November 2009

Abstract

Social media are becoming increasingly popular and have attracted considerable attention from spammers. Using a sample of more than ninety thousand known spam Web sites, we found that between 7% and 18% of their URLs are posted on two popular social media Web sites, digg.com and delicious.com. In this paper, we present a co-classification framework to detect Web spam and the spammers who are responsible for posting it on social media Web sites. The rationale for our approach is that, since the two detection tasks are related, it is advantageous to train the two classifiers simultaneously so that both make use of the labeled examples in the Web spam and spammer training data. We have evaluated the effectiveness of our algorithm on the delicious.com data set. Our experimental results show that the proposed co-classification algorithm significantly outperforms classifiers that learn each detection task independently.
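
The idea of training the two detectors together can be illustrated with a minimal sketch. This is not the algorithm from the paper, which the abstract does not specify; it is one possible formulation, assuming two regularized least-squares linear classifiers (one for URLs, one for users) coupled by a penalty that pulls a user's score toward the scores of the URLs that user posted. The function name `co_classify`, the feature matrices `A` and `B`, the posting matrix `P`, and the weights `lam` and `mu` are all hypothetical.

```python
import numpy as np

def co_classify(A, y_a, B, y_b, P, lam=1.0, mu=0.1):
    """Jointly fit two linear least-squares classifiers (illustrative only).

    A   : (n_urls, d_a) URL feature matrix
    y_a : (n_urls,) URL labels, +1 spam, -1 non-spam, 0 unlabeled
    B   : (n_users, d_b) user feature matrix
    y_b : (n_users,) user labels, +1 spammer, -1 legitimate, 0 unlabeled
    P   : (n_users, n_urls) binary matrix, P[i, j] = 1 if user i posted URL j
    lam : ridge penalty on both weight vectors (must be > 0)
    mu  : strength of the coupling between the two tasks

    Minimizes
        ||A_L w_a - y_a||^2 + ||B_L w_b - y_b||^2
        + lam (||w_a||^2 + ||w_b||^2)
        + mu * sum_ij P[i, j] (B[i] @ w_b - A[j] @ w_a)^2
    where A_L, B_L are the labeled rows. The objective is quadratic,
    so the minimizer solves a single linear system.
    """
    la, lb = y_a != 0, y_b != 0            # masks of labeled examples
    Du = np.diag(P.sum(axis=1))            # posts per user
    Dv = np.diag(P.sum(axis=0))            # posts per URL
    da, db = A.shape[1], B.shape[1]

    # Blocks of the normal equations obtained by setting the gradient to zero.
    Haa = A[la].T @ A[la] + lam * np.eye(da) + mu * (A.T @ Dv @ A)
    Hbb = B[lb].T @ B[lb] + lam * np.eye(db) + mu * (B.T @ Du @ B)
    Hab = -mu * (A.T @ P.T @ B)            # cross-task coupling, shape (d_a, d_b)

    H = np.block([[Haa, Hab], [Hab.T, Hbb]])
    rhs = np.concatenate([A[la].T @ y_a[la], B[lb].T @ y_b[lb]])
    w = np.linalg.solve(H, rhs)
    return w[:da], w[da:]                  # URL weights, user weights
```

With `mu=0` the coupling term vanishes and the sketch reduces to two independently trained ridge classifiers, which corresponds to the independent baseline the abstract compares against. Predictions would be taken as `np.sign(A @ w_a)` for URLs and `np.sign(B @ w_b)` for users.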

      Published In

      CIKM '09: Proceedings of the 18th ACM conference on Information and knowledge management
      November 2009
      2162 pages
      ISBN:9781605585123
      DOI:10.1145/1645953
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. classification
      2. social media
      3. web spam

      Qualifiers

      • Poster

      Conference

      CIKM '09
      Acceptance Rates

      Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

      Cited By

      • (2024) A Survey on the Applications of Semi-supervised Learning to Cyber-security. ACM Computing Surveys 56:10 (1-41). DOI: 10.1145/3657647. Online publication date: 22-Jun-2024
      • (2020) A Survey of Sentiment Analysis from Social Media Data. IEEE Transactions on Computational Social Systems 7:2 (450-464). DOI: 10.1109/TCSS.2019.2956957. Online publication date: Apr-2020
      • (2019) Comment Spam Detection via Effective Features Combination. ICC 2019 - 2019 IEEE International Conference on Communications (ICC) (1-6). DOI: 10.1109/ICC.2019.8761340. Online publication date: May-2019
      • (2018) Semi-Supervised Collaborative Learning for Social Spammer and Spam Message Detection in Microblogging. Proceedings of the 27th ACM International Conference on Information and Knowledge Management (1791-1794). DOI: 10.1145/3269206.3269324. Online publication date: 17-Oct-2018
      • (2018) A Lexicon Generation Method for Aspect-Based Opinion Mining. 2018 IEEE 22nd International Conference on Intelligent Engineering Systems (INES) (000107-000112). DOI: 10.1109/INES.2018.8523897. Online publication date: Jun-2018
      • (2018) Facilitating apps recommendation in Google Play. The Electronic Library 36:5 (856-874). DOI: 10.1108/EL-05-2017-0119. Online publication date: Oct-2018
      • (2016) Detecting Spam and Promoting Campaigns in Twitter. ACM Transactions on the Web 10:1 (1-28). DOI: 10.1145/2846102. Online publication date: 8-Feb-2016
      • (2016) Co-detecting social spammers and spam messages in microblogging via exploiting social contexts. Neurocomputing 201:C (51-65). DOI: 10.1016/j.neucom.2016.03.036. Online publication date: 12-Aug-2016
      • (2016) Recent developments in social spam detection and combating techniques. Information Processing and Management: an International Journal 52:6 (1053-1073). DOI: 10.1016/j.ipm.2016.04.009. Online publication date: 1-Nov-2016
      • (2016) A Social Spam Detection Framework via Semi-supervised Learning. Revised Selected Papers of the PAKDD 2016 Workshops on Trends and Applications in Knowledge Discovery and Data Mining - Volume 9794 (214-226). DOI: 10.1007/978-3-319-42996-0_18. Online publication date: 19-Apr-2016
