DOI: 10.1145/1451983.1451998
research-article

The anti-social tagger: detecting spam in social bookmarking systems

Published: 22 April 2008

Abstract

The annotation of web sites in social bookmarking systems has become a popular way to manage and find information on the web. The community structure of such systems attracts spammers: recent post pages, popular pages or specific tag pages can easily be manipulated. As a result, searching or tracking recent posts no longer delivers quality resources annotated by the community, but rather unsolicited, often commercial, web sites. To retain the benefits of sharing one's web content, spam-fighting mechanisms that can cope with the flexible strategies of spammers need to be developed.
A classical approach in machine learning is to determine relevant features that describe the system's users, train different classifiers with the selected features, and choose the one with the most promising evaluation results. In this paper we transfer this approach to a social bookmarking setting to identify spammers. We present features based on the topological, semantic and profile-based information that people make public when using the system. The dataset is a snapshot of the social bookmarking system BibSonomy and was built over several months while the system was being cleaned of spam. Based on our features, we learn a large set of different classification models and compare their performance. Our results lay the groundwork for a first application in BibSonomy and for building more elaborate spam detection mechanisms.
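The abstract describes a generic pipeline: describe each user by a feature vector, train several classifiers, and keep the one with the best evaluation result. The following is a minimal pure-Python sketch of that selection loop under stated assumptions. The three-value feature vectors and the toy data are hypothetical placeholders standing in for the topological, semantic and profile-based features; the three stand-in models (logistic regression, nearest centroid, a single-feature baseline) are illustrative and are not the classifiers evaluated in the paper; ROC AUC is assumed as the comparison metric.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical per-user feature vector, e.g. [topological, semantic, profile].
# These placeholders are NOT the features defined in the paper.
Vector = Sequence[float]
Scorer = Callable[[Vector], float]  # higher score = more spam-like
Trainer = Callable[[List[Vector], List[int]], Scorer]


def auc(scores: List[float], labels: List[int]) -> float:
    """ROC AUC: probability that a random spammer (label 1) scores higher
    than a random legitimate user (label 0); ties count as 0.5."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def train_logistic(X: List[Vector], y: List[int],
                   lr: float = 0.5, epochs: int = 500) -> Scorer:
    """Logistic regression fitted by batch gradient descent on the log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, ti in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - ti  # sigmoid(z) - label
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return lambda x: sum(wj * xj for wj, xj in zip(w, x)) + b


def train_centroid(X: List[Vector], y: List[int]) -> Scorer:
    """Nearest centroid: positive scores are closer to the spam centroid."""
    def mean(rows: List[Vector]) -> List[float]:
        return [sum(col) / len(rows) for col in zip(*rows)]
    spam = mean([x for x, t in zip(X, y) if t == 1])
    ham = mean([x for x, t in zip(X, y) if t == 0])
    return lambda x: math.dist(x, ham) - math.dist(x, spam)


def train_first_feature(X: List[Vector], y: List[int]) -> Scorer:
    """Baseline that ignores training and scores by the first feature only."""
    return lambda x: x[0]


def select_best(trainers: Dict[str, Trainer],
                X_train: List[Vector], y_train: List[int],
                X_test: List[Vector], y_test: List[int]) -> Tuple[str, Dict[str, float]]:
    """Train every candidate, evaluate AUC on held-out users, keep the best."""
    results: Dict[str, float] = {}
    for name, trainer in trainers.items():
        scorer = trainer(X_train, y_train)
        results[name] = auc([scorer(x) for x in X_test], y_test)
    best = max(results, key=results.get)
    return best, results


TRAINERS: Dict[str, Trainer] = {
    "logistic": train_logistic,
    "centroid": train_centroid,
    "first_feature": train_first_feature,
}

# Toy, made-up data (1 = spammer, 0 = legitimate user).
X_TRAIN = [[0.9, 0.8, 1.0], [0.8, 0.9, 0.0], [0.7, 0.6, 1.0],
           [0.1, 0.2, 0.0], [0.2, 0.1, 1.0], [0.3, 0.2, 0.0]]
Y_TRAIN = [1, 1, 1, 0, 0, 0]
X_TEST = [[0.85, 0.7, 1.0], [0.15, 0.3, 0.0], [0.6, 0.9, 0.0], [0.4, 0.1, 1.0]]
Y_TEST = [1, 0, 1, 0]

if __name__ == "__main__":
    best, results = select_best(TRAINERS, X_TRAIN, Y_TRAIN, X_TEST, Y_TEST)
    for name, score in results.items():
        print(f"{name}: AUC = {score:.3f}")
    print("selected:", best)
```

AUC is a natural choice for this kind of comparison because it evaluates how well each model ranks spammers above legitimate users without fixing a single decision threshold. With a real dataset, the held-out evaluation would use many more users (or cross-validation) rather than four toy test points.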


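Cited By

  • (2021) Finding Place in a Design Space. Proceedings of the ACM on Human-Computer Interaction 5:CSCW1 (1-30). DOI: 10.1145/3449246. Online publication date: 22-Apr-2021
  • (2018) Folksonomies. Encyclopedia of Social Network Analysis and Mining (851-855). DOI: 10.1007/978-1-4939-7131-2_110. Online publication date: 12-Jun-2018
  • (2017) Utilizing human processing for fuzzy-based military situation awareness based on social media. 2017 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE) (1-6). DOI: 10.1109/FUZZ-IEEE.2017.8015709. Online publication date: Jul-2017
  • (2017) Folksonomies. Encyclopedia of Social Network Analysis and Mining (1-5). DOI: 10.1007/978-1-4614-7163-9_110-1. Online publication date: 22-Jun-2017
  • (2016) Resisting tag spam by leveraging implicit user behaviors. Proceedings of the VLDB Endowment 10:3 (241-252). DOI: 10.14778/3021924.3021939. Online publication date: 1-Nov-2016
  • (2016) You shall not pass. Proceedings of the 1st International Workshop on Online Safety, Trust and Fraud Prevention (1-6). DOI: 10.1145/2915368.2915370. Online publication date: 22-May-2016
  • (2016) What Users Actually Do in a Social Tagging System. ACM Transactions on the Web 10:2 (1-32). DOI: 10.1145/2896821. Online publication date: 20-May-2016
  • (2016) Detecting Spam and Promoting Campaigns in Twitter. ACM Transactions on the Web 10:1 (1-28). DOI: 10.1145/2846102. Online publication date: 8-Feb-2016
  • (2016) Semantics discovery in social tagging systems. Multimedia Tools and Applications 75:1 (573-605). DOI: 10.1007/s11042-014-2309-3. Online publication date: 1-Jan-2016
  • (2016) Improvement of collaborative filtering using rating normalization. Multimedia Tools and Applications 75:9 (4957-4968). DOI: 10.1007/s11042-013-1814-0. Online publication date: 1-May-2016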


      Published In

      ACM Other conferences
      AIRWeb '08: Proceedings of the 4th international workshop on Adversarial information retrieval on the web
      April 2008
      81 pages
      ISBN: 9781605581590
      DOI: 10.1145/1451983

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. folksonomy
      2. spam detection
      3. web 2.0


      Conference

      AIRWeb'08

