The anti-social tagger: detecting spam in social bookmarking systems

Authors: Christoph Schmitz, Gerd Stumme

AIRWeb '08: Proceedings of the 4th international workshop on Adversarial information retrieval on the web

Pages 61 - 68

https://doi.org/10.1145/1451983.1451998

Published: 22 April 2008

Abstract

The annotation of web sites in social bookmarking systems has become a popular way to manage and find information on the web. The community structure of such systems attracts spammers: pages listing recent posts, popular pages and pages for specific tags can be manipulated easily. As a result, searching or tracking recent posts returns unsolicited, often commercial, web sites rather than the quality results annotated by the community. To retain the benefits of sharing one's web content, spam-fighting mechanisms that can cope with the flexible strategies of spammers need to be developed.

A classical approach in machine learning is to determine relevant features that describe the system's users, train different classifiers on the selected features, and choose the one with the most promising evaluation results. In this paper we transfer this approach to a social bookmarking setting to identify spammers. We present features that draw on the topological, semantic and profile-based information which people make public when using the system. The dataset is a snapshot of the social bookmarking system BibSonomy, built over the course of several months while spam was being cleaned from the system. Based on our features, we learn a large set of different classification models and compare their performance. Our results represent the groundwork for a first application in BibSonomy and for the building of more elaborate spam detection mechanisms.
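The feature-then-compare workflow described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three toy features, the data values, the three simple classifiers and the use of accuracy as the evaluation measure are hypothetical and are not the features, models or metrics of the paper.

```python
from statistics import mean

# Hypothetical toy data: (feature_vector, label), label 1 = spammer, 0 = legitimate.
# Illustrative features only: [posts per day, avg. tags per post, digits in user name]
TRAIN = [
    ([40.0, 12.0, 4.0], 1),
    ([55.0, 9.0, 6.0], 1),
    ([30.0, 15.0, 3.0], 1),
    ([2.0, 3.0, 0.0], 0),
    ([1.0, 2.0, 1.0], 0),
    ([4.0, 4.0, 0.0], 0),
]
TEST = [
    ([35.0, 10.0, 5.0], 1),
    ([3.0, 2.0, 0.0], 0),
    ([50.0, 14.0, 2.0], 1),
    ([5.0, 5.0, 1.0], 0),
]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_majority(train):
    """Baseline: always predict the most frequent training label."""
    labels = [y for _, y in train]
    majority = max(sorted(set(labels)), key=labels.count)
    return lambda x: majority


def train_centroid(train):
    """Nearest-centroid classifier: one mean vector per class."""
    centroids = {}
    for label in sorted({y for _, y in train}):
        rows = [x for x, y in train if y == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return lambda x: min(centroids, key=lambda c: sq_dist(x, centroids[c]))


def train_1nn(train):
    """1-nearest-neighbour classifier: copy the label of the closest user."""
    return lambda x: min(train, key=lambda row: sq_dist(x, row[0]))[1]


def accuracy(model, data):
    """Fraction of users in `data` that the model labels correctly."""
    return sum(model(x) == y for x, y in data) / len(data)


def select_best(trainers, train, test):
    """Train every candidate, score it on held-out data, return the best."""
    scores = {name: accuracy(fit(train), test) for name, fit in trainers.items()}
    best = max(scores, key=scores.get)
    return best, scores


TRAINERS = {
    "majority": train_majority,
    "centroid": train_centroid,
    "1nn": train_1nn,
}
best_name, scores = select_best(TRAINERS, TRAIN, TEST)
print(best_name, scores)
```

On spam data, where one class typically dominates, plain accuracy can be misleading, so a real comparison would more likely rely on measures that account for class imbalance; accuracy is used here only to keep the sketch short.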

Index Terms

The anti-social tagger: detecting spam in social bookmarking systems
1. Information systems
  1. Information retrieval
  2. Information storage systems

Published In

AIRWeb '08: Proceedings of the 4th international workshop on Adversarial information retrieval on the web

April 2008

81 pages

ISBN:9781605581590

DOI:10.1145/1451983

Editors: Carlos Castillo (Yahoo! Research), Kumar Chellapilla (Microsoft Live Labs), Dennis Fetterly (Microsoft Research)

Copyright © 2008 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Seventh Framework Programme

Conference

AIRWeb '08: Fourth International Workshop on Adversarial Information Retrieval on the Web

April 22, 2008

Beijing, China

Bibliometrics

Total Citations: 61
Total Downloads: 547
Downloads (last 12 months): 0
Downloads (last 6 weeks): 0

Reflects downloads up to 10 Oct 2024
