DOI: 10.1145/1458469.1458483
research-article

Cost-effective spam detection in p2p file-sharing systems

Published: 30 October 2008
    Abstract

    Spam is highly pervasive in P2P file-sharing systems and is difficult to detect automatically before a file is actually downloaded, because the description of a file returned to a client as a query result is insufficient and biased. To alleviate this problem, we propose a probing technique that collects more complete feature information about query results from the network, and we apply feature-based ranking to automatically detect spam in P2P query result sets. Furthermore, we examine the tradeoff between spam detection performance and network cost, and explore different ways of probing to reduce the network cost. Experimental results show that the proposed techniques decrease the amount of spam by 9% in the top-200 results and by 92% in the top-20 results at reasonable cost.
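
The interplay between probing, feature-based ranking, and network cost can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the authors' method: the `Result` structure, the `network` dictionary standing in for remote peers, the probe budget parameters, and the toy feature (number of distinct filenames seen for one file) are all hypothetical. The abstract does not say which features or probing strategies the paper uses.

```python
from dataclasses import dataclass, field


@dataclass
class Result:
    """One query result, identified by a file key (hypothetical structure)."""
    file_key: str
    descriptors: list = field(default_factory=list)  # filenames seen so far


def probe(result, network, budget):
    """Fetch up to `budget` extra descriptors for a result; return probes used.

    `network` is a plain dict standing in for remote peers (an assumption).
    """
    extra = network.get(result.file_key, [])[:budget]
    result.descriptors.extend(extra)
    return len(extra)


def spam_score(result):
    """Toy feature: number of distinct filenames advertised for one file.

    Illustrative assumption only: a file described under many unrelated
    names is treated as more suspicious than one described consistently.
    """
    return len(set(result.descriptors))


def rank_with_probing(results, network, total_budget, per_result=3):
    """Probe until the budget is spent, then rank least-suspicious first."""
    used = 0
    for r in results:
        remaining = total_budget - used
        if remaining <= 0:
            break
        used += probe(r, network, min(per_result, remaining))
    ranked = sorted(results, key=spam_score)  # stable sort keeps ties in order
    return ranked, used
```

With `total_budget=0` no extra descriptors are collected, so every result looks alike and the original order is kept. Raising the budget spends more network traffic but gives the ranking more information to work with, which is the cost/performance tradeoff the abstract refers to.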

    Cited By

    • (2012) Economic Evaluation of Interactive Audio Media for Securing Internet Services. Global Security, Safety and Sustainability & e-Democracy, pp. 46-53. DOI: 10.1007/978-3-642-33448-1_7. Online publication date: 2012
    • (2009) Workshop on large-scale distributed systems for information retrieval. ACM SIGIR Forum 43:1, pp. 42-48. DOI: 10.1145/1670598.1670606. Online publication date: 25-Jun-2009

      Published In

      LSDS-IR '08: Proceedings of the 2008 ACM workshop on Large-Scale distributed systems for information retrieval
      October 2008
      90 pages
      ISBN:9781605582542
      DOI:10.1145/1458469
      Program Chairs: Sebastian Michel, Gleb Skobeltsyn, Wai Gen Yee
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. detection
      2. p2p search
      3. spam

      Qualifiers

      • Research-article

      Conference

      CIKM08
      CIKM08: Conference on Information and Knowledge Management
      October 30, 2008
      Napa Valley, California, USA

      Acceptance Rates

      Overall Acceptance Rate 3 of 5 submissions, 60%
