research-article

Personalizing Top-k Processing Online in a Peer-to-Peer Social Tagging Network

Published: 01 July 2014

    Abstract

    The rapidly increasing amount of user-generated content in social tagging systems provides a huge source of information. Yet, performing effective search in these systems is very challenging, especially when we seek the most appropriate items that match a potentially ambiguous query. Collaborative filtering-based personalization is appealing in this context, as it limits the search within a small network of participants with similar preferences. Offline personalization, which consists in maintaining, for every user, a network of similar participants based on their tagging behaviors, is effective for queries that are close to the querying user’s tagging profile but performs poorly when the queries, reflecting emerging interests, have little correlation with the querying user’s profile.
    We present P2TK2, the first protocol to personalize query processing in social tagging systems online. P2TK2 is completely decentralized, and this design choice stems from the observation that the evolving social tagging systems naturally resemble P2P systems where users are both producers and consumers. This design exploits the power of the crowd and prevents any central authority from controlling personal information. P2TK2 is gossip-based and probabilistic. It dynamically associates each user with social acquaintances sharing similar tagging behaviors. Appropriate users for answering a query are discovered at query time with the help of social acquaintances. This is achieved according to the hybrid interest of the querying user, taking into account both her tagging behavior and her query. Results are iteratively refined and returned to the querying user. We evaluate P2TK2 on CiteULike and Delicious traces involving up to 50,000 users. We highlight the advantages of online personalization compared to offline personalization, as well as its efficiency, scalability, and inherent ability to cope with user departure and interest evolution in P2P systems.
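
The abstract describes two mechanisms: gossip that links each user to acquaintances with similar tagging behavior, and a "hybrid interest" that combines the querying user's profile with her query. The Python sketch below is only an illustration under stated assumptions and is not the P2TK2 protocol itself. The set-based cosine similarity, the mixing weight `alpha`, and the function names `hybrid_score` and `gossip_round` are all hypothetical choices that do not come from the paper.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two sets (assumed similarity measure)."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def hybrid_score(my_profile, query_tags, other_profile, alpha=0.5):
    """Blend profile similarity with how well the other user covers the query.

    Profiles are sets of (item, tag) pairs. alpha is a hypothetical mixing
    weight, not a parameter taken from the paper.
    """
    profile_sim = cosine(my_profile, other_profile)
    other_tags = {tag for _, tag in other_profile}
    query_sim = cosine(set(query_tags), other_tags)
    return alpha * profile_sim + (1 - alpha) * query_sim


def gossip_round(views, profiles, view_size, rng):
    """One synchronous gossip round (simplified, assumed exchange rule).

    Each user picks a random acquaintance, merges both views, and keeps
    the view_size users whose profiles are most similar to her own.
    """
    new_views = {}
    for user, view in views.items():
        partner = rng.choice(sorted(view))
        candidates = (view | views[partner] | {partner}) - {user}
        ranked = sorted(
            candidates,
            key=lambda u: (-cosine(profiles[user], profiles[u]), u),
        )
        new_views[user] = set(ranked[:view_size])
    return new_views


# Toy data, invented for illustration only.
profiles = {
    "alice": {("p1", "gossip"), ("p2", "p2p")},
    "bob": {("p1", "gossip"), ("p3", "search")},
    "carol": {("p4", "jazz")},
    "dave": {("p1", "gossip"), ("p2", "p2p"), ("p5", "jazz")},
}
views = {"alice": {"carol"}, "bob": {"dave"}, "carol": {"bob"}, "dave": {"alice"}}

new_views = gossip_round(views, profiles, view_size=2, rng=random.Random(0))

# Alice queries "jazz", an interest absent from her own profile.
ranking = sorted(
    (u for u in profiles if u != "alice"),
    key=lambda u: -hybrid_score(profiles["alice"], ["jazz"], profiles[u]),
)
```

In this toy setting, a profile-only score (`alpha=1.0`) gives carol no weight for alice, whereas a query-aware score does. This mirrors the abstract's point that offline personalization struggles with queries reflecting emerging interests.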

    Cited By

    • (2017) Subscription Covering for Relevance-Based Filtering in Content-Based Publish/Subscribe Systems. 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS), 2039--2044. DOI: 10.1109/ICDCS.2017.184. Online publication date: Jun-2017

    Published In

    ACM Transactions on Internet Technology, Volume 13, Issue 4
    July 2014
    89 pages
    ISSN:1533-5399
    EISSN:1557-6051
    DOI:10.1145/2656491
    Editor: Munindar P. Singh
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 01 July 2014
    Accepted: 01 December 2013
    Revised: 01 November 2013
    Received: 01 June 2012
    Published in TOIT Volume 13, Issue 4

    Author Tags

    1. Peer-to-peer systems
    2. gossip
    3. online
    4. personalization
    5. social tagging networks
    6. top-k processing

    Qualifiers

    • Research-article
    • Research
    • Refereed
