DOI: 10.1145/3485447.3511959
research-article
Open access

Efficient and Effective Similarity Search over Bipartite Graphs

Published: 25 April 2022

Abstract

Similarity search over a bipartite graph aims to retrieve nodes in the graph that are similar to one another, with applications in fields such as online advertising and recommender systems. Existing similarity measures either (i) overlook the unique properties of bipartite graphs or (ii) fail to capture high-order information between nodes accurately, leading to suboptimal result quality. Hidden Personalized PageRank (HPP) has recently been applied to this problem and found to be more effective than prior similarity measures. However, existing solutions for HPP computation incur significant computational costs, which makes them inefficient, especially on large graphs.
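
The abstract does not spell out how HPP is defined. Purely as an illustration, and assuming (not quoting the paper) that HPP can be read as personalized PageRank on a "hidden" graph over one side U of the bipartite graph, where each step is a two-hop walk u → v → u′ with neighbours chosen in proportion to edge weight, the sketch below computes such scores by truncated power iteration. The helper names `two_hop_transitions` and `hidden_ppr`, the restart probability `alpha`, and the iteration count are hypothetical choices; this is not the authors' HPP definition or code.

```python
from collections import defaultdict


def two_hop_transitions(edges):
    """Hypothetical helper: U -> U transition probabilities for walks u -> v -> u2.

    edges: iterable of (u, v, weight), with u on side U and v on side V.
    Each hop picks a neighbour with probability proportional to edge weight.
    """
    u_adj, v_adj = defaultdict(dict), defaultdict(dict)
    for u, v, w in edges:
        u_adj[u][v] = u_adj[u].get(v, 0.0) + w
        v_adj[v][u] = v_adj[v].get(u, 0.0) + w
    v_deg = {v: sum(nbrs.values()) for v, nbrs in v_adj.items()}
    trans = {}
    for u, nbrs in u_adj.items():
        u_deg = sum(nbrs.values())
        row = defaultdict(float)
        for v, w_uv in nbrs.items():
            for u2, w_vu2 in v_adj[v].items():
                row[u2] += (w_uv / u_deg) * (w_vu2 / v_deg[v])
        trans[u] = dict(row)
    return trans


def hidden_ppr(edges, source, alpha=0.15, iters=50):
    """Hypothetical helper: truncated power iteration for
    pi = alpha * sum_t (1 - alpha)^t * e_source * P^t on the hidden U-U graph."""
    trans = two_hop_transitions(edges)
    pi = defaultdict(float)
    walking = {source: 1.0}  # mass of walks still alive after t two-hop steps
    for _ in range(iters):
        nxt = defaultdict(float)
        for u, mass in walking.items():
            pi[u] += alpha * mass  # a walk at u stops here with probability alpha
            for u2, p in trans.get(u, {}).items():
                nxt[u2] += (1 - alpha) * mass * p  # otherwise take one more two-hop step
        walking = nxt
    return dict(pi)


if __name__ == "__main__":
    # Toy query-ad click graph (hypothetical data): q* are queries, a* are ads.
    edges = [("q1", "a1", 1), ("q1", "a2", 1), ("q2", "a1", 1), ("q3", "a3", 1)]
    for node, score in sorted(hidden_ppr(edges, "q1").items(), key=lambda kv: -kv[1]):
        print(node, round(score, 4))
```

On this toy graph, query q2 shares ad a1 with q1 and therefore receives a positive score, while q3, which shares no ad with q1, never appears. Materializing the full U-to-U transition table keeps the sketch short, but the table can grow quadratically in the number of U nodes that share V neighbours, so this approach only suits small graphs.
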
In this paper, we first identify an inherent drawback of HPP and overcome it by proposing bidirectional HPP (BHPP). We then formulate similarity search over bipartite graphs as the problem of approximate BHPP computation and present an efficient solution, Approx-BHPP. Approx-BHPP offers rigorous theoretical accuracy guarantees with optimal computational complexity by combining deterministic graph traversal with matrix operations in an optimized and non-trivial way. In addition, several carefully designed optimizations give our solution significant gains in practical efficiency. Extensive experiments comparing BHPP against 8 existing similarity measures over 7 real bipartite graphs demonstrate the effectiveness of BHPP on query rewriting and item recommendation. Furthermore, Approx-BHPP outperforms baseline solutions in computational time, often by orders of magnitude, on both small and large datasets.
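
The abstract says only that Approx-BHPP combines deterministic graph traversal with matrix operations. To make the traversal side concrete, here is a sketch of the classic forward push of Andersen, Chung, and Lang (FOCS 2006), run on an assumed two-hop hidden graph (one step is a walk u → v → u′ with weight-proportional hops) as a generic example of deterministic PPR approximation. It is not Approx-BHPP: it has no matrix-operation phase and no bidirectional component. The function name `forward_push_hidden` and the parameters `alpha` and `eps` are hypothetical; a node is pushed while its residual exceeds `eps` times its weighted degree, following the degree-scaled threshold of the original push algorithm.

```python
from collections import defaultdict, deque


def forward_push_hidden(edges, source, alpha=0.15, eps=1e-6):
    """Hypothetical sketch: forward push for PPR on the two-hop U-U hidden graph.

    edges: iterable of (u, v, weight); source must be a U node with at least one edge.
    Returns (reserve, residual): reserves approximate the PPR scores from below.
    """
    # Adjacency maps and weighted degrees on both sides of the bipartite graph.
    u_adj, v_adj = defaultdict(dict), defaultdict(dict)
    for u, v, w in edges:
        u_adj[u][v] = u_adj[u].get(v, 0.0) + w
        v_adj[v][u] = v_adj[v].get(u, 0.0) + w
    d_u = {u: sum(nbrs.values()) for u, nbrs in u_adj.items()}
    d_v = {v: sum(nbrs.values()) for v, nbrs in v_adj.items()}

    reserve = defaultdict(float)
    residual = defaultdict(float, {source: 1.0})
    queue, in_queue = deque([source]), {source}
    while queue:
        u = queue.popleft()
        in_queue.discard(u)
        r = residual[u]
        if r <= eps * d_u[u]:  # residual too small to be worth pushing
            continue
        residual[u] = 0.0
        reserve[u] += alpha * r  # settle an alpha fraction at u
        # Spread the remaining (1 - alpha) * r over two-hop neighbours u -> v -> u2.
        for v, w_uv in u_adj[u].items():
            for u2, w_vu2 in v_adj[v].items():
                residual[u2] += (1 - alpha) * r * (w_uv / d_u[u]) * (w_vu2 / d_v[v])
                if residual[u2] > eps * d_u[u2] and u2 not in in_queue:
                    queue.append(u2)
                    in_queue.add(u2)
    return dict(reserve), dict(residual)


if __name__ == "__main__":
    # Toy query-ad click graph (hypothetical data).
    edges = [("q1", "a1", 1), ("q1", "a2", 1), ("q2", "a1", 1), ("q3", "a3", 1)]
    reserve, residual = forward_push_hidden(edges, "q1", eps=1e-8)
    for node, score in sorted(reserve.items(), key=lambda kv: -kv[1]):
        print(node, round(score, 4))
```

Each push moves an `alpha` fraction of a node's residual into its reserve and spreads the rest along two-hop transition probabilities that sum to one, so reserves plus residuals always total 1. Reserves are therefore lower bounds on the exact scores, and their total L1 error equals the residual mass left when the queue empties.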

Cited By

  • (2024) BIRD: Efficient Approximation of Bidirectional Hidden Personalized PageRank. Proceedings of the VLDB Endowment 17:9 (2255-2268). DOI: 10.14778/3665844.3665855. Online publication date: 1 May 2024.
  • (2024) Efficient High-Quality Clustering for Large Bipartite Graphs. Proceedings of the ACM on Management of Data 2:1 (1-27). DOI: 10.1145/3639278. Online publication date: 26 March 2024.
  • (2024) Fast Query of Biharmonic Distance in Networks. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (1887-1897). DOI: 10.1145/3637528.3671856. Online publication date: 25 August 2024.

        Information

        Published In

        ACM Conferences
        WWW '22: Proceedings of the ACM Web Conference 2022
        April 2022
        3764 pages
        ISBN: 9781450390965
        DOI:10.1145/3485447
        This work is licensed under a Creative Commons Attribution International 4.0 License.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        Published: 25 April 2022

        Author Tags

        1. Approximate Algorithms
        2. Bipartite Graphs
        3. Similarity Search

        Qualifiers

        • Research-article
        • Research
        • Refereed limited

        Conference

        WWW '22: The ACM Web Conference 2022
        April 25 - 29, 2022
        Virtual Event, Lyon, France

        Acceptance Rates

        Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

        Article Metrics

        • Downloads (Last 12 months)437
        • Downloads (Last 6 weeks)43
        Reflects downloads up to 03 Oct 2024

        Citations

        • (2024) Effective Edge-wise Representation Learning in Edge-Attributed Bipartite Graphs. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (3081-3091). DOI: 10.1145/3637528.3671805. Online publication date: 25 August 2024.
        • (2024) Efficient Algorithms for Personalized PageRank Computation: A Survey. IEEE Transactions on Knowledge and Data Engineering 36:9 (4582-4602). DOI: 10.1109/TKDE.2024.3376000. Online publication date: September 2024.
