research-article

Massively Parallel Algorithms for Personalized PageRank

Published: 01 May 2021

Abstract

Personalized PageRank (PPR) has wide applications in search engines, social recommendations, community detection, and beyond. Graphs are becoming massive, and many IT companies now need to process graphs that do not fit into the memory of most commodity servers. However, most existing state-of-the-art solutions for PPR computation are designed for a single machine and are inefficient in distributed settings, because they either (i) require an excessively large number of communication rounds, or (ii) incur high communication costs in each round.
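As a reference point for what a single-source PPR query computes, the sketch below runs plain power iteration on one machine. It assumes the standard random-walk-with-restart definition pi_s = alpha * e_s + (1 - alpha) * pi_s * P, with a restart probability alpha (0.2 is an assumed default, not a value taken from the paper). It also assumes, as a simplification, that dangling nodes jump back to the source. The function name `ppr_power_iteration` and its parameters are hypothetical. The sketch makes both problems (i) and (ii) concrete. Every iteration sends one message along each of the m edges, and the number of iterations grows as the tolerance shrinks.

```python
def ppr_power_iteration(adj, source, alpha=0.2, tol=1e-10, max_iters=10_000):
    """Single-source PPR vector by power iteration (hypothetical helper).

    adj: dict mapping EVERY node to a list of its out-neighbours.
    alpha: restart probability (0.2 is an assumed default).
    Assumption: a dangling node (no out-edges) sends its mass back to `source`.
    Returns (pi, iterations); each iteration touches every edge once.
    """
    nodes = list(adj)
    pi = {v: 0.0 for v in nodes}
    pi[source] = 1.0
    for it in range(1, max_iters + 1):
        nxt = {v: 0.0 for v in nodes}
        nxt[source] = alpha  # restart term of pi = alpha*e_s + (1-alpha)*pi*P
        for u in nodes:
            mass = (1.0 - alpha) * pi[u]
            if mass == 0.0:
                continue
            out = adj[u] or [source]  # dangling node: jump back to the source
            share = mass / len(out)
            for v in out:
                nxt[v] += share  # one "message" per edge in a distributed setting
        diff = sum(abs(nxt[v] - pi[v]) for v in nodes)  # L1 change this iteration
        pi = nxt
        if diff < tol:
            return pi, it
    return pi, max_iters
```

On the 3-node cycle 0 → 1 → 2 → 0 with alpha = 0.2, the fixed point has the closed form pi[0] = 0.2 / (1 - 0.8^3) ≈ 0.41, and the iteration reproduces this value.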
Motivated by this, we present Delta-Push, an efficient framework for single-source and top-k PPR queries in distributed settings. Our goal is to reduce the number of communication rounds while keeping the load, i.e., the maximum number of messages an executor sends or receives in a round, within the capacity of each executor. We first present a non-trivial combination of a redesigned parallel push algorithm and the Monte-Carlo method to answer single-source PPR queries. The solution uses pre-sampled random walks to reduce the number of rounds required by the push algorithm. Theoretical analysis under the Massively Parallel Computing (MPC) model shows that our solution bounds the number of communication rounds to [EQUATION] under a load of O(m/p), where m is the number of edges of the input graph, p is the number of executors, and ϵ is a user-defined error parameter. Moreover, when the number of executors increases to p' = γ · p, the load constraint can be relaxed, since each executor, with unchanged local memory, can still hold O(γ · m/p') messages. In such scenarios, multiple queries can be processed simultaneously in batches. We show that with a load of O(γ · m/p'), Delta-Push can process γ queries in a batch with [EQUATION] rounds, whereas the baseline solutions still incur the same round cost per batch. We further present a new top-k algorithm that is well suited to distributed frameworks and reduces the number of rounds required in practice. Extensive experiments show that our solution is more efficient than the alternatives.
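The general idea of combining push with Monte-Carlo sampling can be illustrated with the generic scheme used by single-machine methods. First, residue mass is pushed forward until every node's residue is small. Then the leftover residue is resolved with random walks with restart. The sketch below is a sequential simulation under stated assumptions and is not Delta-Push:

- Pushes are grouped into synchronous rounds, so the round count is visible.
- Walks are sampled on demand instead of being pre-sampled.
- The threshold `r_max` and the walk budget `walks_per_unit` are hypothetical parameters.
- Dangling nodes jump back to the source.

```python
import math
import random
from collections import defaultdict


def push_then_walks(adj, source, alpha=0.2, r_max=1e-3, walks_per_unit=10_000, seed=0):
    """Hypothetical sketch: synchronous forward push, then Monte-Carlo walks.

    Generic push + Monte-Carlo scheme simulated on one machine, NOT Delta-Push;
    walks are sampled on demand rather than pre-sampled.
    adj maps EVERY node to its out-neighbours; dangling nodes jump to `source`
    (an assumption). Returns (estimate, push_rounds, num_walks).
    """
    rng = random.Random(seed)
    reserve = defaultdict(float)
    residue = defaultdict(float)
    residue[source] = 1.0
    rounds = 0

    # Phase 1: in each round, every node whose residue exceeds r_max * out-degree
    # pushes at once; distributed, one such round would be one communication round.
    while True:
        active = [u for u, r in residue.items() if r > r_max * max(len(adj[u]), 1)]
        if not active:
            break
        rounds += 1
        incoming = defaultdict(float)
        for u in active:
            r = residue.pop(u)
            reserve[u] += alpha * r  # alpha fraction of the residue settles at u
            out = adj[u] or [source]
            share = (1.0 - alpha) * r / len(out)
            for v in out:
                incoming[v] += share  # delivered only after all active nodes push
        for v, x in incoming.items():
            residue[v] += x

    def walk(start):
        # Random walk with restart: stop at the current node with probability alpha.
        cur = start
        while rng.random() >= alpha:
            out = adj[cur]
            cur = rng.choice(out) if out else source
        return cur

    # Phase 2: the invariant pi_s = reserve + sum_u residue[u] * pi_u holds after
    # every push, so n_u walks from u, each carrying residue[u] / n_u to its end
    # node, give an unbiased estimate of pi_s.
    estimate = defaultdict(float, reserve)
    num_walks = 0
    for u, r in residue.items():
        n_u = max(1, math.ceil(r * walks_per_unit))
        num_walks += n_u
        for _ in range(n_u):
            estimate[walk(u)] += r / n_u
    return dict(estimate), rounds, num_walks
```

Raising `r_max` ends the push phase after fewer rounds but leaves more residue for the walks to resolve. This trade-off between push rounds and random walks is what the abstract's use of pre-sampled walks targets. How Delta-Push itself exploits the trade-off is not reproduced here.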

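The load relaxation used for batching comes down to a one-line calculation. Consider p' = γ · p executors whose local memory is unchanged. Each executor still has room for O(m/p) messages per round, and this is the same quantity as O(γ · m/p'):

```latex
p' = \gamma \cdot p
\;\Longrightarrow\;
\frac{m}{p} = \frac{\gamma \cdot m}{\gamma \cdot p} = \gamma \cdot \frac{m}{p'},
\qquad\text{so}\qquad
O\!\left(\frac{m}{p}\right) = O\!\left(\gamma \cdot \frac{m}{p'}\right).
```

Suppose, as a simplifying assumption for this back-of-the-envelope view, that one query's messages spread evenly and place O(m/p') load on each executor. Then γ concurrent queries fit within the unchanged per-executor budget. For example, take γ = 4, p = 100 and p' = 400. Each executor keeps its capacity of about m/100 messages, while a single query uses only about m/400 of it.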

Cited By

  • (2024) Efficient and Accurate PageRank Approximation on Large Graphs. Proceedings of the ACM on Management of Data 2:4 (1-26). DOI: 10.1145/3677132. Online publication date: 30-Sep-2024
  • (2024) Efficient Approximation of Kemeny's Constant for Large Graphs. Proceedings of the ACM on Management of Data 2:3 (1-26). DOI: 10.1145/3654937. Online publication date: 30-May-2024
  • (2024) Topology-aware Parallel Joins. Proceedings of the ACM on Management of Data 2:2 (1-25). DOI: 10.1145/3651598. Online publication date: 14-May-2024

        Published In

        Proceedings of the VLDB Endowment, Volume 14, Issue 9
        May 2021
        249 pages
        ISSN: 2150-8097

        Publisher

        VLDB Endowment

        Publication History

        Published in PVLDB Volume 14, Issue 9

        Article Metrics

        • Downloads (last 12 months): 70
        • Downloads (last 6 weeks): 7
        Reflects downloads up to 04 Oct 2024
