research-article

Computing personalized PageRank quickly by exploiting graph structures

Published: 01 August 2014
    Abstract

    We propose a new scalable algorithm that can compute Personalized PageRank (PPR) very quickly. The Power method is a state-of-the-art algorithm for computing exact PPR; however, it requires many iterations. Thus, reducing the number of iterations is the main challenge.
    We achieve this by exploiting the graph structure of web graphs and social networks. Our algorithm converges very quickly: it requires up to 7.5 times fewer iterations than the Power method and is up to five times faster in actual computation time.
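For reference, the Power method baseline mentioned above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `personalized_pagerank`, the adjacency-dict input, the restart probability `alpha = 0.15`, and the tolerance are all assumptions made for the example. Each iteration applies x <- alpha * e_s + (1 - alpha) * P^T x, and the L1 error is only guaranteed to shrink by a factor of (1 - alpha) per step, which is why many iterations are needed.

```python
def personalized_pagerank(out_edges, source, alpha=0.15, tol=1e-10, max_iter=1000):
    """Power iteration for PPR from a single source node.

    out_edges: dict mapping every node to a list of its out-neighbours.
    alpha:     restart probability (assumed value, not taken from the paper).
    Returns (scores, iterations_used).
    """
    nodes = list(out_edges)
    x = {v: 0.0 for v in nodes}
    x[source] = 1.0  # start with all probability mass on the source
    for it in range(1, max_iter + 1):
        nxt = {v: 0.0 for v in nodes}
        nxt[source] = alpha  # restart term alpha * e_s
        for u in nodes:
            share = (1.0 - alpha) * x[u]
            nbrs = out_edges[u]
            if nbrs:
                w = share / len(nbrs)
                for v in nbrs:
                    nxt[v] += w
            else:
                # Dangling node: assume the walk restarts at the source.
                nxt[source] += share
        diff = sum(abs(nxt[v] - x[v]) for v in nodes)  # L1 change
        x = nxt
        if diff < tol:
            return x, it
    return x, max_iter


# Toy graph (hypothetical data): node "d" has no in-links.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["a"]}
scores, iters = personalized_pagerank(graph, "a")
print(iters, scores)
```

Even on this four-node graph the loop runs for well over a hundred iterations at this tolerance, which illustrates the cost the abstract refers to.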
    To the best of our knowledge, this is the first work to explicitly use graph structures to compute PPR quickly. Our contributions can be summarized as follows.
    1. We provide an algorithm for computing a tree decomposition, which is more efficient and scalable than any previous algorithm.
    2. Using the above algorithm, we can obtain a core-tree decomposition of any web graph or social network. This decomposes the graph into (1) the core, which behaves like an expander graph, and (2) a small tree-width graph, which behaves like a tree in an algorithmic sense.
    3. We apply a direct method to the small tree-width graph to construct an LU decomposition.
    4. Building on the LU decomposition and using it as a preconditioner, we apply the GMRES method (a state-of-the-art iterative method) to compute PPR for whole web graphs and social networks.
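Contributions 1 and 2 rest on eliminating low-degree vertices until only a dense core remains. The sketch below uses the classical minimum-degree elimination heuristic with a degree cutoff. It is an illustrative stand-in, not the authors' algorithm, and the names `core_tree_split` and `max_width` are hypothetical. Eliminated vertices yield bags of size at most `max_width + 1` (the small tree-width part); the vertices left over form the core.

```python
import heapq


def core_tree_split(adj, max_width):
    """Split an undirected graph into a low tree-width part and a core.

    adj:       dict mapping each node to a set of neighbours (symmetric).
               Nodes must be mutually comparable (e.g. all ints).
    max_width: stop eliminating once every remaining degree exceeds this.
    Returns (order, bags, core): elimination order, one bag per
    eliminated vertex, and the set of vertices left in the core.
    """
    g = {v: set(ns) for v, ns in adj.items()}  # working copy
    heap = [(len(ns), v) for v, ns in g.items()]
    heapq.heapify(heap)
    order, bags = [], {}
    while heap:
        deg, v = heapq.heappop(heap)
        if v not in g or deg != len(g[v]):
            continue  # stale heap entry, a fresher one exists
        if deg > max_width:
            break  # every remaining vertex has degree > max_width
        nbrs = g.pop(v)
        order.append(v)
        bags[v] = {v} | nbrs
        for u in nbrs:
            g[u].discard(v)
            g[u] |= nbrs - {u}  # fill-in: neighbours of v become a clique
            heapq.heappush(heap, (len(g[u]), u))
    return order, bags, set(g)


# Toy graph (hypothetical data): a 4-clique {0,1,2,3} with a small tree
# (4, 5, 6) attached to vertex 0.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
         (0, 4), (4, 5), (4, 6)]
adj = {v: set() for v in range(7)}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

order, bags, core = core_tree_split(adj, max_width=2)
print(order, core)
```

In the setting of contributions 3 and 4, the direct LU factorization would be applied to the eliminated part, and the result would serve as the preconditioner for GMRES on the whole graph.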


    Published In

    Proceedings of the VLDB Endowment  Volume 7, Issue 12
    August 2014
    296 pages
    ISSN:2150-8097

    Publisher

    VLDB Endowment

    Cited By

    • (2024) Efficient and Provable Effective Resistance Computation on Large Graphs: An Index-based Approach. Proceedings of the ACM on Management of Data 2:3 (1-27). DOI: 10.1145/3654936. Online publication date: 30-May-2024
    • (2023) Efficient Personalized PageRank Computation: The Power of Variance-Reduced Monte Carlo Approaches. Proceedings of the ACM on Management of Data 1:2 (1-26). DOI: 10.1145/3589305. Online publication date: 20-Jun-2023
    • (2023) Personalized PageRank on Evolving Graphs with an Incremental Index-Update Scheme. Proceedings of the ACM on Management of Data 1:1 (1-26). DOI: 10.1145/3588705. Online publication date: 30-May-2023
    • (2022) Shortest-path queries on complex networks. Proceedings of the VLDB Endowment 15:11 (2640-2652). DOI: 10.14778/3551793.3551820. Online publication date: 29-Sep-2022
    • (2022) Efficient Personalized PageRank Computation: A Spanning Forests Sampling Based Approach. Proceedings of the 2022 International Conference on Management of Data (2048-2061). DOI: 10.1145/3514221.3526140. Online publication date: 10-Jun-2022
    • (2021) Massively parallel algorithms for personalized pagerank. Proceedings of the VLDB Endowment 14:9 (1668-1680). DOI: 10.14778/3461535.3461554. Online publication date: 22-Oct-2021
    • (2021) Agenda. Proceedings of the 30th ACM International Conference on Information & Knowledge Management (1315-1324). DOI: 10.1145/3459637.3482317. Online publication date: 26-Oct-2021
    • (2021) Unifying the Global and Local Approaches: An Efficient Power Iteration with Forward Push. Proceedings of the 2021 International Conference on Management of Data (1996-2008). DOI: 10.1145/3448016.3457298. Online publication date: 9-Jun-2021
    • (2020) Personalized PageRank to a Target Node, Revisited. Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (657-667). DOI: 10.1145/3394486.3403108. Online publication date: 23-Aug-2020
    • (2020) Scaling Up Distance Labeling on Graphs with Core-Periphery Properties. Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data (1367-1381). DOI: 10.1145/3318464.3389748. Online publication date: 11-Jun-2020
