Random Walk with Restart on Large Graphs Using Block Elimination

Published: 11 May 2016

Abstract

Given a large graph, how can we calculate the relevance between nodes quickly and accurately? Random walk with restart (RWR) provides a good measure for this purpose and has been applied to diverse data mining applications, including ranking, community detection, link prediction, and anomaly detection. Since calculating RWR from scratch takes a long time, various preprocessing methods, most of which involve inverting adjacency matrices, have been proposed to speed up the calculation. However, these methods do not scale to large graphs because they usually produce large dense matrices that do not fit into memory. In addition, the existing methods are ill-suited to dynamically changing graphs because the expensive preprocessing task must be rerun whenever the graph changes.
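To make the cost argument concrete, here is a minimal NumPy sketch assuming the common RWR formulation r = (1 - c) Ã^T r + c q, where Ã is the row-normalized adjacency matrix, c is the restart probability, and q is the one-hot vector of the query (seed) node. Computing "from scratch" means iterating this recurrence for every query; the preprocessing alternative inverts H = I - (1 - c) Ã^T once, after which column s of c H^-1 is the RWR vector for seed s. The function names and the default c = 0.15 are illustrative choices, not taken from the article.

```python
import numpy as np

def row_normalize(A):
    """Row-normalize an adjacency matrix; rows of dangling nodes stay all-zero."""
    deg = A.sum(axis=1, keepdims=True)
    return np.divide(A, deg, out=np.zeros_like(A, dtype=float), where=deg > 0)

def rwr_power_iteration(A, seed, c=0.15, tol=1e-12, max_iter=10_000):
    """From-scratch RWR: iterate r <- (1 - c) * A_norm^T r + c * q until the L1 change is tiny."""
    n = A.shape[0]
    At = row_normalize(A).T
    q = np.zeros(n)
    q[seed] = 1.0
    r = q.copy()
    for _ in range(max_iter):
        r_next = (1 - c) * (At @ r) + c * q
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next
    return r

def rwr_all_queries_by_inversion(A, c=0.15):
    """Preprocessing-style alternative: column s of c * H^{-1} is the RWR vector for seed s."""
    n = A.shape[0]
    H = np.eye(n) - (1 - c) * row_normalize(A).T
    return c * np.linalg.inv(H)
```

The inverse illustrates the scalability problem the abstract describes: for a strongly connected graph every entry of H^-1 is positive, so it is dense even when the graph is sparse, and a graph with 10^6 nodes would need 10^12 float64 entries, about 8 TB.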
In this article, we propose Bear, a fast, scalable, and accurate method for computing RWR on large graphs. Bear has two versions: a preprocessing method, BearS, for static graphs and an incremental update method, BearD, for dynamic graphs. BearS consists of a preprocessing step and a query step. In the preprocessing step, BearS reorders the adjacency matrix of a given graph so that it contains a large, easy-to-invert submatrix, and precomputes several matrices, including the Schur complement of that submatrix. In the query step, BearS quickly computes the RWR scores for a given query node using a block elimination approach with the matrices computed in the preprocessing step. For dynamic graphs, BearD efficiently updates only the changed parts of the preprocessed matrices of BearS, based on the observation that only small parts of these matrices change when a few edges are inserted or deleted. Through extensive experiments, we show that BearS significantly outperforms other state-of-the-art methods in preprocessing and query speed, space efficiency, and accuracy. We also show that BearD quickly updates the preprocessed matrices and answers queries immediately when the graph changes.
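The block elimination idea can be sketched with dense NumPy arrays under an assumed formulation: the RWR scores r solve H r = c q with H = I - (1 - c) Ã^T, where Ã is the row-normalized adjacency matrix, c the restart probability, and q the one-hot seed vector. Splitting H after the first n1 nodes into blocks H11, H12, H21, H22, block elimination gives the Schur complement S = H22 - H21 H11^-1 H12, then r2 = c S^-1 (q2 - H21 H11^-1 q1) and r1 = H11^-1 (c q1 - H12 r2). The names build_H, preprocess, query, and insert_edge and the split point n1 are hypothetical. The sketch does not reproduce Bear's node reordering (it simply assumes the leading n1 nodes form the easy-to-invert block) and stores everything densely, whereas a practical implementation would keep these matrices sparse. insert_edge is a simplified illustration of the observation behind BearD, not BearD's actual update procedure: when the source u of a new edge lies in the second block, only column u of H changes, so H11^-1 and H21 stay valid and S changes in a single column, which a Sherman-Morrison rank-one update absorbs.

```python
import numpy as np

def build_H(A, c):
    """Hypothetical helper: H = I - (1 - c) * A_norm^T with A_norm row-normalized."""
    deg = A.sum(axis=1, keepdims=True)
    A_norm = np.divide(A, deg, out=np.zeros_like(A, dtype=float), where=deg > 0)
    return np.eye(A.shape[0]) - (1 - c) * A_norm.T

def preprocess(H, n1):
    """Precompute H11^{-1} and the inverse Schur complement (dense, illustration only)."""
    H11, H12 = H[:n1, :n1], H[:n1, n1:]
    H21, H22 = H[n1:, :n1], H[n1:, n1:]
    H11_inv = np.linalg.inv(H11)      # assumed to be the cheap, easy-to-invert block
    S = H22 - H21 @ H11_inv @ H12     # Schur complement of H11 in H
    return {"H11_inv": H11_inv, "H12": H12, "H21": H21,
            "S": S, "S_inv": np.linalg.inv(S)}

def query(pre, seed, c):
    """Block elimination: solve H r = c q for a one-hot seed vector q."""
    n1 = pre["H11_inv"].shape[0]
    n = n1 + pre["S"].shape[0]
    q = np.zeros(n)
    q[seed] = 1.0
    q1, q2 = q[:n1], q[n1:]
    r2 = c * pre["S_inv"] @ (q2 - pre["H21"] @ (pre["H11_inv"] @ q1))
    r1 = pre["H11_inv"] @ (c * q1 - pre["H12"] @ r2)
    return np.concatenate([r1, r2])

def insert_edge(A, pre, u, v, c):
    """Insert edge u -> v with u in the second block, updating only what changes."""
    n1 = pre["H11_inv"].shape[0]
    assert u >= n1, "this sketch only handles edge sources in the second block"
    A = A.copy()
    A[u, v] = 1.0
    # Only column u of H changes: H[:, u] = e_u - (1 - c) * A_norm[u, :]
    h_col = -(1 - c) * A[u] / A[u].sum()
    h_col[u] += 1.0
    j = u - n1
    H12 = pre["H12"].copy()
    H12[:, j] = h_col[:n1]
    # New column j of S; H11^{-1} and H21 are unchanged
    new_col = h_col[n1:] - pre["H21"] @ (pre["H11_inv"] @ h_col[:n1])
    d = new_col - pre["S"][:, j]      # S_new = S + d e_j^T
    Sd = pre["S_inv"] @ d
    # Sherman-Morrison: (S + d e_j^T)^{-1} = S^{-1} - S^{-1} d e_j^T S^{-1} / (1 + e_j^T S^{-1} d)
    S_inv = pre["S_inv"] - np.outer(Sd, pre["S_inv"][j, :]) / (1.0 + Sd[j])
    S = pre["S"].copy()
    S[:, j] = new_col
    return A, {**pre, "H12": H12, "S": S, "S_inv": S_inv}
```

Because H has a unit diagonal (up to self-loops) and each column's off-diagonal entries sum in absolute value to less than its diagonal entry, H is strictly column diagonally dominant; consequently H11 and the Schur complement S are invertible for any split point, both before and after an edge insertion.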

Cited By

  • (2024) Network-based Multi-omics Disease–Drug Associations Reveal Drug Repurposing Candidates for Covid-19 Disease Phases. Drug Repurposing 1:1. DOI: 10.58647/DRUGREPO.24.1.0007. Online publication date: 2024.
  • (2024) GraphZeppelin: How to Find Connected Components (Even When Graphs Are Dense, Dynamic, and Massive). ACM Transactions on Database Systems 49:3, 1-31. DOI: 10.1145/3643846. Online publication date: 16-May-2024.
  • (2024) Efficient Algorithms for Personalized PageRank Computation: A Survey. IEEE Transactions on Knowledge and Data Engineering 36:9, 4582-4602. DOI: 10.1109/TKDE.2024.3376000. Online publication date: Sep-2024.

    Information & Contributors

    Published In

    ACM Transactions on Database Systems  Volume 41, Issue 2
    Invited Paper from SIGMOD 2014 and Regular Papers
    June 2016
    271 pages
    ISSN:0362-5915
    EISSN:1557-4644
    DOI:10.1145/2936309
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 11 May 2016
    Accepted: 01 January 2016
    Revised: 01 December 2015
    Received: 01 May 2015
    Published in TODS Volume 41, Issue 2

    Author Tags

    1. proximity
    2. random walk with restart
    3. ranking in graph
    4. relevance score

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • NRF
    • MSIP/IITP
    • MSIP

    Bibliometrics & Citations

    Article Metrics

    • Downloads (last 12 months): 17
    • Downloads (last 6 weeks): 1
    Reflects downloads up to 30 Aug 2024

    Citations

    • (2023) Parallel Overlapping Community Detection Algorithm on GPU. IEEE Transactions on Big Data 9:2, 677-687. DOI: 10.1109/TBDATA.2022.3180360. Online publication date: 1-Apr-2023.
    • (2022) UniCon: A unified star-operation to efficiently find connected components on a cluster of commodity hardware. PLOS ONE 17:11, e0277527. DOI: 10.1371/journal.pone.0277527. Online publication date: 30-Nov-2022.
    • (2021) A Random Walk with Restart Model Based on Common Neighbors for Predicting the Clinical Drug Combinations on Coronary Heart Disease. Journal of Healthcare Engineering 2021, 1-7. DOI: 10.1155/2021/4597391. Online publication date: 8-Dec-2021.
    • (2021) AdaSim. Proceedings of the 30th ACM International Conference on Information & Knowledge Management, 1528-1537. DOI: 10.1145/3459637.3482316. Online publication date: 26-Oct-2021.
    • (2021) A Multi-Layer Random Walk Method for Local Dynamic Community Detection in Brain Functional Network. 2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 1098-1103. DOI: 10.1109/BIBM52615.2021.9669312. Online publication date: 9-Dec-2021.
    • (2021) VPC: Pruning connected components using vector-based path compression for Graph500. CCF Transactions on High Performance Computing. DOI: 10.1007/s42514-021-00070-z. Online publication date: 15-Jul-2021.
    • (2020) PACC: Large scale connected component computation on Hadoop and Spark. PLOS ONE 15:3, e0229936. DOI: 10.1371/journal.pone.0229936. Online publication date: 18-Mar-2020.