Efficient Approximation of Kemeny's Constant for Large Graphs

Published: 30 May 2024

Abstract

For an undirected graph, Kemeny's constant is defined as the mean hitting time of a random walk from one vertex to another vertex chosen at random according to the stationary distribution. Kemeny's constant admits numerous interpretations from different perspectives and has found various applications in the field of complex networks. Because it requires computing the inverse of the normalized Laplacian matrix, exact computation of Kemeny's constant is infeasible for large networks with millions of vertices. Existing methods either consume excessive memory, which makes them impractical for large-scale networks, or involve redundant simulation, leaving room for further optimization. In this paper, we propose two scalable Monte Carlo algorithms, RefinedMC and ForestMC, to approximate Kemeny's constant. RefinedMC makes several refinements to the simulation of truncated random walks, significantly reducing the number of random walks required, while ForestMC exploits a newly discovered connection between Kemeny's constant and the inverse of a corresponding Laplacian submatrix, which yields considerably accurate estimates. Extensive numerical experiments on model and real-world networks demonstrate that our approximation algorithms clearly outperform the baseline methods in terms of both efficiency and accuracy.
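To make the definition concrete, the following is a minimal sketch of the exact computation on a small connected graph. It is not RefinedMC or ForestMC, neither of which is specified in the abstract. It assumes a dense adjacency matrix and the convention that the hitting time from a vertex to itself is zero, under which Kemeny's constant equals the sum of the reciprocals of the nonzero eigenvalues of the normalized Laplacian. The function names `kemeny_spectral` and `kemeny_hitting` are illustrative only.

```python
import numpy as np

def kemeny_spectral(adj):
    """Kemeny's constant as the sum of 1/lambda over the nonzero
    eigenvalues of the normalized Laplacian (connected graph assumed)."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Normalized Laplacian: I - D^{-1/2} A D^{-1/2}
    lap = np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    eigvals = np.linalg.eigvalsh(lap)  # ascending; eigvals[0] is ~0
    return float(np.sum(1.0 / eigvals[1:]))

def kemeny_hitting(adj, start=0):
    """Kemeny's constant from the definition: the stationary-weighted
    mean hitting time from `start` to every other vertex."""
    adj = np.asarray(adj, dtype=float)
    n = len(adj)
    deg = adj.sum(axis=1)
    trans = adj / deg[:, None]   # random-walk transition matrix
    pi = deg / deg.sum()         # stationary distribution
    total = 0.0
    for target in range(n):
        if target == start:
            continue             # hitting time to itself is taken as 0
        keep = [v for v in range(n) if v != target]
        # Hitting times to `target` solve (I - P_sub) h = 1, where P_sub
        # is the transition matrix with the target row and column removed.
        sub = trans[np.ix_(keep, keep)]
        h = np.linalg.solve(np.eye(n - 1) - sub, np.ones(n - 1))
        total += pi[target] * h[keep.index(start)]
    return float(total)

if __name__ == "__main__":
    path3 = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # path on three vertices
    print(kemeny_spectral(path3), kemeny_hitting(path3, start=0))
```

On the three-vertex path both functions agree, and `kemeny_hitting` returns the same value for every choice of `start`, which is the sense in which the quantity is a constant. Both routines rely on dense linear algebra with cubic cost in the number of vertices, which illustrates why exact computation does not scale to graphs with millions of vertices.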

Published In

Proceedings of the ACM on Management of Data, Volume 2, Issue 3
SIGMOD
June 2024
1953 pages
EISSN:2836-6573
DOI:10.1145/3670010
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. hitting time
  2. Kemeny's constant
  3. random walk
  4. spectral graph theory

Qualifiers

  • Research-article

Funding Sources

  • National Natural Science Foundation of China
