Local approximation of pagerank and reverse pagerank

Z Bar-Yossef, LT Mashiach - Proceedings of the 17th ACM conference on …, 2008 - dl.acm.org
Z Bar-Yossef, LT Mashiach
Proceedings of the 17th ACM conference on Information and knowledge management, 2008dl.acm.org
We consider the problem of approximating the PageRank of a target node using only local
information provided by a link server. This problem was originally studied by Chen, Gan, and
Suel (CIKM 2004), who presented an algorithm for tackling it. We prove that local
approximation of PageRank, even to within modest approximation factors, is infeasible in the
worst-case, as it requires probing the link server for Ω (n) nodes, where n is the size of the
graph. The difficulty emanates from nodes of high in-degree and/or from slow convergence …
We consider the problem of approximating the PageRank of a target node using only local information provided by a link server. This problem was originally studied by Chen, Gan, and Suel (CIKM 2004), who presented an algorithm for tackling it. We prove that local approximation of PageRank, even to within modest approximation factors, is infeasible in the worst-case, as it requires probing the link server for Ω(n) nodes, where n is the size of the graph. The difficulty emanates from nodes of high in-degree and/or from slow convergence of the PageRank random walk.
We show that when the graph has bounded in-degree and admits fast PageRank convergence, then local PageRank approximation can be done using a small number of queries. Unfortunately, natural graphs, such as the web graph, are abundant with high in-degree nodes, making this algorithm (or any other local approximation algorithm) too costly. On the other hand, reverse natural graphs tend to have low in-degree while maintaining fast PageRank convergence. It follows that calculating Reverse PageRank locally is frequently more feasible than computing PageRank locally.
We demonstrate that Reverse PageRank is useful for several applications, including computation of hub scores for web pages, finding influencers in social networks, obtaining good seeds for crawling, and measurement of semantic relatedness between concepts in a taxonomy.
ACM Digital Library