Improved algorithms for topic distillation in a hyperlinked environment

K Bharat, MR Henzinger - Proceedings of the 21st annual international ACM SIGIR conference on …, 1998 - dl.acm.org
Abstract
This paper addresses the problem of topic distillation on the World Wide Web: given a typical user query, find quality documents related to the query topic. Connectivity analysis has been shown to be useful in identifying high-quality pages within a topic-specific graph of hyperlinked documents. The essence of our approach is to augment a previous algorithm based on connectivity analysis with content analysis. We identify three problems with the existing approach and devise algorithms to tackle them. We report the results of a user evaluation that show an improvement in precision at 10 documents of at least 45% over pure connectivity analysis.
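
The evaluation metric named in the abstract, precision at 10 documents, can be illustrated with a minimal sketch. It assumes the usual definition (the fraction of the top 10 ranked results judged relevant) and reads the reported 45% as a relative gain over the baseline, which the abstract does not state explicitly. The function names and the sample data are hypothetical and are not taken from the paper.

```python
def precision_at_k(ranked_docs, relevant, k=10):
    """Fraction of the top-k ranked documents that are judged relevant.

    Divides by k even if fewer than k documents were returned,
    which is one common convention (an assumption here).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc in ranked_docs[:k] if doc in relevant)
    return hits / k


def relative_improvement(baseline, improved):
    """Relative gain of `improved` over `baseline`, e.g. 0.45 for +45%."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (improved - baseline) / baseline


# Hypothetical rankings and relevance judgments, for illustration only.
relevant = {"d1", "d3", "d4", "d7"}
baseline_ranking = ["d1", "d2", "d5", "d6", "d8", "d9", "d3", "d10", "d11", "d12"]
improved_ranking = ["d1", "d3", "d4", "d2", "d5", "d6", "d8", "d9", "d10", "d11"]

p_base = precision_at_k(baseline_ranking, relevant)
p_new = precision_at_k(improved_ranking, relevant)
print(p_base, p_new, relative_improvement(p_base, p_new))
```

In a user evaluation such as the one the abstract mentions, the relevance set would come from human judgments per query, and the precision values would typically be averaged over all queries before comparing systems.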