Template detection via data mining and its applications

Z Bar-Yossef, S Rajagopalan
Proceedings of the 11th international conference on World Wide Web, 2002 - dl.acm.org
We formulate and propose the template detection problem, and suggest a practical solution for it based on counting frequent item sets. We show that the use of templates is pervasive on the web. We describe three principles, which characterize the assumptions made by hypertext information retrieval (IR) and data mining (DM) systems, and show that templates are a major source of violation of these principles. As a consequence, basic "pure" implementations of simple search algorithms coupled with template detection and elimination show surprising increases in precision at all levels of recall.
ACM Digital Library