Abstract
Traditional search engines ignore the tremendous amount of information “hidden” behind the search forms of Web pages in large searchable electronic databases, known as the hidden Web. In this paper, we address the problem of designing a system for extracting and retrieving hidden Web information. We present a generic operational model of hidden Web information retrieval and describe its key techniques. We introduce a new Tag-Tree-based Object Extraction Technique for automatically extracting hidden Web information from Web pages. Based on this technique, we implement a retrieval algorithm for structured queries over hidden Web information. Test results are also reported.
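The abstract does not spell out how the Tag-Tree-based Object Extraction Technique works, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It parses a result page into a tag tree and assumes that the data objects are the children of the node with the most same-tag children (a simple "highest fanout" heuristic). The names `Node`, `TagTreeBuilder`, and `extract_objects` are hypothetical and introduced purely for illustration.

```python
from collections import Counter
from html.parser import HTMLParser

# Elements that never have a closing tag, so the tree must not descend into them.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class Node:
    """One element of the tag tree (hypothetical helper type)."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.items = []  # child Nodes and text strings, in document order

    @property
    def children(self):
        return [item for item in self.items if isinstance(item, Node)]

    def text(self):
        parts = [i.text() if isinstance(i, Node) else i for i in self.items]
        return " ".join(p for p in parts if p)


class TagTreeBuilder(HTMLParser):
    """Builds a tag tree from HTML using only the standard library."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.items.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        data = data.strip()
        if data:
            self.current.items.append(data)


def walk(node):
    """Yield every node of the tree in pre-order."""
    yield node
    for child in node.children:
        yield from walk(child)


def extract_objects(html):
    """Return the text of each repeated child under the highest-fanout node.

    Assumption: the object-rich subtree is the one whose most frequent
    child tag occurs more often than under any other node.
    """
    builder = TagTreeBuilder()
    builder.feed(html)
    builder.close()
    best, best_tag, best_count = None, None, 1
    for node in walk(builder.root):
        counts = Counter(child.tag for child in node.children)
        if counts:
            tag, count = counts.most_common(1)[0]
            if count > best_count:
                best, best_tag, best_count = node, tag, count
    if best is None:
        return []
    return [child.text() for child in best.children if child.tag == best_tag]


if __name__ == "__main__":
    page = (
        "<html><body><h1>Results</h1><table>"
        "<tr><td>Book A</td><td>10</td></tr>"
        "<tr><td>Book B</td><td>12</td></tr>"
        "<tr><td>Book C</td><td>9</td></tr>"
        "</table></body></html>"
    )
    for obj in extract_objects(page):
        print(obj)
```

A real extractor would need more than a fanout count, for example a way to tell navigation menus apart from result lists and to split objects that span several sibling tags; the sketch leaves those questions open.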
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hui, S., Ling, Z., Yunming, Y., Fanyuan, M. (2002). Object-Extraction-Based Hidden Web Information Retrieval. In: Meng, X., Su, J., Wang, Y. (eds) Advances in Web-Age Information Management. WAIM 2002. Lecture Notes in Computer Science, vol 2419. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45703-8_31
DOI: https://doi.org/10.1007/3-540-45703-8_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44045-1
Online ISBN: 978-3-540-45703-9
eBook Packages: Springer Book Archive