Object-Extraction-Based Hidden Web Information Retrieval

Hui, Song; Ling, Zhang; Yunming, Ye; Fanyuan, Ma

doi:10.1007/3-540-45703-8_31

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2419)

Included in the following conference series:

International Conference on Web-Age Information Management

Abstract

Traditional search engines ignore the tremendous amount of information “hidden” behind the search forms of Web pages in large searchable electronic databases, known as the hidden Web. In this paper, we address the problem of designing a system for extracting and retrieving hidden Web information. We present a generic operational model of hidden Web information retrieval and describe its key techniques. We introduce a new Tag-Tree-based Object Extraction Technique for automatically extracting hidden Web information from Web pages. Based on this technique, we implement a retrieval algorithm for structured queries over hidden Web information. Test results are also reported.
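
The abstract does not describe the internals of the Tag-Tree-based Object Extraction Technique, so the following is only a minimal sketch of the general idea of tag-tree object extraction, not the authors' algorithm. It assumes that a result page is parsed into a tree of tags and that the repeated result objects are the children of the subtree with the largest number of child elements (a simple highest-fanout heuristic chosen here for illustration). The names `Node`, `TagTreeBuilder`, `build_tag_tree`, and `extract_objects` are hypothetical.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class Node:
    """One element of the tag tree (hypothetical structure for illustration)."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.items = []  # child Nodes and text strings, in document order

    @property
    def children(self):
        return [item for item in self.items if isinstance(item, Node)]

    def text(self):
        """Concatenate the text of this node and all of its descendants."""
        parts = [i.text() if isinstance(i, Node) else i for i in self.items]
        return " ".join(p for p in parts if p)


class TagTreeBuilder(HTMLParser):
    """Builds a tag tree from HTML using only the standard library parser."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.items.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Walk up to the nearest open element with this name;
        # stray closing tags with no matching opener are ignored.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        data = " ".join(data.split())  # normalise whitespace
        if data:
            self.current.items.append(data)


def build_tag_tree(html):
    builder = TagTreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def walk(node):
    """Yield the node and all of its descendants in document order."""
    yield node
    for child in node.children:
        yield from walk(child)


def extract_objects(html):
    """Assumed heuristic: objects are the children of the highest-fanout node."""
    root = build_tag_tree(html)
    container = max(walk(root), key=lambda n: len(n.children))
    return [child.text() for child in container.children]


page = ("<html><body><h1>Results</h1><ul>"
        "<li><b>A</b> 10</li><li><b>B</b> 20</li><li><b>C</b> 30</li>"
        "</ul></body></html>")
print(extract_objects(page))
```

On real result pages a single fanout measure is likely to be fooled by navigation menus or layout tables, so a practical system would need additional evidence to pick the object-rich subtree and to separate individual objects within it.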

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Shanghai Jiao Tong University, 200030, Shanghai, China
Song Hui, Zhang Ling, Ye Yunming & Ma Fanyuan

Editor information

Editors and Affiliations

Information School, Renmin University of China, Beijing, 100872, China
Xiaofeng Meng
Department of Computer Science, University of California, Santa Barbara, CA, 93106-5110, USA
Jianwen Su & Yujun Wang

Cite this paper

Hui, S., Ling, Z., Yunming, Y., Fanyuan, M. (2002). Object-Extraction-Based Hidden Web Information Retrieval. In: Meng, X., Su, J., Wang, Y. (eds) Advances in Web-Age Information Management. WAIM 2002. Lecture Notes in Computer Science, vol 2419. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45703-8_31

DOI: https://doi.org/10.1007/3-540-45703-8_31
Published: 21 August 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44045-1
Online ISBN: 978-3-540-45703-9
eBook Packages: Springer Book Archive
