SEEDEEP: A System for Exploring and Querying Scientific Deep Web Data Sources

Wang, Fan; Agrawal, Gagan

doi:10.1007/978-3-642-02279-1_6

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5566)

Included in the following conference series:

International Conference on Scientific and Statistical Database Management

Abstract

An emerging trend in scientific data dissemination is the use of online databases that are hidden behind query forms, forming what is referred to as the deep web. In this paper, we propose SEEDEEP, a System for Exploring and quErying scientific DEEP web data sources. SEEDEEP can automatically mine the schemas of deep web data sources, integrate heterogeneous data sources, and answer cross-source keyword queries; it also incorporates features such as caching and fault tolerance. Currently, SEEDEEP integrates 16 deep web data sources in the biological domain. We demonstrate how an integrated model for correlated deep web data sources is constructed, how a complex cross-source keyword query is answered efficiently and correctly, and how important performance issues are addressed.
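The abstract lists caching and fault tolerance among SEEDEEP's features but does not say how they are implemented. As a minimal sketch, assuming each deep web source can be wrapped by a function that takes a query term and returns parsed result records, the class below caches answers per source and retries transient network failures a bounded number of times. The names CachedSource, QueryFn, query_fn, and max_retries are hypothetical illustrations, not SEEDEEP's actual design or API.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical signature: a function that submits one query term to a
# deep web source's form and returns the parsed result records.
QueryFn = Callable[[str], Tuple[str, ...]]


class CachedSource:
    """Caching, retrying wrapper around one deep web data source (sketch)."""

    def __init__(self, query_fn: QueryFn, max_retries: int = 3) -> None:
        if max_retries < 1:
            raise ValueError("max_retries must be at least 1")
        self.query_fn = query_fn
        self.max_retries = max_retries
        self.cache: Dict[str, Tuple[str, ...]] = {}

    def query(self, term: str) -> Tuple[str, ...]:
        # Answer repeated queries locally instead of re-submitting the form.
        if term in self.cache:
            return self.cache[term]
        last_error: Optional[Exception] = None
        # Retry transient network failures a bounded number of times.
        for _ in range(self.max_retries):
            try:
                result = self.query_fn(term)
            except (ConnectionError, TimeoutError) as err:
                last_error = err
                continue
            self.cache[term] = result
            return result
        raise RuntimeError(
            f"source unavailable after {self.max_retries} attempts"
        ) from last_error


if __name__ == "__main__":
    attempts = []

    def flaky_source(term: str) -> Tuple[str, ...]:
        # Simulated source: times out on the first request, then succeeds.
        attempts.append(term)
        if len(attempts) == 1:
            raise TimeoutError("query form timed out")
        return ("record for " + term,)

    source = CachedSource(flaky_source)
    print(source.query("BRCA2"))  # ('record for BRCA2',) after one retry
    print(source.query("BRCA2"))  # served from the cache
    print(len(attempts))          # 2
```

Caching by query term assumes a source's answers do not change between requests; a longer-lived cache in such a system would likely need an expiry policy.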

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Ohio State University, Columbus, OH 43210, USA
Fan Wang & Gagan Agrawal

Editor information

Editors and Affiliations

Department of Computer Science, University of Illinois at Urbana-Champaign, 201 N. Goodwin Avenue, Urbana, IL 61801, USA
Marianne Winslett

About this paper

Cite this paper

Wang, F., Agrawal, G. (2009). SEEDEEP: A System for Exploring and Querying Scientific Deep Web Data Sources. In: Winslett, M. (ed.) Scientific and Statistical Database Management. SSDBM 2009. Lecture Notes in Computer Science, vol 5566. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02279-1_6

DOI: https://doi.org/10.1007/978-3-642-02279-1_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02278-4
Online ISBN: 978-3-642-02279-1
eBook Packages: Computer Science (R0)
