Abstract
Biological data is increasingly shared over the deep web, and many biological queries can be answered only by successively searching several distinct websites. This paper introduces a system that exploits parallelism to accelerate search over multiple deep-web data sources. We develop an interactive, two-stage multi-threaded system that achieves task parallelization, thread parallelization, and pipelined parallelization. We evaluate the system on a number of queries involving SNP datasets and show that most of them can be accelerated significantly by exploiting these three forms of parallelism.
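The three forms of parallelism named in the abstract can be illustrated with a minimal sketch. The abstract does not define them, so the reading used here is an assumption: task parallelism as querying independent sources concurrently, thread parallelism as several threads querying one source with different inputs, and pipelined parallelism as the second stage starting before the first has finished. The source functions (`source_a`, `source_b`, `dependent_source`) and the `search` entry point are hypothetical stand-ins, not the system's actual interface.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for deep-web sources. A real system would submit
# web form queries here; names and return values are illustrative only.
def source_a(term):
    return [f"{term}-a1", f"{term}-a2"]

def source_b(term):
    return [f"{term}-b1"]

def dependent_source(item):
    # Second-stage source whose input is a first-stage result.
    return f"detail({item})"

_DONE = object()  # sentinel marking the end of the first stage

def stage_one(term, out_q):
    # Task parallelism (assumed reading): independent sources run concurrently.
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(src, term) for src in (source_a, source_b)]
        for fut in futures:
            for item in fut.result():
                out_q.put(item)  # pass each item downstream once it is read
    out_q.put(_DONE)

def stage_two(in_q, results, workers=4):
    # Thread parallelism (assumed reading): several threads query the same
    # dependent source, each with a different input.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pending = []
        while True:
            item = in_q.get()
            if item is _DONE:
                break
            pending.append(pool.submit(dependent_source, item))
        results.extend(f.result() for f in pending)

def search(term):
    # Pipelined parallelism (assumed reading): both stages run at once,
    # connected by a queue, so stage two need not wait for all of stage one.
    q = queue.Queue()
    results = []
    producer = threading.Thread(target=stage_one, args=(term, q))
    consumer = threading.Thread(target=stage_two, args=(q, results))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return results
```

Calling `search("rs123")` returns one second-stage result per first-stage item, in the order the first stage emitted them. The queue between the stages is the design choice that gives the pipelining: the consumer can submit work as soon as the first item arrives.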
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Liu, T., Wang, F., Agrawal, G. (2009). Exploiting Parallelism to Accelerate Keyword Search on Deep-Web Sources. In: Paton, N.W., Missier, P., Hedeler, C. (eds) Data Integration in the Life Sciences. DILS 2009. Lecture Notes in Computer Science, vol 5647. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02879-3_12
DOI: https://doi.org/10.1007/978-3-642-02879-3_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02878-6
Online ISBN: 978-3-642-02879-3
eBook Packages: Computer Science (R0)