
Scalable multi-query optimization for exploratory queries over federated scientific databases

Published: 01 August 2008

Abstract

The diversity and large volumes of data processed in the Natural Sciences today have led to a proliferation of highly specialized and autonomous scientific databases with inherent and often intricate relationships. As a user-friendly method for querying this complex, ever-expanding network of sources for correlations, we propose exploratory queries. Exploratory queries are loosely structured and hence require only minimal user knowledge of the source network. Evaluating an exploratory query usually involves the evaluation of many distributed queries. As the number of such distributed queries can quickly become large, we attack the optimization problem for exploratory queries by proposing several multi-query optimization algorithms that compute a global evaluation plan while minimizing the total communication cost, a key bottleneck in distributed settings. The proposed algorithms are necessarily heuristics, as computing an optimal global evaluation plan is shown to be NP-hard. Finally, we present an implementation of our algorithms, along with experiments that illustrate their potential not only for the optimization of exploratory queries, but also for the multi-query optimization of large batches of standard queries.

Published In

Proceedings of the VLDB Endowment, Volume 1, Issue 1
August 2008
1216 pages

Publisher

VLDB Endowment

Publication History

Published: 01 August 2008
Published in PVLDB Volume 1, Issue 1

Qualifiers

• Research-article

Cited By

• (2021) Compliant Geo-distributed Query Processing. Proceedings of the 2021 International Conference on Management of Data, pp. 181-193. DOI: 10.1145/3448016.3453687. Online publication date: 9-Jun-2021.
• (2020) Effective and efficient crowd-assisted similarity retrieval of medical images in resource-constraint Mobile telemedicine systems. Multimedia Tools and Applications, 79:27-28, pp. 19893-19923. DOI: 10.1007/s11042-020-08755-3. Online publication date: 1-Jul-2020.
• (2019) Leon: A Distributed RDF Engine for Multi-query Processing. Database Systems for Advanced Applications, pp. 742-759. DOI: 10.1007/978-3-030-18576-3_44. Online publication date: 22-Apr-2019.
• (2016) Multiple query optimization on the D-Wave 2X adiabatic quantum computer. Proceedings of the VLDB Endowment, 9:9, pp. 648-659. DOI: 10.14778/2947618.2947621. Online publication date: 1-May-2016.
• (2016) Spatial Consensus Queries in a Collaborative Environment. ACM Transactions on Spatial Algorithms and Systems, 2:1, pp. 1-37. DOI: 10.1145/2829943. Online publication date: 30-Mar-2016.
• (2016) Efficient batch similarity join processing of social images based on arbitrary features. World Wide Web, 19:4, pp. 725-753. DOI: 10.1007/s11280-015-0355-z. Online publication date: 1-Jul-2016.
• (2015) Progressive Batch Medical Image Retrieval Processing in Mobile Wireless Networks. ACM Transactions on Internet Technology, 15:3, pp. 1-27. DOI: 10.1145/2783437. Online publication date: 14-Aug-2015.
• (2012) Stratified k-means clustering over a deep web data source. Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 1113-1121. DOI: 10.1145/2339530.2339705. Online publication date: 12-Aug-2012.
• (2011) Answering complex structured queries over the deep web. Proceedings of the 15th Symposium on International Database Engineering & Applications, pp. 115-123. DOI: 10.1145/2076623.2076638. Online publication date: 21-Sep-2011.
• (2010) Query reuse based query planning for searches over the deep web. Proceedings of the 21st international conference on Database and expert systems applications: Part II, pp. 64-79. DOI: 10.5555/1887568.1887575. Online publication date: 30-Aug-2010.
