Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/375663.375722acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
Article

On supporting containment queries in relational database management systems

Published: 01 May 2001 Publication History

Abstract

Virtually all proposals for querying XML include a class of query we term “containment queries”. It is also clear that in the foreseeable future, a substantial amount of XML data will be stored in relational database systems. This raises the question of how to support these containment queries. The inverted list technology that underlies much of Information Retrieval is well-suited to these queries, but should we implement this technology (a) in a separate loosely-coupled IR engine, or (b) using the native tables and query execution machinery of the RDBMS? With option (b), more than twenty years of work on RDBMS query optimization, query execution, scalability, and concurrency control and recovery immediately extend to the queries and structures that implement these new operations. But all this will be irrelevant if the performance of option (b) lags that of (a) by too much. In this paper, we explore some performance implications of both options using native implementations in two commercial relational database systems and in a special purpose inverted list engine. Our performance study shows that while RDBMSs are generally poorly suited for such queries, under conditions they can outperform an inverted list engine. Our analysis further identifies two significant causes that differentiate the performance of the IR and RDBMS implementations: the join algorithms employed and the hardware cache utilization. Our results suggest that contrary to most expectations, with some modifications, a native implementations in an RDBMS can support this class of query much more efficiently.

References

[1]
S. Abiteboul, D. Quass, J. McHuge, J. Widom, and J. Wiener. The lorel query language for semistructured data. International Journal on Digital Libraries, 1(1):68-88, April 1997.
[2]
A. Ailamaki, D.J.DeWitt, M.D.Hill, and D.A.Wood. Dbmss on a modern processor: Where does time go? In Proceedings of the 25th International Conference on Very Large Data Bases, pages 266-277, September 1999.
[3]
Ricardo Baeza-Yates and Gonzalo Navarro. Integrating contents and structure in text retrieval. SIGMOD Record, 25(1):67-69, March 1996.
[4]
G. E. Blake, M. P. Consens, P. Kilpelainen, P.-A. Larson, T. Snider, and F. W. Tompa. Text/relational database management systems: Harmonizing sql and sgml. In Proceedings of the International Conference on Applications of Databases, pages 267-280, June 1994.
[5]
Don Chamberlin, Peter Frankhauser, Massimo Marchiori, and Jonathan Robie. Xml query requirements. In W3C Working Draft, http://www.w3.org/TR/xmlquery-req, August 2000.
[6]
Don Chamberlin, Jonathan Robie, and Daniela Florescu. Quilt: An xml query language for heterogeneous data sources. In Lecture Notes in Computer Science, Springer-Verlag, 2000.
[7]
C. L. Clarke, G. V. Cormack, and F. J. Burkowski. An algebra for structured text search and a framework for its implementation. The Computer Journal, 38(1):43-56, 1995.
[8]
C.L.Clarke, G.V.Cormack, and F.J.Burkowski. Schema-independent retrieval from heterogeneous structured text. In Fourth Annual Symposium on Document Analysis and Information Retrieval, pages 279-289, 1995.
[9]
Robin Cover. The xml cover pages. In http://www.oasis-open.org/cover/xml.html, July 2000.
[10]
Tuong Dao, Ron Sacks-Davis, and James A. Thom. Indexing structured text for queries on containment relationships. In Proceedings of the 7th Australasian Database Conference, 1996.
[11]
Stefan Dessloch and Nelson Mattos. Integrating sql databases with content-specific search engines. In Proceedings of the 23rd VLDB Conference, 1997.
[12]
Alin Deutsch, Mary Fernandez, Daniela Florescu, Alon Levy, and Dan Suciu. A query language for xml. In Proceedings of Eighth International World Wide Web Conference, 1999.
[13]
Daniela Florescu and Donald Kossman. Storing and querying xml data using an rdbms. IEEE Data Engineering Bulletin, 22(3):27-34, 1999.
[14]
Daniela Florescu, Donald Kossmann, and Ioana Manolescu. Integrating keyword search into xml query processing. WWW9/Computer Networks, 33(1-6):119-135, 2000.
[15]
International Organization for Standardization. Information processing-text and office systems-standard generalised markup language (sgml), iso/iec 8879, 1986.
[16]
GMD. Gmd-ipsi xql engine. http://xml.darmstadt.gmd.de/xql/index.html, August 1999.
[17]
Roy Goldman and Jennifer Widom. Wsq/dsq: A practical approach for combined querying of databases and the web. In Proceedings of the 2000 Sigmod Conference, pages 285-296, 2000.
[18]
IBM. Db2 text extender. http://www- 4.ibm.com/software/data/db2/extenders/text.htm, July 2000.
[19]
INRIA. Minixyleme project. http://wwwrocq.inria.fr/caguilera/xoql/minixyleme/readme.html.
[20]
Intel. Intel architecture software developer's manual, volume 1: Basic architecture, 1999.
[21]
J.Clark and S.DeRose. Xml path language (xpath) version 1.0, w3c recommendation. In http://www.w3.org/TR/xpath.html, November 1999.
[22]
J.McHugh, J.Widom, S.Abiteboul, Q.Luo, and A.Rajaraman. Indexing semistructured data. In Stanford Technical Report, January 1998.
[23]
J.McHugh, S.Abiteboul, R.Goldman, D.Quass, and J.Widom. Lore: A database management system for semistructured data. SIGMOD Record, 26(3):54-66, 1997.
[24]
J.Robie, J.Lapp, and D.Schach. Xml query language (xql). http://www.w3.org/TandS/QL/QL98/pp/xql.html.
[25]
University of Washington. The tukwila data integration system. http://data.cs.washington.edu/integration/tukwila/.
[26]
University of Wisconsin. The niagara system. http://www.cs.wisc.edu/niagara/.
[27]
Oracle. Oracle8i intermedia text reference, release 8.1.5. http://oradoc.photo.net/ora81/DOC/inter.815/a67843/toc.htm.
[28]
Steve Putz. Using a relational database for an inverted text index. In Xerox Palo Alto Research Center Technical Report SSL-91-20, Xerox PARC, January 1991.
[29]
Jun Rao and Kenneth A. Ross. Making b+-trees cache conscious in main memory. In SIGMOD Conference, pages 475-486, 2000.
[30]
Ron Sacks-Davis, Timothy Arnold-Moore, and Justin Zobel. Database systems for structured documents. In Proceedings of the International Symposium on Advanced Database Technologies and Their Integration (ADTI'94), pages 272-283, October 1994.
[31]
Arnaud Sahuguet. Kweelt. http://db.cis.upenn.edu/Kweelt/.
[32]
G. Salton and M.J. McGill. Introduction to Modern Information Retrieval. McGraw-Hill, New York, 1983.
[33]
S.Cluet, S.Jacqmin, and J.Simeon. The new yatl: Design and specifications. In Technical Report, INRIA, 1999.
[34]
Jayavel Shanmugasundaram, He Gang, Kristin Tufte, Chun Zhang, David DeWitt, and Jeffrey Naughton. Relational databases for querying xml documents: Limitations and opportunities. In Proceedings of the 1999 VLDB Conference, September 1999.
[35]
Jayavel Shanmugasundaram, E. Shekita, R.Barr, Michael J. Carey, Bruce G. Lindsay, Hamid Pirahesh, and Berthold Reinwald. Efficiently publishing relational data as xml documents. In VLDB Conference, September 2000.
[36]
A. Shatdal, C. Kant, and J.F.Naughton. Cache conscious algorithms for relational query processing. In Proceedings of the 20th VLDB Conference, 1994.
[37]
Sleepycat Software. The berkeley database. http://www.sleepycat.com.
[38]
T.Arnold-Moore, M.Fuller, B.Lowe, J.Thom, and R.Wilkinson. The elf data model and sgql query language for structured document databases. In Proceedings of the Australasian Database Conference, Adelaide, Australia, pages 17-26, 1995.
[39]
Tak W. Yan and Jurgen Annevelink. Integrating a structured-text retrieval system with an object-oriented database system. In VLDB Conference, September 1994.
[40]
Chun Zhang, Jeffrey Naughton, David DeWitt, Qiong Luo, and Guy Lohman. On supporting containment queries in relational database management systems. full version. http://www.cs.wisc.edu/niagara/papers/ZND+01full.pdf, 2001.
[41]
G. K. Zipf. Human Behavior and the Principle of Least Effort. Addison-Wesley, Reading MA, 1949.

Cited By

View all
  • (2022)PIRSE: Plant Identification and Repository for the Smart Environment2022 26th International Computer Science and Engineering Conference (ICSEC)10.1109/ICSEC56337.2022.10049317(78-83)Online publication date: 21-Dec-2022
  • (2022)A Performance Evaluation of the Clustering-based Labelling Scheme2022 International Conference on Engineering & MIS (ICEMIS)10.1109/ICEMIS56295.2022.9914208(1-6)Online publication date: 4-Jul-2022
  • (2021)Data structure set-trie for storing and querying sets: Theoretical and empirical analysisPLOS ONE10.1371/journal.pone.024512216:2(e0245122)Online publication date: 10-Feb-2021
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGMOD '01: Proceedings of the 2001 ACM SIGMOD international conference on Management of data
May 2001
630 pages
ISBN:1581133324
DOI:10.1145/375663
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 May 2001

Permissions

Request permissions for this article.

Check for updates

Qualifiers

  • Article

Conference

SIGMOD/PODS01
Sponsor:

Acceptance Rates

SIGMOD '01 Paper Acceptance Rate 44 of 293 submissions, 15%;
Overall Acceptance Rate 785 of 4,003 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)51
  • Downloads (Last 6 weeks)5
Reflects downloads up to 15 Oct 2024

Other Metrics

Citations

Cited By

View all
  • (2022)PIRSE: Plant Identification and Repository for the Smart Environment2022 26th International Computer Science and Engineering Conference (ICSEC)10.1109/ICSEC56337.2022.10049317(78-83)Online publication date: 21-Dec-2022
  • (2022)A Performance Evaluation of the Clustering-based Labelling Scheme2022 International Conference on Engineering & MIS (ICEMIS)10.1109/ICEMIS56295.2022.9914208(1-6)Online publication date: 4-Jul-2022
  • (2021)Data structure set-trie for storing and querying sets: Theoretical and empirical analysisPLOS ONE10.1371/journal.pone.024512216:2(e0245122)Online publication date: 10-Feb-2021
  • (2021)Demythization of Structural XML Query Processing: Comparison of Holistic and Binary ApproachesIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2019.294615733:4(1439-1452)Online publication date: 1-Apr-2021
  • (2021)An Efficient Data Access Approach With Queue and Stack in Optimized Hybrid JoinIEEE Access10.1109/ACCESS.2021.30642029(41261-41274)Online publication date: 2021
  • (2021)Implementation and Experiment of Join Optimization Algorithm for Inverted Index in an RDBMSAdvances in Computer Science and Ubiquitous Computing10.1007/978-981-15-9343-7_11(79-85)Online publication date: 5-Jan-2021
  • (2020)FLUXProceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering10.1145/3324884.3418916(1379-1381)Online publication date: 21-Dec-2020
  • (2020)Systematic Partitioning and Labeling XML Subtrees for Efficient Processing of XML Queries in IoT EnvironmentsIEEE Access10.1109/ACCESS.2020.29846008(61817-61833)Online publication date: 2020
  • (2020)TADOC: Text analytics directly on compressionThe VLDB Journal10.1007/s00778-020-00636-3Online publication date: 19-Sep-2020
  • (2020)Distributed Tree-Pattern Matching in Big Data Analytics SystemsAdvances in Databases and Information Systems10.1007/978-3-030-54832-2_14(171-186)Online publication date: 17-Aug-2020
  • Show More Cited By

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media