Abstract
This paper introduces the BIRD family of numbering schemes for tree databases, which is based on a structural summary such as the DataGuide. Given the BIRD IDs of two database nodes and the corresponding nodes in the structural summary, we decide the extended XPath relations Child, Child+, Child*, Following, NextSibling, NextSibling+ and NextSibling* for the nodes without access to the database. Similarly, we can reconstruct the parent node and neighbouring siblings of a given node. All decision and reconstruction steps are based on simple arithmetic operations. The BIRD scheme offers high expressivity and efficiency paired with modest storage demands. Compared to other identification schemes with similar expressivity, BIRD performs best in terms of both storage consumption and execution time, with experiments underlining the crucial role of ID reconstruction in query evaluation. A very attractive feature of the BIRD scheme is that all extended XPath relations can be decided and reconstructed in constant time, i.e., independent of the tree position and distance of the nodes involved. All results are shown to scale up to the multi-gigabyte level.
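The abstract does not spell out the arithmetic, so the following is only a toy sketch of how a summary-based numbering scheme could decide and reconstruct relations in constant time. It assumes, hypothetically, that every structural-summary node carries an integer weight, that a database node's ID is a multiple of the weight of its summary node, and that all descendants of a node v have IDs strictly between id(v) and id(v) + weight. The names `SummaryNode`, `parent_id`, `is_child`, `is_descendant` and `is_following` are illustrative and not taken from the paper, whose actual definitions may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SummaryNode:
    """A node of the structural summary (hypothetical layout)."""
    name: str
    weight: int                        # assumed: IDs of matching nodes are multiples of this
    parent: Optional["SummaryNode"] = None


def parent_id(node_id: int, summary: SummaryNode) -> Optional[int]:
    """Reconstruct the parent's ID by rounding down to a multiple of the parent's weight."""
    if summary.parent is None:
        return None                    # the root has no parent
    return node_id - node_id % summary.parent.weight


def is_descendant(anc_id: int, anc: SummaryNode, desc_id: int) -> bool:
    """Child+ under the assumed invariant: descendants lie inside the ancestor's ID range."""
    return anc_id < desc_id < anc_id + anc.weight


def is_child(par_id: int, par: SummaryNode, child_id: int, child: SummaryNode) -> bool:
    """Child: the summary nodes are parent and child, and the reconstructed parent ID matches."""
    return child.parent is par and parent_id(child_id, child) == par_id


def is_following(x_id: int, x: SummaryNode, y_id: int) -> bool:
    """Following: y starts at or after the end of x's ID range."""
    return y_id >= x_id + x.weight


# Toy summary: root (weight 16) -> a (weight 4) -> b (weight 1).
root = SummaryNode("root", 16)
a = SummaryNode("a", 4, root)
b = SummaryNode("b", 1, a)

# Toy database: root has ID 0; its a-children have IDs 4 and 8;
# the b-children of a=4 are 5, 6, 7 and those of a=8 are 9, 10, 11.
print(parent_id(6, b))          # ID of the a-node above b=6
print(is_child(4, a, 6, b))     # is b=6 a child of a=4?
print(is_descendant(4, a, 9))   # is b=9 below a=4?
print(is_following(4, a, 9))    # does b=9 follow a=4 in document order?
```

Each function uses a fixed number of integer operations and never touches the database nodes themselves, which is the property the abstract highlights; only the weights stored in the summary are consulted.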
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Weigel, F., Schulz, K.U., Meuss, H. (2005). The BIRD Numbering Scheme for XML and Tree Databases – Deciding and Reconstructing Tree Relations Using Efficient Arithmetic Operations. In: Bressan, S., et al. Database and XML Technologies. XSym 2005. Lecture Notes in Computer Science, vol 3671. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11547273_5
DOI: https://doi.org/10.1007/11547273_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28583-0
Online ISBN: 978-3-540-31968-9
eBook Packages: Computer Science, Computer Science (R0)