Abstract
In many applications, XML documents need to be modelled as graphs, and processing queries over such graph-structured XML documents brings new challenges. In this paper, we design a labelling-based method for processing structural queries on graph-structured XML documents. Each node is assigned a set of labels according to a reachability labelling scheme: by extending the interval-based reachability labelling scheme for DAGs proposed by Agrawal et al., we design labelling schemes that support reachability checks on general graphs. Based on these labelling schemes, we design graph structural join algorithms that efficiently answer structural queries containing only ancestor-descendant relationships. For subgraph queries, we design a subgraph join algorithm which, supported by an efficient data structure, processes subgraph queries of various structures efficiently. Experimental results show that our algorithms have good performance and scalability.
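The abstract describes the labelling approach only at a high level. As a rough illustration, the Python sketch below shows one way such pieces can fit together; it is not the authors' algorithm. Assumptions (ours, not the paper's): a document is a dict mapping each node id to its successors (parent-child edges plus ID/IDREF edges); cycles are removed by collapsing strongly connected components with Tarjan's algorithm, a common way to reduce a general graph to a DAG that may differ from the paper's own extension; the resulting DAG is labelled with spanning-forest postorder intervals in the style of Agrawal et al.; and `condense`, `merge`, `interval_labels`, `build_index`, `reaches` and `structural_join` are hypothetical names. The join is a plain binary-search join over the labels, not the paper's graph structural join or subgraph join.

```python
import sys
from bisect import bisect_left, bisect_right
from typing import Dict, List, Set, Tuple
# Hypothetical sketch: SCC condensation + Agrawal-style interval labels; not the paper's exact scheme.


def condense(graph: Dict[int, List[int]]) -> Tuple[Dict[int, int], Dict[int, Set[int]]]:
    """Tarjan's SCC algorithm: map each node to a component id, then build the condensed DAG."""
    sys.setrecursionlimit(max(10_000, sys.getrecursionlimit()))  # recursive DFS; fine for a sketch
    nodes = set(graph) | {w for succ in graph.values() for w in succ}
    index, low, comp = {}, {}, {}
    stack = []

    def visit(v: int) -> None:
        index[v] = low[v] = len(index)
        stack.append(v)
        for w in graph.get(v, []):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w not in comp:  # visited but not yet assigned => still on the stack
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v roots an SCC; reuse v as the component id
            while True:
                w = stack.pop()
                comp[w] = v
                if w == v:
                    break

    for v in sorted(nodes):
        if v not in index:
            visit(v)
    dag = {c: set() for c in comp.values()}
    for v, succ in graph.items():
        for w in succ:
            if comp[v] != comp[w]:
                dag[comp[v]].add(comp[w])
    return comp, dag


def merge(intervals: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Union of closed integer intervals as sorted, disjoint, non-adjacent ranges."""
    out: List[Tuple[int, int]] = []
    for lo, hi in sorted(intervals):
        if out and lo <= out[-1][1] + 1:
            out[-1] = (out[-1][0], max(out[-1][1], hi))
        else:
            out.append((lo, hi))
    return out


def interval_labels(dag: Dict[int, Set[int]]):
    """Postorder numbers and interval labels over a DFS spanning forest (after Agrawal et al.)."""
    post, tree_low, visited = {}, {}, set()

    def dfs(c: int) -> None:
        visited.add(c)
        lo = None
        for d in sorted(dag[c]):
            if d not in visited:  # (c, d) becomes a spanning-forest edge
                dfs(d)
                lo = tree_low[d] if lo is None else min(lo, tree_low[d])
        post[c] = len(post)  # postorder number of c
        tree_low[c] = post[c] if lo is None else lo  # smallest postorder number in c's subtree

    for c in sorted(dag):
        if c not in visited:
            dfs(c)
    # Each edge c -> d has post[d] < post[c], so successors' labels are final before use.
    labels = {c: [(tree_low[c], post[c])] for c in dag}
    for c in sorted(dag, key=post.__getitem__):
        for d in dag[c]:
            labels[c] = merge(labels[c] + labels[d])
    return post, labels


def build_index(graph: Dict[int, List[int]]):
    comp, dag = condense(graph)
    return (comp, *interval_labels(dag))  # (comp, post, labels)


def reaches(idx, u: int, v: int) -> bool:
    """True if v is reachable from u; reflexive because each label covers its own node."""
    comp, post, labels = idx
    p = post[comp[v]]
    return any(lo <= p <= hi for lo, hi in labels[comp[u]])


def structural_join(idx, ancestors: List[int], descendants: List[int]) -> List[Tuple[int, int]]:
    """All pairs (a, d) with a != d and d reachable from a, via binary search per interval."""
    comp, post, labels = idx
    keyed = sorted((post[comp[d]], d) for d in descendants)
    keys = [k for k, _ in keyed]
    pairs = []
    for a in ancestors:
        for lo, hi in labels[comp[a]]:
            i, j = bisect_left(keys, lo), bisect_right(keys, hi)
            pairs.extend((a, d) for _, d in keyed[i:j] if d != a)
    return pairs
```

For example, with `g = {0: [1, 2], 1: [3], 2: [4], 3: [4], 4: [2, 5]}` (the edge 4 -> 2 closes a cycle) and `idx = build_index(g)`, `reaches(idx, 3, 2)` is `True`, `reaches(idx, 2, 3)` is `False`, and `sorted(structural_join(idx, [1, 2], [2, 4, 5]))` gives `[(1, 2), (1, 4), (1, 5), (2, 4), (2, 5)]`. Because all nodes of one SCC share a component id, they share one label, which can keep the index smaller on documents with many ID/IDREF cycles at the cost of an SCC pass before labelling.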
References
Al-Khalifa, S., Jagadish, H.V., Patel, J.M., Wu, Y., Koudas, N., Srivastava, D.: Structural joins: a primitive for efficient XML query pattern matching. In: Proceedings of the 18th International Conference on Data Engineering (ICDE 2002), pp. 141–152 (2002)
Braga, D., Campi, A.: XQBE: a graphical environment to query XML data. World Wide Web 8(3), 287–316 (2005)
Bruno, N., Koudas, N., Srivastava, D.: Holistic twig joins: optimal XML pattern matching. In: Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data (SIGMOD 2002), pp. 310–321 (2002)
Chamberlin, D.D., Florescu, D., Robie, J.: XQuery: a query language for XML. In: W3C Working Draft. http://www.w3.org/TR/xquery (2001)
Chen, L., Gupta, A., Kurul, M.E.: Stack-based algorithms for pattern matching on DAGs. In: VLDB, pp. 493–504 (2005)
Cheng, J., Yu, J.X., Lin, X., Wang, H., Yu, P.S.: Fast computation of reachability labeling for large graphs. In: Ioannidis, Y.E., Scholl, M.H., Schmidt, J.W., Matthes, F., Hatzopoulos, M., Böhm, K., Kemper, A., Grust, T., Böhm, C. (eds.) EDBT. Lecture Notes in Computer Science, vol. 3896, pp. 961–979. Springer (2006)
Chien, S.-Y., Vagena, Z., Zhang, D., Tsotras, V.J., Zaniolo, C.: Efficient structural joins on indexed XML documents. In: Proceedings of 28th International Conference on Very Large Data Bases (VLDB 2002), pp. 263–274 (2002)
Clark, J., DeRose, S.: XML path language (XPath). In: W3C Recommendation, 16 November 1999. http://www.w3.org/TR/xpath (1999)
Cohen, E., Halperin, E., Kaplan, H., Zwick, U.: Reachability and distance queries via 2-hop labels. In: SODA, pp. 937–946 (2002)
Grust, T.: Accelerating XPath location steps. In: Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data (SIGMOD 2002), pp. 109–120 (2002)
He, H., Wang, H., Yang, J., Yu, P.S.: Compact reachability labeling for graph-structured data. In: Herzog, O., Schek, H.-J., Fuhr, N., Chowdhury, A., Teiken, W. (eds.) Proceedings of the 2005 ACM CIKM International Conference on Information and Knowledge Management (CIKM2005), Bremen, Germany, October 31–November 5, 2005, pp. 594–601. ACM (2005)
Jiang, H., Lu, H., Wang, W., Ooi, B.C.: XR-Tree: indexing XML data for efficient structural join. In: Proceedings of the 19th International Conference on Data Engineering (ICDE 2003), pp. 253–263 (2003)
Kameda, T.: On the vector representation of the reachability in planar directed graphs. Information Processing Letters 3(3), 78–80 (1975)
Kaushik, R., Bohannon, P., Naughton, J.F., Korth, H.F.: Covering indexes for branching path queries. In: Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data (SIGMOD 2002), pp. 133–144 (2002)
Li, Q., Moon, B.: Indexing and querying XML data for regular path expressions. In: Proceedings of 27th International Conference on Very Large Data Bases (VLDB 2001), pp. 361–370 (2001)
Milo, T., Suciu, D.: Index structures for path expressions. In: Proceedings of the 7th International Conference on Database Theory (ICDT 1999), pp. 277–295 (1999)
Agrawal, R., Borgida, A., Jagadish, H.V.: Efficient management of transitive relationships in large data and knowledge bases. In: Proceedings of the 1989 ACM SIGMOD International Conference on Management of Data (SIGMOD 1989), pp. 253–262. Portland, Oregon, May (1989)
Schenkel, R., Theobald, A., Weikum, G.: HOPI: an efficient connection index for complex XML document collections. In: Advances in Database Technology: EDBT 2004, 9th International Conference on Extending Database Technology (EDBT 2004), pp. 237–255. Heraklion, Crete, Greece, March 14–18 (2004)
Schmidt, A., Waas, F., Kersten, M.L., Carey, M.J., Manolescu, I., Busse, R.: XMark: a benchmark for XML data management. In: Proceedings of 28th International Conference on Very Large Data Bases (VLDB 2002), pp. 974–985 (2002)
Cormen, T.H., Leiserson, C.E., Rivest, R.L.: Introduction to Algorithms. MIT Press, Cambridge, MA (1990)
Bray, T., Paoli, J., Sperberg-McQueen, C.M., Yergeau, F.: Extensible Markup Language (XML) 1.0 (Third Edition). In: W3C Recommendation, 04 February 2004. http://www.w3.org/TR/REC-xml/ (2004)
Trißl, S., Leser, U.: Fast and practical indexing and querying of very large graphs. In: SIGMOD Conference, pp. 845–856 (2007)
Christophides, V., Plexousakis, D., Scholl, M., Tourtounis, S.: On labeling schemes for the semantic web. In: Proceedings of the Twelfth International World Wide Web Conference (WWW 2003), pp. 544–555. Budapest, Hungary, May (2003)
Wang, H., He, H., Yang, J., Yu, P.S., Yu, J.X.: Dual labeling: answering graph reachability queries in constant time. In: Liu, L., Reuter, A., Whang, K.-Y., Zhang, J. (eds.) Proceedings of the 22nd International Conference on Data Engineering, ICDE 2006, 3–8 April 2006, Atlanta, GA, USA, p. 75. IEEE Computer Society (2006)
Wang, H., Li, J., Wang, H.: Clustered chain path index for xml document: efficiently processing branch queries. World Wide Web 11(1), 153–168 (2008)
Wang, W., Jiang, H., Lu, H., Yu, J.X.: PBiTree coding and efficient processing of containment joins. In: Proceedings of the 19th International Conference on Data Engineering (ICDE 2003), pp. 391–402 (2003)
Wong, K.-F., Yu, J.X., Tang, N.: Answering XML queries using path-based indexes: a survey. World Wide Web 9(3), 277–299 (2006)
Zhang, C., Naughton, J.F., DeWitt, D.J., Luo, Q., Lohman, G.M.: On supporting containment queries in relational database management systems. In: Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data (SIGMOD 2001), pp. 425–436 (2001)
Vagena, Z., Moro, M.M., Tsotras, V.J.: Twig query processing over graph-structured XML data. In: Proceedings of the Seventh International Workshop on the Web and Databases (WebDB 2004), pp. 43–48 (2004)
Additional information
Supported by the Key Program of the National Natural Science Foundation of China under Grant No. 60533110; the National Grand Fundamental Research 973 Program of China under Grant No. 2006CB303000; and the National Natural Science Foundation of China under Grant Nos. 60773068 and 60773063.
About this article
Cite this article
Wang, H., Li, J., Wang, W. et al. Coding-based Join Algorithms for Structural Queries on Graph-Structured XML Document. World Wide Web 11, 485–510 (2008). https://doi.org/10.1007/s11280-008-0050-4