Query evaluation over probabilistic XML

Authors:

Benny Kimelfeld,

Yuri Kosharovsky,

Yehoshua Sagiv

The VLDB Journal — The International Journal on Very Large Data Bases, Volume 18, Issue 5

Pages 1117-1140

https://doi.org/10.1007/s00778-009-0150-5

Published: 01 October 2009

Abstract

Query evaluation over probabilistic XML is explored. The queries are twig patterns with projection, and the data is represented in terms of three models of probabilistic XML (that extend existing ones in the literature). The first model makes an assumption of independence among the probabilistic junctions, whereas the second model can encode probabilistic dependencies. The third model combines the first two and, hence, is the most general. An efficient algorithm (under data complexity) is given for query evaluation in the first model. In addition, various optimizations are proposed, and their effectiveness is shown both analytically and experimentally. For the other two models, it is shown that every query is either intractable or trivial. Nonetheless, efficient (additive and multiplicative) approximation algorithms are given for these two models. Finally, Boolean queries are enriched by allowing disjunctions and negations of branches. The above algorithm for the first model is extended to handle these queries. For the other two models, there is an efficient additive approximation, and a multiplicative one also exists if there is no negation; in addition, it is shown that if the query is non-monotonic, then no efficient multiplicative approximation exists unless NP = RP.
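To make the abstract's distinction between exact evaluation and approximation concrete, here is a small Python sketch. It is not the paper's algorithm. It assumes a hypothetical encoding of a p-document as a tree of ordinary nodes plus two kinds of distributional nodes, `ind` (each child is kept independently with its own probability) and `mux` (at most one child is kept), as an illustration of the first model's independence assumption. It also fixes the query to the Boolean twig `//a//b` (some `a`-node has a `b`-descendant). Because choices in disjoint subtrees are independent, `exact` computes the match probability in one bottom-up pass. `estimate` is a generic Monte Carlo additive approximation: it samples random documents and counts matches, with the sample size taken from the Hoeffding bound n ≥ ln(2/δ)/(2ε²). All names (`PNode`, `exact`, `sample`, `matches`, `samples_needed`, `estimate`) are illustrative assumptions.

```python
import math
import random
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Illustrative sketch only: hypothetical p-document encoding, fixed twig //a//b.


@dataclass
class PNode:
    """Hypothetical p-document node; kind is "ord", "ind" or "mux"."""
    kind: str
    label: Optional[str] = None
    # (probability, child) pairs; the probability is ignored under "ord" nodes.
    children: List[Tuple[float, "PNode"]] = field(default_factory=list)


def exact(node: PNode, a: str, b: str) -> Tuple[float, float]:
    """Given that `node` is reached, return (P[it yields a b-node],
    P[it yields a match of //a//b]). Disjoint subtrees are independent."""
    stats = [(p, exact(child, a, b)) for p, child in node.children]
    if node.kind == "ord":
        below_b = 1.0 - math.prod(1.0 - g for _, (g, _) in stats)
        no_match = math.prod(1.0 - h for _, (_, h) in stats)
        g = 1.0 if node.label == b else below_b
        # An a-node matches iff some b-node lies strictly below it.
        h = below_b if node.label == a else 1.0 - no_match
        return g, h
    if node.kind == "ind":
        # Each child is kept independently with its own probability.
        no_b = math.prod(1.0 - p * g for p, (g, _) in stats)
        no_match = math.prod(1.0 - p * h for p, (_, h) in stats)
        return 1.0 - no_b, 1.0 - no_match
    if node.kind == "mux":
        # At most one child is kept, so the per-child probabilities add up.
        return (sum(p * g for p, (g, _) in stats),
                sum(p * h for p, (_, h) in stats))
    raise ValueError(f"unknown node kind: {node.kind}")


Tree = Tuple[str, list]  # a sampled ordinary node: (label, children)


def sample(node: PNode, rng: random.Random) -> List[Tree]:
    """Draw a random document; return the ordinary nodes `node` contributes."""
    if node.kind == "ord":
        kids = [t for _, child in node.children for t in sample(child, rng)]
        return [(node.label, kids)]
    if node.kind == "ind":
        out: List[Tree] = []
        for p, child in node.children:
            if rng.random() < p:
                out.extend(sample(child, rng))
        return out
    if node.kind == "mux":
        r, acc = rng.random(), 0.0
        for p, child in node.children:
            acc += p
            if r < acc:
                return sample(child, rng)
        return []  # no child chosen, with probability 1 - sum(p)
    raise ValueError(f"unknown node kind: {node.kind}")


def has_label(tree: Tree, target: str) -> bool:
    label, kids = tree
    return label == target or any(has_label(k, target) for k in kids)


def matches(tree: Tree, a: str, b: str) -> bool:
    """True if some a-node in `tree` has a b-labelled proper descendant."""
    label, kids = tree
    if label == a and any(has_label(k, b) for k in kids):
        return True
    return any(matches(k, a, b) for k in kids)


def samples_needed(eps: float, delta: float) -> int:
    """Hoeffding bound: enough samples for |estimate - p| <= eps w.p. >= 1 - delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps * eps))


def estimate(root: PNode, a: str, b: str, n: int, seed: int = 0) -> float:
    rng = random.Random(seed)
    hits = sum(any(matches(t, a, b) for t in sample(root, rng)) for _ in range(n))
    return hits / n


def leaf(label: str) -> PNode:
    return PNode("ord", label)


# r -> ind{ 0.5: a -> mux{0.4: b, 0.3: c},  0.8: a -> ind{0.25: b} }
doc = PNode("ord", "r", [(1.0, PNode("ind", children=[
    (0.5, PNode("ord", "a", [(1.0, PNode("mux", children=[(0.4, leaf("b")), (0.3, leaf("c"))]))])),
    (0.8, PNode("ord", "a", [(1.0, PNode("ind", children=[(0.25, leaf("b"))]))])),
]))])

p_exact = exact(doc, "a", "b")[1]  # 1 - (1 - 0.5*0.4) * (1 - 0.8*0.25) = 0.36
p_est = estimate(doc, "a", "b", samples_needed(0.02, 0.05))
print(round(p_exact, 6), p_est)
```

On the example document the exact probability is 0.36, and with ε = 0.02 and δ = 0.05 (4612 samples) the estimate falls within 0.02 of it with probability at least 0.95. Sampling only requires a way to draw random documents, so an additive estimator of this kind remains applicable in principle when dependencies between distributional nodes rule out a bottom-up computation; a multiplicative guarantee needs a different estimator and is not shown.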

Information

Published In

The VLDB Journal — The International Journal on Very Large Data Bases Volume 18, Issue 5

October 2009

252 pages

ISSN: 1066-8888

Copyright © 2009 Springer-Verlag.

Publisher

Springer-Verlag

Berlin, Heidelberg

Qualifiers

Article

Bibliometrics

Total Citations: 23
Total Downloads: 220
Downloads (last 12 months): 23
Downloads (last 6 weeks): 8

Reflects downloads up to 06 Feb 2025.

Cited By

Li, L., Wang, H., Li, J., Gao, H. (2020). A survey of uncertain data management. Frontiers of Computer Science: Selected Publications from Chinese Universities 14(1), 162-190. DOI 10.1007/s11704-017-7063-z. Online publication date: 1-Feb-2020.
https://dl.acm.org/doi/10.1007/s11704-017-7063-z
Deutch, D., Gilad, A., Moskovitch, Y. (2018). Efficient provenance tracking for datalog using top-k queries. The VLDB Journal 27(2), 245-269. DOI 10.1007/s00778-018-0496-7. Online publication date: 1-Apr-2018.
https://dl.acm.org/doi/10.1007/s00778-018-0496-7
Zhao, X., Bi, X., Wang, G., Zhang, Z., Yang, H. (2016). Uncertain XML documents classification using Extreme Learning Machine. Neurocomputing 174(PA), 375-382. DOI 10.1016/j.neucom.2015.02.095. Online publication date: 22-Jan-2016.
https://dl.acm.org/doi/10.1016/j.neucom.2015.02.095
Deutch, D., Gilad, A., Moskovitch, Y. (2015). Selective provenance for datalog programs using top-k queries. Proceedings of the VLDB Endowment 8(12), 1394-1405. DOI 10.14778/2824032.2824039. Online publication date: 1-Aug-2015.
https://dl.acm.org/doi/10.14778/2824032.2824039
Ba, M., Abdessalem, T., Senellart, P., Marinai, S., Marriott, K. (2013). Uncertain version control in open collaborative editing of tree-structured documents. Proceedings of the 2013 ACM Symposium on Document Engineering, 27-36. DOI 10.1145/2494266.2494277. Online publication date: 10-Sep-2013.
https://dl.acm.org/doi/10.1145/2494266.2494277
Zhou, R., Liu, C., Li, J., Yu, J. (2013). ELCA evaluation for keyword search on probabilistic XML data. World Wide Web 16(2), 171-193. DOI 10.1007/s11280-012-0166-4. Online publication date: 1-Mar-2013.
https://dl.acm.org/doi/10.1007/s11280-012-0166-4
Liu, J., Ma, Z. (2013). Formal transformation from fuzzy object-oriented databases to fuzzy XML. Applied Intelligence 39(3), 630-641. DOI 10.1007/s10489-013-0438-4. Online publication date: 1-Oct-2013.
https://dl.acm.org/doi/10.1007/s10489-013-0438-4
Amarilli, A., Senellart, P. (2013). On the connections between relational and XML probabilistic data models. Proceedings of the 29th British National Conference on Big Data, 121-134. DOI 10.1007/978-3-642-39467-6_13. Online publication date: 8-Jul-2013.
https://dl.acm.org/doi/10.1007/978-3-642-39467-6_13
Cautis, B., Kharlamov, E. (2012). Answering queries using views over probabilistic XML. Proceedings of the VLDB Endowment 5(11), 1148-1159. DOI 10.14778/2350229.2350235. Online publication date: 1-Jul-2012.
https://dl.acm.org/doi/10.14778/2350229.2350235
Souihli, A., Senellart, P., Chen, X., Lebanon, G., Wang, H., Zaki, M. (2012). Demonstrating ProApproX 2.0. Proceedings of the 21st ACM International Conference on Information and Knowledge Management, 2734-2736. DOI 10.1145/2396761.2398744. Online publication date: 29-Oct-2012.
https://dl.acm.org/doi/10.1145/2396761.2398744