
Query evaluation over probabilistic XML

Published: 01 October 2009

Abstract

Query evaluation over probabilistic XML is explored. The queries are twig patterns with projection, and the data is represented in terms of three models of probabilistic XML (that extend existing ones in the literature). The first model makes an assumption of independence among the probabilistic junctions, whereas the second model can encode probabilistic dependencies. The third model combines the first two and, hence, is the most general. An efficient algorithm (under data complexity) is given for query evaluation in the first model. In addition, various optimizations are proposed, and their effectiveness is shown both analytically and experimentally. For the other two models, it is shown that every query is either intractable or trivial. Nonetheless, efficient (additive and multiplicative) approximation algorithms are given for these two models. Finally, Boolean queries are enriched by allowing disjunctions and negations of branches. The above algorithm for the first model is extended to handle these queries. For the other two models, there is an efficient additive approximation, and a multiplicative one also exists if there is no negation; in addition, it is shown that if the query is non-monotonic, then no efficient multiplicative approximation exists unless NP = RP.
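To make the independence assumption of the first model concrete, here is a minimal sketch under simplifying assumptions. It is not the algorithm of the article. It assumes that every child is attached to its parent with its own independent probability, and it handles only Boolean path queries (a sequence of labels linked by child edges, anchored at the root), not general twig patterns with branches or projection. The names `Node`, `path_probability` and `brute_force` are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class Node:
    label: str
    # Each entry is (probability that the child exists, child subtree).
    # Existence of each child is assumed independent of everything else.
    children: list = field(default_factory=list)


def path_probability(node, path):
    """Probability that the non-empty label sequence `path` is matched
    by a chain of child edges starting at `node`, given `node` exists."""
    if node.label != path[0]:
        return 0.0
    if len(path) == 1:
        return 1.0
    # Different children depend on disjoint sets of independent choices,
    # so the chance that no child continues the match is a product.
    miss = 1.0
    for p, child in node.children:
        miss *= 1.0 - p * path_probability(child, path[1:])
    return 1.0 - miss


def brute_force(root, path):
    """Same probability, by enumerating every keep/drop choice of edges.
    Exponential in the number of edges; only a check for tiny inputs."""
    edges = []

    def collect(n):
        for p, c in n.children:
            edges.append((id(c), p))
            collect(c)

    collect(root)
    total = 0.0
    for keep in product([True, False], repeat=len(edges)):
        kept = {eid for (eid, _), k in zip(edges, keep) if k}
        weight = 1.0
        for (_, p), k in zip(edges, keep):
            weight *= p if k else 1.0 - p

        def matches(n, i):
            if n.label != path[i]:
                return False
            if i == len(path) - 1:
                return True
            return any(id(c) in kept and matches(c, i + 1)
                       for _, c in n.children)

        if matches(root, 0):
            total += weight
    return total


# Hypothetical document: root "a" with two uncertain "b" children,
# each of which has one uncertain "c" child.
doc = Node("a", [
    (0.5, Node("b", [(0.8, Node("c"))])),
    (0.4, Node("b", [(0.5, Node("c"))])),
])

exact = path_probability(doc, ["a", "b", "c"])
check = brute_force(doc, ["a", "b", "c"])
print(exact, check)
```

For the hypothetical document above, the match probability works out by hand to 1 - (1 - 0.5·0.8)(1 - 0.4·0.5) = 0.52, up to floating-point rounding. The product trick relies on the independence assumption and on the query being a single path. With branching patterns, two query branches can be satisfied through the same child, so a correct evaluation needs more careful bookkeeping than this sketch provides.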

Published In

The VLDB Journal — The International Journal on Very Large Data Bases  Volume 18, Issue 5
October 2009
252 pages

Publisher

Springer-Verlag

Berlin, Heidelberg

Author Tags

  1. Approximate query evaluation
  2. Probabilistic XML
  3. Probabilistic databases
  4. Query optimization
  5. Query processing

Cited By

  • (2020) A survey of uncertain data management. Frontiers of Computer Science: Selected Publications from Chinese Universities 14:1 (162-190). DOI: 10.1007/s11704-017-7063-z. Online publication date: 1-Feb-2020
  • (2018) Efficient provenance tracking for datalog using top-k queries. The VLDB Journal 27:2 (245-269). DOI: 10.1007/s00778-018-0496-7. Online publication date: 1-Apr-2018
  • (2016) Uncertain XML documents classification using Extreme Learning Machine. Neurocomputing 174:PA (375-382). DOI: 10.1016/j.neucom.2015.02.095. Online publication date: 22-Jan-2016
  • (2015) Selective provenance for datalog programs using top-k queries. Proceedings of the VLDB Endowment 8:12 (1394-1405). DOI: 10.14778/2824032.2824039. Online publication date: 1-Aug-2015
  • (2013) Uncertain version control in open collaborative editing of tree-structured documents. Proceedings of the 2013 ACM symposium on Document engineering (27-36). DOI: 10.1145/2494266.2494277. Online publication date: 10-Sep-2013
  • (2013) ELCA evaluation for keyword search on probabilistic XML data. World Wide Web 16:2 (171-193). DOI: 10.1007/s11280-012-0166-4. Online publication date: 1-Mar-2013
  • (2013) Formal transformation from fuzzy object-oriented databases to fuzzy XML. Applied Intelligence 39:3 (630-641). DOI: 10.1007/s10489-013-0438-4. Online publication date: 1-Oct-2013
  • (2013) On the connections between relational and XML probabilistic data models. Proceedings of the 29th British National conference on Big Data (121-134). DOI: 10.1007/978-3-642-39467-6_13. Online publication date: 8-Jul-2013
  • (2012) Answering queries using views over probabilistic XML. Proceedings of the VLDB Endowment 5:11 (1148-1159). DOI: 10.14778/2350229.2350235. Online publication date: 1-Jul-2012
  • (2012) Demonstrating ProApproX 2.0. Proceedings of the 21st ACM international conference on Information and knowledge management (2734-2736). DOI: 10.1145/2396761.2398744. Online publication date: 29-Oct-2012
