research-article

Incorporating constraints in probabilistic XML

Published: 03 September 2009

Abstract

Constraints are important, not only for maintaining data integrity, but also because they capture natural probabilistic dependencies among data items. A probabilistic XML database (PXDB) is the probability subspace comprising the instances of a p-document that satisfy a set of constraints. In contrast to existing models that can express probabilistic dependencies, it is shown that query evaluation is tractable in PXDBs. The problems of sampling and determining well-definedness (i.e., whether the aforesaid subspace is nonempty) are also tractable. Furthermore, queries and constraints can include the aggregate functions count, max, min, and ratio. Finally, this approach can be easily extended to allow a probabilistic interpretation of constraints.
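
As a rough illustration of the definition above, the following sketch treats a PXDB as a conditional probability space: it enumerates every instance of a toy p-document, keeps only those that satisfy a constraint, and renormalizes. Everything here is an assumption made for illustration (the three-child p-document, its probabilities, and the names `worlds`, `conditional_probability`, `at_most_one_child`, and `has_a`). The enumeration is exponential in the number of nodes, so it only demonstrates the semantics; it is not the tractable evaluation algorithm that the abstract refers to.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy p-document: a root whose three optional children are
# each present independently with the given probability.
CHILD_PROBS = {"a": Fraction(1, 2), "b": Fraction(1, 2), "c": Fraction(1, 4)}


def worlds(child_probs):
    """Yield (set_of_present_children, probability) for every instance."""
    names = sorted(child_probs)
    for mask in product([False, True], repeat=len(names)):
        prob = Fraction(1)
        for name, present in zip(names, mask):
            p = child_probs[name]
            prob *= p if present else 1 - p
        yield frozenset(n for n, present in zip(names, mask) if present), prob


def conditional_probability(child_probs, constraint, query):
    """Return Pr(query | constraint) by brute-force enumeration.

    Returns None when no instance satisfies the constraint, i.e. when the
    constrained subspace is empty and the database is not well defined.
    """
    p_constraint = Fraction(0)
    p_both = Fraction(0)
    for world, prob in worlds(child_probs):
        if constraint(world):
            p_constraint += prob
            if query(world):
                p_both += prob
    if p_constraint == 0:
        return None
    return p_both / p_constraint


def at_most_one_child(world):
    """Constraint using the count aggregate: the root has at most one child."""
    return len(world) <= 1


def has_a(world):
    """Boolean query: the child labeled 'a' is present."""
    return "a" in world


print(conditional_probability(CHILD_PROBS, at_most_one_child, has_a))  # 3/10
```

Without the constraint, the child `a` appears with probability 1/2. Conditioning on "at most one child" lowers this to 3/10, which shows how a constraint induces a dependency among nodes that are independent in the underlying p-document.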

Published In

ACM Transactions on Database Systems, Volume 34, Issue 3
August 2009
269 pages
ISSN: 0362-5915
EISSN: 1557-4644
DOI: 10.1145/1567274
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 03 September 2009
Accepted: 01 June 2009
Revised: 01 March 2009
Received: 01 September 2008
Published in TODS Volume 34, Issue 3

Author Tags

  1. Probabilistic databases
  2. constraints
  3. probabilistic XML
  4. sampling probabilistic data

Qualifiers

  • Research-article
  • Research
  • Refereed

Cited By

  • (2018) ELCA evaluation for keyword search on probabilistic XML data. World Wide Web 16:2 (171-193). DOI: 10.1007/s11280-012-0166-4. Online publication date: 25-Dec-2018
  • (2018) An approach of top-k keyword querying for fuzzy XML. Computing 100:3 (303-330). DOI: 10.1007/s00607-017-0577-2. Online publication date: 1-Mar-2018
  • (2018) Generating, Sampling and Counting Subclasses of Regular Tree Languages. Theory of Computing Systems 52:3 (542-585). DOI: 10.1007/s00224-012-9428-x. Online publication date: 25-Dec-2018
  • (2015) Structurally Tractable Uncertain Data. Proceedings of the 2015 ACM SIGMOD on PhD Symposium (39-44). DOI: 10.1145/2744680.2744690. Online publication date: 31-May-2015
  • (2014) Quasi-SLCA Based Keyword Query Processing over Probabilistic XML Data. IEEE Transactions on Knowledge and Data Engineering 26:4 (957-969). DOI: 10.1109/TKDE.2013.67. Online publication date: Apr-2014
  • (2014) Consistency checking and querying in probabilistic databases under integrity constraints. Journal of Computer and System Sciences 80:7 (1448-1489). DOI: 10.1016/j.jcss.2014.04.026. Online publication date: Nov-2014
  • (2013) Probabilistic XML: Models and Complexity. Advances in Probabilistic Databases for Uncertain Information Management (39-66). DOI: 10.1007/978-3-642-37509-5_3. Online publication date: 2013
  • (2012) Efficient probabilistic XML query processing using an extended labeling scheme and a lightweight index. Information Processing & Management 48:6 (1181-1202). DOI: 10.1016/j.ipm.2012.01.005. Online publication date: Nov-2012
  • (2011) Efficient query evaluation over probabilistic XML with long-distance dependencies. Proceedings of the 2011 Joint EDBT/ICDT Ph.D. Workshop (32-37). DOI: 10.1145/1966874.1966880. Online publication date: 25-Mar-2011
  • (2011) Generating, sampling and counting subclasses of regular tree languages. Proceedings of the 14th International Conference on Database Theory (30-41). DOI: 10.1145/1938551.1938559. Online publication date: 21-Mar-2011
