Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
research-article

Uniqueness Constraints for Object Stores

Published: 22 June 2023 Publication History

Abstract

Object stores offer an increasingly popular choice for data management and analytics. As with every data model, managing the integrity of objects is fundamental for data quality but also important for the efficiency of update and query operations. In response to shortcomings of unique and existence constraints in object stores, we propose a new principled class of constraints that separates uniqueness from existence dimensions of data quality, and fully supports multiple labels and composite properties. We illustrate benefits of the constraints on real-world examples of property graphs where node integrity is enforced for better update and query performance. The benefits are quantified experimentally in terms of perfectly scaling the access to data through indices that result from the constraints. We establish axiomatic and algorithmic characterizations for the underlying implication problem. In addition, we fully characterize which non-redundant families of constraints attain maximum cardinality for any given finite sets of labels and properties. We exemplify further use cases of the constraints: elicitation of business rules, identification of data quality problems, and design for data quality. Finally, we propose extensions to managing the integrity of objects in object stores such as graph databases.

References

[1]
Munqath Alattar and Attila Sali. 2020. Strongly possible keys for SQL. Journal on Data Semantics 9, 2–3 (2020), 85–99.
[2]
Renzo Angles, Angela Bonifati, Stefania Dumbrava, George Fletcher, Keith W. Hare, Jan Hidders, Victor E. Lee, Bei Li, Leonid Libkin, Wim Martens, Filip Murlak, Josh Perryman, Ognjen Savkovic, Michael Schmidt, Juan F. Sequeda, Slawek Staworko, and Dominik Tomaszuk. 2021. PG-Keys: Keys for property graphs. In Proceedings of the SIGMOD’21: International Conference on Management of Data. 2423–2436.
[3]
Nishita Balamuralikrishna, Yingnan Jiang, Henning Koehler, Uwe Leck, Sebastian Link, and Henri Prade. 2019. Possibilistic keys. Fuzzy Sets and Systems 376 (2019), 1–36.
[4]
Carlo Batini and Andrea Maurino. 2018. Design for data quality. In Proceedings of the Encyclopedia of Database Systems, Second Edition, Ling Liu and M. Tamer Özsu (Eds.).
[5]
Joachim Biskup. 2012. Some remarks on relational database schemes having few minimal keys. In Proceedings of the Conceptual Modelling and Its Theoretical Foundations - Essays Dedicated to Bernhard Thalheim on the Occasion of His 60th Birthday.Antje Düsterhöft, Meike Klettke, and Klaus-Dieter Schewe (Eds.), Lecture Notes in Computer Science, Vol. 7260, Springer. 19–28.
[6]
Joachim Biskup, Umeshwar Dayal, and Philip A. Bernstein. 1979. Synthesizing independent database schemas. In Proceedings of the 1979 ACM SIGMOD International Conference on Management of Data. 143–151.
[7]
Joachim Biskup and Torsten Polle. 2003. Adding inclusion dependencies to an object-oriented data model with uniqueness constraints. Acta Informatica 39, 6–7 (2003), 391–449.
[8]
Angela Bonifati, George H. L. Fletcher, Hannes Voigt, and Nikolay Yakovets. 2018. Querying Graphs. Morgan & Claypool Publishers.
[9]
Pieta Brown and Sebastian Link. 2017. Probabilistic keys. IEEE Transactions on Knowledge and Data Engineering 29, 3 (2017), 670–682.
[10]
Edgar F. Codd. 1970. A relational model of data for large shared data banks. Communications of the ACM 13, 6 (1970), 377–387.
[11]
E. F. Codd. 1971. Further normalization of the data base relational model. Research Report / RJ / IBM / San Jose, California RJ909 (1971), 1–33.
[12]
János Demetrovics. 1978. On the number of candidate keys. Information Processing Letters 7, 6 (1978), 266–269.
[13]
János Demetrovics, Zoltán Füredi, and Gyula O. H. Katona. 1985. Minimum matrix representation of closure operations. Discrete Applied Mathematics 11, 2 (1985), 115–128.
[14]
János Demetrovics, Gyula O. H. Katona, Dezsö Miklós, Oleg Seleznjev, and Bernhard Thalheim. 1998. Asymptotic properties of keys and functional dependencies in random databases. Theoretical Computer Science 190, 2 (1998), 151–166.
[15]
Konrad Engel. 1997. Sperner Theory. Cambridge University Press.
[16]
Ronald Fagin. 1977. Multivalued dependencies and a new normal form for relational databases. ACM Transactions on Database Systems 2, 3 (1977), 262–278.
[17]
Ronald Fagin. 1981. A normal form for relational databases that is based on domains and keys. ACM Transactions on Database Systems 6, 3 (1981), 387–415.
[18]
Wenfei Fan. 2019. Dependencies for graphs: Challenges and opportunities. ACM Journal of Data and Information Quality 11, 2 (2019), 5:1–5:12.
[19]
Wenfei Fan, Zhe Fan, Chao Tian, and Xin Luna Dong. 2015. Keys for graphs. PVLDB 8, 12 (2015), 1590–1601.
[20]
Georg Gottlob. 2004. Hypergraph transversals. In Proceedings of theInternational Symposium on Foundations of Information and Knowledge Systems.Dietmar Seipel and Jose Maria Turull Torres (Eds.), Lecture Notes in Computer Science, Vol. 2942. Springer, 1–5.
[21]
Jerrold R. Griggs. 1984. Maximum antichains in the product of chains. Order 1 (1984), 21–28.
[22]
Miika Hannula and Sebastian Link. 2018. Automated reasoning about key sets. In Proceedings of theInternational Joint Conference on Automated Reasoning.Didier Galmiche, Stephan Schulz, and Roberto Sebastiani (Eds.), Vol. 10900. Springer, 47–63.
[23]
Sven Hartmann and Sebastian Link. 2009. Efficient reasoning about a robust XML key fragment. ACM Transactions on Database Systems 34, 2 (2009), 10:1–10:33.
[24]
I. J. Heath. 1971. Unacceptable file operations in a relational data base. In Proceedings of the 1971 ACM-SIGFIDET Workshop on Data Description, Access and Control. 19–33.
[25]
Arvid Heise, Jorge-Arnulfo Quiané-Ruiz, Ziawasch Abedjan, Anja Jentzsch, and Felix Naumann. 2013. Scalable discovery of unique column combinations. Proceedings of the VLDB Endowment 7, 4 (2013), 301–312.
[26]
Christian S. Jensen, Richard T. Snodgrass, and Michael D. Soo. 1996. Extending existing dependency theory to temporal databases. IEEE TKDE 8, 4 (1996), 563–582.
[27]
Gyula O. H. Katona and Krisztián Tichler. 2006. Some contributions to the minimum representation problem of key systems. In Proceedings of the International Symposium on Foundations of Information and Knowledge Systems. 240–257.
[28]
Vitaliy L. Khizder and Grant E. Weddell. 2003. Reasoning about uniqueness constraints in object relational databases. IEEE Transactions on Knowledge and Data Engineering 15, 5 (2003), 1295–1306.
[29]
Henning Köhler, Uwe Leck, Sebastian Link, and Xiaofang Zhou. 2016. Possible and certain keys for SQL. Proceedings of the VLDB Endowment 25, 4 (2016), 571–596.
[30]
Georg Lausen. 2007. Relational databases in RDF: Keys and foreign keys. In Proceedings of the SWDB-ODBIS. 43–56.
[31]
Sebastian Link. 2018. Old keys that open new doors. In Proceedings of the International Symposium on Foundations of Information and Knowledge Systems. 3–13.
[32]
Sebastian Link. 2020. Neo4j keys. In Proceedings of the International Conference on Conceptual Modeling. 19–33.
[33]
Sebastian Link and Ziheng Wei. 2021. Logical schema design that quantifies update inefficiency and join efficiency. In Proceedings of the 2021 International Conference on Management of Data. 1169–1181.
[34]
Witold Litwin, Mohammad A. Ketabchi, and Ravi Krishnamurthy. 1991. First order normal form for relational databases and multidatabases. ACM SIGMOD Record 20, 4 (1991), 74–76.
[35]
Claudio L. Lucchesi and Sylvia L. Osborn. 1978. Candidate keys for relations. Journal of Computer and System Sciences 17, 2 (1978), 270–279.
[36]
Sofía Maiolo, Lorena Etcheverry, and Adriana Marotta. 2020. Data profiling in property graph databases. ACM Journal of Data and Information Quality 12, 4 (2020), 20:1–20:27.
[37]
Wai Yin Mok. 2016. Utilizing nested normal form to design redundancy free JSON schemas. International Journal of Recent Contributions from Engineering, Science & IT 4, 4 (2016), 21–25.
[38]
Wai Yin Mok, Yiu-Kai Ng, and David W. Embley. 1996. A normal form for precisely characterizing redundancy in nested relations. ACM Transactions on Database Systems 21, 1 (1996), 77–106.
[39]
N. G. de Bruijn, C. A. v. E. Tengbergen, and D. Kruyswijk. 1951. On the set of divisors of a number. Nieuw Arch. Wiskunde 23, 2 (1951), 191–193.
[40]
Jaroslav Pokorný, Michal Valenta, and Jirí Kovacic. 2017. Integrity constraints in graph databases. In Proceedings of the 8th International Conference on Ambient Systems, Networks and Technologies and the 7th International Conference on Sustainable Energy Information Technology (Procedia Computer Science), Vol. 109. Elsevier, 975–981.
[41]
Attila Sali. 2004. Minimal keys in higher-order datamodels. In Proceedings of the International Symposium on Foundations of Information and Knowledge Systems. 242–251.
[42]
Philipp Skavantzos, Kaiqi Zhao, and Sebastian Link. 2021. Uniqueness constraints on property graphs. In Proceedings of the International Conference on Advanced Information Systems Engineering. 280–295.
[43]
Bernhard Thalheim. 1989. On semantic issues connected with keys in relational databases permitting null values. Elektronische Informationsverarbeitung und Kybernetik 25, 1/2 (1989), 11–20.
[44]
David Toman and Grant E. Weddell. 2008. On keys and functional dependencies as first-class citizens in description logics. Journal of Automated Reasoning 40, 2–3 (2008), 117–132.
[45]
Millist W. Vincent. 1997. A corrected 5NF definition for relational database design. Theoretical Computer Science 185, 2 (1997), 379–391.
[46]
Ziheng Wei, Uwe Leck, and Sebastian Link. 2019. Discovery and ranking of embedded uniqueness constraints. PVLDB 12, 13 (2019), 2339–2352.
[47]
Ziheng Wei, Uwe Leck, and Sebastian Link. 2019. Entity integrity, referential integrity, and query optimization with embedded uniqueness constraints. In Proceedings of the 35th IEEE International Conference on Data Engineering. 1694–1697.
[48]
Ziheng Wei and Sebastian Link. 2019. Embedded functional dependencies and data-completeness tailored database design. Proceedings of the VLDB Endowment 12, 11 (2019), 1458–1470.
[49]
Ziheng Wei and Sebastian Link. 2021. Embedded functional dependencies and data-completeness tailored database design. ACM Transactions on Database Systems 46, 2 (2021), 7:1–7:46.
[50]
Jef Wijsen. 1999. Temporal FDs on complex objects. ACM Transactions on Database Systems 24, 1 (1999), 127–176.

Cited By

View all
  • (2024)PG-FD: Mapping Functional Dependencies to the Future Property Graph Schema StandardAdvances in Databases and Information Systems10.1007/978-3-031-70626-4_4(45-59)Online publication date: 28-Aug-2024

Recommendations

Comments

Information & Contributors

Information

Published In

cover image Journal of Data and Information Quality
Journal of Data and Information Quality  Volume 15, Issue 2
June 2023
363 pages
ISSN:1936-1955
EISSN:1936-1963
DOI:10.1145/3605909
Issue’s Table of Contents

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 June 2023
Online AM: 19 January 2023
Accepted: 13 January 2023
Revised: 05 November 2022
Received: 09 March 2022
Published in JDIQ Volume 15, Issue 2

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Completeness
  2. computational problem
  3. data management
  4. data quality
  5. data semantics
  6. existence
  7. graph database
  8. uniqueness
  9. object store

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)73
  • Downloads (Last 6 weeks)20
Reflects downloads up to 11 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2024)PG-FD: Mapping Functional Dependencies to the Future Property Graph Schema StandardAdvances in Databases and Information Systems10.1007/978-3-031-70626-4_4(45-59)Online publication date: 28-Aug-2024

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Full Text

View this article in Full Text.

Full Text

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media