Size Bounds for Factorised Representations of Query Results

Published: 25 March 2015

Abstract

We study two succinct representation systems for relational data based on relational algebra expressions with unions, Cartesian products, and singleton relations: f-representations, which employ algebraic factorisation using distributivity of product over union, and d-representations, which are f-representations where further succinctness is brought by explicit sharing of repeated subexpressions.
In particular we study such representations for results of conjunctive queries. We derive tight asymptotic bounds for representation sizes and present algorithms to compute representations within these bounds. We compare the succinctness of f-representations and d-representations for results of equi-join queries, and relate them to fractional edge covers and fractional hypertree decompositions of the query hypergraph.
Recent work showed that f-representations can significantly boost the performance of query evaluation in centralised and distributed settings and of machine learning tasks.
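As a minimal illustration of the factorisation idea described above, the sketch below builds the relation {a1, a2} × {b1, b2, b3} twice: once as a flat union of tuples and once as a product of unions, which is what distributivity of product over union permits. The class names (`Singleton`, `Union`, `Product`), the helper functions, and the choice to measure size by counting singletons are assumptions made for this example, not definitions taken from the article.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Singleton:
    """A relation holding one value for one attribute (hypothetical encoding)."""
    attr: str
    value: object


@dataclass(frozen=True)
class Union:
    """Union of sub-expressions over the same attributes."""
    parts: tuple


@dataclass(frozen=True)
class Product:
    """Cartesian product of sub-expressions over disjoint attributes."""
    parts: tuple


def tuples(expr):
    """Enumerate the represented relation as frozensets of (attr, value) pairs."""
    if isinstance(expr, Singleton):
        yield frozenset({(expr.attr, expr.value)})
    elif isinstance(expr, Union):
        for part in expr.parts:
            yield from tuples(part)
    else:  # Product: combine one tuple from every factor
        for combo in product(*(list(tuples(p)) for p in expr.parts)):
            yield frozenset().union(*combo)


def size(expr):
    """Count singletons; used here as an assumed measure of representation size."""
    if isinstance(expr, Singleton):
        return 1
    return sum(size(p) for p in expr.parts)


a_values = ["a1", "a2"]
b_values = ["b1", "b2", "b3"]

# Flat form: a union of six products, one per tuple.
flat = Union(tuple(
    Product((Singleton("A", a), Singleton("B", b)))
    for a in a_values for b in b_values
))

# Factorised form: (a1 ∪ a2) × (b1 ∪ b2 ∪ b3).
fact = Product((
    Union(tuple(Singleton("A", a) for a in a_values)),
    Union(tuple(Singleton("B", b) for b in b_values)),
))

assert set(tuples(flat)) == set(tuples(fact))  # same relation
print(size(flat), size(fact))  # 12 singletons versus 5
```

Both expressions denote the same six tuples, but the factorised one needs 5 singletons instead of 12. The sketch does not model the explicit sharing of repeated subexpressions that distinguishes d-representations.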

Supplementary Material

a2-olteanu-apndx.pdf (olteanu.zip)
Supplemental movie, appendix, image and software files for "Size Bounds for Factorised Representations of Query Results"

Published In

ACM Transactions on Database Systems, Volume 40, Issue 1
March 2015
260 pages
ISSN: 0362-5915
EISSN: 1557-4644
DOI: 10.1145/2751312

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 25 March 2015
Accepted: 01 July 2014
Revised: 01 May 2014
Received: 01 July 2013
Published in TODS Volume 40, Issue 1

Author Tags

  1. Succinct representation
  2. conjunctive queries
  3. data factorisation
  4. hypertree decompositions
  5. query evaluation
  6. size bounds

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • EPSRC DTA
