Size Bounds for Factorised Representations of Query Results

Published: 25 March 2015

    Abstract

    We study two succinct representation systems for relational data based on relational algebra expressions with unions, Cartesian products, and singleton relations: f-representations, which employ algebraic factorisation using distributivity of product over union, and d-representations, which are f-representations where further succinctness is brought by explicit sharing of repeated subexpressions.
    In particular, we study such representations for results of conjunctive queries. We derive tight asymptotic bounds for representation sizes and present algorithms to compute representations within these bounds. We compare the succinctness of f-representations and d-representations for results of equi-join queries, and relate them to fractional edge covers and fractional hypertree decompositions of the query hypergraph.
    Recent work showed that f-representations can significantly boost the performance of query evaluation in centralised and distributed settings and of machine learning tasks.
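
The idea of factorising by distributivity of product over union can be illustrated with a minimal sketch. It is not taken from the article: the class names `Singleton`, `Union`, `Product` and the helpers `size`, `tuples`, and `relation` are hypothetical, and counting singletons as the size measure is an assumption made for illustration. The sketch builds the same relation over attributes A and B twice, once as a flat union of tuples and once as a product of two unions, and compares the number of singletons each expression needs. Explicit sharing of subexpressions, as in d-representations, is not modelled here.

```python
from dataclasses import dataclass
from itertools import product as cartesian


@dataclass(frozen=True)
class Singleton:
    """A singleton relation holding one value of one attribute."""
    attr: str
    value: object


@dataclass(frozen=True)
class Union:
    """Union of sub-expressions over the same attributes."""
    children: tuple


@dataclass(frozen=True)
class Product:
    """Cartesian product of sub-expressions over disjoint attributes."""
    children: tuple


def size(expr):
    """Count the singletons in an expression (illustrative size measure)."""
    if isinstance(expr, Singleton):
        return 1
    return sum(size(child) for child in expr.children)


def tuples(expr):
    """Enumerate the tuples represented by an expression, as dicts."""
    if isinstance(expr, Singleton):
        yield {expr.attr: expr.value}
    elif isinstance(expr, Union):
        for child in expr.children:
            yield from tuples(child)
    else:  # Product: combine one tuple from every child
        for combo in cartesian(*(list(tuples(c)) for c in expr.children)):
            merged = {}
            for part in combo:
                merged.update(part)
            yield merged


def relation(expr):
    """Return the represented relation as a set of sorted (attr, value) tuples."""
    return {tuple(sorted(t.items())) for t in tuples(expr)}


a_values = [1, 2, 3]
b_values = [10, 20, 30]

# Flat form: a union with one product of singletons per tuple.
flat = Union(tuple(
    Product((Singleton("A", a), Singleton("B", b)))
    for a in a_values for b in b_values
))

# Factorised form: (A=1 u A=2 u A=3) x (B=10 u B=20 u B=30).
factorised = Product((
    Union(tuple(Singleton("A", a) for a in a_values)),
    Union(tuple(Singleton("B", b) for b in b_values)),
))

print(size(flat), size(factorised))
print(relation(flat) == relation(factorised))
```

In this toy case both expressions represent the same nine tuples, but the flat form uses one singleton per value per tuple, while the factorised form uses one singleton per distinct value. How much can be saved in general depends on the query structure, which is what the size bounds in the article characterise.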

    Supplementary Material

    a2-olteanu-apndx.pdf (olteanu.zip)
    Supplemental movie, appendix, image and software files for Size Bounds for Factorised Representations of Query Results

    Published In

    ACM Transactions on Database Systems  Volume 40, Issue 1
    March 2015
    260 pages
    ISSN:0362-5915
    EISSN:1557-4644
    DOI:10.1145/2751312
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 25 March 2015
    Accepted: 01 July 2014
    Revised: 01 May 2014
    Received: 01 July 2013
    Published in TODS Volume 40, Issue 1

    Author Tags

    1. Succinct representation
    2. conjunctive queries
    3. data factorisation
    4. hypertree decompositions
    5. query evaluation
    6. size bounds

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • EPSRC DTA
