research-article
Open access

Deductive optimization of relational data storage

Published: 13 November 2020
    Abstract

    Optimizing the physical storage of data and optimizing its retrieval are two key problems in database management. In this paper, we propose a language that can express both a relational query and the layout of its data. Our language can express a wide range of physical database layouts, going well beyond the row- and column-based layouts widely used in database management systems. We use deductive program synthesis to turn a high-level relational representation of a database query into a highly optimized low-level implementation that operates on a specialized layout of the dataset. We build an optimizing compiler for this language and conduct experiments using a popular database benchmark. The experiments show that our specialized queries outperform a state-of-the-art in-memory compiled database system while achieving an order-of-magnitude reduction in memory use.
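    The difference between row, column, and specialized layouts can be made concrete by evaluating one query against three layouts of the same relation. The following is a minimal sketch under our own assumptions. The `orders(customer_id, amount)` relation, its data, and every function name are hypothetical. The code does not use the paper's language or its deductive synthesis procedure. It only illustrates why a layout tailored to a query (here, amounts grouped under their customer key) can answer that query without scanning the whole relation.

```python
from collections import defaultdict

# Hypothetical example relation orders(customer_id, amount); not from the paper.
ORDERS = [(1, 10.0), (2, 5.0), (1, 7.5), (3, 2.0), (2, 1.0)]


def row_layout(rows):
    """Row layout: one tuple per record."""
    return list(rows)


def column_layout(rows):
    """Column layout: one array per attribute."""
    return {
        "customer_id": [r[0] for r in rows],
        "amount": [r[1] for r in rows],
    }


def grouped_layout(rows):
    """Specialized layout (illustrative): amounts nested under their customer_id."""
    groups = defaultdict(list)
    for cid, amount in rows:
        groups[cid].append(amount)
    return dict(groups)


# One relational query, SELECT SUM(amount) FROM orders WHERE customer_id = k,
# hand-implemented once per layout.
def total_row(layout, k):
    # Scans every record.
    return sum(amount for cid, amount in layout if cid == k)


def total_column(layout, k):
    # Scans both columns in lockstep.
    return sum(a for c, a in zip(layout["customer_id"], layout["amount"]) if c == k)


def total_grouped(layout, k):
    # Touches only the matching group; missing keys sum to 0.
    return sum(layout.get(k, []))


if __name__ == "__main__":
    rows, cols, grouped = row_layout(ORDERS), column_layout(ORDERS), grouped_layout(ORDERS)
    for k in (1, 2, 3, 4):
        print(k, total_row(rows, k), total_column(cols, k), total_grouped(grouped, k))
```

    All three functions compute the same answer; they differ only in how much data they must touch. The grouped layout trades a one-time construction cost for cheap per-key lookups, which is the kind of query-specific trade-off that a language describing layouts alongside queries makes expressible.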

    Supplementary Material

    Auxiliary Presentation Video (oopsla20main-p171-p-video.mp4)
    Presentation video for the paper "Deductive Optimization of Relational Data Storage" at OOPSLA 2020.

    Cited By

    • (2024) VeriEQL: Bounded Equivalence Verification for Complex SQL Queries with Integrity Constraints. Proceedings of the ACM on Programming Languages 8, OOPSLA1, 1071-1099. DOI: 10.1145/3649849. Online publication date: 29-Apr-2024.
    • (2023) Satisfiability Modulo Custom Theories in Z3. Verification, Model Checking, and Abstract Interpretation, 91-105. DOI: 10.1007/978-3-031-24950-1_5. Online publication date: 17-Jan-2023.
    • (2022) Synthesis-powered optimization of smart contracts via data type refactoring. Proceedings of the ACM on Programming Languages 6, OOPSLA2, 560-588. DOI: 10.1145/3563308. Online publication date: 31-Oct-2022.

    Information

    Published In

    Proceedings of the ACM on Programming Languages, Volume 4, Issue OOPSLA
    November 2020
    3108 pages
    EISSN: 2475-1421
    DOI: 10.1145/3436718
    This work is licensed under a Creative Commons Attribution 4.0 International License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 13 November 2020
    Published in PACMPL Volume 4, Issue OOPSLA

    Author Tags

    1. data representation synthesis
    2. databases
    3. deductive synthesis

    Qualifiers

    • Research-article

    Bibliometrics

    Article Metrics

    • Downloads (last 12 months): 150
    • Downloads (last 6 weeks): 12
