Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/3519939.3523712acmconferencesArticle/Chapter ViewAbstractPublication PagespldiConference Proceedingsconference-collections
research-article
Open access

Synthesizing analytical SQL queries from computation demonstration

Published: 09 June 2022 Publication History

Abstract

Analytical SQL is widely used in modern database applications and data analysis. However, its partitioning and grouping operators are challenging for novice users. Unfortunately, programming by example, shown effective on standard SQL, are less attractive because examples for analytical queries are more laborious to solve by hand.
To make demonstrations easier to author, we designed a new end-user specification, programming by computation demonstration, that allows the user to demonstrate the task using a (possibly incomplete) cell-level computation trace. This specification is exploited in a new abstraction-based synthesis algorithm to prove that a partially formed query cannot be completed to satisfy the specification, allowing us to prune the search tree.
We implemented our approach in a tool named Sickle and tested it on 80 real-world analytical SQL tasks. Results show that even from small demonstrations, Sickle can solve 76 tasks, in 12.8 seconds on average, while the prior approaches can solve only 60 tasks and are on average 22.5 times slower. Furthermore, our user study with 13 participants reveals that our specification increases user efficiency and confidence on challenging tasks.

References

[1]
Maaz Bin Safeer Ahmad, Jonathan Ragan-Kelley, Alvin Cheung, and Shoaib Kamil. 2019. Automatically translating image processing libraries to halide. ACM Trans. Graph., 38, 6 (2019), 204:1–204:13. https://doi.org/10.1145/3355089.3356549
[2]
Rohan Bavishi, Caroline Lemieux, Roy Fox, Koushik Sen, and Ion Stoica. 2019. AutoPandas: neural-backed generators for program synthesis. Proc. ACM Program. Lang., 3, OOPSLA (2019), 168:1–168:27. https://doi.org/10.1145/3360594
[3]
Rohan Bavishi, Caroline Lemieux, Koushik Sen, and Ion Stoica. 2021. Gauss: program synthesis by reasoning over graphs. Proc. ACM Program. Lang., 5, OOPSLA (2021), 1–29. https://doi.org/10.1145/3485511
[4]
Sarah Chasins and Rastislav Bodík. 2017. Skip blocks: reusing execution history to accelerate web scripts. Proc. ACM Program. Lang., 1, OOPSLA (2017), 51:1–51:28. https://doi.org/10.1145/3133875
[5]
Sarah E. Chasins, Maria Mueller, and Rastislav Bodík. 2018. Rousillon: Scraping Distributed Hierarchical Web Data. In UIST 2018, Berlin, Germany, October 14-17, 2018. ACM, 963–975. https://doi.org/10.1145/3242587.3242661
[6]
Surajit Chaudhuri and Umeshwar Dayal. 1997. An Overview of Data Warehousing and OLAP Technology. SIGMOD Rec., 26, 1 (1997), 65–74. https://doi.org/10.1145/248603.248616
[7]
Alvin Cheung, Armando Solar-Lezama, and Samuel Madden. 2013. Optimizing database-backed applications with query synthesis. In PLDI ’13, Seattle, WA, USA, June 16-19, 2013. ACM, 3–14. https://doi.org/10.1145/2491956.2462180
[8]
Patrick Cousot and Radhia Cousot. 1977. Abstract Interpretation: A Unified Lattice Model for Static Analysis of Programs by Construction or Approximation of Fixpoints. In POPL’97, Los Angeles, California, USA. ACM, 238–252. https://doi.org/10.1145/512950.512973
[9]
Jacob Devlin, Jonathan Uesato, Surya Bhupatiraju, Rishabh Singh, Abdel-rahman Mohamed, and Pushmeet Kohli. 2017. RobustFill: Neural Program Learning under Noisy I/O. In ICML 2017, Sydney, NSW, Australia, 6-11 August 2017 (Proceedings of Machine Learning Research, Vol. 70). PMLR, 990–998. http://proceedings.mlr.press/v70/devlin17a.html
[10]
Kevin Ellis and Sumit Gulwani. 2017. Learning to Learn Programs from Examples: Going Beyond Program Structure. In IJCAI 2017, Melbourne, Australia, August 19-25, 2017. ijcai.org, 1638–1645. https://doi.org/10.24963/ijcai.2017/227
[11]
Yu Feng, Ruben Martins, Osbert Bastani, and Isil Dillig. 2018. Program synthesis using conflict-driven learning. In PLDI 2018, Philadelphia, PA, USA, June 18-22, 2018. ACM, 420–435. https://doi.org/10.1145/3192366.3192382
[12]
Yu Feng, Ruben Martins, Jacob Van Geffen, Isil Dillig, and Swarat Chaudhuri. 2017. Component-based synthesis of table consolidation and transformation tasks from examples. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2017, Barcelona, Spain, June 18-23, 2017. 422–436. https://doi.org/10.1145/3062341.3062351
[13]
Yu Feng, Ruben Martins, Yuepeng Wang, Isil Dillig, and Thomas W. Reps. 2017. Component-based synthesis for complex APIs. In Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages, POPL 2017, Paris, France, January 18-20, 2017, Giuseppe Castagna and Andrew D. Gordon (Eds.). ACM, 599–612. https://doi.org/10.1145/3009837.3009851
[14]
Kasra Ferdowsifard, Allen Ordookhanians, Hila Peleg, Sorin Lerner, and Nadia Polikarpova. 2020. Small-Step Live Programming by Example. In UIST ’20, Virtual Event, USA, October 20-23, 2020. ACM, 614–626. https://doi.org/10.1145/3379337.3415869
[15]
Sumit Gulwani. 2011. Automating string processing in spreadsheets using input-output examples. In POPL 2011, Austin, TX, USA, January 26-28, 2011. ACM, 317–330. https://doi.org/10.1145/1926385.1926423
[16]
Ruyi Ji, Jingjing Liang, Yingfei Xiong, Lu Zhang, and Zhenjiang Hu. 2020. Question selection for interactive program synthesis. In PLDI 2020, London, UK, June 15-20, 2020. ACM, 1143–1158. https://doi.org/10.1145/3385412.3386025
[17]
Sean Kandel, Jeffrey Heer, Catherine Plaisant, Jessie Kennedy, Frank van Ham, Nathalie Henry Riche, Chris Weaver, Bongshin Lee, Dominique Brodbeck, and Paolo Buono. 2011. Research Directions in Data Wrangling: Visualizations and Transformations for Usable and Credible Data. Information Visualization Journal, 10, 4 (2011), 271–288.
[18]
Sean Kandel, Andreas Paepcke, Joseph M. Hellerstein, and Jeffrey Heer. 2012. Enterprise Data Analysis and Visualization: An Interview Study. IEEE Trans. Vis. Comput. Graph., 18, 12 (2012), 2917–2926. https://doi.org/10.1109/TVCG.2012.219
[19]
Jinwoo Kim, Qinheping Hu, Loris D’Antoni, and Thomas W. Reps. 2021. Semantics-guided synthesis. Proc. ACM Program. Lang., 5, POPL (2021), 1–32. https://doi.org/10.1145/3434311
[20]
Vu Le and Sumit Gulwani. 2014. FlashExtract: a framework for data extraction by examples. In PLDI ’14, Edinburgh, United Kingdom - June 09 - 11, 2014. ACM, 542–553. https://doi.org/10.1145/2594291.2594333
[21]
Mikaël Mayer, Gustavo Soares, Maxim Grechkin, Vu Le, Mark Marron, Oleksandr Polozov, Rishabh Singh, Benjamin G. Zorn, and Sumit Gulwani. 2015. User Interaction Models for Disambiguation in Programming by Example. In Proceedings of the 28th Annual ACM Symposium on User Interface Software & Technology, UIST 2015, Charlotte, NC, USA, November 8-11, 2015, Celine Latulipe, Bjoern Hartmann, and Tovi Grossman (Eds.). ACM, 291–301. https://doi.org/10.1145/2807442.2807459
[22]
Anders Miltner, Kathleen Fisher, Benjamin C. Pierce, David Walker, and Steve Zdancewic. 2018. Synthesizing bijective lenses. PACMPL, 2, POPL (2018), 1:1–1:30. https://doi.org/10.1145/3158089
[23]
Raghunath Othayoth Nambiar and Meikel Poess. 2006. The Making of TPC-DS. In Proceedings of the 32nd International Conference on Very Large Data Bases, Seoul, Korea, September 12-15, 2006. ACM, 1049–1058. http://dl.acm.org/citation.cfm?id=1164217
[24]
Aditya G. Parameswaran. 2019. Enabling Data Science for the Majority. Proc. VLDB Endow., 12, 12 (2019), 2309–2322. https://doi.org/10.14778/3352063.3352148
[25]
Hila Peleg, Roi Gabay, Shachar Itzhaky, and Eran Yahav. 2020. Programming with a read-eval-synth loop. Proc. ACM Program. Lang., 4, OOPSLA (2020), 159:1–159:30. https://doi.org/10.1145/3428227
[26]
Phitchaya Mangpo Phothilimthana, Aditya Thakur, Rastislav Bodík, and Dinakar Dhurjati. 2016. Scaling up Superoptimization. In ASPLOS 2016, Atlanta, GA, USA, April 2-6, 2016. ACM, 297–310. https://doi.org/10.1145/2872362.2872387
[27]
Nadia Polikarpova, Ivan Kuraj, and Armando Solar-Lezama. 2016. Program synthesis from polymorphic refinement types. In Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2016, Santa Barbara, CA, USA, June 13-17, 2016. 522–538. https://doi.org/10.1145/2908080.2908093
[28]
Oleksandr Polozov and Sumit Gulwani. 2015. FlashMeta: A framework for inductive program synthesis. In Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications. 107–126.
[29]
Mukund Raghothaman, Jonathan Mendelson, David Zhao, Mayur Naik, and Bernhard Scholz. 2020. Provenance-guided synthesis of Datalog programs. Proc. ACM Program. Lang., 4, POPL (2020), 62:1–62:27. https://doi.org/10.1145/3371130
[30]
Andrew Reynolds, Haniel Barbosa, Andres Nötzli, Clark W. Barrett, and Cesare Tinelli. 2019. cvc4sy: Smart and Fast Term Enumeration for Syntax-Guided Synthesis. In CAV 2019, New York City, NY, USA, July 15-18, 2019,. 11562, Springer, 74–83. https://doi.org/10.1007/978-3-030-25543-5_5
[31]
Xujie Si, Woosuk Lee, Richard Zhang, Aws Albarghouthi, Paraschos Koutris, and Mayur Naik. 2018. Syntax-guided synthesis of Datalog programs. In ESEC/SIGSOFT FSE 2018, Lake Buena Vista, FL, USA, November 04-09, 2018. ACM, 515–527. https://doi.org/10.1145/3236024.3236034
[32]
Rishabh Singh and Sumit Gulwani. 2012. Synthesizing Number Transformations from Input-Output Examples. In CAV 2012, Berkeley, CA, USA, July 7-13, 2012 Proceedings (Lecture Notes in Computer Science, Vol. 7358). Springer, 634–651. https://doi.org/10.1007/978-3-642-31424-7_44
[33]
Calvin Smith and Aws Albarghouthi. 2016. MapReduce program synthesis. In Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2016, Santa Barbara, CA, USA, June 13-17, 2016, Chandra Krintz and Emery Berger (Eds.). ACM, 326–340. https://doi.org/10.1145/2908080.2908102
[34]
Armando Solar-Lezama. 2009. The Sketching Approach to Program Synthesis. In Programming Languages and Systems, 7th Asian Symposium, APLAS 2009, Seoul, Korea, December 14-16, 2009. Proceedings, Zhenjiang Hu (Ed.) (Lecture Notes in Computer Science, Vol. 5904). Springer, 4–13. https://doi.org/10.1007/978-3-642-10672-9_3
[35]
Keita Takenouchi, Takashi Ishio, Joji Okada, and Yuji Sakata. 2021. PATSQL: Efficient Synthesis of SQL Queries from Example Tables with Quick Inference of Projected Columns. Proc. VLDB Endow., 14, 11 (2021), 1937–1949. https://doi.org/10.14778/3476249.3476253
[36]
Aalok Thakkar, Aaditya Naik, Nathaniel Sands, Rajeev Alur, Mayur Naik, and Mukund Raghothaman. 2021. Example-guided synthesis of relational queries. In PLDI ’21: 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation, Virtual Event, Canada, June 20-25, 2021, Stephen N. Freund and Eran Yahav (Eds.). ACM, 1110–1125. https://doi.org/10.1145/3453483.3454098
[37]
Quoc Trung Tran, Chee-Yong Chan, and Srinivasan Parthasarathy. 2009. Query by output. In SIGMOD 2009, Providence, Rhode Island, USA, June 29 - July 2, 2009. 535–548. https://doi.org/10.1145/1559845.1559902
[38]
Abhishek Udupa, Arun Raghavan, Jyotirmoy V. Deshmukh, Sela Mador-Haim, Milo M.K. Martin, and Rajeev Alur. 2013. TRANSIT: Specifying Protocols with Concolic Snippets. SIGPLAN Not., 48, 6 (2013), June, 287–296. issn:0362-1340 https://doi.org/10.1145/2499370.2462174
[39]
Chenglong Wang, Alvin Cheung, and Rastislav Bodík. 2017. Synthesizing highly expressive SQL queries from input-output examples. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2017, Barcelona, Spain, June 18-23, 2017. 452–466. https://doi.org/10.1145/3062341.3062365
[40]
Chenglong Wang, Yu Feng, Rastislav Bodík, Alvin Cheung, and Isil Dillig. 2020. Visualization by example. Proc. ACM Program. Lang., 4, POPL (2020), 49:1–49:28. https://doi.org/10.1145/3371117
[41]
Chenglong Wang, Yu Feng, Rastislav Bodík, Isil Dillig, Alvin Cheung, and Amy J. Ko. 2021. Falx: Synthesis-Powered Visualization Authoring. In CHI ’21, Virtual Event / Yokohama, Japan, May 8-13, 2021. ACM, 106:1–106:15. https://doi.org/10.1145/3411764.3445249
[42]
Xinyu Wang, Isil Dillig, and Rishabh Singh. 2017. Synthesis of Data Completion Scripts using Finite Tree Automata. In Proc. International Conference on Object-Oriented Programming, Systems, Languages, and Applications. ACM, 62:1–62:26.
[43]
Xinyu Wang, Isil Dillig, and Rishabh Singh. 2017. Synthesis of data completion scripts using finite tree automata. Proc. ACM Program. Lang., 1, OOPSLA (2017), 62:1–62:26. https://doi.org/10.1145/3133886
[44]
Xinyu Wang, Isil Dillig, and Rishabh Singh. 2018. Program synthesis using abstraction refinement. Proc. ACM Program. Lang., 2, POPL (2018), 63:1–63:30. https://doi.org/10.1145/3158151
[45]
Max Willsey, Chandrakana Nandi, Yisu Remy Wang, Oliver Flatt, Zachary Tatlock, and Pavel Panchekha. 2021. egg: Fast and extensible equality saturation. Proc. ACM Program. Lang., 5, POPL (2021), 1–29. https://doi.org/10.1145/3434304
[46]
Navid Yaghmazadeh, Christian Klinger, Isil Dillig, and Swarat Chaudhuri. 2016. Synthesizing transformations on hierarchically structured data. In PLDI 2016, Santa Barbara, CA, USA, June 13-17, 2016. ACM, 508–521. https://doi.org/10.1145/2908080.2908088
[47]
Moshé M. Zloof. 1975. Query-by-Example: The Invocation and Definition of Tables and Forms. In Proceedings of the 1st International Conference on Very Large Data Bases (VLDB ’75). Association for Computing Machinery, New York, NY, USA. 1–24. isbn:9781450339209 https://doi.org/10.1145/1282480.1282482

Cited By

View all
  • (2024)Refinement Types for VisualizationProceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering10.1145/3691620.3695550(1871-1881)Online publication date: 27-Oct-2024
  • (2023)Relational Query Synthesis ⋈ Decision Tree LearningProceedings of the VLDB Endowment10.14778/3626292.362630617:2(250-263)Online publication date: 1-Oct-2023
  • (2023)SlabCity: Whole-Query Optimization Using Program SynthesisProceedings of the VLDB Endowment10.14778/3611479.361151516:11(3151-3164)Online publication date: 1-Jul-2023
  • Show More Cited By

Index Terms

  1. Synthesizing analytical SQL queries from computation demonstration

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    PLDI 2022: Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation
    June 2022
    1038 pages
    ISBN:9781450392655
    DOI:10.1145/3519939
    • General Chair:
    • Ranjit Jhala,
    • Program Chair:
    • Işil Dillig
    This work is licensed under a Creative Commons Attribution 4.0 International License.

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 09 June 2022

    Permissions

    Request permissions for this article.

    Check for updates

    Badges

    Author Tags

    1. program synthesis
    2. programing languages

    Qualifiers

    • Research-article

    Conference

    PLDI '22
    Sponsor:

    Acceptance Rates

    Overall Acceptance Rate 406 of 2,067 submissions, 20%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)231
    • Downloads (Last 6 weeks)36
    Reflects downloads up to 23 Dec 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)Refinement Types for VisualizationProceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering10.1145/3691620.3695550(1871-1881)Online publication date: 27-Oct-2024
    • (2023)Relational Query Synthesis ⋈ Decision Tree LearningProceedings of the VLDB Endowment10.14778/3626292.362630617:2(250-263)Online publication date: 1-Oct-2023
    • (2023)SlabCity: Whole-Query Optimization Using Program SynthesisProceedings of the VLDB Endowment10.14778/3611479.361151516:11(3151-3164)Online publication date: 1-Jul-2023
    • (2023)Programming by Example Made EasyACM Transactions on Software Engineering and Methodology10.1145/360718533:1(1-36)Online publication date: 7-Jul-2023
    • (2023)ImageEye: Batch Image Processing using Program SynthesisProceedings of the ACM on Programming Languages10.1145/35912487:PLDI(686-711)Online publication date: 6-Jun-2023
    • (2023)Grounded Copilot: How Programmers Interact with Code-Generating ModelsProceedings of the ACM on Programming Languages10.1145/35860307:OOPSLA1(85-111)Online publication date: 6-Apr-2023

    View Options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Login options

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media