DOI: 10.1145/3519939.3523709
research-article
Open access

Visualization question answering using introspective program synthesis

Published: 09 June 2022

Abstract

While data visualization plays a crucial role in gaining insights from data, answering natural language questions over complex visualizations is far from easy. Mainstream approaches reduce data visualization queries to a semantic parsing problem, relying either on expensive-to-annotate supervised training data that pairs natural language questions with logical forms, or on weakly supervised models that incorporate a larger corpus but fail on long-tailed queries and offer no explanations. This paper aims to answer data visualization queries by automatically synthesizing the corresponding program from natural language. At the core of our technique is an abstract synthesis engine that is bootstrapped by an off-the-shelf weakly supervised model, together with an optimal synthesis algorithm guided by triangle alignment constraints, which represent consistency among natural language, visualization, and the synthesized program.
Starting with a few tentative answers obtained from an off-the-shelf statistical model, our approach first uses an abstract synthesizer to generate a set of sketches that are consistent with those answers. We then design an instance of optimal synthesis that completes one of the candidate sketches by satisfying common type constraints and maximizing the consistency among three parties: natural language, the visualization, and the candidate program.
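The two-stage pipeline described above can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the table, the three operators, the keyword lists, and the additive score are hypothetical and are not Poe's DSL, its abstract semantics, or its triangle alignment constraints. Brute-force enumeration stands in for the paper's optimal synthesis algorithm.

```python
from itertools import product

# Hypothetical data standing in for the table behind a bar chart.
TABLE = [
    {"country": "Canada", "year": 2019, "sales": 30},
    {"country": "Mexico", "year": 2019, "sales": 45},
    {"country": "Brazil", "year": 2019, "sales": 20},
]
COLUMNS = ["country", "year", "sales"]


def kind(value):
    """Coarse type of a value: 'number' or 'text'."""
    return "number" if isinstance(value, (int, float)) else "text"


COLUMN_KIND = {c: kind(TABLE[0][c]) for c in COLUMNS}

# Sketch name -> (required kind of each column hole, implementation).
# None means the hole accepts a column of any kind.
SKETCHES = {
    "argmax": (("number", None), lambda t, key, out: max(t, key=lambda r: r[key])[out]),
    "argmin": (("number", None), lambda t, key, out: min(t, key=lambda r: r[key])[out]),
    "sum": (("number",), lambda t, col: sum(r[col] for r in t)),
}

# Assumed keyword hints linking question words to operators.
KEYWORDS = {
    "argmax": {"highest", "most", "largest"},
    "argmin": {"lowest", "least", "smallest"},
    "sum": {"total", "sum"},
}


def abstract_output_kinds(name):
    """Over-approximate what a sketch can return without filling its holes."""
    return {"number"} if name == "sum" else {"number", "text"}


def candidate_sketches(tentative_answers):
    """Stage 1: keep sketches whose abstract output can match a tentative answer."""
    wanted = {kind(a) for a in tentative_answers}
    return [n for n in SKETCHES if abstract_output_kinds(n) & wanted]


def score(question, name, cols, output, tentative_answers):
    """Toy consistency score across question, table columns, and program output."""
    words = set(question.lower().replace("?", "").split())
    s = len(words & KEYWORDS[name])            # question vs. operator
    s += sum(c in words for c in cols)         # question vs. columns used
    s += int(output in tentative_answers)      # program output vs. model answers
    return s


def synthesize(question, tentative_answers):
    """Stage 2: fill holes under type constraints and keep the best-scoring program."""
    best = None
    for name in candidate_sketches(tentative_answers):
        hole_kinds, run = SKETCHES[name]
        for cols in product(COLUMNS, repeat=len(hole_kinds)):
            # Type constraint: each hole must receive a column of the required kind.
            if any(k and COLUMN_KIND[c] != k for k, c in zip(hole_kinds, cols)):
                continue
            output = run(TABLE, *cols)
            cand = (score(question, name, cols, output, tentative_answers), name, cols, output)
            if best is None or cand[0] > best[0]:
                best = cand
    return best


print(synthesize("Which country has the highest sales?", ["Mexico", "Canada"]))
# → (4, 'argmax', ('sales', 'country'), 'Mexico')
```

In this toy setting the tentative answers only prune sketches by output type and contribute one point to the score, so a program that agrees with the question wording and the table columns can still win when the statistical model's top answer is wrong.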
We implement the proposed idea in a system called Poe that can answer visualization queries from natural language. Our method is fully automated and does not require users to know the underlying schema of the visualizations. We evaluate Poe on 629 visualization queries, and our experiments show that Poe outperforms the state of the art, improving accuracy from 44% to 59%.

Published In

PLDI 2022: Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation
June 2022
1038 pages
ISBN:9781450392655
DOI:10.1145/3519939
General Chair: Ranjit Jhala
Program Chair: Işil Dillig
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Machine Learning
  2. Natural Language Processing
  3. Program Synthesis
  4. Visualization

Qualifiers

  • Research-article

Conference

PLDI '22
Acceptance Rates

Overall Acceptance Rate 406 of 2,067 submissions, 20%
