research-article

Open access

Type-directed synthesis of visualizations from natural language queries

Authors:

Shankara Pailoor,

Celeste Barnaby,

Chenglong Wang,

Işil Dillig

Proceedings of the ACM on Programming Languages, Volume 6, Issue OOPSLA2

Article No.: 144, Pages 532–559

https://doi.org/10.1145/3563307

Published: 31 October 2022

Abstract

We propose a new technique based on program synthesis for automatically generating visualizations from natural language queries. Our method parses the natural language query into a refinement type specification using the intents-and-slots paradigm and leverages type-directed synthesis to generate a set of visualization programs that are most likely to meet the user's intent. Our refinement type system captures useful hints present in the natural language query and allows the synthesis algorithm to reject visualizations that violate well-established design guidelines for the input data set. We have implemented our ideas in a tool called Graphy and evaluated it on NLVCorpus, which consists of 3 popular datasets and over 700 real-world natural language queries. Our experiments show that Graphy significantly outperforms state-of-the-art natural-language-based visualization tools, including transformer-based and rule-based ones.
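The pipeline the abstract describes (intents and slots, a type-level specification, then enumeration that rejects ill-typed charts) can be illustrated with a deliberately tiny sketch. It is not Graphy's DSL, type system, or algorithm: the column types, the three mark rules standing in for design guidelines, and the names `Plot`, `well_typed`, and `synthesize` are all hypothetical. The intent and slots are passed in directly rather than parsed from a query, and candidates are only filtered, not ranked by likelihood.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import List, Optional, Set

# Hypothetical dataset schema: column name -> measurement type.
COLUMNS = {
    "origin": "nominal",
    "horsepower": "quantitative",
    "mpg": "quantitative",
    "year": "temporal",
}

# Toy stand-ins for refinement predicates: each mark constrains the types
# of its x and y fields. Illustrative only, not the paper's guidelines.
MARK_RULES = {
    "bar": lambda tx, ty: tx in {"nominal", "temporal"} and ty == "quantitative",
    "line": lambda tx, ty: tx in {"temporal", "quantitative"} and ty == "quantitative",
    "scatter": lambda tx, ty: tx == "quantitative" and ty == "quantitative",
}


@dataclass(frozen=True)
class Plot:
    mark: str
    x: str
    y: str


def well_typed(plot: Plot) -> bool:
    """Check the candidate against the predicate attached to its mark."""
    return MARK_RULES[plot.mark](COLUMNS[plot.x], COLUMNS[plot.y])


def synthesize(intent: Optional[str], slots: Set[str]) -> List[Plot]:
    """Enumerate plots that match the intent, use every slot, and type-check."""
    # With no chart-type intent, every mark is a candidate.
    marks = [intent] if intent is not None else sorted(MARK_RULES)
    candidates = []
    for mark in marks:
        for x, y in permutations(COLUMNS, 2):
            plot = Plot(mark, x, y)
            # Slots act as hard constraints: each mentioned column must be used.
            if slots <= {x, y} and well_typed(plot):
                candidates.append(plot)
    return candidates


if __name__ == "__main__":
    # A query like "show mpg by origin": no chart type, two column slots.
    print(synthesize(None, {"origin", "mpg"}))
```

For the "mpg by origin" slots, the only surviving candidate is `Plot(mark='bar', x='origin', y='mpg')`: the line and scatter candidates, and the bar with the axes swapped, are rejected because `origin` is nominal. Checking the predicate during enumeration, rather than after rendering, is what lets guideline violations prune the search early.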

References

[1]

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural Machine Translation by Jointly Learning to Align and Translate. In International Conference on Learning Representations. arxiv:1409.0473

[2]

Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel Ziegler, Jeffrey Wu, Clemens Winter, Chris Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. 2020. Language Models are Few-Shot Learners. In Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin (Eds.). 33, Curran Associates, Inc., 1877–1901. https://proceedings.neurips.cc/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf

[3]

Qiaochu Chen, Shankara Pailoor, Celeste Barnaby, Abby Criswell, Chenglong Wang, Greg Durrett, and Isil Dillig. 2022. Type-Directed Synthesis of Visualizations from Natural Language Queries. https://doi.org/10.48550/ARXIV.2209.01081

[4]

Yanju Chen, Chenglong Wang, Osbert Bastani, Isil Dillig, and Yu Feng. 2020. Program Synthesis Using Deduction-Guided Reinforcement Learning. In Computer Aided Verification: 32nd International Conference, CAV 2020, Los Angeles, CA, USA, July 21–24, 2020, Proceedings, Part II. Springer-Verlag, Berlin, Heidelberg. 587–610. isbn:978-3-030-53290-1 https://doi.org/10.1007/978-3-030-53291-8_30

[5]

William Craig. 1957. Linear reasoning. A new form of the Herbrand-Gentzen theorem. Journal of Symbolic Logic, 22, 3 (1957), 250–268. https://doi.org/10.2307/2963593

[6]

Deborah A. Dahl, Madeleine Bates, Michael Brown, William Fisher, Kate Hunicke-Smith, David Pallett, Christine Pao, Alexander Rudnicky, and Elizabeth Shriberg. 1994. Expanding the Scope of the ATIS Task: The ATIS-3 Corpus. In Human Language Technology: Proceedings of a Workshop held at Plainsboro, New Jersey, March 8-11, 1994. https://aclanthology.org/H94-1010

[7]

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics, Minneapolis, Minnesota. 4171–4186. https://doi.org/10.18653/v1/N19-1423

[8]

Yu Feng, Ruben Martins, Osbert Bastani, and Isil Dillig. 2018. Program Synthesis Using Conflict-Driven Learning. SIGPLAN Not., 53, 4 (2018), jun, 420–435. issn:0362-1340 https://doi.org/10.1145/3296979.3192382

[9]

John K. Feser, Swarat Chaudhuri, and Isil Dillig. 2015. Synthesizing Data Structure Transformations from Input-Output Examples. SIGPLAN Not., 50, 6 (2015), jun, 229–239. issn:0362-1340 https://doi.org/10.1145/2813885.2737977

[10]

Jonathan Frankle, Peter-Michael Osera, David Walker, and Steve Zdancewic. 2016. Example-Directed Synthesis: A Type-Theoretic Interpretation. In Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL ’16). Association for Computing Machinery, New York, NY, USA. 802–815. isbn:9781450335492 https://doi.org/10.1145/2837614.2837629

[11]

Tong Gao, Mira Dontcheva, Eytan Adar, Zhicheng Liu, and Karrie G. Karahalios. 2015. DataTone: Managing Ambiguity in Natural Language Interfaces for Data Visualization. In Proceedings of the 28th Annual ACM Symposium on User Interface Software & Technology (UIST ’15). Association for Computing Machinery, New York, NY, USA. 489–500. isbn:9781450337793 https://doi.org/10.1145/2807442.2807478

[12]

Sumit Gulwani and Mark Marron. 2014. NLyze: Interactive Programming by Natural Language for Spreadsheet Data Analysis and Manipulation. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data (SIGMOD ’14). Association for Computing Machinery, New York, NY, USA. 803–814. isbn:9781450323765 https://doi.org/10.1145/2588555.2612177

[13]

Tihomir Gvero and Viktor Kuncak. 2015. Synthesizing Java Expressions from Free-Form Queries. In Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA 2015). Association for Computing Machinery, New York, NY, USA. 416–432. isbn:9781450336895 https://doi.org/10.1145/2814270.2814295

[14]

Charles T. Hemphill, John J. Godfrey, and George R. Doddington. 1990. The ATIS Spoken Language Systems Pilot Corpus. In Speech and Natural Language: Proceedings of a Workshop Held at Hidden Valley, Pennsylvania, June 24-27,1990. https://aclanthology.org/H90-1021

[15]

Minwoo Jeong and Gary Geunbae Lee. 2006. Exploiting Non-Local Features for Spoken Language Understanding. In Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions. Association for Computational Linguistics, Sydney, Australia. 412–419. https://aclanthology.org/P06-2054

[16]

Tristan Knoth, Di Wang, Nadia Polikarpova, and Jan Hoffmann. 2019. Resource-Guided Program Synthesis. In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2019). Association for Computing Machinery, New York, NY, USA. 253–268. isbn:9781450367127 https://doi.org/10.1145/3314221.3314602

[17]

Kenneth Knowles and Cormac Flanagan. 2009. Compositional Reasoning and Decidable Checking for Dependent Contract Types. In Proceedings of the 3rd Workshop on Programming Languages Meets Program Verification (PLPV ’09). Association for Computing Machinery, New York, NY, USA. 27–38. isbn:9781605583303 https://doi.org/10.1145/1481848.1481853

[18]

Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov, and Luke Zettlemoyer. 2020. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Online. 7871–7880. https://doi.org/10.18653/v1/2020.acl-main.703

[19]

Xi Victoria Lin, Richard Socher, and Caiming Xiong. 2020. Bridging Textual and Tabular Data for Cross-Domain Text-to-SQL Semantic Parsing. In Findings of the Association for Computational Linguistics: EMNLP 2020. Association for Computational Linguistics, Online. 4870–4888. https://doi.org/10.18653/v1/2020.findings-emnlp.438

[20]

Xi Victoria Lin, Chenglong Wang, Luke Zettlemoyer, and Michael D. Ernst. 2018. NL2Bash: A Corpus and Semantic Parser for Natural Language Interface to the Linux Operating System. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). European Language Resources Association (ELRA), Miyazaki, Japan. https://aclanthology.org/L18-1491

[21]

Ilya Loshchilov and Frank Hutter. 2019. Decoupled Weight Decay Regularization. In International Conference on Learning Representations. https://openreview.net/forum?id=Bkg6RiCqY7

[22]

Yuyu Luo, Nan Tang, Guoliang Li, Chengliang Chai, Wenbo Li, and Xuedi Qin. 2021. Synthesizing Natural Language to Visualization (NL2VIS) Benchmarks from NL2SQL Benchmarks. Association for Computing Machinery, New York, NY, USA. 1235–1247. isbn:9781450383431 https://doi.org/10.1145/3448016.3457261

[23]

Y. Luo, N. Tang, G. Li, J. Tang, C. Chai, and X. Qin. 2022. Natural Language to Visualization by Neural Machine Translation. IEEE Transactions on Visualization and Computer Graphics, 28, 01 (2022), jan, 217–226. issn:1941-0506 https://doi.org/10.1109/TVCG.2021.3114848

[24]

Jock Mackinlay, Pat Hanrahan, and Chris Stolte. 2007. Show Me: Automatic Presentation for Visual Analysis. IEEE Transactions on Visualization and Computer Graphics, 13, 6 (2007), 1137–1144. https://doi.org/10.1109/TVCG.2007.70594

[25]

P. Martin-Löf, Z. A. Lozinski, Michael Francis Atiyah, Cecil Arthur Hoare, and J. C. Shepherdson. 1984. Constructive mathematics and computer programming. Philosophical Transactions of the Royal Society of London. Series A, Mathematical and Physical Sciences, 312, 1522 (1984), 501–518. https://doi.org/10.1098/rsta.1984.0073

[26]

Dominik Moritz, Chenglong Wang, Greg L. Nelson, Halden Lin, Adam M. Smith, Bill Howe, and Jeffrey Heer. 2019. Formalizing Visualization Design Knowledge as Constraints: Actionable and Extensible Models in Draco. IEEE Transactions on Visualization and Computer Graphics, 25, 1 (2019), 438–448. https://doi.org/10.1109/TVCG.2018.2865240

[27]

Arpit Narechania, Arjun Srinivasan, and John Stasko. 2021. NL4DV: A Toolkit for Generating Analytic Specifications for Data Visualization from Natural Language Queries. IEEE Transactions on Visualization and Computer Graphics, 27, 2 (2021), Feb, 369–379. issn:2160-9306 https://doi.org/10.1109/tvcg.2020.3030378

[28]

Peter-Michael Osera. 2019. Constraint-Based Type-Directed Program Synthesis. In Proceedings of the 4th ACM SIGPLAN International Workshop on Type-Driven Development (TyDe 2019). Association for Computing Machinery, New York, NY, USA. 64–76. isbn:9781450368155 https://doi.org/10.1145/3331554.3342608

[29]

Peter-Michael Osera and Steve Zdancewic. 2015. Type-and-Example-Directed Program Synthesis. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’15). Association for Computing Machinery, New York, NY, USA. 619–630. isbn:9781450334686 https://doi.org/10.1145/2737924.2738007

[30]

Benjamin C. Pierce. 2002. Types and Programming Languages (1st ed.). The MIT Press. isbn:0262162091

[31]

Gabriel Poesia, Alex Polozov, Vu Le, Ashish Tiwari, Gustavo Soares, Christopher Meek, and Sumit Gulwani. 2022. Synchromesh: Reliable Code Generation from Pre-trained Language Models. In International Conference on Learning Representations. https://openreview.net/forum?id=KmtVD97J43e

[32]

Nadia Polikarpova, Ivan Kuraj, and Armando Solar-Lezama. 2016. Program Synthesis from Polymorphic Refinement Types. SIGPLAN Not., 51, 6 (2016), jun, 522–538. issn:0362-1340 https://doi.org/10.1145/2980983.2908093

[33]

Xuedi Qin, Yuyu Luo, Nan Tang, and Guoliang Li. 2018. DeepEye: An automatic big data visualization framework. Big Data Mining and Analytics, 1, 1 (2018), 75–82. https://doi.org/10.26599/BDMA.2018.9020007

[34]

Patrick M. Rondon, Ming Kawaguchi, and Ranjit Jhala. 2008. Liquid Types. SIGPLAN Not., 43, 6 (2008), jun, 159–169. issn:0362-1340 https://doi.org/10.1145/1379022.1375602

[35]

Abigail See, Peter J. Liu, and Christopher D. Manning. 2017. Get To The Point: Summarization with Pointer-Generator Networks. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Vancouver, Canada. 1073–1083. https://doi.org/10.18653/v1/P17-1099

[36]

Arjun Srinivasan, Nikhila Nyapathy, Bongshin Lee, Steven M. Drucker, and John Stasko. 2021. Collecting and Characterizing Natural Language Utterances for Specifying Data Visualizations. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems (CHI ’21). Association for Computing Machinery, New York, NY, USA. Article 464, 10 pages. isbn:9781450380966 https://doi.org/10.1145/3411764.3445400

[37]

Yiwen Sun, Jason Leigh, Andrew E. Johnson, and Sangyoon Lee. 2010. Articulate: A Semi-automated Model for Translating Natural Language Queries into Meaningful Visualizations. In Smart Graphics, 10th International Symposium on Smart Graphics, Banff, Canada, June 24-26, 2010, Proceedings, Robyn Taylor, Pierre Boulanger, Antonio Krüger, and Patrick Olivier (Eds.) (Lecture Notes in Computer Science, Vol. 6133). Springer, 184–195. https://doi.org/10.1007/978-3-642-13544-6_18

[38]

Gokhan Tur, Dilek Hakkani-Tür, and Larry Heck. 2010. What is left to be understood in ATIS? In 2010 IEEE Spoken Language Technology Workshop. 19–24. https://doi.org/10.1109/SLT.2010.5700816

[39]

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. In Advances in Neural Information Processing Systems, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.). 30, Curran Associates, Inc. https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf

[40]

Bailin Wang, Richard Shin, Xiaodong Liu, Oleksandr Polozov, and Matthew Richardson. 2020. RAT-SQL: Relation-Aware Schema Encoding and Linking for Text-to-SQL Parsers. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Online. 7567–7578. https://doi.org/10.18653/v1/2020.acl-main.677

[41]

Chenglong Wang, Yu Feng, Rastislav Bodik, Alvin Cheung, and Isil Dillig. 2019. Visualization by Example. Proc. ACM Program. Lang., 4, POPL (2019), Article 49, dec, 28 pages. https://doi.org/10.1145/3371117

[42]

Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Remi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander Rush. 2020. Transformers: State-of-the-Art Natural Language Processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. Association for Computational Linguistics, Online. 38–45. https://doi.org/10.18653/v1/2020.emnlp-demos.6

[43]

Kanit Wongsuphasawat, Dominik Moritz, Anushka Anand, Jock Mackinlay, Bill Howe, and Jeffrey Heer. 2015. Voyager: Exploratory Analysis via Faceted Browsing of Visualization Recommendations. IEEE Transactions on Visualization and Computer Graphics, 22, 1 (2015), 649–658.

[44]

Navid Yaghmazadeh, Yuepeng Wang, Isil Dillig, and Thomas Dillig. 2017. SQLizer: Query Synthesis from Natural Language. Proc. ACM Program. Lang., 1, OOPSLA (2017), Article 63, oct, 26 pages. https://doi.org/10.1145/3133887

[45]

Xi Ye, Qiaochu Chen, Isil Dillig, and Greg Durrett. 2021. Optimal Neural Program Synthesis from Multimodal Specifications. In Findings of the Association for Computational Linguistics: EMNLP 2021. Association for Computational Linguistics, Punta Cana, Dominican Republic. 1691–1704. https://doi.org/10.18653/v1/2021.findings-emnlp.146

[46]

Bowen Yu and Claudio T. Silva. 2020. FlowSense: A Natural Language Interface for Visual Data Exploration within a Dataflow System. IEEE Transactions on Visualization and Computer Graphics, 26, 1 (2020), Jan, 1–11. issn:2160-9306 https://doi.org/10.1109/tvcg.2019.2934668

[47]

Bowen Yu and Cláudio T. Silva. 2020. FlowSense: A Natural Language Interface for Visual Data Exploration within a Dataflow System. IEEE Trans. Vis. Comput. Graph., 26, 1 (2020), 1–11. https://doi.org/10.1109/TVCG.2019.2934668

[48]

John M. Zelle and Raymond J. Mooney. 1996. Learning to Parse Database Queries using Inductive Logic Programming. In AAAI/IAAI. AAAI Press/MIT Press, Portland, OR. 1050–1055. http://www.cs.utexas.edu/users/ai-lab?zelle:aaai96

[49]

Luke S. Zettlemoyer and Michael Collins. 2005. Learning to Map Sentences to Logical Form: Structured Classification with Probabilistic Categorial Grammars. In Proceedings of the Twenty-First Conference on Uncertainty in Artificial Intelligence (UAI’05). AUAI Press, Arlington, Virginia, USA. 658–666. isbn:0974903914

Index Terms

Type-directed synthesis of visualizations from natural language queries
1. Human-centered computing
  1. Visualization
    1. Visualization systems and tools
      1. Visualization toolkits
2. Software and its engineering
  1. Software creation and management
    1. Software development techniques
      1. Automatic programming

Information

Published In

Proceedings of the ACM on Programming Languages Volume 6, Issue OOPSLA2

October 2022

1932 pages

EISSN: 2475-1421

DOI: 10.1145/3554307

Editor:
Philip Wadler
University of Edinburgh, UK

Copyright © 2022 Owner/Author.

This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 31 October 2022

Published in PACMPL Volume 6, Issue OOPSLA2

Qualifiers

Research-article

Funding Sources

National Science Foundation

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 5
Total Downloads: 407
Downloads (last 12 months): 233
Downloads (last 6 weeks): 19

Reflects downloads up to 10 Oct 2024

Citations

Cited By

Vaithilingam P, Glassman E, Inala J, Wang C (2024). DynaVis: Dynamically Synthesized UI Widgets for Visualization Editing. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1–17. DOI: 10.1145/3613904.3642639. Online publication date: 11-May-2024.
https://dl.acm.org/doi/10.1145/3613904.3642639

Vázquez P (2024). Are LLMs ready for Visualization? 2024 IEEE 17th Pacific Visualization Conference (PacificVis), 343–352. DOI: 10.1109/PacificVis60374.2024.00049. Online publication date: 23-Apr-2024.
https://doi.org/10.1109/PacificVis60374.2024.00049

Ram G, Muthumanikandan V (2024). Visistant: A Conversational Chatbot for Natural Language to Visualizations With Gemini Large Language Models. IEEE Access, 12, 138547–138563. DOI: 10.1109/ACCESS.2024.3465541. Online publication date: 2024.
https://doi.org/10.1109/ACCESS.2024.3465541

Gou Q, Dong Y, Ke Q (2024). SynthoMinds: Bridging human programming intuition with retrieval, analogy, and reasoning in program synthesis. Journal of Systems and Software, 216, 112140. DOI: 10.1016/j.jss.2024.112140. Online publication date: Oct-2024.
https://doi.org/10.1016/j.jss.2024.112140

Wang C, Thompson J, Lee B (2023). Data Formulator: AI-Powered Concept-Driven Visualization Authoring. IEEE Transactions on Visualization and Computer Graphics, 30:1, 1128–1138. DOI: 10.1109/TVCG.2023.3326585. Online publication date: 23-Oct-2023.
https://dl.acm.org/doi/10.1109/TVCG.2023.3326585
