research-article
Open access

Type-directed synthesis of visualizations from natural language queries

Published: 31 October 2022

Abstract

We propose a new technique based on program synthesis for automatically generating visualizations from natural language queries. Our method parses the natural language query into a refinement type specification using the intents-and-slots paradigm and leverages type-directed synthesis to generate a set of visualization programs that are most likely to meet the user's intent. Our refinement type system captures useful hints present in the natural language query and allows the synthesis algorithm to reject visualizations that violate well-established design guidelines for the input data set. We have implemented our ideas in a tool called Graphy and evaluated it on NLVCorpus, which consists of 3 popular datasets and over 700 real-world natural language queries. Our experiments show that Graphy significantly outperforms state-of-the-art natural language based visualization tools, including transformer and rule-based ones.
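The pipeline described in the abstract (parse the query into a specification, then enumerate visualization programs and reject ill-typed ones) can be illustrated with a deliberately small sketch. This is not Graphy's implementation: the schema, the two plot rules, the keyword-matching `parse_query`, and the `Spec` record are all hypothetical stand-ins, and a simple predicate check replaces the paper's refinement type system. Ranking of candidates is omitted.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import Optional

# Hypothetical column kinds for a toy table (not taken from the paper).
SCHEMA = {"origin": "nominal", "horsepower": "quantitative", "mpg": "quantitative"}

# Hypothetical design rules: the column kinds each plot accepts on (x, y).
PLOT_RULES = {
    "bar": ("nominal", "quantitative"),
    "scatter": ("quantitative", "quantitative"),
}


@dataclass(frozen=True)
class Spec:
    """Constraints recovered from the query; plot=None means unconstrained."""
    plot: Optional[str] = None
    fields: frozenset = frozenset()


def parse_query(query: str) -> Spec:
    """Stand-in for an intents-and-slots parser: plain keyword matching."""
    words = query.lower().split()
    plot = next((p for p in PLOT_RULES if p in words), None)
    return Spec(plot, frozenset(c for c in SCHEMA if c in words))


def synthesize(spec: Spec) -> list:
    """Enumerate (plot, x, y) candidates and keep only those that pass checks."""
    results = []
    for plot, (x_kind, y_kind) in PLOT_RULES.items():
        # Reject plots that contradict an explicit plot hint in the query.
        if spec.plot not in (None, plot):
            continue
        for x, y in permutations(SCHEMA, 2):
            # Reject encodings that violate the (assumed) design rule.
            if (SCHEMA[x], SCHEMA[y]) != (x_kind, y_kind):
                continue
            # Reject candidates that omit a column the query mentioned.
            if not spec.fields <= {x, y}:
                continue
            results.append((plot, x, y))
    return results


if __name__ == "__main__":
    print(synthesize(parse_query("bar chart of mpg by origin")))
    print(synthesize(parse_query("plot horsepower")))
```

A fully specified query such as "bar chart of mpg by origin" leaves one candidate, while an underspecified query such as "plot horsepower" leaves several, which is where a real system would need to rank the survivors by how likely they are to match the user's intent.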




Published In

Proceedings of the ACM on Programming Languages  Volume 6, Issue OOPSLA2
October 2022
1932 pages
EISSN:2475-1421
DOI:10.1145/3554307
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 31 October 2022
Published in PACMPL Volume 6, Issue OOPSLA2


Author Tags

  1. Data Visualization
  2. Program Synthesis
  3. Programming by Natural Languages

Qualifiers

  • Research-article

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 233
  • Downloads (last 6 weeks): 19

Reflects downloads up to 10 Oct 2024

Cited By

  • DynaVis: Dynamically Synthesized UI Widgets for Visualization Editing. 2024. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems. 1–17. https://doi.org/10.1145/3613904.3642639 Online publication date: 11 May 2024.
  • Are LLMs ready for Visualization? 2024. In 2024 IEEE 17th Pacific Visualization Conference (PacificVis). 343–352. https://doi.org/10.1109/PacificVis60374.2024.00049 Online publication date: 23 April 2024.
  • Visistant: A Conversational Chatbot for Natural Language to Visualizations With Gemini Large Language Models. 2024. IEEE Access, 12 (2024), 138547–138563. https://doi.org/10.1109/ACCESS.2024.3465541 Online publication date: 2024.
  • SynthoMinds: Bridging human programming intuition with retrieval, analogy, and reasoning in program synthesis. 2024. Journal of Systems and Software, 216 (2024), Article 112140. https://doi.org/10.1016/j.jss.2024.112140 Online publication date: October 2024.
  • Data Formulator: AI-Powered Concept-Driven Visualization Authoring. 2023. IEEE Transactions on Visualization and Computer Graphics, 30, 1, 1128–1138. https://doi.org/10.1109/TVCG.2023.3326585 Online publication date: 23 October 2023.
