research-article

Natural Language Dataset Generation Framework for Visualizations Powered by Large Language Models

Authors:

Jinwook SeoAuthors Info & Claims

CHI '24: Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems

Article No.: 843, Pages 1 - 22

https://doi.org/10.1145/3613904.3642943

Published: 11 May 2024 Publication History

Abstract

We introduce VL2NL, a Large Language Model (LLM) framework that generates rich and diverse NL datasets using Vega-Lite specifications as input, thereby streamlining the development of Natural Language Interfaces (NLIs) for data visualization. To synthesize relevant chart semantics accurately and enhance syntactic diversity in each NL dataset, we leverage 1) a guided discovery incorporated into prompting so that LLMs can steer themselves to create faithful NL datasets in a self-directed manner; 2) a score-based paraphrasing to augment NL syntax along with four language axes. We also present a new collection of 1,981 real-world Vega-Lite specifications that have increased diversity and complexity than existing chart collections. When tested on our chart collection, VL2NL extracted chart semantics and generated L1/L2 captions with 89.4% and 76.0% accuracy, respectively. It also demonstrated generating and paraphrasing utterances and questions with greater diversity compared to the benchmarks. Last, we discuss how our NL datasets and framework can be utilized in real-world scenarios. The codes and chart collection are available at https://github.com/hyungkwonko/chart-llm.

Supplemental Material

MP4 File - Video Preview

Video Preview

Download
15.82 MB

MP4 File - Video Presentation

Video Presentation

Transcript for: Video Presentation

ZIP File - Code and Dataset

Code and Dataset

Download
46.83 MB

References

[1]

Rinat Abdrashitov, Fanny Chevalier, and Karan Singh. 2020. Interactive Exploration and Refinement of Facial Expression Using Manifold Learning. In Proceedings of the 33rd Annual ACM Symposium on User Interface Software and Technology (Virtual Event, USA) (UIST ’20). Association for Computing Machinery, New York, NY, USA, 778–790. https://doi.org/10.1145/3379337.3415877

Digital Library

[2]

Robert Amar, James Eagan, and John Stasko. 2005. Low-level components of analytic activity in information visualization. In IEEE Symposium on Information Visualization, 2005. INFOVIS 2005. IEEE, 111–117.

[3]

Toshiki Aoki, Rintaro Chujo, Katsufumi Matsui, Saemi Choi, and Ari Hautasaari. 2022. EmoBalloon-Conveying Emotional Arousal in Text Chats with Speech Balloons. In CHI Conference on Human Factors in Computing Systems. 1–16.

[4]

Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, Jackson Kernion, Andy Jones, Anna Chen, Anna Goldie, Azalia Mirhoseini, Cameron McKinnon, 2022. Constitutional ai: Harmlessness from ai feedback. arXiv preprint arXiv:2212.08073 (2022).

[5]

Michael S Bernstein, Greg Little, Robert C Miller, Björn Hartmann, Mark S Ackerman, David R Karger, David Crowell, and Katrina Panovich. 2010. Soylent: a word processor with a crowd inside. In Proceedings of the 23nd annual ACM symposium on User interface software and technology. 313–322.

Digital Library

[6]

Michelle A Borkin, Azalea A Vo, Zoya Bylinskii, Phillip Isola, Shashank Sunkavalli, Aude Oliva, and Hanspeter Pfister. 2013. What makes a visualization memorable?IEEE transactions on visualization and computer graphics 19, 12 (2013), 2306–2315.

[7]

Ann L Brown and Joseph C Campione. 1994. Guided discovery in a community of learners.The MIT Press.

[8]

Stuart K Card, Jock Mackinlay, and Ben Shneiderman. 1999. Readings in information visualization: using vision to think. Morgan Kaufmann.

[9]

Harrison Chase. 2022. LangChain. https://github.com/hwchase17/langchain

[10]

Ritwick Chaudhry, Sumit Shekhar, Utkarsh Gupta, Pranav Maneriker, Prann Bansal, and Ajay Joshi. 2020. Leaf-qa: Locate, encode & attend for figure question answering. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 3512–3521.

[11]

Chen Chen and Zhicheng Liu. 2023. The State of the Art in Creating Visualization Corpora for Automated Chart Analysis. Computer Graphics Forum (2023). https://doi.org/10.1111/cgf.14855

[12]

Xi Chen, Wei Zeng, Yanna Lin, Hayder Mahdi Ai-Maneea, Jonathan Roberts, and Remco Chang. 2020. Composition and configuration patterns in multiple-view visualizations. IEEE Transactions on Visualization and Computer Graphics 27, 2 (2020), 1514–1524.

[13]

Kenneth Cox, Rebecca E Grinter, Stacie L Hibino, Lalita Jategaonkar Jagadeesan, and David Mantilla. 2001. A multi-modal natural language interface to an information visualization environment. International Journal of Speech Technology 4 (2001), 297–314.

[14]

Kenny Davila, Srirangaraj Setlur, David Doermann, Bhargava Urala Kota, and Venu Govindaraju. 2020. Chart mining: A survey of methods for automated chart analysis. IEEE transactions on pattern analysis and machine intelligence 43, 11 (2020), 3799–3819.

[15]

Ton De Jong and Wouter R Van Joolingen. 1998. Scientific discovery learning with computer simulations of conceptual domains. Review of educational research 68, 2 (1998), 179–201.

[16]

Biplab Deka, Zifeng Huang, Chad Franzen, Joshua Hibschman, Daniel Afergan, Yang Li, Jeffrey Nichols, and Ranjitha Kumar. 2017. Rico: A mobile app dataset for building data-driven design applications. In Proceedings of the 30th annual ACM symposium on user interface software and technology. 845–854.

Digital Library

[17]

Victor Dibia. 2023. LIDA: A Tool for Automatic Generation of Grammar-Agnostic Visualizations and Infographics using Large Language Models. arXiv preprint arXiv:2303.02927 (2023).

[18]

Victor Dibia and Çağatay Demiralp. 2019. Data2vis: Automatic generation of data visualizations using sequence-to-sequence recurrent neural networks. IEEE computer graphics and applications 39, 5 (2019), 33–46.

[19]

Siwei Fu, Kai Xiong, Xiaodong Ge, Siliang Tang, Wei Chen, and Yingcai Wu. 2020. Quda: natural language queries for visual data analytics. arXiv preprint arXiv:2005.03257 (2020).

[20]

Tong Gao, Mira Dontcheva, Eytan Adar, Zhicheng Liu, and Karrie G Karahalios. 2015. Datatone: Managing ambiguity in natural language interfaces for data visualization. In Proceedings of the 28th annual acm symposium on user interface software & technology. 489–500.

Digital Library

[21]

Perttu Hämäläinen, Mikke Tavast, and Anton Kunnari. 2023. Evaluating large language models in generating synthetic hci research data: a case study. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems. 1–19.

Digital Library

[22]

Jonathan Harper and Maneesh Agrawala. 2017. Converting basic D3 charts into reusable style templates. IEEE transactions on visualization and computer graphics 24, 3 (2017), 1274–1286.

[23]

Cindy E Hmelo-Silver, Ravit Golan Duncan, and Clark A Chinn. 2007. Scaffolding and achievement in problem-based and inquiry learning: a response to Kirschner, Sweller, and. Educational psychologist 42, 2 (2007), 99–107.

[24]

Enamul Hoque, Parsa Kavehzadeh, and Ahmed Masry. 2022. Chart question answering: State of the art and future directions. In Computer Graphics Forum, Vol. 41. Wiley Online Library, 555–572.

[25]

Enamul Hoque, Vidya Setlur, Melanie Tory, and Isaac Dykeman. 2017. Applying pragmatics principles for interaction with visual analytics. IEEE transactions on visualization and computer graphics 24, 1 (2017), 309–318.

[26]

Kevin Hu, Diana Orghian, and César Hidalgo. 2018. DIVE: A mixed-initiative system supporting integrated data exploration workflows. In Proceedings of the workshop on human-in-the-loop data analytics. 1–7.

Digital Library

[27]

Maeve Hutchinson, Aidan Slingsby, Radu Jianu, and Pranava Madhyastha. 2023. Towards Visualisation Specifications from Multilingual Natural Language Queries using Large Language Models. In EuroVis 2023 - Posters, Christina Gillmann, Michael Krone, and Simone Lenti (Eds.). The Eurographics Association. https://doi.org/10.2312/evp.20231072

[28]

Ziwei Ji, Nayeon Lee, Rita Frieske, Tiezheng Yu, Dan Su, Yan Xu, Etsuko Ishii, Ye Jin Bang, Andrea Madotto, and Pascale Fung. 2023. Survey of hallucination in natural language generation. Comput. Surveys 55, 12 (2023), 1–38.

Digital Library

[29]

Daekyoung Jung, Wonjae Kim, Hyunjoo Song, Jeong-in Hwang, Bongshin Lee, Bohyoung Kim, and Jinwook Seo. 2017. Chartsense: Interactive data extraction from chart images. In Proceedings of the 2017 chi conference on human factors in computing systems. 6706–6717.

Digital Library

[30]

Sean Kandel, Ravi Parikh, Andreas Paepcke, Joseph M Hellerstein, and Jeffrey Heer. 2012. Profiler: Integrated statistical analysis and visualization for data quality assessment. In Proceedings of the International Working Conference on Advanced Visual Interfaces. 547–554.

Digital Library

[31]

Shankar Kantharaj, Rixie Tiffany Ko Leong, Xiang Lin, Ahmed Masry, Megh Thakkar, Enamul Hoque, and Shafiq Joty. 2022. Chart-to-text: A large-scale benchmark for chart summarization. arXiv preprint arXiv:2203.06486 (2022).

[32]

Dae Hyun Kim, Enamul Hoque, and Maneesh Agrawala. 2020. Answering questions about charts and generating visual explanations. In Proceedings of the 2020 CHI conference on human factors in computing systems. 1–13.

Digital Library

[33]

Dae Hyun Kim, Enamul Hoque, Juho Kim, and Maneesh Agrawala. 2018. Facilitating document reading by linking text and tables. In Proceedings of the 31st Annual ACM Symposium on User Interface Software and Technology. 423–434.

Digital Library

[34]

Dae Hyun Kim, Vidya Setlur, and Maneesh Agrawala. 2021. Towards understanding how readers integrate charts and captions: A case study with line charts. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems. 1–11.

Digital Library

[35]

Robert Kincaid and Graham Pollock. 2017. Nicky: Toward a virtual assistant for test and measurement instrument recommendations. In 2017 IEEE 11th International Conference on Semantic Computing (ICSC). IEEE, 196–203.

[36]

Aniket Kittur, Boris Smus, Susheel Khamkar, and Robert E Kraut. 2011. Crowdforge: Crowdsourcing complex work. In Proceedings of the 24th annual ACM symposium on User interface software and technology. 43–52.

Digital Library

[37]

Mark Klein and Ana Cristina Bicharra Garcia. 2015. High-speed idea filtering with the bag of lemons. Decision Support Systems 78 (2015), 39–50.

Digital Library

[38]

Hyung-Kwon Ko, Subin An, Gwanmo Park, Seung Kwon Kim, Daesik Kim, Bohyoung Kim, Jaemin Jo, and Jinwook Seo. 2022. We-toon: A Communication Support System between Writers and Artists in Collaborative Webtoon Sketch Revision. In The 35th Annual ACM Symposium on User Interface Software and Technology. 1–14.

Digital Library

[39]

Nicholas Kong, Marti A Hearst, and Maneesh Agrawala. 2014. Extracting references between text and charts via crowdsourcing. In Proceedings of the SIGCHI conference on Human Factors in Computing Systems. 31–40.

Digital Library

[40]

Yixuan Li, Yusheng Qi, Yang Shi, Qing Chen, Nan Cao, and Siming Chen. 2022. Diverse interaction recommendation for public users exploring multi-view visualization using deep learning. IEEE Transactions on Visualization and Computer Graphics 29, 1 (2022), 95–105.

[41]

Zhenyu Li, Sunqi Fan, Yu Gu, Xiuxing Li, Zhichao Duan, Bowen Dong, Ning Liu, and Jianyong Wang. 2023. FlexKBQA: A Flexible LLM-Powered Framework for Few-Shot Knowledge Base Question Answering. arXiv preprint arXiv:2308.12060 (2023).

[42]

Can Liu, Yun Han, Ruike Jiang, and Xiaoru Yuan. 2021. Advisor: Automatic visualization answer for natural-language question on tabular data. In 2021 IEEE 14th Pacific Visualization Symposium (PacificVis). IEEE, 11–20.

[43]

Fangyu Liu, Julian Martin Eisenschlos, Francesco Piccinno, Syrine Krichene, Chenxi Pang, Kenton Lee, Mandar Joshi, Wenhu Chen, Nigel Collier, and Yasemin Altun. 2022. DePlot: One-shot visual language reasoning by plot-to-table translation. arXiv preprint arXiv:2212.10505 (2022).

[44]

Fangyu Liu, Francesco Piccinno, Syrine Krichene, Chenxi Pang, Kenton Lee, Mandar Joshi, Yasemin Altun, Nigel Collier, and Julian Martin Eisenschlos. 2022. Matcha: Enhancing visual language pretraining with math reasoning and chart derendering. arXiv preprint arXiv:2212.09662 (2022).

[45]

Alan Lundgard and Arvind Satyanarayan. 2021. Accessible visualization via natural language descriptions: A four-level model of semantic content. IEEE transactions on visualization and computer graphics 28, 1 (2021), 1073–1083.

Digital Library

[46]

Yuyu Luo, Xuedi Qin, Nan Tang, and Guoliang Li. 2018. Deepeye: Towards automatic data visualization. In 2018 IEEE 34th international conference on data engineering (ICDE). IEEE, 101–112.

[47]

Yuyu Luo, Nan Tang, Guoliang Li, Chengliang Chai, Wenbo Li, and Xuedi Qin. 2021. Synthesizing natural language to visualization (NL2VIS) benchmarks from NL2SQL benchmarks. In Proceedings of the 2021 International Conference on Management of Data. 1235–1247.

Digital Library

[48]

Yuyu Luo, Nan Tang, Guoliang Li, Jiawei Tang, Chengliang Chai, and Xuedi Qin. 2021. Natural language to visualization by neural machine translation. IEEE Transactions on Visualization and Computer Graphics 28, 1 (2021), 217–226.

Digital Library

[49]

Jock Mackinlay. 1986. Automating the design of graphical presentations of relational information. Acm Transactions On Graphics (Tog) 5, 2 (1986), 110–141.

Digital Library

[50]

Jock Mackinlay, Pat Hanrahan, and Chris Stolte. 2007. Show me: Automatic presentation for visual analysis. IEEE transactions on visualization and computer graphics 13, 6 (2007), 1137–1144.

[51]

Anita Mahinpei, Zona Kostic, and Chris Tanner. 2022. LineCap: Line Charts for Data Visualization Captioning Models. In 2022 IEEE Visualization and Visual Analytics (VIS). IEEE, 35–39.

[52]

Ahmed Masry, Do Xuan Long, Jia Qing Tan, Shafiq Joty, and Enamul Hoque. 2022. ChartQA: A benchmark for question answering about charts with visual and logical reasoning. arXiv preprint arXiv:2203.10244 (2022).

[53]

Leland McInnes, John Healy, and Steve Astels. 2017. hdbscan: Hierarchical density based clustering.J. Open Source Softw. 2, 11 (2017), 205.

[54]

Leland McInnes, John Healy, and James Melville. 2018. Umap: Uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426 (2018).

[55]

Yu Meng, Jiaxin Huang, Yu Zhang, and Jiawei Han. 2022. Generating training data with language models: Towards zero-shot language understanding. Advances in Neural Information Processing Systems 35 (2022), 462–477.

[56]

Valerie S Morash, Yue-Ting Siu, Joshua A Miele, Lucia Hasty, and Steven Landau. 2015. Guiding novice web workers in making image descriptions using templates. ACM Transactions on Accessible Computing (TACCESS) 7, 4 (2015), 1–21.

Digital Library

[57]

Dominik Moritz, Chenglong Wang, Greg L Nelson, Halden Lin, Adam M Smith, Bill Howe, and Jeffrey Heer. 2018. Formalizing visualization design knowledge as constraints: Actionable and extensible models in draco. IEEE transactions on visualization and computer graphics 25, 1 (2018), 438–448.

[58]

Mohammad Amin Mozaffari, Xinyuan Zhang, Jinghui Cheng, and Jin LC Guo. 2022. GANSpiration: Balancing Targeted and Serendipitous Inspiration in User Interface Design with Style-Based Generative Adversarial Network. In CHI Conference on Human Factors in Computing Systems. 1–15.

[59]

Arpit Narechania, Adam Fourney, Bongshin Lee, and Gonzalo Ramos. 2021. DIY: Assessing the correctness of natural language to sql systems. In 26th International Conference on Intelligent User Interfaces. 597–607.

Digital Library

[60]

Arpit Narechania, Arjun Srinivasan, and John Stasko. 2020. NL4DV: A toolkit for generating analytic specifications for data visualization from natural language queries. IEEE Transactions on Visualization and Computer Graphics 27, 2 (2020), 369–379.

[61]

Jason Obeid and Enamul Hoque. 2020. Chart-to-text: Generating natural language descriptions for charts by adapting the transformer model. arXiv preprint arXiv:2010.09142 (2020).

[62]

Joon Sung Park, Joseph C O’Brien, Carrie J Cai, Meredith Ringel Morris, Percy Liang, and Michael S Bernstein. 2023. Generative agents: Interactive simulacra of human behavior. arXiv preprint arXiv:2304.03442 (2023).

[63]

Jorge Poco and Jeffrey Heer. 2017. Reverse-engineering visualizations: Recovering visual encodings from chart images. In Computer graphics forum, Vol. 36. Wiley Online Library, 353–363.

[64]

Xin Qian, Eunyee Koh, Fan Du, Sungchul Kim, Joel Chan, Ryan A Rossi, Sana Malik, and Tak Yeon Lee. 2021. Generating accurate caption units for figure captioning. In Proceedings of the Web Conference 2021. 2792–2804.

Digital Library

[65]

Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics. https://arxiv.org/abs/1908.10084

[66]

Samuel Rhys Cox, Yunlong Wang, Ashraf Abdul, Christian Von Der Weth, and Brian Y. Lim. 2021. Directed diversity: Leveraging language embedding distances for collective creativity in crowd ideation. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems. 1–35.

[67]

Andy Rosenbaum, Saleh Soltan, Wael Hamza, Amir Saffari, Macro Damonte, and Isabel Groves. 2022. CLASP: Few-shot cross-lingual data augmentation for semantic parsing. arXiv preprint arXiv:2210.07074 (2022).

[68]

Arvind Satyanarayan, Dominik Moritz, Kanit Wongsuphasawat, and Jeffrey Heer. 2017. Vega-Lite: A Grammar of Interactive Graphics. IEEE Transactions on Visualization & Computer Graphics (Proc. InfoVis) (2017). https://doi.org/10.1109/tvcg.2016.2599030

Digital Library

[69]

Arvind Satyanarayan, Dominik Moritz, Kanit Wongsuphasawat, and Jeffrey Heer. 2023. Vega Editor.

[70]

Arvind Satyanarayan, Dominik Moritz, Kanit Wongsuphasawat, and Jeffrey Heer. 2023. Vega-Lite gallery.

[71]

Arvind Satyanarayan, Ryan Russell, Jane Hoffswell, and Jeffrey Heer. 2015. Reactive vega: A streaming dataflow architecture for declarative interactive visualization. IEEE transactions on visualization and computer graphics 22, 1 (2015), 659–668.

[72]

Timo Schick and Hinrich Schütze. 2021. Generating datasets with pretrained language models. arXiv preprint arXiv:2104.07540 (2021).

[73]

Vidya Setlur, Sarah E Battersby, Melanie Tory, Rich Gossweiler, and Angel X Chang. 2016. Eviza: A natural language interface for visual analysis. In Proceedings of the 29th annual symposium on user interface software and technology. 365–377.

Digital Library

[74]

Leixian Shen, Enya Shen, Yuyu Luo, Xiaocong Yang, Xuming Hu, Xiongshuai Zhang, Zhiwei Tai, and Jianmin Wang. 2022. Towards natural language interfaces for data visualization: A survey. IEEE transactions on visualization and computer graphics (2022).

[75]

Leixian Shen, Yizhi Zhang, Haidong Zhang, and Yun Wang. 2023. Data player: Automatic generation of data videos with narration-animation interplay. arXiv preprint arXiv:2308.04703 (2023).

[76]

Pao Siangliulue, Joel Chan, Steven P Dow, and Krzysztof Z Gajos. 2016. IdeaHound: improving large-scale collaborative ideation with crowd-powered real-time semantic modeling. In Proceedings of the 29th Annual Symposium on User Interface Software and Technology. 609–624.

Digital Library

[77]

Andrea Spreafico and Giuseppe Carenini. 2020. Neural data-driven captioning of time-series line charts. In Proceedings of the International Conference on Advanced Visual Interfaces. 1–5.

Digital Library

[78]

Arjun Srinivasan, Steven M Drucker, Alex Endert, and John Stasko. 2018. Augmenting visualizations with interactive data facts to facilitate interpretation and communication. IEEE transactions on visualization and computer graphics 25, 1 (2018), 672–681.

[79]

Arjun Srinivasan, Nikhila Nyapathy, Bongshin Lee, Steven M Drucker, and John Stasko. 2021. Collecting and characterizing natural language utterances for specifying data visualizations. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems. 1–10.

Digital Library

[80]

Arjun Srinivasan and John Stasko. 2017. Natural language interfaces for data analysis with visualization: Considering what has and could be asked. In Proceedings of the Eurographics/IEEE VGTC conference on visualization: Short papers. 55–59.

[81]

Arjun Srinivasan and John Stasko. 2017. Orko: Facilitating multimodal interaction for visual exploration and analysis of networks. IEEE transactions on visualization and computer graphics 24, 1 (2017), 511–521.

[82]

Chris Stolte, Diane Tang, and Pat Hanrahan. 2002. Polaris: A system for query, analysis, and visualization of multidimensional relational databases. IEEE Transactions on Visualization and Computer Graphics 8, 1 (2002), 52–65.

Digital Library

[83]

Nicole Sultanum and Arjun Srinivasan. 2023. DataTales: Investigating the use of Large Language Models for Authoring Data-Driven Articles. arXiv preprint arXiv:2308.04076 (2023).

[84]

Benny J. Tang, Angie Boggust, and Arvind Satyanarayan. 2023. VisText: A Benchmark for Semantically Rich Chart Captioning. In The Annual Meeting of the Association for Computational Linguistics (ACL). http://vis.csail.mit.edu/pubs/vistext

[85]

Manasi Vartak, Sajjadur Rahman, Samuel Madden, Aditya Parameswaran, and Neoklis Polyzotis. 2015. Seedb: Efficient data-driven visualization recommendations to support visual analytics. In Proceedings of the VLDB Endowment International Conference on Very Large Data Bases, Vol. 8. NIH Public Access, 2182.

[86]

Yun Wang, Zhitao Hou, Leixian Shen, Tongshuang Wu, Jiaqi Wang, He Huang, Haidong Zhang, and Dongmei Zhang. 2022. Towards Natural Language-Based Visualization Authoring. IEEE Transactions on Visualization and Computer Graphics 29, 1 (2022), 1222–1232.

[87]

Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, 2022. Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems 35 (2022), 24824–24837.

[88]

Kanit Wongsuphasawat, Dominik Moritz, Anushka Anand, Jock Mackinlay, Bill Howe, and Jeffrey Heer. 2015. Voyager: Exploratory analysis via faceted browsing of visualization recommendations. IEEE transactions on visualization and computer graphics 22, 1 (2015), 649–658.

[89]

Kanit Wongsuphasawat, Dominik Moritz, Anushka Anand, Jock Mackinlay, Bill Howe, and Jeffrey Heer. 2016. Towards a general-purpose query language for visualization recommendation. In Proceedings of the Workshop on Human-In-the-Loop Data Analytics. 1–6.

Digital Library

[90]

Kanit Wongsuphasawat, Zening Qu, Dominik Moritz, Riley Chang, Felix Ouk, Anushka Anand, Jock Mackinlay, Bill Howe, and Jeffrey Heer. 2017. Voyager 2: Augmenting visual analysis with partial view specifications. In Proceedings of the 2017 chi conference on human factors in computing systems. 2648–2659.

Digital Library

[91]

Jason Wu, Siyan Wang, Siman Shen, Yi-Hao Peng, Jeffrey Nichols, and Jeffrey P Bigham. 2023. WebUI: A Dataset for Enhancing Visual UI Understanding with Web Semantics. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems. 1–14.

Digital Library

[92]

Yumo Xu and Shay B Cohen. 2018. Stock movement prediction from tweets and historical prices. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 1970–1979.

[93]

Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, and Colin Raffel. 2022. Byt5: Towards a token-free future with pre-trained byte-to-byte models. Transactions of the Association for Computational Linguistics 10 (2022), 291–306.

[94]

Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. 2023. ReAct: Synergizing Reasoning and Acting in Language Models. In International Conference on Learning Representations (ICLR).

[95]

Jiacheng Ye, Jiahui Gao, Qintong Li, Hang Xu, Jiangtao Feng, Zhiyong Wu, Tao Yu, and Lingpeng Kong. 2022. Zerogen: Efficient zero-shot learning via dataset generation. arXiv preprint arXiv:2202.07922 (2022).

[96]

Jiacheng Ye, Chengzu Li, Lingpeng Kong, and Tao Yu. 2023. Generating Data for Symbolic Language with Large Language Models. arXiv preprint arXiv:2305.13917 (2023).

[97]

Bowen Yu and Cláudio T Silva. 2019. FlowSense: A natural language interface for visual data exploration within a dataflow system. IEEE transactions on visualization and computer graphics 26, 1 (2019), 1–11.

[98]

Jian Zhao, Mingming Fan, and Mi Feng. 2020. Chartseer: Interactive steering exploratory visual analysis with machine intelligence. IEEE Transactions on Visualization and Computer Graphics 28, 3 (2020), 1500–1513.

Cited By

Li JZhang MLi NWeyns DJin ZTei K(2024)Generative AI for Self-Adaptive Systems: State of the Art and Research RoadmapACM Transactions on Autonomous and Adaptive Systems10.1145/368680319:3(1-60)Online publication date: 30-Sep-2024
https://dl.acm.org/doi/10.1145/3686803
Zeng XLin HYe YZeng W(2024)Advancing Multimodal Large Language Models in Chart Question Answering with Visualization-Referenced Instruction TuningIEEE Transactions on Visualization and Computer Graphics10.1109/TVCG.2024.345615931:1(525-535)Online publication date: 10-Sep-2024
https://dl.acm.org/doi/10.1109/TVCG.2024.3456159

Index Terms

Natural Language Dataset Generation Framework for Visualizations Powered by Large Language Models

Index terms have been assigned to the content through auto-classification.

Recommendations

Collecting and Characterizing Natural Language Utterances for Specifying Data Visualizations
CHI '21: Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems

Natural language interfaces (NLIs) for data visualization are becoming increasingly popular both in academic research and in commercial software. Yet, there is a lack of empirical understanding of how people specify visualizations through natural ...
Distributed Dataset Framework for Large Language Models Pre-training
Knowledge Management and Acquisition for Intelligent Systems
Abstract
The performance of Large Language Models (LLMs) relies on massive and high-quality pre-processed datasets, often exceeding several terabytes in size. Maintaining such large datasets is challenging for a single organization. A collaborative ...
The economics of natural language interfaces: natural language processing technology as a scarce resource

This paper discusses appropriate application areas for natural language interfaces (NLIs) to databases. This requires comparing NLIs with competing approaches, including other user-friendly interfaces, and training of users with less user-friendly ...

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences

CHI '24: Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems

May 2024

18961 pages

ISBN:9798400703300

DOI:10.1145/3613904

Editors:
Florian Floyd Mueller
Monash University
,
Penny Kyburz
The Australian National University
,
Julie R. Williamson
University of Glasgow
,
Corina Sas
Lancaster University
,
Max L. Wilson
University of Nottingham
,
Phoebe Toups Dugas
Monash University/New Mexico State University
,
Irina Shklovski
University of Copenhagen

Copyright © 2024 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 May 2024

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Badges

Artifacts Available / v1.1

Author Tags

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

National Research Foundation of Korea
Institute of Information & communications Technology Planning & Evaluation

Conference

CHI '24

Sponsor:

CHI '24: CHI Conference on Human Factors in Computing Systems

May 11 - 16, 2024

HI, Honolulu, USA

Acceptance Rates

Overall Acceptance Rate 6,199 of 26,314 submissions, 24%

Upcoming Conference

CHI 2025

Sponsor:
sigchi

ACM CHI Conference on Human Factors in Computing Systems

April 26 - May 1, 2025

Yokohama , Japan

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

2
Total Citations
View Citations
1,546
Total Downloads

Downloads (Last 12 months)1,546
Downloads (Last 6 weeks)289

Reflects downloads up to 01 Mar 2025

Other Metrics

View Author Metrics

Citations

Cited By

Li JZhang MLi NWeyns DJin ZTei K(2024)Generative AI for Self-Adaptive Systems: State of the Art and Research RoadmapACM Transactions on Autonomous and Adaptive Systems10.1145/368680319:3(1-60)Online publication date: 30-Sep-2024
https://dl.acm.org/doi/10.1145/3686803
Zeng XLin HYe YZeng W(2024)Advancing Multimodal Large Language Models in Chart Question Answering with Visualization-Referenced Instruction TuningIEEE Transactions on Visualization and Computer Graphics10.1109/TVCG.2024.345615931:1(525-535)Online publication date: 10-Sep-2024
https://dl.acm.org/doi/10.1109/TVCG.2024.3456159

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Full Text

View this article in Full Text.

HTML Format

View this article in HTML Format.

Figures

Tables

Media

View full text|Download PDF

View Table of Conten