demonstration

Experiencing InstructPipe: Building Multi-modal AI Pipelines via Prompting LLMs and Visual Programming

CHI EA '24: Extended Abstracts of the CHI Conference on Human Factors in Computing Systems

May 2024

Article No.: 402, Pages 1 - 5

https://doi.org/10.1145/3613905.3648656

Published: 11 May 2024 Publication History

Abstract

Foundational multi-modal models have democratized AI access, yet the construction of complex, customizable machine learning pipelines by novice users remains a grand challenge. This paper demonstrates a visual programming system that allows novices to rapidly prototype multimodal AI pipelines. We first conducted a formative study with 58 contributors and collected 236 proposals of multimodal AI pipelines that served various practical needs. We then distilled our findings into a design matrix of primitive nodes for prototyping multimodal AI visual programming pipelines, and implemented a system with 65 nodes. To support users’ rapid prototyping experience, we built InstructPipe, an AI assistant based on large language models (LLMs) that allows users to generate a pipeline by writing text-based instructions. We believe InstructPipe enhances novice users onboarding experience of visual programming and the controllability of LLMs by offering non-experts a platform to easily update the generation.

Supplemental Material

MP4 File - Demonstration Video

Demonstration Video

References

[1]

2015. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. https://www.tensorflow.org/ Software available from tensorflow.org.

[2]

Stuart K Card, Jock D Mackinlay, and George G Robertson. 1991. A Morphological Analysis of the Design Space of Input Devices. ACM Transactions on Information Systems (TOIS) 9, 2 (1991), 99–122. https://doi.org/10.1145/123078.128726

Digital Library

[3]

Xi Chen, Josip Djolonga, Piotr Padlewski, Basil Mustafa, Soravit Changpinyo, Jialin Wu, Carlos Riquelme Ruiz, Sebastian Goodman, Xiao Wang, Yi Tay, Siamak Shakeri, Mostafa Dehghani, Daniel Salz, Mario Lucic, Michael Tschannen, Arsha Nagrani, Hexiang Hu, Mandar Joshi, Bo Pang, Ceslee Montgomery, Paulina Pietrzyk, Marvin Ritter, AJ Piergiovanni, Matthias Minderer, Filip Pavetic, Austin Waters, Gang Li, Ibrahim Alabdulmohsin, Lucas Beyer, Julien Amelot, Kenton Lee, Andreas Peter Steiner, Yang Li, Daniel Keysers, Anurag Arnab, Yuanzhong Xu, Keran Rong, Alexander Kolesnikov, Mojtaba Seyedhosseini, Anelia Angelova, Xiaohua Zhai, Neil Houlsby, and Radu Soricut. 2023. PaLI-X: On Scaling Up a Multilingual Vision and Language Model. arxiv:2305.18565 [cs.CV]

[4]

Xi Chen, Xiao Wang, Soravit Changpinyo, AJ Piergiovanni, Piotr Padlewski, Daniel Salz, Sebastian Goodman, Adam Grycner, Basil Mustafa, Lucas Beyer, Alexander Kolesnikov, Joan Puigcerver, Nan Ding, Keran Rong, Hassan Akbari, Gaurav Mishra, Linting Xue, Ashish Thapliyal, James Bradbury, Weicheng Kuo, Mojtaba Seyedhosseini, Chao Jia, Burcu Karagol Ayan, Carlos Riquelme, Andreas Steiner, Anelia Angelova, Xiaohua Zhai, Neil Houlsby, and Radu Soricut. 2023. PaLI: A Jointly-Scaled Multilingual Language-Image Model. arxiv:2209.06794 [cs.CV]

[5]

Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, Parker Schuh, Kensen Shi, Sasha Tsvyashchenko, Joshua Maynez, Abhishek Rao, Parker Barnes, Yi Tay, Noam Shazeer, Vinodkumar Prabhakaran, Emily Reif, Nan Du, Ben Hutchinson, Reiner Pope, James Bradbury, Jacob Austin, Michael Isard, Guy Gur-Ari, Pengcheng Yin, Toju Duke, Anselm Levskaya, Sanjay Ghemawat, Sunipa Dev, Henryk Michalewski, Xavier Garcia, Vedant Misra, Kevin Robinson, Liam Fedus, Denny Zhou, Daphne Ippolito, David Luan, Hyeontaek Lim, Barret Zoph, Alexander Spiridonov, Ryan Sepassi, David Dohan, Shivani Agrawal, Mark Omernick, Andrew M. Dai, Thanumalayan Sankaranarayana Pillai, Marie Pellat, Aitor Lewkowycz, Erica Moreira, Rewon Child, Oleksandr Polozov, Katherine Lee, Zongwei Zhou, Xuezhi Wang, Brennan Saeta, Mark Diaz, Orhan Firat, Michele Catasta, Jason Wei, Kathy Meier-Hellstern, Douglas Eck, Jeff Dean, Slav Petrov, and Noah Fiedel. 2023. PaLM: Scaling Language Modeling with Pathways. Journal of Machine Learning Research 24, 240 (2023), 1–113. http://jmlr.org/papers/v24/22-1144.html

[6]

Ruofei Du, Na Li, Jing Jin, Michelle Carney, Scott Miles, Maria Kleiner, Xiuxiu Yuan, Yinda Zhang, Anuva Kulkarni, Xingyu Liu, Ahmed Sabie, Sergio Orts-Escolano, Abhishek Kar, Ping Yu, Ram Iyengar, Adarsh Kowdle, and Alex Olwal. 2023. Rapsai: Accelerating Machine Learning Prototyping of Multimedia Applications Through Visual Programming. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (Hamburg, Germany) (CHI ’23). Association for Computing Machinery, New York, NY, USA, Article 125, 23 pages. https://doi.org/10.1145/3544548.3581338

Digital Library

[7]

Ruofei Du, Na Li, Jing Jin, Michelle Carney, Xiuxiu Yuan, Kristen Wright, Mark Sherwood, Jason Mayes, Lin Chen, Jun Jiang, Jingtao Zhou, Zhongyi Zhou, Ping Yu, Adarsh Kowdle, Ram Iyengar, and Alex Olwal. 2023. Experiencing Visual Blocks for ML: Visual Prototyping of AI Pipelines. In Adjunct Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology (San Francisco, CA, USA) (UIST ’23 Adjunct). Association for Computing Machinery, New York, NY, USA, Article 76, 3 pages. https://doi.org/10.1145/3586182.3615817

Digital Library

[8]

GitHub. 2023. GitHub Copilot · Your AI Pair Programmer. https://github.com/features/copilot

[9]

Gemini Team Google. 2023. Gemini: A Family of Highly Capable Multimodal Models. https://doi.org/10.48550/arXiv.2312.11805 arxiv:2312.11805 [cs.CL]

[10]

Tanmay Gupta and Aniruddha Kembhavi. 2023. Visual Programming: Compositional Visual Reasoning Without Training. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. arXiv. https://doi.org/10.48550/arXiv.2211.11559

[11]

Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, 2019. PyTorch: An Imperative Style, High-Performance Deep Learning Library. Advances in Neural Information Processing Systems 32 (2019). https://doi.org/10.48550/arXiv.1912.01703

[12]

Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily Denton, Seyed Kamyar Seyed Ghasemipour, Burcu Karagol Ayan, S. Sara Mahdavi, Rapha Gontijo Lopes, Tim Salimans, Jonathan Ho, David J Fleet, and Mohammad Norouzi. 2022. Photorealistic Text-to-Image Diffusion Models With Deep Language Understanding. arxiv:2205.11487 [cs.CV]

[13]

Ray Smith, Daria Antonova, and Dar-Shyang Lee. 2009. Adapting the Tesseract open source OCR engine for multilingual OCR. In Proceedings of the International Workshop on Multilingual OCR. 1–8.

Digital Library

[14]

Dídac Surís, Sachit Menon, and Carl Vondrick. 2023. ViperGPT: Visual Inference via Python Execution for Reasoning. arxiv:2303.08128 [cs.CV]

[15]

Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, 2019. Huggingface’s Transformers: State-of-the-Art Natural Language Processing. ArXiv Preprint ArXiv:1910.03771 (2019). https://arxiv.org/pdf/1910.03771

[16]

Zhongyi Zhou, Jing Jin, Vrushank Phadnis, Xiuxiu Yuan, Jun Jiang, Xun Qian, Jingtao Zhou, Yiyi Huang, Zheng Xu, Yinda Zhang, Kristen Wright, Jason Mayes, Mark Sherwood, Johnny Lee, Alex Olwal, David Kim, Ram Iyengar, Na Li, and Ruofei Du. 2023. InstructPipe: Building Visual Programming Pipelines with Human Instructions. https://doi.org/10.48550/arXiv.2312.09672 arxiv:2312.09672 [cs.HC]

Index Terms

Experiencing InstructPipe: Building Multi-modal AI Pipelines via Prompting LLMs and Visual Programming
1. Computing methodologies
  1. Machine learning
  2. Modeling and simulation
    1. Simulation types and techniques
      1. Visual analytics
2. Software and its engineering
  1. Software notations and tools
    1. Context specific languages
      1. Visual languages

Recommendations

Experiencing Visual Blocks for ML: Visual Prototyping of AI Pipelines
UIST '23 Adjunct: Adjunct Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology

We demonstrate Visual Blocks for ML, a visual programming platform that facilitates rapid prototyping of ML-based multimedia applications. As the public version of Rapsai [3], we further integrated large language models and custom APIs into the ...
Read More
Experiencing Rapid Prototyping of Machine Learning Based Multimedia Applications in Rapsai
CHI EA '23: Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems

We demonstrate Rapsai, a visual programming platform that aims to streamline the rapid and iterative development of end-to-end machine learning (ML)-based multimedia applications. Rapsai features a node-graph editor that enables interactive ...
Read More
Rapsai: Accelerating Machine Learning Prototyping of Multimedia Applications through Visual Programming
CHI '23: Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems

In recent years, there has been a proliferation of multimedia applications that leverage machine learning (ML) for interactive experiences. Prototyping ML-based applications is, however, still challenging, given complex workflows that are not ideal for ...
Read More

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences

CHI EA '24: Extended Abstracts of the 2024 CHI Conference on Human Factors in Computing Systems

May 2024

4761 pages

ISBN:9798400703317

DOI:10.1145/3613905

Editors:
Florian Floyd Mueller
Monash University
,
Penny Kyburz
The Australian National University
,
Julie R. Williamson
University of Glasgow
,
Corina Sas
Lancaster University

Copyright © 2024 Owner/Author.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 May 2024

Check for updates

Author Tags

Qualifiers

Demonstration
Research
Refereed limited

Data Availability

Demonstration Video: Demonstration Videohttps://dl.acm.org/doi/10.1145/3613905.3648656#3613905.3648656-demo-video.mp4

Conference

CHI '24

Sponsor:

CHI '24: CHI Conference on Human Factors in Computing Systems

May 11 - 16, 2024

HI, Honolulu, USA

Acceptance Rates

Overall Acceptance Rate 6,164 of 23,696 submissions, 26%

Upcoming Conference

CHI PLAY '24

Sponsor:
sigchi

The Annual Symposium on Computer-Human Interaction in Play

October 14 - 17, 2024

Tampere , Finland

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

0
Total Citations
177
Total Downloads

Downloads (Last 12 months)177
Downloads (Last 6 weeks)92

Other Metrics

View Author Metrics

Citations

View Options

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Full Text

View this article in Full Text.

HTML Format

View this article in HTML Format.

Media

Figures

Other

Tables

View full text|Download PDF

View Table of Contents