DOI: 10.1145/3643834.3661547
Research article
Open access

How People Prompt Generative AI to Create Interactive VR Scenes

Published: 01 July 2024

Abstract

Generative AI tools can give people the ability to create virtual environments and scenes with natural language prompts. Yet how people will formulate such prompts is unclear, particularly when they inhabit the environment that they are designing. For instance, a person might say, “Put a chair here,” while pointing at a location. If such linguistic and embodied features are common in people’s prompts, models need to be tuned to accommodate them. In this work, we present a Wizard of Oz elicitation study with 22 participants, in which we studied people’s implicit expectations when verbally prompting such programming agents to create interactive VR scenes. Our findings show that when people prompted the agent, they held several implicit expectations of it: (1) it should have embodied knowledge of the environment; (2) it should understand users’ embodied prompts; (3) it should recall previous states of the scene and the conversation; and (4) it should have a commonsense understanding of objects in the scene. Further, we found that participants prompted differently in situ (i.e., within the VR environment) versus ex situ (i.e., viewing the VR environment from the outside). To explore how these lessons could be applied, we designed and built Ostaad, a conversational programming agent that allows non-programmers to design interactive VR experiences that they inhabit. Based on these explorations, we outline new opportunities and challenges for conversational programming agents that create VR environments.
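To make “embodied prompt” concrete, the sketch below illustrates one way an agent might ground a deictic utterance such as “Put a chair here” against the user’s pointing gesture and the current scene state before passing it to a language model. This is a minimal, hypothetical Python sketch, not Ostaad’s implementation; the SceneObject, PointingRay, ray_floor_hit, and ground_prompt names are illustrative assumptions rather than names from the paper.

    # Hypothetical sketch: grounding an embodied VR prompt before sending it to a
    # language model. These names do not come from the paper; they only illustrate
    # that the agent needs (a) the scene state, (b) the user's pointing gesture, and
    # (c) the conversation history to resolve words like "here" or "that".
    from dataclasses import dataclass

    @dataclass
    class SceneObject:
        name: str
        position: tuple  # (x, y, z) in metres

    @dataclass
    class PointingRay:
        origin: tuple     # controller position
        direction: tuple  # normalised direction vector

    def ray_floor_hit(ray, floor_y=0.0):
        """Intersect the pointing ray with the horizontal floor plane y = floor_y."""
        ox, oy, oz = ray.origin
        dx, dy, dz = ray.direction
        t = (floor_y - oy) / dy  # assumes the ray is not parallel to the floor
        return (ox + t * dx, floor_y, oz + t * dz)

    def ground_prompt(utterance, ray, scene, history):
        """Replace the deictic word "here" with coordinates and attach scene context."""
        if "here" in utterance:
            x, y, z = ray_floor_hit(ray)
            utterance = utterance.replace("here", f"at position ({x:.2f}, {y:.2f}, {z:.2f})")
        scene_summary = "; ".join(f"{o.name} at {o.position}" for o in scene)
        return (f"Scene: {scene_summary}\n"
                f"Recent turns: {history[-3:]}\n"
                f"Instruction: {utterance}")

    if __name__ == "__main__":
        scene = [SceneObject("table", (1.0, 0.0, 2.0))]
        ray = PointingRay(origin=(0.0, 1.5, 0.0), direction=(0.0, -0.6, 0.8))
        print(ground_prompt("Put a chair here", ray, scene, ["User: make the room cosy"]))

In this framing, resolving “here” is a geometry problem handled before the model ever sees the prompt, which is one plausible way to meet the expectation that the agent understands embodied prompts.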

Supplemental Material

ZIP File
This zip folder contains two items of supplementary material:

  • video.mp4: the video figure for our submission.
  • data-viewer: a tool that allows readers to view the data that we collected, along with the referents that were used in the study. To run the data-viewer tool, please read data-viewer/README.md.




Published In

DIS '24: Proceedings of the 2024 ACM Designing Interactive Systems Conference
July 2024
3616 pages
ISBN:9798400705830
DOI:10.1145/3643834
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 July 2024


Author Tags

  1. embodied interaction
  2. embodied prompting
  3. generative ai
  4. interactive virtual reality
  5. multi-modal
  6. prompting
  7. virtual reality

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • Singapore Ministry of Education

Conference

DIS '24: Designing Interactive Systems Conference
July 1 - 5, 2024
Copenhagen, Denmark

Acceptance Rates

Overall Acceptance Rate 1,158 of 4,684 submissions, 25%

