DOI: 10.1145/3637528.3671622
research-article
Open access

GRILLBot In Practice: Lessons and Tradeoffs Deploying Large Language Models for Adaptable Conversational Task Assistants

Published: 24 August 2024

Abstract

We tackle the challenge of building multimodal assistants for complex real-world tasks. We describe the practicalities and challenges of developing and deploying GRILLBot, a leading system in the Alexa Prize TaskBot Challenge (first and second prize winner in 2022 and 2023, respectively). Building on our Open Assistant Toolkit (OAT) framework, we propose a hybrid architecture that combines Large Language Models (LLMs) with specialised models tuned for specific subtasks requiring very low latency. OAT allows us to define when, how and which LLMs should be used in a structured and deployable manner. For knowledge-grounded question answering and live task adaptations, we show that LLM reasoning abilities over task context and world knowledge outweigh latency concerns. For dialogue state management, we implement a code generation approach and show that specialised smaller models reach 84% effectiveness with 100x lower latency. Overall, we provide insights and discuss tradeoffs for deploying both traditional models and LLMs to users in complex real-world multimodal environments in the Alexa TaskBot challenge. These experiences will continue to evolve as LLMs become more capable and efficient, fundamentally reshaping OAT and future assistant architectures.
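
The hybrid routing idea in the abstract can be illustrated with a minimal sketch. It assumes a fixed mapping from subtask to model: a small, low-latency model emits a dialogue-state action as a line of code, while an LLM handles grounded question answering and task adaptation. Every name here (`Handler`, `ROUTES`, `route`, the subtask labels, and both stand-in model functions) is hypothetical and chosen for illustration; none of it is OAT's actual API or the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Handler:
    """A model backend for one kind of subtask (hypothetical structure)."""
    name: str
    run: Callable[[str], str]
    uses_llm: bool


def small_state_model(utterance: str) -> str:
    # Stand-in for a specialised low-latency model that emits the next
    # dialogue action as code. A real system would use a trained model;
    # keyword rules are used here only to keep the sketch runnable.
    text = utterance.lower()
    if "next" in text:
        return "step_next()"
    if "repeat" in text:
        return "step_repeat()"
    return "fallback()"


def llm_answer(utterance: str) -> str:
    # Stand-in for an LLM call that reasons over task context.
    return f"[LLM response for: {utterance}]"


# Assumed routing table: latency-critical state management goes to the
# small model; reasoning-heavy subtasks go to the LLM.
ROUTES: Dict[str, Handler] = {
    "dialogue_state": Handler("small-model", small_state_model, uses_llm=False),
    "grounded_qa": Handler("llm", llm_answer, uses_llm=True),
    "task_adaptation": Handler("llm", llm_answer, uses_llm=True),
}


def route(subtask: str, utterance: str) -> Tuple[str, str]:
    """Return (backend name, output) for the given subtask."""
    handler = ROUTES[subtask]
    return handler.name, handler.run(utterance)
```

In this sketch, `route("dialogue_state", "Next step please")` is served by the small model, while `route("grounded_qa", ...)` is served by the LLM stand-in. Keeping the routing table explicit is one simple way to make the "when, how and which" decision auditable.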

Supplemental Material

MP4 File - Promotional Video for ADS 666
We present lessons learned, including tradeoffs for using LLMs, during the development of our conversational task assistant, GRILLBot. We created GRILLBot for the Alexa Prize TaskBot Challenges 1 and 2, during which users across the US could use GRILLBot as a multimodal assistant for cooking and home tasks. Research challenges include decision-making, grounded conversational question answering and task adaptation. We conclude that LLMs might not always be the answer, especially for the time-critical and accuracy-critical components of a modular conversational agent.

Cited By

  • (2024) On-device query intent prediction with lightweight LLMs to support ubiquitous conversations. Scientific Reports 14:1. DOI: 10.1038/s41598-024-63380-6. Online publication date: 3-Jun-2024

Published In

KDD '24: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
August 2024
6901 pages
ISBN: 9798400704901
DOI: 10.1145/3637528
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. conversational task assistants
  2. large language models

Qualifiers

  • Research-article

Conference

KDD '24
Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
