Abstract
The joint development of human-robot interaction (HRI) and large language models (LLMs) has paved the way for a wide range of robotics applications, from industrial automation to service robotics. Although LLMs have demonstrated impressive capabilities, their application in robotics is hindered by a critical limitation: the absence of real-world memory and common sense. This deficiency makes it difficult for robots to interpret multi-turn instructional commands. For instance, a command such as "Remind me to take medicine tomorrow morning" can be ambiguous: the model may struggle to determine whether the utterance is in fact an instruction, and whether additional arguments are required before it can be invoked. Moreover, there is a substantial gap in the literature regarding comparative studies of prompt engineering versus supervised fine-tuning for LLM-based robot instruction invocation. Addressing this gap is essential for integrating LLMs into practical robotics systems and improving human-robot interaction.
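To make the ambiguity concrete, the sketch below shows the kind of multi-turn slot-filling loop such a system needs: decide whether an utterance maps to a robot instruction at all, and if so, ask follow-up questions until every required argument is filled. This is a minimal illustrative toy, not the paper's framework; the `Dialogue` class, the `SCHEMAS` table, and all instruction names are hypothetical.

```python
from __future__ import annotations
from dataclasses import dataclass, field

# Hypothetical instruction schemas; a real system would expose the
# robot's actual command API and required parameters here.
SCHEMAS = {
    "set_reminder": {"required": ["time", "content"]},
    "navigate": {"required": ["destination"]},
}

@dataclass
class Dialogue:
    """Tracks one instruction-invocation dialogue across turns."""
    intent: str | None = None
    args: dict = field(default_factory=dict)

    def update(self, intent: str | None = None, **slots) -> tuple:
        """Fold one turn's parsed intent/slots into the dialogue state."""
        if intent:
            self.intent = intent
        self.args.update(slots)
        return self.status()

    def status(self) -> tuple:
        """Return ('chat', None) for non-instructions, ('ask', slot)
        while a required argument is missing, else ('invoke', call)."""
        if self.intent is None:
            return ("chat", None)
        missing = [s for s in SCHEMAS[self.intent]["required"]
                   if s not in self.args]
        if missing:
            return ("ask", missing[0])   # a follow-up turn is needed
        return ("invoke", {"name": self.intent, "args": dict(self.args)})

d = Dialogue()
# Turn 1: utterance parsed as a reminder, but only the time is given.
print(d.update(intent="set_reminder", time="tomorrow morning"))
# → ('ask', 'content')
# Turn 2: the user supplies the missing reminder content.
print(d.update(content="take medicine"))
# → ('invoke', {'name': 'set_reminder', 'args': {'time': 'tomorrow morning', 'content': 'take medicine'}})
```

In a deployed system, the per-turn parsing into `intent` and `slots` would be done by the LLM (via prompting or fine-tuning); the loop above only shows the state the dialogue manager must maintain either way.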
In this study, we present a novel framework for multi-turn instruction invocation that also handles related dialogue tasks in robotics. Using a real-world robot dataset, we conduct a comprehensive evaluation of several large-scale models on instruction invocation. This systematic comparison allows us to identify the strengths and limitations of existing approaches and offers insights for developing more effective robotics systems.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Cheng, B. et al. (2025). Multi-turn Instruction Invocation on Human-Robot Interaction by Large Language Models. In: Lan, X., Mei, X., Jiang, C., Zhao, F., Tian, Z. (eds) Intelligent Robotics and Applications. ICIRA 2024. Lecture Notes in Computer Science(), vol 15207. Springer, Singapore. https://doi.org/10.1007/978-981-96-0780-8_15
Print ISBN: 978-981-96-0779-2
Online ISBN: 978-981-96-0780-8