
Multi-turn Instruction Invocation on Human-Robot Interaction by Large Language Models

  • Conference paper
Intelligent Robotics and Applications (ICIRA 2024)

Abstract

The development of human-robot interaction (HRI) and large language models (LLMs) has paved the way for a wide range of robotics applications, spanning industrial automation to service robotics. Although LLMs have demonstrated impressive capabilities, their application in robotics is hindered by a critical limitation: the absence of real-world memory and common sense. This deficiency makes it challenging for robots to comprehend multi-turn instructional commands. For instance, a command such as ‘Remind me to take medicine tomorrow morning’ can be ambiguous: the model may struggle to determine whether it is an instruction at all and whether additional arguments are required. Moreover, there is a substantial gap in the literature regarding comparative studies on the efficacy of prompt engineering versus supervised fine-tuning for tasks that involve invoking robot instructions via LLMs. Addressing this gap is essential for integrating LLMs into practical robotics systems and improving human-robot interaction.

In this study, we present a novel framework for multi-turn instruction invocation, designed to address this and other dialogue-related tasks in robotics. Using a real-world robot dataset, we conduct a comprehensive evaluation of several large language models to assess their instruction-invocation performance. This systematic comparison identifies the strengths and limitations of existing approaches and offers insights for developing more effective robotics systems.
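The paper's exact evaluation protocol is not given in this preview; one plausible metric for comparing models on instruction invocation is exact-match accuracy on the predicted instruction name and on the full argument set, sketched below (the dict format is an assumption, not the paper's).

```python
def invocation_metrics(predictions: list[dict], references: list[dict]) -> dict:
    """Exact-match accuracy for instruction name and for the full argument set.

    Each item is a dict like {"instruction": "set_reminder", "args": {...}},
    a stand-in for a model's parsed output on one dialogue turn.
    """
    name_hits = arg_hits = 0
    for pred, ref in zip(predictions, references):
        if pred.get("instruction") == ref["instruction"]:
            name_hits += 1
            # Count a full hit only when the arguments also match exactly.
            if pred.get("args") == ref["args"]:
                arg_hits += 1
    n = len(references)
    return {"name_acc": name_hits / n, "full_acc": arg_hits / n}
```

A stricter variant would also penalize invoking an instruction when none was intended; exact-match on arguments is deliberately conservative, since a wrong argument (e.g. the wrong reminder time) makes the invocation useless in practice.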



Author information


Corresponding author

Correspondence to Yong Huang.

Editor information

Editors and Affiliations


Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Cheng, B. et al. (2025). Multi-turn Instruction Invocation on Human-Robot Interaction by Large Language Models. In: Lan, X., Mei, X., Jiang, C., Zhao, F., Tian, Z. (eds) Intelligent Robotics and Applications. ICIRA 2024. Lecture Notes in Computer Science, vol. 15207. Springer, Singapore. https://doi.org/10.1007/978-981-96-0780-8_15


  • DOI: https://doi.org/10.1007/978-981-96-0780-8_15


  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-96-0779-2

  • Online ISBN: 978-981-96-0780-8

  • eBook Packages: Computer Science, Computer Science (R0)
