research-article

TidyBot: personalized robot assistance with large language models

Authors:

Jeannette Bohg,

Szymon Rusinkiewicz,

Thomas Funkhouser

Autonomous Robots, Volume 47, Issue 8

Pages 1087 - 1102

https://doi.org/10.1007/s10514-023-10139-z

Published: 16 November 2023

Abstract

For a robot to personalize physical assistance effectively, it must learn user preferences that can be generally reapplied to future scenarios. In this work, we investigate personalization of household cleanup with robots that can tidy up rooms by picking up objects and putting them away. A key challenge is determining the proper place to put each object, as people’s preferences can vary greatly depending on personal taste or cultural background. For instance, one person may prefer storing shirts in the drawer, while another may prefer them on the shelf. We aim to build systems that can learn such preferences from just a handful of examples via prior interactions with a particular person. We show that robots can combine language-based planning and perception with the few-shot summarization capabilities of large language models to infer generalized user preferences that are broadly applicable to future interactions. This approach enables fast adaptation and achieves 91.2% accuracy on unseen objects in our benchmark dataset. We also demonstrate our approach on a real-world mobile manipulator called TidyBot, which successfully puts away 85.0% of objects in real-world test scenarios.
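The few-shot setting described in the abstract can be pictured with a small sketch. It is not the authors' implementation: the prompt format, the function names, the example objects, and the keyword rules that stand in for a language model's summary are all assumptions made for illustration. It shows only how a handful of example placements might be rendered as text for summarization, and how accuracy on unseen objects could be scored.

```python
from typing import Callable, Dict, List, Tuple

Placement = Tuple[str, str]  # (object name, receptacle name)


def build_summarization_prompt(examples: List[Placement]) -> str:
    """Render a user's example placements as a text prompt (hypothetical format)."""
    lines = [f"{obj} -> {receptacle}" for obj, receptacle in examples]
    return (
        "Example placements:\n"
        + "\n".join(lines)
        + "\nSummarize these preferences as general rules:"
    )


def make_rule_predictor(rules: Dict[str, str], default: str) -> Callable[[str], str]:
    """Stand-in for the language model: match a category keyword in the object name."""
    def predict(obj: str) -> str:
        for keyword, receptacle in rules.items():
            if keyword in obj:
                return receptacle
        return default
    return predict


def accuracy(predict: Callable[[str], str], unseen: List[Placement]) -> float:
    """Fraction of unseen objects assigned to the user's preferred receptacle."""
    correct = sum(predict(obj) == receptacle for obj, receptacle in unseen)
    return correct / len(unseen)


# Illustrative data only; these objects and receptacles are made up.
examples = [("yellow shirt", "drawer"), ("white shirt", "drawer"), ("toy car", "bin")]
prompt = build_summarization_prompt(examples)

# Hand-written rules standing in for a model-produced summary of the examples.
predict = make_rule_predictor({"shirt": "drawer", "toy": "bin"}, default="shelf")
unseen = [("striped shirt", "drawer"), ("toy robot", "bin"), ("soda can", "recycling bin")]
score = accuracy(predict, unseen)
```

In this toy run the generalized rules cover the unseen shirt and toy but not the soda can, which is the kind of miss an accuracy figure on unseen objects would capture.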

Index Terms

TidyBot: personalized robot assistance with large language models
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
  2. Machine learning
2. Human-centered computing

Index terms have been assigned to the content through auto-classification.

Published In

Autonomous Robots, Volume 47, Issue 8 (Dec 2023), 598 pages

ISSN: 0929-5593

© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 16 November 2023

Accepted: 25 August 2023

Received: 02 May 2023

Funding Sources

National Science Foundation
