Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
research-article

TidyBot: personalized robot assistance with large language models

Published: 16 November 2023 Publication History

Abstract

For a robot to personalize physical assistance effectively, it must learn user preferences that can be generally reapplied to future scenarios. In this work, we investigate personalization of household cleanup with robots that can tidy up rooms by picking up objects and putting them away. A key challenge is determining the proper place to put each object, as people’s preferences can vary greatly depending on personal taste or cultural background. For instance, one person may prefer storing shirts in the drawer, while another may prefer them on the shelf. We aim to build systems that can learn such preferences from just a handful of examples via prior interactions with a particular person. We show that robots can combine language-based planning and perception with the few-shot summarization capabilities of large language models to infer generalized user preferences that are broadly applicable to future interactions. This approach enables fast adaptation and achieves 91.2% accuracy on unseen objects in our benchmark dataset. We also demonstrate our approach on a real-world mobile manipulator called TidyBot, which successfully puts away 85.0% of objects in real-world test scenarios.

References

[1]
Abdo, N., Stachniss, C., Spinello, L., & Burgard, W. (2015). Robot, organize my shelves! tidying up objects by predicting user preferences. In 2015 IEEE international conference on robotics and automation (ICRA).
[2]
Batra, D., Chang, A. X., Chernova, S., Davison, A. J., Deng, J., Koltun, V., Levine, S., Malik, J., Mordatch, I., & Mottaghi, R., et al. (2020). Rearrangement: A challenge for embodied ai. arXiv preprint arXiv:2011.01975
[3]
Brohan, A., Chebotar, Y., Finn, C., Hausman, K., Herzog, A., Ho, D., Ibarz, J., Irpan, A., Jang, E., & Julian, R. (2022). Do as i can, not as i say: Grounding language in robotic affordances. In 6th annual conference on robot learning.
[4]
Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, Neelakantan A, Shyam P, Sastry G, Askell A, et al. Language models are few-shot learners Advances in Neural Information Processing Systems 2020 33 1877-1901
[5]
Chen, W., Hu, S., Talak, R., & Carlone, L. (2022). Leveraging large language models for robot 3d scene understanding. arXiv preprint arXiv:2209.05629
[6]
Chen, M., Tworek, J., Jun, H., Yuan, Q., Pinto, H.P.d.O., Kaplan, J., Edwards, H., Burda, Y., Joseph, N., & Brockman, G., et al. (2021). Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374
[7]
Chen, B., Xia, F., Ichter, B., Rao, K., Gopalakrishnan, K., Ryoo, M.S., Stone, A., & Kappler, D. (2022). Open-vocabulary queryable scene representations for real world planning. arXiv preprint arXiv:2209.09874
[8]
Chowdhery, A., Narang, S., Devlin, J., Bosma, M., Mishra, G., Roberts, A., Barham, P., Chung, H.W., Sutton, C., & Gehrmann, S., et al. (2022). Palm: Scaling language modeling with pathways. arXiv preprint arXiv:2204.02311
[9]
Coulter, R. C. (1992). Implementation of the pure pursuit path tracking algorithm. Technical report, Carnegie-Mellon UNIV Pittsburgh PA Robotics INST.
[10]
Dewi T, Risma P, and Oktarina Y Fruit sorting robot based on color and size for an agricultural product packaging system Bulletin of Electrical Engineering and Informatics 2020 9 4 1438-1445
[11]
Ehsani, K., Han, W., Herrasti, A., VanderBilt, E., Weihs, L., Kolve, E., Kembhavi, A., & Mottaghi, R. (2021). Manipulathor: A framework for visual object manipulation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition.
[12]
Gan, C., Zhou, S., Schwartz, J., Alter, S., Bhandwaldar, A., Gutfreund, D., Yamins, D. L., DiCarlo, J. J., McDermott, J., & Torralba, A. (2022). The threedworld transport challenge: A visually guided task-and-motion planning benchmark towards physically realistic embodied ai. In 2022 International conference on robotics and automation (ICRA).
[13]
Garrido-Jurado S, Muñoz-Salinas R, Madrid-Cuevas FJ, and Marín-Jiménez MJ Automatic generation and detection of highly reliable fiducial markers under occlusion Pattern Recognition 2014 47 6 2280-2292
[14]
Gu, X., Lin, T.-Y., Kuo, W., & Cui, Y. (2021). Open-vocabulary object detection via vision and language knowledge distillation. In International conference on learning representations.
[15]
Gupta, M., & Sukhatme, G. S. (2012). Using manipulation primitives for brick sorting in clutter. In 2012 IEEE international conference on robotics and automation.
[16]
Herde, M., Kottke, D., Calma, A., Bieshaar, M., Deist, S., & Sick, B. (2018). Active sorting: An efficient training of a sorting robot with active learning techniques. In 2018 international joint conference on neural networks (IJCNN).
[17]
Høeg, S. H., & Tingelstad, L. (2022). More than eleven thousand words: Towards using language models for robotic sorting of unseen objects into arbitrary categories. In Workshop on language and robotics at CoRL 2022.
[18]
Holmberg R and Khatib O Development and control of a holonomic mobile robot for mobile manipulation tasks The International Journal of Robotics Research 2000 19 11 1066-1074
[19]
Huang, W., Abbeel, P., Pathak, D., & Mordatch, I. (2022). Language models as zero-shot planners: Extracting actionable knowledge for embodied agents. arXiv preprint arXiv:2201.07207
[20]
Huang, E., Jia, Z., & Mason, M. T. (2019). Large-scale multi-object rearrangement. In 2019 international conference on robotics and automation (ICRA).
[21]
Huang, W., Xia, F., Xiao, T., Chan, H., Liang, J., Florence, P., Zeng, A., Tompson, J., Mordatch, I., & Chebotar, Y., et al. (2022). Inner monologue: Embodied reasoning through planning with language models. arXiv preprint arXiv:2207.05608
[22]
Kang, M., Kwon, Y., & Yoon, S.-E. (2018). Automated task planning using object arrangement optimization. In 2018 15th international conference on ubiquitous robots (UR), IEEE.
[23]
Kant, Y., Ramachandran, A., Yenamandra, S., Gilitschenski, I., Batra, D., Szot, A., & Agrawal, H. (2022). Housekeep: Tidying virtual households using commonsense reasoning. arXiv preprint arXiv:2205.10712
[24]
Kapelyukh, I., & Johns, E. (2022). My house, my rules: Learning tidying preferences with graph neural networks. In Conference on robot learning.
[25]
Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., & Iwasawa, Y. (2022). Large language models are zero-shot reasoners. arXiv preprint arXiv:2205.11916
[26]
Kolve, E., Mottaghi, R., Han, W., VanderBilt, E., Weihs, L., Herrasti, A., Gordon, D., Zhu, Y., Gupta, A., & Farhadi, A. (2017). Ai2-thor: An interactive 3d environment for visual ai. arXiv preprint arXiv:1712.05474
[27]
Kujala, J. V., Lukka, T. J., & Holopainen, H. (2016). Classifying and sorting cluttered piles of unknown objects with robots: A learning approach. In 2016 IEEE/RSJ international conference on intelligent robots and systems (IROS).
[28]
Li, C., Xia, F., Martín-Martín, R., Lingelbach, M., Srivastava, S., Shen, B., Vainio, K.E., Gokmen, C., Dharan, G., & Jain, T. (2022). igibson 2.0: Object-centric simulation for robot learning of everyday household tasks. In Conference on robot learning.
[29]
Li, C., Zhang, R., Wong, J., Gokmen, C., Srivastava, S., Martín-Martín, R., Wang, C., Levine, G., Lingelbach, M., & Sun, J. (2022). Behavior-1k: A benchmark for embodied ai with 1000 everyday activities and realistic simulation. In 6th annual conference on robot learning.
[30]
Liang, J., Huang, W., Xia, F., Xu, P., Hausman, K., Ichter, B., Florence, P., & Zeng, A. (2022). Code as policies: Language model programs for embodied control. arXiv preprint arXiv:2209.07753
[31]
Lin, K., Agia, C., Migimatsu, T., Pavone, M., Bohg, J. (2023). Text2motion: From natural language instructions to feasible plans. arXiv preprint arXiv:2303.12153
[32]
Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., & Stoyanov, V. (2019). Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692
[33]
Lukka, T. J., Tossavainen, T., Kujala, J. V., & Raiko, T. (2014). Zenrobotics recycler–robotic sorting using machine learning. In Proceedings of the international conference on sensor-based sorting (SBS).
[34]
Madaan, A., Zhou, S., Alon, U., Yang, Y., & Neubig, G. (2022). Language models of code are few-shot commonsense learners. arXiv preprint arXiv:2210.07128
[35]
Mees, O., Borja-Diaz, J., & Burgard, W. (2022). Grounding language with visual affordances over unstructured data. arXiv preprint arXiv:2210.01911
[36]
Miller GA Wordnet: A lexical database for english Communications of the ACM 1995 38 11 39-41
[37]
Minderer, M., Gritsenko, A., Stone, A., Neumann, M., Weissenborn, D., Dosovitskiy, A., Mahendran, A., Arnab, A., Dehghani, M., & Shen, Z., et al. (2022). Simple open-vocabulary object detection with vision transformers. arXiv preprint arXiv:2205.06230
[38]
Nye, M., Andreassen, A. J., Gur-Ari, G., Michalewski, H., Austin, J., Bieber, D., Dohan, D., Lewkowycz, A., Bosma, M., & Luan, D., et al. (2021). Show your work: Scratchpads for intermediate computation with language models. arXiv preprint arXiv:2112.00114
[39]
Pan, Z., Hauser, K. (2021). Decision making in joint push-grasp action space for large-scale object sorting. In 2021 IEEE international conference on robotics and automation (ICRA).
[40]
Puig, X., Ra, K., Boben, M., Li, J., Wang, T., Fidler, S., & Torralba, A. (2018). Virtualhome: Simulating household activities via programs. In Proceedings of the IEEE conference on computer vision and pattern recognition.
[41]
Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., & Clark, J. (2021). Learning transferable visual models from natural language supervision. In International conference on machine learning.
[42]
Raman, S. S., Cohen, V., Rosen, E., Idrees, I., Paulius, D., & Tellex, S. (2022). Planning with large language models via corrective re-prompting. arXiv preprint arXiv:2211.09935
[43]
Rasch R, Sprute D, Pörtner A, Battermann S, and König M Tidy up my room: Multi-agent cooperation for service tasks in smart environments Journal of Ambient Intelligence and Smart Environments 2019 11 3 261-275
[44]
Reimers, N., & Gurevych, I. (2019). Sentence-bert: Sentence embeddings using siamese bert-networks. In Proceedings of the 2019 Conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP).
[45]
Ren, A. Z., Govil, B., Yang, T.-Y., Narasimhan, K., & Majumdar, A. (2022). Leveraging language for accelerated learning of tool manipulation. arXiv preprint arXiv:2206.13074
[46]
Rytting C and Wingate D Leveraging the inductive bias of large language models for abstract textual reasoning Advances in Neural Information Processing Systems 2021 34 17111-17122
[47]
Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2019). Distilbert, a distilled version of bert: Smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108
[48]
Sarch, G., Fang, Z., Harley, A.W., Schydlo, P., Tarr, M.J., Gupta, S., & Fragkiadaki, K. (2022). Tidee: Tidying up novel rooms using visuo-semantic commonsense priors. In European conference on computer vision.
[49]
Shah, D., Osinski, B., Ichter, B., & Levine, S. (2022). LM-Nav: Robotic navigation with large pre-trained models of language, vision, and action. arXiv preprint arXiv:2207.04429
[50]
Shridhar, M., Thomason, J., Gordon, D., Bisk, Y., Han, W., Mottaghi, R., Zettlemoyer, L., & Fox, D. (2020). Alfred: A benchmark for interpreting grounded instructions for everyday tasks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[51]
Shridhar, M., Yuan, X., Côté, M.-A., Bisk, Y., Trischler, A., & Hausknecht, M. J. (2021). Alfworld: Aligning text and embodied environments for interactive learning. In ICLR.
[52]
Silver, T., Hariprasad, V., Shuttleworth, R. S., Kumar, N., Lozano-Pérez, T., & Kaelbling, L. P. (2022). Pddl planning with pretrained large language models. In NeurIPS 2022 foundation models for decision making workshop.
[53]
Singh, I., Blukis, V., Mousavian, A., Goyal, A., Xu, D., Tremblay, J., Fox, D., Thomason, J., & Garg, A. (2022). Progprompt: Generating situated robot task plans using large language models. arXiv preprint arXiv:2209.11302
[54]
Song, H., Haustein, J. A., Yuan, W., Hang, K., Wang, M.Y., Kragic, D., Stork, J. A. (2020). Multi-object rearrangement with monte Carlo tree search: A case study on planar nonprehensile sorting. In 2020 IEEE/RSJ international conference on intelligent robots and systems (IROS).
[55]
Srivastava, S., Li, C., Lingelbach, M., Martín-Martín, R., Xia, F., Vainio, K. E., Lian, Z., Gokmen, C., Buch, S., & Liu, K. (2022). Behavior: Benchmark for everyday household activities in virtual, interactive, and ecological environments. In Conference on robot learning.
[56]
Szabo, R., Lie, I. (2012). Automated colored object sorting application for robotic arms. In 2012 10th international symposium on electronics and telecommunications.
[57]
Szot A, Clegg A, Undersander E, Wijmans E, Zhao Y, Turner J, Maestre N, Mukadam M, Chaplot DS, Maksymets O, et al. Habitat 2.0: Training home assistants to rearrange their habitat Advances in Neural Information Processing Systems 2021 34 251-266
[58]
Taniguchi A, Isobe S, El Hafi L, Hagiwara Y, and Taniguchi T Autonomous planning based on spatial concepts to tidy up home environments with service robots Advanced Robotics 2021 35 8 471-489
[59]
Wei, J., Tay, Y., Bommasani, R., Raffel, C., Zoph, B., Borgeaud, S., Yogatama, D., Bosma, M., Zhou, D., & Metzler, D., et al. (2022). Emergent abilities of large language models. arXiv preprint arXiv:2206.07682
[60]
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Chi, E., Le, Q., & Zhou, D. (2022). Chain of thought prompting elicits reasoning in large language models. arXiv preprint arXiv:2201.11903
[61]
Weihs, L., Deitke, M., Kembhavi, A., & Mottaghi, R. (2021). Visual room rearrangement. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition.
[62]
Wu, J., Antonova, R., Kan, A., Lepert, M., Zeng, A., Song, S., Bohg, J., Rusinkiewicz, S., & Funkhouser, T. (2023). Tidybot: Personalized robot assistance with large language models. In IEEE/rsj international conference on intelligent robots and systems (IROS).
[63]
Yan, Z., Crombez, N., Buisson, J., Ruichck, Y., Krajnik, T., & Sun, L. (2021). A quantifiable stratification strategy for tidy-up in service robotics. In 2021 IEEE international conference on advanced robotics and its social impacts (ARSO).
[64]
Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., & Cao, Y. (2022). React: Synergizing reasoning and acting in language models. arXiv preprint arXiv:2210.03629
[65]
Zeng, A., Wong, A., Welker, S., Choromanski, K., Tombari, F., Purohit, A., Ryoo, M., Sindhwani, V., Lee, J., & Vanhoucke, V., et al. (2022). Socratic models: Composing zero-shot multimodal reasoning with language. arXiv preprint arXiv:2204.00598
[66]
Zeng A, Song S, Lee J, Rodriguez A, and Funkhouser T Tossingbot: Learning to throw arbitrary objects with residual physics IEEE Transactions on Robotics 2020 36 4 1307-1319
[67]
Zeng A, Song S, Yu K-T, Donlon E, Hogan FR, Bauza M, Ma D, Taylor O, Liu M, Romo E, et al. Robotic pick-and-place of novel objects in clutter with multi-affordance grasping and cross-domain image matching The International Journal of Robotics Research 2022 41 7 690-705

Cited By

View all
  • (2024)Exploring the Robustness of Decision-Level Through Adversarial Attacks on LLM-Based Embodied ModelsProceedings of the 32nd ACM International Conference on Multimedia10.1145/3664647.3680616(8120-8128)Online publication date: 28-Oct-2024
  • (2024)VoicePilot: Harnessing LLMs as Speech Interfaces for Physically Assistive RobotsProceedings of the 37th Annual ACM Symposium on User Interface Software and Technology10.1145/3654777.3676401(1-18)Online publication date: 13-Oct-2024
  • (2024)Challenges for Monocular 6-D Object Pose Estimation in RoboticsIEEE Transactions on Robotics10.1109/TRO.2024.343387040(4065-4084)Online publication date: 1-Jan-2024
  • Show More Cited By

Index Terms

  1. TidyBot: personalized robot assistance with large language models
        Index terms have been assigned to the content through auto-classification.

        Recommendations

        Comments

        Information & Contributors

        Information

        Published In

        cover image Autonomous Robots
        Autonomous Robots  Volume 47, Issue 8
        Dec 2023
        598 pages

        Publisher

        Kluwer Academic Publishers

        United States

        Publication History

        Published: 16 November 2023
        Accepted: 25 August 2023
        Received: 02 May 2023

        Author Tags

        1. Service robotics
        2. Mobile manipulation
        3. Large language models

        Qualifiers

        • Research-article

        Funding Sources

        Contributors

        Other Metrics

        Bibliometrics & Citations

        Bibliometrics

        Article Metrics

        • Downloads (Last 12 months)0
        • Downloads (Last 6 weeks)0
        Reflects downloads up to 08 Feb 2025

        Other Metrics

        Citations

        Cited By

        View all
        • (2024)Exploring the Robustness of Decision-Level Through Adversarial Attacks on LLM-Based Embodied ModelsProceedings of the 32nd ACM International Conference on Multimedia10.1145/3664647.3680616(8120-8128)Online publication date: 28-Oct-2024
        • (2024)VoicePilot: Harnessing LLMs as Speech Interfaces for Physically Assistive RobotsProceedings of the 37th Annual ACM Symposium on User Interface Software and Technology10.1145/3654777.3676401(1-18)Online publication date: 13-Oct-2024
        • (2024)Challenges for Monocular 6-D Object Pose Estimation in RoboticsIEEE Transactions on Robotics10.1109/TRO.2024.343387040(4065-4084)Online publication date: 1-Jan-2024
        • (2024)TossNet: Learning to Accurately Measure and Predict Robot Throwing of Arbitrary Objects in Real Time With Proprioceptive SensingIEEE Transactions on Robotics10.1109/TRO.2024.341600940(3232-3251)Online publication date: 1-Jan-2024
        • (2024)Tube Acceleration: Robust Dexterous Throwing Against Release UncertaintyIEEE Transactions on Robotics10.1109/TRO.2024.338639140(2831-2849)Online publication date: 10-Apr-2024
        • (2024)A survey on integration of large language models with intelligent robotsIntelligent Service Robotics10.1007/s11370-024-00550-517:5(1091-1107)Online publication date: 1-Sep-2024
        • (2024)Agent Can Say No: Robot Task Planning by Natural Language Feedback Between Planner and ExecutorAdvanced Intelligent Computing Technology and Applications10.1007/978-981-97-5675-9_13(142-153)Online publication date: 5-Aug-2024
        • (2024)Navigation Instruction Generation with BEV Perception and Large Language ModelsComputer Vision – ECCV 202410.1007/978-3-031-72670-5_21(368-387)Online publication date: 29-Sep-2024
        • (2024)Details Make a Difference: Object State-Sensitive Neurorobotic Task PlanningArtificial Neural Networks and Machine Learning – ICANN 202410.1007/978-3-031-72341-4_18(261-275)Online publication date: 17-Sep-2024

        View Options

        View options

        Figures

        Tables

        Media

        Share

        Share

        Share this Publication link

        Share on social media