
TidyBot: personalized robot assistance with large language models

Published: 16 November 2023


For a robot to personalize physical assistance effectively, it must learn user preferences that can be generally reapplied to future scenarios. In this work, we investigate personalization of household cleanup with robots that can tidy up rooms by picking up objects and putting them away. A key challenge is determining the proper place to put each object, as people’s preferences can vary greatly depending on personal taste or cultural background. For instance, one person may prefer storing shirts in the drawer, while another may prefer them on the shelf. We aim to build systems that can learn such preferences from just a handful of examples via prior interactions with a particular person. We show that robots can combine language-based planning and perception with the few-shot summarization capabilities of large language models to infer generalized user preferences that are broadly applicable to future interactions. This approach enables fast adaptation and achieves 91.2% accuracy on unseen objects in our benchmark dataset. We also demonstrate our approach on a real-world mobile manipulator called TidyBot, which successfully puts away 85.0% of objects in real-world test scenarios.
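
The few-shot summarization idea described in the abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the prompt layout, the function names `build_summarization_prompt` and `placement_accuracy`, and the example objects are hypothetical and are not taken from the paper. The sketch only formats a handful of observed placements into a prompt that a language model could be asked to summarize, and scores predicted placements for unseen objects. It does not call any model.

```python
from typing import Dict, List


def build_summarization_prompt(examples: Dict[str, str], receptacles: List[str]) -> str:
    """Format observed placements as a few-shot prompt ending in a summary cue.

    Hypothetical layout: the exact prompt format is an assumption of this sketch.
    """
    lines = [
        f"objects = {list(examples)}",
        f"receptacles = {receptacles}",
    ]
    # One line per observed (object, receptacle) example from the user.
    for obj, receptacle in examples.items():
        lines.append(f'pick_and_place("{obj}", "{receptacle}")')
    # The model would be asked to continue after this cue with a general rule.
    lines.append("# Summary:")
    return "\n".join(lines)


def placement_accuracy(predicted: Dict[str, str], actual: Dict[str, str]) -> float:
    """Fraction of objects whose predicted receptacle matches the user's choice."""
    if not actual:
        raise ValueError("actual placements must not be empty")
    correct = sum(predicted.get(obj) == receptacle for obj, receptacle in actual.items())
    return correct / len(actual)


# Hypothetical observations of one user's preferences.
examples = {
    "yellow shirt": "drawer",
    "white socks": "drawer",
    "dark jeans": "shelf",
}
prompt = build_summarization_prompt(examples, ["drawer", "shelf"])

# Hypothetical predictions for objects the user never demonstrated.
predicted = {"red shirt": "drawer", "black socks": "drawer", "blue jeans": "shelf", "grey hat": "drawer"}
actual = {"red shirt": "drawer", "black socks": "drawer", "blue jeans": "shelf", "grey hat": "shelf"}
accuracy = placement_accuracy(predicted, actual)
```

In this sketch, accuracy on unseen objects is simply the share of held-out objects placed where the user would have placed them, which is one plausible reading of the benchmark metric quoted in the abstract.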






        Information & Contributors


        Published In

        Autonomous Robots  Volume 47, Issue 8
        Dec 2023
        598 pages


        Kluwer Academic Publishers

        United States

        Publication History

        Published: 16 November 2023
        Accepted: 25 August 2023
        Received: 02 May 2023

        Author Tags

        1. Service robotics
        2. Mobile manipulation
        3. Large language models


        • Research-article


        Cited By

        • (2024) Exploring the Robustness of Decision-Level Through Adversarial Attacks on LLM-Based Embodied Models. Proceedings of the 32nd ACM International Conference on Multimedia, pp. 8120-8128. doi:10.1145/3664647.3680616. Online publication date: 28-Oct-2024
        • (2024) VoicePilot: Harnessing LLMs as Speech Interfaces for Physically Assistive Robots. Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology, pp. 1-18. doi:10.1145/3654777.3676401. Online publication date: 13-Oct-2024
        • (2024) Challenges for Monocular 6-D Object Pose Estimation in Robotics. IEEE Transactions on Robotics, 40, pp. 4065-4084. doi:10.1109/TRO.2024.3433870. Online publication date: 1-Jan-2024
        • (2024) TossNet: Learning to Accurately Measure and Predict Robot Throwing of Arbitrary Objects in Real Time With Proprioceptive Sensing. IEEE Transactions on Robotics, 40, pp. 3232-3251. doi:10.1109/TRO.2024.3416009. Online publication date: 1-Jan-2024
        • (2024) Tube Acceleration: Robust Dexterous Throwing Against Release Uncertainty. IEEE Transactions on Robotics, 40, pp. 2831-2849. doi:10.1109/TRO.2024.3386391. Online publication date: 10-Apr-2024
        • (2024) A survey on integration of large language models with intelligent robots. Intelligent Service Robotics, 17(5), pp. 1091-1107. doi:10.1007/s11370-024-00550-5. Online publication date: 1-Sep-2024
        • (2024) Agent Can Say No: Robot Task Planning by Natural Language Feedback Between Planner and Executor. Advanced Intelligent Computing Technology and Applications, pp. 142-153. doi:10.1007/978-981-97-5675-9_13. Online publication date: 5-Aug-2024
        • (2024) Navigation Instruction Generation with BEV Perception and Large Language Models. Computer Vision – ECCV 2024, pp. 368-387. doi:10.1007/978-3-031-72670-5_21. Online publication date: 29-Sep-2024
        • (2024) Details Make a Difference: Object State-Sensitive Neurorobotic Task Planning. Artificial Neural Networks and Machine Learning – ICANN 2024, pp. 261-275. doi:10.1007/978-3-031-72341-4_18. Online publication date: 17-Sep-2024
