TRAD: Enhancing LLM Agents with Step-Wise Thought Retrieval and Aligned Decision

Published: 11 July 2024

Abstract

Several large language model (LLM) agents have been built for diverse purposes such as web navigation and online shopping, leveraging the broad knowledge and text-comprehension capabilities of LLMs. Many of these works rely on in-context examples to achieve generalization without fine-tuning, yet few have addressed how to select and effectively use these examples. Recent approaches introduce trajectory-level retrieval with task meta-data and use whole trajectories as in-context examples to improve performance on sequential decision-making tasks such as computer control. Nevertheless, these methods suffer from two issues: retrieved examples may look plausible yet ignore task-specific state-transition dynamics, and using complete trajectories yields long inputs full of irrelevant context. In this paper, we propose a novel framework (TRAD) to tackle these problems. TRAD first employs Thought Retrieval, selecting demonstrations at the step level by matching thoughts, which improves demonstration quality and reduces irrelevant input noise. It then introduces Aligned Decision, which complements each retrieved demonstration step with its preceding or subsequent steps, providing tolerance for imperfect thoughts and balancing richer context against added noise. Extensive experiments on the ALFWorld and Mind2Web benchmarks demonstrate that TRAD not only surpasses state-of-the-art methods but also effectively reduces noise and promotes generalization. Furthermore, TRAD has been deployed in real-world scenarios at a global business-insurance company and improves the success rate of its robotic process automation. Our code is available at: https://github.com/skyriver-2000/TRAD-Official.
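
The abstract describes the two components only at a high level. The following is a minimal sketch of what step-wise thought retrieval and neighbour-aligned context assembly could look like in code. It is not the authors' implementation: the `DemoStep` fields, the `all-MiniLM-L6-v2` encoder, the retrieval size `k`, the expansion `window`, and the offset markers are all illustrative assumptions.

```python
# Hypothetical sketch of TRAD-style step-wise thought retrieval and
# aligned-decision context assembly (illustrative, not the paper's code).
from dataclasses import dataclass
from typing import List

import numpy as np
from sentence_transformers import SentenceTransformer  # any sentence encoder works here

@dataclass
class DemoStep:
    traj_id: int       # which demonstration trajectory the step belongs to
    step_idx: int      # position of the step within that trajectory
    thought: str       # reasoning text produced at this step
    observation: str   # environment observation at this step
    action: str        # action taken at this step

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder

def retrieve_steps(query_thought: str, memory: List[DemoStep], k: int = 3) -> List[DemoStep]:
    """Thought Retrieval (sketch): rank stored demonstration steps by cosine
    similarity between the agent's current thought and stored step thoughts."""
    q = encoder.encode([query_thought])[0]
    m = encoder.encode([s.thought for s in memory])
    sims = m @ q / (np.linalg.norm(m, axis=1) * np.linalg.norm(q) + 1e-8)
    top = np.argsort(-sims)[:k]
    return [memory[i] for i in top]

def aligned_context(step: DemoStep, memory: List[DemoStep], window: int = 1) -> str:
    """Aligned Decision (sketch): complement a retrieved step with its preceding
    and subsequent steps from the same trajectory, marking relative positions so
    the agent can align them with its own current step."""
    neighbours = sorted(
        (s for s in memory
         if s.traj_id == step.traj_id and abs(s.step_idx - step.step_idx) <= window),
        key=lambda s: s.step_idx,
    )
    lines = []
    for s in neighbours:
        offset = s.step_idx - step.step_idx  # 0 marks the retrieved step itself
        lines.append(f"[offset {offset:+d}] obs: {s.observation} -> act: {s.action}")
    return "\n".join(lines)
```

At decision time, the agent would prompt the LLM with its current observation plus the aligned snippets returned by `aligned_context`; the relative-offset markers are one simple way to indicate which demonstration step corresponds to the agent's own current step and which steps are surrounding context.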


    Published In

    SIGIR '24: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, July 2024, 3164 pages
    ISBN: 9798400704314
    DOI: 10.1145/3626772

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. information retrieval
    2. large language model
    3. llm agent
    4. llm reasoning
    5. sequential decision making

