2024
Navigating Noisy Feedback: Enhancing Reinforcement Learning with Error-Prone Language Models
Muhan Lin, Shuyang Shi, Yue Guo, Behdad Chalaki, Vaishnav Tadiparthi, Ehsan Moradi Pari, Simon Stepputtis, Joseph Campbell, Katia P. Sycara
Findings of the Association for Computational Linguistics: EMNLP 2024
The correct specification of reward models is a well-known challenge in reinforcement learning. Hand-crafted reward functions often lead to inefficient or suboptimal policies and may not be aligned with user values. Reinforcement learning from human feedback is a successful technique that can mitigate such issues; however, the collection of human feedback can be laborious. Recent works have solicited feedback from pre-trained large language models rather than humans to reduce or eliminate human effort; however, these approaches yield poor performance in the presence of hallucination and other errors. This paper studies the advantages and limitations of reinforcement learning from large language model feedback and proposes a simple yet effective method for soliciting and applying feedback as a potential-based shaping function. We theoretically show that inconsistent rankings, which approximate ranking errors, lead to uninformative rewards with our approach. Our method empirically improves convergence speed and policy returns over commonly used baselines even with significant ranking errors, and eliminates the need for complex post-processing of reward functions.
2023
Long-Horizon Dialogue Understanding for Role Identification in the Game of Avalon with Large Language Models
Simon Stepputtis, Joseph Campbell, Yaqi Xie, Zhengyang Qi, Wenxin Zhang, Ruiyi Wang, Sanketh Rangreji, Charles Lewis, Katia Sycara
Findings of the Association for Computational Linguistics: EMNLP 2023
Deception and persuasion play a critical role in long-horizon dialogues between multiple parties, especially when the interests, goals, and motivations of the participants are not aligned. Such complex tasks pose challenges for current Large Language Models (LLMs), as deception and persuasion can easily mislead them, especially in long-horizon multi-party dialogues. To this end, we explore the game of Avalon: The Resistance, a social deduction game in which players must determine each other’s hidden identities to complete their team’s objective. We introduce an online testbed and a dataset containing 20 carefully collected and labeled games among human players that exhibit long-horizon deception in a cooperative-competitive setting. We discuss the capabilities of LLMs to utilize deceptive long-horizon conversations between six human players to determine each player’s goal and motivation. Particularly, we discuss the multimodal integration of the chat between the players and the game’s state that grounds the conversation, providing further insights into the true player identities. We find that even current state-of-the-art LLMs do not reach human performance, making our dataset a compelling benchmark to investigate the decision-making and language-processing capabilities of LLMs. Our dataset and online testbed can be found at our project website: https://sstepput.github.io/Avalon-NLU/
Theory of Mind for Multi-Agent Collaboration via Large Language Models
Huao Li, Yu Chong, Simon Stepputtis, Joseph Campbell, Dana Hughes, Charles Lewis, Katia Sycara
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
While Large Language Models (LLMs) have demonstrated impressive accomplishments in both reasoning and planning, their abilities in multi-agent collaborations remain largely unexplored. This study evaluates LLM-based agents in a multi-agent cooperative text game with Theory of Mind (ToM) inference tasks, comparing their performance with Multi-Agent Reinforcement Learning (MARL) and planning-based baselines. We observed evidence of emergent collaborative behaviors and high-order Theory of Mind capabilities among LLM-based agents. Our results reveal limitations in LLM-based agents’ planning optimization due to systematic failures in managing long-horizon contexts and hallucination about the task state. We explore the use of explicit belief state representations to mitigate these issues, finding that it enhances task performance and the accuracy of ToM inferences for LLM-based agents.
2008
Bridging the Gap between Linguists and Technology Developers: Large-Scale, Sociolinguistic Annotation for Dialect and Speaker Recognition
Christopher Cieri, Stephanie Strassel, Meghan Glenn, Reva Schwartz, Wade Shen, Joseph Campbell
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
Recent years have seen increased interest within the speaker recognition community in high-level features including, for example, lexical choice, idiomatic expressions or syntactic structures. The promise of speaker recognition in forensic applications drives development toward systems robust to channel differences by selecting features inherently robust to channel differences. Within the language recognition community, there is growing interest in differentiating not only languages but also mutually intelligible dialects of a single language. Decades of research in dialectology suggest that high-level features can enable systems to cluster speakers according to the dialects they speak. The Phanotics (Phonetic Annotation of Typicality in Conversational Speech) project seeks to identify high-level features characteristic of American dialects, annotate a corpus for these features, use the data to build dialect recognition systems, and also use the categorization to create better models for speaker recognition. The data, once published, should be useful to other developers of speaker and dialect recognition systems and to dialectologists and sociolinguists. We expect the methods will generalize well beyond the speakers, dialects, and languages discussed here and should, if successful, provide a model for how linguists and technology developers can collaborate in the future for the benefit of both groups and toward a deeper understanding of how languages vary and change.
2006
The Mixer and Transcript Reading Corpora: Resources for Multilingual, Crosschannel Speaker Recognition Research
Christopher Cieri, Walt Andrews, Joseph P. Campbell, George Doddington, Jack Godfrey, Shudong Huang, Mark Liberman, Alvin Martin, Hirotaka Nakasone, Mark Przybocki, Kevin Walker
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
This paper describes the planning and creation of the Mixer and Transcript Reading corpora, their properties and yields, and reports on the lessons learned during their development.
2004
Conversational Telephone Speech Corpus Collection for the NIST Speaker Recognition Evaluation 2004
Alvin Martin, David Miller, Mark Przybocki, Joseph Campbell, Hirotaka Nakasone
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)
The Mixer Corpus of Multilingual, Multichannel Speaker Recognition Data
Christopher Cieri, Joseph P. Campbell, Hirotaka Nakasone, David Miller, Kevin Walker
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)