
Showing 1–21 of 21 results for author: Suhr, A

Searching in archive cs.
  1. arXiv:2406.11896  [pdf, other]

    cs.LG

    DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning

    Authors: Hao Bai, Yifei Zhou, Mert Cemri, Jiayi Pan, Alane Suhr, Sergey Levine, Aviral Kumar

    Abstract: Training corpuses for vision language models (VLMs) typically lack sufficient amounts of decision-centric data. This renders off-the-shelf VLMs sub-optimal for decision-making tasks such as in-the-wild device control through graphical user interfaces (GUIs). While training with static demonstrations has shown some promise, we show that such methods fall short for controlling real GUIs due to their…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 11 pages of main text, 28 pages in total

  2. arXiv:2405.10292  [pdf, other]

    cs.AI cs.CL cs.CV cs.LG

    Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning

    Authors: Yuexiang Zhai, Hao Bai, Zipeng Lin, Jiayi Pan, Shengbang Tong, Yifei Zhou, Alane Suhr, Saining Xie, Yann LeCun, Yi Ma, Sergey Levine

    Abstract: Large vision-language models (VLMs) fine-tuned on specialized visual instruction-following data have exhibited impressive language reasoning capabilities across various scenarios. However, this fine-tuning paradigm may not be able to efficiently learn optimal decision-making agents in multi-step goal-directed tasks from interactive environments. To address this challenge, we propose an algorithmic…

    Submitted 16 May, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

  3. arXiv:2404.06474  [pdf, other]

    cs.AI

    Autonomous Evaluation and Refinement of Digital Agents

    Authors: Jiayi Pan, Yichi Zhang, Nicholas Tomlin, Yifei Zhou, Sergey Levine, Alane Suhr

    Abstract: We show that domain-general automatic evaluators can significantly improve the performance of agents for web navigation and device control. We experiment with multiple evaluation models that trade off between inference cost, modularity of design, and accuracy. We validate the performance of these models in several popular benchmarks for digital agents, finding between 74.4 and 92.9% agreement with…

    Submitted 10 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

    Comments: Code at https://github.com/Berkeley-NLP/Agent-Eval-Refine

  4. arXiv:2311.08469  [pdf, other]

    cs.CL

    UNcommonsense Reasoning: Abductive Reasoning about Uncommon Situations

    Authors: Wenting Zhao, Justin T Chiu, Jena D. Hwang, Faeze Brahman, Jack Hessel, Sanjiban Choudhury, Yejin Choi, Xiang Lorraine Li, Alane Suhr

    Abstract: Language technologies that accurately model the dynamics of events must perform commonsense reasoning. Existing work evaluating commonsense reasoning focuses on making inferences about common, everyday situations. To instead investigate the ability to model unusual, unexpected, and unlikely situations, we explore the task of uncommonsense abductive reasoning. Given a piece of context with an unexp…

    Submitted 1 May, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

    Comments: accepted at NAACL'24

  5. arXiv:2310.20707  [pdf, other]

    cs.CL cs.LG

    What's In My Big Data?

    Authors: Yanai Elazar, Akshita Bhagia, Ian Magnusson, Abhilasha Ravichander, Dustin Schwenk, Alane Suhr, Pete Walsh, Dirk Groeneveld, Luca Soldaini, Sameer Singh, Hanna Hajishirzi, Noah A. Smith, Jesse Dodge

    Abstract: Large text corpora are the backbone of language models. However, we have a limited understanding of the content of these corpora, including general statistics, quality, social factors, and inclusion of evaluation data (contamination). In this work, we propose What's In My Big Data? (WIMBD), a platform and a set of sixteen analyses that allow us to reveal and compare the contents of large text corp…

    Submitted 5 March, 2024; v1 submitted 31 October, 2023; originally announced October 2023.

    Comments: Published at ICLR 2024 spotlight

  6. arXiv:2310.11324  [pdf, other]

    cs.CL cs.AI cs.LG

    Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting

    Authors: Melanie Sclar, Yejin Choi, Yulia Tsvetkov, Alane Suhr

    Abstract: As large language models (LLMs) are adopted as a fundamental component of language technologies, it is crucial to accurately characterize their performance. Because choices in prompt design can strongly influence model behavior, this design process is critical in effectively using any modern pre-trained generative language model. In this work, we focus on LLM sensitivity to a quintessential class…

    Submitted 1 July, 2024; v1 submitted 17 October, 2023; originally announced October 2023.

    Comments: ICLR 2024 Camera Ready version. With respect to the original submission, we added text generation experiments, plots of entire accuracy distributions for each task + stdev computations, and prompt length correlation with spread analysis

  7. arXiv:2306.01693  [pdf, other]

    cs.CL

    Fine-Grained Human Feedback Gives Better Rewards for Language Model Training

    Authors: Zeqiu Wu, Yushi Hu, Weijia Shi, Nouha Dziri, Alane Suhr, Prithviraj Ammanabrolu, Noah A. Smith, Mari Ostendorf, Hannaneh Hajishirzi

    Abstract: Language models (LMs) often exhibit undesirable text generation behaviors, including generating false, toxic, or irrelevant outputs. Reinforcement learning from human feedback (RLHF) - where human preference judgments on LM outputs are transformed into a learning signal - has recently shown promise in addressing these issues. However, such holistic feedback conveys limited information on long text…

    Submitted 30 October, 2023; v1 submitted 2 June, 2023; originally announced June 2023.

    Comments: NeurIPS 2023 camera-ready

  8. arXiv:2306.00924  [pdf, other]

    cs.CL cs.AI cs.LG

    Minding Language Models' (Lack of) Theory of Mind: A Plug-and-Play Multi-Character Belief Tracker

    Authors: Melanie Sclar, Sachin Kumar, Peter West, Alane Suhr, Yejin Choi, Yulia Tsvetkov

    Abstract: Theory of Mind (ToM), the ability to reason about the mental states of other people, is a key element of our social intelligence. Yet, despite their ever more impressive performance, large-scale neural language models still lack basic theory of mind capabilities out-of-the-box. We posit that simply scaling up models will not imbue them with theory of mind due to the i…

    Submitted 1 June, 2023; originally announced June 2023.

    Journal ref: ACL 2023

  9. arXiv:2304.14399  [pdf, other]

    cs.CL

    We're Afraid Language Models Aren't Modeling Ambiguity

    Authors: Alisa Liu, Zhaofeng Wu, Julian Michael, Alane Suhr, Peter West, Alexander Koller, Swabha Swayamdipta, Noah A. Smith, Yejin Choi

    Abstract: Ambiguity is an intrinsic feature of natural language. Managing ambiguity is a key part of human language understanding, allowing us to anticipate misunderstanding as communicators and revise our interpretations as listeners. As language models (LMs) are increasingly employed as dialogue interfaces and writing aids, handling ambiguous language is critical to their success. We characterize ambiguit…

    Submitted 20 October, 2023; v1 submitted 27 April, 2023; originally announced April 2023.

    Comments: EMNLP 2023 camera-ready

  10. arXiv:2301.12050  [pdf, other]

    cs.LG cs.CL

    Do Embodied Agents Dream of Pixelated Sheep: Embodied Decision Making using Language Guided World Modelling

    Authors: Kolby Nottingham, Prithviraj Ammanabrolu, Alane Suhr, Yejin Choi, Hannaneh Hajishirzi, Sameer Singh, Roy Fox

    Abstract: Reinforcement learning (RL) agents typically learn tabula rasa, without prior knowledge of the world. However, if initialized with knowledge of high-level subgoals and transitions between subgoals, RL agents could utilize this Abstract World Model (AWM) for planning and exploration. We propose using few-shot large language models (LLMs) to hypothesize an AWM, that will be verified through world ex…

    Submitted 27 April, 2023; v1 submitted 27 January, 2023; originally announced January 2023.

    Comments: in proceedings of ICML 23

  11. arXiv:2212.09710  [pdf, other]

    cs.CL cs.AI cs.LG

    Continual Learning for Instruction Following from Realtime Feedback

    Authors: Alane Suhr, Yoav Artzi

    Abstract: We propose and deploy an approach to continually train an instruction-following agent from feedback provided by users during collaborative interactions. During interaction, human users instruct an agent using natural language, and provide realtime binary feedback as they observe the agent following their instructions. We design a contextual bandit learning approach, converting user feedback to imm…

    Submitted 5 December, 2023; v1 submitted 19 December, 2022; originally announced December 2022.

    Comments: NeurIPS 2023 Spotlight paper

  12. arXiv:2211.16492  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG

    Abstract Visual Reasoning with Tangram Shapes

    Authors: Anya Ji, Noriyuki Kojima, Noah Rush, Alane Suhr, Wai Keen Vong, Robert D. Hawkins, Yoav Artzi

    Abstract: We introduce KiloGram, a resource for studying abstract visual reasoning in humans and machines. Drawing on the history of tangram puzzles as stimuli in cognitive science, we build a richly annotated dataset that, with >1k distinct stimuli, is orders of magnitude larger and more diverse than prior resources. It is both visually and linguistically richer, moving beyond whole shape descriptions to i…

    Submitted 29 November, 2022; originally announced November 2022.

    Comments: EMNLP 2022 long paper

  13. arXiv:2109.04452  [pdf, other]

    cs.CL

    Analysis of Language Change in Collaborative Instruction Following

    Authors: Anna Effenberger, Eva Yan, Rhia Singh, Alane Suhr, Yoav Artzi

    Abstract: We analyze language change over time in a collaborative, goal-oriented instructional task, where utility-maximizing participants form conventions and increase their expertise. Prior work studied such scenarios mostly in the context of reference games, and consistently found that language complexity is reduced along multiple dimensions, such as utterance length, as conventions are formed. In contra…

    Submitted 9 September, 2021; originally announced September 2021.

    Comments: Findings of EMNLP 2021 Short Paper

  14. arXiv:2108.04812  [pdf, other]

    cs.CL cs.AI cs.LG

    Continual Learning for Grounded Instruction Generation by Observing Human Following Behavior

    Authors: Noriyuki Kojima, Alane Suhr, Yoav Artzi

    Abstract: We study continual learning for natural language instruction generation, by observing human users' instruction execution. We focus on a collaborative scenario, where the system both acts and delegates tasks to human users using natural language. We compare user execution of generated instructions to the original system intent as an indication to the system's success communicating its intent. We sh…

    Submitted 10 August, 2021; originally announced August 2021.

    Comments: To appear in TACL 2021. The arXiv version is a pre-MIT Press publication version

  15. arXiv:1910.03655  [pdf, other]

    cs.CL cs.AI cs.LG

    Executing Instructions in Situated Collaborative Interactions

    Authors: Alane Suhr, Claudia Yan, Charlotte Schluger, Stanley Yu, Hadi Khader, Marwa Mouallem, Iris Zhang, Yoav Artzi

    Abstract: We study a collaborative scenario where a user not only instructs a system to complete tasks, but also acts alongside it. This allows the user to adapt to the system abilities by changing their language or deciding to simply accomplish some tasks themselves, and requires the system to effectively recover from errors as the user strategically assigns it new goals. We build a game environment to stu…

    Submitted 22 November, 2022; v1 submitted 8 October, 2019; originally announced October 2019.

    Comments: EMNLP 2019 long paper

  16. arXiv:1909.10411  [pdf, other]

    cs.CL cs.CV

    NLVR2 Visual Bias Analysis

    Authors: Alane Suhr, Yoav Artzi

    Abstract: NLVR2 (Suhr et al., 2019) was designed to be robust for language bias through a data collection process that resulted in each natural language sentence appearing with both true and false labels. The process did not provide a similar measure of control for visual bias. This technical report analyzes the potential for visual bias in NLVR2. We show that some amount of visual bias likely exists. Final…

    Submitted 23 September, 2019; originally announced September 2019.

    Comments: Corresponding notebook available at http://lil.nlp.cornell.edu/nlvr/NLVR2BiasAnalysis.html

  17. arXiv:1811.12354  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Touchdown: Natural Language Navigation and Spatial Reasoning in Visual Street Environments

    Authors: Howard Chen, Alane Suhr, Dipendra Misra, Noah Snavely, Yoav Artzi

    Abstract: We study the problem of jointly reasoning about language and vision through a navigation and spatial reasoning task. We introduce the Touchdown task and dataset, where an agent must first follow navigation instructions in a real-life visual urban environment, and then identify a location described in natural language to find a hidden object at the goal position. The data contains 9,326 examples of…

    Submitted 16 May, 2020; v1 submitted 29 November, 2018; originally announced November 2018.

    Comments: arXiv admin note: text overlap with arXiv:1809.00786

    Journal ref: Published in CVPR 2019

  18. arXiv:1811.00491  [pdf, other]

    cs.CL cs.CV

    A Corpus for Reasoning About Natural Language Grounded in Photographs

    Authors: Alane Suhr, Stephanie Zhou, Ally Zhang, Iris Zhang, Huajun Bai, Yoav Artzi

    Abstract: We introduce a new dataset for joint reasoning about natural language and images, with a focus on semantic diversity, compositionality, and visual reasoning challenges. The data contains 107,292 examples of English sentences paired with web photographs. The task is to determine whether a natural language caption is true about a pair of photographs. We crowdsource the data using sets of visually ri…

    Submitted 21 July, 2019; v1 submitted 1 November, 2018; originally announced November 2018.

    Comments: ACL 2019 Long Paper

  19. arXiv:1805.10209  [pdf, other]

    cs.CL

    Situated Mapping of Sequential Instructions to Actions with Single-step Reward Observation

    Authors: Alane Suhr, Yoav Artzi

    Abstract: We propose a learning approach for mapping context-dependent sequential instructions to actions. We address the problem of discourse and state dependencies with an attention-based model that considers both the history of the interaction and the state of the world. To train from start and goal states without access to demonstrations, we propose SESTRA, a learning algorithm that takes advantage of s…

    Submitted 8 June, 2018; v1 submitted 25 May, 2018; originally announced May 2018.

    Comments: ACL 2018 Long Paper

  20. arXiv:1804.06868  [pdf, other]

    cs.CL

    Learning to Map Context-Dependent Sentences to Executable Formal Queries

    Authors: Alane Suhr, Srinivasan Iyer, Yoav Artzi

    Abstract: We propose a context-dependent model to map utterances within an interaction to executable formal queries. To incorporate interaction history, the model maintains an interaction-level encoder that updates after each turn, and can copy sub-sequences of previously predicted queries during generation. Our approach combines implicit and explicit modeling of references between utterances. We evaluate o…

    Submitted 25 April, 2018; v1 submitted 18 April, 2018; originally announced April 2018.

    Comments: NAACL-HLT 2018 Long Paper

  21. arXiv:1710.00453  [pdf, other]

    cs.CL

    Visual Reasoning with Natural Language

    Authors: Stephanie Zhou, Alane Suhr, Yoav Artzi

    Abstract: Natural language provides a widely accessible and expressive interface for robotic agents. To understand language in complex environments, agents must reason about the full range of language inputs and their correspondence to the world. Such reasoning over language and vision is an open problem that is receiving increasing attention. While existing data sets focus on visual diversity, they do not…

    Submitted 1 October, 2017; originally announced October 2017.

    Comments: AAAI NCHRC 2017