Search | arXiv e-print repository

Planetarium: A Rigorous Benchmark for Translating Text to Structured Planning Languages

Authors: Max Zuo, Francisco Piedrahita Velez, Xiaochen Li, Michael L. Littman, Stephen H. Bach

Abstract: Many recent works have explored using language models for planning problems. One line of research focuses on translating natural language descriptions of planning tasks into structured planning languages, such as the planning domain definition language (PDDL). While this approach is promising, accurately measuring the quality of generated PDDL code continues to pose significant challenges. First, generated PDDL code is typically evaluated using planning validators that check whether the problem can be solved with a planner. This method is insufficient because a language model might generate valid PDDL code that does not align with the natural language description of the task. Second, existing evaluation sets often have natural language descriptions of the planning task that closely resemble the ground-truth PDDL, reducing the challenge of the task. To bridge this gap, we introduce Planetarium, a benchmark designed to evaluate language models' ability to generate PDDL code from natural language descriptions of planning tasks. We begin by creating a PDDL equivalence algorithm that rigorously evaluates the correctness of PDDL code generated by language models by flexibly comparing it against a ground-truth PDDL. Then, we present a dataset of 132,037 text-to-PDDL pairs across 13 different tasks, with varying levels of difficulty. Finally, we evaluate several API-access and open-weight language models, revealing the task's complexity. For example, 87.6% of the PDDL problem descriptions generated by GPT-4o are syntactically parseable and 82.2% are valid, solvable problems, but only 35.1% are semantically correct, highlighting the need for a more rigorous benchmark for this problem.

Submitted 3 July, 2024; originally announced July 2024.
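The gap between a valid problem and a semantically correct one can be made concrete with a toy check. The sketch below is not the paper's PDDL equivalence algorithm, which the abstract does not detail. It assumes a problem state is reduced to a set of ground atoms such as ("on", "b1", "b2") and treats two states as equivalent when some one-to-one renaming of objects maps one onto the other. The function names are hypothetical, and the brute-force search over permutations only scales to a handful of objects.

```python
from itertools import permutations

# A ground atom is a tuple: (predicate, arg1, arg2, ...).
Atom = tuple[str, ...]


def objects_of(atoms: set[Atom]) -> list[str]:
    """Collect every object name that appears as a predicate argument."""
    return sorted({arg for atom in atoms for arg in atom[1:]})


def equivalent_up_to_renaming(a: set[Atom], b: set[Atom]) -> bool:
    """True if some bijection between object names maps the atoms of a onto b.

    Brute force over all permutations: only suitable for a few objects.
    """
    objs_a, objs_b = objects_of(a), objects_of(b)
    if len(objs_a) != len(objs_b) or len(a) != len(b):
        return False
    for perm in permutations(objs_b):
        mapping = dict(zip(objs_a, perm))
        renamed = {(atom[0], *(mapping[x] for x in atom[1:])) for atom in a}
        if renamed == b:
            return True
    return False


# Ground truth: block b1 stacked on b2.
ground_truth = {("on", "b1", "b2"), ("clear", "b1"), ("ontable", "b2")}
# Same tower with different object names: should be accepted.
renamed = {("on", "x", "y"), ("clear", "x"), ("ontable", "y")}
# Both blocks left on the table: a valid state, but not the described one.
no_tower = {("ontable", "x"), ("ontable", "y"), ("clear", "x"), ("clear", "y")}
print(equivalent_up_to_renaming(ground_truth, renamed))   # True
print(equivalent_up_to_renaming(ground_truth, no_tower))  # False
```

The no_tower state would still parse and could still be part of a solvable problem, which is exactly the kind of output a planner-only validator accepts but a comparison against ground truth rejects.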

arXiv:2310.00371

ConSOR: A Context-Aware Semantic Object Rearrangement Framework for Partially Arranged Scenes

Authors: Kartik Ramachandruni, Max Zuo, Sonia Chernova

Abstract: Object rearrangement is the problem of enabling a robot to identify the correct object placement in a complex environment. Prior work on object rearrangement has explored a diverse set of techniques for following user instructions to achieve some desired goal state. Logical predicates, images of the goal scene, and natural language descriptions have all been used to instruct a robot in how to arrange objects. In this work, we argue that burdening the user with specifying goal scenes is not necessary in partially arranged environments, such as common household settings. Instead, we show that contextual cues from partially arranged scenes (i.e., the placement of some number of pre-arranged objects in the environment) provide sufficient context to enable robots to perform object rearrangement without any explicit user goal specification. We introduce ConSOR, a Context-aware Semantic Object Rearrangement framework that uses contextual cues from a partially arranged initial state of the environment to complete the arrangement of new objects, without explicit goal specification from the user. We demonstrate that ConSOR strongly outperforms two baselines in generalizing to novel object arrangements and unseen object categories. The code and data can be found at https://github.com/kartikvrama/consor.

Submitted 30 September, 2023; originally announced October 2023.

Comments: Accepted to IROS 2023

arXiv:2212.07194

Traffic Flow Prediction via Variational Bayesian Inference-based Encoder-Decoder Framework

Authors: Jianlei Kong, Xiaomeng Fan, Xue-Bo Jin, Min Zuo

Abstract: Accurate traffic flow prediction, a hotspot of intelligent transportation research, is a prerequisite for managing traffic and making travel plans. The speed of traffic flow can be affected by road conditions, weather, holidays, and other factors. Furthermore, the sensors that collect traffic flow information are subject to interference from environmental factors such as illumination, collection time, and occlusion. Therefore, traffic flow in practical transportation systems is complicated, uncertain, and challenging to predict accurately. This paper proposes a deep encoder-decoder prediction framework based on variational Bayesian inference. A Bayesian neural network is constructed by combining variational inference with gated recurrent units (GRU) and used as the deep neural network unit of the encoder-decoder framework to mine the intrinsic dynamics of traffic flow. Then, variational inference is introduced into the multi-head attention mechanism to avoid noise-induced deterioration of prediction accuracy. The proposed model outperforms the benchmark models on the Guangzhou urban traffic flow dataset, particularly for long-term prediction.

Submitted 14 December, 2022; originally announced December 2022.
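The abstract builds a Bayesian neural network by combining variational inference with GRU units. A common ingredient of such networks, shown here for a single scalar weight, is a learned Gaussian distribution over the weight that is sampled with the reparameterization trick and regularized by its KL divergence to a prior. This is a generic sketch, not the paper's model: the mean-field Gaussian posterior, the standard normal prior, the softplus parameterization of the standard deviation, and the class and method names are all assumptions. In a GRU layer, every weight would carry its own (mu, rho) pair, and the KL terms would be summed into the training loss.

```python
import math
import random


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x)); maps any real rho to a positive sigma."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


class GaussianWeight:
    """Mean-field Gaussian posterior q(w) = N(mu, sigma^2) for one scalar weight."""

    def __init__(self, mu: float, rho: float) -> None:
        self.mu = mu    # variational mean
        self.rho = rho  # unconstrained parameter; sigma = softplus(rho)

    @property
    def sigma(self) -> float:
        return softplus(self.rho)

    def sample(self, rng: random.Random) -> float:
        # Reparameterization trick: w = mu + sigma * eps with eps ~ N(0, 1).
        # In an autodiff framework this lets gradients reach mu and rho.
        eps = rng.gauss(0.0, 1.0)
        return self.mu + self.sigma * eps

    def kl_to_standard_normal(self) -> float:
        # Closed form: KL(N(mu, sigma^2) || N(0, 1)) = 0.5 * (sigma^2 + mu^2 - 1) - ln(sigma).
        s = self.sigma
        return 0.5 * (s * s + self.mu * self.mu - 1.0) - math.log(s)


rng = random.Random(0)
w = GaussianWeight(mu=0.5, rho=math.log(math.e - 1.0))  # softplus(rho) is 1 up to rounding
draws = [w.sample(rng) for _ in range(10_000)]
print(sum(draws) / len(draws))    # sample mean, close to mu = 0.5
print(w.kl_to_standard_normal())  # approximately 0.125
```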

arXiv:2210.09705

ATCON: Attention Consistency for Vision Models

Authors: Ali Mirzazadeh, Florian Dubost, Maxwell Pike, Krish Maniar, Max Zuo, Christopher Lee-Messer, Daniel Rubin

Abstract: Attention (or attribution) map methods are designed to highlight the regions of a model's input that were discriminative for its predictions. However, different attention map methods can highlight different regions of the input, sometimes giving contradictory explanations for a prediction. This effect is exacerbated when the training set is small. This indicates that either the model learned incorrect representations or the attention map methods did not accurately estimate the model's representations. We propose an unsupervised fine-tuning method that optimizes the consistency of attention maps and show that it improves both classification performance and the quality of attention maps. We propose an implementation for two state-of-the-art attention computation methods, Grad-CAM and Guided Backpropagation, which relies on an input masking technique. We also show results on Grad-CAM and Integrated Gradients in an ablation study. We evaluate this method on our own dataset of event detection in continuous video recordings of hospital patients, aggregated and curated for this work. As a sanity check, we also evaluate the proposed method on PASCAL VOC and SVHN. With small training sets, the proposed method achieves a 6.6-point lift in F1 score over the baselines on our video dataset, a 2.9-point lift in F1 score on PASCAL, and a 1.8-point lift in mean Intersection over Union over Grad-CAM for weakly supervised detection on PASCAL. These improved attention maps may help clinicians better understand vision model predictions and ease the deployment of machine learning systems into clinical care. We share part of the code for this article at the following repository: https://github.com/alimirzazadeh/SemisupervisedAttention.

Submitted 18 October, 2022; originally announced October 2022.

Comments: WACV 2023
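The abstract describes fine-tuning a model so that two attention map methods (for example, Grad-CAM and Guided Backpropagation) agree, but it does not give the loss. As an illustration only, the sketch below scores disagreement between two maps after min-max normalization as one minus their cosine similarity. The normalization, the choice of cosine similarity, and the function names are assumptions, not the paper's objective. In actual fine-tuning, the same quantity would be computed on framework tensors so that it can be backpropagated.

```python
import math

AttentionMap = list[list[float]]


def flatten(m: AttentionMap) -> list[float]:
    return [v for row in m for v in row]


def min_max_normalize(values: list[float]) -> list[float]:
    """Rescale to [0, 1] so maps produced by different methods share a scale."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)  # a constant map carries no spatial information
    return [(v - lo) / (hi - lo) for v in values]


def consistency_loss(map_a: AttentionMap, map_b: AttentionMap) -> float:
    """1 - cosine similarity of the normalized maps.

    Normalized values are nonnegative, so the loss lies in [0, 1]:
    0 when the maps agree after normalization, 1 when they do not overlap.
    """
    a = min_max_normalize(flatten(map_a))
    b = min_max_normalize(flatten(map_b))
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 1.0  # treat an all-constant map as fully inconsistent
    return 1.0 - dot / norm


grad_cam = [[0.0, 0.2], [0.4, 1.0]]
guided_bp = [[0.0, 0.4], [0.8, 2.0]]  # same pattern at a different scale
disjoint = [[1.0, 0.0], [0.0, 0.0]]   # highlights a region grad_cam ignores
print(consistency_loss(grad_cam, guided_bp))  # approximately 0.0
print(consistency_loss(grad_cam, disjoint))   # 1.0
```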

arXiv:2203.12774

Efficient Exploration via First-Person Behavior Cloning Assisted Rapidly-Exploring Random Trees

Authors: Max Zuo, Logan Schick, Matthew Gombolay, Nakul Gopalan

Abstract: Modern-day computer games have extremely large state and action spaces. To detect bugs in these games' models, human testers play the games repeatedly to explore them and find errors. Such gameplay is exhaustive and time-consuming. Moreover, since robotics simulators depend on similar methods of model specification and debugging, finding errors in the model is also of interest to the robotics community, to ensure that robot behaviors and interactions are consistent in simulators. Previous methods have used reinforcement learning (arXiv:2103.13798) and search-based methods (Chang, 2019; Chang, 2021; arXiv:1811.06962), including Rapidly-exploring Random Trees (RRT), to explore a game's state-action space to find bugs. However, such search- and exploration-based methods are not efficient at exploring the state-action space without a predefined heuristic. In this work, we attempt to combine a human tester's expertise in solving games with the RRT's exhaustiveness to search a game's state space efficiently with high coverage. This paper introduces Cloning Assisted RRT (CA-RRT) to test a game through search. We compare our method to two existing baselines: 1) a weighted-RRT as described by arXiv:1812.03125; 2) a human-demonstration-seeded RRT as described by Chang et al. We find that CA-RRT is applicable to more game maps and explores more game states in fewer tree expansions/iterations than the existing baselines. In each test, CA-RRT reached more states on average than weighted-RRT in the same number of iterations. In our tested environments, CA-RRT reached the same number of states as weighted-RRT in more than 5000 fewer iterations on average, almost a 50% reduction, and was applicable to more scenarios than the baselines. Moreover, as a consequence of our first-person behavior cloning approach, CA-RRT worked on more unseen game maps than simply seeding the RRT with human-demonstrated states.

Submitted 19 April, 2022; v1 submitted 23 March, 2022; originally announced March 2022.

Comments: Published in HRI 2022 Workshop - MLHRC. This is a replacement to include broader citations from works in the field
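For readers unfamiliar with RRT, the sketch below grows a basic tree in a continuous 2-D box: sample a random point, find the nearest tree node, and extend at most a fixed step toward it. The `propose` hook and the `drift_right` function are hypothetical. They only mark where a learned policy, such as a behavior-cloned model of a human tester, could replace the uniform target. This is not the paper's CA-RRT implementation, and game states are simplified to 2-D points. Coverage is approximated as the number of distinct grid cells the tree visits, an assumption standing in for the paper's count of reached game states.

```python
import math
import random
from typing import Callable, Optional

Point = tuple[float, float]
# Hypothetical hook: given the nearest tree node, return a point to steer toward.
Proposal = Callable[[Point, random.Random], Point]


def rrt(
    start: Point,
    n_iters: int,
    step: float,
    bounds: tuple[float, float],
    rng: random.Random,
    propose: Optional[Proposal] = None,
) -> tuple[list[Point], list[int]]:
    """Grow a basic RRT in the square [lo, hi]^2; return nodes and parent indices."""
    lo, hi = bounds
    nodes: list[Point] = [start]
    parents: list[int] = [-1]  # the root has no parent
    for _ in range(n_iters):
        sample = (rng.uniform(lo, hi), rng.uniform(lo, hi))
        # Nearest existing node to the random sample (linear scan for clarity).
        i = min(range(len(nodes)), key=lambda k: math.dist(nodes[k], sample))
        near = nodes[i]
        # Default: steer toward the uniform sample; a policy may override it.
        target = propose(near, rng) if propose is not None else sample
        d = math.dist(near, target)
        if d == 0.0:
            continue
        # Steer: move at most `step` from the nearest node toward the target,
        # then clamp to the box so the tree stays inside the state space.
        t = min(1.0, step / d)
        x = min(max(near[0] + t * (target[0] - near[0]), lo), hi)
        y = min(max(near[1] + t * (target[1] - near[1]), lo), hi)
        nodes.append((x, y))
        parents.append(i)
    return nodes, parents


def coverage(nodes: list[Point], cell: float) -> int:
    """Count distinct grid cells visited: a crude stand-in for 'states reached'."""
    return len({(math.floor(x / cell), math.floor(y / cell)) for x, y in nodes})


def drift_right(p: Point, rng: random.Random) -> Point:
    """Hypothetical 'policy' that prefers moving right, standing in for a cloned tester."""
    return (p[0] + 1.0, p[1] + rng.uniform(-0.5, 0.5))


rng = random.Random(0)
uniform_nodes, _ = rrt((5.0, 5.0), n_iters=500, step=0.5, bounds=(0.0, 10.0), rng=rng)
biased_nodes, _ = rrt((5.0, 5.0), n_iters=500, step=0.5, bounds=(0.0, 10.0), rng=rng, propose=drift_right)
print(coverage(uniform_nodes, 1.0), coverage(biased_nodes, 1.0))
```

In this sketch the node to expand is still chosen by uniform sampling, which keeps the RRT's bias toward unexplored regions, while the policy only decides the direction of growth. That is one plausible way to combine the two, not necessarily the paper's.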

arXiv:2011.06780

A differential evolution-based optimization tool for interplanetary transfer trajectory design

Authors: Mingcheng Zuo, Guangming Dai, Lei Peng, Zhe Tang

Abstract: The extremely sensitive and highly nonlinear search space of interplanetary transfer trajectory design poses major challenges for global optimization. As a representative example, the current best-known solutions of the global trajectory optimization problems (GTOP) designed by the European Space Agency (ESA) are very hard to find. To deal with this difficulty, this paper proposes a powerful differential evolution-based optimization tool named COoperative Differential Evolution (CODE). CODE employs a two-stage evolutionary process, which concentrates on learning the global structure in the earlier stage and tends to self-adaptively learn the structures of different local spaces in the later stage. In addition, considering the spatial distribution of the global optimum on different problems and the gradient information on different variables, a multiple boundary check technique is employed. Covariance Matrix Adaptation Evolution Strategy (CMA-ES) is also used as a local optimizer. Previous studies have shown that a specific swarm intelligence optimization algorithm can usually solve only one or two GTOP problems. However, the experimental results show that CODE can find the current best-known solutions of Cassini1 and Sagas directly, and in cooperation with CMA-ES it can solve Cassini2, GTOC1, Messenger (reduced), and Rosetta. For the most complicated problem, Messenger (full), even though CODE cannot find the current best-known solution, its best solution, with an objective function value of 3.38 km/s, is still at a level that other swarm intelligence algorithms cannot easily reach.

Submitted 13 April, 2021; v1 submitted 13 November, 2020; originally announced November 2020.

Comments: The algorithm has been developed, and the results need a change
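CODE builds on differential evolution (DE). For reference, the sketch below is the classic DE/rand/1/bin scheme. For each population member, it adds a scaled difference of two random members to a third, mixes the result with the current member by binomial crossover, and keeps whichever vector scores better. It does not reproduce CODE's two-stage process, its multiple boundary check, or the CMA-ES local search. The resample-on-violation boundary rule, the in-place (asynchronous) selection, the parameter defaults, and the sphere test function are assumptions chosen for illustration, not a GTOP problem setup.

```python
import random
from typing import Callable

Vector = list[float]


def differential_evolution(
    f: Callable[[Vector], float],
    dim: int,
    bounds: tuple[float, float],
    rng: random.Random,
    pop_size: int = 20,
    generations: int = 300,
    F: float = 0.5,   # differential weight (mutation scale)
    CR: float = 0.9,  # crossover rate
) -> tuple[Vector, float]:
    """Minimize f over [lo, hi]^dim with DE/rand/1/bin; return best vector and value."""
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    fit = [f(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: pick three distinct members, all different from i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # at least one gene comes from the mutant
            trial: Vector = []
            for d in range(dim):
                if d == j_rand or rng.random() < CR:
                    v = pop[a][d] + F * (pop[b][d] - pop[c][d])
                    # Simple boundary rule (assumption): resample out-of-range genes.
                    if not lo <= v <= hi:
                        v = rng.uniform(lo, hi)
                else:
                    v = pop[i][d]
                trial.append(v)
            # Greedy one-to-one selection, applied in place: keep the trial if no worse.
            f_trial = f(trial)
            if f_trial <= fit[i]:
                pop[i], fit[i] = trial, f_trial
    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]


def sphere(x: Vector) -> float:
    """Easy convex test function with its minimum 0 at the origin."""
    return sum(v * v for v in x)


best_x, best_f = differential_evolution(sphere, dim=5, bounds=(-5.0, 5.0), rng=random.Random(1))
print(best_f)
```

Hard GTOP instances are far more deceptive than the sphere function, which is why the abstract pairs DE with extra machinery such as the two-stage process and CMA-ES local search.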

Showing 1–6 of 6 results for author: Zuo, M