Showing 1–50 of 171 results for author: Li, E

Searching in archive cs. Search in all archives.
  1. arXiv:2407.04230  [pdf, other]

    cs.CV

    A Physical Model-Guided Framework for Underwater Image Enhancement and Depth Estimation

    Authors: Dazhao Du, Enhan Li, Lingyu Si, Fanjiang Xu, Jianwei Niu, Fuchun Sun

    Abstract: Due to the selective absorption and scattering of light by diverse aquatic media, underwater images usually suffer from various visual degradations. Existing underwater image enhancement (UIE) approaches that combine underwater physical imaging models with neural networks often fail to accurately estimate imaging model parameters such as depth and veiling light, resulting in poor performance in ce…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  2. arXiv:2406.01623  [pdf, other]

    cs.SE cs.AI

    WebSuite: Systematically Evaluating Why Web Agents Fail

    Authors: Eric Li, Jim Waldo

    Abstract: We describe WebSuite, the first diagnostic benchmark for generalist web agents, designed to systematically evaluate why agents fail. Advances in AI have led to the rise of numerous web agents that autonomously operate a browser to complete tasks. However, most existing benchmarks focus on strictly measuring whether an agent can or cannot complete a task, without giving insight on why. In this pape…

    Submitted 31 May, 2024; originally announced June 2024.

  3. arXiv:2406.01380  [pdf, other]

    cs.CV stat.AP

    Convolutional Unscented Kalman Filter for Multi-Object Tracking with Outliers

    Authors: Shiqi Liu, Wenhan Cao, Chang Liu, Tianyi Zhang, Shengbo Eben Li

    Abstract: Multi-object tracking (MOT) is an essential technique for navigation in autonomous driving. In tracking-by-detection systems, biases, false positives, and misses, which are referred to as outliers, are inevitable due to complex traffic scenarios. Recent tracking methods are based on filtering algorithms that overlook these outliers, leading to reduced tracking accuracy or even loss of the objects…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 11 pages, 5 figures

  4. arXiv:2405.15177  [pdf, other]

    cs.LG cs.AI

    Diffusion Actor-Critic with Entropy Regulator

    Authors: Yinuo Wang, Likun Wang, Yuxuan Jiang, Wenjun Zou, Tong Liu, Xujie Song, Wenxuan Wang, Liming Xiao, Jiang Wu, Jingliang Duan, Shengbo Eben Li

    Abstract: Reinforcement learning (RL) has proven highly effective in addressing complex decision-making and control tasks. However, in most traditional RL algorithms, the policy is typically parameterized as a diagonal Gaussian distribution with learned mean and variance, which constrains their capability to acquire complex policies. In response to this problem, we propose an online RL algorithm termed diff…

    Submitted 15 June, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

  5. arXiv:2405.09713  [pdf, other]

    cs.CV cs.AI cs.CL

    SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge

    Authors: Andong Wang, Bo Wu, Sunli Chen, Zhenfang Chen, Haotian Guan, Wei-Ning Lee, Li Erran Li, Chuang Gan

    Abstract: Learning commonsense reasoning from visual contexts and scenes in real-world is a crucial step toward advanced artificial intelligence. However, existing video reasoning benchmarks are still inadequate since they were mainly designed for factual or situated reasoning and rarely involve broader knowledge in the real world. Our work aims to delve deeper into reasoning evaluations, specifically withi…

    Submitted 16 May, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

    Comments: CVPR

  6. arXiv:2405.04434  [pdf, other]

    cs.CL cs.AI

    DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

    Authors: DeepSeek-AI, Aixin Liu, Bei Feng, Bin Wang, Bingxuan Wang, Bo Liu, Chenggang Zhao, Chengqi Deng, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Hanwei Xu, Hao Yang, Haowei Zhang, Honghui Ding, et al. (132 additional authors not shown)

    Abstract: We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference…

    Submitted 19 June, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

  7. arXiv:2404.00481  [pdf, other]

    stat.ML cs.LG eess.SY

    Convolutional Bayesian Filtering

    Authors: Wenhan Cao, Shiqi Liu, Chang Liu, Zeyu He, Stephen S.-T. Yau, Shengbo Eben Li

    Abstract: Bayesian filtering serves as the mainstream framework of state estimation in dynamic systems. Its standard version utilizes total probability rule and Bayes' law alternatively, where how to define and compute conditional probability is critical to state distribution inference. Previously, the conditional probability is assumed to be exactly known, which represents a measure of the occurrence proba…

    Submitted 30 March, 2024; originally announced April 2024.

  8. arXiv:2403.14447  [pdf, other]

    cs.CV cs.RO

    Exploring 3D Human Pose Estimation and Forecasting from the Robot's Perspective: The HARPER Dataset

    Authors: Andrea Avogaro, Andrea Toaiari, Federico Cunico, Xiangmin Xu, Haralambos Dafas, Alessandro Vinciarelli, Emma Li, Marco Cristani

    Abstract: We introduce HARPER, a novel dataset for 3D body pose estimation and forecast in dyadic interactions between users and Spot, the quadruped robot manufactured by Boston Dynamics. The key-novelty is the focus on the robot's perspective, i.e., on the data captured by the robot's sensors. These make 3D body pose analysis challenging because being close to the ground captures humans only partially. The…

    Submitted 23 March, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

  9. arXiv:2403.14443  [pdf, other]

    cs.AI cs.CL cs.GT cs.LG cs.MA cs.SI

    Language Models Can Reduce Asymmetry in Information Markets

    Authors: Nasim Rahaman, Martin Weiss, Manuel Wüthrich, Yoshua Bengio, Li Erran Li, Chris Pal, Bernhard Schölkopf

    Abstract: This work addresses the buyer's inspection paradox for information markets. The paradox is that buyers need to access information to determine its value, while sellers need to limit access to prevent theft. To study this, we introduce an open-source simulated digital marketplace where intelligent agents, powered by language models, buy and sell information on behalf of external participants. The c…

    Submitted 21 March, 2024; originally announced March 2024.

  10. arXiv:2403.12848  [pdf, other]

    cs.CV

    Compositional 3D Scene Synthesis with Scene Graph Guided Layout-Shape Generation

    Authors: Yao Wei, Martin Renqiang Min, George Vosselman, Li Erran Li, Michael Ying Yang

    Abstract: Compositional 3D scene synthesis has diverse applications across a spectrum of industries such as robotics, films, and video games, as it closely mirrors the complexity of real-world multi-object environments. Early works typically employ shape retrieval based frameworks which naturally suffer from limited shape diversity. Recent progresses have been made in shape generation with powerful generati…

    Submitted 19 March, 2024; originally announced March 2024.

  11. arXiv:2403.12847  [pdf, other]

    cs.LG

    Policy Bifurcation in Safe Reinforcement Learning

    Authors: Wenjun Zou, Yao Lyu, Jie Li, Yujie Yang, Shengbo Eben Li, Jingliang Duan, Xianyuan Zhan, Jingjing Liu, Yaqin Zhang, Keqiang Li

    Abstract: Safe reinforcement learning (RL) offers advanced solutions to constrained optimal control problems. Existing studies in safe RL implicitly assume continuity in policy functions, where policies map states to actions in a smooth, uninterrupted manner; however, our research finds that in some scenarios, the feasible policy should be discontinuous or multi-valued, interpolating between discontinuous l…

    Submitted 28 March, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

  12. arXiv:2403.11807  [pdf, other]

    cs.AI cs.CL

    How Far Are We on the Decision-Making of LLMs? Evaluating LLMs' Gaming Ability in Multi-Agent Environments

    Authors: Jen-tse Huang, Eric John Li, Man Ho Lam, Tian Liang, Wenxuan Wang, Youliang Yuan, Wenxiang Jiao, Xing Wang, Zhaopeng Tu, Michael R. Lyu

    Abstract: Decision-making, a complicated task requiring various types of abilities, presents an excellent framework for assessing Large Language Models (LLMs). Our research investigates LLMs' decision-making capabilities through the lens of a well-established field, Game Theory. We focus specifically on games that support the participation of more than two agents simultaneously. Subsequently, we introduce o…

    Submitted 25 April, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: 16 pages of main text. 11 pages of appendices. 15 figures, 9 tables. Updated scoring scheme

  13. arXiv:2403.11506  [pdf, other]

    cs.CV cs.AI

    End-To-End Underwater Video Enhancement: Dataset and Model

    Authors: Dazhao Du, Enhan Li, Lingyu Si, Fanjiang Xu, Jianwei Niu

    Abstract: Underwater video enhancement (UVE) aims to improve the visibility and frame quality of underwater videos, which has significant implications for marine research and exploration. However, existing methods primarily focus on developing image enhancement algorithms to enhance each frame independently. There is a lack of supervised datasets and models specifically tailored for UVE tasks. To fill this…

    Submitted 18 March, 2024; originally announced March 2024.

  14. arXiv:2403.03730  [pdf, other]

    cs.CV cs.AI cs.LG

    Learning 3D object-centric representation through prediction

    Authors: John Day, Tushar Arora, Jirui Liu, Li Erran Li, Ming Bo Cai

    Abstract: As part of human core knowledge, the representation of objects is the building block of mental representation that supports high-level concepts and symbolic reasoning. While humans develop the ability of perceiving objects situated in 3D environments without supervision, models that learn the same set of abilities with similar constraints faced by human infants are lacking. Towards this end, we de…

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: 21 pages, 11 figures. Project webpage can be found at https://jday54.github.io/opple_site/

    ACM Class: I.2.10; I.4.8; I.4.6; I.4.10; I.2.6

  15. arXiv:2403.01768  [pdf, other]

    eess.SY cs.AI

    Canonical Form of Datatic Description in Control Systems

    Authors: Guojian Zhan, Ziang Zheng, Shengbo Eben Li

    Abstract: The design of feedback controllers is undergoing a paradigm shift from modelic (i.e., model-driven) control to datatic (i.e., data-driven) control. Canonical form of state space model is an important concept in modelic control systems, exemplified by Jordan form, controllable form and observable form, whose purpose is to facilitate system analysis and controller synthesis. In the realm of datatic…

    Submitted 4 March, 2024; originally announced March 2024.

  16. arXiv:2402.19251  [pdf, other]

    cs.AI cs.RO

    A Cognitive-Based Trajectory Prediction Approach for Autonomous Driving

    Authors: Haicheng Liao, Yongkang Li, Zhenning Li, Chengyue Wang, Zhiyong Cui, Shengbo Eben Li, Chengzhong Xu

    Abstract: In autonomous vehicle (AV) technology, the ability to accurately predict the movements of surrounding vehicles is paramount for ensuring safety and operational efficiency. Incorporating human decision-making insights enables AVs to more effectively anticipate the potential actions of other vehicles, significantly improving prediction accuracy and responsiveness in dynamic environments. This paper…

    Submitted 29 February, 2024; originally announced February 2024.

  17. arXiv:2402.11588  [pdf, other]

    cs.CV cs.AI

    SDiT: Spiking Diffusion Model with Transformer

    Authors: Shu Yang, Hanzhi Ma, Chengting Yu, Aili Wang, Er-Ping Li

    Abstract: Spiking neural networks (SNNs) have low power consumption and bio-interpretable characteristics, and are considered to have tremendous potential for energy-efficient computing. However, the exploration of SNNs on image generation tasks remains very limited, and a unified and effective structure for SNN-based generative models has yet to be proposed. In this paper, we explore a novel diffusion mode…

    Submitted 24 February, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

  18. arXiv:2402.06118  [pdf, other]

    cs.CV cs.AI

    ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling

    Authors: Siming Yan, Min Bai, Weifeng Chen, Xiong Zhou, Qixing Huang, Li Erran Li

    Abstract: By combining natural language understanding, generation capabilities, and breadth of knowledge of large language models with image perception, recent large vision language models (LVLMs) have shown unprecedented visual reasoning capabilities. However, the generated text often suffers from inaccurate grounding in the visual input, resulting in errors such as hallucination of nonexistent scene eleme…

    Submitted 18 April, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

    Comments: 10 pages, 3 figures

  19. arXiv:2402.04318  [pdf, other]

    cs.RO

    Human Observation-Inspired Trajectory Prediction for Autonomous Driving in Mixed-Autonomy Traffic Environments

    Authors: Haicheng Liao, Shangqian Liu, Yongkang Li, Zhenning Li, Chengyue Wang, Yunjian Li, Shengbo Eben Li, Chengzhong Xu

    Abstract: In the burgeoning field of autonomous vehicles (AVs), trajectory prediction remains a formidable challenge, especially in mixed autonomy environments. Traditional approaches often rely on computational methods such as time-series analysis. Our research diverges significantly by adopting an interdisciplinary approach that integrates principles of human cognition and observational behavior into traj…

    Submitted 8 March, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

  20. arXiv:2401.16395  [pdf, other]

    cs.FL

    Deciding Subtyping for Asynchronous Multiparty Sessions

    Authors: Elaine Li, Felix Stutz, Thomas Wies

    Abstract: Multiparty session types (MSTs) are a type-based approach to verifying communication protocols, represented as global types in the framework. We present a precise subtyping relation for asynchronous MSTs with communicating state machines (CSMs) as implementation model. We address two problems: when can a local implementation safely substitute another, and when does an arbitrary CSM implement a glo…

    Submitted 29 January, 2024; originally announced January 2024.

  21. arXiv:2401.13920  [pdf, other]

    cs.LG cs.AI cs.CL

    LocMoE: A Low-Overhead MoE for Large Language Model Training

    Authors: Jing Li, Zhijie Sun, Xuan He, Li Zeng, Yi Lin, Entong Li, Binfan Zheng, Rongqian Zhao, Xin Chen

    Abstract: The Mixtures-of-Experts (MoE) model is a widespread distributed and integrated learning method for large language models (LLM), which is favored due to its ability to sparsify and expand models efficiently. However, the performance of MoE is limited by load imbalance and high latency of All-to-All communication, along with relatively redundant computation owing to large expert capacity. Load imbal…

    Submitted 23 May, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

    Comments: 1. Update the font size of all figures. 2. Update the name of the proposed layer Grouped Average Pooling (GrAP). 3. Change the order of the Section Contribution Statement

  22. arXiv:2401.13054  [pdf, other]

    cs.SI cs.DM cs.LG

    Frustrated Random Walks: A Fast Method to Compute Node Distances on Hypergraphs

    Authors: Enzhi Li, Scott Nickleach, Bilal Fadlallah

    Abstract: A hypergraph is a generalization of a graph that arises naturally when attribute-sharing among entities is considered. Compared to graphs, hypergraphs have the distinct advantage that they contain explicit communities and are more convenient to manipulate. An open problem in hypergraph research is how to accurately and efficiently calculate node distances on hypergraphs. Estimating node distances…

    Submitted 28 June, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

    Comments: 15 pages, 6 figures

  23. arXiv:2401.10700  [pdf, other]

    cs.LG cs.AI cs.RO

    Safe Offline Reinforcement Learning with Feasibility-Guided Diffusion Model

    Authors: Yinan Zheng, Jianxiong Li, Dongjie Yu, Yujie Yang, Shengbo Eben Li, Xianyuan Zhan, Jingjing Liu

    Abstract: Safe offline RL is a promising way to bypass risky online interactions towards safe policy learning. Most existing methods only enforce soft constraints, i.e., constraining safety violations in expectation below thresholds predetermined. This can lead to potentially unsafe outcomes, thus unacceptable in safety-critical scenarios. An alternative is to enforce the hard constraint of zero violation.…

    Submitted 19 January, 2024; originally announced January 2024.

    Comments: ICLR 2024, 30 pages, 11 figures

  24. arXiv:2401.06341  [pdf, other]

    cs.CV cs.RO

    AffordanceLLM: Grounding Affordance from Vision Language Models

    Authors: Shengyi Qian, Weifeng Chen, Min Bai, Xiong Zhou, Zhuowen Tu, Li Erran Li

    Abstract: Affordance grounding refers to the task of finding the area of an object with which one can interact. It is a fundamental but challenging task, as a successful solution requires the comprehensive understanding of a scene in multiple aspects including detection, localization, and recognition of objects with their parts, of geo-spatial configuration/layout of the scene, of 3D shapes and physics, as…

    Submitted 17 April, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

  25. arXiv:2401.02954  [pdf, other]

    cs.CL cs.AI cs.LG

    DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

    Authors: DeepSeek-AI, Xiao Bi, Deli Chen, Guanting Chen, Shanhuang Chen, Damai Dai, Chengqi Deng, Honghui Ding, Kai Dong, Qiushi Du, Zhe Fu, Huazuo Gao, Kaige Gao, Wenjun Gao, Ruiqi Ge, Kang Guan, Daya Guo, Jianzhong Guo, Guangbo Hao, Zhewen Hao, Ying He, Wenjie Hu, Panpan Huang, Erhang Li, et al. (63 additional authors not shown)

    Abstract: The rapid development of open-source large language models (LLMs) has been truly remarkable. However, the scaling law described in previous literature presents varying conclusions, which casts a dark cloud over scaling LLMs. We delve into the study of scaling laws and present our distinctive findings that facilitate scaling of large scale models in two commonly used open-source configurations, 7B…

    Submitted 5 January, 2024; originally announced January 2024.

  26. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  27. arXiv:2312.08715  [pdf, other]

    cs.RO

    Bayes3D: fast learning and inference in structured generative models of 3D objects and scenes

    Authors: Nishad Gothoskar, Matin Ghavami, Eric Li, Aidan Curtis, Michael Noseworthy, Karen Chung, Brian Patton, William T. Freeman, Joshua B. Tenenbaum, Mirko Klukas, Vikash K. Mansinghka

    Abstract: Robots cannot yet match humans' ability to rapidly learn the shapes of novel 3D objects and recognize them robustly despite clutter and occlusion. We present Bayes3D, an uncertainty-aware perception system for structured 3D scenes, that reports accurate posterior uncertainty over 3D object shape, pose, and scene composition in the presence of clutter and occlusion. Bayes3D delivers these capabilit…

    Submitted 14 December, 2023; originally announced December 2023.

  28. arXiv:2312.07636  [pdf, other]

    cs.LG cs.CV stat.ML

    Go beyond End-to-End Training: Boosting Greedy Local Learning with Context Supply

    Authors: Chengting Yu, Fengzhao Zhang, Hanzhi Ma, Aili Wang, Erping Li

    Abstract: Traditional end-to-end (E2E) training of deep networks necessitates storing intermediate activations for back-propagation, resulting in a large memory footprint on GPUs and restricted model parallelization. As an alternative, greedy local learning partitions the network into gradient-isolated modules and trains supervisely based on local preliminary losses, thereby providing asynchronous and paral…

    Submitted 12 December, 2023; originally announced December 2023.

    Comments: 9 figures, 12 tables

  29. arXiv:2312.06371  [pdf, other]

    cs.RO cs.AI

    BAT: Behavior-Aware Human-Like Trajectory Prediction for Autonomous Driving

    Authors: Haicheng Liao, Zhenning Li, Huanming Shen, Wenxuan Zeng, Dongping Liao, Guofa Li, Shengbo Eben Li, Chengzhong Xu

    Abstract: The ability to accurately predict the trajectory of surrounding vehicles is a critical hurdle to overcome on the journey to fully autonomous vehicles. To address this challenge, we pioneer a novel behavior-aware trajectory prediction model (BAT) that incorporates insights and findings from traffic psychology, human behavior, and decision-making. Our model consists of behavior-aware, interaction-aw…

    Submitted 15 December, 2023; v1 submitted 11 December, 2023; originally announced December 2023.

  30. arXiv:2312.06240  [pdf, other]

    cs.CV

    UIEDP: Underwater Image Enhancement with Diffusion Prior

    Authors: Dazhao Du, Enhan Li, Lingyu Si, Fanjiang Xu, Jianwei Niu, Fuchun Sun

    Abstract: Underwater image enhancement (UIE) aims to generate clear images from low-quality underwater images. Due to the unavailability of clear reference images, researchers often synthesize them to construct paired datasets for training deep models. However, these synthesized images may sometimes lack quality, adversely affecting training outcomes. To address this issue, we propose UIE with Diffusion Pri…

    Submitted 11 December, 2023; originally announced December 2023.

  31. arXiv:2312.01836  [pdf, other]

    cs.RO cs.AI

    Integrated Drill Boom Hole-Seeking Control via Reinforcement Learning

    Authors: Haoqi Yan, Haoyuan Xu, Hongbo Gao, Fei Ma, Shengbo Eben Li, Jingliang Duan

    Abstract: Intelligent drill boom hole-seeking is a promising technology for enhancing drilling efficiency, mitigating potential safety hazards, and relieving human operators. Most existing intelligent drill boom control methods rely on a hierarchical control framework based on inverse kinematics. However, these methods are generally time-consuming due to the computational complexity of inverse kinematics an…

    Submitted 4 December, 2023; originally announced December 2023.

  32. arXiv:2311.12320  [pdf, other]

    cs.AI

    A Survey on Multimodal Large Language Models for Autonomous Driving

    Authors: Can Cui, Yunsheng Ma, Xu Cao, Wenqian Ye, Yang Zhou, Kaizhao Liang, Jintai Chen, Juanwu Lu, Zichong Yang, Kuei-Da Liao, Tianren Gao, Erlong Li, Kun Tang, Zhipeng Cao, Tong Zhou, Ao Liu, Xinrui Yan, Shuqi Mei, Jianguo Cao, Ziran Wang, Chao Zheng

    Abstract: With the emergence of Large Language Models (LLMs) and Vision Foundation Models (VFMs), multimodal AI systems benefiting from large models have the potential to equally perceive the real world, make decisions, and control tools as humans. In recent months, LLMs have shown widespread attention in autonomous driving and map systems. Despite its immense potential, there is still a lack of a comprehen…

    Submitted 20 November, 2023; originally announced November 2023.

  33. arXiv:2311.03408  [pdf, other]

    cs.LG cs.AI cs.NE quant-ph

    Training Multi-layer Neural Networks on Ising Machine

    Authors: Xujie Song, Tong Liu, Shengbo Eben Li, Jingliang Duan, Wenxuan Wang, Keqiang Li

    Abstract: As a dedicated quantum device, Ising machines could solve large-scale binary optimization problems in milliseconds. There is emerging interest in utilizing Ising machines to train feedforward neural networks due to the prosperity of generative artificial intelligence. However, existing methods can only train single-layer feedforward networks because of the complex nonlinear network topology. This…

    Submitted 5 November, 2023; originally announced November 2023.

  34. arXiv:2311.03004  [pdf, other]

    cs.IT physics.app-ph

    Breaking the Degrees-of-Freedom Limit of Holographic MIMO Communications: A 3-D Antenna Array Topology

    Authors: Shuai S. A. Yuan, Jie Wu, Hongjing Xu, Tengjiao Wang, Da Li, Xiaoming Chen, Chongwen Huang, Sheng Sun, Shilie Zheng, Xianmin Zhang, Er-Ping Li, Wei E. I. Sha

    Abstract: The performance of holographic multiple-input multiple-output (MIMO) communications, employing two-dimensional (2-D) planar antenna arrays, is typically compromised by finite degrees-of-freedom (DOF) stemming from limited array size. The DOF constraint becomes significant when the element spacing approaches approximately half a wavelength, thereby restricting the overall performance of MIMO system…

    Submitted 27 February, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

    Journal ref: IEEE Transactions on Vehicular Technology, Volume 73, Issue 8, 2024

  35. arXiv:2311.01556  [pdf, other]

    cs.CV cs.RO

    MemorySeg: Online LiDAR Semantic Segmentation with a Latent Memory

    Authors: Enxu Li, Sergio Casas, Raquel Urtasun

    Abstract: Semantic segmentation of LiDAR point clouds has been widely studied in recent years, with most existing methods focusing on tackling this task using a single scan of the environment. However, leveraging the temporal stream of observations can provide very rich contextual information on regions of the scene with poor visibility (e.g., occlusions) or sparse observations (e.g., at long range), and ca…

    Submitted 2 November, 2023; originally announced November 2023.

    Comments: accepted at ICCV 2023

  36. arXiv:2311.01520  [pdf, other]

    cs.CV cs.RO

    4D-Former: Multimodal 4D Panoptic Segmentation

    Authors: Ali Athar, Enxu Li, Sergio Casas, Raquel Urtasun

    Abstract: 4D panoptic segmentation is a challenging but practically useful task that requires every point in a LiDAR point-cloud sequence to be assigned a semantic class label, and individual objects to be segmented and tracked over time. Existing approaches utilize only LiDAR inputs which convey limited information in regions with point sparsity. This problem can, however, be mitigated by utilizing RGB cam…

    Submitted 17 November, 2023; v1 submitted 2 November, 2023; originally announced November 2023.

    Comments: accepted at CoRL 2023

  37. arXiv:2310.19022  [pdf, other]

    math.OC cs.LG eess.SY

    Optimization Landscape of Policy Gradient Methods for Discrete-time Static Output Feedback

    Authors: Jingliang Duan, Jie Li, Xuyang Chen, Kai Zhao, Shengbo Eben Li, Lin Zhao

    Abstract: In recent times, significant advancements have been made in delving into the optimization landscape of policy gradient methods for achieving optimal control in linear time-invariant (LTI) systems. Compared with state-feedback control, output-feedback control is more prevalent since the underlying state of the system may not be fully observed in many practical settings. This paper analyzes the opti…

    Submitted 29 October, 2023; originally announced October 2023.

    Journal ref: IEEE Transactions on Cybernetics, 2023

  38. arXiv:2310.07211  [pdf, other]

    cs.LG

    Bridging the Gap between Newton-Raphson Method and Regularized Policy Iteration

    Authors: Zeyang Li, Chuxiong Hu, Yunan Wang, Guojian Zhan, Jie Li, Shengbo Eben Li

    Abstract: Regularization is one of the most important techniques in reinforcement learning algorithms. The well-known soft actor-critic algorithm is a special case of regularized policy iteration where the regularizer is chosen as Shannon entropy. Despite some empirical success of regularized policy iteration, its theoretical underpinnings remain unclear. This paper proves that regularized policy iteration…

    Submitted 11 October, 2023; originally announced October 2023.

  39. arXiv:2310.07207  [pdf, other]

    cs.LG

    Robust Safe Reinforcement Learning under Adversarial Disturbances

    Authors: Zeyang Li, Chuxiong Hu, Shengbo Eben Li, Jia Cheng, Yunan Wang

    Abstract: Safety is a primary concern when applying reinforcement learning to real-world control tasks, especially in the presence of external disturbances. However, existing safe reinforcement learning algorithms rarely account for external disturbances, limiting their applicability and robustness in practice. To address this challenge, this paper proposes a robust safe reinforcement learning framework tha…

    Submitted 11 October, 2023; originally announced October 2023.

  40. arXiv:2310.05858  [pdf, other]

    cs.LG eess.SY

    DSAC-T: Distributional Soft Actor-Critic with Three Refinements

    Authors: Jingliang Duan, Wenxuan Wang, Liming Xiao, Jiaxin Gao, Shengbo Eben Li

    Abstract: Reinforcement learning (RL) has proven to be highly effective in tackling complex decision-making and control tasks. However, prevalent model-free RL methods often face severe performance degradation due to the well-known overestimation issue. In response to this problem, we recently introduced an off-policy RL algorithm, called distributional soft actor-critic (DSAC or DSAC-v1), which can effecti…

    Submitted 28 December, 2023; v1 submitted 9 October, 2023; originally announced October 2023.

  41. arXiv:2310.03026  [pdf, other]

    cs.RO cs.AI cs.CL cs.CV cs.LG

    LanguageMPC: Large Language Models as Decision Makers for Autonomous Driving

    Authors: Hao Sha, Yao Mu, Yuxuan Jiang, Li Chen, Chenfeng Xu, Ping Luo, Shengbo Eben Li, Masayoshi Tomizuka, Wei Zhan, Mingyu Ding

    Abstract: Existing learning-based autonomous driving (AD) systems face challenges in comprehending high-level information, generalizing to rare events, and providing interpretability. To address these problems, this work employs Large Language Models (LLMs) as a decision-making component for complex AD scenarios that require human commonsense understanding. We devise cognitive pathways to enable comprehensi…

    Submitted 13 October, 2023; v1 submitted 4 October, 2023; originally announced October 2023.

  42. arXiv:2310.02777  [pdf, other]

    cs.CL

    The Role of Linguistic Priors in Measuring Compositional Generalization of Vision-Language Models

    Authors: Chenwei Wu, Li Erran Li, Stefano Ermon, Patrick Haffner, Rong Ge, Zaiwei Zhang

    Abstract: Compositionality is a common property in many modalities including natural languages and images, but the compositional generalization of multi-modal models is not well-understood. In this paper, we identify two sources of visual-linguistic compositionality: linguistic priors and the interplay between images and texts. We show that current attempts to improve compositional generalization rely on li…

    Submitted 4 October, 2023; originally announced October 2023.

  43. arXiv:2310.01386  [pdf, other]

    cs.CL

    Who is ChatGPT? Benchmarking LLMs' Psychological Portrayal Using PsychoBench

    Authors: Jen-tse Huang, Wenxuan Wang, Eric John Li, Man Ho Lam, Shujie Ren, Youliang Yuan, Wenxiang Jiao, Zhaopeng Tu, Michael R. Lyu

    Abstract: Large Language Models (LLMs) have recently showcased their remarkable capacities, not only in natural language processing tasks but also across diverse domains such as clinical medicine, legal consultation, and education. LLMs become more than mere applications, evolving into assistants capable of addressing diverse user requests. This narrows the distinction between human beings and artificial in…

    Submitted 22 January, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

    Comments: Accepted for ICLR 2024 Oral Presentation. 15 pages (main text) and 5 pages (appendix)

  44. arXiv:2309.15289  [pdf, other]

    cs.CV cs.LG

    SEPT: Towards Efficient Scene Representation Learning for Motion Prediction

    Authors: Zhiqian Lan, Yuxuan Jiang, Yao Mu, Chen Chen, Shengbo Eben Li

    Abstract: Motion prediction is crucial for autonomous vehicles to operate safely in complex traffic environments. Extracting effective spatiotemporal relationships among traffic elements is key to accurate forecasting. Inspired by the successful practice of pretrained large language models, this paper presents SEPT, a modeling framework that leverages self-supervised learning to develop powerful spatiotempo…

    Submitted 19 December, 2023; v1 submitted 26 September, 2023; originally announced September 2023.

  45. arXiv:2309.06835  [pdf, other]

    cs.LG

    Safe Reinforcement Learning with Dual Robustness

    Authors: Zeyang Li, Chuxiong Hu, Yunan Wang, Yujie Yang, Shengbo Eben Li

    Abstract: Reinforcement learning (RL) agents are vulnerable to adversarial disturbances, which can deteriorate task performance or compromise safety specifications. Existing methods either address safety requirements under the assumption of no adversary (e.g., safe RL) or only focus on robustness against performance adversaries (e.g., robust RL). Learning one policy that is both safe and robust remains a ch…

    Submitted 13 September, 2023; originally announced September 2023.

  46. arXiv:2309.01430  [pdf, other]

    cs.CV

    DAT++: Spatially Dynamic Vision Transformer with Deformable Attention

    Authors: Zhuofan Xia, Xuran Pan, Shiji Song, Li Erran Li, Gao Huang

    Abstract: Transformers have shown superior performance on various vision tasks. Their large receptive field endows Transformer models with higher representation power than their CNN counterparts. Nevertheless, simply enlarging the receptive field also raises several concerns. On the one hand, using dense attention in ViT leads to excessive memory and computational cost, and features can be influenced by irr…

    Submitted 4 September, 2023; originally announced September 2023.

    Comments: 17 pages, 6 figures, 11 tables

  47. arXiv:2308.16891  [pdf, other]

    cs.RO cs.CV cs.LG

    GNFactor: Multi-Task Real Robot Learning with Generalizable Neural Feature Fields

    Authors: Yanjie Ze, Ge Yan, Yueh-Hua Wu, Annabella Macaluso, Yuying Ge, Jianglong Ye, Nicklas Hansen, Li Erran Li, Xiaolong Wang

    Abstract: It is a long-standing problem in robotics to develop agents capable of executing diverse manipulation tasks from visual observations in unstructured real-world environments. To achieve this goal, the robot needs to have a comprehensive understanding of the 3D structure and semantics of the scene. In this work, we present $\textbf{GNFactor}$, a visual behavior cloning agent for multi-task robotic m…

    Submitted 1 September, 2023; v1 submitted 31 August, 2023; originally announced August 2023.

    Comments: CoRL 2023 Oral. Website: https://yanjieze.com/GNFactor/

  48. arXiv:2308.03656  [pdf, other]

    cs.CL

    Emotionally Numb or Empathetic? Evaluating How LLMs Feel Using EmotionBench

    Authors: Jen-tse Huang, Man Ho Lam, Eric John Li, Shujie Ren, Wenxuan Wang, Wenxiang Jiao, Zhaopeng Tu, Michael R. Lyu

    Abstract: Evaluating Large Language Models' (LLMs) anthropomorphic capabilities has become increasingly important in contemporary discourse. Utilizing the emotion appraisal theory from psychology, we propose to evaluate the empathy ability of LLMs, i.e., how their feelings change when presented with specific situations. After a careful and comprehensive survey, we collect a dataset containing over 400 situa…

    Submitted 24 April, 2024; v1 submitted 7 August, 2023; originally announced August 2023.

    Comments: 12 pages of main text; 9 pages of appendices

  49. arXiv:2308.02955  [pdf, other]

    cs.SE cs.LG

    An Empirical Study of AI-based Smart Contract Creation

    Authors: Rabimba Karanjai, Edward Li, Lei Xu, Weidong Shi

    Abstract: The introduction of large language models (LLMs) like ChatGPT and Google Palm2 for smart contract generation seems to be the first well-established instance of an AI pair programmer. LLMs have access to a large number of open-source smart contracts, enabling them to utilize more extensive code in Solidity than other code generation tools. Although the initial and informal assessments of LLMs for s…

    Submitted 19 August, 2023; v1 submitted 5 August, 2023; originally announced August 2023.

    Comments: Updated to address issues

  50. arXiv:2305.19926  [pdf, other]

    cs.CL

    Revisiting the Reliability of Psychological Scales on Large Language Models

    Authors: Jen-tse Huang, Wenxuan Wang, Man Ho Lam, Eric John Li, Wenxiang Jiao, Michael R. Lyu

    Abstract: Recent research has extended beyond assessing the performance of Large Language Models (LLMs) to examining their characteristics from a psychological standpoint, acknowledging the necessity of understanding their behavioral characteristics. The administration of personality tests to LLMs has emerged as a noteworthy area in this context. However, the suitability of employing psychological scales, i…

    Submitted 28 December, 2023; v1 submitted 31 May, 2023; originally announced May 2023.

    Comments: 10 pages. Added more comprehensive experiments and analysis