Showing 1–50 of 3,910 results for author: Chen, X

Searching in archive cs.

  1. arXiv:2408.12063  [pdf, other]

    stat.ML cs.AI cs.LG physics.ao-ph

    A Deconfounding Approach to Climate Model Bias Correction

    Authors: Wentao Gao, Jiuyong Li, Debo Cheng, Lin Liu, Jixue Liu, Thuc Duy Le, Xiaojing Du, Xiongren Chen, Yanchang Zhao, Yun Chen

    Abstract: Global Climate Models (GCMs) are crucial for predicting future climate changes by simulating the Earth systems. However, GCM outputs exhibit systematic biases due to model uncertainties, parameterization simplifications, and inadequate representation of complex climate phenomena. Traditional bias correction methods, which rely on historical observation data and statistical techniques, often neglec…

    Submitted 21 August, 2024; originally announced August 2024.

  2. arXiv:2408.11824  [pdf, other]

    cs.HC cs.AI

    AppAgent v2: Advanced Agent for Flexible Mobile Interactions

    Authors: Yanda Li, Chi Zhang, Wanqi Yang, Bin Fu, Pei Cheng, Xin Chen, Ling Chen, Yunchao Wei

    Abstract: With the advancement of Multimodal Large Language Models (MLLM), LLM-driven visual agents are increasingly impacting software interfaces, particularly those with graphical user interfaces. This work introduces a novel LLM-based multimodal agent framework for mobile devices. This framework, capable of navigating mobile devices, emulates human-like interactions. Our agent constructs a flexible actio…

    Submitted 5 August, 2024; originally announced August 2024.

  3. arXiv:2408.11611  [pdf, other]

    cs.IR cs.LG

    DTN: Deep Multiple Task-specific Feature Interactions Network for Multi-Task Recommendation

    Authors: Yaowen Bi, Yuteng Lian, Jie Cui, Jun Liu, Peijian Wang, Guanghui Li, Xuejun Chen, Jinglin Zhao, Hao Wen, Jing Zhang, Zhaoqi Zhang, Wenzhuo Song, Yang Sun, Weiwei Zhang, Mingchen Cai, Guanxing Zhang

    Abstract: Neural-based multi-task learning (MTL) has been successfully applied to many recommendation applications. However, these MTL models (e.g., MMoE, PLE) did not consider feature interaction during the optimization, which is crucial for capturing complex high-order features and has been widely used in ranking models for real-world recommender systems. Moreover, through feature importance analysis acro…

    Submitted 21 August, 2024; originally announced August 2024.

  4. arXiv:2408.11599  [pdf, other]

    cs.CL cs.AI

    Cause-Aware Empathetic Response Generation via Chain-of-Thought Fine-Tuning

    Authors: Xinhao Chen, Chong Yang, Man Lan, Li Cai, Yang Chen, Tu Hu, Xinlin Zhuang, Aimin Zhou

    Abstract: Empathetic response generation endows agents with the capability to comprehend dialogue contexts and react to expressed emotions. Previous works predominantly focus on leveraging the speaker's emotional labels, but ignore the importance of emotion cause reasoning in empathetic response generation, which hinders the model's capacity for further affective understanding and cognitive inference. In th…

    Submitted 21 August, 2024; originally announced August 2024.

  5. arXiv:2408.11540  [pdf, other]

    cs.CV

    DeRainGS: Gaussian Splatting for Enhanced Scene Reconstruction in Rainy Environments

    Authors: Shuhong Liu, Xiang Chen, Hongming Chen, Quanfeng Xu, Mingrui Li

    Abstract: Reconstruction under adverse rainy conditions poses significant challenges due to reduced visibility and the distortion of visual perception. These conditions can severely impair the quality of geometric maps, which is essential for applications ranging from autonomous planning to environmental monitoring. In response to these challenges, this study introduces the novel task of 3D Reconstruction i…

    Submitted 21 August, 2024; v1 submitted 21 August, 2024; originally announced August 2024.

  6. arXiv:2408.11492  [pdf, other]

    cs.AI

    Estimating Peer Direct and Indirect Effects in Observational Network Data

    Authors: Xiaojing Du, Jiuyong Li, Debo Cheng, Lin Liu, Wentao Gao, Xiongren Chen

    Abstract: Estimating causal effects is crucial for decision-makers in many applications, but it is particularly challenging with observational network data due to peer interactions. Many algorithms have been proposed to estimate causal effects involving network data, particularly peer effects, but they often overlook the variety of peer effects. To address this issue, we propose a general setting which cons…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: AAAI

  7. arXiv:2408.11084  [pdf, other]

    math.OC cs.LG

    Multi-level Monte-Carlo Gradient Methods for Stochastic Optimization with Biased Oracles

    Authors: Yifan Hu, Jie Wang, Xin Chen, Niao He

    Abstract: We consider stochastic optimization when one only has access to biased stochastic oracles of the objective and the gradient, and obtaining stochastic gradients with low biases comes at high costs. This setting captures various optimization paradigms, such as conditional stochastic optimization, distributionally robust optimization, shortfall risk optimization, and machine learning paradigms, such…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: A preliminary version of this manuscript has appeared in a conference proceeding. Please refer to Yifan Hu, Xin Chen, and Niao He. On the bias-variance-cost tradeoff of stochastic optimization. Advances in Neural Information Processing Systems, 2021

  8. arXiv:2408.10746  [pdf, other]

    cs.DC cs.AI cs.LG cs.NI

    Pluto and Charon: A Time and Memory Efficient Collaborative Edge AI Framework for Personal LLMs Fine-Tuning

    Authors: Bei Ouyang, Shengyuan Ye, Liekang Zeng, Tianyi Qian, Jingyi Li, Xu Chen

    Abstract: Large language models (LLMs) have unlocked a plethora of powerful applications at the network edge, such as intelligent personal assistants. Data privacy and security concerns have prompted a shift towards edge-based fine-tuning of personal LLMs, away from cloud reliance. However, this raises issues of computational intensity and resource scarcity, hindering training efficiency and feasibility. Wh…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: Accepted by The 53rd International Conference on Parallel Processing (ICPP'24)

  9. arXiv:2408.10679  [pdf, other]

    cs.CV

    DemMamba: Alignment-free Raw Video Demoireing with Frequency-assisted Spatio-Temporal Mamba

    Authors: Shuning Xu, Xina Liu, Binbin Song, Xiangyu Chen, Qiubo Chen, Jiantao Zhou

    Abstract: Moire patterns arise when two similar repetitive patterns interfere, a phenomenon frequently observed during the capture of images or videos on screens. The color, shape, and location of moire patterns may differ across video frames, posing a challenge in learning information from adjacent frames and preserving temporal consistency. Previous video demoireing methods heavily rely on well-designed a…

    Submitted 20 August, 2024; originally announced August 2024.

  10. arXiv:2408.10635  [pdf, other]

    cs.AI cs.CL

    Strategist: Learning Strategic Skills by LLMs via Bi-Level Tree Search

    Authors: Jonathan Light, Min Cai, Weiqin Chen, Guanzhi Wang, Xiusi Chen, Wei Cheng, Yisong Yue, Ziniu Hu

    Abstract: In this paper, we propose a new method Strategist that utilizes LLMs to acquire new skills for playing multi-agent games through a self-improvement process. Our method gathers quality feedback through self-play simulations with Monte Carlo tree search and LLM-based reflection, which can then be used to learn high-level strategic skills such as how to evaluate states that guide the low-level execut…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: website: https://llm-strategist.github.io

  11. arXiv:2408.10602  [pdf, other]

    cs.CV cs.AI

    MV-MOS: Multi-View Feature Fusion for 3D Moving Object Segmentation

    Authors: Jintao Cheng, Xingming Chen, Jinxin Liang, Xiaoyu Tang, Xieyuanli Chen, Dachuan Li

    Abstract: Effectively summarizing dense 3D point cloud data and extracting motion information of moving objects (moving object segmentation, MOS) is crucial to autonomous driving and robotics applications. How to effectively utilize motion and semantic features and avoid information loss during 3D-to-2D projection is still a key challenge. In this paper, we propose a novel multi-view MOS model (MV-MOS) by f…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: 7 pages, 4 figures

  12. arXiv:2408.10235  [pdf, other]

    eess.SP cs.HC cs.LG

    Multi-Source EEG Emotion Recognition via Dynamic Contrastive Domain Adaptation

    Authors: Yun Xiao, Yimeng Zhang, Xiaopeng Peng, Shuzheng Han, Xia Zheng, Dingyi Fang, Xiaojiang Chen

    Abstract: Electroencephalography (EEG) provides reliable indications of human cognition and mental states. Accurate emotion recognition from EEG remains challenging due to signal variations among individuals and across measurement sessions. To address these challenges, we introduce a multi-source dynamic contrastive domain adaptation method (MS-DCDA), which models coarse-grained inter-domain and fine-graine…

    Submitted 3 August, 2024; originally announced August 2024.

  13. arXiv:2408.10007  [pdf, other]

    cs.CV

    P3P: Pseudo-3D Pre-training for Scaling 3D Masked Autoencoders

    Authors: Xuechao Chen, Ying Chen, Jialin Li, Qiang Nie, Yong Liu, Qixing Huang, Yang Li

    Abstract: 3D pre-training is crucial to 3D perception tasks. However, limited by the difficulties in collecting clean 3D data, 3D pre-training consistently faced data scaling challenges. Inspired by semi-supervised learning leveraging limited labeled data and a large amount of unlabeled data, in this work, we propose a novel self-supervised pre-training framework utilizing the real 3D data and the pseudo-3D…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: Under review. Pre-print

  14. arXiv:2408.09786  [pdf, other]

    cs.CV

    Cross-composition Feature Disentanglement for Compositional Zero-shot Learning

    Authors: Yuxia Geng, Runkai Zhu, Jiaoyan Chen, Jintai Chen, Zhuo Chen, Xiang Chen, Can Xu, Yuxiang Wang, Xiaoliang Xu

    Abstract: Disentanglement of visual features of primitives (i.e., attributes and objects) has shown exceptional results in Compositional Zero-shot Learning (CZSL). However, due to the feature divergence of an attribute (resp. object) when combined with different objects (resp. attributes), it is challenging to learn disentangled primitive features that are general across different compositions. To this end,…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: work in progress

  15. arXiv:2408.08813  [pdf, other]

    cs.CV

    Retrieval-augmented Few-shot Medical Image Segmentation with Foundation Models

    Authors: Lin Zhao, Xiao Chen, Eric Z. Chen, Yikang Liu, Terrence Chen, Shanhui Sun

    Abstract: Medical image segmentation is crucial for clinical decision-making, but the scarcity of annotated data presents significant challenges. Few-shot segmentation (FSS) methods show promise but often require retraining on the target domain and struggle to generalize across different modalities. Similarly, adapting foundation models like the Segment Anything Model (SAM) for medical imaging has limitatio…

    Submitted 16 August, 2024; originally announced August 2024.

  16. arXiv:2408.08681  [pdf, other]

    cs.LG math.NA math.PR

    A Mean Field Ansatz for Zero-Shot Weight Transfer

    Authors: Xingyuan Chen, Wenwei Kuang, Lei Deng, Wei Han, Bo Bai, Goncalo dos Reis

    Abstract: The pre-training cost of large language models (LLMs) is prohibitive. One cutting-edge approach to reduce the cost is zero-shot weight transfer, also known as model growth for some cases, which magically transfers the weights trained in a small model to a large model. However, there are still some theoretical mysteries behind the weight transfer. In this paper, inspired by prior applications of me…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: 40 pages, 6 Figures, 1 table

  17. arXiv:2408.08601  [pdf, other]

    cs.CV

    Learning A Low-Level Vision Generalist via Visual Task Prompt

    Authors: Xiangyu Chen, Yihao Liu, Yuandong Pu, Wenlong Zhang, Jiantao Zhou, Yu Qiao, Chao Dong

    Abstract: Building a unified model for general low-level vision tasks holds significant research and practical value. Current methods encounter several critical issues. Multi-task restoration approaches can address multiple degradation-to-clean restoration tasks, while their applicability to tasks with different target domains (e.g., image stylization) is limited. Methods like PromptGIP can handle multiple…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: Accepted to ACMMM24

  18. arXiv:2408.08537  [pdf, other]

    cs.CR cs.SE

    SeeWasm: An Efficient and Fully-Functional Symbolic Execution Engine for WebAssembly Binaries

    Authors: Ningyu He, Zhehao Zhao, Hanqin Guan, Jikai Wang, Shuo Peng, Ding Li, Haoyu Wang, Xiangqun Chen, Yao Guo

    Abstract: WebAssembly (Wasm), as a compact, fast, and isolation-guaranteed binary format, can be compiled from more than 40 high-level programming languages. However, vulnerabilities in Wasm binaries could lead to sensitive data leakage and even threaten their hosting environments. To identify them, symbolic execution is widely adopted due to its soundness and the ability to automatically generate exploitat…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: Accepted by ISSTA'24 Demo Track, the tool can be accessed at https://github.com/PKU-ASAL/SeeWasm

  19. arXiv:2408.08342  [pdf, other]

    cs.GR cs.CV

    CT4D: Consistent Text-to-4D Generation with Animatable Meshes

    Authors: Ce Chen, Shaoli Huang, Xuelin Chen, Guangyi Chen, Xiaoguang Han, Kun Zhang, Mingming Gong

    Abstract: Text-to-4D generation has recently been demonstrated viable by integrating a 2D image diffusion model with a video diffusion model. However, existing models tend to produce results with inconsistent motions and geometric structures over time. To this end, we present a novel framework, coined CT4D, which directly operates on animatable meshes for generating consistent 4D content from arbitrary user…

    Submitted 15 August, 2024; originally announced August 2024.

  20. arXiv:2408.08192  [pdf, other]

    cs.LG cs.GT cs.MA math.OC

    Stochastic Semi-Gradient Descent for Learning Mean Field Games with Population-Aware Function Approximation

    Authors: Chenyu Zhang, Xu Chen, Xuan Di

    Abstract: Mean field games (MFGs) model the interactions within a large-population multi-agent system using the population distribution. Traditional learning methods for MFGs are based on fixed-point iteration (FPI), which calculates best responses and induced population distribution separately and sequentially. However, FPI-type methods suffer from inefficiency and instability, due to oscillations caused b…

    Submitted 15 August, 2024; originally announced August 2024.

  21. arXiv:2408.08015  [pdf, other]

    cs.DC cs.AI cs.CV cs.LG cs.NI

    Asteroid: Resource-Efficient Hybrid Pipeline Parallelism for Collaborative DNN Training on Heterogeneous Edge Devices

    Authors: Shengyuan Ye, Liekang Zeng, Xiaowen Chu, Guoliang Xing, Xu Chen

    Abstract: On-device Deep Neural Network (DNN) training has been recognized as crucial for privacy-preserving machine learning at the edge. However, the intensive training workload and limited onboard computing resources pose significant challenges to the availability and efficiency of model training. While existing works address these challenges through native resource management optimization, we instead le…

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: Accepted by The 30th Annual International Conference on Mobile Computing and Networking (MobiCom'24)

  22. GOReloc: Graph-based Object-Level Relocalization for Visual SLAM

    Authors: Yutong Wang, Chaoyang Jiang, Xieyuanli Chen

    Abstract: This article introduces a novel method for object-level relocalization of robotic systems. It determines the pose of a camera sensor by robustly associating the object detections in the current frame with 3D objects in a lightweight object-level map. Object graphs, considering semantic uncertainties, are constructed for both the incoming camera frame and the pre-built map. Objects are represented…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: 8 pages, accepted by IEEE RAL

    Journal ref: IEEE Robotics and Automation Letters 2024

  23. arXiv:2408.06634  [pdf, other]

    q-fin.CP cs.CL cs.LG q-fin.ST

    Harnessing Earnings Reports for Stock Predictions: A QLoRA-Enhanced LLM Approach

    Authors: Haowei Ni, Shuchen Meng, Xupeng Chen, Ziqing Zhao, Andi Chen, Panfeng Li, Shiyao Zhang, Qifu Yin, Yuanqing Wang, Yuxi Chan

    Abstract: Accurate stock market predictions following earnings reports are crucial for investors. Traditional methods, particularly classical machine learning models, struggle with these predictions because they cannot effectively process and interpret extensive textual data contained in earnings reports and often overlook nuances that influence market movements. This paper introduces an advanced approach b…

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: Accepted by 2024 6th International Conference on Data-driven Optimization of Complex Systems

  24. arXiv:2408.06567  [pdf, other]

    cs.CL cs.AI

    AquilaMoE: Efficient Training for MoE Models with Scale-Up and Scale-Out Strategies

    Authors: Bo-Wen Zhang, Liangdong Wang, Ye Yuan, Jijie Li, Shuhao Gu, Mengdi Zhao, Xinya Wu, Guang Liu, Chengwei Wu, Hanyu Zhao, Li Du, Yiming Ju, Quanyue Ma, Yulong Ao, Yingli Zhao, Songhe Zhu, Zhou Cao, Dong Liang, Yonghua Lin, Ming Zhang, Shunfei Wang, Yanxin Zhou, Min Ye, Xuekai Chen, Xinyang Yu, et al. (2 additional authors not shown)

    Abstract: In recent years, with the rapid application of large language models across various fields, the scale of these models has gradually increased, and the resources required for their pre-training have grown exponentially. Training an LLM from scratch will cost a lot of computation resources while scaling up from a smaller model is a more efficient approach and has thus attracted significant attention…

    Submitted 12 August, 2024; originally announced August 2024.

  25. arXiv:2408.06258  [pdf, other]

    cs.SE cs.LG

    Deep Learning System Boundary Testing through Latent Space Style Mixing

    Authors: Amr Abdellatif, Xingcheng Chen, Vincenzo Riccio, Andrea Stocco

    Abstract: Evaluating the behavioral frontier of deep learning (DL) systems is crucial for understanding their generalizability and robustness. However, boundary testing is challenging due to their high-dimensional input space. Generative artificial intelligence offers a promising solution by modeling data distribution within compact latent space representations, thereby facilitating finer-grained exploratio…

    Submitted 12 August, 2024; originally announced August 2024.

  26. arXiv:2408.06152  [pdf, other]

    cs.MM cs.AI cs.CV cs.NI

    Palantir: Towards Efficient Super Resolution for Ultra-high-definition Live Streaming

    Authors: Xinqi Jin, Zhui Zhu, Xikai Sun, Fan Dang, Jiangchuan Liu, Jingao Xu, Kebin Liu, Xinlei Chen, Yunhao Liu

    Abstract: Neural enhancement through super-resolution deep neural networks opens up new possibilities for ultra-high-definition live streaming over existing encoding and networking infrastructure. Yet, the heavy SR DNN inference overhead leads to severe deployment challenges. To reduce the overhead, existing systems propose to apply DNN-based SR only on selected anchor frames while upscaling non-anchor fram…

    Submitted 12 August, 2024; originally announced August 2024.

  27. arXiv:2408.05775  [pdf, other]

    cs.CV

    Efficient Test-Time Prompt Tuning for Vision-Language Models

    Authors: Yuhan Zhu, Guozhen Zhang, Chen Xu, Haocheng Shen, Xiaoxin Chen, Gangshan Wu, Limin Wang

    Abstract: Vision-language models have showcased impressive zero-shot classification capabilities when equipped with suitable text prompts. Previous studies have shown the effectiveness of test-time prompt tuning; however, these methods typically require per-image prompt adaptation during inference, which incurs high computational budgets and limits scalability and practical deployment. To overcome this issu…

    Submitted 11 August, 2024; originally announced August 2024.

  28. arXiv:2408.05477  [pdf, other]

    cs.CV

    Scene123: One Prompt to 3D Scene Generation via Video-Assisted and Consistency-Enhanced MAE

    Authors: Yiying Yang, Fukun Yin, Jiayuan Fan, Xin Chen, Wanzhang Li, Gang Yu

    Abstract: As Artificial Intelligence Generated Content (AIGC) advances, a variety of methods have been developed to generate text, images, videos, and 3D objects from single or multimodal inputs, contributing efforts to emulate human-like cognitive content creation. However, generating realistic large-scale scenes from a single input presents a challenge due to the complexities involved in ensuring consiste…

    Submitted 20 August, 2024; v1 submitted 10 August, 2024; originally announced August 2024.

    Comments: arXiv admin note: text overlap with arXiv:2305.11588 by other authors

  29. arXiv:2408.05428  [pdf, other]

    cs.LG stat.ME stat.ML

    Generalized Encouragement-Based Instrumental Variables for Counterfactual Regression

    Authors: Anpeng Wu, Kun Kuang, Ruoxuan Xiong, Xiangwei Chen, Zexu Sun, Fei Wu, Kun Zhang

    Abstract: In causal inference, encouragement designs (EDs) are widely used to analyze causal effects, when randomized controlled trials (RCTs) are impractical or compliance to treatment cannot be perfectly enforced. Unlike RCTs, which directly allocate treatments, EDs randomly assign encouragement policies that positively motivate individuals to engage in a specific treatment. These random encouragements ac…

    Submitted 10 August, 2024; originally announced August 2024.

  30. arXiv:2408.05002  [pdf, other]

    cs.SE

    An Empirical Study on Challenges for LLM Developers

    Authors: Xiang Chen, Chaoyang Gao, Chunyang Chen, Guangbei Zhang, Yong Liu

    Abstract: In recent years, large language models (LLMs) have seen rapid advancements, significantly impacting various fields such as natural language processing, and software engineering. These LLMs, exemplified by OpenAI's ChatGPT, have revolutionized the way we approach language understanding and generation tasks. However, in contrast to traditional software development practices, LLM development introduc…

    Submitted 11 August, 2024; v1 submitted 6 August, 2024; originally announced August 2024.

    Comments: 29 pages, 15 figures

  31. arXiv:2408.04813  [pdf, other]

    cs.CV

    Rethinking Multiple Instance Learning: Developing an Instance-Level Classifier via Weakly-Supervised Self-Training

    Authors: Yingfan Ma, Xiaoyuan Luo, Mingzhi Yuan, Xinrong Chen, Manning Wang

    Abstract: Multiple instance learning (MIL) problem is currently solved from either bag-classification or instance-classification perspective, both of which ignore important information contained in some instances and result in limited performance. For example, existing methods often face difficulty in learning hard positive instances. In this paper, we formulate MIL as a semi-supervised instance classificat…

    Submitted 8 August, 2024; originally announced August 2024.

  32. arXiv:2408.04812  [pdf, other]

    cs.ET cs.AI

    A Collaborative PIM Computing Optimization Framework for Multi-Tenant DNN

    Authors: Bojing Li, Duo Zhong, Xiang Chen, Chenchen Liu

    Abstract: Modern Artificial Intelligence (AI) applications are increasingly utilizing multi-tenant deep neural networks (DNNs), which lead to a significant rise in computing complexity and the need for computing parallelism. ReRAM-based processing-in-memory (PIM) computing, with its high density and low power consumption characteristics, holds promising potential for supporting the deployment of multi-tenan…

    Submitted 8 August, 2024; originally announced August 2024.

  33. arXiv:2408.04532  [pdf, other]

    cs.LG

    How Transformers Utilize Multi-Head Attention in In-Context Learning? A Case Study on Sparse Linear Regression

    Authors: Xingwu Chen, Lei Zhao, Difan Zou

    Abstract: Despite the remarkable success of transformer-based models in various real-world tasks, their underlying mechanisms remain poorly understood. Recent studies have suggested that transformers can implement gradient descent as an in-context learner for linear regression problems and have developed various theoretical analyses accordingly. However, these works mostly focus on the expressive power of t…

    Submitted 8 August, 2024; originally announced August 2024.

  34. arXiv:2408.04344  [pdf, other]

    cs.SE

    Semantic-Enhanced Indirect Call Analysis with Large Language Models

    Authors: Baijun Cheng, Cen Zhang, Kailong Wang, Ling Shi, Yang Liu, Haoyu Wang, Yao Guo, Xiangqun Chen

    Abstract: In contemporary software development, the widespread use of indirect calls to achieve dynamic features poses challenges in constructing precise control flow graphs (CFGs), which further impacts the performance of downstream static analysis tasks. To tackle this issue, various types of indirect call analyzers have been proposed. However, they do not fully leverage the semantic information of the pr…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: Accepted by ASE'24

  35. arXiv:2408.04268  [pdf, other]

    cs.CV

    Evaluating Modern Approaches in 3D Scene Reconstruction: NeRF vs Gaussian-Based Methods

    Authors: Yiming Zhou, Zixuan Zeng, Andi Chen, Xiaofan Zhou, Haowei Ni, Shiyao Zhang, Panfeng Li, Liangxi Liu, Mengyao Zheng, Xupeng Chen

    Abstract: Exploring the capabilities of Neural Radiance Fields (NeRF) and Gaussian-based methods in the context of 3D scene reconstruction, this study contrasts these modern approaches with traditional Simultaneous Localization and Mapping (SLAM) systems. Utilizing datasets such as Replica and ScanNet, we assess performance based on tracking accuracy, mapping fidelity, and view synthesis. Findings reveal th…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: Accepted by 2024 6th International Conference on Data-driven Optimization of Complex Systems

  36. arXiv:2408.04205  [pdf, other]

    cs.IT

    High-Efficiency Urban 3D Radio Map Estimation Based on Sparse Measurements

    Authors: Xinwei Chen, Xiaofeng Zhong, Zijian Zhang, Linglong Dai, Shidong Zhou

    Abstract: Recent widespread applications for unmanned aerial vehicles (UAVs) -- from infrastructure inspection to urban logistics -- have prompted an urgent need for high-accuracy three-dimensional (3D) radio maps. However, existing methods designed for two-dimensional radio maps face challenges of high measurement costs and limited data availability when extended to 3D scenarios. To tackle these challenges…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: 5 pages, 7 figures

  37. arXiv:2408.04203  [pdf, other]

    cs.AI

    MMRole: A Comprehensive Framework for Developing and Evaluating Multimodal Role-Playing Agents

    Authors: Yanqi Dai, Huanran Hu, Lei Wang, Shengjie Jin, Xu Chen, Zhiwu Lu

    Abstract: Recently, Role-Playing Agents (RPAs) have garnered increasing attention for their potential to deliver emotional value and facilitate sociological research. However, existing studies are primarily confined to the textual modality, unable to simulate humans' multimodal perceptual capabilities. To bridge this gap, we introduce the concept of Multimodal Role-Playing Agents (MRPAs), and propose a comp…

    Submitted 7 August, 2024; originally announced August 2024.

  38. arXiv:2408.04181  [pdf, other]

    cs.CR cs.AI

    EdgeShield: A Universal and Efficient Edge Computing Framework for Robust AI

    Authors: Duo Zhong, Bojing Li, Xiang Chen, Chenchen Liu

    Abstract: The increasing prevalence of adversarial attacks on Artificial Intelligence (AI) systems has created a need for innovative security measures. However, the current methods of defending against these attacks often come with a high computing cost and require back-end processing, making real-time defense challenging. Fortunately, there have been remarkable advancements in edge-computing, which make it…

    Submitted 7 August, 2024; originally announced August 2024.

  39. arXiv:2408.03950  [pdf, other]

    cs.RO cs.AI

    EcoFollower: An Environment-Friendly Car Following Model Considering Fuel Consumption

    Authors: Hui Zhong, Xianda Chen, PakHin Tiu, Hongliang Lu, Meixin Zhu

    Abstract: To alleviate energy shortages and environmental impacts caused by transportation, this study introduces EcoFollower, a novel eco-car-following model developed using reinforcement learning (RL) to optimize fuel consumption in car-following scenarios. Employing the NGSIM datasets, the performance of EcoFollower was assessed in comparison with the well-established Intelligent Driver Model (IDM). The…

    Submitted 22 July, 2024; originally announced August 2024.

  40. arXiv:2408.03806  [pdf, other]

    cs.IT cs.LG cs.NI

    Trustworthy Image Semantic Communication with GenAI: Explainablity, Controllability, and Efficiency

    Authors: Xijun Wang, Dongshan Ye, Chenyuan Feng, Howard H. Yang, Xiang Chen, Tony Q. S. Quek

    Abstract: Image semantic communication (ISC) has garnered significant attention for its potential to achieve high efficiency in visual content transmission. However, existing ISC systems based on joint source-channel coding face challenges in interpretability, operability, and compatibility. To address these limitations, we propose a novel trustworthy ISC framework. This approach leverages text extraction a…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: 8 pages, 4 figures, 2 tables

  41. arXiv:2408.03624  [pdf, other]

    cs.CV

    AgentsCoMerge: Large Language Model Empowered Collaborative Decision Making for Ramp Merging

    Authors: Senkang Hu, Zhengru Fang, Zihan Fang, Yiqin Deng, Xianhao Chen, Yuguang Fang, Sam Kwong

    Abstract: Ramp merging is one of the bottlenecks in traffic systems, which commonly cause traffic congestion, accidents, and severe carbon emissions. In order to address this essential issue and enhance the safety and efficiency of connected and autonomous vehicles (CAVs) at multi-lane merging zones, we propose a novel collaborative decision-making framework, named AgentsCoMerge, to leverage large language…

    Submitted 7 August, 2024; originally announced August 2024.

  42. arXiv:2408.03337  [pdf, other]

    cs.HC cs.AI cs.CY cs.LG

    PsyDI: Towards a Personalized and Progressively In-depth Chatbot for Psychological Measurements

    Authors: Xueyan Li, Xinyan Chen, Yazhe Niu, Shuai Hu, Yu Liu

    Abstract: In the field of psychology, traditional assessment methods, such as standardized scales, are frequently critiqued for their static nature, lack of personalization, and reduced participant engagement, while comprehensive counseling evaluations are often inaccessible. The complexity of quantifying psychological traits further limits these methods. Despite advances with large language models (LLMs),…

    Submitted 15 August, 2024; v1 submitted 22 July, 2024; originally announced August 2024.

    Comments: 29 pages, 15 figures

  43. arXiv:2408.02861  [pdf, other]

    cs.CL cs.LG

    A Framework for Fine-Tuning LLMs using Heterogeneous Feedback

    Authors: Ryan Aponte, Ryan A. Rossi, Shunan Guo, Franck Dernoncourt, Tong Yu, Xiang Chen, Subrata Mitra, Nedim Lipka

    Abstract: Large language models (LLMs) have been applied to a wide range of tasks, including text summarization, web navigation, and chatbots. They have benefitted from supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) following an unsupervised pretraining. These datasets can be difficult to collect, limited in scope, and vary in sample quality. Additionally, datasets can va…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: 7 pages, 1 figure

    ACM Class: I.2.7

  44. arXiv:2408.02803  [pdf, other]

    cs.HC cs.CV

    SiCo: A Size-Controllable Virtual Try-On Approach for Informed Decision-Making

    Authors: Sherry X. Chen, Alex Christopher Lim, Yimeng Liu, Pradeep Sen, Misha Sra

    Abstract: Virtual try-on (VTO) applications aim to improve the online shopping experience by allowing users to preview garments, before making purchase decisions. However, many VTO tools fail to consider the crucial relationship between a garment's size and the user's body size, often employing a one-size-fits-all approach when visualizing a clothing item. This results in poor size recommendations and purch…

    Submitted 5 August, 2024; originally announced August 2024.

  45. arXiv:2408.02788  [pdf, other]

    cs.CV

    GazeXplain: Learning to Predict Natural Language Explanations of Visual Scanpaths

    Authors: Xianyu Chen, Ming Jiang, Qi Zhao

    Abstract: While exploring visual scenes, humans' scanpaths are driven by their underlying attention processes. Understanding visual scanpaths is essential for various applications. Traditional scanpath models predict the where and when of gaze shifts without providing explanations, creating a gap in understanding the rationale behind fixations. To bridge this gap, we introduce GazeXplain, a novel study of v…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: To appear in ECCV2024

  46. arXiv:2408.02622  [pdf, other]

    cs.CL cs.AI cs.HC cs.SD eess.AS

    Language Model Can Listen While Speaking

    Authors: Ziyang Ma, Yakun Song, Chenpeng Du, Jian Cong, Zhuo Chen, Yuping Wang, Yuxuan Wang, Xie Chen

    Abstract: Dialogue serves as the most natural manner of human-computer interaction (HCI). Recent advancements in speech language models (SLM) have significantly enhanced speech-based conversational AI. However, these models are limited to turn-based conversation, lacking the ability to interact with humans in real-time spoken scenarios, for example, being interrupted when the generated content is not satisf…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: Demo can be found at https://ddlbojack.github.io/LSLM

  47. arXiv:2408.02450  [pdf, other]

    cs.SE

    An Evaluation of Requirements Modeling for Cyber-Physical Systems via LLMs

    Authors: Dongming Jin, Shengxin Zhao, Zhi Jin, Xiaohong Chen, Chunhui Wang, Zheng Fang, Hongbin Xiao

    Abstract: Cyber-physical systems (CPSs) integrate cyber and physical components and enable them to interact with each other to meet user needs. The needs for CPSs span rich application domains such as healthcare and medicine, smart home, smart building, etc. This indicates that CPSs are all about solving real-world problems. With the increasing abundance of sensing devices and effectors, the problems wanted…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: 12 pages, 8 figures

  48. arXiv:2408.01970  [pdf, other]

    cs.AI cs.CV

    SR-CIS: Self-Reflective Incremental System with Decoupled Memory and Reasoning

    Authors: Biqing Qi, Junqi Gao, Xinquan Chen, Dong Li, Weinan Zhang, Bowen Zhou

    Abstract: The ability of humans to rapidly learn new knowledge while retaining old memories poses a significant challenge for current deep learning models. To handle this challenge, we draw inspiration from human memory and learning mechanisms and propose the Self-Reflective Complementary Incremental System (SR-CIS). Comprising the deconstructed Complementary Inference Module (CIM) and Complementary Memory…

    Submitted 4 August, 2024; originally announced August 2024.

  49. arXiv:2408.01841  [pdf, other]

    cs.RO

    BEVPlace++: Fast, Robust, and Lightweight LiDAR Global Localization for Unmanned Ground Vehicles

    Authors: Lun Luo, Si-Yuan Cao, Xiaorui Li, Jintao Xu, Rui Ai, Zhu Yu, Xieyuanli Chen

    Abstract: This article introduces BEVPlace++, a novel, fast, and robust LiDAR global localization method for unmanned ground vehicles. It uses lightweight convolutional neural networks (CNNs) on Bird's Eye View (BEV) image-like representations of LiDAR data to achieve accurate global localization through place recognition followed by 3-DoF pose estimation. Our detailed analyses reveal an interesting fact th…

    Submitted 9 August, 2024; v1 submitted 3 August, 2024; originally announced August 2024.

    Comments: Under review

  50. arXiv:2408.01334  [pdf, other]

    cs.RO cs.AI cs.CV cs.HC

    A Backbone for Long-Horizon Robot Task Understanding

    Authors: Xiaoshuai Chen, Wei Chen, Dongmyoung Lee, Yukun Ge, Nicolas Rojas, Petar Kormushev

    Abstract: End-to-end robot learning, particularly for long-horizon tasks, often results in unpredictable outcomes and poor generalization. To address these challenges, we propose a novel Therblig-based Backbone Framework (TBBF) to enhance robot task understanding and transferability. This framework uses therbligs (basic action elements) as the backbone to decompose high-level robot tasks into elemental robo…

    Submitted 7 August, 2024; v1 submitted 2 August, 2024; originally announced August 2024.

    Comments: 8 pages, 8 figures. This work is intended to be submitted to IEEE Robotics and Automation Letters (RA-L) for possible publication