Showing 1–50 of 1,300 results for author: Xue, W

Searching in archive cs.
  1. arXiv:2407.19765  [pdf, other]

    cs.AI

    Map2Traj: Street Map Piloted Zero-shot Trajectory Generation with Diffusion Model

    Authors: Zhenyu Tao, Wei Xu, Xiaohu You

    Abstract: User mobility modeling serves a crucial role in analysis and optimization of contemporary wireless networks. Typical stochastic mobility models, e.g., random waypoint model and Gauss Markov model, can hardly capture the distribution characteristics of users within real-world areas. State-of-the-art trace-based mobility models and existing learning-based trajectory generation methods, however, are… ▽ More

    Submitted 29 July, 2024; originally announced July 2024.

  2. arXiv:2407.19672  [pdf, other]

    cs.CL

    SeaLLMs 3: Open Foundation and Chat Multilingual Large Language Models for Southeast Asian Languages

    Authors: Wenxuan Zhang, Hou Pong Chan, Yiran Zhao, Mahani Aljunied, Jianyu Wang, Chaoqun Liu, Yue Deng, Zhiqiang Hu, Weiwen Xu, Yew Ken Chia, Xin Li, Lidong Bing

    Abstract: Large Language Models (LLMs) have shown remarkable abilities across various tasks, yet their development has predominantly centered on high-resource languages like English and Chinese, leaving low-resource languages underserved. To address this disparity, we present SeaLLMs 3, the latest iteration of the SeaLLMs model family, tailored for Southeast Asian languages. This region, characterized by it… ▽ More

    Submitted 28 July, 2024; originally announced July 2024.

  3. arXiv:2407.19514  [pdf, other]

    cs.CV cs.MM

    Detached and Interactive Multimodal Learning

    Authors: Yunfeng Fan, Wenchao Xu, Haozhao Wang, Junhong Liu, Song Guo

    Abstract: Recently, Multimodal Learning (MML) has gained significant interest as it compensates for single-modality limitations through comprehensive complementary information within multimodal data. However, traditional MML methods generally use the joint learning framework with a uniform learning objective that can lead to the modality competition issue, where feedback predominantly comes from certain mod… ▽ More

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: Accepted by ACM MM 24

  4. arXiv:2407.19394  [pdf, other]

    cs.CV

    Depth-Wise Convolutions in Vision Transformers for Efficient Training on Small Datasets

    Authors: Tianxiao Zhang, Wenju Xu, Bo Luo, Guanghui Wang

    Abstract: The Vision Transformer (ViT) leverages the Transformer's encoder to capture global information by dividing images into patches and achieves superior performance across various computer vision tasks. However, the self-attention mechanism of ViT captures the global context from the outset, overlooking the inherent relationships between neighboring pixels in images or videos. Transformers mainly focu… ▽ More

    Submitted 28 July, 2024; originally announced July 2024.

  5. arXiv:2407.17183  [pdf]

    cs.RO

    Robust Point Cloud Registration in Robotic Inspection with Locally Consistent Gaussian Mixture Model

    Authors: Lingjie Su, Wei Xu, Wenlong Li

    Abstract: In robotic inspection of aviation parts, achieving accurate pairwise point cloud registration between scanned and model data is essential. However, noise and outliers generated in robotic scanned data can compromise registration accuracy. To mitigate this challenge, this article proposes a probability-based registration method utilizing Gaussian Mixture Model (GMM) with local consistency constrain… ▽ More

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: 12 pages, 14 figures

  6. arXiv:2407.16637  [pdf, other]

    cs.CL cs.AI cs.LG

    Course-Correction: Safety Alignment Using Synthetic Preferences

    Authors: Rongwu Xu, Yishuo Cai, Zhenhong Zhou, Renjie Gu, Haiqin Weng, Yan Liu, Tianwei Zhang, Wei Xu, Han Qiu

    Abstract: The risk of harmful content generated by large language models (LLMs) has become a critical concern. This paper presents a systematic study on assessing and improving LLMs' capability to perform the task of course-correction, i.e., the model can steer away from generating harmful content autonomously. To start with, we introduce the C$^2$-Eval benchmark for quantitative assessment an… ▽ More

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: Dataset and script will be available at https://github.com/pillowsofwind/Course-Correction

  7. arXiv:2407.16327  [pdf, other]

    cs.CR cs.CV

    Understanding Impacts of Electromagnetic Signal Injection Attacks on Object Detection

    Authors: Youqian Zhang, Chunxi Yang, Eugene Y. Fu, Qinhong Jiang, Chen Yan, Sze-Yiu Chau, Grace Ngai, Hong-Va Leong, Xiapu Luo, Wenyuan Xu

    Abstract: Object detection can localize and identify objects in images, and it is extensively employed in critical multimedia applications such as security surveillance and autonomous driving. Despite the success of existing object detection models, they are often evaluated in ideal scenarios where captured images guarantee the accurate and complete representation of the detecting scenes. However, images ca… ▽ More

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: 2024 IEEE International Conference on Multimedia and Expo (ICME), July 15 - July 19, 2024, Niagara Falls, Ontario, Canada

  8. arXiv:2407.16197  [pdf, other]

    cs.CV cs.RO

    LiCROcc: Teach Radar for Accurate Semantic Occupancy Prediction using LiDAR and Camera

    Authors: Yukai Ma, Jianbiao Mei, Xuemeng Yang, Licheng Wen, Weihua Xu, Jiangning Zhang, Botian Shi, Yong Liu, Xingxing Zuo

    Abstract: Semantic Scene Completion (SSC) is pivotal in autonomous driving perception, frequently confronted with the complexities of weather and illumination changes. The long-term strategy involves fusing multi-modal information to bolster the system's robustness. Radar, increasingly utilized for 3D target detection, is gradually replacing LiDAR in autonomous driving applications, offering a robust sensin… ▽ More

    Submitted 23 July, 2024; originally announced July 2024.

  9. arXiv:2407.15366  [pdf, other]

    cs.CL cs.AI cs.CY

    Walking in Others' Shoes: How Perspective-Taking Guides Large Language Models in Reducing Toxicity and Bias

    Authors: Rongwu Xu, Zi'an Zhou, Tianwei Zhang, Zehan Qi, Su Yao, Ke Xu, Wei Xu, Han Qiu

    Abstract: The common toxicity and societal bias in contents generated by large language models (LLMs) necessitate strategies to reduce harm. Present solutions often demand white-box access to the model or substantial training, which is impractical for cutting-edge commercial LLMs. Moreover, prevailing prompting methods depend on external tool feedback and fail to simultaneously lessen toxicity and bias. Mot… ▽ More

    Submitted 22 July, 2024; originally announced July 2024.

  10. arXiv:2407.15343  [pdf, other]

    cs.CL

    Improving Minimum Bayes Risk Decoding with Multi-Prompt

    Authors: David Heineman, Yao Dou, Wei Xu

    Abstract: While instruction fine-tuned LLMs are effective text generators, sensitivity to prompt construction makes performance unstable and sub-optimal in practice. Relying on a single "best" prompt cannot capture all differing approaches to a generation problem. Using this observation, we propose multi-prompt decoding, where many candidate generations are decoded from a prompt bank at inference-time. To e… ▽ More

    Submitted 21 July, 2024; originally announced July 2024.

  11. arXiv:2407.15045  [pdf, other]

    eess.SY cs.LG

    Efficient Sampling for Data-Driven Frequency Stability Constraint via Forward-Mode Automatic Differentiation

    Authors: Wangkun Xu, Qian Chen, Pudong Ge, Zhongda Chu, Fei Teng

    Abstract: Encoding frequency stability constraints in the operation problem is challenging due to its complex dynamics. Recently, data-driven approaches have been proposed to learn the stability criteria offline with the trained model embedded as a constraint of online optimization. However, random sampling of stationary operation points is less efficient in generating balanced stable and unstable samples.… ▽ More

    Submitted 20 July, 2024; originally announced July 2024.

  12. arXiv:2407.14562  [pdf, other]

    cs.AI cs.CL

    Thought-Like-Pro: Enhancing Reasoning of Large Language Models through Self-Driven Prolog-based Chain-of-Thought

    Authors: Xiaoyu Tan, Yongxin Deng, Xihe Qiu, Weidi Xu, Chao Qu, Wei Chu, Yinghui Xu, Yuan Qi

    Abstract: Large language models (LLMs) have shown exceptional performance as general-purpose assistants, excelling across a variety of reasoning tasks. This achievement represents a significant step toward achieving artificial general intelligence (AGI). Despite these advancements, the effectiveness of LLMs often hinges on the specific prompting strategies employed, and there remains a lack of a robust fram… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

    ACM Class: I.2.7

  13. arXiv:2407.12471  [pdf, other]

    cs.CY cs.CL

    Characterization of Political Polarized Users Attacked by Language Toxicity on Twitter

    Authors: Wentao Xu

    Abstract: Understanding the dynamics of language toxicity on social media is important for us to investigate the propagation of misinformation and the development of echo chambers for political scenarios such as U.S. presidential elections. Recent research has used large-scale data to investigate the dynamics across social media platforms. However, research on the toxicity dynamics is not enough. This study… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: This work has been accepted by 2024 Conference on Computer Supported Cooperative Work and Social Computing (CSCW2024). Association for Computing Machinery (ACM), New York, NY, USA

    MSC Class: 91; 94

    ACM Class: J.4

  14. arXiv:2407.12178  [pdf, other]

    cs.LG cs.AI stat.ML

    Exploration Unbound

    Authors: Dilip Arumugam, Wanqiao Xu, Benjamin Van Roy

    Abstract: A sequential decision-making agent balances between exploring to gain new knowledge about an environment and exploiting current knowledge to maximize immediate reward. For environments studied in the traditional literature, optimal decisions gravitate over time toward exploitation as the agent accumulates sufficient knowledge and the benefits of further exploration vanish. What if, however, the en… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted to the Finding the Frame Workshop at RLC 2024

  15. arXiv:2407.11385  [pdf, other]

    cs.RO cs.GR

    Grasping Diverse Objects with Simulated Humanoids

    Authors: Zhengyi Luo, Jinkun Cao, Sammy Christen, Alexander Winkler, Kris Kitani, Weipeng Xu

    Abstract: We present a method for controlling a simulated humanoid to grasp an object and move it to follow an object trajectory. Due to the challenges in controlling a humanoid with dexterous hands, prior methods often use a disembodied hand and only consider vertical lifts or short trajectories. This limited scope hampers their applicability for object manipulation required for animation and simulation. T… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Project page: https://www.zhengyiluo.com/Omnigrasp/

  16. arXiv:2407.11046  [pdf, other]

    cs.LG cs.AI cs.CL

    A Survey on LoRA of Large Language Models

    Authors: Yuren Mao, Yuhang Ge, Yijiang Fan, Wenyi Xu, Yu Mi, Zhonghao Hu, Yunjun Gao

    Abstract: Low-Rank Adaptation (LoRA), which updates the dense neural network layers with pluggable low-rank matrices, is one of the best-performing parameter-efficient fine-tuning paradigms. Furthermore, it has significant advantages in cross-task generalization and privacy preservation. Hence, LoRA has gained much attention recently, and the volume of related literature is growing exponentially. It is… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

  17. arXiv:2407.10680  [pdf, ps, other]

    cs.SI cs.NI

    Friedkin-Johnsen Model for Opinion Dynamics on Signed Graphs

    Authors: Xiaotian Zhou, Haoxin Sun, Wanyue Xu, Wei Li, Zhongzhi Zhang

    Abstract: A signed graph offers richer information than an unsigned graph, since it describes both collaborative and competitive relationships in social networks. In this paper, we study opinion dynamics on a signed graph, based on the Friedkin-Johnsen model. We first interpret the equilibrium opinion in terms of a defined random walk on an augmented signed graph, by representing the equilibrium opinion of… ▽ More

    Submitted 17 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

  18. arXiv:2407.09552  [pdf]

    cs.CV cs.GR

    Optimized 3D Point Labeling with Leaders Using the Beams Displacement Method

    Authors: Zhiwei Wei, Nai Yang, Wenjia Xu, Su Ding

    Abstract: In three-dimensional geographical scenes, adding labels with leader lines to point features can significantly improve their visibility. Leadered labels have a large degree of freedom in position configuration, but existing methods are mostly based on limited position candidate models, which not only fail to effectively utilize the map space but also make it difficult to consider the relative rela… ▽ More

    Submitted 27 June, 2024; originally announced July 2024.

    Comments: 12 pages, in Chinese language, 10 figures, an accepted version of ChinaVis2024

  19. arXiv:2407.08558  [pdf, other]

    cs.AI

    ST-Mamba: Spatial-Temporal Mamba for Traffic Flow Estimation Recovery using Limited Data

    Authors: Doncheng Yuan, Jianzhe Xue, Jinshan Su, Wenchao Xu, Haibo Zhou

    Abstract: Traffic flow estimation (TFE) is crucial for urban intelligent traffic systems. While traditional on-road detectors are hindered by limited coverage and high costs, cloud computing and data mining of vehicular network data, such as driving speeds and GPS coordinates, present a promising and cost-effective alternative. Furthermore, minimizing data collection can significantly reduce overhead. Howev… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Accepted by 2024 IEEE/CIC International Conference on Communications in China (ICCC)

  20. arXiv:2407.08047   

    cs.LG cs.AI

    Spatial-Temporal Attention Model for Traffic State Estimation with Sparse Internet of Vehicles

    Authors: Jianzhe Xue, Dongcheng Yuan, Yu Sun, Tianqi Zhang, Wenchao Xu, Haibo Zhou, Xuemin Shen

    Abstract: The growing number of connected vehicles offers an opportunity to leverage internet of vehicles (IoV) data for traffic state estimation (TSE) which plays a crucial role in intelligent transportation systems (ITS). By utilizing only a portion of IoV data instead of the entire dataset, the significant overheads associated with collecting and processing large amounts of data can be avoided. In this p… ▽ More

    Submitted 14 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

    Comments: need further improvement

  21. arXiv:2407.07700  [pdf, other]

    stat.ML cs.LG

    Split Conformal Prediction under Data Contamination

    Authors: Jase Clarkson, Wenkai Xu, Mihai Cucuringu, Gesine Reinert

    Abstract: Conformal prediction is a non-parametric technique for constructing prediction intervals or sets from arbitrary predictive models under the assumption that the data is exchangeable. It is popular as it comes with theoretical guarantees on the marginal coverage of the prediction sets and the split conformal prediction variant has a very low computational cost compared to model training. We study th… ▽ More

    Submitted 16 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

  22. arXiv:2407.06985  [pdf, other]

    cs.AI

    PEER: Expertizing Domain-Specific Tasks with a Multi-Agent Framework and Tuning Methods

    Authors: Yiying Wang, Xiaojing Li, Binzhu Wang, Yueyang Zhou, Han Ji, Hong Chen, Jinshi Zhang, Fei Yu, Zewei Zhao, Song Jin, Renji Gong, Wanqing Xu

    Abstract: In domain-specific applications, GPT-4, augmented with precise prompts or Retrieval-Augmented Generation (RAG), shows notable potential but faces the critical tri-lemma of performance, cost, and data privacy. High performance requires sophisticated processing techniques, yet managing multiple agents within a complex workflow often proves costly and challenging. To address this, we introduce the PE… ▽ More

    Submitted 9 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

  23. arXiv:2407.06611  [pdf, other]

    cs.CV cs.AI

    CEIA: CLIP-Based Event-Image Alignment for Open-World Event-Based Understanding

    Authors: Wenhao Xu, Wenming Weng, Yueyi Zhang, Zhiwei Xiong

    Abstract: We present CEIA, an effective framework for open-world event-based understanding. Currently training a large event-text model still poses a huge challenge due to the shortage of paired event-text data. In response to this challenge, CEIA learns to align event and image data as an alternative instead of directly aligning event and text data. Specifically, we leverage the rich event-image datasets t… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

  24. arXiv:2407.06128  [pdf]

    cs.CV

    Towards SAR Automatic Target Recognition MultiCategory SAR Image Classification Based on Light Weight Vision Transformer

    Authors: Guibin Zhao, Pengfei Li, Zhibo Zhang, Fusen Guo, Xueting Huang, Wei Xu, Jinyin Wang, Jianlong Chen

    Abstract: Synthetic Aperture Radar has been extensively used in numerous fields and can gather a wealth of information about the area of interest. This large scene data intensive technology puts a high value on automatic target recognition which can free the utilizers and boost the efficiency. Recent advances in artificial intelligence have made it possible to create a deep learning based SAR ATR that can a… ▽ More

    Submitted 9 July, 2024; v1 submitted 18 May, 2024; originally announced July 2024.

  25. arXiv:2407.05597  [pdf, other]

    cs.CV cs.GR

    GeoNLF: Geometry guided Pose-Free Neural LiDAR Fields

    Authors: Weiyi Xue, Zehan Zheng, Fan Lu, Haiyun Wei, Guang Chen, Changjun Jiang

    Abstract: Although recent efforts have extended Neural Radiance Fields (NeRF) into LiDAR point cloud synthesis, the majority of existing works exhibit a strong dependence on precomputed poses. However, point cloud registration methods struggle to achieve precise global pose estimation, whereas previous pose-free NeRFs overlook geometric consistency in global reconstruction. In light of this, we explore the… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

  26. arXiv:2407.05233  [pdf, other]

    cs.CL cs.AI

    Advancing Prompt Recovery in NLP: A Deep Dive into the Integration of Gemma-2b-it and Phi2 Models

    Authors: Jianlong Chen, Wei Xu, Zhicheng Ding, Jinxin Xu, Hao Yan, Xinyu Zhang

    Abstract: Prompt recovery, a crucial task in natural language processing, entails the reconstruction of prompts or instructions that language models use to convert input text into a specific output. Although pivotal, the design and effectiveness of prompts represent a challenging and relatively untapped field within NLP research. This paper delves into an exhaustive investigation of prompt recovery methodol… ▽ More

    Submitted 6 July, 2024; originally announced July 2024.

  27. arXiv:2407.05005  [pdf, other]

    cs.LG cs.DC

    Personalized Federated Domain-Incremental Learning based on Adaptive Knowledge Matching

    Authors: Yichen Li, Wenchao Xu, Haozhao Wang, Ruixuan Li, Yining Qi, Jingcai Guo

    Abstract: This paper focuses on Federated Domain-Incremental Learning (FDIL) where each client continues to learn incremental tasks where their domain shifts from each other. We propose a novel adaptive knowledge matching-based personalized FDIL approach (pFedDIL) which allows each client to alternatively utilize appropriate incremental task learning strategy on the correlation with the knowledge from previ… ▽ More

    Submitted 18 July, 2024; v1 submitted 6 July, 2024; originally announced July 2024.

  28. arXiv:2407.04952  [pdf, other]

    cs.CL cs.CV

    Granular Privacy Control for Geolocation with Vision Language Models

    Authors: Ethan Mendes, Yang Chen, James Hays, Sauvik Das, Wei Xu, Alan Ritter

    Abstract: Vision Language Models (VLMs) are rapidly advancing in their capability to answer information-seeking questions. As these models are widely deployed in consumer applications, they could lead to new privacy risks due to emergent abilities to identify people in photos, geolocate images, etc. As we demonstrate, somewhat surprisingly, current open-source and proprietary VLMs are very capable image geo… ▽ More

    Submitted 6 July, 2024; originally announced July 2024.

  29. arXiv:2407.04859  [pdf]

    cs.CV cs.AI

    Hybrid Primal Sketch: Combining Analogy, Qualitative Representations, and Computer Vision for Scene Understanding

    Authors: Kenneth D. Forbus, Kezhen Chen, Wangcheng Xu, Madeline Usher

    Abstract: One of the purposes of perception is to bridge between sensors and conceptual understanding. Marr's Primal Sketch combined initial edge-finding with multiple downstream processes to capture aspects of visual perception such as grouping and stereopsis. Given the progress made in multiple areas of AI since then, we have developed a new framework inspired by Marr's work, the Hybrid Primal Sketch, whi… ▽ More

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: 16 pages, 6 figures

  30. arXiv:2407.04681  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Rethinking Visual Prompting for Multimodal Large Language Models with External Knowledge

    Authors: Yuanze Lin, Yunsheng Li, Dongdong Chen, Weijian Xu, Ronald Clark, Philip Torr, Lu Yuan

    Abstract: In recent years, multimodal large language models (MLLMs) have made significant strides by training on vast high-quality image-text datasets, enabling them to generally understand images well. However, the inherent difficulty in explicitly conveying fine-grained or spatially dense information in text, such as masks, poses a challenge for MLLMs, limiting their ability to answer questions requiring… ▽ More

    Submitted 5 July, 2024; originally announced July 2024.

  31. arXiv:2407.04346  [pdf]

    cs.CV

    MobileFlow: A Multimodal LLM For Mobile GUI Agent

    Authors: Songqin Nong, Jiali Zhu, Rui Wu, Jiongchao Jin, Shuo Shan, Xiutian Huang, Wenhao Xu

    Abstract: Currently, the integration of mobile Graphical User Interfaces (GUIs) is ubiquitous in most people's daily lives. And the ongoing evolution of multimodal large-scale models, such as GPT-4v, Qwen-VL-Max, has significantly bolstered the capabilities of GUI comprehension and user action analysis, showcasing the potentiality of intelligent GUI assistants. However, current GUI Agents often need to acce… ▽ More

    Submitted 5 July, 2024; originally announced July 2024.

  32. arXiv:2407.03776  [pdf, other]

    cs.IT

    Energy-Efficient Probabilistic Semantic Communication over Space-Air-Ground Integrated Networks

    Authors: Zhouxiang Zhao, Zhaohui Yang, Mingzhe Chen, Zhaoyang Zhang, Wei Xu, Kaibin Huang

    Abstract: Space-air-ground integrated networks (SAGINs) are emerging as a pivotal element in the evolution of future wireless networks. Despite their potential, the joint design of communication and computation within SAGINs remains a formidable challenge. In this paper, the problem of energy efficiency in SAGIN-enabled probabilistic semantic communication (PSC) system is investigated. In the considered mod… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

  33. arXiv:2407.03460  [pdf, other]

    cs.CL cs.AI

    Collaborative Quest Completion with LLM-driven Non-Player Characters in Minecraft

    Authors: Sudha Rao, Weijia Xu, Michael Xu, Jorge Leandro, Ken Lobb, Gabriel DesGarennes, Chris Brockett, Bill Dolan

    Abstract: The use of generative AI in video game development is on the rise, and as the conversational and other capabilities of large language models continue to improve, we expect LLM-driven non-player characters (NPCs) to become widely deployed. In this paper, we seek to understand how human players collaborate with LLM-driven NPCs to accomplish in-game goals. We design a minigame within Minecraft where… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Accepted at Wordplay workshop at ACL 2024

    Journal ref: ACL 2024

  34. arXiv:2407.03263  [pdf, other]

    cs.CV

    A Unified Framework for 3D Scene Understanding

    Authors: Wei Xu, Chunsheng Shi, Sifan Tu, Xin Zhou, Dingkang Liang, Xiang Bai

    Abstract: We propose UniSeg3D, a unified 3D segmentation framework that achieves panoptic, semantic, instance, interactive, referring, and open-vocabulary semantic segmentation tasks within a single model. Most previous 3D segmentation approaches are specialized for a specific task, thereby limiting their understanding of 3D scenes to a task-specific perspective. In contrast, the proposed method unifies six… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: The code will be available at https://dk-liang.github.io/UniSeg3D/

  35. arXiv:2407.02797  [pdf, other]

    cs.RO cs.CV

    Solving Motion Planning Tasks with a Scalable Generative Model

    Authors: Yihan Hu, Siqi Chai, Zhening Yang, Jingyu Qian, Kun Li, Wenxin Shao, Haichao Zhang, Wei Xu, Qiang Liu

    Abstract: As autonomous driving systems are being deployed to millions of vehicles, there is a pressing need to improve the system's scalability and safety and to reduce the engineering cost. A realistic, scalable, and practical simulator of the driving world is highly desired. In this paper, we present an efficient solution based on generative models which learns the dynamics of the driving scenes. With this mod… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: ECCV2024

  36. arXiv:2407.01577  [pdf, other]

    q-fin.TR cs.AI cs.LG

    MOT: A Mixture of Actors Reinforcement Learning Method by Optimal Transport for Algorithmic Trading

    Authors: Xi Cheng, Jinghao Zhang, Yunan Zeng, Wenfang Xue

    Abstract: Algorithmic trading refers to executing buy and sell orders for specific assets based on automatically identified trading opportunities. Strategies based on reinforcement learning (RL) have demonstrated remarkable capabilities in addressing algorithmic trading problems. However, the trading patterns differ among market conditions due to shifted distribution data. Ignoring multiple patterns in the… ▽ More

    Submitted 2 June, 2024; originally announced July 2024.

    Comments: 13 pages, 5 figures, PAKDD2024 accepted

  37. arXiv:2407.01183  [pdf, other]

    cs.DB

    TCSR-SQL: Towards Table Content-aware Text-to-SQL with Self-retrieval

    Authors: Wenbo Xu, Liang Yan, Peiyi Han, Haifeng Zhu, Chuanyi Liu, Shaoming Duan, Cuiyun Gao, Yingwei Liang

    Abstract: Large Language Model-based (LLM-based) Text-to-SQL methods have achieved important progress in generating SQL queries for real-world applications. When confronted with table content-aware questions in real-world scenarios, ambiguous data content keywords and non-existent database schema column names within the question lead to poor performance of existing methods. To solve this problem, we pr… ▽ More

    Submitted 12 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

  38. arXiv:2407.00993  [pdf, other]

    cs.AI cs.CL

    Mobile-Bench: An Evaluation Benchmark for LLM-based Mobile Agents

    Authors: Shihan Deng, Weikai Xu, Hongda Sun, Wei Liu, Tao Tan, Jianfeng Liu, Ang Li, Jian Luan, Bin Wang, Rui Yan, Shuo Shang

    Abstract: With the remarkable advancements of large language models (LLMs), LLM-based agents have become a research hotspot in human-computer interaction. However, there is a scarcity of benchmarks available for LLM-based mobile agents. Benchmarking these agents generally faces three main challenges: (1) The inefficiency of UI-only operations imposes limitations to task evaluation. (2) Specific instructions… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  39. arXiv:2407.00490  [pdf, other]

    cs.LG math.OC stat.ML

    Toward Global Convergence of Gradient EM for Over-Parameterized Gaussian Mixture Models

    Authors: Weihang Xu, Maryam Fazel, Simon S. Du

    Abstract: We study the gradient Expectation-Maximization (EM) algorithm for Gaussian Mixture Models (GMM) in the over-parameterized setting, where a general GMM with $n>1$ components learns from data that are generated by a single ground truth Gaussian distribution. While results for the special case of 2-Gaussian mixtures are well-known, a general global convergence analysis for arbitrary $n$ remains unres… ▽ More

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: 25 pages

  40. arXiv:2406.18573  [pdf]

    cs.CV cs.CY cs.GR

    Generating grid maps via the snake model

    Authors: Zhiwei Wei, Nai Yang, Wenjia Xu, Su Ding

    Abstract: The grid map, often referred to as the tile map, stands as a vital tool in geospatial visualization, possessing unique attributes that differentiate it from more commonly known techniques such as choropleths and cartograms. It transforms geographic regions into grids, which requires the displacement of both region centroids and boundary nodes to establish a coherent grid arrangement. However, exis… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 10 Pages, 8 Figures

    Journal ref: Transactions in GIS, 2024, 1-19

  41. arXiv:2406.18102  [pdf]

    eess.IV cs.CV

    A Lung Nodule Dataset with Histopathology-based Cancer Type Annotation

    Authors: Muwei Jian, Hongyu Chen, Zaiyong Zhang, Nan Yang, Haorang Zhang, Lifu Ma, Wenjing Xu, Huixiang Zhi

    Abstract: Recently, Computer-Aided Diagnosis (CAD) systems have emerged as indispensable tools in clinical diagnostic workflows, significantly alleviating the burden on radiologists. Nevertheless, despite their integration into clinical settings, CAD systems encounter limitations. Specifically, while CAD systems can achieve high performance in the detection of lung nodules, they face challenges in accuratel… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

  42. arXiv:2406.16943  [pdf, other]

    eess.SP cs.AI cs.HC cs.LG

    EarDA: Towards Accurate and Data-Efficient Earable Activity Sensing

    Authors: Shengzhe Lyu, Yongliang Chen, Di Duan, Renqi Jia, Weitao Xu

    Abstract: In the realm of smart sensing with the Internet of Things, earable devices are empowered with the capability of multi-modality sensing and intelligence of context-aware computing, leading to its wide usage in Human Activity Recognition (HAR). Nonetheless, unlike the movements captured by Inertial Measurement Unit (IMU) sensors placed on the upper or lower body, those motion signals obtained from e… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: accepted by 2024 IEEE Coupling of Sensing & Computing in AIoT Systems (CSCAIoT)

  43. arXiv:2406.16537  [pdf, other]

    cs.CV cs.AI

    Character-Adapter: Prompt-Guided Region Control for High-Fidelity Character Customization

    Authors: Yuhang Ma, Wenting Xu, Jiji Tang, Qinfeng Jin, Rongsheng Zhang, Zeng Zhao, Changjie Fan, Zhipeng Hu

    Abstract: Customized image generation, which seeks to synthesize images with consistent characters, holds significant relevance for applications such as storytelling, portrait generation, and character design. However, previous approaches have encountered challenges in preserving characters with high-fidelity consistency due to inadequate feature extraction and concept confusion of reference characters. The… ▽ More

    Submitted 3 July, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

  44. arXiv:2406.16377  [pdf, other]

    cs.CL cs.AI

    On the Transformations across Reward Model, Parameter Update, and In-Context Prompt

    Authors: Deng Cai, Huayang Li, Tingchen Fu, Siheng Li, Weiwen Xu, Shuaiyi Li, Bowen Cao, Zhisong Zhang, Xinting Huang, Leyang Cui, Yan Wang, Lemao Liu, Taro Watanabe, Shuming Shi

    Abstract: Despite the general capabilities of pre-trained large language models (LLMs), they still need further adaptation to better serve practical applications. In this paper, we demonstrate the interchangeability of three popular and distinct adaptation tools: parameter updating, reward modeling, and in-context prompting. This interchangeability establishes a triangular framework with six transformation…

    Submitted 24 June, 2024; originally announced June 2024.

  45. arXiv:2406.16200  [pdf, other]

    cs.LG cs.CR cs.IT eess.SP

    Towards unlocking the mystery of adversarial fragility of neural networks

    Authors: Jingchao Gao, Raghu Mudumbai, Xiaodong Wu, Jirong Yi, Catherine Xu, Hui Xie, Weiyu Xu

    Abstract: In this paper, we study the adversarial robustness of deep neural networks for classification tasks. We look at the smallest magnitude of possible additive perturbations that can change the output of a classification algorithm. We provide a matrix-theoretic explanation of the adversarial fragility of deep neural network for classification. In particular, our theoretical results show that neural ne…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: 21 pages

  46. arXiv:2406.14795  [pdf, other]

    cs.RO eess.SY

    Design and Control of a Low-cost Non-backdrivable End-effector Upper Limb Rehabilitation Device

    Authors: Fulan Li, Yunfei Guo, Wenda Xu, Weide Zhang, Fangyun Zhao, Baiyu Wang, Huaguang Du, Chengkun Zhang

    Abstract: This paper presents the development of an upper limb end-effector based rehabilitation device for stroke patients, offering assistance or resistance along any 2-dimensional trajectory during physical therapy. It employs a non-backdrivable ball-screw-driven mechanism for enhanced control accuracy. The control system features three novel algorithms: First, the Implicit Euler velocity control algorit…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 12 pages, 15 figures

  47. arXiv:2406.14449  [pdf, other]

    cs.AI

    APEER: Automatic Prompt Engineering Enhances Large Language Model Reranking

    Authors: Can Jin, Hongwu Peng, Shiyu Zhao, Zhenting Wang, Wujiang Xu, Ligong Han, Jiahui Zhao, Kai Zhong, Sanguthevar Rajasekaran, Dimitris N. Metaxas

    Abstract: Large Language Models (LLMs) have significantly enhanced Information Retrieval (IR) across various modules, such as reranking. Despite impressive performance, current zero-shot relevance ranking with LLMs heavily relies on human prompt engineering. Existing automatic prompt engineering algorithms primarily focus on language modeling and classification tasks, leaving the domain of IR, particularly…

    Submitted 20 June, 2024; originally announced June 2024.

  48. arXiv:2406.14399  [pdf, other]

    cs.LG cs.CV physics.ao-ph stat.ML

    WEATHER-5K: A Large-scale Global Station Weather Dataset Towards Comprehensive Time-series Forecasting Benchmark

    Authors: Tao Han, Song Guo, Zhenghao Chen, Wanghan Xu, Lei Bai

    Abstract: Global Station Weather Forecasting (GSWF) is crucial for various sectors, including aviation, agriculture, energy, and disaster preparedness. Recent advancements in deep learning have significantly improved the accuracy of weather predictions by optimizing models based on public meteorological data. However, existing public datasets for GSWF optimization and benchmarking still suffer from signific…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 26 pages, 13 figures

  49. arXiv:2406.14098  [pdf, ps, other]

    cs.CV

    HeartBeat: Towards Controllable Echocardiography Video Synthesis with Multimodal Conditions-Guided Diffusion Models

    Authors: Xinrui Zhou, Yuhao Huang, Wufeng Xue, Haoran Dou, Jun Cheng, Han Zhou, Dong Ni

    Abstract: Echocardiography (ECHO) video is widely used for cardiac examination. In clinical, this procedure heavily relies on operator experience, which needs years of training and maybe the assistance of deep learning-based systems for enhanced accuracy and efficiency. However, it is challenging since acquiring sufficient customized data (e.g., abnormal cases) for novice training and deep model development…

    Submitted 4 July, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

    Comments: Accepted by MICCAI 2024

  50. arXiv:2406.12168  [pdf, other]

    cs.LG cs.AI cs.CL

    BPO: Supercharging Online Preference Learning by Adhering to the Proximity of Behavior LLM

    Authors: Wenda Xu, Jiachen Li, William Yang Wang, Lei Li

    Abstract: Direct alignment from preferences (DAP) has emerged as a promising paradigm for aligning large language models (LLMs) to human desiderata from pre-collected, offline preference datasets. While recent studies indicate that existing offline DAP methods can directly benefit from online training samples, we highlight the need to develop specific online DAP algorithms to fully harness the power of onli…

    Submitted 19 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: Wenda Xu and Jiachen Li contributed equally