
Showing 1–50 of 1,261 results for author: Xu, W

Searching in archive cs.
  1. arXiv:2406.18573 [pdf]

    cs.CV cs.CY cs.GR

    Generating grid maps via the snake model

    Authors: Zhiwei Wei, Nai Yang, Wenjia Xu, Su Ding

    Abstract: The grid map, often referred to as the tile map, stands as a vital tool in geospatial visualization, possessing unique attributes that differentiate it from more commonly known techniques such as choropleths and cartograms. It transforms geographic regions into grids, which requires the displacement of both region centroids and boundary nodes to establish a coherent grid arrangement. However, exis…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 10 Pages, 8 Figures

    Journal ref: Transactions in GIS, 2024, 1-19

  2. arXiv:2406.18102 [pdf]

    eess.IV cs.CV

    A Lung Nodule Dataset with Histopathology-based Cancer Type Annotation

    Authors: Muwei Jian, Hongyu Chen, Zaiyong Zhang, Nan Yang, Haorang Zhang, Lifu Ma, Wenjing Xu, Huixiang Zhi

    Abstract: Recently, Computer-Aided Diagnosis (CAD) systems have emerged as indispensable tools in clinical diagnostic workflows, significantly alleviating the burden on radiologists. Nevertheless, despite their integration into clinical settings, CAD systems encounter limitations. Specifically, while CAD systems can achieve high performance in the detection of lung nodules, they face challenges in accuratel…

    Submitted 26 June, 2024; originally announced June 2024.

  3. arXiv:2406.16943 [pdf, other]

    eess.SP cs.AI cs.HC cs.LG

    EarDA: Towards Accurate and Data-Efficient Earable Activity Sensing

    Authors: Shengzhe Lyu, Yongliang Chen, Di Duan, Renqi Jia, Weitao Xu

    Abstract: In the realm of smart sensing with the Internet of Things, earable devices are empowered with the capability of multi-modality sensing and intelligence of context-aware computing, leading to their wide usage in Human Activity Recognition (HAR). Nonetheless, unlike the movements captured by Inertial Measurement Unit (IMU) sensors placed on the upper or lower body, those motion signals obtained from e…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: accepted by 2024 IEEE Coupling of Sensing & Computing in AIoT Systems (CSCAIoT)

  4. arXiv:2406.16537 [pdf, other]

    cs.CV cs.AI

    Character-Adapter: Prompt-Guided Region Control for High-Fidelity Character Customization

    Authors: Yuhang Ma, Wenting Xu, Jiji Tang, Qinfeng Jin, Rongsheng Zhang, Zeng Zhao, Changjie Fan, Zhipeng Hu

    Abstract: Customized image generation, which seeks to synthesize images with consistent characters, holds significant relevance for applications such as storytelling, portrait generation, and character design. However, previous approaches have encountered challenges in preserving characters with high-fidelity consistency due to inadequate feature extraction and concept confusion of reference characters. The…

    Submitted 24 June, 2024; originally announced June 2024.

  5. arXiv:2406.16377 [pdf, other]

    cs.CL cs.AI

    On the Transformations across Reward Model, Parameter Update, and In-Context Prompt

    Authors: Deng Cai, Huayang Li, Tingchen Fu, Siheng Li, Weiwen Xu, Shuaiyi Li, Bowen Cao, Zhisong Zhang, Xinting Huang, Leyang Cui, Yan Wang, Lemao Liu, Taro Watanabe, Shuming Shi

    Abstract: Despite the general capabilities of pre-trained large language models (LLMs), they still need further adaptation to better serve practical applications. In this paper, we demonstrate the interchangeability of three popular and distinct adaptation tools: parameter updating, reward modeling, and in-context prompting. This interchangeability establishes a triangular framework with six transformation…

    Submitted 24 June, 2024; originally announced June 2024.

  6. arXiv:2406.16200 [pdf, other]

    cs.LG cs.CR cs.IT eess.SP

    Towards unlocking the mystery of adversarial fragility of neural networks

    Authors: Jingchao Gao, Raghu Mudumbai, Xiaodong Wu, Jirong Yi, Catherine Xu, Hui Xie, Weiyu Xu

    Abstract: In this paper, we study the adversarial robustness of deep neural networks for classification tasks. We look at the smallest magnitude of possible additive perturbations that can change the output of a classification algorithm. We provide a matrix-theoretic explanation of the adversarial fragility of deep neural networks for classification. In particular, our theoretical results show that neural ne…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: 21 pages

  7. arXiv:2406.14795 [pdf, other]

    cs.RO eess.SY

    Design and Control of a Low-cost Non-backdrivable End-effector Upper Limb Rehabilitation Device

    Authors: Fulan Li, Yunfei Guo, Wenda Xu, Weide Zhang, Fangyun Zhao, Baiyu Wang, Huaguang Du, Chengkun Zhang

    Abstract: This paper presents the development of an upper limb end-effector based rehabilitation device for stroke patients, offering assistance or resistance along any 2-dimensional trajectory during physical therapy. It employs a non-backdrivable ball-screw-driven mechanism for enhanced control accuracy. The control system features three novel algorithms: First, the Implicit Euler velocity control algorit…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 12 pages, 15 figures

  8. arXiv:2406.14449 [pdf, other]

    cs.AI

    APEER: Automatic Prompt Engineering Enhances Large Language Model Reranking

    Authors: Can Jin, Hongwu Peng, Shiyu Zhao, Zhenting Wang, Wujiang Xu, Ligong Han, Jiahui Zhao, Kai Zhong, Sanguthevar Rajasekaran, Dimitris N. Metaxas

    Abstract: Large Language Models (LLMs) have significantly enhanced Information Retrieval (IR) across various modules, such as reranking. Despite impressive performance, current zero-shot relevance ranking with LLMs heavily relies on human prompt engineering. Existing automatic prompt engineering algorithms primarily focus on language modeling and classification tasks, leaving the domain of IR, particularly…

    Submitted 20 June, 2024; originally announced June 2024.

  9. arXiv:2406.14399 [pdf, other]

    cs.LG cs.CV physics.ao-ph stat.ML

    WEATHER-5K: A Large-scale Global Station Weather Dataset Towards Comprehensive Time-series Forecasting Benchmark

    Authors: Tao Han, Song Guo, Zhenghao Chen, Wanghan Xu, Lei Bai

    Abstract: Global Station Weather Forecasting (GSWF) is crucial for various sectors, including aviation, agriculture, energy, and disaster preparedness. Recent advancements in deep learning have significantly improved the accuracy of weather predictions by optimizing models based on public meteorological data. However, existing public datasets for GSWF optimization and benchmarking still suffer from signific…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 26 pages, 13 figures

  10. arXiv:2406.14098 [pdf, ps, other]

    cs.CV

    HeartBeat: Towards Controllable Echocardiography Video Synthesis with Multimodal Conditions-Guided Diffusion Models

    Authors: Xinrui Zhou, Yuhao Huang, Wufeng Xue, Haoran Dou, Jun Cheng, Han Zhou, Dong Ni

    Abstract: Echocardiography (ECHO) video is widely used for cardiac examination. In clinical practice, this procedure relies heavily on operator experience, which requires years of training and possibly the assistance of deep learning-based systems for enhanced accuracy and efficiency. However, it is challenging since acquiring sufficient customized data (e.g., abnormal cases) for novice training and deep model development…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Accepted by MICCAI 2024

  11. arXiv:2406.12168 [pdf, other]

    cs.LG cs.AI cs.CL

    BPO: Supercharging Online Preference Learning by Adhering to the Proximity of Behavior LLM

    Authors: Wenda Xu, Jiachen Li, William Yang Wang, Lei Li

    Abstract: Direct alignment from preferences (DAP) has emerged as a promising paradigm for aligning large language models (LLMs) to human desiderata from pre-collected, offline preference datasets. While recent studies indicate that existing offline DAP methods can directly benefit from online training samples, we highlight the need to develop specific online DAP algorithms to fully harness the power of onli…

    Submitted 19 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: Wenda Xu and Jiachen Li contributed equally

  12. arXiv:2406.11636 [pdf, other]

    eess.IV cs.CV cs.LG

    Feasibility of Federated Learning from Client Databases with Different Brain Diseases and MRI Modalities

    Authors: Felix Wagner, Wentian Xu, Pramit Saha, Ziyun Liang, Daniel Whitehouse, David Menon, Natalie Voets, J. Alison Noble, Konstantinos Kamnitsas

    Abstract: Segmentation models for brain lesions in MRI are commonly developed for a specific disease and trained on data with a predefined set of MRI modalities. Each such model cannot segment the disease using data with a different set of MRI modalities, nor can it segment any other type of disease. Moreover, this training paradigm does not allow a model to benefit from learning from heterogeneous database…

    Submitted 17 June, 2024; originally announced June 2024.

    ACM Class: I.4.9; I.4.6; I.2.11; I.4.0

  13. arXiv:2406.10517 [pdf, other]

    cs.IR cs.AI cs.LG

    ADSNet: Cross-Domain LTV Prediction with an Adaptive Siamese Network in Advertising

    Authors: Ruize Wang, Hui Xu, Ying Cheng, Qi He, Xing Zhou, Rui Feng, Wei Xu, Lei Huang, Jie Jiang

    Abstract: Advertising platforms have evolved in estimating Lifetime Value (LTV) to better align with advertisers' true performance metric. However, the sparsity of real-world LTV data presents a significant challenge to LTV predictive models (i.e., pLTV), severely limiting their capabilities. Therefore, we propose to utilize external data, in addition to the internal data of advertising platform, to expan…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: Accepted to KDD 2024

  14. arXiv:2406.09056 [pdf, other]

    cs.CL cs.AI

    CUDRT: Benchmarking the Detection of Human vs. Large Language Models Generated Texts

    Authors: Zhen Tao, Zhiyu Li, Dinghao Xi, Wei Xu

    Abstract: The proliferation of large language models (LLMs) has significantly enhanced text generation capabilities across various industries. However, these models' ability to generate human-like text poses substantial challenges in discerning between human and AI authorship. Despite the effectiveness of existing AI-generated text detectors, their development is hindered by the lack of comprehensive, publi…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 32 pages

  15. arXiv:2406.08587 [pdf, other]

    cs.CL cs.AI cs.LG

    CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery

    Authors: Xiaoshuai Song, Muxi Diao, Guanting Dong, Zhengyang Wang, Yujia Fu, Runqi Qiao, Zhexu Wang, Dayuan Fu, Huangxuan Wu, Bin Liang, Weihao Zeng, Yejie Wang, Zhuoma GongQue, Jianing Yu, Qiuna Tan, Weiran Xu

    Abstract: Computer Science (CS) stands as a testament to the intricacies of human intelligence, profoundly advancing the development of artificial intelligence and modern society. However, the current community of large language models (LLMs) overly focuses on benchmarks for analyzing specific foundational skills (e.g. mathematics and code generation), neglecting an all-round evaluation of the computer scie…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Work in progress

  16. arXiv:2406.07648 [pdf, other]

    cs.CV

    M-LRM: Multi-view Large Reconstruction Model

    Authors: Mengfei Li, Xiaoxiao Long, Yixun Liang, Weiyu Li, Yuan Liu, Peng Li, Xiaowei Chi, Xingqun Qi, Wei Xue, Wenhan Luo, Qifeng Liu, Yike Guo

    Abstract: Despite recent advancements in the Large Reconstruction Model (LRM) demonstrating impressive results, when extending its input from single image to multiple images, it exhibits inefficiencies, subpar geometric and texture quality, as well as slower convergence speed than expected. This is attributed to the fact that LRM formulates 3D reconstruction as a naive images-to-3D translation problem, ignoring the…

    Submitted 11 June, 2024; originally announced June 2024.

  17. arXiv:2406.07349 [pdf, other]

    cs.CR

    Erasing Radio Frequency Fingerprints via Active Adversarial Perturbation

    Authors: Zhaoyi Lu, Wenchao Xu, Ming Tu, Xin Xie, Cunqing Hua, Nan Cheng

    Abstract: Radio Frequency (RF) fingerprinting identifies a wireless device from the unique characteristics of its analog circuitry or hardware imperfections. However, unlike the MAC address, which can be modified, such hardware features are unavoidably present in the signal emitted over the air, which can possibly reveal device whereabouts, e.g., a sniffer can use a pre-trained model to identify a nearby device when receiving its s…

    Submitted 12 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

  18. arXiv:2406.07177 [pdf, other]

    cs.LG

    TernaryLLM: Ternarized Large Language Model

    Authors: Tianqi Chen, Zhe Li, Weixiang Xu, Zeyu Zhu, Dong Li, Lu Tian, Emad Barsoum, Peisong Wang, Jian Cheng

    Abstract: Large language models (LLMs) have achieved remarkable performance on Natural Language Processing (NLP) tasks, but they are hindered by high computational costs and memory requirements. Ternarization, an extreme form of quantization, offers a solution by reducing memory usage and enabling energy-efficient floating-point additions. However, applying ternarization to LLMs faces challenges stemming fr…

    Submitted 11 June, 2024; originally announced June 2024.

  19. arXiv:2406.07089 [pdf, other]

    cs.CV

    RS-Agent: Automating Remote Sensing Tasks through Intelligent Agents

    Authors: Wenjia Xu, Zijian Yu, Yixu Wang, Jiuniu Wang, Mugen Peng

    Abstract: An increasing number of models have achieved great performance in remote sensing tasks with the recent development of Large Language Models (LLMs) and Visual Language Models (VLMs). However, these models are constrained to basic vision and language instruction-tuning tasks, facing challenges in complex remote sensing applications. Additionally, these models lack specialized expertise in profession…

    Submitted 11 June, 2024; originally announced June 2024.

  20. arXiv:2406.05366 [pdf, other]

    cs.LG math.OC

    Regret Bounds for Episodic Risk-Sensitive Linear Quadratic Regulator

    Authors: Wenhao Xu, Xuefeng Gao, Xuedong He

    Abstract: Risk-sensitive linear quadratic regulator is one of the most fundamental problems in risk-sensitive optimal control. In this paper, we study online adaptive control of risk-sensitive linear quadratic regulator in the finite horizon episodic setting. We propose a simple least-squares greedy algorithm and show that it achieves $\widetilde{\mathcal{O}}(\log N)$ regret under a specific identifiability…

    Submitted 8 June, 2024; originally announced June 2024.

  21. arXiv:2406.04428 [pdf, other]

    cs.CL cs.AI

    MoralBench: Moral Evaluation of LLMs

    Authors: Jianchao Ji, Yutong Chen, Mingyu Jin, Wujiang Xu, Wenyue Hua, Yongfeng Zhang

    Abstract: In the rapidly evolving field of artificial intelligence, large language models (LLMs) have emerged as powerful tools for a myriad of applications, from natural language processing to decision-making support systems. However, as these models become increasingly integrated into societal frameworks, the imperative to ensure they operate within ethical and moral boundaries has never been more critica…

    Submitted 6 June, 2024; originally announced June 2024.

  22. arXiv:2406.04321 [pdf, other]

    cs.CV cs.LG cs.MM cs.SD

    VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling

    Authors: Zeyue Tian, Zhaoyang Liu, Ruibin Yuan, Jiahao Pan, Xiaoqiang Huang, Qifeng Liu, Xu Tan, Qifeng Chen, Wei Xue, Yike Guo

    Abstract: In this work, we systematically study music generation conditioned solely on the video. First, we present a large-scale dataset comprising 190K video-music pairs, including various genres such as movie trailers, advertisements, and documentaries. Furthermore, we propose VidMuse, a simple framework for generating music aligned with video inputs. VidMuse stands out by producing high-fidelity music t…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: The code and datasets will be available at https://github.com/ZeyueT/VidMuse/

  23. arXiv:2406.03888 [pdf, ps, other]

    cs.IT eess.SP

    MSE-Based Training and Transmission Optimization for MIMO ISAC Systems

    Authors: Zhenyao He, Wei Xu, Hong Shen, Yonina C. Eldar, Xiaohu You

    Abstract: In this paper, we investigate a multiple-input multiple-output (MIMO) integrated sensing and communication (ISAC) system under typical block-fading channels. As a non-trivial extension to most existing works on ISAC, both the training and transmission signals sent by the ISAC transmitter are exploited for sensing. Specifically, we develop two training and transmission design schemes to minimize a…

    Submitted 6 June, 2024; originally announced June 2024.

  24. arXiv:2406.03703 [pdf, other]

    cs.CL cs.LG

    Synthesizing Conversations from Unlabeled Documents using Automatic Response Segmentation

    Authors: Fanyou Wu, Weijie Xu, Chandan K. Reddy, Srinivasan H. Sengamedu

    Abstract: In this study, we tackle the challenge of inadequate and costly training data that has hindered the development of conversational question answering (ConvQA) systems. Enterprises have a large corpus of diverse internal documents. Instead of relying on a search engine, a more compelling approach for people to comprehend these documents is to create a dialogue system. In this paper, we propose a…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Findings of ACL 2024

  25. arXiv:2406.03459 [pdf, other]

    cs.CV

    LW-DETR: A Transformer Replacement to YOLO for Real-Time Detection

    Authors: Qiang Chen, Xiangbo Su, Xinyu Zhang, Jian Wang, Jiahui Chen, Yunpeng Shen, Chuchu Han, Ziliang Chen, Weixiang Xu, Fanrong Li, Shan Zhang, Kun Yao, Errui Ding, Gang Zhang, Jingdong Wang

    Abstract: In this paper, we present a light-weight detection transformer, LW-DETR, which outperforms YOLOs for real-time object detection. The architecture is a simple stack of a ViT encoder, a projector, and a shallow DETR decoder. Our approach leverages recent advanced techniques, such as training-effective techniques, e.g., improved loss and pretraining, and interleaved window and global attentions for r…

    Submitted 5 June, 2024; originally announced June 2024.

  26. arXiv:2406.02291 [pdf, other]

    cs.NI eess.SP

    A deep-learning-based MAC for integrating channel access, rate adaptation and channel switch

    Authors: Jiantao Xin, Wei Xu, Bin Cao, Taotao Wang, Shengli Zhang

    Abstract: With increasing density and heterogeneity in unlicensed wireless networks, traditional MAC protocols, such as carrier-sense multiple access with collision avoidance (CSMA/CA) in Wi-Fi networks, are experiencing performance degradation. This is manifested in increased collisions and extended backoff times, leading to diminished spectrum efficiency and protocol coordination. Addressing these issues,…

    Submitted 4 June, 2024; originally announced June 2024.

  27. arXiv:2406.00965 [pdf, other]

    cs.RO cs.AI

    Efficient Behavior Tree Planning with Commonsense Pruning and Heuristic

    Authors: Xinglin Chen, Yishuai Cai, Yunxin Mao, Minglong Li, Zhou Yang, Wen Shanghua, Wenjing Yang, Weixia Xu, Ji Wang

    Abstract: Behavior Tree (BT) planning is crucial for autonomous robot behavior control, yet its application in complex scenarios is hampered by long planning times. Pruning and heuristics are common techniques to accelerate planning, but it is difficult to design general pruning strategies and heuristic functions for BT planning problems. This paper proposes improving BT planning efficiency for everyday ser…

    Submitted 3 June, 2024; v1 submitted 2 June, 2024; originally announced June 2024.

  28. arXiv:2406.00645 [pdf, other]

    cs.LG cs.AI cs.CV

    FuRL: Visual-Language Models as Fuzzy Rewards for Reinforcement Learning

    Authors: Yuwei Fu, Haichao Zhang, Di Wu, Wei Xu, Benoit Boulet

    Abstract: In this work, we investigate how to leverage pre-trained visual-language models (VLM) for online Reinforcement Learning (RL). In particular, we focus on sparse reward tasks with pre-defined textual task descriptions. We first identify the problem of reward misalignment when applying VLM as a reward in RL tasks. To address this issue, we introduce a lightweight fine-tuning method, named Fuzzy VLM r…

    Submitted 4 June, 2024; v1 submitted 2 June, 2024; originally announced June 2024.

    Comments: ICML 2024

  29. arXiv:2405.20902 [pdf, other]

    cs.CL cs.AI cs.CR

    Preemptive Answer "Attacks" on Chain-of-Thought Reasoning

    Authors: Rongwu Xu, Zehan Qi, Wei Xu

    Abstract: Large language models (LLMs) showcase impressive reasoning capabilities when coupled with Chain-of-Thought (CoT) prompting. However, the robustness of this approach warrants further investigation. In this paper, we introduce a novel scenario termed preemptive answers, where the LLM obtains an answer before engaging in reasoning. This situation can arise inadvertently or be induced by malicious users…

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: Accepted to ACL'24 (Findings). Camera-ready version

  30. arXiv:2405.20852 [pdf, other]

    cs.CL

    Towards Spoken Language Understanding via Multi-level Multi-grained Contrastive Learning

    Authors: Xuxin Cheng, Wanshi Xu, Zhihong Zhu, Hongxiang Li, Yuexian Zou

    Abstract: Spoken language understanding (SLU) is a core task in task-oriented dialogue systems, which aims at understanding the user's current goal through constructing semantic frames. SLU usually consists of two subtasks, including intent detection and slot filling. Although there are some SLU frameworks jointly modeling the two subtasks and achieving high performance, most of them still overlook the inhere…

    Submitted 31 May, 2024; originally announced May 2024.

  31. arXiv:2405.20710 [pdf, other]

    cs.IR

    Information Maximization via Variational Autoencoders for Cross-Domain Recommendation

    Authors: Xuying Ning, Wujiang Xu, Xiaolei Liu, Mingming Ha, Qiongxu Ma, Youru Li, Linxun Chen, Yongfeng Zhang

    Abstract: Cross-Domain Sequential Recommendation (CDSR) methods aim to address the data sparsity and cold-start problems present in Single-Domain Sequential Recommendation (SDSR). Existing CDSR methods typically rely on overlapping users, designing complex cross-domain modules to capture users' latent interests that can propagate across different domains. However, their propagated informative information is…

    Submitted 31 May, 2024; originally announced May 2024.

  32. arXiv:2405.19334 [pdf, other]

    cs.AI cs.CL cs.CV cs.MM cs.SD

    LLMs Meet Multimodal Generation and Editing: A Survey

    Authors: Yingqing He, Zhaoyang Liu, Jingye Chen, Zeyue Tian, Hongyu Liu, Xiaowei Chi, Runtao Liu, Ruibin Yuan, Yazhou Xing, Wenhai Wang, Jifeng Dai, Yong Zhang, Wei Xue, Qifeng Liu, Yike Guo, Qifeng Chen

    Abstract: With the recent advancement in large language models (LLMs), there is a growing interest in combining LLMs with multimodal learning. Previous surveys of multimodal large language models (MLLMs) mainly focus on multimodal understanding. This survey elaborates on multimodal generation and editing across various domains, comprising image, video, 3D, and audio. Specifically, we summarize the notable a…

    Submitted 9 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

    Comments: 52 Pages with 16 Figures, 12 Tables, and 545 References. GitHub Repository at: https://github.com/YingqingHe/Awesome-LLMs-meet-Multimodal-Generation

  33. arXiv:2405.18511 [pdf, other]

    cs.CV

    Feasibility and benefits of joint learning from MRI databases with different brain diseases and modalities for segmentation

    Authors: Wentian Xu, Matthew Moffat, Thalia Seale, Ziyun Liang, Felix Wagner, Daniel Whitehouse, David Menon, Virginia Newcombe, Natalie Voets, Abhirup Banerjee, Konstantinos Kamnitsas

    Abstract: Models for segmentation of brain lesions in multi-modal MRI are commonly trained for a specific pathology using a single database with a predefined set of MRI modalities, determined by a protocol for the specific disease. This work explores the following open questions: Is it feasible to train a model using multiple databases that contain varying sets of MRI modalities and annotations for differen…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Accepted to MIDL 2024

    Journal ref: Proceedings of Machine Learning Research, MIDL 2024

  34. arXiv:2405.17890 [pdf, other]

    cs.IR cs.CL cs.LG

    SLMRec: Empowering Small Language Models for Sequential Recommendation

    Authors: Wujiang Xu, Zujie Liang, Jiaojiao Han, Xuying Ning, Wenfang Lin, Linxun Chen, Feng Wei, Yongfeng Zhang

    Abstract: The sequential recommendation (SR) task involves predicting the next item a user is likely to interact with, given their past interactions. The SR models examine the sequence of a user's actions to discern more complex behavioral patterns and temporal dynamics. Recent research demonstrates the great impact of LLMs on sequential recommendation systems, either viewing sequential recommendation as la…

    Submitted 28 May, 2024; originally announced May 2024.

  35. arXiv:2405.17759 [pdf, ps, other]

    cs.IT

    Wireless Federated Learning over Resource-Constrained Networks: Digital versus Analog Transmissions

    Authors: Jiacheng Yao, Wei Xu, Zhaohui Yang, Xiaohu You, Mehdi Bennis, H. Vincent Poor

    Abstract: To enable wireless federated learning (FL) in communication resource-constrained networks, two communication schemes, i.e., digital and analog ones, are effective solutions. In this paper, we quantitatively compare these two techniques, highlighting their essential differences as well as respectively suitable scenarios. We first examine both digital and analog transmission schemes, together with a…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Accepted by IEEE TWC. arXiv admin note: text overlap with arXiv:2402.09657

  36. arXiv:2405.17329 [pdf, other]

    cs.IT eess.SP

    Joint MIMO Transceiver and Reflector Design for Reconfigurable Intelligent Surface-Assisted Communication

    Authors: Yaqiong Zhao, Jindan Xu, Wei Xu, Kezhi Wang, Xinquan Ye, Chau Yuen, Xiaohu You

    Abstract: In this paper, we consider a reconfigurable intelligent surface (RIS)-assisted multiple-input multiple-output communication system with multiple antennas at both the base station (BS) and the user. We aim to maximize the achievable rate through jointly optimizing the transmit precoding matrix, the receive combining matrix, and the RIS reflection matrix under the constraints of the transmit power…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: 14 pages, 12 figures

  37. arXiv:2405.16874 [pdf, other]

    cs.CV

    CoCoGesture: Toward Coherent Co-speech 3D Gesture Generation in the Wild

    Authors: Xingqun Qi, Hengyuan Zhang, Yatian Wang, Jiahao Pan, Chen Liu, Peng Li, Xiaowei Chi, Mengfei Li, Qixun Zhang, Wei Xue, Shanghang Zhang, Wenhan Luo, Qifeng Liu, Yike Guo

    Abstract: Deriving co-speech 3D gestures has seen tremendous progress in virtual avatar animation. Yet, the existing methods often produce stiff and unreasonable gestures with unseen human speech inputs due to the limited 3D speech-gesture data. In this paper, we propose CoCoGesture, a novel framework enabling vivid and diverse gesture synthesis from unseen human speech prompts. Our key insight is built upo…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: The dataset will be released as soon as possible

  38. arXiv:2405.15964 [pdf, other]

    cs.CL

    A hierarchical Bayesian model for syntactic priming

    Authors: Weijie Xu, Richard Futrell

    Abstract: The effect of syntactic priming exhibits three well-documented empirical properties: the lexical boost, the inverse frequency effect, and the asymmetrical decay. We aim to show how these three empirical phenomena can be reconciled in a general learning framework, the hierarchical Bayesian model (HBM). The model represents syntactic knowledge in a hierarchical structure of syntactic statistics, whe…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 6 pages; accepted to CogSci 2024

  39. arXiv:2405.15403 [pdf, other]

    cs.LG stat.ML

    Fine-Grained Dynamic Framework for Bias-Variance Joint Optimization on Data Missing Not at Random

    Authors: Mingming Ha, Xuewen Tao, Wenfang Lin, Qionxu Ma, Wujiang Xu, Linxun Chen

    Abstract: In most practical applications such as recommendation systems, display advertising, and so forth, the collected data often contains missing values and those missing values are generally missing-not-at-random, which deteriorates the prediction performance of models. Some existing estimators and regularizers attempt to achieve unbiased estimation to improve the predictive performance. However, varia…

    Submitted 24 May, 2024; originally announced May 2024.

  40. arXiv:2405.13796 [pdf, other]

    cs.LG cs.AI

    Generalizing Weather Forecast to Fine-grained Temporal Scales via Physics-AI Hybrid Modeling

    Authors: Wanghan Xu, Fenghua Ling, Wenlong Zhang, Tao Han, Hao Chen, Wanli Ouyang, Lei Bai

    Abstract: Data-driven artificial intelligence (AI) models have made significant advancements in weather forecasting, particularly in medium-range and nowcasting. However, most data-driven weather forecasting models are black-box systems that focus on learning data mapping rather than fine-grained physical evolution in the time dimension. Consequently, the limitations in the temporal scale of datasets preven…

    Submitted 29 May, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

  41. arXiv:2405.13711 [pdf, other]

    cs.LG cs.AI math.DS physics.ao-ph

    VAE-Var: Variational-Autoencoder-Enhanced Variational Assimilation

    Authors: Yi Xiao, Qilong Jia, Wei Xue, Lei Bai

    Abstract: Data assimilation refers to a set of algorithms designed to compute the optimal estimate of a system's state by refining the prior prediction (known as background states) using observed data. Variational assimilation methods rely on the maximum likelihood approach to formulate a variational cost, with the optimal state estimate derived by minimizing this cost. Although traditional variational meth…

    Submitted 22 May, 2024; originally announced May 2024.

  42. arXiv:2405.13077 [pdf, other]

    cs.CR cs.AI cs.CL

    GPT-4 Jailbreaks Itself with Near-Perfect Success Using Self-Explanation

    Authors: Govind Ramesh, Yao Dou, Wei Xu

    Abstract: Research on jailbreaking has been valuable for testing and understanding the safety and security issues of large language models (LLMs). In this paper, we introduce Iterative Refinement Induced Self-Jailbreak (IRIS), a novel approach that leverages the reflective capabilities of LLMs for jailbreaking with only black-box access. Unlike previous methods, IRIS simplifies the jailbreaking process by u…

    Submitted 20 May, 2024; originally announced May 2024.

  43. arXiv:2405.11936  [pdf, other]

    cs.CV

    UAV-VisLoc: A Large-scale Dataset for UAV Visual Localization

    Authors: Wenjia Xu, Yaxuan Yao, Jiaqi Cao, Zhiwei Wei, Chunbo Liu, Jiuniu Wang, Mugen Peng

    Abstract: The application of unmanned aerial vehicles (UAV) has been widely extended recently. It is crucial to ensure accurate latitude and longitude coordinates for UAVs, especially when the global navigation satellite systems (GNSS) are disrupted and unreliable. Existing visual localization methods achieve autonomous visual localization without error accumulation by matching the ground-down view image of…

    Submitted 20 May, 2024; originally announced May 2024.

  44. arXiv:2405.11386  [pdf, other]

    eess.IV cs.CV

    Liver Fat Quantification Network with Body Shape

    Authors: Qiyue Wang, Wu Xue, Xiaoke Zhang, Fang Jin, James Hahn

    Abstract: It is critically important to detect the content of liver fat as it is related to cardiac complications and cardiovascular disease mortality. However, existing methods are either associated with high cost and/or medical complications (e.g., liver biopsy, imaging technology) or only roughly estimate the grades of steatosis. In this paper, we propose a deep neural network to estimate the percentage…

    Submitted 30 May, 2024; v1 submitted 18 May, 2024; originally announced May 2024.

  45. arXiv:2405.09215  [pdf, other]

    cs.CV cs.AI

    Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model

    Authors: Wanting Xu, Yang Liu, Langping He, Xucheng Huang, Ling Jiang

    Abstract: We introduce Xmodel-VLM, a cutting-edge multimodal vision language model. It is designed for efficient deployment on consumer GPU servers. Our work directly confronts a pivotal industry issue by grappling with the prohibitive service costs that hinder the broad adoption of large-scale multimodal systems. Through rigorous training, we have developed a 1B-scale language model from the ground up, emp…

    Submitted 20 June, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

  46. arXiv:2405.07969  [pdf, other]

    cs.CV cs.AI

    Investigating the Semantic Robustness of CLIP-based Zero-Shot Anomaly Segmentation

    Authors: Kevin Stangl, Marius Arvinte, Weilin Xu, Cory Cornelius

    Abstract: Zero-shot anomaly segmentation using pre-trained foundation models is a promising approach that enables effective algorithms without expensive, domain-specific training or fine-tuning. Ensuring that these methods work across various environmental conditions and are robust to distribution shifts is an open problem. We investigate the performance of WinCLIP [14] zero-shot anomaly segmentation algori…

    Submitted 13 May, 2024; originally announced May 2024.

  47. arXiv:2405.07933  [pdf, other]

    cs.CV

    Authentic Hand Avatar from a Phone Scan via Universal Hand Model

    Authors: Gyeongsik Moon, Weipeng Xu, Rohan Joshi, Chenglei Wu, Takaaki Shiratori

    Abstract: An authentic 3D hand avatar with all identifiable information, such as hand shapes and textures, is necessary for immersive experiences in AR/VR. In this paper, we present a universal hand model (UHM), which 1) can universally represent high-fidelity 3D hand meshes of arbitrary identities (IDs) and 2) can be adapted to each person with a short phone scan for the authentic hand avatar. For effec…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: Accepted to CVPR 2024

  48. arXiv:2405.07682  [pdf, other]

    cs.SD cs.AI cs.CL cs.MM eess.AS

    FastSAG: Towards Fast Non-Autoregressive Singing Accompaniment Generation

    Authors: Jianyi Chen, Wei Xue, Xu Tan, Zhen Ye, Qifeng Liu, Yike Guo

    Abstract: Singing Accompaniment Generation (SAG), which generates instrumental music to accompany input vocals, is crucial to developing human-AI symbiotic art creation systems. The state-of-the-art method, SingSong, utilizes a multi-stage autoregressive (AR) model for SAG; however, this method is extremely slow because it generates semantic and acoustic tokens recursively, which makes it impossible for real-…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: IJCAI 2024

  49. arXiv:2405.07474  [pdf, other]

    cs.AI cs.HC cs.RO

    Integrating Intent Understanding and Optimal Behavior Planning for Behavior Tree Generation from Human Instructions

    Authors: Xinglin Chen, Yishuai Cai, Yunxin Mao, Minglong Li, Wenjing Yang, Weixia Xu, Ji Wang

    Abstract: Robots executing tasks following human instructions in domestic or industrial environments essentially require both adaptability and reliability. Behavior Tree (BT) emerges as an appropriate control architecture for these scenarios due to its modularity and reactivity. Existing BT generation methods, however, either do not involve interpreting natural language or cannot theoretically guarantee the…

    Submitted 27 June, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

  50. arXiv:2405.07029  [pdf]

    cs.SD eess.AS

    A framework of text-dependent speaker verification for chinese numerical string corpus

    Authors: Litong Zheng, Feng Hong, Weijie Xu, Wan Zheng

    Abstract: The Chinese numerical string corpus serves as a valuable resource for speaker verification, particularly in financial transactions. Research indicates that in short speech scenarios, text-dependent speaker verification (TD-SV) consistently outperforms text-independent speaker verification (TI-SV). However, TD-SV potentially includes the validation of text information, which can be negatively impa…

    Submitted 21 May, 2024; v1 submitted 11 May, 2024; originally announced May 2024.

    Comments: arXiv admin note: text overlap with arXiv:2312.01645