Showing 1–50 of 983 results for author: Xie, Y

Searching in archive cs. Search in all archives.
  1. arXiv:2409.02246  [pdf, other

    cs.LG math.OC

    Multi-Agent Reinforcement Learning for Joint Police Patrol and Dispatch

    Authors: Matthew Repasky, He Wang, Yao Xie

    Abstract: Police patrol units need to split their time between performing preventive patrol and being dispatched to serve emergency incidents. In the existing literature, patrol and dispatch decisions are often studied separately. We consider joint optimization of these two decisions to improve police operations efficiency and reduce response time to emergency calls. Methodology/results: We propose a novel… ▽ More

    Submitted 3 September, 2024; originally announced September 2024.

  2. arXiv:2409.01581  [pdf, other

    cs.RO cs.AI

    GaussianPU: A Hybrid 2D-3D Upsampling Framework for Enhancing Color Point Clouds via 3D Gaussian Splatting

    Authors: Zixuan Guo, Yifan Xie, Weijing Xie, Peng Huang, Fei Ma, Fei Richard Yu

    Abstract: Dense colored point clouds enhance visual perception and are of significant value in various robotic applications. However, existing learning-based point cloud upsampling methods are constrained by computational resources and batch processing strategies, which often require subdividing point clouds into smaller patches, leading to distortions that degrade perceptual quality. To address this challe… ▽ More

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: 7 pages, 5 figures

  3. arXiv:2409.01557  [pdf, other

    cs.CV

    TASL-Net: Tri-Attention Selective Learning Network for Intelligent Diagnosis of Bimodal Ultrasound Video

    Authors: Chengqian Zhao, Zhao Yao, Zhaoyu Hu, Yuanxin Xie, Yafang Zhang, Yuanyuan Wang, Shuo Li, Jianhua Zhou, Jianqiao Zhou, Yin Wang, Jinhua Yu

    Abstract: In the intelligent diagnosis of bimodal (gray-scale and contrast-enhanced) ultrasound videos, medical domain knowledge such as the way sonographers browse videos, the particular areas they emphasize, and the features they pay special attention to, plays a decisive role in facilitating precise diagnosis. Embedding medical knowledge into the deep learning network can not only enhance performance but… ▽ More

    Submitted 2 September, 2024; originally announced September 2024.

  4. arXiv:2409.01353  [pdf, other

    cs.CV

    From Pixels to Objects: A Hierarchical Approach for Part and Object Segmentation Using Local and Global Aggregation

    Authors: Yunfei Xie, Cihang Xie, Alan Yuille, Jieru Mei

    Abstract: In this paper, we introduce a hierarchical transformer-based model designed for sophisticated image segmentation tasks, effectively bridging the granularity of part segmentation with the comprehensive scope of object segmentation. At the heart of our approach is a multi-level representation strategy, which systematically advances from individual pixels to superpixels, and ultimately to cohesive gr… ▽ More

    Submitted 2 September, 2024; originally announced September 2024.

  5. arXiv:2409.00086  [pdf, other

    cs.NI cs.AR cs.HC cs.LG eess.SY

    Towards Battery-Free Wireless Sensing via Radio-Frequency Energy Harvesting

    Authors: Tao Ni, Zehua Sun, Mingda Han, Guohao Lan, Yaxiong Xie, Zhenjiang Li, Tao Gu, Weitao Xu

    Abstract: Diverse Wi-Fi-based wireless applications have been proposed, ranging from daily activity recognition to vital sign monitoring. Despite their remarkable sensing accuracy, the high energy consumption and the requirement for customized hardware modification hinder the wide deployment of the existing sensing solutions. In this paper, we propose REHSense, an energy-efficient wireless sensing solution… ▽ More

    Submitted 25 August, 2024; originally announced September 2024.

  6. arXiv:2408.16403  [pdf, other

    cs.LG

    DeepSPoC: A Deep Learning-Based PDE Solver Governed by Sequential Propagation of Chaos

    Authors: Kai Du, Yongle Xie, Tao Zhou, Yuancheng Zhou

    Abstract: Sequential propagation of chaos (SPoC) is a recently developed tool to solve mean-field stochastic differential equations and their related nonlinear Fokker-Planck equations. Based on the theory of SPoC, we present a new method (deepSPoC) that combines the interacting particle system of SPoC and deep learning. Under the framework of deepSPoC, two classes of frequently used deep models include full… ▽ More

    Submitted 29 August, 2024; originally announced August 2024.

  7. arXiv:2408.15600  [pdf, other

    cs.LG cs.DC

    Exploring Selective Layer Fine-Tuning in Federated Learning

    Authors: Yuchang Sun, Yuexiang Xie, Bolin Ding, Yaliang Li, Jun Zhang

    Abstract: Federated learning (FL) has emerged as a promising paradigm for fine-tuning foundation models using distributed data in a privacy-preserving manner. Under limited computational resources, clients often find it more practical to fine-tune a selected subset of layers, rather than the entire model, based on their task-specific data. In this study, we provide a thorough theoretical exploration of sele… ▽ More

    Submitted 28 August, 2024; originally announced August 2024.

  8. arXiv:2408.14866  [pdf, other

    cs.CL cs.CR cs.LG

    Advancing Adversarial Suffix Transfer Learning on Aligned Large Language Models

    Authors: Hongfu Liu, Yuxi Xie, Ye Wang, Michael Shieh

    Abstract: Large Language Models (LLMs) face safety concerns due to potential misuse by malicious users. Recent red-teaming efforts have identified adversarial suffixes capable of jailbreaking LLMs using the gradient-based search algorithm Greedy Coordinate Gradient (GCG). However, GCG struggles with computational inefficiency, limiting further investigations regarding suffix transferability and scalabili… ▽ More

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: 11 pages, 4 figures

  9. arXiv:2408.14792  [pdf, other

    cs.CY cs.AI cs.CL

    Measuring Human Contribution in AI-Assisted Content Generation

    Authors: Yueqi Xie, Tao Qi, Jingwei Yi, Ryan Whalen, Junming Huang, Qian Ding, Yu Xie, Xing Xie, Fangzhao Wu

    Abstract: With the growing prevalence of generative artificial intelligence (AI), an increasing amount of content is no longer exclusively generated by humans but by generative AI models with human guidance. This shift presents notable challenges for the delineation of originality due to the varying degrees of human contribution in AI-assisted works. This study raises the research question of measuring huma… ▽ More

    Submitted 27 August, 2024; originally announced August 2024.

  10. arXiv:2408.10853  [pdf, other

    cs.SD cs.AI eess.AS

    Does Current Deepfake Audio Detection Model Effectively Detect ALM-based Deepfake Audio?

    Authors: Yuankun Xie, Chenxu Xiong, Xiaopeng Wang, Zhiyong Wang, Yi Lu, Xin Qi, Ruibo Fu, Yukun Liu, Zhengqi Wen, Jianhua Tao, Guanjun Li, Long Ye

    Abstract: Currently, Audio Language Models (ALMs) are rapidly advancing due to the developments in large language models and audio neural codecs. These ALMs have significantly lowered the barrier to creating deepfake audio, generating highly realistic and diverse types of deepfake audio, which pose severe threats to society. Consequently, effective audio deepfake detection technologies to detect ALM-based a… ▽ More

    Submitted 20 August, 2024; originally announced August 2024.

  11. arXiv:2408.10852  [pdf, other

    cs.SD eess.AS

    EELE: Exploring Efficient and Extensible LoRA Integration in Emotional Text-to-Speech

    Authors: Xin Qi, Ruibo Fu, Zhengqi Wen, Jianhua Tao, Shuchen Shi, Yi Lu, Zhiyong Wang, Xiaopeng Wang, Yuankun Xie, Yukun Liu, Guanjun Li, Xuefei Liu, Yongwei Li

    Abstract: In the current era of Artificial Intelligence Generated Content (AIGC), a Low-Rank Adaptation (LoRA) method has emerged. It uses a plugin-based approach to learn new knowledge with lower parameter quantities and computational costs, and it can be plugged in and out based on the specific sub-tasks, offering high flexibility. However, the current application schemes primarily incorporate LoRA into t… ▽ More

    Submitted 20 August, 2024; originally announced August 2024.

  12. arXiv:2408.10849  [pdf, other

    cs.SD eess.AS

    A Noval Feature via Color Quantisation for Fake Audio Detection

    Authors: Zhiyong Wang, Xiaopeng Wang, Yuankun Xie, Ruibo Fu, Zhengqi Wen, Jianhua Tao, Yukun Liu, Guanjun Li, Xin Qi, Yi Lu, Xuefei Liu, Yongwei Li

    Abstract: In the field of deepfake detection, previous studies focus on using reconstruction or mask and prediction methods to train pre-trained models, which are then transferred to fake audio detection training where the encoder is used to extract features, such as wav2vec2.0 and Masked Auto Encoder. These methods have proven that using real audio for reconstruction pre-training can better help the model… ▽ More

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: accepted by ISCSLP2024

  13. arXiv:2408.10088  [pdf, other

    cs.SI

    Recent Surge in Public Interest in Transportation: Sentiment Analysis of Baidu Apollo Go Using Weibo Data

    Authors: Shiqi Wang, Zhouye Zhao, Yuhang Xie, Mingchuan Ma, Zirui Chen, Zeyu Wang, Bohao Su, Wenrui Xu, Tianyi Li

    Abstract: Urban mobility and transportation systems have been profoundly transformed by the advancement of autonomous vehicle technologies. Baidu Apollo Go, a pioneer robotaxi service from the Chinese tech giant Baidu, has recently been widely deployed in major cities like Beijing and Wuhan, sparking increased conversation and offering a glimpse into the future of urban mobility. This study investigates p… ▽ More

    Submitted 19 August, 2024; originally announced August 2024.

    ACM Class: J.4

  14. arXiv:2408.09672  [pdf, other

    cs.LG math.OC stat.ML

    Regularization for Adversarial Robust Learning

    Authors: Jie Wang, Rui Gao, Yao Xie

    Abstract: Despite the growing prevalence of artificial neural networks in real-world applications, their vulnerability to adversarial attacks remains a significant concern, which motivates us to investigate the robustness of machine learning models. While various heuristics aim to optimize the distributionally robust risk using the $\infty$-Wasserstein metric, such a notion of robustness frequently encounte… ▽ More

    Submitted 22 August, 2024; v1 submitted 18 August, 2024; originally announced August 2024.

    Comments: 51 pages, 5 figures

  15. arXiv:2408.09333  [pdf, other

    cs.CL

    SkyScript-100M: 1,000,000,000 Pairs of Scripts and Shooting Scripts for Short Drama

    Authors: Jing Tang, Quanlu Jia, Yuqiang Xie, Zeyu Gong, Xiang Wen, Jiayi Zhang, Yalong Guo, Guibin Chen, Jiangping Yang

    Abstract: Generating high-quality shooting scripts containing information such as scene and shot language is essential for short drama script generation. We collect 6,660 popular short drama episodes from the Internet, each with an average of 100 short episodes, and the total number of short episodes is about 80,000, with a total duration of about 2,000 hours and totaling 10 terabytes (TB). We perform keyfr… ▽ More

    Submitted 28 August, 2024; v1 submitted 17 August, 2024; originally announced August 2024.

    Comments: 18 pages, 12 figures

  16. arXiv:2408.09074  [pdf, ps, other

    cs.LG math.OC

    Gradient-Variation Online Learning under Generalized Smoothness

    Authors: Yan-Feng Xie, Peng Zhao, Zhi-Hua Zhou

    Abstract: Gradient-variation online learning aims to achieve regret guarantees that scale with the variations in the gradients of online functions, which has been shown to be crucial for attaining fast convergence in games and robustness in stochastic optimization, hence receiving increased attention. Existing results often require the smoothness condition by imposing a fixed bound on the gradient Lipschitz… ▽ More

    Submitted 16 August, 2024; originally announced August 2024.

  17. arXiv:2408.08554  [pdf, other

    cs.LG

    ABQ-LLM: Arbitrary-Bit Quantized Inference Acceleration for Large Language Models

    Authors: Chao Zeng, Songwei Liu, Yusheng Xie, Hong Liu, Xiaojian Wang, Miao Wei, Shu Yang, Fangmin Chen, Xing Mei

    Abstract: Large Language Models (LLMs) have revolutionized natural language processing tasks. However, their practical application is constrained by substantial memory and computational demands. Post-training quantization (PTQ) is considered an effective method to accelerate LLM inference. Despite its growing popularity in LLM model compression, PTQ deployment faces two major challenges. First, low-bit quan… ▽ More

    Submitted 22 August, 2024; v1 submitted 16 August, 2024; originally announced August 2024.

  18. arXiv:2408.07337  [pdf, other

    cs.CV

    KIND: Knowledge Integration and Diversion in Diffusion Models

    Authors: Yucheng Xie, Fu Feng, Jing Wang, Xin Geng, Yong Rui

    Abstract: Pre-trained models have become the preferred backbone due to the expansion of model parameters, with techniques like Parameter-Efficient Fine-Tuning (PEFTs) typically fixing the parameters of these models. However, pre-trained models may not always be optimal, especially when there are discrepancies between training tasks and target tasks, potentially resulting in negative transfer. To address thi… ▽ More

    Submitted 14 August, 2024; originally announced August 2024.

  19. arXiv:2408.07219  [pdf, other

    cs.LG stat.ME

    Causal Effect Estimation using identifiable Variational AutoEncoder with Latent Confounders and Post-Treatment Variables

    Authors: Yang Xie, Ziqi Xu, Debo Cheng, Jiuyong Li, Lin Liu, Yinghao Zhang, Zaiwen Feng

    Abstract: Estimating causal effects from observational data is challenging, especially in the presence of latent confounders. Much work has been done on addressing this challenge, but most of the existing research ignores the bias introduced by the post-treatment variables. In this paper, we propose a novel method of joint Variational AutoEncoder (VAE) and identifiable Variational AutoEncoder (iVAE) for lea… ▽ More

    Submitted 13 August, 2024; originally announced August 2024.

  20. arXiv:2408.06922  [pdf, other

    cs.SD cs.AI eess.AS

    Temporal Variability and Multi-Viewed Self-Supervised Representations to Tackle the ASVspoof5 Deepfake Challenge

    Authors: Yuankun Xie, Xiaopeng Wang, Zhiyong Wang, Ruibo Fu, Zhengqi Wen, Haonan Cheng, Long Ye

    Abstract: ASVspoof5, the fifth edition of the ASVspoof series, is one of the largest global audio security challenges. It aims to advance the development of countermeasure (CM) to discriminate bonafide and spoofed speech utterances. In this paper, we focus on addressing the problem of open-domain audio deepfake detection, which corresponds directly to the ASVspoof5 Track1 open condition. At first, we compre… ▽ More

    Submitted 13 August, 2024; originally announced August 2024.

  21. arXiv:2408.06603  [pdf, other

    cs.AI

    Simple but Effective Compound Geometric Operations for Temporal Knowledge Graph Completion

    Authors: Rui Ying, Mengting Hu, Jianfeng Wu, Yalan Xie, Xiaoyi Liu, Zhunheng Wang, Ming Jiang, Hang Gao, Linlin Zhang, Renhong Cheng

    Abstract: Temporal knowledge graph completion aims to infer the missing facts in temporal knowledge graphs. Current approaches usually embed factual knowledge into continuous vector space and apply geometric operations to learn potential patterns in temporal knowledge graphs. However, these methods only adopt a single operation, which may have limitations in capturing the complex temporal dynamics present i… ▽ More

    Submitted 12 August, 2024; originally announced August 2024.

  22. arXiv:2408.06042  [pdf, ps, other

    cs.CR cs.AI

    Understanding Byzantine Robustness in Federated Learning with A Black-box Server

    Authors: Fangyuan Zhao, Yuexiang Xie, Xuebin Ren, Bolin Ding, Shusen Yang, Yaliang Li

    Abstract: Federated learning (FL) becomes vulnerable to Byzantine attacks where some of the participators tend to damage the utility or discourage the convergence of the learned model via sending their malicious model updates. Previous works propose to apply robust rules to aggregate updates from participators against different types of Byzantine attacks, while at the same time, attackers can further design adv… ▽ More

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: We have released code on https://github.com/alibaba/FederatedScope/tree/Byzantine_attack_defense

  23. arXiv:2408.05953  [pdf, other

    cs.CV cs.MM

    A Simple Task-aware Contrastive Local Descriptor Selection Strategy for Few-shot Learning between inter class and intra class

    Authors: Qian Qiao, Yu Xie, Shaoyao Huang, Fanzhang Li

    Abstract: Few-shot image classification aims to classify novel classes with few labeled samples. Recent research indicates that deep local descriptors have better representational capabilities. These studies recognize the impact of background noise on classification performance. They typically filter query descriptors using all local descriptors in the support classes or engage in bidirectional selection be… ▽ More

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: Submitted to ICANN 2024

  24. arXiv:2408.05711  [pdf, other

    cs.CV

    Contrastive masked auto-encoders based self-supervised hashing for 2D image and 3D point cloud cross-modal retrieval

    Authors: Rukai Wei, Heng Cui, Yu Liu, Yufeng Hou, Yanzhao Xie, Ke Zhou

    Abstract: Implementing cross-modal hashing between 2D images and 3D point-cloud data is a growing concern in real-world retrieval systems. Simply applying existing cross-modal approaches to this new task fails to adequately capture latent multi-modal semantics and effectively bridge the modality gap between 2D and 3D. To address these issues without relying on hand-crafted labels, we propose contrastive mas… ▽ More

    Submitted 11 August, 2024; originally announced August 2024.

    Comments: Accepted by ICME 2024

  25. arXiv:2408.03615  [pdf, other

    cs.AI cs.CL

    Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks

    Authors: Zaijing Li, Yuquan Xie, Rui Shao, Gongwei Chen, Dongmei Jiang, Liqiang Nie

    Abstract: Building a general-purpose agent is a long-standing vision in the field of artificial intelligence. Existing agents have made remarkable progress in many domains, yet they still struggle to complete long-horizon tasks in an open world. We attribute this to the lack of necessary world knowledge and multimodal experience that can guide agents through a variety of long-horizon tasks. In this paper, w… ▽ More

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: 30 pages, 13 figures

  26. arXiv:2408.02900  [pdf, other

    cs.CV

    MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine

    Authors: Yunfei Xie, Ce Zhou, Lang Gao, Juncheng Wu, Xianhang Li, Hong-Yu Zhou, Sheng Liu, Lei Xing, James Zou, Cihang Xie, Yuyin Zhou

    Abstract: This paper introduces MedTrinity-25M, a comprehensive, large-scale multimodal dataset for medicine, covering over 25 million images across 10 modalities, with multigranular annotations for more than 65 diseases. These enriched annotations encompass both global textual information, such as disease/lesion type, modality, region-specific descriptions, and inter-regional relationships, as well as deta… ▽ More

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: The project page is at https://yunfeixie233.github.io/MedTrinity-25M

  27. arXiv:2408.02215  [pdf

    cs.IR

    Exploring Query Understanding for Amazon Product Search

    Authors: Chen Luo, Xianfeng Tang, Hanqing Lu, Yaochen Xie, Hui Liu, Zhenwei Dai, Limeng Cui, Ashutosh Joshi, Sreyashi Nag, Yang Li, Zhen Li, Rahul Goutam, Jiliang Tang, Haiyang Zhang, Qi He

    Abstract: Online shopping platforms, such as Amazon, offer services to billions of people worldwide. Unlike web search or other search engines, product search engines have their unique characteristics, primarily featuring short queries which are mostly a combination of product attributes and structured product search space. The uniqueness of product search underscores the crucial importance of the query und… ▽ More

    Submitted 4 August, 2024; originally announced August 2024.

  28. arXiv:2408.02001  [pdf, other

    cs.CV

    AdaCBM: An Adaptive Concept Bottleneck Model for Explainable and Accurate Diagnosis

    Authors: Townim F. Chowdhury, Vu Minh Hieu Phan, Kewen Liao, Minh-Son To, Yutong Xie, Anton van den Hengel, Johan W. Verjans, Zhibin Liao

    Abstract: The integration of vision-language models such as CLIP and Concept Bottleneck Models (CBMs) offers a promising approach to explaining deep neural network (DNN) decisions using concepts understandable by humans, addressing the black-box concern of DNNs. While CLIP provides both explainability and zero-shot classification capability, its pre-training on generic image and text data may limit its clas… ▽ More

    Submitted 4 August, 2024; originally announced August 2024.

    Comments: Accepted at MICCAI 2024, the 27th International Conference on Medical Image Computing and Computer Assisted Intervention

  29. arXiv:2408.01902  [pdf, other

    cs.AR

    A Comprehensive Survey on GNN Characterization

    Authors: Meng Wu, Mingyu Yan, Wenming Li, Xiaochun Ye, Dongrui Fan, Yuan Xie

    Abstract: Characterizing graph neural networks (GNNs) is essential for identifying performance bottlenecks and facilitating their deployment. Despite substantial work in this area, a comprehensive survey on GNN characterization is lacking. This work presents a comprehensive survey, proposing a triple-level classification method to categorize, summarize, and compare existing efforts. In addition, we identify… ▽ More

    Submitted 15 August, 2024; v1 submitted 3 August, 2024; originally announced August 2024.

  30. arXiv:2408.01141  [pdf, other

    cond-mat.mes-hall cond-mat.dis-nn cs.LG

    Machine learning topological energy braiding of non-Bloch bands

    Authors: Shuwei Shi, Shibing Chu, Yuee Xie, Yuanping Chen

    Abstract: Machine learning has been used to identify phase transitions in a variety of physical systems. However, there is still a lack of relevant research on non-Bloch energy braiding in non-Hermitian systems. In this work, we study non-Bloch energy braiding in one-dimensional non-Hermitian systems using unsupervised and supervised methods. In unsupervised learning, we use diffusion maps to successfully i… ▽ More

    Submitted 2 August, 2024; originally announced August 2024.

  31. arXiv:2408.00766  [pdf, other

    cs.CV

    Optimizing Diffusion Models for Joint Trajectory Prediction and Controllable Generation

    Authors: Yixiao Wang, Chen Tang, Lingfeng Sun, Simone Rossi, Yichen Xie, Chensheng Peng, Thomas Hannagan, Stefano Sabatini, Nicola Poerio, Masayoshi Tomizuka, Wei Zhan

    Abstract: Diffusion models are promising for joint trajectory prediction and controllable generation in autonomous driving, but they face challenges of inefficient inference steps and high computational demands. To tackle these challenges, we introduce Optimal Gaussian Diffusion (OGD) and Estimated Clean Manifold (ECM) Guidance. OGD optimizes the prior distribution for a small diffusion time $T$ and starts… ▽ More

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: 30 pages, 20 figures, Accepted to ECCV 2024

  32. arXiv:2408.00355  [pdf, other

    cs.CV cs.AI

    DNTextSpotter: Arbitrary-Shaped Scene Text Spotting via Improved Denoising Training

    Authors: Yu Xie, Qian Qiao, Jun Gao, Tianxiang Wu, Shaoyao Huang, Jiaqing Fan, Ziqiang Cao, Zili Wang, Yue Zhang, Jielei Zhang, Huyang Sun

    Abstract: More and more end-to-end text spotting methods based on Transformer architecture have demonstrated superior performance. These methods utilize a bipartite graph matching algorithm to perform one-to-one optimal matching between predicted objects and actual objects. However, the instability of bipartite graph matching can lead to inconsistent optimization targets, thereby affecting the training perf… ▽ More

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: Accepted by ACMMM2024

  33. arXiv:2407.19546  [pdf, other

    cs.CV

    XLIP: Cross-modal Attention Masked Modelling for Medical Language-Image Pre-Training

    Authors: Biao Wu, Yutong Xie, Zeyu Zhang, Minh Hieu Phan, Qi Chen, Ling Chen, Qi Wu

    Abstract: Vision-and-language pretraining (VLP) in the medical field utilizes contrastive learning on image-text pairs to achieve effective transfer across tasks. Yet, current VLP approaches with the masked modelling strategy face two challenges when applied to the medical domain. First, current models struggle to accurately reconstruct key pathological features due to the scarcity of medical data. Second,… ▽ More

    Submitted 2 August, 2024; v1 submitted 28 July, 2024; originally announced July 2024.

  34. HICEScore: A Hierarchical Metric for Image Captioning Evaluation

    Authors: Zequn Zeng, Jianqiao Sun, Hao Zhang, Tiansheng Wen, Yudi Su, Yan Xie, Zhengjue Wang, Bo Chen

    Abstract: Image captioning evaluation metrics can be divided into two categories, reference-based metrics and reference-free metrics. However, reference-based approaches may struggle to evaluate descriptive captions with abundant visual details produced by advanced multimodal large language models, due to their heavy reliance on limited human-annotated references. In contrast, previous reference-free metric… ▽ More

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: Accepted by ACM MM2024

  35. arXiv:2407.18492  [pdf

    cs.CV

    Neural Modulation Alteration to Positive and Negative Emotions in Depressed Patients: Insights from fMRI Using Positive/Negative Emotion Atlas

    Authors: Yu Feng, Weiming Zeng, Yifan Xie, Hongyu Chen, Lei Wang, Yingying Wang, Hongjie Yan, Kaile Zhang, Ran Tao, Wai Ting Siok, Nizhuan Wang

    Abstract: Background: Although it has been noticed that depressed patients show differences in processing emotions, the precise neural modulation mechanisms of positive and negative emotions remain elusive. FMRI is a cutting-edge medical imaging technology renowned for its high spatial resolution and dynamic temporal information, making it particularly suitable for the neural dynamics of depression research… ▽ More

    Submitted 25 July, 2024; originally announced July 2024.

  36. arXiv:2407.18209  [pdf, other

    cs.ET cs.AR

    SuperFlow: A Fully-Customized RTL-to-GDS Design Automation Flow for Adiabatic Quantum-Flux-Parametron Superconducting Circuits

    Authors: Yanyue Xie, Peiyan Dong, Geng Yuan, Zhengang Li, Masoud Zabihi, Chao Wu, Sung-En Chang, Xufeng Zhang, Xue Lin, Caiwen Ding, Nobuyuki Yoshikawa, Olivia Chen, Yanzhi Wang

    Abstract: Superconducting circuits, like Adiabatic Quantum-Flux-Parametron (AQFP), offer exceptional energy efficiency but face challenges in physical design due to sophisticated spacing and timing constraints. Current design tools often neglect the importance of constraint adherence throughout the entire design flow. In this paper, we propose SuperFlow, a fully-customized RTL-to-GDS design flow tailored fo… ▽ More

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: Accepted by DATE 2024

  37. arXiv:2407.18175  [pdf, other

    cs.LG cs.AI cs.CV

    Quasar-ViT: Hardware-Oriented Quantization-Aware Architecture Search for Vision Transformers

    Authors: Zhengang Li, Alec Lu, Yanyue Xie, Zhenglun Kong, Mengshu Sun, Hao Tang, Zhong Jia Xue, Peiyan Dong, Caiwen Ding, Yanzhi Wang, Xue Lin, Zhenman Fang

    Abstract: Vision transformers (ViTs) have demonstrated their superior accuracy for computer vision tasks compared to convolutional neural networks (CNNs). However, ViT models are often computation-intensive for efficient deployment on resource-limited edge devices. This work proposes Quasar-ViT, a hardware-oriented quantization-aware architecture search framework for ViTs, to design efficient ViT models for… ▽ More

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: Accepted by ICS 2024

  38. arXiv:2407.17789  [pdf, other

    cs.MA cs.AI

    Very Large-Scale Multi-Agent Simulation in AgentScope

    Authors: Xuchen Pan, Dawei Gao, Yuexiang Xie, Zhewei Wei, Yaliang Li, Bolin Ding, Ji-Rong Wen, Jingren Zhou

    Abstract: Recent advances in large language models (LLMs) have opened new avenues for applying multi-agent systems in very large-scale simulations. However, there remain several challenges when conducting multi-agent simulations with existing platforms, such as limited scalability and low efficiency, unsatisfied agent diversity, and effort-intensive management processes. To address these challenges, we deve… ▽ More

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: We have released code on https://github.com/modelscope/agentscope

  39. arXiv:2407.17470  [pdf, other

    cs.CV

    SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency

    Authors: Yiming Xie, Chun-Han Yao, Vikram Voleti, Huaizu Jiang, Varun Jampani

    Abstract: We present Stable Video 4D (SV4D), a latent video diffusion model for multi-frame and multi-view consistent dynamic 3D content generation. Unlike previous methods that rely on separately trained generative models for video generation and novel view synthesis, we design a unified diffusion model to generate novel view videos of dynamic 3D objects. Specifically, given a monocular reference video, SV… ▽ More

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: Project page: https://sv4d.github.io/

  40. arXiv:2407.16364  [pdf, other

    cs.CV

    Harmonizing Visual Text Comprehension and Generation

    Authors: Zhen Zhao, Jingqun Tang, Binghong Wu, Chunhui Lin, Shu Wei, Hao Liu, Xin Tan, Zhizhong Zhang, Can Huang, Yuan Xie

    Abstract: In this work, we present TextHarmony, a unified and versatile multimodal generative model proficient in comprehending and generating visual text. Simultaneously generating images and texts typically results in performance degradation due to the inherent inconsistency between vision and language modalities. To overcome this challenge, existing approaches resort to modality-specific data for supervi… ▽ More

    Submitted 23 July, 2024; originally announced July 2024.

  41. arXiv:2407.15441  [pdf, other

    cs.CL

    Developing a Reliable, General-Purpose Hallucination Detection and Mitigation Service: Insights and Lessons Learned

    Authors: Song Wang, Xun Wang, Jie Mei, Yujia Xie, Sean Muarray, Zhang Li, Lingfeng Wu, Si-Qing Chen, Wayne Xiong

    Abstract: Hallucination, a phenomenon where large language models (LLMs) produce output that is factually incorrect or unrelated to the input, is a major challenge for LLM applications that require accuracy and dependability. In this paper, we introduce a reliable and high-speed production system aimed at detecting and rectifying the hallucination issue within LLMs. Our system encompasses named entity recog… ▽ More

    Submitted 22 July, 2024; originally announced July 2024.

  42. arXiv:2407.14266  [pdf, other

    cs.IR cs.LG

    L^2CL: Embarrassingly Simple Layer-to-Layer Contrastive Learning for Graph Collaborative Filtering

    Authors: Xinzhou Jin, Jintang Li, Liang Chen, Chenyun Yu, Yuanzhen Xie, Tao Xie, Chengxiang Zhuo, Zang Li, Zibin Zheng

    Abstract: Graph neural networks (GNNs) have recently emerged as an effective approach to model neighborhood signals in collaborative filtering. Towards this research line, graph contrastive learning (GCL) demonstrates robust capabilities to address the supervision label shortage issue through generating massive self-supervised signals. Despite its effectiveness, GCL for recommendation suffers seriously from… ▽ More

    Submitted 19 July, 2024; originally announced July 2024.

  43. arXiv:2407.14009  [pdf, other]

    cs.CV

    Scale Disparity of Instances in Interactive Point Cloud Segmentation

    Authors: Chenrui Han, Xuan Yu, Yuxuan Xie, Yili Liu, Sitong Mao, Shunbo Zhou, Rong Xiong, Yue Wang

    Abstract: Interactive point cloud segmentation has become a pivotal task for understanding 3D scenes, enabling users to guide segmentation models with simple interactions such as clicks, therefore significantly reducing the effort required to tailor models to diverse scenarios and new categories. However, in the realm of interactive segmentation, the meaning of instance diverges from that in instance segmen…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Accepted by 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems

  44. arXiv:2407.12783  [pdf, other]

    cs.CV cs.GR

    SMooDi: Stylized Motion Diffusion Model

    Authors: Lei Zhong, Yiming Xie, Varun Jampani, Deqing Sun, Huaizu Jiang

    Abstract: We introduce a novel Stylized Motion Diffusion model, dubbed SMooDi, to generate stylized motion driven by content texts and style motion sequences. Unlike existing methods that either generate motion of various content or transfer style from one sequence to another, SMooDi can rapidly generate motion across a broad range of content and diverse styles. To this end, we tailor a pre-trained text-to-…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: ECCV 2024. Project page: https://neu-vi.github.io/SMooDi/

  45. arXiv:2407.12758  [pdf, other]

    cs.CV

    Mutual Information Guided Optimal Transport for Unsupervised Visible-Infrared Person Re-identification

    Authors: Zhizhong Zhang, Jiangming Wang, Xin Tan, Yanyun Qu, Junping Wang, Yong Xie, Yuan Xie

    Abstract: Unsupervised visible infrared person re-identification (USVI-ReID) is a challenging retrieval task that aims to retrieve cross-modality pedestrian images without using any label information. In this task, the large cross-modality variance makes it difficult to generate reliable cross-modality labels, and the lack of annotations also provides additional difficulties for learning modality-invariant…

    Submitted 17 July, 2024; originally announced July 2024.

  46. arXiv:2407.11310  [pdf, other]

    cs.LG cs.NI

    Digital Twin Vehicular Edge Computing Network: Task Offloading and Resource Allocation

    Authors: Yu Xie, Qiong Wu, Pingyi Fan

    Abstract: With the increasing demand for multiple applications on the internet of vehicles, vehicles are required to carry out multiple computing tasks in real time. However, due to the insufficient computing capability of vehicles themselves, offloading tasks to vehicular edge computing (VEC) servers and allocating computing resources to tasks becomes a challenge. In this paper, a multi task digital twin (DT) V…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: This paper has been submitted to ICICSP 2024. The source code has been released at: https://github.com/qiongwu86/Digital-Twin-Vehicular-Edge-Computing-Network_Task-Offloading-and-Resource-Allocation

  47. arXiv:2407.10976  [pdf, other]

    cs.NI cs.LG eess.SP stat.AP

    Learning Cellular Network Connection Quality with Conformal

    Authors: Hanyang Jiang, Elizabeth Belding, Ellen Zegure, Yao Xie

    Abstract: In this paper, we address the problem of uncertainty quantification for cellular network speed. It is a well-known fact that the actual internet speed experienced by a mobile phone can fluctuate significantly, even when remaining in a single location. This high degree of variability underscores that mere point estimation of network speed is insufficient. Rather, it is advantageous to establish a p…

    Submitted 4 June, 2024; originally announced July 2024.

    Comments: arXiv admin note: text overlap with arXiv:2311.05641

  48. arXiv:2407.08639  [pdf, other]

    cs.AI cs.LG

    $β$-DPO: Direct Preference Optimization with Dynamic $β$

    Authors: Junkang Wu, Yuexiang Xie, Zhengyi Yang, Jiancan Wu, Jinyang Gao, Bolin Ding, Xiang Wang, Xiangnan He

    Abstract: Direct Preference Optimization (DPO) has emerged as a compelling approach for training Large Language Models (LLMs) to adhere to human preferences. However, the performance of DPO is sensitive to the fine-tuning of its trade-off parameter $β$, as well as to the quality of the preference data. We analyze the impact of $β$ and data quality on DPO, uncovering that optimal $β$ values vary with the inf…

    Submitted 11 July, 2024; originally announced July 2024.

  49. arXiv:2407.07880  [pdf, other]

    cs.LG cs.AI cs.CL

    Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization

    Authors: Junkang Wu, Yuexiang Xie, Zhengyi Yang, Jiancan Wu, Jiawei Chen, Jinyang Gao, Bolin Ding, Xiang Wang, Xiangnan He

    Abstract: This study addresses the challenge of noise in training datasets for Direct Preference Optimization (DPO), a method for aligning Large Language Models (LLMs) with human preferences. We categorize noise into pointwise noise, which includes low-quality data points, and pairwise noise, which encompasses erroneous data pair associations that affect preference rankings. Utilizing Distributionally Robus…

    Submitted 10 July, 2024; originally announced July 2024.

  50. arXiv:2407.07591  [pdf, other]

    cs.RO cs.NE

    A 'MAP' to find high-performing soft robot designs: Traversing complex design spaces using MAP-elites and Topology Optimization

    Authors: Yue Xie, Josh Pinskier, Lois Liow, David Howard, Fumiya Iida

    Abstract: Soft robotics has emerged as the standard solution for grasping deformable objects, and has proven invaluable for mobile robotic exploration in extreme environments. However, despite this growth, there are no widely adopted computational design tools that produce quality, manufacturable designs. To advance beyond the diminishing returns of heuristic bio-inspiration, the field needs efficient tools…

    Submitted 10 July, 2024; originally announced July 2024.