Showing 1–50 of 1,143 results for author: Xiao, Y

Searching in archive cs.
  1. arXiv:2406.18921  [pdf, other]

    cs.CL

    Capturing Minds, Not Just Words: Enhancing Role-Playing Language Models with Personality-Indicative Data

    Authors: Yiting Ran, Xintao Wang, Rui Xu, Xinfeng Yuan, Jiaqing Liang, Yanghua Xiao, Deqing Yang

    Abstract: Role-playing agents (RPA) have been a popular application area for large language models (LLMs), attracting significant interest from both industry and academia. While existing RPAs portray the characters' knowledge and tones well, they face challenges in capturing their minds, especially for small role-playing language models (RPLMs). In this paper, we propose to enhance RPLMs via personality-indi…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 10 pages

  2. arXiv:2406.18313  [pdf, other]

    cs.SD cs.CL eess.AS

    Advancing Airport Tower Command Recognition: Integrating Squeeze-and-Excitation and Broadcasted Residual Learning

    Authors: Yuanxi Lin, Tonglin Zhou, Yang Xiao

    Abstract: Accurate recognition of aviation commands is vital for flight safety and efficiency, as pilots must follow air traffic control instructions precisely. This paper addresses challenges in speech command recognition, such as noisy environments and limited computational resources, by advancing keyword spotting technology. We create a dataset of standardized airport tower commands, including routine an…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted by IALP 2024

  3. arXiv:2406.18045  [pdf, other]

    cs.CL cs.AI

    PharmGPT: Domain-Specific Large Language Models for Bio-Pharmaceutical and Chemistry

    Authors: Linqing Chen, Weilei Wang, Zilong Bai, Peng Xu, Yan Fang, Jie Fang, Wentao Wu, Lizhi Zhou, Ruiji Zhang, Yubin Xia, Chaobo Xu, Ran Hu, Licong Xu, Qijun Cai, Haoran Hua, Jing Sun, Jin Liu, Tian Qiu, Haowen Liu, Meng Hu, Xiuwen Li, Fei Gao, Yufu Wang, Lin Tie, Chaochao Wang, et al. (9 additional authors not shown)

    Abstract: Large language models (LLMs) have revolutionized Natural Language Processing (NLP) by minimizing the need for complex feature engineering. However, the application of LLMs in specialized domains like biopharmaceuticals and chemistry remains largely unexplored. These fields are characterized by intricate terminologies, specialized knowledge, and a high demand for precision, areas where general pu…

    Submitted 25 June, 2024; originally announced June 2024.

  4. arXiv:2406.17507  [pdf, other]

    cs.IR

    ACE: A Generative Cross-Modal Retrieval Framework with Coarse-To-Fine Semantic Modeling

    Authors: Minghui Fang, Shengpeng Ji, Jialong Zuo, Hai Huang, Yan Xia, Jieming Zhu, Xize Cheng, Xiaoda Yang, Wenrui Liu, Gang Wang, Zhenhua Dong, Zhou Zhao

    Abstract: Generative retrieval, which has demonstrated effectiveness in text-to-text retrieval, utilizes a sequence-to-sequence model to directly generate candidate identifiers based on natural language queries. Without explicitly computing the similarity between queries and candidates, generative retrieval surpasses dual-tower models in both speed and accuracy on large-scale corpora, providing new insights…

    Submitted 25 June, 2024; originally announced June 2024.

  5. arXiv:2406.16299  [pdf, other]

    cs.CL cs.AI

    Compensate Quantization Errors: Make Weights Hierarchical to Compensate Each Other

    Authors: Yifei Gao, Jie Ou, Lei Wang, Yuting Xiao, Zhiyuan Xiang, Ruiting Dai, Jun Cheng

    Abstract: Emergent Large Language Models (LLMs) use their extraordinary performance and powerful deduction capacity to distinguish themselves from traditional language models. However, the computational and storage costs of these LLMs are stunning, so quantization has become a trending topic. To address accuracy decay caused by quantization, two streams of works in post-training quantization metho…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: Efficient quantization method

    MSC Class: F.2.3

  6. arXiv:2406.15534  [pdf, other]

    cs.LG cs.AI cs.CL q-bio.QM

    Geneverse: A collection of Open-source Multimodal Large Language Models for Genomic and Proteomic Research

    Authors: Tianyu Liu, Yijia Xiao, Xiao Luo, Hua Xu, W. Jim Zheng, Hongyu Zhao

    Abstract: The applications of large language models (LLMs) are promising for biomedical and healthcare research. Despite the availability of open-source LLMs trained using a wide range of biomedical data, current research on the applications of LLMs to genomics and proteomics is still limited. To fill this gap, we propose a collection of finetuned LLMs and multimodal LLMs (MLLMs), known as Geneverse, for th…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 8 pages

  7. arXiv:2406.14952  [pdf, other]

    cs.CL

    ESC-Eval: Evaluating Emotion Support Conversations in Large Language Models

    Authors: Haiquan Zhao, Lingyu Li, Shisong Chen, Shuqi Kong, Jiaan Wang, Kexin Huang, Tianle Gu, Yixu Wang, Dandan Liang, Zhixu Li, Yan Teng, Yanghua Xiao, Yingchun Wang

    Abstract: Emotion Support Conversation (ESC) is a crucial application, which aims to reduce human stress, offer emotional guidance, and ultimately enhance human mental and physical well-being. With the advancement of Large Language Models (LLMs), many researchers have employed LLMs as the ESC models. However, the evaluation of these LLM-based ESCs remains uncertain. Inspired by the awesome development of ro…

    Submitted 24 June, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

    Comments: Pre-print

  8. arXiv:2406.14017  [pdf, other]

    cs.IR

    EAGER: Two-Stream Generative Recommender with Behavior-Semantic Collaboration

    Authors: Ye Wang, Jiahao Xun, Mingjie Hong, Jieming Zhu, Tao Jin, Wang Lin, Haoyuan Li, Linjun Li, Yan Xia, Zhou Zhao, Zhenhua Dong

    Abstract: Generative retrieval has recently emerged as a promising approach to sequential recommendation, framing candidate item retrieval as an autoregressive sequence generation problem. However, existing generative methods typically focus solely on either behavioral or semantic aspects of item information, neglecting their complementary nature and thus resulting in limited effectiveness. To address this…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Accepted by KDD 2024. Source code available at https://reczoo.github.io/EAGER

  9. arXiv:2406.13925  [pdf, other]

    cs.CL cs.AI

    GenderAlign: An Alignment Dataset for Mitigating Gender Bias in Large Language Models

    Authors: Tao Zhang, Ziqian Zeng, Yuxiang Xiao, Huiping Zhuang, Cen Chen, James Foulds, Shimei Pan

    Abstract: Large Language Models (LLMs) are prone to generating content that exhibits gender biases, raising significant ethical concerns. Alignment, the process of fine-tuning LLMs to better align with desired behaviors, is recognized as an effective approach to mitigate gender biases. Although proprietary LLMs have made significant strides in mitigating gender bias, their alignment datasets are not publicl…

    Submitted 19 June, 2024; originally announced June 2024.

  10. arXiv:2406.13672  [pdf, other]

    cs.CV

    Q-SNNs: Quantized Spiking Neural Networks

    Authors: Wenjie Wei, Yu Liang, Ammar Belatreche, Yichen Xiao, Honglin Cao, Zhenbang Ren, Guoqing Wang, Malu Zhang, Yang Yang

    Abstract: Brain-inspired Spiking Neural Networks (SNNs) leverage sparse spikes to represent information and process them in an asynchronous event-driven manner, offering an energy-efficient paradigm for the next generation of machine intelligence. However, the current focus within the SNN community prioritizes accuracy optimization through the development of large-scale models, limiting their viability in r…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 8 pages, 5 figures

  11. arXiv:2406.13130  [pdf, other]

    cs.LG stat.ML

    Advancing Retail Data Science: Comprehensive Evaluation of Synthetic Data

    Authors: Yu Xia, Chi-Hua Wang, Joshua Mabry, Guang Cheng

    Abstract: The evaluation of synthetic data generation is crucial, especially in the retail sector where data accuracy is paramount. This paper introduces a comprehensive framework for assessing synthetic retail data, focusing on fidelity, utility, and privacy. Our approach differentiates between continuous and discrete data attributes, providing precise evaluation criteria. Fidelity is measured through stab…

    Submitted 18 June, 2024; originally announced June 2024.

  12. arXiv:2406.12753  [pdf, other]

    cs.CL cs.AI

    OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI

    Authors: Zhen Huang, Zengzhi Wang, Shijie Xia, Xuefeng Li, Haoyang Zou, Ruijie Xu, Run-Ze Fan, Lyumanshan Ye, Ethan Chern, Yixin Ye, Yikai Zhang, Yuqing Yang, Ting Wu, Binjie Wang, Shichao Sun, Yang Xiao, Yiyuan Li, Fan Zhou, Steffi Chern, Yiwei Qin, Yan Ma, Jiadi Su, Yixiu Liu, Yuxiang Zheng, Shaoting Zhang, et al. (3 additional authors not shown)

    Abstract: The evolution of Artificial Intelligence (AI) has been significantly accelerated by advancements in Large Language Models (LLMs) and Large Multimodal Models (LMMs), gradually showcasing potential cognitive reasoning abilities in problem-solving and scientific discovery (i.e., AI4Science) once exclusive to human intellect. To comprehensively evaluate current models' performance in cognitive reasoni…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 44 pages

  13. arXiv:2406.12708  [pdf, other]

    cs.CL

    AgentReview: Exploring Peer Review Dynamics with LLM Agents

    Authors: Yiqiao Jin, Qinlin Zhao, Yiyang Wang, Hao Chen, Kaijie Zhu, Yijia Xiao, Jindong Wang

    Abstract: Peer review is fundamental to the integrity and advancement of scientific publication. Traditional methods of peer review analyses often rely on exploration and statistics of existing peer review data, which do not adequately address the multivariate nature of the process, account for the latent variables, and are further constrained by privacy concerns due to the sensitive nature of the data. We…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 22 pages, 10 figures

  14. arXiv:2406.12641  [pdf, other]

    cs.CL

    DetectBench: Can Large Language Model Detect and Piece Together Implicit Evidence?

    Authors: Zhouhong Gu, Lin Zhang, Xiaoxuan Zhu, Jiangjie Chen, Wenhao Huang, Yikai Zhang, Shusen Wang, Zheyu Ye, Yan Gao, Hongwei Feng, Yanghua Xiao

    Abstract: Detecting evidence within the context is a key step in the process of reasoning tasks. Evaluating and enhancing the capabilities of LLMs in evidence detection will strengthen context-based reasoning performance. This paper proposes a benchmark called DetectBench for verifying the ability to detect and piece together implicit evidence within a long context. DetectBench contains 3,928 multiple-choice…

    Submitted 18 June, 2024; originally announced June 2024.

  15. arXiv:2406.12454  [pdf, other]

    cs.AI

    A Neural Column Generation Approach to the Vehicle Routing Problem with Two-Dimensional Loading and Last-In-First-Out Constraints

    Authors: Yifan Xia, Xiangyi Zhang

    Abstract: The vehicle routing problem with two-dimensional loading constraints (2L-CVRP) and the last-in-first-out (LIFO) rule presents significant practical and algorithmic challenges. While numerous heuristic approaches have been proposed to address its complexity, stemming from two NP-hard problems: the vehicle routing problem (VRP) and the two-dimensional bin packing problem (2D-BPP), less attention has…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Accepted by International Joint Conference on Artificial Intelligence (IJCAI 2024)

  16. arXiv:2406.12293  [pdf, other]

    cs.CV

    Unleashing the Potential of Open-set Noisy Samples Against Label Noise for Medical Image Classification

    Authors: Zehui Liao, Shishuai Hu, Yong Xia

    Abstract: The challenge of addressing mixed closed-set and open-set label noise in medical image classification remains largely unexplored. Unlike natural image classification where there is a common practice of segregation and separate processing of closed-set and open-set noisy samples from clean ones, medical image classification faces difficulties due to high inter-class similarity which complicates the…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 10 pages, 1 figure

  17. arXiv:2406.12266  [pdf, other]

    cs.CL

    Towards a Client-Centered Assessment of LLM Therapists by Client Simulation

    Authors: Jiashuo Wang, Yang Xiao, Yanran Li, Changhe Song, Chunpu Xu, Chenhao Tan, Wenjie Li

    Abstract: Although there is a growing belief that LLMs can be used as therapists, exploring LLMs' capabilities and inefficacy, particularly from the client's perspective, is limited. This work focuses on a client-centered assessment of LLM therapists with the involvement of simulated clients, a standard approach in clinical medical education. However, there are two challenges when applying the approach to a…

    Submitted 20 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

  18. arXiv:2406.12223  [pdf, other]

    cs.CL cs.CY

    ToxiCloakCN: Evaluating Robustness of Offensive Language Detection in Chinese with Cloaking Perturbations

    Authors: Yunze Xiao, Yujia Hu, Kenny Tsu Wei Choo, Roy Ka-wei Lee

    Abstract: Detecting hate speech and offensive language is essential for maintaining a safe and respectful digital environment. This study examines the limitations of state-of-the-art large language models (LLMs) in identifying offensive content within systematically perturbed data, with a focus on Chinese, a language particularly susceptible to such perturbations. We introduce ToxiCloakCN, an enhan…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 10 pages, 5 tables, 2 figures

  19. arXiv:2406.11698  [pdf, other]

    cs.CL

    Meta Reasoning for Large Language Models

    Authors: Peizhong Gao, Ao Xie, Shaoguang Mao, Wenshan Wu, Yan Xia, Haipeng Mi, Furu Wei

    Abstract: We introduce Meta-Reasoning Prompting (MRP), a novel and efficient system prompting method for large language models (LLMs) inspired by human meta-reasoning. Traditional in-context learning-based reasoning techniques, such as Tree-of-Thoughts, show promise but lack consistent state-of-the-art performance across diverse tasks due to their specialized nature. MRP addresses this limitation by guiding…

    Submitted 17 June, 2024; originally announced June 2024.

  20. arXiv:2406.11455  [pdf, other]

    cs.CL cs.AI

    Adaptive Reinforcement Learning Planning: Harnessing Large Language Models for Complex Information Extraction

    Authors: Zepeng Ding, Ruiyang Ke, Wenhao Huang, Guochao Jiang, Yanda Li, Deqing Yang, Yanghua Xiao, Jiaqing Liang

    Abstract: Existing research on large language models (LLMs) shows that they can solve information extraction tasks through multi-step planning. However, their extraction behavior on complex sentences and tasks is unstable, exhibiting issues such as false positives and missing elements. We observe that decomposing complex extraction tasks and extracting them step by step can effectively improve LLMs' performan…

    Submitted 17 June, 2024; originally announced June 2024.

  21. arXiv:2406.10902  [pdf, other]

    cs.CV cs.CL

    Light Up the Shadows: Enhance Long-Tailed Entity Grounding with Concept-Guided Vision-Language Models

    Authors: Yikai Zhang, Qianyu He, Xintao Wang, Siyu Yuan, Jiaqing Liang, Yanghua Xiao

    Abstract: Multi-Modal Knowledge Graphs (MMKGs) have proven valuable for various downstream tasks. However, scaling them up is challenging because building large-scale MMKGs often introduces mismatched images (i.e., noise). Most entities in KGs belong to the long tail, meaning there are few images of them available online. This scarcity makes it difficult to determine whether a found image matches the entity…

    Submitted 16 June, 2024; originally announced June 2024.

  22. arXiv:2406.10881  [pdf, other]

    cs.CL

    Teaching Large Language Models to Express Knowledge Boundary from Their Own Signals

    Authors: Lida Chen, Zujie Liang, Xintao Wang, Jiaqing Liang, Yanghua Xiao, Feng Wei, Jinglei Chen, Zhenghong Hao, Bing Han, Wei Wang

    Abstract: Large language models (LLMs) have achieved great success, but their occasional content fabrication, or hallucination, limits their practical application. Hallucination arises because LLMs struggle to admit ignorance due to inadequate training on knowledge boundaries. We call it a limitation of LLMs that they cannot accurately express their knowledge boundary, answering questions they know while a…

    Submitted 16 June, 2024; originally announced June 2024.

  23. arXiv:2406.10621  [pdf, other]

    cs.CL cs.AI

    StructBench: An Autogenerated Benchmark for Evaluating Large Language Model's Ability in Structure-Rich Text Understanding

    Authors: Zhouhong Gu, Haoning Ye, Zeyang Zhou, Hongwei Feng, Yanghua Xiao

    Abstract: Given the substantial volumes of structured data held by many companies, enabling Large Language Models (LLMs) to directly understand structured text in non-structured forms could significantly enhance their capabilities across various business scenarios. To this end, we propose an evaluation data generation method for assessing LLMs' ability to understand structure-rich text, which generates…

    Submitted 15 June, 2024; originally announced June 2024.

  24. arXiv:2406.10079  [pdf, other]

    cs.CV cs.AI

    Localizing Events in Videos with Multimodal Queries

    Authors: Gengyuan Zhang, Mang Ling Ada Fok, Yan Xia, Yansong Tang, Daniel Cremers, Philip Torr, Volker Tresp, Jindong Gu

    Abstract: Video understanding is a pivotal task in the digital era, yet the dynamic and multi-event nature of videos makes them labor-intensive and computationally demanding to process. Thus, localizing a specific event given a semantic query has gained importance in both user-oriented applications like video search and academic research into video foundation models. A significant limitation in current resea…

    Submitted 22 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

    Comments: 9 pages; fix some typos

  25. arXiv:2406.08850  [pdf, other]

    cs.CV

    COVE: Unleashing the Diffusion Feature Correspondence for Consistent Video Editing

    Authors: Jiangshan Wang, Yue Ma, Jiayi Guo, Yicheng Xiao, Gao Huang, Xiu Li

    Abstract: Video editing is an emerging task, in which most current methods adopt the pre-trained text-to-image (T2I) diffusion model to edit the source video in a zero-shot manner. Despite extensive efforts, maintaining the temporal consistency of edited videos remains challenging due to the lack of temporal constraints in the regular T2I diffusion model. To address this issue, we propose COrrespondence-gui…

    Submitted 13 June, 2024; originally announced June 2024.

  26. arXiv:2406.07828  [pdf, other]

    cs.CV

    Spatial Annealing Smoothing for Efficient Few-shot Neural Rendering

    Authors: Yuru Xiao, Xianming Liu, Deming Zhai, Kui Jiang, Junjun Jiang, Xiangyang Ji

    Abstract: Neural Radiance Fields (NeRF) with hybrid representations have shown impressive capabilities in reconstructing scenes for view synthesis, delivering high efficiency. Nonetheless, their performance significantly drops with sparse view inputs, due to the issue of overfitting. While various regularization strategies have been devised to address these challenges, they often depend on inefficient assum…

    Submitted 11 June, 2024; originally announced June 2024.

  27. arXiv:2406.07498  [pdf, other]

    cs.SD eess.AS

    RaD-Net 2: A causal two-stage repairing and denoising speech enhancement network with knowledge distillation and complex axial self-attention

    Authors: Mingshuai Liu, Zhuangqi Chen, Xiaopeng Yan, Yuanjun Lv, Xianjun Xia, Chuanzeng Huang, Yijian Xiao, Lei Xie

    Abstract: In real-time speech communication systems, speech signals are often degraded by multiple distortions. Recently, a two-stage Repair-and-Denoising network (RaD-Net) was proposed with superior speech quality improvement in the ICASSP 2024 Speech Signal Improvement (SSI) Challenge. However, the failure to use future information and the constrained receptive field of convolution layers limit the system's perfor…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  28. arXiv:2406.07006  [pdf, other]

    cs.CV

    MIPI 2024 Challenge on Few-shot RAW Image Denoising: Methods and Results

    Authors: Xin Jin, Chunle Guo, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Yuekun Dai, Peiqing Yang, Chen Change Loy, Ruoqi Li, Chang Liu, Ziyi Wang, Yao Du, Jingjing Yang, Long Bao, Heng Sun, Xiangyu Kong, Xiaoxia Xing, Jinlong Wu, Yuanyang Xue, Hyunhee Park, Sejun Song, Changho Kim, Jingfan Tan, et al. (17 additional authors not shown)

    Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: CVPR 2024 Mobile Intelligent Photography and Imaging (MIPI) Workshop--Few-shot RAW Image Denoising Challenge Report. Website: https://mipi-challenge.org/MIPI2024/

  29. arXiv:2406.06652  [pdf, other]

    cs.LG cs.AI

    Improving Generalization of Neural Vehicle Routing Problem Solvers Through the Lens of Model Architecture

    Authors: Yubin Xiao, Di Wang, Xuan Wu, Yuesong Wu, Boyang Li, Wei Du, Liupu Wang, You Zhou

    Abstract: Neural models produce promising results when solving Vehicle Routing Problems (VRPs), but often fall short in generalization. Recent attempts to enhance model generalization often incur unnecessarily large training cost or cannot be directly applied to other models solving different VRP variants. To address these issues, we take a novel perspective on model architecture in this study. Specifically…

    Submitted 17 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: 13 pages, 6 figures, and 6 tables

  30. arXiv:2406.06282  [pdf, other]

    cs.LG

    PowerInfer-2: Fast Large Language Model Inference on a Smartphone

    Authors: Zhenliang Xue, Yixin Song, Zeyu Mi, Le Chen, Yubin Xia, Haibo Chen

    Abstract: This paper introduces PowerInfer-2, a framework designed for high-speed inference of Large Language Models (LLMs) on smartphones, particularly effective for models whose sizes exceed the device's memory capacity. The key insight of PowerInfer-2 is to utilize the heterogeneous computation, memory, and I/O resources in smartphones by decomposing traditional matrix computations into fine-grained neur…

    Submitted 12 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: 14 pages, 11 figures

  31. arXiv:2406.06156  [pdf, other]

    cs.SE

    Stronger, Cheaper and Demonstration-Free Log Parsing with LLMs

    Authors: Yi Xiao, Van-Hoang Le, Hongyu Zhang

    Abstract: Log parsing, the process of converting raw log messages into structured formats, is an important initial step for automated analysis of logs of large-scale software systems. Traditional log parsers often rely on heuristics or handcrafted features, which may not generalize well across diverse log sources or require extensive model tuning. Recently, some log parsers have utilized powerful generative…

    Submitted 12 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

  32. arXiv:2406.05699  [pdf, ps, other]

    eess.AS cs.AI eess.SP

    An Investigation of Noise Robustness for Flow-Matching-Based Zero-Shot TTS

    Authors: Xiaofei Wang, Sefik Emre Eskimez, Manthan Thakker, Hemin Yang, Zirun Zhu, Min Tang, Yufei Xia, Jinzhu Li, Sheng Zhao, Jinyu Li, Naoyuki Kanda

    Abstract: Recently, zero-shot text-to-speech (TTS) systems, capable of synthesizing any speaker's voice from a short audio prompt, have made rapid advancements. However, the quality of the generated speech significantly deteriorates when the audio prompt contains noise, and limited research has been conducted to address this issue. In this paper, we explored various strategies to enhance the quality of audi…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Accepted to INTERSPEECH 2024

  33. arXiv:2406.05392  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG

    Deconstructing The Ethics of Large Language Models from Long-standing Issues to New-emerging Dilemmas

    Authors: Chengyuan Deng, Yiqun Duan, Xin Jin, Heng Chang, Yijun Tian, Han Liu, Henry Peng Zou, Yiqiao Jin, Yijia Xiao, Yichen Wang, Shenghao Wu, Zongxing Xie, Kuofeng Gao, Sihong He, Jun Zhuang, Lu Cheng, Haohan Wang

    Abstract: Large Language Models (LLMs) have achieved unparalleled success across diverse language modeling tasks in recent years. However, this progress has also intensified ethical concerns, impacting the deployment of LLMs in everyday contexts. This paper provides a comprehensive survey of ethical challenges associated with LLMs, from longstanding issues such as copyright infringement, systematic bias, an…

    Submitted 8 June, 2024; originally announced June 2024.

  34. arXiv:2406.04802  [pdf, other]

    cs.CV cs.LG

    Predictive Dynamic Fusion

    Authors: Bing Cao, Yinan Xia, Yi Ding, Changqing Zhang, Qinghua Hu

    Abstract: Multimodal fusion is crucial in joint decision-making systems for rendering holistic judgments. Since multimodal data changes in open environments, dynamic fusion has emerged and achieved remarkable progress in numerous applications. However, most existing dynamic multimodal fusion methods lack theoretical guarantees and easily fall into suboptimal problems, yielding unreliability and instability.…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: 21 pages, 7 figures

  35. arXiv:2406.04784  [pdf, other]

    cs.CL cs.AI

    SelfGoal: Your Language Agents Already Know How to Achieve High-level Goals

    Authors: Ruihan Yang, Jiangjie Chen, Yikai Zhang, Siyu Yuan, Aili Chen, Kyle Richardson, Yanghua Xiao, Deqing Yang

    Abstract: Language agents powered by large language models (LLMs) are increasingly valuable as decision-making tools in domains such as gaming and programming. However, these agents often face challenges in achieving high-level goals without detailed instructions and in adapting to environments where feedback is delayed. In this paper, we present SelfGoal, a novel automatic approach designed to enhance agen…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Preprint

  36. arXiv:2406.04712  [pdf, other]

    cs.CL

    AICoderEval: Improving AI Domain Code Generation of Large Language Models

    Authors: Yinghui Xia, Yuyan Chen, Tianyu Shi, Jun Wang, Jinsong Yang

    Abstract: Automated code generation is a pivotal capability of large language models (LLMs). However, assessing this capability in real-world scenarios remains challenging. Previous methods focus more on low-level code generation, such as model loading, instead of generating high-level code catering to real-world tasks, such as image-to-text and text classification, in various domains. Therefore, we construc…

    Submitted 7 June, 2024; originally announced June 2024.

  37. arXiv:2406.04702  [pdf, other]

    cs.LG

    Marking the Pace: A Blockchain-Enhanced Privacy-Traceable Strategy for Federated Recommender Systems

    Authors: Zhen Cai, Tao Tang, Shuo Yu, Yunpeng Xiao, Feng Xia

    Abstract: Federated recommender systems have been crucially enhanced through data sharing and continuous model updates, attributed to the pervasive connectivity and distributed computing capabilities of Internet of Things (IoT) devices. Given the sensitivity of IoT data, transparent data processing in data sharing and model updates is paramount. However, existing methods fall short in tracing the flow of sh…

    Submitted 7 June, 2024; originally announced June 2024.

  38. arXiv:2406.03503  [pdf, other]

    cs.AI cs.LG

    Position: Rethinking Post-Hoc Search-Based Neural Approaches for Solving Large-Scale Traveling Salesman Problems

    Authors: Yifan Xia, Xianliang Yang, Zichuan Liu, Zhihao Liu, Lei Song, Jiang Bian

    Abstract: Recent advancements in solving large-scale traveling salesman problems (TSP) utilize the heatmap-guided Monte Carlo tree search (MCTS) paradigm, where machine learning (ML) models generate heatmaps, indicating the probability distribution of each edge being part of the optimal solution, to guide MCTS in solution finding. However, our theoretical and experimental analysis raises doubts about the ef…

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: Accepted by International Conference on Machine Learning (ICML 2024)

  39. arXiv:2406.02463  [pdf, other]

    cs.CR

    Click Without Compromise: Online Advertising Measurement via Per User Differential Privacy

    Authors: Yingtai Xiao, Jian Du, Shikun Zhang, Qiang Yan, Danfeng Zhang, Daniel Kifer

    Abstract: Online advertising is a cornerstone of the Internet ecosystem, with advertising measurement playing a crucial role in optimizing efficiency. Ad measurement entails attributing desired behaviors, such as purchases, to ad exposures across various platforms, necessitating the collection of user activities across these platforms. As this practice faces increasing restrictions due to rising privacy con…

    Submitted 4 June, 2024; originally announced June 2024.

  40. arXiv:2406.02395  [pdf, other]

    cs.LG cs.CV

    GrootVL: Tree Topology is All You Need in State Space Model

    Authors: Yicheng Xiao, Lin Song, Shaoli Huang, Jiangshan Wang, Siyu Song, Yixiao Ge, Xiu Li, Ying Shan

    Abstract: The state space models, employing recursively propagated features, demonstrate strong representation capabilities comparable to Transformer models and superior efficiency. However, constrained by the inherent geometric constraints of sequences, they still fall short in modeling long-range dependencies. To address this issue, we propose the GrootVL network, which first dynamically generates a tree t…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: The code is available at https://github.com/EasonXiao-888/GrootVL

  41. arXiv:2406.02014  [pdf, other]

    q-bio.NC cs.LG cs.SD eess.AS

    Understanding Auditory Evoked Brain Signal via Physics-informed Embedding Network with Multi-Task Transformer

    Authors: Wanli Ma, Xuegang Tang, Jin Gu, Ying Wang, Yuling Xia

    Abstract: In the fields of brain-computer interaction and cognitive neuroscience, effective decoding of auditory signals from task-based functional magnetic resonance imaging (fMRI) is key to understanding how the brain processes complex auditory information. Although existing methods have enhanced decoding capabilities, limitations remain in information utilization and model representation. To overcome the…

    Submitted 4 June, 2024; originally announced June 2024.

  42. arXiv:2406.00934  [pdf, other]

    cs.CV

    LanEvil: Benchmarking the Robustness of Lane Detection to Environmental Illusions

    Authors: Tianyuan Zhang, Lu Wang, Hainan Li, Yisong Xiao, Siyuan Liang, Aishan Liu, Xianglong Liu, Dacheng Tao

    Abstract: Lane detection (LD) is an essential component of autonomous driving systems, providing fundamental functionalities like adaptive cruise control and automated lane centering. Existing LD benchmarks primarily focus on evaluating common cases, neglecting the robustness of LD models against environmental illusions such as shadows and tire marks on the road. This research gap poses significant safety c…

    Submitted 11 June, 2024; v1 submitted 2 June, 2024; originally announced June 2024.

  43. arXiv:2406.00699  [pdf, other]

    cs.CV

    Towards General Robustness Verification of MaxPool-based Convolutional Neural Networks via Tightening Linear Approximation

    Authors: Yuan Xiao, Shiqing Ma, Juan Zhai, Chunrong Fang, Jinyuan Jia, Zhenyu Chen

    Abstract: The robustness of convolutional neural networks (CNNs) is vital to modern AI-driven systems. It can be quantified by formal verification, which provides a certified lower bound within which no perturbation alters the classification result of the original input. This is challenging due to nonlinear components such as MaxPool. At present, many verification methods are sound but risk losing some pre…

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: Accepted to CVPR2024. Project page: https://github.com/xiaoyuanpigo/maxlin

  44. arXiv:2406.00415  [pdf, other]

    cs.AI

    Neural Combinatorial Optimization Algorithms for Solving Vehicle Routing Problems: A Comprehensive Survey with Perspectives

    Authors: Xuan Wu, Di Wang, Lijie Wen, Yubin Xiao, Chunguo Wu, Yuesong Wu, Chaoyu Yu, Douglas L. Maskell, You Zhou

    Abstract: Although several surveys on Neural Combinatorial Optimization (NCO) solvers specifically designed to solve Vehicle Routing Problems (VRPs) have been conducted, these existing surveys do not cover the state-of-the-art (SOTA) NCO solvers that have emerged recently. More importantly, to provide a comprehensive taxonomy of NCO solvers with up-to-date coverage, based on our thorough review of relevant publicati…

    Submitted 1 June, 2024; originally announced June 2024.

  45. arXiv:2405.19519  [pdf, other]

    cs.CL cs.AI

    Two-layer retrieval augmented generation framework for low-resource medical question-answering: proof of concept using Reddit data

    Authors: Sudeshna Das, Yao Ge, Yuting Guo, Swati Rajwal, JaMor Hairston, Jeanne Powell, Drew Walker, Snigdha Peddireddy, Sahithi Lakamana, Selen Bozkurt, Matthew Reyna, Reza Sameni, Yunyu Xiao, Sangmi Kim, Rasheeta Chandler, Natalie Hernandez, Danielle Mowery, Rachel Wightman, Jennifer Love, Anthony Spadaro, Jeanmarie Perrone, Abeed Sarker

    Abstract: Retrieval augmented generation (RAG) provides the capability to constrain generative model outputs and mitigate the possibility of hallucination by providing relevant in-context text. The number of tokens a generative large language model (LLM) can incorporate as context is finite, which limits the volume of knowledge from which to generate an answer. We propose a two-layer RAG framework for qu…

    Submitted 29 May, 2024; originally announced May 2024.

  46. arXiv:2405.19217  [pdf, other]

    cs.IT cs.CR cs.DC cs.LG

    LoByITFL: Low Communication Secure and Private Federated Learning

    Authors: Yue Xia, Christoph Hofmeister, Maximilian Egger, Rawad Bitar

    Abstract: Federated Learning (FL) faces several challenges, such as the privacy of the clients' data and security against Byzantine clients. Existing works treating privacy and security jointly make sacrifices on the privacy guarantee. In this work, we introduce LoByITFL, the first communication-efficient Information-Theoretic (IT) private and secure FL scheme that makes no sacrifices on the privacy guarante…

    Submitted 29 May, 2024; originally announced May 2024.

  47. arXiv:2405.18910  [pdf, other]

    cs.AI

    Predicting Parking Availability in Singapore with Cross-Domain Data: A New Dataset and A Data-Driven Approach

    Authors: Huaiwu Zhang, Yutong Xia, Siru Zhong, Kun Wang, Zekun Tong, Qingsong Wen, Roger Zimmermann, Yuxuan Liang

    Abstract: The increasing number of vehicles highlights the need for efficient parking space management. Predicting real-time Parking Availability (PA) can help mitigate traffic congestion and its associated social problems, a pressing issue in densely populated cities like Singapore. In this study, we aim to collectively predict future PA across Singapore with complex factors from various domain…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: Accepted by IJCAI 2024 (Multi-Year Track On AI And Social Good with ~20% acceptance rate)

  48. arXiv:2405.18267  [pdf, other]

    eess.IV cs.CV cs.LG

    CT-based brain ventricle segmentation via diffusion Schrödinger Bridge without target domain ground truths

    Authors: Reihaneh Teimouri, Marta Kersten-Oertel, Yiming Xiao

    Abstract: Efficient and accurate brain ventricle segmentation from clinical CT scans is critical for emergency surgeries like ventriculostomy. Given the challenges of poor soft-tissue contrast and a scarcity of well-annotated databases for clinical brain CTs, we introduce a novel uncertainty-aware ventricle segmentation technique that requires no CT segmentation ground truths by leveraging diffusion-model…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Early acceptance at MICCAI2024

  49. arXiv:2405.18092  [pdf]

    cs.AI cs.ET cs.MA cs.RO eess.SY

    LLM experiments with simulation: Large Language Model Multi-Agent System for Process Simulation Parametrization in Digital Twins

    Authors: Yuchen Xia, Daniel Dittler, Nasser Jazdi, Haonan Chen, Michael Weyrich

    Abstract: This paper presents a novel design of a multi-agent system framework that applies a large language model (LLM) to automate the parametrization of process simulations in digital twins. We propose a multi-agent framework that includes four types of agents: observation, reasoning, decision and summarization. By enabling dynamic interaction between LLM agents and the simulation model, the developed system…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Submitted to IEEE-ETFA2024, under peer review

  50. arXiv:2405.16854  [pdf, other]

    cs.MA

    Knowing What Not to Do: Leverage Language Model Insights for Action Space Pruning in Multi-agent Reinforcement Learning

    Authors: Zhihao Liu, Xianliang Yang, Zichuan Liu, Yifan Xia, Wei Jiang, Yuanyu Zhang, Lijuan Li, Guoliang Fan, Lei Song, Bian Jiang

    Abstract: Multi-agent reinforcement learning (MARL) is employed to develop autonomous agents that can learn to adopt cooperative or competitive strategies within complex environments. However, a linear increase in the number of agents leads to a combinatorial explosion of the action space, which may result in algorithmic instability, difficulty in convergence, or entrapment in local optima. While research…

    Submitted 27 May, 2024; originally announced May 2024.