
Showing 1–50 of 1,718 results for author: XU, H

Searching in archive cs. Search in all archives.
  1. arXiv:2407.20981  [pdf, other]

    cs.GT

    Escape Sensing Games: Detection-vs-Evasion in Security Applications

    Authors: Niclas Boehmer, Minbiao Han, Haifeng Xu, Milind Tambe

    Abstract: Traditional game-theoretic research for security applications primarily focuses on the allocation of external protection resources to defend targets. This work puts forward the study of a new class of games centered around strategically arranging targets to protect them against a constrained adversary, with motivations from varied domains such as peacekeeping resource transit and cybersecurity. Sp…

    Submitted 30 July, 2024; originally announced July 2024.

  2. arXiv:2407.20893  [pdf, other]

    cs.LG cs.AI eess.SP

    MambaCapsule: Towards Transparent Cardiac Disease Diagnosis with Electrocardiography Using Mamba Capsule Network

    Authors: Yinlong Xu, Xiaoqiang Liu, Zitai Kong, Yixuan Wu, Yue Wang, Yingzhou Lu, Honghao Gao, Jian Wu, Hongxia Xu

    Abstract: Cardiac arrhythmia, a condition characterized by irregular heartbeats, often serves as an early indication of various heart ailments. With the advent of deep learning, numerous innovative models have been introduced for diagnosing arrhythmias using Electrocardiogram (ECG) signals. However, recent studies solely focus on the performance of models, neglecting the interpretation of their results. Thi…

    Submitted 30 July, 2024; originally announced July 2024.

  3. arXiv:2407.20251  [pdf, other]

    eess.SP cond-mat.mtrl-sci cs.LG

    An Uncertainty-aware Deep Learning Framework-based Robust Design Optimization of Metamaterial Units

    Authors: Zihan Wang, Anindya Bhaduri, Hongyi Xu, Liping Wang

    Abstract: Mechanical metamaterials represent an innovative class of artificial structures, distinguished by their extraordinary mechanical characteristics, which are beyond the scope of traditional natural materials. The use of deep generative models has become increasingly popular in the design of metamaterial units. The effectiveness of using deep generative models lies in their capacity to compress compl…

    Submitted 19 July, 2024; originally announced July 2024.

  4. arXiv:2407.20109  [pdf, other]

    cs.LG cs.AI

    Diffusion-DICE: In-Sample Diffusion Guidance for Offline Reinforcement Learning

    Authors: Liyuan Mao, Haoran Xu, Weinan Zhang, Xianyuan Zhan, Amy Zhang

    Abstract: One important property of DIstribution Correction Estimation (DICE) methods is that the solution is the optimal stationary distribution ratio between the optimized and data collection policy. In this work, we show that DICE-based methods can be viewed as a transformation from the behavior distribution to the optimal policy distribution. Based on this, we propose a novel approach, Diffusion-DICE, t…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: Preprint, under review

  5. arXiv:2407.19852  [pdf]

    quant-ph cs.LG q-bio.BM

    Quantum Long Short-Term Memory for Drug Discovery

    Authors: Liang Zhang, Yin Xu, Mohan Wu, Liang Wang, Hua Xu

    Abstract: Quantum computing combined with machine learning (ML) is an extremely promising research area, with numerous studies demonstrating that quantum machine learning (QML) is expected to solve scientific problems more effectively than classical ML. In this work, we successfully apply QML to drug discovery, showing that QML can significantly improve model performance and achieve faster convergence compa…

    Submitted 29 July, 2024; originally announced July 2024.

  6. arXiv:2407.19456  [pdf, other]

    cs.MM

    An Inverse Partial Optimal Transport Framework for Music-guided Movie Trailer Generation

    Authors: Yutong Wang, Sidan Zhu, Hongteng Xu, Dixin Luo

    Abstract: Trailer generation is a challenging video clipping task that aims to select highlighting shots from long videos like movies and re-organize them in an attractive way. In this study, we propose an inverse partial optimal transport (IPOT) framework to achieve music-guided movie trailer generation. In particular, we formulate the trailer generation task as selecting and sorting key movie shots based…

    Submitted 30 July, 2024; v1 submitted 28 July, 2024; originally announced July 2024.

    Comments: acmmm2024

  7. arXiv:2407.19302  [pdf, other]

    cs.CL cs.MM

    IBMEA: Exploring Variational Information Bottleneck for Multi-modal Entity Alignment

    Authors: Taoyu Su, Jiawei Sheng, Shicheng Wang, Xinghua Zhang, Hongbo Xu, Tingwen Liu

    Abstract: Multi-modal entity alignment (MMEA) aims to identify equivalent entities between multi-modal knowledge graphs (MMKGs), where the entities can be associated with related images. Most existing studies integrate multi-modal information heavily relying on the automatically-learned fusion module, rarely suppressing the redundant information for MMEA explicitly. To this end, we explore variational infor…

    Submitted 27 July, 2024; originally announced July 2024.

    Comments: Accepted by ACM MM 2024

  8. arXiv:2407.19296  [pdf, other]

    cs.AI

    Multi-Modal CLIP-Informed Protein Editing

    Authors: Mingze Yin, Hanjing Zhou, Yiheng Zhu, Miao Lin, Yixuan Wu, Jialu Wu, Hongxia Xu, Chang-Yu Hsieh, Tingjun Hou, Jintai Chen, Jian Wu

    Abstract: Proteins govern most biological functions essential for life, but achieving controllable protein discovery and optimization remains challenging. Recently, machine learning-assisted protein editing (MLPE) has shown promise in accelerating optimization cycles and reducing experimental workloads. However, current methods struggle with the vast combinatorial space of potential protein edits and cannot…

    Submitted 27 July, 2024; originally announced July 2024.

    Comments: 13 pages, 7 figures, 5 tables

  9. arXiv:2407.19256  [pdf]

    cs.AI cs.CL cs.LG

    Stochastic Parrots or ICU Experts? Large Language Models in Critical Care Medicine: A Scoping Review

    Authors: Tongyue Shi, Jun Ma, Zihan Yu, Haowei Xu, Minqi Xiong, Meirong Xiao, Yilin Li, Huiying Zhao, Guilan Kong

    Abstract: With the rapid development of artificial intelligence (AI), large language models (LLMs) have shown strong capabilities in natural language understanding, reasoning, and generation, attracting substantial research interest in applying LLMs to health and medicine. Critical care medicine (CCM) provides diagnosis and treatment for critically ill patients who often require intensive monitoring and inte…

    Submitted 27 July, 2024; originally announced July 2024.

    Comments: 28 pages, 5 figures

  10. WorkR: Occupation Inference for Intelligent Task Assistance

    Authors: Yonchanok Khaokaew, Hao Xue, Mohammad Saiedur Rahaman, Flora D. Salim

    Abstract: Occupation information can be utilized by digital assistants to provide occupation-specific personalized task support, including interruption management, task planning, and recommendations. Prior research in the digital workplace assistant domain requires users to input their occupation information for effective support. However, as many individuals switch between multiple occupations daily, curre…

    Submitted 26 July, 2024; originally announced July 2024.

  11. arXiv:2407.18148  [pdf, other]

    cs.DC cs.LG

    StraightLine: An End-to-End Resource-Aware Scheduler for Machine Learning Application Requests

    Authors: Cheng-Wei Ching, Boyuan Guan, Hailu Xu, Liting Hu

    Abstract: The life cycle of machine learning (ML) applications consists of two stages: model development and model deployment. However, traditional ML systems (e.g., training-specific or inference-specific systems) focus on one particular stage or phase of the life cycle of ML applications. These systems often aim at optimizing model training or accelerating model inference, and they frequently assume homog…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: 6 pages, 8 figures, to appear in AIoTC'24

  12. arXiv:2407.16697  [pdf, other]

    cs.CV

    AbdomenAtlas: A Large-Scale, Detailed-Annotated, & Multi-Center Dataset for Efficient Transfer Learning and Open Algorithmic Benchmarking

    Authors: Wenxuan Li, Chongyu Qu, Xiaoxi Chen, Pedro R. A. S. Bassi, Yijia Shi, Yuxiang Lai, Qian Yu, Huimin Xue, Yixiong Chen, Xiaorui Lin, Yutong Tang, Yining Cao, Haoqi Han, Zheyuan Zhang, Jiawei Liu, Tiezheng Zhang, Yujiu Ma, Jincheng Wang, Guang Zhang, Alan Yuille, Zongwei Zhou

    Abstract: We introduce the largest abdominal CT dataset (termed AbdomenAtlas) of 20,460 three-dimensional CT volumes sourced from 112 hospitals across diverse populations, geographies, and facilities. AbdomenAtlas provides 673K high-quality masks of anatomical structures in the abdominal region annotated by a team of 10 radiologists with the help of AI algorithms. We start by having expert radiologists manu…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: Published in Medical Image Analysis

  13. arXiv:2407.16667  [pdf, other]

    cs.CR cs.AI cs.CL

    RedAgent: Red Teaming Large Language Models with Context-aware Autonomous Language Agent

    Authors: Huiyu Xu, Wenhui Zhang, Zhibo Wang, Feng Xiao, Rui Zheng, Yunhe Feng, Zhongjie Ba, Kui Ren

    Abstract: Recently, advanced Large Language Models (LLMs) such as GPT-4 have been integrated into many real-world applications like Code Copilot. These applications have significantly expanded the attack surface of LLMs, exposing them to a variety of threats. Among them, jailbreak attacks that induce toxic responses through jailbreak prompts have raised critical safety concerns. To identify these threats, a…

    Submitted 23 July, 2024; originally announced July 2024.

  14. arXiv:2407.15840  [pdf, other]

    cs.RO

    QueST: Self-Supervised Skill Abstractions for Learning Continuous Control

    Authors: Atharva Mete, Haotian Xue, Albert Wilcox, Yongxin Chen, Animesh Garg

    Abstract: Generalization capabilities, or rather a lack thereof, is one of the most important unsolved problems in the field of robot learning, and while several large scale efforts have set out to tackle this problem, unsolved it remains. In this paper, we hypothesize that learning temporal action abstractions using latent variable models (LVMs), which learn to map data to a compressed latent space and bac…

    Submitted 22 July, 2024; v1 submitted 22 July, 2024; originally announced July 2024.

    Comments: Keywords: Behavior Cloning, Action Quantization, Self Supervised Skill Abstraction, Few-shot Imitation Learning

  15. arXiv:2407.15815  [pdf, other]

    cs.RO cs.AI cs.CV

    Learning to Manipulate Anywhere: A Visual Generalizable Framework For Reinforcement Learning

    Authors: Zhecheng Yuan, Tianming Wei, Shuiqi Cheng, Gu Zhang, Yuanpei Chen, Huazhe Xu

    Abstract: Can we endow visuomotor robots with generalization capabilities to operate in diverse open-world scenarios? In this paper, we propose Maniwhere, a generalizable framework tailored for visual reinforcement learning, enabling the trained robot policies to generalize across a combination of multiple visual disturbance types. Specifically, we introduce a multi-view representation learning app…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: Webpage: https://gemcollector.github.io/maniwhere/

  16. arXiv:2407.15272  [pdf, other]

    cs.CV

    MIBench: Evaluating Multimodal Large Language Models over Multiple Images

    Authors: Haowei Liu, Xi Zhang, Haiyang Xu, Yaya Shi, Chaoya Jiang, Ming Yan, Ji Zhang, Fei Huang, Chunfeng Yuan, Bing Li, Weiming Hu

    Abstract: Built on the power of LLMs, numerous multimodal large language models (MLLMs) have recently achieved remarkable performance on various vision-language tasks across multiple benchmarks. However, most existing MLLMs and benchmarks primarily focus on single-image input scenarios, leaving the performance of MLLMs when handling realistic multiple images underexplored. Although a few benchmarks c…

    Submitted 21 July, 2024; originally announced July 2024.

    Comments: 10 pages, 4 figures

  17. arXiv:2407.14054  [pdf, other]

    cs.CV

    PointRegGPT: Boosting 3D Point Cloud Registration using Generative Point-Cloud Pairs for Training

    Authors: Suyi Chen, Hao Xu, Haipeng Li, Kunming Luo, Guanghui Liu, Chi-Wing Fu, Ping Tan, Shuaicheng Liu

    Abstract: Data plays a crucial role in training learning-based methods for 3D point cloud registration. However, the real-world dataset is expensive to build, while rendering-based synthetic data suffers from domain gaps. In this work, we present PointRegGPT, boosting 3D point cloud registration using generative point-cloud pairs for training. Given a single depth map, we first apply a random camera motion…

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: To appear at the European Conference on Computer Vision (ECCV) 2024

    ACM Class: I.3.3; I.4.5

  18. arXiv:2407.13642  [pdf, other]

    cs.CV

    Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models

    Authors: Xiaoyu Zhu, Hao Zhou, Pengfei Xing, Long Zhao, Hao Xu, Junwei Liang, Alexander Hauptmann, Ting Liu, Andrew Gallagher

    Abstract: In this paper, we investigate the use of diffusion models which are pre-trained on large-scale image-caption pairs for open-vocabulary 3D semantic understanding. We propose a novel method, namely Diff2Scene, which leverages frozen representations from text-image generative models, along with salient-aware and geometric-aware masks, for open-vocabulary 3D semantic segmentation and visual grounding…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  19. arXiv:2407.12291  [pdf, other]

    cs.CV

    JointDreamer: Ensuring Geometry Consistency and Text Congruence in Text-to-3D Generation via Joint Score Distillation

    Authors: Chenhan Jiang, Yihan Zeng, Tianyang Hu, Songcun Xu, Wei Zhang, Hang Xu, Dit-Yan Yeung

    Abstract: Score Distillation Sampling (SDS) by well-trained 2D diffusion models has shown great promise in text-to-3D generation. However, this paradigm distills view-agnostic 2D image distributions into the rendering distribution of 3D representation for each view independently, overlooking the coherence across views and yielding 3D inconsistency in generations. In this work, we propose \textbf{J}oint \tex…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 29 pages, ECCV2024

  20. arXiv:2407.12274  [pdf, other]

    cs.CV

    MDPE: A Multimodal Deception Dataset with Personality and Emotional Characteristics

    Authors: Cong Cai, Shan Liang, Xuefei Liu, Kang Zhu, Zhengqi Wen, Jianhua Tao, Heng Xie, Jizhou Cui, Yiming Ma, Zhenhua Cheng, Hanzhe Xu, Ruibo Fu, Bin Liu, Yongwei Li

    Abstract: Deception detection has garnered increasing attention in recent years due to the significant growth of digital media and heightened ethical and security concerns. It has been extensively studied using multimodal methods, including video, audio, and text. In addition, individual differences in deception production and detection are believed to play a crucial role. Although some studies have utilized…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Code and data are available; Submitted to NeurIPS 2024 Datasets and Benchmarks Track

  21. arXiv:2407.11054  [pdf]

    cs.LG cs.AI

    Generative AI for Health Technology Assessment: Opportunities, Challenges, and Policy Considerations

    Authors: Rachael Fleurence, Jiang Bian, Xiaoyan Wang, Hua Xu, Dalia Dawoud, Tala Fakhouri, Mitch Higashi, Jagpreet Chhatwal

    Abstract: This review introduces the transformative potential of generative Artificial Intelligence (AI) and foundation models, including large language models (LLMs), for health technology assessment (HTA). We explore their applications in four critical areas, evidence synthesis, evidence generation, clinical trials and economic modeling: (1) Evidence synthesis: Generative AI has the potential to assist in…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: 24 pages, 1 figure, 1 table, 2 boxes, 103 references

  22. arXiv:2407.11007  [pdf, other]

    cs.CL cs.AI

    Panacea: A foundation model for clinical trial search, summarization, design, and recruitment

    Authors: Jiacheng Lin, Hanwen Xu, Zifeng Wang, Sheng Wang, Jimeng Sun

    Abstract: Clinical trials are fundamental in developing new drugs, medical devices, and treatments. However, they are often time-consuming and have low success rates. Although there have been initial attempts to create large language models (LLMs) for clinical trial design and patient-trial matching, these models remain task-specific and not adaptable to diverse clinical trial tasks. To address this challen…

    Submitted 25 June, 2024; originally announced July 2024.

  23. arXiv:2407.10973  [pdf, other]

    cs.AI

    Make-An-Agent: A Generalizable Policy Network Generator with Behavior-Prompted Diffusion

    Authors: Yongyuan Liang, Tingqiang Xu, Kaizhe Hu, Guangqi Jiang, Furong Huang, Huazhe Xu

    Abstract: Can we generate a control policy for an agent using just one demonstration of desired behaviors as a prompt, as effortlessly as creating an image from a textual description? In this paper, we present Make-An-Agent, a novel policy parameter generator that leverages the power of conditional diffusion models for behavior-to-policy generation. Guided by behavior embeddings that encode trajectory infor…

    Submitted 15 July, 2024; originally announced July 2024.

  24. arXiv:2407.10956  [pdf, other]

    cs.AI cs.CL

    Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?

    Authors: Ruisheng Cao, Fangyu Lei, Haoyuan Wu, Jixuan Chen, Yeqiao Fu, Hongcheng Gao, Xinzhuang Xiong, Hanchong Zhang, Yuchen Mao, Wenjing Hu, Tianbao Xie, Hongshen Xu, Danyang Zhang, Sida Wang, Ruoxi Sun, Pengcheng Yin, Caiming Xiong, Ansong Ni, Qian Liu, Victor Zhong, Lu Chen, Kai Yu, Tao Yu

    Abstract: Data science and engineering workflows often span multiple stages, from warehousing to orchestration, using tools like BigQuery, dbt, and Airbyte. As vision language models (VLMs) advance in multimodal understanding and code generation, VLM-based agents could potentially automate these workflows by generating SQL queries, Python code, and GUI operations. This automation can improve the productivit…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 34 pages, 14 figures, 10 tables

  25. arXiv:2407.10695  [pdf, other]

    cs.CV

    IE-NeRF: Inpainting Enhanced Neural Radiance Fields in the Wild

    Authors: Shuaixian Wang, Haoran Xu, Yaokun Li, Jiwei Chen, Guang Tan

    Abstract: We present a novel approach for synthesizing realistic novel views using Neural Radiance Fields (NeRF) with uncontrolled photos in the wild. While NeRF has shown impressive results in controlled settings, it struggles with transient objects commonly found in dynamic and time-varying scenes. Our framework, called Inpainting Enhanced NeRF, or IE-NeRF, enhances the conventional NeRF by drawing…

    Submitted 15 July, 2024; originally announced July 2024.

  26. arXiv:2407.10687  [pdf, other]

    cs.CV cs.GR

    FRI-Net: Floorplan Reconstruction via Room-wise Implicit Representation

    Authors: Honghao Xu, Juzhan Xu, Zeyu Huang, Pengfei Xu, Hui Huang, Ruizhen Hu

    Abstract: In this paper, we introduce a novel method called FRI-Net for 2D floorplan reconstruction from 3D point cloud. Existing methods typically rely on corner regression or box regression, which lack consideration for the global shapes of rooms. To address these issues, we propose a novel approach using a room-wise implicit representation with structural regularization to characterize the shapes of room…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  27. arXiv:2407.09053  [pdf, other]

    cs.RO

    Navi2Gaze: Leveraging Foundation Models for Navigation and Target Gazing

    Authors: Jun Zhu, Zihao Du, Haotian Xu, Fengbo Lan, Zilong Zheng, Bo Ma, Shengjie Wang, Tao Zhang

    Abstract: Task-aware navigation continues to be a challenging area of research, especially in scenarios involving open vocabulary. Previous studies primarily focus on finding suitable locations for task completion, often overlooking the importance of the robot's pose. However, the robot's orientation is crucial for successfully completing tasks because of how objects are arranged (e.g., to open a refrigerat…

    Submitted 12 July, 2024; originally announced July 2024.

  28. arXiv:2407.08706  [pdf, other]

    cs.CV

    HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models

    Authors: Runhui Huang, Xinpeng Ding, Chunwei Wang, Jianhua Han, Yulong Liu, Hengshuang Zhao, Hang Xu, Lu Hou, Wei Zhang, Xiaodan Liang

    Abstract: High-resolution inputs enable Large Vision-Language Models (LVLMs) to discern finer visual details, enhancing their comprehension capabilities. To reduce the training and computation costs caused by high-resolution input, one promising direction is to use sliding windows to slice the input into uniform patches, each matching the input size of the well-trained vision encoder. Although efficient, th…

    Submitted 11 July, 2024; originally announced July 2024.

  29. arXiv:2407.08473  [pdf, other]

    cs.AR cs.AI

    Natural language is not enough: Benchmarking multi-modal generative AI for Verilog generation

    Authors: Kaiyan Chang, Zhirong Chen, Yunhao Zhou, Wenlong Zhu, kun wang, Haobo Xu, Cangyuan Li, Mengdi Wang, Shengwen Liang, Huawei Li, Yinhe Han, Ying Wang

    Abstract: Natural language interfaces have exhibited considerable potential in the automation of Verilog generation derived from high-level specifications through the utilization of large language models, garnering significant attention. Nevertheless, this paper elucidates that visual representations contribute essential contextual information critical to design intent for hardware architectures possessing…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Accepted by ICCAD 2024

  30. arXiv:2407.08440  [pdf, other]

    cs.CL cs.AI

    Beyond Instruction Following: Evaluating Rule Following of Large Language Models

    Authors: Wangtao Sun, Chenxiang Zhang, Xueyou Zhang, Ziyang Huang, Haotian Xu, Pei Chen, Shizhu He, Jun Zhao, Kang Liu

    Abstract: Although Large Language Models (LLMs) have demonstrated strong instruction-following ability to be helpful, they are further supposed to be controlled and guided by rules in real-world scenarios to be safe, and accurate in responses. This demands the possession of rule-following capability of LLMs. However, few works have made a clear evaluation of the rule-following capability of LLMs. Previous s…

    Submitted 11 July, 2024; originally announced July 2024.

  31. arXiv:2407.08189  [pdf, other]

    cs.CL cs.AI

    fairBERTs: Erasing Sensitive Information Through Semantic and Fairness-aware Perturbations

    Authors: Jinfeng Li, Yuefeng Chen, Xiangyu Liu, Longtao Huang, Rong Zhang, Hui Xue

    Abstract: Pre-trained language models (PLMs) have revolutionized both the natural language processing research and applications. However, stereotypical biases (e.g., gender and racial discrimination) encoded in PLMs have raised negative ethical implications for PLMs, which critically limits their broader applications. To address the aforementioned unfairness issues, we present fairBERTs, a general framework…

    Submitted 11 July, 2024; originally announced July 2024.

  32. arXiv:2407.07771  [pdf, other]

    cs.CL cs.CV cs.MM

    Multi-task Prompt Words Learning for Social Media Content Generation

    Authors: Haochen Xue, Chong Zhang, Chengzhi Liu, Fangyu Wu, Xiaobo Jin

    Abstract: The rapid development of the Internet has profoundly changed human life. Humans are increasingly expressing themselves and interacting with others on social media platforms. However, although artificial intelligence technology has been widely used in many aspects of life, its application in social media content creation is still blank. To solve this problem, we propose a new prompt word generation…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: 8 pages, 5 figures

    Journal ref: International Joint Conference on Neural Networks 2024

  33. arXiv:2407.07495  [pdf, other]

    cs.CL

    Bucket Pre-training is All You Need

    Authors: Hongtao Liu, Qiyao Peng, Qing Yang, Kai Liu, Hongyan Xu

    Abstract: Large language models (LLMs) have demonstrated exceptional performance across various natural language processing tasks. However, the conventional fixed-length data composition strategy for pretraining, which involves concatenating and splitting documents, can introduce noise and limit the model's ability to capture long-range dependencies. To address this, we first introduce three metrics for eva…

    Submitted 10 July, 2024; originally announced July 2024.

  34. arXiv:2407.07487  [pdf, other]

    cs.CL

    Review-LLM: Harnessing Large Language Models for Personalized Review Generation

    Authors: Qiyao Peng, Hongtao Liu, Hongyan Xu, Qing Yang, Minglai Shao, Wenjun Wang

    Abstract: Product review generation is an important task in recommender systems, which could provide explanation and persuasiveness for the recommendation. Recently, Large Language Models (LLMs, e.g., ChatGPT) have shown superior text modeling and generating ability, which could be applied in review generation. However, directly applying the LLMs for generating reviews might be troubled by the "polite" ph…

    Submitted 10 July, 2024; originally announced July 2024.

  35. arXiv:2407.06937  [pdf, other]

    cs.CV

    HumanRefiner: Benchmarking Abnormal Human Generation and Refining with Coarse-to-fine Pose-Reversible Guidance

    Authors: Guian Fang, Wenbiao Yan, Yuanfan Guo, Jianhua Han, Zutao Jiang, Hang Xu, Shengcai Liao, Xiaodan Liang

    Abstract: Text-to-image diffusion models have significantly advanced in conditional image generation. However, these models usually struggle with accurately rendering images featuring humans, resulting in distorted limbs and other anomalies. This issue primarily stems from the insufficient recognition and evaluation of limb qualities in diffusion models. To address this issue, we introduce AbHuman, the firs…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024

  36. arXiv:2407.06894  [pdf, other]

    cs.IT cs.PF

    RIS-Assisted Received Adaptive Spatial Modulation for Wireless Communication

    Authors: Chaorong Zhang, Hui Xu, Benjamin K. Ng, Chan-Tong Lam

    Abstract: A novel wireless transmission scheme, named the reconfigurable intelligent surface (RIS)-assisted received adaptive spatial modulation (RASM) scheme, is proposed in this paper. In this scheme, the adaptive spatial modulation (ASM)-based antennas selection works at the receiver by employing the characteristics of the RIS in each time slot, where the signal-to-noise ratio at specific selected ant…

    Submitted 9 July, 2024; originally announced July 2024.

  37. arXiv:2407.06250  [pdf, other]

    cs.CV

    FairDiff: Fair Segmentation with Point-Image Diffusion

    Authors: Wenyi Li, Haoran Xu, Guiyu Zhang, Huan-ang Gao, Mingju Gao, Mengyu Wang, Hao Zhao

    Abstract: Fairness is an important topic for medical image analysis, driven by the challenge of unbalanced training data among diverse target groups and the societal demand for equitable medical quality. In response to this issue, our research adopts a data-driven strategy-enhancing data balance by integrating synthetic images. However, in terms of generating synthetic images, previous works either lack pai…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: Accepted to MICCAI 2024

  38. arXiv:2407.05703  [pdf, other]

    cs.CV

    LGRNet: Local-Global Reciprocal Network for Uterine Fibroid Segmentation in Ultrasound Videos

    Authors: Huihui Xu, Yijun Yang, Angelica I Aviles-Rivero, Guang Yang, Jing Qin, Lei Zhu

    Abstract: Regular screening and early discovery of uterine fibroid are crucial for preventing potential malignant transformations and ensuring timely, life-saving interventions. To this end, we collect and annotate the first ultrasound video dataset with 100 videos for uterine fibroid segmentation (UFUV). We also present Local-Global Reciprocal Network (LGRNet) to efficiently and effectively propagate the l…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: MICCAI2024 Early Accept

  39. arXiv:2407.05128  [pdf, other]

    cs.CV

    SCSA: Exploring the Synergistic Effects Between Spatial and Channel Attention

    Authors: Yunzhong Si, Huiying Xu, Xinzhong Zhu, Wenhao Zhang, Yao Dong, Yuxing Chen, Hongbo Li

    Abstract: Channel and spatial attentions have respectively brought significant improvements in extracting feature dependencies and spatial structure relations for various downstream vision tasks. While their combination is more beneficial for leveraging their individual strengths, the synergy between channel and spatial attentions has not been fully explored, lacking in fully harness the synergistic potenti…

    Submitted 6 July, 2024; originally announced July 2024.

  40. arXiv:2407.04994  [pdf, other]

    cs.CV cs.LG

    The Solution for Language-Enhanced Image New Category Discovery

    Authors: Haonan Xu, Dian Chao, Xiangyu Wu, Zhonghua Wan, Yang Yang

    Abstract: Treating texts as images, combining prompts with textual labels for prompt tuning, and leveraging the alignment properties of CLIP have been successfully applied in zero-shot multi-label image recognition. Nonetheless, relying solely on textual labels to store visual information is insufficient for representing the diversity of visual objects. In this paper, we propose reversing the training proce…

    Submitted 6 July, 2024; originally announced July 2024.

  41. arXiv:2407.04991  [pdf, other]

    cs.LG cs.CL

    The Solution for the AIGC Inference Performance Optimization Competition

    Authors: Sishun Pan, Haonan Xu, Zhonghua Wan, Yang Yang

    Abstract: In recent years, the rapid advancement of large-scale pre-trained language models based on transformer architectures has revolutionized natural language processing tasks. Among these, ChatGPT has gained widespread popularity, demonstrating human-level conversational abilities and attracting over 100 million monthly users by late 2022. Concurrently, Baidu's commercial deployment of the Ernie Wenxin…

    Submitted 6 July, 2024; originally announced July 2024.

  42. arXiv:2407.04699  [pdf, other]

    cs.CV cs.AI

    LaRa: Efficient Large-Baseline Radiance Fields

    Authors: Anpei Chen, Haofei Xu, Stefano Esposito, Siyu Tang, Andreas Geiger

    Abstract: Radiance field methods have achieved photorealistic novel view synthesis and geometry reconstruction. But they are mostly applied in per-scene optimization or small-baseline settings. While several recent works investigate feed-forward reconstruction with large baselines by utilizing transformers, they all operate with a standard global attention mechanism and hence ignore the local nature of 3D r…

    Submitted 15 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

    Comments: Project Page: https://apchenstu.github.io/LaRa/

  43. arXiv:2407.04675  [pdf, other]

    eess.AS cs.SD

    Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition

    Authors: Ye Bai, Jingping Chen, Jitong Chen, Wei Chen, Zhuo Chen, Chuang Ding, Linhao Dong, Qianqian Dong, Yujiao Du, Kepan Gao, Lu Gao, Yi Guo, Minglun Han, Ting Han, Wenchao Hu, Xinying Hu, Yuxiang Hu, Deyu Hua, Lu Huang, Mingkun Huang, Youjia Huang, Jishuo Jin, Fanliu Kong, Zongwei Lan, Tianyu Li, et al. (30 additional authors not shown)

    Abstract: Modern automatic speech recognition (ASR) models are required to accurately transcribe diverse speech signals (from different domains, languages, accents, etc.) given the specific contextual information in various application scenarios. Classic end-to-end models fused with extra language models perform well, but mainly in data matching scenarios and are gradually approaching a bottleneck. In this wor…

    Submitted 10 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

  44. arXiv:2407.04368  [pdf, other]

    cs.CL cs.SD eess.AS

    Romanization Encoding For Multilingual ASR

    Authors: Wen Ding, Fei Jia, Hainan Xu, Yu Xi, Junjie Lai, Boris Ginsburg

    Abstract: We introduce romanization encoding for script-heavy languages to optimize multilingual and code-switching Automatic Speech Recognition (ASR) systems. By adopting romanization encoding alongside a balanced concatenated tokenizer within a FastConformer-RNNT framework equipped with a Roman2Char module, we significantly reduce vocabulary and output dimensions, enabling larger training batches and redu…

    Submitted 5 July, 2024; originally announced July 2024.

  45. arXiv:2407.04331  [pdf, other]

    cs.SD cs.AI eess.AS

    MuseBarControl: Enhancing Fine-Grained Control in Symbolic Music Generation through Pre-Training and Counterfactual Loss

    Authors: Yangyang Shu, Haiming Xu, Ziqin Zhou, Anton van den Hengel, Lingqiao Liu

    Abstract: Automatically generating symbolic music (music scores tailored to specific human needs) can be highly beneficial for musicians and enthusiasts. Recent studies have shown promising results using extensive datasets and advanced transformer architectures. However, these state-of-the-art models generally offer only basic control over aspects like tempo and style for the entire composition, lacking the a…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: Demo is available at: https://ganperf.github.io/musebarcontrol.github.io/musebarcontrol/

  46. arXiv:2407.03474  [pdf]

    cs.CY

    How high-status women promote repeated collaboration among women in male-dominated contexts

    Authors: Huimin Xu, Jamie Strassman, Ying Ding, Steven Gray, Maytal Saar-Tsechansky

    Abstract: Male-dominated contexts pose a dilemma: they increase the benefits of repeated collaboration among women, yet at the same time, make such collaborations less likely. This paper seeks to understand the conditions that foster repeated collaboration among women versus men in male-dominated settings by examining the critical role of status hierarchies. Using collaboration data on 8,232,769 computer sc…

    Submitted 3 July, 2024; originally announced July 2024.

  47. arXiv:2407.02887  [pdf, other]

    cs.CV

    Explicitly Guided Information Interaction Network for Cross-modal Point Cloud Completion

    Authors: Hang Xu, Chen Long, Wenxiao Zhang, Yuan Liu, Zhen Cao, Zhen Dong, Bisheng Yang

    Abstract: In this paper, we explore a novel framework, EGIInet (Explicitly Guided Information Interaction Network), a model for the View-guided Point cloud Completion (ViPC) task, which aims to restore a complete point cloud from a partial one with a single view image. In comparison with previous methods that relied on the global semantics of input images, EGIInet efficiently combines the information from two m…

    Submitted 22 July, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  48. arXiv:2407.02773  [pdf, other]

    cs.MM

    OpenVNA: A Framework for Analyzing the Behavior of Multimodal Language Understanding System under Noisy Scenarios

    Authors: Ziqi Yuan, Baozheng Zhang, Hua Xu, Zhiyun Liang, Kai Gao

    Abstract: We present OpenVNA, an open-source framework designed for analyzing the behavior of multimodal language understanding systems under noisy conditions. OpenVNA serves as an intuitive toolkit tailored for researchers, facilitating convenient batch-level robustness evaluation and on-the-fly instance-level demonstration. It primarily features a benchmark Python library for assessing global model robus…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 10 pages, 4 figures, to be published in ACL 2024 System Demonstration Track

  49. arXiv:2407.02442  [pdf, other]

    cs.IT

    A New Achievable Region of the $K$-User MAC Wiretap Channel with Confidential and Open Messages Under Strong Secrecy

    Authors: Hao Xu, Kai-Kit Wong, Giuseppe Caire

    Abstract: This paper investigates the achievable region of a $K$-user discrete memoryless (DM) multiple access wiretap (MAC-WT) channel, where each user transmits both secret and open messages. All these messages are intended for Bob, while Eve is only interested in the secret messages. In the achievable coding strategy, the confidential information is protected by open messages and also by the introduction…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 61 pages, 15 figures. arXiv admin note: text overlap with arXiv:2209.05403

  50. Coding-Enhanced Cooperative Jamming for Secret Communication in Fluid Antenna Systems

    Authors: Hao Xu, Kai-Kit Wong, Wee Kiat New, Guyue Li, Farshad Rostami Ghadi, Yongxu Zhu, Shi Jin, Chan-Byoung Chae, Yangyang Zhang

    Abstract: This letter investigates the secret communication problem for a fluid antenna system (FAS)-assisted wiretap channel, where the legitimate transmitter transmits an information-bearing signal to the legitimate receiver, and at the same time, transmits a jamming signal to interfere with the eavesdropper (Eve). Unlike the conventional jamming scheme, which usually transmits Gaussian noise that interfe…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 6 pages, 3 figures; accepted by IEEE Communications Letters