Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to main content

Showing 1–50 of 1,331 results for author: Yu, H

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.20427  [pdf, other

    cs.CV eess.IV

    Mean Opinion Score as a New Metric for User-Evaluation of XAI Methods

    Authors: Hyeon Yu, Jenny Benois-Pineau, Romain Bourqui, Romain Giot, Alexey Zhukov

    Abstract: This paper investigates the use of Mean Opinion Score (MOS), a common image quality metric, as a user-centric evaluation metric for XAI post-hoc explainers. To measure the MOS, a user experiment is proposed, which has been conducted with explanation maps of intentionally distorted images. Three methods from the family of feature attribution methods - Gradient-weighted Class Activation Mapping (Gra… ▽ More

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: Supported by organization Laboratoire Bordelais de Recherche en Informatique, 15 pages, 4 figures, 3 tables

    ACM Class: I.4.7

  2. arXiv:2407.19074  [pdf

    cs.CE cs.LG

    Parsimonious Universal Function Approximator for Elastic and Elasto-Plastic Cavity Expansion Problems

    Authors: Xiao-Xuan Chen, Pin Zhang, Hai-Sui Yu, Zhen-Yu Yin, Brian Sheil

    Abstract: Cavity expansion is a canonical problem in geotechnics, which can be described by partial differential equations (PDEs) and ordinary differential equations (ODEs). This study explores the potential of using a new solver, a physics-informed neural network (PINN), to calculate the stress field in an expanded cavity in the elastic and elasto-plastic regimes. Whilst PINNs have emerged as an effective… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

  3. arXiv:2407.18766  [pdf, other

    cs.IT eess.SP

    Secrecy Performance Analysis of Integrated RF-UWOC IoT Networks Enabled by UAV and Underwater-RIS

    Authors: Abrar Bin Sarawar, A. S. M. Badrudduza, Md. Ibrahim, Imran Shafique Ansari, Heejung Yu

    Abstract: In the sixth-generation (6G) Internet of Things (IoT) networks, the use of UAV-mounted base stations and reconfigurable intelligent surfaces (RIS) has been considered to enhance coverage, flexibility, and security in non-terrestrial networks (NTNs). In addition to aerial networks enabled by NTN technologies, the integration of underwater networks with 6G IoT can be considered one of the most innov… ▽ More

    Submitted 26 July, 2024; originally announced July 2024.

  4. arXiv:2407.18323  [pdf, other

    cs.NI eess.SP

    Active Reconfigurable Intelligent Surface-Aided Terahertz Wireless Communications

    Authors: Waqas Khalid, Heejung Yu, Yazdan Ahmad Qadri

    Abstract: Terahertz (THz) communication is expected to be a key technology for future sixth-generation (6G) wireless networks. Furthermore, reconfigurable intelligent surfaces (RIS) have been proposed to modify the wireless propagation environment and enhance system performance. Given the sensitivity to blockages and limited coverage range, RIS is particularly promising for THz communications. Active RIS ca… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Submitted in KICS Summer Conference 2024, (19 June 2024 - 22 June 2024), Jeju, Korea

  5. arXiv:2407.17261  [pdf, other

    cs.CV

    Embedding-Free Transformer with Inference Spatial Reduction for Efficient Semantic Segmentation

    Authors: Hyunwoo Yu, Yubin Cho, Beoungwoo Kang, Seunghun Moon, Kyeongbo Kong, Suk-Ju Kang

    Abstract: We present an Encoder-Decoder Attention Transformer, EDAFormer, which consists of the Embedding-Free Transformer (EFT) encoder and the all-attention decoder leveraging our Embedding-Free Attention (EFA) structure. The proposed EFA is a novel global context modeling mechanism that focuses on functioning the global non-linearity, not the specific roles of the query, key and value. For the decoder, w… ▽ More

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  6. arXiv:2407.17023  [pdf, other

    cs.CL cs.AI

    From Internal Conflict to Contextual Adaptation of Language Models

    Authors: Sara Vera Marjanović, Haeun Yu, Pepa Atanasova, Maria Maistro, Christina Lioma, Isabelle Augenstein

    Abstract: Knowledge-intensive language understanding tasks require Language Models (LMs) to integrate relevant context, mitigating their inherent weaknesses, such as incomplete or outdated knowledge. Nevertheless, studies indicate that LMs often ignore the provided context as it can conflict with the pre-existing LM's memory learned during pre-training. Moreover, conflicting knowledge can already be present… ▽ More

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: 22 pages, 15 figures

    MSC Class: 68T50 ACM Class: I.2.7

  7. arXiv:2407.17020  [pdf, other

    cs.CV

    EAFormer: Scene Text Segmentation with Edge-Aware Transformers

    Authors: Haiyang Yu, Teng Fu, Bin Li, Xiangyang Xue

    Abstract: Scene text segmentation aims at cropping texts from scene images, which is usually used to help generative models edit or remove texts. The existing text segmentation methods tend to involve various text-related supervisions for better performance. However, most of them ignore the importance of text edges, which are significant for downstream applications. In this paper, we propose Edge-Aware Tran… ▽ More

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  8. arXiv:2407.16634  [pdf, other

    eess.IV cs.AI cs.CV cs.HC

    Knowledge-driven AI-generated data for accurate and interpretable breast ultrasound diagnoses

    Authors: Haojun Yu, Youcheng Li, Nan Zhang, Zihan Niu, Xuantong Gong, Yanwen Luo, Quanlin Wu, Wangyan Qin, Mengyuan Zhou, Jie Han, Jia Tao, Ziwei Zhao, Di Dai, Di He, Dong Wang, Binghui Tang, Ling Huo, Qingli Zhu, Yong Wang, Liwei Wang

    Abstract: Data-driven deep learning models have shown great capabilities to assist radiologists in breast ultrasound (US) diagnoses. However, their effectiveness is limited by the long-tail distribution of training data, which leads to inaccuracies in rare cases. In this study, we address a long-standing challenge of improving the diagnostic model performance on rare cases using long-tailed data. Specifical… ▽ More

    Submitted 23 July, 2024; originally announced July 2024.

  9. arXiv:2407.15869  [pdf, other

    cs.LG cs.AI

    Long Input Sequence Network for Long Time Series Forecasting

    Authors: Chao Ma, Yikai Hou, Xiang Li, Yinggang Sun, Haining Yu

    Abstract: Short fixed-length inputs are the main bottleneck of deep learning methods in long time-series forecasting tasks. Prolonging input length causes overfitting, rapidly deteriorating accuracy. Our research indicates that the overfitting is a combination reaction of the multi-scale pattern coupling in time series and the fixed focusing scale of current models. First, we find that the patterns exhibite… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 9 pages

  10. arXiv:2407.15762  [pdf, other

    cs.LG cs.AI cs.CL

    Conditioned Language Policy: A General Framework for Steerable Multi-Objective Finetuning

    Authors: Kaiwen Wang, Rahul Kidambi, Ryan Sullivan, Alekh Agarwal, Christoph Dann, Andrea Michi, Marco Gelmi, Yunxuan Li, Raghav Gupta, Avinava Dubey, Alexandre Ramé, Johan Ferret, Geoffrey Cideron, Le Hou, Hongkun Yu, Amr Ahmed, Aranyak Mehta, Léonard Hussenot, Olivier Bachem, Edouard Leurent

    Abstract: Reward-based finetuning is crucial for aligning language policies with intended behaviors (e.g., creativity and safety). A key challenge here is to develop steerable language models that trade-off multiple (conflicting) objectives in a flexible and efficient manner. This paper presents Conditioned Language Policy (CLP), a general framework for finetuning language models on multiple objectives. Bui… ▽ More

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: 40 pages

  11. arXiv:2407.14785  [pdf, ps, other

    cs.DS

    Stochastic Online Metric Matching: Adversarial is no Harder than Stochastic

    Authors: Amin Saberi, Mingwei Yang, Sophie H. Yu

    Abstract: We study the stochastic online metric matching problem. In this problem, $m$ servers and $n$ requests are located in a metric space, where all servers are available upfront and requests arrive one at a time. In particular, servers are adversarially chosen, and requests are independently drawn from a known distribution. Upon the arrival of a new request, it needs to be immediately and irrevocably m… ▽ More

    Submitted 20 July, 2024; originally announced July 2024.

  12. arXiv:2407.14568  [pdf, other

    cs.CL cs.AI cs.DB

    SQLfuse: Enhancing Text-to-SQL Performance through Comprehensive LLM Synergy

    Authors: Tingkai Zhang, Chaoyu Chen, Cong Liao, Jun Wang, Xudong Zhao, Hang Yu, Jianchao Wang, Jianguo Li, Wenhui Shi

    Abstract: Text-to-SQL conversion is a critical innovation, simplifying the transition from complex SQL to intuitive natural language queries, especially significant given SQL's prevalence in the job market across various roles. The rise of Large Language Models (LLMs) like GPT-3.5 and GPT-4 has greatly advanced this field, offering improved natural language understanding and the ability to generate nuanced… ▽ More

    Submitted 19 July, 2024; originally announced July 2024.

  13. arXiv:2407.14239  [pdf, other

    cs.AI

    KoMA: Knowledge-driven Multi-agent Framework for Autonomous Driving with Large Language Models

    Authors: Kemou Jiang, Xuan Cai, Zhiyong Cui, Aoyong Li, Yilong Ren, Haiyang Yu, Hao Yang, Daocheng Fu, Licheng Wen, Pinlong Cai

    Abstract: Large language models (LLMs) as autonomous agents offer a novel avenue for tackling real-world challenges through a knowledge-driven manner. These LLM-enhanced methodologies excel in generalization and interpretability. However, the complexity of driving tasks often necessitates the collaboration of multiple, heterogeneous agents, underscoring the need for such LLM-driven agents to engage in coope… ▽ More

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: 13 pages, 18 figures

  14. arXiv:2407.14084  [pdf, other

    math.CO cs.IT

    A Purely Entropic Approach to the Rainbow Triangle Problem

    Authors: Ting-Wei Chao, Hung-Hsun Hans Yu

    Abstract: In this short note, we present a purely entropic proof that in a $3$-edge-colored simple graph with $R$ red edges, $G$ green edges, and $B$ blue edges, the number of rainbow triangles is at most $\sqrt{2RGB}$.

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: 5 pages, 5 figures

    MSC Class: 05D05; 05D40; 94A17

  15. arXiv:2407.13996  [pdf, other

    cs.DC cs.AR cs.PF

    Missile: Fine-Grained, Hardware-Level GPU Resource Isolation for Multi-Tenant DNN Inference

    Authors: Yongkang Zhang, Haoxuan Yu, Chenxia Han, Cheng Wang, Baotong Lu, Yang Li, Xiaowen Chu, Huaicheng Li

    Abstract: Colocating high-priority, latency-sensitive (LS) and low-priority, best-effort (BE) DNN inference services reduces the total cost of ownership (TCO) of GPU clusters. Limited by bottlenecks such as VRAM channel conflicts and PCIe bus contentions, existing GPU sharing solutions are unable to avoid resource conflicts among concurrently executing tasks, failing to achieve both low latency for LS tasks… ▽ More

    Submitted 27 July, 2024; v1 submitted 18 July, 2024; originally announced July 2024.

    Comments: 18 pages, 18 figures

    ACM Class: D.4.9; I.2.5

  16. arXiv:2407.13863  [pdf, other

    cs.CV

    A Closer Look at GAN Priors: Exploiting Intermediate Features for Enhanced Model Inversion Attacks

    Authors: Yixiang Qiu, Hao Fang, Hongyao Yu, Bin Chen, MeiKang Qiu, Shu-Tao Xia

    Abstract: Model Inversion (MI) attacks aim to reconstruct privacy-sensitive training data from released models by utilizing output information, raising extensive concerns about the security of Deep Neural Networks (DNNs). Recent advances in generative adversarial networks (GANs) have contributed significantly to the improved performance of MI attacks due to their powerful ability to generate realistic image… ▽ More

    Submitted 27 July, 2024; v1 submitted 18 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  17. arXiv:2407.11730  [pdf, other

    cs.CV

    Monocular Occupancy Prediction for Scalable Indoor Scenes

    Authors: Hongxiao Yu, Yuqi Wang, Yuntao Chen, Zhaoxiang Zhang

    Abstract: Camera-based 3D occupancy prediction has recently garnered increasing attention in outdoor driving scenes. However, research in indoor scenes remains relatively unexplored. The core differences in indoor scenes lie in the complexity of scene scale and the variance in object size. In this paper, we propose a novel method, named ISO, for predicting indoor scene occupancy using monocular images. ISO… ▽ More

    Submitted 16 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  18. arXiv:2407.11548  [pdf, other

    cs.IR

    A PLMs based protein retrieval framework

    Authors: Yuxuan Wu, Xiao Yi, Yang Tan, Huiqun Yu, Guisheng Fan

    Abstract: Protein retrieval, which targets the deconstruction of the relationship between sequences, structures and functions, empowers the advancing of biology. Basic Local Alignment Search Tool (BLAST), a sequence-similarity-based algorithm, has proved the efficiency of this field. Despite the existing tools for protein retrieval, they prioritize sequence similarity and probably overlook proteins that are… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 16 pages, 12 figures

    ACM Class: H.3.3

  19. arXiv:2407.11537  [pdf, other

    cs.CV cs.AI

    AEMIM: Adversarial Examples Meet Masked Image Modeling

    Authors: Wenzhao Xiang, Chang Liu, Hang Su, Hongyang Yu

    Abstract: Masked image modeling (MIM) has gained significant traction for its remarkable prowess in representation learning. As an alternative to the traditional approach, the reconstruction from corrupted images has recently emerged as a promising pretext task. However, the regular corrupted images are generated using generic generators, often lacking relevance to the specific reconstruction task involved… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Under review of International Journal of Computer Vision (IJCV)

  20. Exploring Knowledge Transfer in Evolutionary Many-task Optimization: A Complex Network Perspective

    Authors: Yudong Yang, Kai Wu, Xiangyi Teng, Handing Wang, He Yu, Jing Liu

    Abstract: The field of evolutionary many-task optimization (EMaTO) is increasingly recognized for its ability to streamline the resolution of optimization challenges with repetitive characteristics, thereby conserving computational resources. This paper tackles the challenge of crafting efficient knowledge transfer mechanisms within EMaTO, a task complicated by the computational demands of individual task e… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 9 pages, accepted by GECCO 2024 poster

  21. arXiv:2407.08569  [pdf, other

    cs.CV

    Approaching Outside: Scaling Unsupervised 3D Object Detection from 2D Scene

    Authors: Ruiyang Zhang, Hu Zhang, Hang Yu, Zhedong Zheng

    Abstract: The unsupervised 3D object detection is to accurately detect objects in unstructured environments with no explicit supervisory signals. This task, given sparse LiDAR point clouds, often results in compromised performance for detecting distant or small objects due to the inherent sparsity and limited spatial resolution. In this paper, we are among the early attempts to integrate LiDAR data with 2D… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV'24, 18 pages, 5 figures, 6 tables

  22. arXiv:2407.08481  [pdf, other

    eess.IV cs.CV

    SliceMamba for Medical Image Segmentation

    Authors: Chao Fan, Hongyuan Yu, Luo Wang, Yan Huang, Liang Wang, Xibin Jia

    Abstract: Despite the progress made in Mamba-based medical image segmentation models, current methods utilizing unidirectional or multi-directional feature scanning mechanisms fail to well model dependencies between neighboring positions in the image, hindering the effective modeling of local features. However, local features are crucial for medical image segmentation as they provide vital information about… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

  23. arXiv:2407.08337  [pdf, other

    cs.LG cs.DC stat.ML

    FedLog: Personalized Federated Classification with Less Communication and More Flexibility

    Authors: Haolin Yu, Guojun Zhang, Pascal Poupart

    Abstract: In federated learning (FL), the common paradigm that FedAvg proposes and most algorithms follow is that clients train local models with their private data, and the model parameters are shared for central aggregation, mostly averaging. In this paradigm, the communication cost is often a challenge, as modern massive neural networks can contain millions to billions parameters. We suggest that clients… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

  24. arXiv:2407.08106  [pdf, other

    cs.RO

    SGLC: Semantic Graph-Guided Coarse-Fine-Refine Full Loop Closing for LiDAR SLAM

    Authors: Neng Wang, Xieyuanli Chen, Chenghao Shi, Zhiqiang Zheng, Hongshan Yu, Huimin Lu

    Abstract: Loop closing is a crucial component in SLAM that helps eliminate accumulated errors through two main steps: loop detection and loop pose correction. The first step determines whether loop closing should be performed, while the second estimates the 6-DoF pose to correct odometry drift. Current methods mostly focus on developing robust descriptors for loop closure detection, often neglecting loop po… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: 8 pages, 4 figures

  25. arXiv:2407.07347  [pdf, other

    cs.CV eess.IV

    MNeRV: A Multilayer Neural Representation for Videos

    Authors: Qingling Chang, Haohui Yu, Shuxuan Fu, Zhiqiang Zeng, Chuangquan Chen

    Abstract: As a novel video representation method, Neural Representations for Videos (NeRV) has shown great potential in the fields of video compression, video restoration, and video interpolation. In the process of representing videos using NeRV, each frame corresponds to an embedding, which is then reconstructed into a video frame sequence after passing through a small number of decoding layers (E-NeRV, HN… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: 14 pages, 12 figures, 8 table

  26. arXiv:2407.06772  [pdf, other

    cs.IT eess.SP

    Revealing the evanescent components in Kronecker-product based codebooks: insights and implications

    Authors: Jun Yang, Yijian Chen, Yunqi Sun, Yuan Si, Hongkang Yu, Shujuan Zhang, Zhaohua Lu

    Abstract: The orthogonal bases of discrete Fourier transform (DFT) has been recognized as the standard spatial-domain bases for Type I, Type II and enhanced Type II codewords by the 3rd Generation Partnership Project (3GPP). For uniform planar arrays, these spatial-domain bases are derived as the Kronecker product of one-dimensional DFT bases. Theoretically, each spatial basis corresponds to a beam directed… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: 11 pages, 9 figures

  27. arXiv:2407.06459  [pdf, other

    cs.RO cs.AI cs.LG

    How Much Progress Did I Make? An Unexplored Human Feedback Signal for Teaching Robots

    Authors: Hang Yu, Qidi Fang, Shijie Fang, Reuben M. Aronson, Elaine Schaertl Short

    Abstract: Enhancing the expressiveness of human teaching is vital for both improving robots' learning from humans and the human-teaching-robot experience. In this work, we characterize and test a little-used teaching signal: \textit{progress}, designed to represent the completion percentage of a task. We conducted two online studies with 76 crowd-sourced participants and one public space study with 40 non-e… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 8 pages. RO-MAN 2024

  28. arXiv:2407.05643  [pdf, other

    cs.IT eess.SP

    Spatial Non-Stationary Dual-Wideband Channel Estimation for XL-MIMO Systems

    Authors: Anzheng Tang, Jun-Bo Wang, Yijin Pan, Tuo Wu, Chuanwen Chang, Yijian Chen, Hongkang Yu, Maged Elkashlan

    Abstract: In this paper, we investigate the channel estimation problem for extremely large-scale multi-input and multi-output (XL-MIMO) systems, considering the spherical wavefront effect, spatially non-stationary (SnS) property, and dual-wideband effects. To accurately characterize the XL-MIMO channel, we first derive a novel spatial-and-frequency-domain channel model for XL-MIMO systems and carefully exam… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: This paper has been submitted to IEEE journal for possible publication

  29. arXiv:2407.05571  [pdf, other

    cs.NI eess.SP

    Cost-Efficient Computation Offloading in SAGIN: A Deep Reinforcement Learning and Perception-Aided Approach

    Authors: Yulan Gao, Ziqiang Ye, Han Yu

    Abstract: The Space-Air-Ground Integrated Network (SAGIN), crucial to the advancement of sixth-generation (6G) technology, plays a key role in ensuring universal connectivity, particularly by addressing the communication needs of remote areas lacking cellular network infrastructure. This paper delves into the role of unmanned aerial vehicles (UAVs) within SAGIN, where they act as a control layer owing to th… ▽ More

    Submitted 7 July, 2024; originally announced July 2024.

  30. arXiv:2407.03672  [pdf, other

    cs.LG cs.AI

    A Survey of Data Synthesis Approaches

    Authors: Hsin-Yu Chang, Pei-Yu Chen, Tun-Hsiang Chou, Chang-Sheng Kao, Hsuan-Yun Yu, Yen-Ting Lin, Yun-Nung Chen

    Abstract: This paper provides a detailed survey of synthetic data techniques. We first discuss the expected goals of using synthetic data in data augmentation, which can be divided into four parts: 1) Improving Diversity, 2) Data Balancing, 3) Addressing Domain Shift, and 4) Resolving Edge Cases. Synthesizing data are closely related to the prevailing machine learning techniques at the time, therefore, we s… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

  31. arXiv:2407.03418  [pdf, other

    cs.LG cs.AI cs.CL cs.CV

    HEMM: Holistic Evaluation of Multimodal Foundation Models

    Authors: Paul Pu Liang, Akshay Goindani, Talha Chafekar, Leena Mathur, Haofei Yu, Ruslan Salakhutdinov, Louis-Philippe Morency

    Abstract: Multimodal foundation models that can holistically process text alongside images, video, audio, and other sensory modalities are increasingly used in a variety of real-world applications. However, it is challenging to characterize and study progress in multimodal foundation models, given the range of possible modeling decisions, tasks, and domains. In this paper, we introduce Holistic Evaluation o… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Code available at https://github.com/pliang279/HEMM

  32. arXiv:2407.03205  [pdf, other

    cs.CV

    Category-Aware Dynamic Label Assignment with High-Quality Oriented Proposal

    Authors: Mingkui Feng, Hancheng Yu, Xiaoyu Dang, Ming Zhou

    Abstract: Objects in aerial images are typically embedded in complex backgrounds and exhibit arbitrary orientations. When employing oriented bounding boxes (OBB) to represent arbitrary oriented objects, the periodicity of angles could lead to discontinuities in label regression values at the boundaries, inducing abrupt fluctuations in the loss function. To address this problem, an OBB representation based o… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

  33. arXiv:2407.02052  [pdf, other

    eess.AS cs.SD

    The USTC-NERCSLIP Systems for The ICMC-ASR Challenge

    Authors: Minghui Wu, Luzhen Xu, Jie Zhang, Haitao Tang, Yanyan Yue, Ruizhi Liao, Jintao Zhao, Zhengzhe Zhang, Yichi Wang, Haoyin Yan, Hongliang Yu, Tongle Ma, Jiachen Liu, Chongliang Wu, Yongchao Li, Yanyong Zhang, Xin Fang, Yue Zhang

    Abstract: This report describes the submitted system to the In-Car Multi-Channel Automatic Speech Recognition (ICMC-ASR) challenge, which considers the ASR task with multi-speaker overlapping and Mandarin accent dynamics in the ICMC case. We implement the front-end speaker diarization using the self-supervised learning representation based multi-speaker embedding and beamforming using the speaker position,… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: Accepted at ICASSP 2024

  34. arXiv:2407.01976  [pdf, other

    cs.CL cs.AI cs.MM

    A Bounding Box is Worth One Token: Interleaving Layout and Text in a Large Language Model for Document Understanding

    Authors: Jinghui Lu, Haiyang Yu, Yanjie Wang, Yongjie Ye, Jingqun Tang, Ziwei Yang, Binghong Wu, Qi Liu, Hao Feng, Han Wang, Hao Liu, Can Huang

    Abstract: Recently, many studies have demonstrated that exclusively incorporating OCR-derived text and spatial layouts with large language models (LLMs) can be highly effective for document understanding tasks. However, existing methods that integrate spatial layouts with text have limitations, such as producing overly long text sequences or failing to fully leverage the autoregressive traits of LLMs. In th… ▽ More

    Submitted 24 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

  35. arXiv:2407.01336  [pdf, other

    cs.IT eess.SP

    Compressed Sensing Inspired User Acquisition for Downlink Integrated Sensing and Communication Transmissions

    Authors: Yi Song, Fernando Pedraza, Shuangyang Li, Siyao Li, Han Yu, Giuseppe Caire

    Abstract: This paper investigates radar-assisted user acquisition for downlink multi-user multiple-input multiple-output (MIMO) transmission using Orthogonal Frequency Division Multiplexing (OFDM) signals. Specifically, we formulate a concise mathematical model for the user acquisition problem, where each user is characterized by its delay and beamspace response. Therefore, we propose a two-stage method for… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  36. Personalized Federated Continual Learning via Multi-granularity Prompt

    Authors: Hao Yu, Xin Yang, Xin Gao, Yan Kang, Hao Wang, Junbo Zhang, Tianrui Li

    Abstract: Personalized Federated Continual Learning (PFCL) is a new practical scenario that poses greater challenges in sharing and personalizing knowledge. PFCL not only relies on knowledge fusion for server aggregation at the global spatial-temporal perspective but also needs model improvement for each client according to the local requirements. Existing methods, whether in Personalized Federated Learning… ▽ More

    Submitted 27 June, 2024; originally announced July 2024.

    Comments: Accepted by KDD 2024 Research Track

  37. arXiv:2406.19741  [pdf, other

    cs.RO cs.AI

    ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning

    Authors: Christopher E. Mower, Yuhui Wan, Hongzhan Yu, Antoine Grosnit, Jonas Gonzalez-Billandon, Matthieu Zimmer, Jinlong Wang, Xinyu Zhang, Yao Zhao, Anbang Zhai, Puze Liu, Daniel Palenicek, Davide Tateo, Cesar Cadena, Marco Hutter, Jan Peters, Guangjian Tian, Yuzheng Zhuang, Kun Shao, Xingyue Quan, Jianye Hao, Jun Wang, Haitham Bou-Ammar

    Abstract: We present a framework for intuitive robot programming by non-experts, leveraging natural language prompts and contextual information from the Robot Operating System (ROS). Our system integrates large language models (LLMs), enabling non-experts to articulate task requirements to the system through a chat interface. Key features of the framework include: integration of ROS with an AI agent connect… ▽ More

    Submitted 12 July, 2024; v1 submitted 28 June, 2024; originally announced June 2024.

    Comments: This document contains 26 pages and 13 figures

  38. arXiv:2406.17419  [pdf, other

    cs.CL cs.AI

    Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA

    Authors: Minzheng Wang, Longze Chen, Cheng Fu, Shengyi Liao, Xinghua Zhang, Bingli Wu, Haiyang Yu, Nan Xu, Lei Zhang, Run Luo, Yunshui Li, Min Yang, Fei Huang, Yongbin Li

    Abstract: Long-context modeling capabilities have garnered widespread attention, leading to the emergence of Large Language Models (LLMs) with ultra-context windows. Meanwhile, benchmarks for evaluating long-context LLMs are gradually catching up. However, existing benchmarks employ irrelevant noise texts to artificially extend the length of test cases, diverging from the real-world scenarios of long-contex… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: We release our code and data publicly at https://github.com/MozerWang/Loong

  39. arXiv:2406.17262  [pdf, other

    cs.CL

    D2LLM: Decomposed and Distilled Large Language Models for Semantic Search

    Authors: Zihan Liao, Hang Yu, Jianguo Li, Jun Wang, Wei Zhang

    Abstract: The key challenge in semantic search is to create models that are both accurate and efficient in pinpointing relevant sentences for queries. While BERT-style bi-encoders excel in efficiency with pre-computed embeddings, they often miss subtle nuances in search tasks. Conversely, GPT-style LLMs with cross-encoder designs capture these nuances but are computationally intensive, hindering real-time a… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

  40. arXiv:2406.15363  [pdf

    cs.CL

    Exploring LLM Multi-Agents for ICD Coding

    Authors: Rumeng Li, Xun Wang, Hong Yu

    Abstract: Large Language Models (LLMs) have demonstrated impressive and diverse abilities that can benefit various domains, such as zero and few-shot information extraction from clinical text without domain-specific training. However, for the ICD coding task, they often hallucinate key details and produce high recall but low precision results due to the high-dimensional and skewed distribution of the ICD co… ▽ More

    Submitted 1 April, 2024; originally announced June 2024.

  41. arXiv:2406.13972  [pdf, other

    cs.SE

    CREF: An LLM-based Conversational Software Repair Framework for Programming Tutors

    Authors: Boyang Yang, Haoye Tian, Weiguo Pian, Haoran Yu, Haitao Wang, Jacques Klein, Tegawendé F. Bissyandé, Shunfu Jin

    Abstract: Program repair techniques offer cost-saving benefits for debugging within software development and programming education scenarios. With the proven effectiveness of Large Language Models (LLMs) in code-related tasks, researchers have explored their potential for program repair. However, it is crucial to recognize that existing repair benchmarks may have influenced LLM training data, potentially ca… ▽ More

    Submitted 8 July, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

  42. arXiv:2406.13578  [pdf, other

    cs.CL

    Enhancing Distractor Generation for Multiple-Choice Questions with Retrieval Augmented Pretraining and Knowledge Graph Integration

    Authors: Han-Cheng Yu, Yu-An Shih, Kin-Man Law, Kai-Yu Hsieh, Yu-Chen Cheng, Hsin-Chih Ho, Zih-An Lin, Wen-Chuan Hsu, Yao-Chung Fan

    Abstract: In this paper, we tackle the task of distractor generation (DG) for multiple-choice questions. Our study introduces two key designs. First, we propose \textit{retrieval augmented pretraining}, which involves refining the language model pretraining to align it more closely with the downstream task of DG. Second, we explore the integration of knowledge graphs to enhance the performance of DG. Throug… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: Findings at ACL 2024

  43. arXiv:2406.12793  [pdf, other

    cs.CL

    ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools

    Authors: Team GLM, :, Aohan Zeng, Bin Xu, Bowen Wang, Chenhui Zhang, Da Yin, Dan Zhang, Diego Rojas, Guanyu Feng, Hanlin Zhao, Hanyu Lai, Hao Yu, Hongning Wang, Jiadai Sun, Jiajie Zhang, Jiale Cheng, Jiayi Gui, Jie Tang, Jing Zhang, Jingyu Sun, Juanzi Li, Lei Zhao, Lindong Wu, Lucen Zhong , et al. (34 additional authors not shown)

    Abstract: We introduce ChatGLM, an evolving family of large language models that we have been developing over time. This report primarily focuses on the GLM-4 language series, which includes GLM-4, GLM-4-Air, and GLM-4-9B. They represent our most capable models that are trained with all the insights and lessons gained from the preceding three generations of ChatGLM. To date, the GLM-4 models are pre-trained… ▽ More

    Submitted 29 July, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

  44. arXiv:2406.12195  [pdf, other

    quant-ph cs.LG

    Quantum Compiling with Reinforcement Learning on a Superconducting Processor

    Authors: Z. T. Wang, Qiuhao Chen, Yuxuan Du, Z. H. Yang, Xiaoxia Cai, Kaixuan Huang, Jingning Zhang, Kai Xu, Jun Du, Yinan Li, Yuling Jiao, Xingyao Wu, Wu Liu, Xiliang Lu, Huikai Xu, Yirong Jin, Ruixia Wang, Haifeng Yu, S. P. Zhao

    Abstract: To effectively implement quantum algorithms on noisy intermediate-scale quantum (NISQ) processors is a central task in modern quantum technology. NISQ processors feature tens to a few hundreds of noisy qubits with limited coherence times and gate operations with errors, so NISQ algorithms naturally require employing circuits of short lengths via quantum compilation. Here, we develop a reinforcemen… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  45. arXiv:2406.11274  [pdf, other

    cs.CL

    Skip-Layer Attention: Bridging Abstract and Detailed Dependencies in Transformers

    Authors: Qian Chen, Wen Wang, Qinglin Zhang, Siqi Zheng, Shiliang Zhang, Chong Deng, Hai Yu, Jiaqing Liu, Yukun Ma, Chong Zhang

    Abstract: The Transformer architecture has significantly advanced deep learning, particularly in natural language processing, by effectively managing long-range dependencies. However, as the demand for understanding complex relationships grows, refining the Transformer's architecture becomes critical. This paper introduces Skip-Layer Attention (SLA) to enhance Transformer models by enabling direct attention… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 7 pages, 1 figure

  46. arXiv:2406.10593  [pdf, other

    cs.AI cs.DB cs.IR

    QDA-SQL: Questions Enhanced Dialogue Augmentation for Multi-Turn Text-to-SQL

    Authors: Yinggang Sun, Ziming Guo, Haining Yu, Chuanyi Liu, Xiang Li, Bingxuan Wang, Xiangzhan Yu, Tiancheng Zhao

    Abstract: Fine-tuning large language models (LLMs) for specific domain tasks has achieved great success in Text-to-SQL tasks. However, these fine-tuned models often face challenges with multi-turn Text-to-SQL tasks caused by ambiguous or unanswerable questions. It is desired to enhance LLMs to handle multiple types of questions in multi-turn Text-to-SQL tasks. To address this, we propose a novel data augmen… ▽ More

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: 13 pages, 7 figures

  47. arXiv:2406.09394  [pdf, other

    cs.CV cs.GR

    WonderWorld: Interactive 3D Scene Generation from a Single Image

    Authors: Hong-Xing Yu, Haoyi Duan, Charles Herrmann, William T. Freeman, Jiajun Wu

    Abstract: We present WonderWorld, a novel framework for interactive 3D scene extrapolation that enables users to explore and shape virtual environments based on a single input image and user-specified text. While significant improvements have been made to the visual quality of scene generation, existing methods are run offline, taking tens of minutes to hours to generate a scene. By leveraging Fast Gaussian… ▽ More

    Submitted 14 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: Project website: https://WonderWorld-2024.github.io/

  48. arXiv:2406.09205  [pdf, other

    cs.CL cs.AI

    ReadCtrl: Personalizing text generation with readability-controlled instruction learning

    Authors: Hieu Tran, Zonghai Yao, Lingxi Li, Hong Yu

    Abstract: Content generation conditioning on users's readability is an important application for personalization. In an era of large language models (LLMs), readability-controlled text generation based on LLMs has become increasingly important. This paper introduces a novel methodology called "Readability-Controlled Instruction Learning (ReadCtrl)," which aims to instruction-tune LLMs to tailor users' reada… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 9 pages

  49. arXiv:2406.07472  [pdf, other

    cs.CV

    4Real: Towards Photorealistic 4D Scene Generation via Video Diffusion Models

    Authors: Heng Yu, Chaoyang Wang, Peiye Zhuang, Willi Menapace, Aliaksandr Siarohin, Junli Cao, Laszlo A Jeni, Sergey Tulyakov, Hsin-Ying Lee

    Abstract: Existing dynamic scene generation methods mostly rely on distilling knowledge from pre-trained 3D generative models, which are typically fine-tuned on synthetic object datasets. As a result, the generated scenes are often object-centric and lack photorealism. To address these limitations, we introduce a novel pipeline designed for photorealistic text-to-4D scene generation, discarding the dependen… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

  50. arXiv:2406.07103  [pdf, other

    eess.AS cs.AI

    MR-RawNet: Speaker verification system with multiple temporal resolutions for variable duration utterances using raw waveforms

    Authors: Seung-bin Kim, Chan-yeong Lim, Jungwoo Heo, Ju-ho Kim, Hyun-seo Shin, Kyo-Won Koo, Ha-Jin Yu

    Abstract: In speaker verification systems, the utilization of short utterances presents a persistent challenge, leading to performance degradation primarily due to insufficient phonetic information to characterize the speakers. To overcome this obstacle, we propose a novel structure, MR-RawNet, designed to enhance the robustness of speaker verification systems against variable duration utterances using raw… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 5 pages, accepted by Interspeech 2024