Showing 1–50 of 177 results for author: Meng, L

Searching in archive cs. Search in all archives.
  1. arXiv:2407.18854  [pdf, other]

    cs.CV cs.AI

    Unifying Visual and Semantic Feature Spaces with Diffusion Models for Enhanced Cross-Modal Alignment

    Authors: Yuze Zheng, Zixuan Li, Xiangxian Li, Jinxing Liu, Yuqing Wang, Xiangxu Meng, Lei Meng

    Abstract: Image classification models often demonstrate unstable performance in real-world applications due to variations in image information, driven by differing visual perspectives of subject objects and lighting discrepancies. To mitigate these challenges, existing studies commonly incorporate additional modal information matching the visual data to regularize the model's learning process, enabling the…

    Submitted 26 July, 2024; originally announced July 2024.

  2. arXiv:2407.13175  [pdf, other]

    cs.RO

    OVGNet: A Unified Visual-Linguistic Framework for Open-Vocabulary Robotic Grasping

    Authors: Li Meng, Zhao Qi, Lyu Shuchang, Wang Chunlei, Ma Yujing, Cheng Guangliang, Yang Chenguang

    Abstract: Recognizing and grasping novel-category objects remains a crucial yet challenging problem in real-world robotic applications. Despite its significance, limited research has been conducted in this specific domain. To address this, we seamlessly propose a novel framework that integrates open-vocabulary learning into the domain of robotic grasping, empowering robots with the capability to adeptly han…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Accepted in IROS2024

  3. arXiv:2407.12537  [pdf, other]

    cs.RO eess.SP

    Collaborative Fall Detection and Response using Wi-Fi Sensing and Mobile Companion Robot

    Authors: Yunwang Chen, Yaozhong Kang, Ziqi Zhao, Yue Hong, Lingxiao Meng, Max Q.-H. Meng

    Abstract: This paper presents a collaborative fall detection and response system integrating Wi-Fi sensing with robotic assistance. The proposed system leverages channel state information (CSI) disruptions caused by movements to detect falls in non-line-of-sight (NLOS) scenarios, offering non-intrusive monitoring. Besides, a companion robot is utilized to provide assistance capabilities to navigate and resp…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Draft for the submission of Robio 2024

  4. arXiv:2407.10376  [pdf, other]

    q-bio.NC cs.CL

    Large Language Model-based FMRI Encoding of Language Functions for Subjects with Neurocognitive Disorder

    Authors: Yuejiao Wang, Xianmin Gong, Lingwei Meng, Xixin Wu, Helen Meng

    Abstract: Functional magnetic resonance imaging (fMRI) is essential for developing encoding models that identify functional changes in language-related brain areas of individuals with Neurocognitive Disorders (NCD). While large language model (LLM)-based fMRI encoding has shown promise, existing studies predominantly focus on healthy, young adults, overlooking older NCD populations and cognitive level corre…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: 5 pages, accepted by Interspeech 2024

  5. arXiv:2407.09817  [pdf, other]

    cs.SD cs.CL eess.AS

    Empowering Whisper as a Joint Multi-Talker and Target-Talker Speech Recognition System

    Authors: Lingwei Meng, Jiawen Kang, Yuejiao Wang, Zengrui Jin, Xixin Wu, Xunying Liu, Helen Meng

    Abstract: Multi-talker speech recognition and target-talker speech recognition, both of which involve transcription in multi-talker contexts, remain significant challenges. However, existing methods rarely attempt to simultaneously address both tasks. In this study, we propose a pioneering approach to empower Whisper, which is a speech foundation model, to tackle joint multi-talker and target-talker speech recogniti…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: Accepted to INTERSPEECH 2024

  6. arXiv:2407.08551  [pdf, other]

    cs.CL cs.SD eess.AS

    Autoregressive Speech Synthesis without Vector Quantization

    Authors: Lingwei Meng, Long Zhou, Shujie Liu, Sanyuan Chen, Bing Han, Shujie Hu, Yanqing Liu, Jinyu Li, Sheng Zhao, Xixin Wu, Helen Meng, Furu Wei

    Abstract: We present MELLE, a novel continuous-valued token-based language modeling approach for text-to-speech synthesis (TTS). MELLE autoregressively generates continuous mel-spectrogram frames directly from the text condition, bypassing the need for vector quantization, which was originally designed for audio compression and sacrifices fidelity compared to mel-spectrograms. Specifically, (i) instead of cross…

    Submitted 11 July, 2024; originally announced July 2024.

  7. arXiv:2407.06861  [pdf, other]

    cs.CV

    Window-to-Window BEV Representation Learning for Limited FoV Cross-View Geo-localization

    Authors: Lei Cheng, Teng Wang, Lingquan Meng, Changyin Sun

    Abstract: Cross-view geo-localization confronts significant challenges due to large perspective changes, especially when the ground-view query image has a limited field of view with unknown orientation. To bridge the cross-view domain gap, we for the first time explore to learn a BEV representation directly from the ground query image. However, the unknown orientation between ground and aerial images combin…

    Submitted 9 July, 2024; originally announced July 2024.

  8. arXiv:2407.06730  [pdf, other]

    cs.CV

    LVLM-empowered Multi-modal Representation Learning for Visual Place Recognition

    Authors: Teng Wang, Lingquan Meng, Lei Cheng, Changyin Sun

    Abstract: Visual place recognition (VPR) remains challenging due to significant viewpoint changes and appearance variations. Mainstream works tackle these challenges by developing various feature aggregation methods to transform deep features into robust and compact global representations. Unfortunately, satisfactory results cannot be achieved under challenging conditions. We start from a new perspective an…

    Submitted 9 July, 2024; originally announced July 2024.

  9. arXiv:2406.14123  [pdf]

    cs.CY

    Mapping AI Ethics Narratives: Evidence from Twitter Discourse Between 2015 and 2022

    Authors: Mengyi Wei, Puzhen Zhang, Chuan Chen, Dongsheng Chen, Chenyu Zuo, Liqiu Meng

    Abstract: Public participation is indispensable for an insightful understanding of the ethics issues raised by AI technologies. Twitter is selected in this paper to serve as an online public sphere for exploring discourse on AI ethics, facilitating broad and equitable public engagement in the development of AI technology. A research framework is proposed to demonstrate how to transform AI ethics-related dis…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 22 pages, 6 figures

  10. arXiv:2406.11739  [pdf, other]

    cs.CV

    V3Det Challenge 2024 on Vast Vocabulary and Open Vocabulary Object Detection: Methods and Results

    Authors: Jiaqi Wang, Yuhang Zang, Pan Zhang, Tao Chu, Yuhang Cao, Zeyi Sun, Ziyu Liu, Xiaoyi Dong, Tong Wu, Dahua Lin, Zeming Chen, Zhi Wang, Lingchen Meng, Wenhao Yao, Jianwei Yang, Sihong Wu, Zhineng Chen, Zuxuan Wu, Yu-Gang Jiang, Peixi Wu, Bosong Chai, Xuan Nie, Longquan Yan, Zeyu Wang, Qifan Zhou, et al. (9 additional authors not shown)

    Abstract: Detecting objects in real-world scenes is a complex task due to various challenges, including the vast range of object categories, and potential encounters with previously unknown or unseen objects. The challenges necessitate the development of public benchmarks and challenges to advance the field of object detection. Inspired by the success of previous COCO and LVIS Challenges, we organize the V3…

    Submitted 17 June, 2024; originally announced June 2024.

  11. arXiv:2406.07855  [pdf, other]

    cs.CL cs.SD eess.AS

    VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via Monotonic Alignment

    Authors: Bing Han, Long Zhou, Shujie Liu, Sanyuan Chen, Lingwei Meng, Yanming Qian, Yanqing Liu, Sheng Zhao, Jinyu Li, Furu Wei

    Abstract: With the help of discrete neural audio codecs, large language models (LLM) have increasingly been recognized as a promising methodology for zero-shot Text-to-Speech (TTS) synthesis. However, sampling based decoding strategies bring astonishing diversity to generation, but also pose robustness issues such as typos, omissions and repetition. In addition, the high sampling rate of audio also brings h…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 15 pages, 5 figures

  12. arXiv:2406.06909  [pdf, other]

    cs.LG cond-mat.dis-nn stat.ML

    Training Dynamics of Nonlinear Contrastive Learning Model in the High Dimensional Limit

    Authors: Lineghuan Meng, Chuang Wang

    Abstract: This letter presents a high-dimensional analysis of the training dynamics for a single-layer nonlinear contrastive learning model. The empirical distribution of the model weights converges to a deterministic measure governed by a McKean-Vlasov nonlinear partial differential equation (PDE). Under L2 regularization, this PDE reduces to a closed set of low-dimensional ordinary differential equations…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 21 pages, 11 figures

  13. arXiv:2406.06592  [pdf, other]

    cs.CL cs.LG

    Improve Mathematical Reasoning in Language Models by Automated Process Supervision

    Authors: Liangchen Luo, Yinxiao Liu, Rosanne Liu, Samrat Phatale, Harsh Lara, Yunxuan Li, Lei Shu, Yun Zhu, Lei Meng, Jiao Sun, Abhinav Rastogi

    Abstract: Complex multi-step reasoning tasks, such as solving mathematical problems or generating code, remain a significant hurdle for even the most advanced large language models (LLMs). Verifying LLM outputs with an Outcome Reward Model (ORM) is a standard inference-time technique aimed at enhancing the reasoning performance of LLMs. However, this still proves insufficient for reasoning tasks with a leng…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 18 pages, 5 figures, 1 table

  14. arXiv:2406.04523  [pdf, other]

    cs.CL cs.LG

    Proofread: Fixes All Errors with One Tap

    Authors: Renjie Liu, Yanxiang Zhang, Yun Zhu, Haicheng Sun, Yuanbo Zhang, Michael Xuelin Huang, Shanqing Cai, Lei Meng, Shumin Zhai

    Abstract: The impressive capabilities in Large Language Models (LLMs) provide a powerful approach to reimagine users' typing experience. This paper demonstrates Proofread, a novel Gboard feature powered by a server-side LLM in Gboard, enabling seamless sentence-level and paragraph-level corrections with a single tap. We describe the complete system in this paper, from data generation, metrics design to mode…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 8 pages, 3 figures, 2 tables

  15. arXiv:2406.04334  [pdf, other]

    cs.CV

    DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs

    Authors: Lingchen Meng, Jianwei Yang, Rui Tian, Xiyang Dai, Zuxuan Wu, Jianfeng Gao, Yu-Gang Jiang

    Abstract: Most large multimodal models (LMMs) are implemented by feeding visual tokens as a sequence into the first layer of a large language model (LLM). The resulting architecture is simple but significantly increases computation and memory costs, as it has to handle a large number of additional tokens in its input layer. This paper presents a new architecture DeepStack for LMMs. Considering $N$ layers in…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Project Page: https://deepstack-vl.github.io/

  16. arXiv:2406.01151  [pdf, other]

    cs.AR

    A 0.96pJ/SOP, 30.23K-neuron/mm^2 Heterogeneous Neuromorphic Chip With Fullerene-like Interconnection Topology for Edge-AI Computing

    Authors: P. J. Zhou, Q. Yu, M. Chen, Y. C. Wang, L. W. Meng, Y. Zuo, N. Ning, Y. Liu, S. G. Hu, G. C. Qiao

    Abstract: Edge-AI computing requires high energy efficiency, low power consumption, and relatively high flexibility and compact area, challenging the AI-chip design. This work presents a 0.96 pJ/SOP heterogeneous neuromorphic system-on-chip (SoC) with fullerene-like interconnection topology for edge-AI computing. The neuromorphic core integrates different technologies to augment computing energy efficiency,…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 5 pages, 8 figures

  17. arXiv:2405.20046  [pdf, other]

    cs.AI

    Cross-Training with Multi-View Knowledge Fusion for Heterogenous Federated Learning

    Authors: Zhuang Qi, Lei Meng, Weihao He, Ruohan Zhang, Yu Wang, Xin Qi, Xiangxu Meng

    Abstract: Federated learning benefits from cross-training strategies, which enable models to train on data from distinct sources to improve the generalization capability. However, the data heterogeneity between sources may lead models to gradually forget previously acquired knowledge when undergoing cross-training to adapt to new tasks or data sources. We argue that integrating personalized and global know…

    Submitted 30 May, 2024; originally announced May 2024.

  18. arXiv:2405.16178  [pdf, other]

    cs.CL

    Accelerating Inference of Retrieval-Augmented Generation via Sparse Context Selection

    Authors: Yun Zhu, Jia-Chen Gu, Caitlin Sikora, Ho Ko, Yinxiao Liu, Chu-Cheng Lin, Lei Shu, Liangchen Luo, Lei Meng, Bang Liu, Jindong Chen

    Abstract: Large language models (LLMs) augmented with retrieval exhibit robust performance and extensive versatility by incorporating external contexts. However, the input length grows linearly in the number of retrieved documents, causing a dramatic increase in latency. In this paper, we propose a novel paradigm named Sparse RAG, which seeks to cut computation costs through sparsity. Specifically, Sparse R…

    Submitted 25 May, 2024; originally announced May 2024.

  19. arXiv:2405.13848  [pdf, other]

    cs.LG

    Maximum Manifold Capacity Representations in State Representation Learning

    Authors: Li Meng, Morten Goodwin, Anis Yazidi, Paal Engelstad

    Abstract: The expanding research on manifold-based self-supervised learning (SSL) builds on the manifold hypothesis, which suggests that the inherent complexity of high-dimensional data can be unraveled through lower-dimensional manifold embeddings. Capitalizing on this, DeepInfomax with an unbalanced atlas (DIM-UA) has emerged as a powerful tool and yielded impressive results for state representations in r…

    Submitted 22 May, 2024; originally announced May 2024.

  20. arXiv:2405.09375  [pdf, other]

    cs.RO

    VascularPilot3D: Toward a 3D fully autonomous navigation for endovascular robotics

    Authors: Song Jingwei, Yang Keke, Chen Han, Liu Jiayi, Gu Yinan, Hui Qianxin, Huang Yanqi, Li Meng, Zhang Zheng, Cao Tuoyu, Ghaffari Maani

    Abstract: This research reports VascularPilot3D, the first 3D fully autonomous endovascular robot navigation system. As an exploration toward autonomous guidewire navigation, VascularPilot3D is developed as a complete navigation system based on intra-operative imaging systems (fluoroscopic X-ray in this study) and typical endovascular robots. VascularPilot3D adopts previously researched fast 3D-2D vessel re…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: Submitted to MICCAI2024

  21. arXiv:2405.07687  [pdf, other]

    cs.RO

    Highly Efficient Observation Process based on FFT Filtering for Robot Swarm Collaborative Navigation in Unknown Environments

    Authors: Chenxi Li, Weining Lu, Zhihao Ma, Litong Meng, Bin Liang

    Abstract: Collaborative path planning for robot swarms in complex, unknown environments without external positioning is a challenging problem. This requires robots to find safe directions based on real-time environmental observations, and to efficiently transfer and fuse these observations within the swarm. This study presents a filtering method based on Fast Fourier Transform (FFT) to address these two iss…

    Submitted 17 July, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

    Comments: 8 pages, 8 figures, 1 table

  22. arXiv:2404.13899  [pdf, other]

    cs.CL cs.AI cs.MM

    Towards Better Text-to-Image Generation Alignment via Attention Modulation

    Authors: Yihang Wu, Xiao Cao, Kaixin Li, Zitan Chen, Haonan Wang, Lei Meng, Zhiyong Huang

    Abstract: In text-to-image generation tasks, the advancements of diffusion models have facilitated the fidelity of generated results. However, these models encounter challenges when processing text prompts containing multiple entities and attributes. The uneven distribution of attention results in the issues of entity leakage and attribute misalignment. Training from scratch to address this issue requires n…

    Submitted 22 April, 2024; originally announced April 2024.

  23. arXiv:2404.06037  [pdf, other]

    cs.DC

    A Survey of Distributed Graph Algorithms on Massive Graphs

    Authors: Lingkai Meng, Yu Shao, Long Yuan, Longbin Lai, Peng Cheng, Xue Li, Wenyuan Yu, Wenjie Zhang, Xuemin Lin, Jingren Zhou

    Abstract: Distributed processing of large-scale graph data has many practical applications and has been widely studied. In recent years, a lot of distributed graph processing frameworks and algorithms have been proposed. While many efforts have been devoted to analyzing these, with most analyzing them based on programming models, less research focuses on understanding their challenges in distributed environ…

    Submitted 9 April, 2024; originally announced April 2024.

  24. arXiv:2403.14941  [pdf, other]

    cs.LG cs.AI

    Unifying Lane-Level Traffic Prediction from a Graph Structural Perspective: Benchmark and Baseline

    Authors: Shuhao Li, Yue Cui, Jingyi Xu, Libin Li, Lingkai Meng, Weidong Yang, Fan Zhang, Xiaofang Zhou

    Abstract: Traffic prediction has long been a focal and pivotal area in research, witnessing both significant strides from city-level to road-level predictions in recent years. With the advancement of Vehicle-to-Everything (V2X) technologies, autonomous driving, and large-scale models in the traffic domain, lane-level traffic prediction has emerged as an indispensable direction. However, further progress in…

    Submitted 22 March, 2024; originally announced March 2024.

  25. arXiv:2403.10056  [pdf, other]

    cs.CL cs.AI

    Don't Half-listen: Capturing Key-part Information in Continual Instruction Tuning

    Authors: Yongquan He, Xuancheng Huang, Minghao Tang, Lingxun Meng, Xiang Li, Wei Lin, Wenyuan Zhang, Yifu Gao

    Abstract: Instruction tuning for large language models (LLMs) can drive them to produce results consistent with human goals in specific downstream tasks. However, the process of continual instruction tuning (CIT) for LLMs may bring about the catastrophic forgetting (CF) problem, where previously learned abilities are degraded. Recent methods try to alleviate the CF problem by modifying models or replaying d…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: 18 pages, 4 figures

  26. arXiv:2403.08216  [pdf, other]

    cs.LG cs.CV

    PaddingFlow: Improving Normalizing Flows with Padding-Dimensional Noise

    Authors: Qinglong Meng, Chongkun Xia, Xueqian Wang

    Abstract: Normalizing flow is a generative modeling approach with efficient sampling. However, flow-based models suffer from two issues: 1) If the target distribution is a manifold, due to the mismatch between the dimensions of the latent target distribution and the data distribution, flow-based models might perform badly. 2) Discrete data might make flow-based models collapse into a degenerate mixture of point mas…

    Submitted 23 April, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

  27. arXiv:2403.06798  [pdf, other]

    eess.IV cs.CV cs.LG

    Dynamic Perturbation-Adaptive Adversarial Training on Medical Image Classification

    Authors: Shuai Li, Xiaoguang Ma, Shancheng Jiang, Lu Meng

    Abstract: Remarkable successes were made in Medical Image Classification (MIC) recently, mainly due to wide applications of convolutional neural networks (CNNs). However, adversarial examples (AEs) exhibited imperceptible similarity with raw data, raising serious concerns on network robustness. Although adversarial training (AT), in responding to malevolent AEs, was recognized as an effective approach to im…

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: 9 pages, 4 figures, 2 tables

  28. arXiv:2403.03739  [pdf, other]

    cs.LG cs.AI

    A&B BNN: Add&Bit-Operation-Only Hardware-Friendly Binary Neural Network

    Authors: Ruichen Ma, Guanchao Qiao, Yian Liu, Liwei Meng, Ning Ning, Yang Liu, Shaogang Hu

    Abstract: Binary neural networks utilize 1-bit quantized weights and activations to reduce both the model's storage demands and computational burden. However, advanced binary architectures still incorporate millions of inefficient and nonhardware-friendly full-precision multiplication operations. A&B BNN is proposed to directly remove part of the multiplication operations in a traditional BNN and replace th…

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: CVPR 2024 Accepted

  29. arXiv:2402.07595  [pdf, other]

    eess.IV cs.LG

    Comparative Analysis of ImageNet Pre-Trained Deep Learning Models and DINOv2 in Medical Imaging Classification

    Authors: Yuning Huang, Jingchen Zou, Lanxi Meng, Xin Yue, Qing Zhao, Jianqiang Li, Changwei Song, Gabriel Jimenez, Shaowu Li, Guanghui Fu

    Abstract: Medical image analysis frequently encounters data scarcity challenges. Transfer learning has been effective in addressing this issue while conserving computational resources. The recent advent of foundational models like the DINOv2, which uses the vision transformer architecture, has opened new opportunities in the field and gathered significant interest. However, DINOv2's performance on clinical…

    Submitted 13 February, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

  30. arXiv:2402.00534  [pdf, other]

    cs.CV cs.LG

    A Manifold Representation of the Key in Vision Transformers

    Authors: Li Meng, Morten Goodwin, Anis Yazidi, Paal Engelstad

    Abstract: Vision Transformers implement multi-head self-attention via stacking multiple attention blocks. The query, key, and value are often intertwined and generated within those blocks via a single, shared linear transformation. This paper explores the concept of disentangling the key from the query and value, and adopting a manifold representation for the key. Our experiments reveal that decoupling and…

    Submitted 7 June, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

  31. arXiv:2402.00455  [pdf, ps, other]

    cs.IT eess.SP

    Tighter Lower Bounds on Aperiodic Ambiguity Function and Their Asymptotic Achievability

    Authors: Lingsheng Meng, Yong Liang Guan, Yao Ge, Zilong Liu, Pingzhi Fan

    Abstract: This paper presents tighter lower bounds on the maximum aperiodic ambiguity function (AF) magnitude of unimodular sequences under certain delay-Doppler low ambiguity zones (LAZ). These bounds are derived by exploiting the upper and lower bounds on the Frobenius norm of the weighted auto- and cross-AF matrices, with the introduction of two weight vectors associated with the delay and Doppler shifts…

    Submitted 18 July, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

    Comments: 25 pages, 2 figures

  32. Smart Fitting Room: A One-stop Framework for Matching-aware Virtual Try-on

    Authors: Mingzhe Yu, Yunshan Ma, Lei Wu, Kai Cheng, Xue Li, Lei Meng, Tat-Seng Chua

    Abstract: The development of virtual try-on has revolutionized online shopping by allowing customers to visualize themselves in various fashion items, thus extending the in-store try-on experience to the cyber space. Although virtual try-on has attracted considerable research initiatives, existing systems only focus on the quality of image generation, overlooking whether the fashion item is a good match to…

    Submitted 20 April, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

  33. arXiv:2401.14664  [pdf, other]

    cs.SD cs.CL eess.AS

    UNIT-DSR: Dysarthric Speech Reconstruction System Using Speech Unit Normalization

    Authors: Yuejiao Wang, Xixin Wu, Disong Wang, Lingwei Meng, Helen Meng

    Abstract: Dysarthric speech reconstruction (DSR) systems aim to automatically convert dysarthric speech into normal-sounding speech. The technology eases communication with speakers affected by the neuromotor disorder and enhances their social inclusion. NED-based (Neural Encoder-Decoder) systems have significantly improved the intelligibility of the reconstructed speech as compared with GAN-based (Generati…

    Submitted 26 January, 2024; originally announced January 2024.

    Comments: Accepted to ICASSP 2024

  34. arXiv:2401.09819  [pdf, other]

    cs.RO cs.AI cs.LG

    PPNet: A Two-Stage Neural Network for End-to-end Path Planning

    Authors: Qinglong Meng, Chongkun Xia, Xueqian Wang, Songping Mai, Bin Liang

    Abstract: The classical path planners, such as sampling-based path planners, can provide probabilistic completeness guarantees in the sense that the probability that the planner fails to return a solution if one exists, decays to zero as the number of samples approaches infinity. However, finding a near-optimal feasible solution in a given period is challenging in many applications such as the autonomous ve…

    Submitted 23 April, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

  35. arXiv:2401.08433  [pdf, other]

    cs.RO

    Autonomous Multiple-Trolley Collection System with Nonholonomic Robots: Design, Control, and Implementation

    Authors: Peijia Xie, Bingyi Xia, Anjun Hu, Ziqi Zhao, Lingxiao Meng, Zhirui Sun, Xuheng Gao, Jiankun Wang, Max Q.-H. Meng

    Abstract: The intricate and multi-stage task in dynamic public spaces like luggage trolley collection in airports presents both a promising opportunity and an ongoing challenge for automated service robots. Previous research has primarily focused on handling a single trolley or individual functional components, creating a gap in providing cost-effective and efficient solutions for practical scenarios. In th…

    Submitted 16 January, 2024; originally announced January 2024.

  36. arXiv:2401.07382  [pdf, other]

    cs.CL cs.AI

    Beyond Sparse Rewards: Enhancing Reinforcement Learning with Language Model Critique in Text Generation

    Authors: Meng Cao, Lei Shu, Lei Yu, Yun Zhu, Nevan Wichers, Yinxiao Liu, Lei Meng

    Abstract: Reinforcement learning (RL) can align language models with non-differentiable reward signals, such as human preferences. However, a major challenge arises from the sparsity of these reward signals - typically, there is only a single reward for an entire output. This sparsity of rewards can lead to inefficient and unstable learning. To address this challenge, our paper introduces a novel framework…

    Submitted 19 February, 2024; v1 submitted 14 January, 2024; originally announced January 2024.

  37. arXiv:2401.04152  [pdf, other]

    cs.SD cs.AI cs.CL eess.AS

    Cross-Speaker Encoding Network for Multi-Talker Speech Recognition

    Authors: Jiawen Kang, Lingwei Meng, Mingyu Cui, Haohan Guo, Xixin Wu, Xunying Liu, Helen Meng

    Abstract: End-to-end multi-talker speech recognition has garnered great interest as an effective approach to directly transcribe overlapped speech from multiple speakers. Current methods typically adopt either 1) single-input multiple-output (SIMO) models with a branched encoder, or 2) single-input single-output (SISO) models based on attention-based encoder-decoder architecture with serialized output train…

    Submitted 22 July, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

    Comments: Accepted by ICASSP2024

  38. arXiv:2401.02913  [pdf, other]

    cs.IR

    Plug-in Diffusion Model for Sequential Recommendation

    Authors: Haokai Ma, Ruobing Xie, Lei Meng, Xin Chen, Xu Zhang, Leyu Lin, Zhanhui Kang

    Abstract: Pioneering efforts have verified the effectiveness of the diffusion models in exploring the informative uncertainty for recommendation. Considering the difference between recommendation and image synthesis tasks, existing methods have undertaken tailored refinements to the diffusion and reverse process. However, these approaches typically use the highest-score item in corpus for user interest pred…

    Submitted 5 January, 2024; originally announced January 2024.

    Comments: Accepted by AAAI 2024

  39. arXiv:2312.06902  [pdf, other]

    cs.LG cs.DC

    Perseus: Removing Energy Bloat from Large Model Training

    Authors: Jae-Won Chung, Yile Gu, Insu Jang, Luoxi Meng, Nikhil Bansal, Mosharaf Chowdhury

    Abstract: Training large AI models on numerous GPUs consumes a massive amount of energy. We observe that not all energy consumed during training directly contributes to end-to-end training throughput, and a significant portion can be removed without slowing down training, which we call energy bloat. In this work, we identify two independent sources of energy bloat in large model training, intrinsic and ex…

    Submitted 11 December, 2023; originally announced December 2023.

    Comments: Open-source at https://ml.energy/zeus/perseus/

  40. arXiv:2312.02719  [pdf, other]

    cs.CV

    A Conditional Denoising Diffusion Probabilistic Model for Point Cloud Upsampling

    Authors: Wentao Qu, Yuantian Shao, Lingwu Meng, Xiaoshui Huang, Liang Xiao

    Abstract: Point cloud upsampling (PCU) enriches the representation of raw point clouds, significantly improving the performance in downstream tasks such as classification and reconstruction. Most of the existing point cloud upsampling methods focus on sparse point cloud feature extraction and upsampling module design. In a different way, we dive deeper into directly modelling the gradient of data distributi…

    Submitted 3 December, 2023; originally announced December 2023.

  41. arXiv:2311.14671  [pdf, other]

    cs.CV

    SEGIC: Unleashing the Emergent Correspondence for In-Context Segmentation

    Authors: Lingchen Meng, Shiyi Lan, Hengduo Li, Jose M. Alvarez, Zuxuan Wu, Yu-Gang Jiang

    Abstract: In-context segmentation aims at segmenting novel images using a few labeled example images, termed as "in-context examples", exploring content similarities between examples and the target. The resulting models can be generalized seamlessly to novel segmentation tasks, significantly reducing the labeling and training costs compared with conventional pipelines. However, in-context segmentation is mo…

    Submitted 22 July, 2024; v1 submitted 24 November, 2023; originally announced November 2023.

    Comments: ECCV-24 camera-ready

  42. arXiv:2311.13947  [pdf]

    cs.DB

    High-Ratio Compression for Machine-Generated Data

    Authors: Jiujing Zhang, Zhitao Shen, Shiyu Yang, Lingkai Meng, Chuan Xiao, Wei Jia, Yue Li, Qinhui Sun, Wenjie Zhang, Xuemin Lin

    Abstract: Machine-generated data is rapidly growing and poses challenges for data-intensive systems, especially as the growth of data outpaces the growth of storage space. To cope with the storage issue, compression plays a critical role in storage engines, particularly for data-intensive applications, where high compression ratios and efficient random access are essential. However, existing compression tec…

    Submitted 23 November, 2023; originally announced November 2023.

  43. arXiv:2311.09204  [pdf, other]

    cs.CL cs.AI

    Fusion-Eval: Integrating Assistant Evaluators with LLMs

    Authors: Lei Shu, Nevan Wichers, Liangchen Luo, Yun Zhu, Yinxiao Liu, Jindong Chen, Lei Meng

    Abstract: Evaluating natural language systems poses significant challenges, particularly in the realms of natural language understanding and high-level reasoning. In this paper, we introduce 'Fusion-Eval', an innovative approach that leverages Large Language Models (LLMs) to integrate insights from various assistant evaluators. The LLM is given the example to evaluate along with scores from the assistant ev…

    Submitted 6 June, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

  44. arXiv:2311.09179  [pdf, other]

    cs.CL

    SiRA: Sparse Mixture of Low Rank Adaptation

    Authors: Yun Zhu, Nevan Wichers, Chu-Cheng Lin, Xinyi Wang, Tianlong Chen, Lei Shu, Han Lu, Canoee Liu, Liangchen Luo, Jindong Chen, Lei Meng

    Abstract: Parameter Efficient Tuning has been a prominent approach to adapt the Large Language Model to downstream tasks. Most previous works consider adding dense trainable parameters, where all parameters are used to adapt to a certain task. We found this less effective empirically using the example of LoRA, where introducing more trainable parameters does not help. Motivated by this we investigate the imp…

    Submitted 15 November, 2023; originally announced November 2023.

  45. arXiv:2311.07574  [pdf, other]

    cs.CV

    To See is to Believe: Prompting GPT-4V for Better Visual Instruction Tuning

    Authors: Junke Wang, Lingchen Meng, Zejia Weng, Bo He, Zuxuan Wu, Yu-Gang Jiang

    Abstract: Existing visual instruction tuning methods typically prompt large language models with textual descriptions to generate instruction-following data. Despite the promising performance achieved, these descriptions are derived from image annotations, which are oftentimes coarse-grained. Furthermore, the instructions might even contradict the visual content without observing the entire visual context.…

    Submitted 29 November, 2023; v1 submitted 13 November, 2023; originally announced November 2023.

    Comments: technical report; work in progress

  46. arXiv:2311.07395  [pdf]

    cs.RO cs.AI

    Predicting Continuous Locomotion Modes via Multidimensional Feature Learning from sEMG

    Authors: Peiwen Fu, Wenjuan Zhong, Yuyang Zhang, Wenxuan Xiong, Yuzhou Lin, Yanlong Tai, Lin Meng, Mingming Zhang

    Abstract: Walking-assistive devices require adaptive control methods to ensure smooth transitions between various modes of locomotion. For this purpose, detecting human locomotion modes (e.g., level walking or stair ascent) in advance is crucial for improving the intelligence and transparency of such robotic systems. This study proposes Deep-STF, a unified end-to-end deep learning model designed for integra…

    Submitted 13 November, 2023; originally announced November 2023.

    Comments: 10 pages, 7 figures

  47. arXiv:2311.04499  [pdf, other]

    cs.DC

    Near-Linear Scaling Data Parallel Training with Overlapping-Aware Gradient Compression

    Authors: Lin Meng, Yuzhong Sun, Weimin Li

    Abstract: Existing Data Parallel (DP) training for deep neural networks (DNNs) often experiences limited scalability in speedup due to substantial communication overheads. While the Overlapping technique can mitigate this problem by parallelizing communication and computation in DP, its effectiveness is constrained by the high communication-to-computation ratios (CCR) of DP training tasks. Gradient compression (G…

    Submitted 8 November, 2023; originally announced November 2023.

    Comments: 10 pages, 11 figures

  48. arXiv:2310.12152  [pdf, other]

    cs.CV

    Learning from Rich Semantics and Coarse Locations for Long-tailed Object Detection

    Authors: Lingchen Meng, Xiyang Dai, Jianwei Yang, Dongdong Chen, Yinpeng Chen, Mengchen Liu, Yi-Ling Chen, Zuxuan Wu, Lu Yuan, Yu-Gang Jiang

    Abstract: Long-tailed object detection (LTOD) aims to handle the extreme data imbalance in real-world datasets, where many tail classes have scarce instances. One popular strategy is to explore extra data with image-level labels, yet it produces limited results due to (1) semantic ambiguity -- an image-level label only captures a salient part of the image, ignoring the remaining rich semantics within the im…

    Submitted 18 October, 2023; originally announced October 2023.

    Comments: Accepted by NeurIPS2023

  49. arXiv:2310.11555  [pdf, other]

    cs.DB cs.AI

    Integrating 3D City Data through Knowledge Graphs

    Authors: Linfang Ding, Guohui Xiao, Albulen Pano, Mattia Fumagalli, Dongsheng Chen, Yu Feng, Diego Calvanese, Hongchao Fan, Liqiu Meng

    Abstract: CityGML is a widely adopted standard by the Open Geospatial Consortium (OGC) for representing and exchanging 3D city models. The representation of semantic and topological properties in CityGML makes it possible to query such 3D city data to perform analysis in various applications, e.g., security management and emergency response, energy consumption and estimation, and occupancy measurement. Howe…

    Submitted 17 October, 2023; originally announced October 2023.

  50. arXiv:2310.10457  [pdf, other]

    cs.IT eess.SP

    Flag Sequence Set Design for Low-Complexity Delay-Doppler Estimation

    Authors: Lingsheng Meng, Yong Liang Guan, Yao Ge, Zilong Liu

    Abstract: This paper studies Flag sequences for low-complexity delay-Doppler estimation by exploiting their distinctive peak-curtain ambiguity functions (AFs). Unlike the existing Flag sequence designs that are limited to prime lengths and periodic auto-AFs, we aim to design Flag sequence sets of arbitrary lengths with low (nontrivial) periodic/aperiodic auto- and cross-AFs. Since every Flag sequence consis…

    Submitted 2 June, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

    Comments: 14 pages, 7 figures, 1 table