Showing 1–50 of 180 results for author: Wang, P

Searching in archive eess.
  1. arXiv:2408.02943  [pdf, other]

    eess.SP

    Recent Advances in Data-driven Intelligent Control for Wireless Communication: A Comprehensive Survey

    Authors: Wei Huo, Huiwen Yang, Nachuan Yang, Zhaohua Yang, Jiuzhou Zhang, Fuhai Nan, Xingzhou Chen, Yifan Mao, Suyang Hu, Pengyu Wang, Xuanyu Zheng, Mingming Zhao, Ling Shi

    Abstract: The advent of next-generation wireless communication systems heralds an era characterized by high data rates, low latency, massive connectivity, and superior energy efficiency. These systems necessitate innovative and adaptive strategies for resource allocation and device behavior control in wireless networks. Traditional optimization-based methods have been found inadequate in meeting the complex…

    Submitted 6 August, 2024; originally announced August 2024.

  2. arXiv:2407.15448  [pdf, other]

    eess.SP cs.IT

    Movable Antenna-Enhanced Wireless Communications: General Architectures and Implementation Methods

    Authors: Boyu Ning, Songjie Yang, Yafei Wu, Peilan Wang, Weidong Mei, Chau Yuen, Emil Björnson

    Abstract: Movable antennas (MAs), traditionally explored in antenna design, have recently garnered significant attention in wireless communications due to their ability to dynamically adjust the antenna positions to changes in the propagation environment. However, previous research has primarily focused on characterizing the performance limits of various MA-assisted wireless communication systems, with less…

    Submitted 8 August, 2024; v1 submitted 22 July, 2024; originally announced July 2024.

  3. arXiv:2407.12937  [pdf, other]

    eess.SP

    Multi-Band Wi-Fi Neural Dynamic Fusion

    Authors: Sorachi Kato, Pu Perry Wang, Toshiaki Koike-Akino, Takuya Fujihashi, Hassan Mansour, Petros Boufounos

    Abstract: Wi-Fi channel measurements across different bands, e.g., sub-7-GHz and 60-GHz bands, are asynchronous due to the uncoordinated nature of distinct standards protocols, e.g., 802.11ac/ax/be and 802.11ad/ay. Multi-band Wi-Fi fusion has been considered before on a frame-to-frame basis for simple classification tasks, which does not require fine-time-scale alignment. In contrast, this paper considers a…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: 13 pages, 13 figures, 4 tables

  4. arXiv:2407.08130  [pdf, other]

    cs.MM cs.CV cs.SD eess.AS

    Spiking Tucker Fusion Transformer for Audio-Visual Zero-Shot Learning

    Authors: Wenrui Li, Penghong Wang, Ruiqin Xiong, Xiaopeng Fan

    Abstract: Spiking neural networks (SNNs), which efficiently encode temporal sequences, have shown great potential in extracting audio-visual joint feature representations. However, coupling SNNs (binary spike sequences) with transformers (floating-point sequences) to jointly explore temporal-semantic information still faces challenges. In this paper, we introduce a novel Spiking Tucker Fusion Transformer…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Accepted by TIP

  5. arXiv:2406.19959  [pdf, other]

    cs.SD eess.AS

    RealMAN: A Real-Recorded and Annotated Microphone Array Dataset for Dynamic Speech Enhancement and Localization

    Authors: Bing Yang, Changsheng Quan, Yabo Wang, Pengyu Wang, Yujie Yang, Ying Fang, Nian Shao, Hui Bu, Xin Xu, Xiaofei Li

    Abstract: The training of deep learning-based multichannel speech enhancement and source localization systems relies heavily on the simulation of room impulse response and multichannel diffuse noise, due to the lack of large-scale real-recorded datasets. However, the acoustic mismatch between simulated and real-world data could degrade the model performance when applied in real-world scenarios. To bridge t…

    Submitted 28 June, 2024; originally announced June 2024.

  6. arXiv:2406.15885  [pdf, other]

    cs.SD cs.AI eess.AS

    The Music Maestro or The Musically Challenged, A Massive Music Evaluation Benchmark for Large Language Models

    Authors: Jiajia Li, Lu Yang, Mingni Tang, Cong Chen, Zuchao Li, Ping Wang, Hai Zhao

    Abstract: Benchmarks play a pivotal role in assessing the advancements of large language models (LLMs). While numerous benchmarks have been proposed to evaluate LLMs' capabilities, there is a notable absence of a dedicated benchmark for assessing their musical abilities. To address this gap, we present ZIQI-Eval, a comprehensive and large-scale music benchmark specifically designed to evaluate the music-rel…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL-Findings 2024

  7. arXiv:2406.10708  [pdf, other]

    cs.CV cs.DB eess.SP

    MMVR: Millimeter-wave Multi-View Radar Dataset and Benchmark for Indoor Perception

    Authors: M. Mahbubur Rahman, Ryoma Yataka, Sorachi Kato, Pu Perry Wang, Peizhao Li, Adriano Cardace, Petros Boufounos

    Abstract: Compared with an extensive list of automotive radar datasets that support autonomous driving, indoor radar datasets are scarce at a smaller scale in the format of low-resolution radar point clouds and usually under an open-space single-room setting. In this paper, we scale up indoor radar data collection using multi-view high-resolution radar heatmap in a multi-day, multi-room, and multi-subject s…

    Submitted 17 July, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

    Comments: 26 pages, 25 figures, 10 tables; See https://doi.org/10.5281/zenodo.12611978 to access the MMVR dataset

  8. arXiv:2406.10276  [pdf, other]

    cs.CL cs.SD eess.AS

    Soft Language Identification for Language-Agnostic Many-to-One End-to-End Speech Translation

    Authors: Peidong Wang, Jian Xue, Jinyu Li, Junkun Chen, Aswin Shanmugam Subramanian

    Abstract: Language-agnostic many-to-one end-to-end speech translation models can convert audio signals from different source languages into text in a target language. These models do not need source language identification, which improves user experience. In some cases, the input language can be given or estimated. Our goal is to use this additional language information while preserving the quality of the o…

    Submitted 11 June, 2024; originally announced June 2024.

  9. arXiv:2406.09161  [pdf, other]

    cs.SD eess.AS

    Complex Image-Generative Diffusion Transformer for Audio Denoising

    Authors: Junhui Li, Pu Wang, Jialu Li, Youshan Zhang

    Abstract: The audio denoising technique has captured widespread attention in the deep neural network field. Recently, the audio denoising problem has been converted into an image generation task, and deep learning-based approaches have been applied to tackle this problem. However, its performance is still limited, leaving room for further improvement. In order to enhance audio denoising performance, this pa…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: INTERSPEECH 2024

  10. arXiv:2406.09154  [pdf, other]

    cs.SD cs.CL eess.AS

    Diffusion Gaussian Mixture Audio Denoise

    Authors: Pu Wang, Junhui Li, Jialu Li, Liangdong Guo, Youshan Zhang

    Abstract: Recent diffusion models have achieved promising performances in audio-denoising tasks. The unique property of the reverse process could recover clean signals. However, the distribution of real-world noises does not comply with a single Gaussian distribution and is even unknown. The sampling of Gaussian noise conditions limits its application scenarios. To overcome these challenges, we propose a Di…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: INTERSPEECH 2024

  11. arXiv:2406.04112  [pdf, other]

    cs.LG cs.AI eess.SP stat.ML

    Compressible Dynamics in Deep Overparameterized Low-Rank Learning & Adaptation

    Authors: Can Yaras, Peng Wang, Laura Balzano, Qing Qu

    Abstract: While overparameterization in machine learning models offers great benefits in terms of optimization and generalization, it also leads to increased computational requirements as model sizes grow. In this work, we show that by leveraging the inherent low-dimensional structures of data and compressible dynamics within the model parameters, we can reap the benefits of overparameterization without the…

    Submitted 9 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

    Comments: Accepted at ICML'24 (Oral)

  12. arXiv:2405.17250  [pdf, ps, other]

    cs.RO eess.SY

    "Pass the butter": A study on desktop-classic multitasking robotic arm based on advanced YOLOv7 and BERT

    Authors: Haohua Que, Wenbin Pan, Jie Xu, Hao Luo, Pei Wang, Li Zhang

    Abstract: In recent years, various intelligent autonomous robots have begun to appear in daily life and production. Desktop-level robots are characterized by their flexible deployment, rapid response, and suitability for light workload environments. In order to meet the current societal demand for service robot technology, this study proposes using a miniaturized desktop-level robot (by ROS) as a carrier, l…

    Submitted 27 May, 2024; originally announced May 2024.

  13. Multi-Objective Optimization-Based Waveform Design for Multi-User and Multi-Target MIMO-ISAC Systems

    Authors: Peng Wang, Dongsheng Han, Yashuai Cao, Wanli Ni, Dusit Niyato

    Abstract: Integrated sensing and communication (ISAC) opens up new service possibilities for sixth-generation (6G) systems, where both communication and sensing (C&S) functionalities co-exist by sharing the same hardware platform and radio resource. In this paper, we investigate the waveform design problem in a downlink multi-user and multi-target ISAC system under different C&S performance preferences. The…

    Submitted 13 July, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

    Comments: This paper has been accepted for publication in IEEE Transactions on Wireless Communications

  14. arXiv:2405.09163  [pdf, other]

    eess.SY

    DVS-RG: Differential Variable Speed Limits Control using Deep Reinforcement Learning with Graph State Representation

    Authors: Jingwen Yang, Ping Wang, Fatemeh Golpayegani, Shen Wang

    Abstract: Variable speed limit (VSL) control is an established yet challenging problem to improve freeway traffic mobility and alleviate bottlenecks by customizing speed limits at proper locations based on traffic conditions. Recent advances in deep reinforcement learning (DRL) have shown promising results in solving VSL control problems by interacting with sophisticated environments. However, the modeling…

    Submitted 15 May, 2024; originally announced May 2024.

  15. arXiv:2405.01644  [pdf]

    eess.IV cs.CV physics.med-ph

    A Classification-Based Adaptive Segmentation Pipeline: Feasibility Study Using Polycystic Liver Disease and Metastases from Colorectal Cancer CT Images

    Authors: Peilong Wang, Timothy L. Kline, Andy D. Missert, Cole J. Cook, Matthew R. Callstrom, Alex Chan, Robert P. Hartman, Zachary S. Kelm, Panagiotis Korfiatis

    Abstract: Automated segmentation tools often encounter accuracy and adaptability issues when applied to images of different pathology. The purpose of this study is to explore the feasibility of building a workflow to efficiently route images to specifically trained segmentation models. By implementing a deep learning classifier to automatically classify the images and route them to appropriate segmentation…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: J Digit Imaging. Inform. med. (2024)

  16. arXiv:2404.07444  [pdf, other]

    cs.NI eess.SP

    Two-Way Aerial Secure Communications via Distributed Collaborative Beamforming under Eavesdropper Collusion

    Authors: Jiahui Li, Geng Sun, Qingqing Wu, Shuang Liang, Pengfei Wang, Dusit Niyato

    Abstract: Unmanned aerial vehicles (UAVs)-enabled aerial communication provides a flexible, reliable, and cost-effective solution for a range of wireless applications. However, due to the high line-of-sight (LoS) probability, aerial communications between UAVs are vulnerable to eavesdropping attacks, particularly when multiple eavesdroppers collude. In this work, we aim to introduce distributed collaborativ…

    Submitted 10 April, 2024; originally announced April 2024.

    Comments: This paper has been accepted by IEEE INFOCOM 2024

  17. arXiv:2404.05600  [pdf, other]

    cs.CL cs.SD eess.AS

    SpeechAlign: Aligning Speech Generation to Human Preferences

    Authors: Dong Zhang, Zhaowei Li, Shimin Li, Xin Zhang, Pengyu Wang, Yaqian Zhou, Xipeng Qiu

    Abstract: Speech language models have significantly advanced in generating realistic speech, with neural codec language models standing out. However, the integration of human feedback to align speech outputs to human preferences is often neglected. This paper addresses this gap by first analyzing the distribution gap in codec language models, highlighting how it leads to discrepancies between the training a…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: Work in progress

  18. arXiv:2403.20018  [pdf, other]

    eess.IV cs.CV

    SCINeRF: Neural Radiance Fields from a Snapshot Compressive Image

    Authors: Yunhao Li, Xiaodong Wang, Ping Wang, Xin Yuan, Peidong Liu

    Abstract: In this paper, we explore the potential of Snapshot Compressive Imaging (SCI) technique for recovering the underlying 3D scene representation from a single temporal compressed image. SCI is a cost-effective method that enables the recording of high-dimensional data, such as hyperspectral or temporal information, into a single image using low-cost 2D imaging sensors. To achieve this, a series of sp…

    Submitted 29 March, 2024; originally announced March 2024.

  19. arXiv:2403.06167  [pdf, other]

    eess.SY

    Direct Shooting Method for Numerical Optimal Control: A Modified Transcription Approach

    Authors: Jiawei Tang, Yuxing Zhong, Pengyu Wang, Xingzhou Chen, Shuang Wu, Ling Shi

    Abstract: Direct shooting is an efficient method to solve numerical optimal control. It utilizes the Runge-Kutta scheme to discretize a continuous-time optimal control problem making the problem solvable by nonlinear programming solvers. However, conventional direct shooting raises a contradictory dynamics issue when using an augmented state to handle high-order systems. This paper fills the research gap…

    Submitted 10 March, 2024; originally announced March 2024.

    Comments: Accepted by ECC24

  20. arXiv:2403.03635  [pdf, other]

    cs.IT eess.SP

    Processing Load Allocation of On-Board Multi-User Detection for Payload-Constrained Satellite Networks

    Authors: Sirui Miao, Neng Ye, Peisen Wang, Qiaolin Ouyang

    Abstract: The rapid advance of mega-constellation facilitates the booming of direct-to-satellite massive access, where multi-user detection is critical to alleviate the induced inter-user interference. While centralized implementation of on-board detection induces unaffordable complexity for a single satellite, this paper proposes to allocate the processing load among cooperative satellites for finest explo…

    Submitted 6 March, 2024; originally announced March 2024.

  21. arXiv:2402.14576  [pdf, other]

    cs.NI cs.LG eess.SY

    Edge Caching Based on Deep Reinforcement Learning and Transfer Learning

    Authors: Farnaz Niknia, Ping Wang, Zixu Wang, Aakash Agarwal, Adib S. Rezaei

    Abstract: This paper addresses the escalating challenge of redundant data transmission in networks. The surge in traffic has strained backhaul links and backbone networks, prompting the exploration of caching solutions at the edge router. Existing work primarily relies on Markov Decision Processes (MDP) for caching issues, assuming fixed-time interval decisions; however, real-world scenarios involve random…

    Submitted 29 February, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

  22. arXiv:2402.09422  [pdf, other]

    eess.SP

    Traffic Flow and Speed Monitoring Based On Optical Fiber Distributed Acoustic Sensor

    Authors: Linlin Wang, Shixin Wang, Peng Wang, Wei Wang, Dezhao Wang, Yongcai Wang, Shanwen Wang

    Abstract: In the realm of intelligent transportation systems, accurate and reliable traffic monitoring is crucial. Traditional devices, such as cameras and lidars, face limitations in adverse weather conditions and complex traffic scenarios, prompting the need for more resilient technologies. This paper presents a traffic flow monitoring method using optical fiber-based Distributed Acoustic Sensors (DAS). An…

    Submitted 20 January, 2024; originally announced February 2024.

    Comments: 10 pages, 23 figures, references added

  23. arXiv:2401.17681  [pdf, ps, other]

    cs.IT eess.SP

    Joint Transceiver Optimization for MmWave/THz MU-MIMO ISAC Systems

    Authors: Peilan Wang, Jun Fang, Xianlong Zeng, Zhi Chen, Hongbin Li

    Abstract: In this paper, we consider the problem of joint transceiver design for millimeter wave (mmWave)/Terahertz (THz) multi-user MIMO integrated sensing and communication (ISAC) systems. Such a problem is formulated into a nonconvex optimization problem, with the objective of maximizing a weighted sum of communication users' rates and the passive radar's signal-to-clutter-and-noise-ratio (SCNR). By expl…

    Submitted 31 January, 2024; originally announced January 2024.

  24. arXiv:2401.07246  [pdf, ps, other]

    eess.SY math.OC

    Delayed finite-dimensional observer-based control of 2D linear parabolic PDEs

    Authors: Pengfei Wang, Emilia Fridman

    Abstract: Recently, a constructive method was suggested for finite-dimensional observer-based control of 1D linear heat equation, which is robust to input/output delays. In this paper, we aim to extend this method to the 2D case with general time-varying input/output delays (known output delay and unknown input delay) or sawtooth delays (that correspond to network-based control). We use the modal decomposit…

    Submitted 14 January, 2024; originally announced January 2024.

  25. arXiv:2401.05709  [pdf, other]

    cs.NI eess.SP

    Probability-based Distance Estimation Model for 3D DV-Hop Localization in WSNs

    Authors: Penghong Wang, Hao Wang, Wenrui Li, Xiaopeng Fan, Debin Zhao

    Abstract: Localization is one of the pivotal issues in wireless sensor network applications. In 3D localization studies, most algorithms focus on enhancing the location prediction process but lack a theoretical derivation of the detection distance of an anchor node at varying hops, which engenders a localization performance bottleneck. To address this issue, we propose a probability-based average distance estim…

    Submitted 11 January, 2024; originally announced January 2024.

  26. arXiv:2401.02046  [pdf, other]

    eess.AS cs.SD

    CTC Blank Triggered Dynamic Layer-Skipping for Efficient CTC-based Speech Recognition

    Authors: Junfeng Hou, Peiyao Wang, Jincheng Zhang, Meng Yang, Minwei Feng, Jingcheng Yin

    Abstract: Deploying end-to-end speech recognition models with limited computing resources remains challenging, despite their impressive performance. Given the gradual increase in model size and the wide range of model applications, selectively executing model components for different inputs to improve the inference efficiency is of great interest. In this paper, we propose a dynamic layer-skipping method th…

    Submitted 3 January, 2024; originally announced January 2024.

    Comments: accepted by ASRU 2023

  27. arXiv:2312.13752  [pdf]

    eess.IV cs.AI cs.CV

    Hunting imaging biomarkers in pulmonary fibrosis: Benchmarks of the AIIB23 challenge

    Authors: Yang Nan, Xiaodan Xing, Shiyi Wang, Zeyu Tang, Federico N Felder, Sheng Zhang, Roberta Eufrasia Ledda, Xiaoliu Ding, Ruiqi Yu, Weiping Liu, Feng Shi, Tianyang Sun, Zehong Cao, Minghui Zhang, Yun Gu, Hanxiao Zhang, Jian Gao, Pingyu Wang, Wen Tang, Pengxin Yu, Han Kang, Junqiang Chen, Xing Lu, Boyu Zhang, Michail Mamalakis, et al. (16 additional authors not shown)

    Abstract: Airway-related quantitative imaging biomarkers are crucial for examination, diagnosis, and prognosis in pulmonary diseases. However, the manual delineation of airway trees remains prohibitively time-consuming. While significant efforts have been made towards enhancing airway modelling, current publicly available datasets concentrate on lung diseases with moderate morphological variations. The intric…

    Submitted 16 April, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

    Comments: 19 pages

  28. arXiv:2312.08931  [pdf, other]

    cs.SD cs.AI cs.MM eess.AS

    N-Gram Unsupervised Compoundation and Feature Injection for Better Symbolic Music Understanding

    Authors: Jinhao Tian, Zuchao Li, Jiajia Li, Ping Wang

    Abstract: The first step to apply deep learning techniques for symbolic music understanding is to transform musical pieces (mainly in MIDI format) into sequences of predefined tokens like note pitch, note velocity, and chords. Subsequently, the sequences are fed into a neural sequence model to accomplish specific tasks. Music sequences exhibit strong correlations between adjacent elements, making them prime…

    Submitted 14 December, 2023; v1 submitted 13 December, 2023; originally announced December 2023.

    Comments: 8 pages, 2 figures, aaai2024

    MSC Class: 68T07 ACM Class: I.2.7

  29. arXiv:2312.05256  [pdf, other]

    eess.IV cs.AI

    Holistic Evaluation of GPT-4V for Biomedical Imaging

    Authors: Zhengliang Liu, Hanqi Jiang, Tianyang Zhong, Zihao Wu, Chong Ma, Yiwei Li, Xiaowei Yu, Yutong Zhang, Yi Pan, Peng Shu, Yanjun Lyu, Lu Zhang, Junjie Yao, Peixin Dong, Chao Cao, Zhenxiang Xiao, Jiaqi Wang, Huan Zhao, Shaochen Xu, Yaonai Wei, Jingyuan Chen, Haixing Dai, Peilong Wang, Hao He, Zewei Wang, et al. (25 additional authors not shown)

    Abstract: In this paper, we present a large-scale evaluation probing GPT-4V's capabilities and limitations for biomedical image analysis. GPT-4V represents a breakthrough in artificial general intelligence (AGI) for computer vision, with applications in the biomedical domain. We assess GPT-4V's performance across 16 medical imaging categories, including radiology, oncology, ophthalmology, pathology, and mor…

    Submitted 10 November, 2023; originally announced December 2023.

  30. arXiv:2312.01292  [pdf, ps, other]

    cs.NI eess.SP

    Joint Beam Scheduling and Power Optimization for Beam Hopping LEO Satellite Systems

    Authors: Shuang Zheng, Xing Zhang, Peng Wang, Wenbo Wang

    Abstract: Low earth orbit (LEO) satellite communications can provide ubiquitous and reliable services, making it an essential part of the Internet of Everything network. Beam hopping (BH) is an emerging technology for effectively addressing the issue of low resource utilization caused by the non-uniform spatio-temporal distribution of traffic demands. However, how to allocate multi-dimensional resources in…

    Submitted 3 December, 2023; originally announced December 2023.

  31. arXiv:2312.00951  [pdf, other]

    cs.RO eess.SY

    AV4EV: Open-Source Modular Autonomous Electric Vehicle Platform for Making Mobility Research Accessible

    Authors: Zhijie Qiao, Mingyan Zhou, Zhijun Zhuang, Tejas Agarwal, Felix Jahncke, Po-Jen Wang, Jason Friedman, Hongyi Lai, Divyanshu Sahu, Tomáš Nagy, Martin Endler, Jason Schlessman, Rahul Mangharam

    Abstract: When academic researchers develop and validate autonomous driving algorithms, there is a challenge in balancing high-performance capabilities with the cost and complexity of the vehicle platform. Much of today's research on autonomous vehicles (AV) is limited to experimentation on expensive commercial vehicles that require large skilled teams to retrofit the vehicles and test them in dedicated fac…

    Submitted 12 April, 2024; v1 submitted 1 December, 2023; originally announced December 2023.

    Comments: 6 pages, 5 figures

  32. arXiv:2311.16116  [pdf, ps, other]

    cs.NI eess.SY

    Resource Scheduling for UAVs-aided D2D Networks: A Multi-objective Optimization Approach

    Authors: Hongyang Pan, Yanheng Liu, Geng Sun, Pengfei Wang, Chau Yuen

    Abstract: Unmanned aerial vehicles (UAVs)-aided device-to-device (D2D) networks have attracted great interest with the development of 5G/6G communications, while there are several challenges about resource scheduling in UAVs-aided D2D networks. In this work, we formulate a UAVs-aided D2D network resource scheduling optimization problem (NetResSOP) to comprehensively consider the number of deployed UAVs, UAV…

    Submitted 30 September, 2023; originally announced November 2023.

  33. arXiv:2311.14812  [pdf, other]

    astro-ph.IM astro-ph.CO eess.SP

    Robust Joint Estimation of Galaxy Redshift and Spectral Templates using Online Dictionary Learning

    Authors: Sean Bryan, Ayan Barekzai, Delondrae Carter, Philip Mauskopf, Julian Mena, Danielle Rivera, Abel S. Uriarte, Pao-Yu Wang

    Abstract: We present a novel approach to analyzing astronomical spectral survey data using our non-linear extension of an online dictionary learning algorithm. Current and upcoming surveys such as SPHEREx will use spectral data to build a 3D map of the universe by estimating the redshifts of millions of galaxies. Existing algorithms rely on hand-curated external templates and have limited performance due to…

    Submitted 24 November, 2023; originally announced November 2023.

    Comments: 9 pages, 5 figures, Submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence

  34. arXiv:2311.12820  [pdf, other]

    cs.CV cs.AI cs.CL eess.IV

    MSG-BART: Multi-granularity Scene Graph-Enhanced Encoder-Decoder Language Model for Video-grounded Dialogue Generation

    Authors: Hongcheng Liu, Zhe Chen, Hui Li, Pingjie Wang, Yanfeng Wang, Yu Wang

    Abstract: Generating dialogue grounded in videos requires a high level of understanding and reasoning about the visual scenes in the videos. However, existing large visual-language models are not effective due to their latent features and decoder-only structure, especially with respect to spatio-temporal relationship reasoning. In this paper, we propose a novel approach named MSG-BART, which enhances the in…

    Submitted 26 September, 2023; originally announced November 2023.

    Comments: 5 pages, 3 figures

  35. CrackCLF: Automatic Pavement Crack Detection based on Closed-Loop Feedback

    Authors: Chong Li, Zhun Fan, Ying Chen, Huibiao Lin, Laura Moretti, Giuseppe Loprencipe, Weihua Sheng, Kelvin C. P. Wang

    Abstract: Automatic pavement crack detection is an important task to ensure the functional performances of pavements during their service life. Inspired by deep learning (DL), the encoder-decoder framework is a powerful tool for crack detection. However, these models are usually open-loop (OL) systems that tend to treat thin cracks as the background. Meanwhile, these models cannot automatically correct err…

    Submitted 20 November, 2023; originally announced November 2023.

    Journal ref: IEEE Transactions on Intelligent Transportation Systems, 2023

  36. arXiv:2310.19602  [pdf, other]

    cs.SD eess.AS

    DCHT: Deep Complex Hybrid Transformer for Speech Enhancement

    Authors: Jialu Li, Junhui Li, Pu Wang, Youshan Zhang

    Abstract: Most of the current deep learning-based approaches for speech enhancement only operate in the spectrogram or waveform domain. Although a cross-domain transformer combining waveform- and spectrogram-domain inputs has been proposed, its performance can be further improved. In this paper, we present a novel deep complex hybrid transformer that integrates both spectrogram and waveform domains approach…

    Submitted 30 October, 2023; originally announced October 2023.

    Comments: IEEE DDP conference

  37. arXiv:2310.19588  [pdf, other]

    cs.SD cs.CL eess.AS

    DPATD: Dual-Phase Audio Transformer for Denoising

    Authors: Junhui Li, Pu Wang, Jialu Li, Xinzhe Wang, Youshan Zhang

    Abstract: Recent high-performance transformer-based speech enhancement models demonstrate that time domain methods could achieve similar performance as time-frequency domain methods. However, time-domain speech enhancement systems typically receive input audio sequences consisting of a large number of time steps, making it challenging to model extremely long sequences and train models to perform adequately.…

    Submitted 30 October, 2023; originally announced October 2023.

    Comments: IEEE DDP

  38. arXiv:2310.15883  [pdf, other]

    eess.SY

    Attitude Takeover Control for Noncooperative Space Targets Based on Gaussian Processes with Online Model Learning

    Authors: Yuhan Liu, Pengyu Wang, Chang-Hun Lee, Roland Tóth

    Abstract: One major challenge for autonomous attitude takeover control for on-orbit servicing of spacecraft is that an accurate dynamic motion model of the combined vehicles is highly nonlinear, complex and often costly to identify online, which makes traditional model-based control impractical for this task. To address this issue, a recursive online sparse Gaussian Process (GP)-based learning strategy for…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: 17 pages, 14 figures. Submitted to IEEE Transactions on Aerospace and Electronic Systems

  39. arXiv:2310.14806  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Leveraging Timestamp Information for Serialized Joint Streaming Recognition and Translation

    Authors: Sara Papi, Peidong Wang, Junkun Chen, Jian Xue, Naoyuki Kanda, Jinyu Li, Yashesh Gaur

    Abstract: The growing need for instant spoken language transcription and translation is driven by increased global communication and cross-lingual interactions. This has made offering translations in multiple languages essential for user applications. Traditional approaches to automatic speech recognition (ASR) and speech translation (ST) have often relied on separate systems, leading to inefficiencies in c…

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: © 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

  40. arXiv:2310.09840  [pdf, ps, other]

    eess.SP

    Towards Structural Sparse Precoding: Dynamic Time, Frequency, Space, and Power Multistage Resource Programming

    Authors: Zhongxiang Wei, Ping Wang, Qingjiang Shi, Xu Zhu, Christos Masouros

    Abstract: In the last decades, dynamic resource programming in partial resource domains has been extensively investigated for single time slot optimizations. However, with the emerging real-time media applications in fifth-generation communications, their new quality of service requirements are often measured in temporal dimension. This requires multistage optimization for full resource domain dynamic programmi…

    Submitted 15 October, 2023; originally announced October 2023.

  41. arXiv:2310.02399  [pdf, other]

    cs.NI eess.SP

    Can 5G NR Sidelink communications support wireless augmented reality?

    Authors: Ashutosh Srivastava, Qing Zhao, Yi Lu, Ping Wang, Qi Qu, Zhu Ji, Yee Sin Chan, Shivendra S. Panwar

    Abstract: Smart glasses that support augmented reality (AR) have the potential to become the consumer's primary medium of connecting to the future internet. For the best quality of user experience, AR glasses must have a small form factor and long battery life, while satisfying the data rate and latency requirements of AR applications. To extend the AR glasses' battery life, the computation and processing i…

    Submitted 3 October, 2023; originally announced October 2023.

    Comments: 7 pages, 7 figures, accepted for publication in 2023 IEEE Global Communications Conference: Mobile and Wireless Networks (Globecom 2023 MWN), Kuala Lumpur, Malaysia, Dec. 2023

  42. arXiv:2309.11139  [pdf, other]

    eess.IV cs.CV

    More complex encoder is not all you need

    Authors: Weibin Yang, Longwei Xu, Pengwei Wang, Dehua Geng, Yusong Li, Mingyuan Xu, Zhiqi Dong

    Abstract: U-Net and its variants have been widely used in medical image segmentation. However, most current U-Net variants confine their improvement strategies to building more complex encoder, while leaving the decoder unchanged or adopting a simple symmetric structure. These approaches overlook the true functionality of the decoder: receiving low-resolution feature maps from the encoder and restoring feat…

    Submitted 27 October, 2023; v1 submitted 20 September, 2023; originally announced September 2023.

  43. arXiv:2309.08157  [pdf, other]

    eess.AS cs.SD

    RVAE-EM: Generative speech dereverberation based on recurrent variational auto-encoder and convolutive transfer function

    Authors: Pengyu Wang, Xiaofei Li

    Abstract: In indoor scenes, reverberation is a crucial factor in degrading the perceived quality and intelligibility of speech. In this work, we propose a generative dereverberation method. Our approach is based on a probabilistic model utilizing a recurrent variational auto-encoder (RVAE) network and the convolutive transfer function (CTF) approximation. Different from most previous approaches, the output…

    Submitted 17 October, 2023; v1 submitted 15 September, 2023; originally announced September 2023.

    Comments: Submitted to ICASSP2024

  44. arXiv:2309.08007  [pdf, ps, other]

    eess.AS cs.CL cs.SD

    DiariST: Streaming Speech Translation with Speaker Diarization

    Authors: Mu Yang, Naoyuki Kanda, Xiaofei Wang, Junkun Chen, Peidong Wang, Jian Xue, Jinyu Li, Takuya Yoshioka

    Abstract: End-to-end speech translation (ST) for conversation recordings involves several under-explored challenges such as speaker diarization (SD) without accurate word time stamps and handling of overlapping speech in a streaming fashion. In this work, we propose DiariST, the first streaming ST and SD solution. It is built upon a neural transducer-based streaming ST system and integrates token-level seri…

    Submitted 22 January, 2024; v1 submitted 14 September, 2023; originally announced September 2023.

    Comments: Accepted to ICASSP 2024

  45. arXiv:2309.07648  [pdf, other]

    eess.AS cs.CL cs.SD

    Incorporating Class-based Language Model for Named Entity Recognition in Factorized Neural Transducer

    Authors: Peng Wang, Yifan Yang, Zheng Liang, Tian Tan, Shiliang Zhang, Xie Chen

    Abstract: Despite advancements of end-to-end (E2E) models in speech recognition, named entity recognition (NER) is still challenging but critical for semantic understanding. Previous studies mainly focus on various rule-based or attention-based contextual biasing algorithms. However, their performance might be sensitive to the biasing weight or degraded by excessive attention to the named entity list, along…

    Submitted 8 June, 2024; v1 submitted 14 September, 2023; originally announced September 2023.

    Comments: Accepted in INTERSPEECH 2024

  46. arXiv:2309.02106  [pdf, other]

    cs.CL cs.AI cs.LG eess.AS

    Leveraging Label Information for Multimodal Emotion Recognition

    Authors: Peiying Wang, Sunlu Zeng, Junqing Chen, Lu Fan, Meng Chen, Youzheng Wu, Xiaodong He

    Abstract: Multimodal emotion recognition (MER) aims to detect the emotional status of a given expression by combining the speech and text information. Intuitively, label information should be capable of helping the model locate the salient tokens/frames relevant to the specific emotion, which finally facilitates the MER task. Inspired by this, we propose a novel approach for MER by leveraging label informat…

    Submitted 5 September, 2023; originally announced September 2023.

    Comments: Accepted by Interspeech 2023

  47. arXiv:2308.10157  [pdf, ps, other]

    eess.IV cs.CV

    Contrastive Diffusion Model with Auxiliary Guidance for Coarse-to-Fine PET Reconstruction

    Authors: Zeyu Han, Yuhan Wang, Luping Zhou, Peng Wang, Binyu Yan, Jiliu Zhou, Yan Wang, Dinggang Shen

    Abstract: To obtain high-quality positron emission tomography (PET) scans while reducing radiation exposure to the human body, various approaches have been proposed to reconstruct standard-dose PET (SPET) images from low-dose PET (LPET) images. One widely adopted technique is the generative adversarial networks (GANs), yet recently, diffusion probabilistic models (DPMs) have emerged as a compelling alternat…

    Submitted 20 August, 2023; originally announced August 2023.

    Comments: Accepted and presented in MICCAI 2023. To be published in Proceedings

  48. arXiv:2308.10142  [pdf, ps, other]

    eess.IV cs.CV

    Polymerized Feature-based Domain Adaptation for Cervical Cancer Dose Map Prediction

    Authors: Jie Zeng, Zeyu Han, Xingchen Peng, Jianghong Xiao, Peng Wang, Yan Wang

    Abstract: Recently, deep learning (DL) has automated and accelerated the clinical radiation therapy (RT) planning significantly by predicting accurate dose maps. However, most DL-based dose map prediction methods are data-driven and not applicable for cervical cancer where only a small amount of data is available. To address this problem, this paper proposes to transfer the rich knowledge learned from anoth…

    Submitted 19 August, 2023; originally announced August 2023.

    Comments: Accepted and presented in ISBI 2023. To be published in Proceedings

  49. arXiv:2308.05365  [pdf]

    eess.IV cs.CV

    TriDo-Former: A Triple-Domain Transformer for Direct PET Reconstruction from Low-Dose Sinograms

    Authors: Jiaqi Cui, Pinxian Zeng, Xinyi Zeng, Peng Wang, Xi Wu, Jiliu Zhou, Yan Wang, Dinggang Shen

    Abstract: To obtain high-quality positron emission tomography (PET) images while minimizing radiation exposure, various methods have been proposed for reconstructing standard-dose PET (SPET) images from low-dose PET (LPET) sinograms directly. However, current methods often neglect boundaries during sinogram-to-image reconstruction, resulting in high-frequency distortion in the frequency domain and diminishe…

    Submitted 10 August, 2023; originally announced August 2023.

  50. arXiv:2308.03806  [pdf, other]

    cs.CR cs.SD eess.AS

    SoK: Acoustic Side Channels

    Authors: Ping Wang, Shishir Nagaraja, Aurélien Bourquard, Haichang Gao, Jeff Yan

    Abstract: We provide a state-of-the-art analysis of acoustic side channels, cover all the significant academic research in the area, discuss their security implications and countermeasures, and identify areas for future research. We also make an attempt to bridge side channels and inverse problems, two fields that appear to be completely isolated from each other but have deep connections.

    Submitted 6 August, 2023; originally announced August 2023.

    Comments: 16 pages