Showing 1–50 of 61 results for author: Bai, J

Searching in archive eess.
  1. arXiv:2409.16637  [pdf, ps, other]

    eess.IV cs.CV

    Deep-Learning Recognition of Scanning Transmission Electron Microscopy: Quantifying and Mitigating the Influence of Gaussian Noises

    Authors: Hanlei Zhang, Jincheng Bai, Xiabo Chen, Can Li, Chuanjian Zhong, Jiye Fang, Guangwen Zhou

    Abstract: Scanning transmission electron microscopy (STEM) is a powerful tool to reveal the morphologies and structures of materials, thereby attracting intensive interest from the scientific and industrial communities. The outstanding spatial (atomic level) and temporal (ms level) resolutions of the STEM techniques generate fruitful amounts of high-definition data, thereby enabling the high-volume and hig…

    Submitted 25 September, 2024; originally announced September 2024.

  2. arXiv:2409.13292  [pdf, other]

    eess.AS cs.SD

    Exploring Text-Queried Sound Event Detection with Audio Source Separation

    Authors: Han Yin, Jisheng Bai, Yang Xiao, Hui Wang, Siqi Zheng, Yafeng Chen, Rohan Kumar Das, Chong Deng, Jianfeng Chen

    Abstract: In sound event detection (SED), overlapping sound events pose a significant challenge, as certain events can be easily masked by background noise or other events, resulting in poor detection performance. To address this issue, we propose the text-queried SED (TQ-SED) framework. Specifically, we first pre-train a language-queried audio source separation (LASS) model to separate the audio tracks cor…

    Submitted 20 September, 2024; originally announced September 2024.

    Comments: Submitted to ICASSP2025

  3. arXiv:2409.11964  [pdf, other]

    cs.SD cs.LG eess.AS

    Data Efficient Acoustic Scene Classification using Teacher-Informed Confusing Class Instruction

    Authors: Jin Jie Sean Yeo, Ee-Leng Tan, Jisheng Bai, Santi Peksi, Woon-Seng Gan

    Abstract: In this technical report, we describe the SNTL-NTU team's submission for Task 1 Data-Efficient Low-Complexity Acoustic Scene Classification of the detection and classification of acoustic scenes and events (DCASE) 2024 challenge. Three systems are introduced to tackle training splits of different sizes. For small training splits, we explored reducing the complexity of the provided baseline model b…

    Submitted 18 September, 2024; originally announced September 2024.

    Comments: 5 pages, 3 figures

  4. arXiv:2409.11700  [pdf, other]

    eess.SP

    Real-Time Sound Event Localization and Detection: Deployment Challenges on Edge Devices

    Authors: Jun Wei Yeow, Ee-Leng Tan, Jisheng Bai, Santi Peksi, Woon-Seng Gan

    Abstract: Sound event localization and detection (SELD) is critical for various real-world applications, including smart monitoring and Internet of Things (IoT) systems. Although deep neural networks (DNNs) represent the state-of-the-art approach for SELD, their significant computational complexity and model sizes present challenges for deployment on resource-constrained edge devices, especially under real-…

    Submitted 18 September, 2024; originally announced September 2024.

    Comments: Submitted to ICASSP'25. Code is available at https://github.com/itsjunwei/Realtime-SELD-Edge

  5. arXiv:2409.10980  [pdf]

    eess.IV cs.CV

    PSFHS Challenge Report: Pubic Symphysis and Fetal Head Segmentation from Intrapartum Ultrasound Images

    Authors: Jieyun Bai, Zihao Zhou, Zhanhong Ou, Gregor Koehler, Raphael Stock, Klaus Maier-Hein, Marawan Elbatel, Robert Martí, Xiaomeng Li, Yaoyang Qiu, Panjie Gou, Gongping Chen, Lei Zhao, Jianxun Zhang, Yu Dai, Fangyijie Wang, Guénolé Silvestre, Kathleen Curran, Hongkun Sun, Jing Xu, Pengzhou Cai, Lu Jiang, Libin Lan, Dong Ni, Mei Zhong, et al. (4 additional authors not shown)

    Abstract: Segmentation of the fetal and maternal structures, particularly intrapartum ultrasound imaging as advocated by the International Society of Ultrasound in Obstetrics and Gynecology (ISUOG) for monitoring labor progression, is a crucial first step for quantitative diagnosis and clinical decision-making. This requires specialized analysis by obstetrics professionals, in a task that i) is highly time-…

    Submitted 17 September, 2024; originally announced September 2024.

  6. arXiv:2409.09754  [pdf, other]

    cs.CV cs.RO eess.IV physics.optics

    Towards Single-Lens Controllable Depth-of-Field Imaging via All-in-Focus Aberration Correction and Monocular Depth Estimation

    Authors: Xiaolong Qian, Qi Jiang, Yao Gao, Shaohua Gao, Zhonghua Yi, Lei Sun, Kai Wei, Haifeng Li, Kailun Yang, Kaiwei Wang, Jian Bai

    Abstract: Controllable Depth-of-Field (DoF) imaging commonly produces amazing visual effects based on heavy and expensive high-end lenses. However, confronted with the increasing demand for mobile scenarios, it is desirable to achieve a lightweight solution with Minimalist Optical Systems (MOS). This work centers around two major limitations of MOS, i.e., the severe optical aberrations and uncontrollable Do…

    Submitted 15 September, 2024; originally announced September 2024.

    Comments: The source code and the established dataset will be publicly available at https://github.com/XiaolongQian/DCDI

  7. arXiv:2409.06456  [pdf, other]

    cs.SD eess.AS

    Attention-Based Beamformer For Multi-Channel Speech Enhancement

    Authors: Jinglin Bai, Hao Li, Xueliang Zhang, Fei Chen

    Abstract: Minimum Variance Distortionless Response (MVDR) is a classical adaptive beamformer that theoretically ensures the distortionless transmission of signals in the target direction, which makes it popular in real applications. Its noise reduction performance actually depends on the accuracy of the noise and speech spatial covariance matrices (SCMs) estimation. Time-frequency masks are often used to co…

    Submitted 13 September, 2024; v1 submitted 10 September, 2024; originally announced September 2024.
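
    For orientation, one widely used mask-based way of building an MVDR beamformer from speech and noise SCMs is written out below. It is the generic reference-channel formulation, not the attention-based method of this paper; the notation ($\mathbf{y}_{t,f}$ for the multichannel STFT observation at frame $t$ and bin $f$, $m^{\nu}_{t,f}$ for a speech or noise time-frequency mask, $\mathbf{u}$ for a one-hot reference-microphone vector) is assumed here:

```latex
\begin{aligned}
\boldsymbol{\Phi}^{\nu}_{f} &= \frac{\sum_{t} m^{\nu}_{t,f}\,\mathbf{y}_{t,f}\,\mathbf{y}_{t,f}^{\mathsf{H}}}{\sum_{t} m^{\nu}_{t,f}}, \qquad \nu \in \{\mathrm{s}, \mathrm{n}\},\\
\mathbf{w}_{f} &= \frac{\big(\boldsymbol{\Phi}^{\mathrm{n}}_{f}\big)^{-1}\boldsymbol{\Phi}^{\mathrm{s}}_{f}}{\operatorname{tr}\big[\big(\boldsymbol{\Phi}^{\mathrm{n}}_{f}\big)^{-1}\boldsymbol{\Phi}^{\mathrm{s}}_{f}\big]}\,\mathbf{u},\\
\hat{s}_{t,f} &= \mathbf{w}_{f}^{\mathsf{H}}\,\mathbf{y}_{t,f}.
\end{aligned}
```

    Because both SCMs are mask-weighted averages, mask errors propagate directly into $\mathbf{w}_{f}$, which is why the accuracy of SCM estimation governs the noise reduction this beamformer can achieve.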

  8. arXiv:2409.06245  [pdf, other]

    cs.SD eess.AS

    A Two-Stage Band-Split Mamba-2 Network For Music Separation

    Authors: Jinglin Bai, Yuan Fang, Jiajie Wang, Xueliang Zhang

    Abstract: Music source separation (MSS) aims to separate mixed music into its distinct tracks, such as vocals, bass, drums, and more. MSS is considered to be a challenging audio separation task due to the complexity of music signals. Although the RNN and Transformer architectures are not perfect, they are commonly used to model the music sequence for MSS. Recently, Mamba-2 has already demonstrated high effic…

    Submitted 13 September, 2024; v1 submitted 10 September, 2024; originally announced September 2024.

  9. arXiv:2409.05809  [pdf, other]

    physics.optics cs.CV eess.IV

    A Flexible Framework for Universal Computational Aberration Correction via Automatic Lens Library Generation and Domain Adaptation

    Authors: Qi Jiang, Yao Gao, Shaohua Gao, Zhonghua Yi, Lei Sun, Hao Shi, Kailun Yang, Kaiwei Wang, Jian Bai

    Abstract: Emerging universal Computational Aberration Correction (CAC) paradigms provide an inspiring solution to light-weight and high-quality imaging without repeated data preparation and model training to accommodate new lens designs. However, the training databases in these approaches, i.e., the lens libraries (LensLibs), suffer from their limited coverage of real-world aberration behaviors. In this wor…

    Submitted 9 September, 2024; originally announced September 2024.

  10. arXiv:2409.05784  [pdf, other]

    cs.SD eess.AS

    Vector Quantized Diffusion Model Based Speech Bandwidth Extension

    Authors: Yuan Fang, Jinglin Bai, Jiajie Wang, Xueliang Zhang

    Abstract: Recent advancements in neural audio codec (NAC) unlock new potential in audio signal processing. Studies have increasingly explored leveraging the latent features of NAC for various speech signal processing tasks. This paper introduces the first approach to speech bandwidth extension (BWE) that utilizes the discrete features obtained from NAC. By restoring high-frequency details within highly comp…

    Submitted 14 September, 2024; v1 submitted 9 September, 2024; originally announced September 2024.

    Comments: 4 pages

  11. arXiv:2407.09021  [pdf, other]

    eess.AS

    Squeeze-and-Excite ResNet-Conformers for Sound Event Localization, Detection, and Distance Estimation for DCASE 2024 Challenge

    Authors: Jun Wei Yeow, Ee-Leng Tan, Jisheng Bai, Santi Peksi, Woon-Seng Gan

    Abstract: This technical report details our systems submitted for Task 3 of the DCASE 2024 Challenge: Audio and Audiovisual Sound Event Localization and Detection (SELD) with Source Distance Estimation (SDE). We address only the audio-only SELD with SDE (SELDDE) task in this report. We propose to improve the existing ResNet-Conformer architectures with Squeeze-and-Excitation blocks in order to introduce add…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: Technical report for DCASE 2024 Challenge Task 3

  12. arXiv:2407.03654  [pdf, other]

    eess.AS

    Mixstyle based Domain Generalization for Sound Event Detection with Heterogeneous Training Data

    Authors: Yang Xiao, Han Yin, Jisheng Bai, Rohan Kumar Das

    Abstract: This work explores domain generalization (DG) for sound event detection (SED), advancing adaptability towards real-world scenarios. Our approach employs a mean-teacher framework with domain generalization to integrate heterogeneous training data, while preserving the SED model performance across the datasets. Specifically, we first apply mixstyle to the frequency dimension to adapt the mel-spectro…

    Submitted 29 August, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

    Comments: Submitted to ICASSP 2025
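
    A minimal sketch of frequency-wise MixStyle, the augmentation this abstract names, is given below. It follows the generic MixStyle recipe (normalize each example with its own statistics, then re-style it with statistics mixed from a randomly paired example using a Beta-distributed weight). The choice to compute mean and standard deviation per frequency bin over time frames, the function name `freq_mixstyle`, and the defaults `alpha=0.3` and `eps=1e-6` are assumptions for illustration, not details taken from the paper:

```python
import random
import statistics


def freq_mixstyle(batch, alpha=0.3, eps=1e-6, rng=random):
    """Hypothetical frequency-wise MixStyle on plain nested lists.

    batch: list of spectrograms, each a list of F frequency rows of T frames.
    Assumption: statistics are computed per frequency bin over time frames.
    """
    perm = list(range(len(batch)))
    rng.shuffle(perm)  # pair every example with a random partner in the batch
    out = []
    for i, spec in enumerate(batch):
        partner = batch[perm[i]]
        lam = rng.betavariate(alpha, alpha)  # mixing weight drawn from Beta(alpha, alpha)
        mixed = []
        for row, prow in zip(spec, partner):
            # Per-bin statistics of this example and of its partner
            mu, sig = statistics.fmean(row), statistics.pstdev(row) + eps
            pmu, psig = statistics.fmean(prow), statistics.pstdev(prow) + eps
            mix_mu = lam * mu + (1 - lam) * pmu
            mix_sig = lam * sig + (1 - lam) * psig
            # Normalize the bin with its own statistics, then apply the mixed ones
            mixed.append([mix_sig * (v - mu) / sig + mix_mu for v in row])
        out.append(mixed)
    return out
```

    With a batch of one, the partner is the example itself, so the output reproduces the input up to floating-point error. In practice such augmentation is usually applied only during training, often with some probability per batch, and on tensors rather than nested lists.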

  13. arXiv:2407.00291  [pdf, other]

    eess.AS cs.SD

    FMSG-JLESS Submission for DCASE 2024 Task4 on Sound Event Detection with Heterogeneous Training Dataset and Potentially Missing Labels

    Authors: Yang Xiao, Han Yin, Jisheng Bai, Rohan Kumar Das

    Abstract: This report presents the systems developed and submitted by Fortemedia Singapore (FMSG) and Joint Laboratory of Environmental Sound Sensing (JLESS) for DCASE 2024 Task 4. The task focuses on recognizing event classes and their time boundaries, given that multiple events can be present and may overlap in an audio recording. The novelty this year is a dataset with two sources, making it challenging…

    Submitted 28 June, 2024; originally announced July 2024.

    Comments: Technical report for DCASE 2024 Challenge Task 4

  14. arXiv:2406.09695  [pdf, other]

    eess.SP

    Machine Learning-based Near-field Emitter Location Sensing via Grouped Hybrid Analog and Digital XL-MIMO Receive Array

    Authors: Yifan Li, Feng Shu, Kang Wei, Jiatong Bai, Cunhua Pan, Yongpeng Wu, Yaoliang Song, Jiangzhou Wang

    Abstract: As a green MIMO structure, the partially-connected hybrid analog and digital (PC-HAD) structure has been widely used in the far-field (FF) scenario because it can significantly reduce the hardware cost and complexity of large-scale or extremely large-scale MIMO (XL-MIMO) array. Recently, near-field (NF) emitter localization including direction-of-arrival (DOA) and range estimations has drawn a lot of…

    Submitted 3 October, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

  15. arXiv:2406.07880  [pdf, other]

    cs.CV eess.IV

    A Comprehensive Survey on Machine Learning Driven Material Defect Detection: Challenges, Solutions, and Future Prospects

    Authors: Jun Bai, Di Wu, Tristan Shelley, Peter Schubel, David Twine, John Russell, Xuesen Zeng, Ji Zhang

    Abstract: Material defects (MD) represent a primary challenge affecting product performance and giving rise to safety issues in related products. The rapid and accurate identification and localization of MD constitute crucial research endeavours in addressing contemporary challenges associated with MD. Although conventional non-destructive testing methods such as ultrasonic and X-ray approaches have mitigat…

    Submitted 12 June, 2024; originally announced June 2024.

  16. arXiv:2406.05696  [pdf, other]

    eess.SP

    Two Power Allocation and Beamforming Strategies for Active IRS-aided Wireless Network via Machine Learning

    Authors: Qiankun Cheng, Jiatong Bai, Baihua Shi, Wei Gao, Feng Shu

    Abstract: This paper models an active intelligent reflecting surface (IRS)-assisted wireless communication network, which has the ability to adjust power between BS and IRS. We aim to maximize the signal-to-noise ratio of the user by jointly designing power allocation (PA) factor, active IRS phase shift matrix, and beamforming vector of BS, subject to a total power constraint. To tackle this non-convex problem…

    Submitted 9 June, 2024; originally announced June 2024.

  17. arXiv:2405.09556  [pdf, other]

    eess.SP cs.AI cs.IT

    Co-learning-aided Multi-modal-deep-learning Framework of Passive DOA Estimators for a Heterogeneous Hybrid Massive MIMO Receiver

    Authors: Jiatong Bai, Feng Shu, Qinghe Zheng, Bo Xu, Baihua Shi, Yiwen Chen, Weibin Zhang, Xianpeng Wang

    Abstract: Due to their excellent performance in rate and resolution, fully-digital (FD) massive multiple-input multiple-output (MIMO) antenna arrays have been widely applied in data transmission and direction of arrival (DOA) measurements, etc. However, they face two main challenges: high computational complexity and circuit cost. The two problems may be addressed well by hybrid analog-digital (HAD) structu…

    Submitted 12 June, 2024; v1 submitted 27 April, 2024; originally announced May 2024.

  18. Robust Covariance-Based Activity Detection for Massive Access

    Authors: Jianan Bai, Erik G. Larsson

    Abstract: The wireless channel is undergoing continuous changes, and the block-fading assumption, despite its popularity in theoretical contexts, never holds true in practical scenarios. This discrepancy is particularly critical for user activity detection in grant-free random access, where joint processing across multiple resource blocks is usually undesirable. In this paper, we propose employing a low-dim…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: 5 pages, 11 figures. Asilomar SSC 2023 Conference

  19. Array SAR 3D Sparse Imaging Based on Regularization by Denoising Under Few Observed Data

    Authors: Yangyang Wang, Xu Zhan, Jing Gao, Jinjie Yao, Shunjun Wei, JianSheng Bai

    Abstract: Array synthetic aperture radar (SAR) three-dimensional (3D) imaging can obtain 3D information of the target region, which is widely used in environmental monitoring and scattering information measurement. In recent years, with the development of compressed sensing (CS) theory, sparse signal processing is used in array SAR 3D imaging. Compared with matched filter (MF), sparse SAR imaging can effect…

    Submitted 26 May, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

  20. arXiv:2405.02942  [pdf, other]

    physics.optics cs.CV cs.RO eess.IV

    Design, analysis, and manufacturing of a glass-plastic hybrid minimalist aspheric panoramic annular lens

    Authors: Shaohua Gao, Qi Jiang, Yiqi Liao, Yi Qiu, Wanglei Ying, Kailun Yang, Kaiwei Wang, Benhao Zhang, Jian Bai

    Abstract: We propose a high-performance glass-plastic hybrid minimalist aspheric panoramic annular lens (ASPAL) to solve several major limitations of the traditional panoramic annular lens (PAL), such as large size, high weight, and complex system. The field of view (FoV) of the ASPAL is 360° × (35°–110°) and the imaging quality is close to the diffraction limit. This large FoV ASPAL is composed of only 4 len…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: Accepted to Optics & Laser Technology

  21. arXiv:2405.01074  [pdf, other]

    cs.IT eess.SY

    Stability Analysis of Interacting Wireless Repeaters

    Authors: Erik G. Larsson, Jianan Bai

    Abstract: We consider a wireless network with multiple single-antenna repeaters that amplify and instantaneously re-transmit the signals they receive to improve the channel rank and system coverage. Due to the positive feedback formed by inter-repeater interference, stability could become a critical issue. We investigate the problem of determining the maximum amplification gain that the repeaters can use wi…

    Submitted 7 July, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

    Comments: Accepted to SPAWC 2024. 5 pages, 7 figures

  22. arXiv:2404.00136  [pdf, other]

    physics.optics eess.SP

    Tunable X-band opto-electronic synthesizer with ultralow phase noise

    Authors: Igor Kudelin, Pedram Shirmohammadi, William Groman, Samin Hanifi, Megan L. Kelleher, Dahyeon Lee, Takuma Nakamura, Charles A. McLemore, Alexander Lind, Dylan Meyer, Junwu Bai, Joe C. Campbell, Steven M. Bowers, Franklyn Quinlan, Scott A. Diddams

    Abstract: Modern communication, navigation, and radar systems rely on low noise and frequency-agile microwave sources. In this application space, photonic systems provide an attractive alternative to conventional microwave synthesis by leveraging high spectral purity lasers and optical frequency combs to generate microwaves with exceedingly low phase noise. However, these photonic techniques suffer from a l…

    Submitted 29 March, 2024; originally announced April 2024.

  23. arXiv:2403.20130  [pdf, other]

    cs.SD cs.LG eess.AS

    Sound event localization and classification using WASN in Outdoor Environment

    Authors: Dongzhe Zhang, Jianfeng Chen, Jisheng Bai, Mou Wang

    Abstract: Deep learning-based sound event localization and classification is an emerging research area within wireless acoustic sensor networks. However, current methods for sound event localization and classification typically rely on a single microphone array, making them susceptible to signal attenuation and environmental noise, which limits their monitoring range. Moreover, methods using multiple microp…

    Submitted 29 March, 2024; originally announced March 2024.

  24. arXiv:2403.18707  [pdf, other]

    math.OC eess.SY

    Connections between Reachability and Time Optimality

    Authors: Juho Bae, Ji Hoon Bai, Byung-Yoon Lee, Jun-Yong Lee, Chang-Hun Lee

    Abstract: This paper presents the concept of an equivalence relation between the set of optimal control problems. By leveraging this concept, we show that the boundary of the reachability set can be constructed by the solutions of time optimal problems. Alongside this, a more general equivalence theorem is presented. The findings facilitate the use of solution structures from a certain class of opti…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: Submitted to Automatica

  25. arXiv:2402.02694  [pdf, other]

    eess.AS cs.LG cs.SD

    Description on IEEE ICME 2024 Grand Challenge: Semi-supervised Acoustic Scene Classification under Domain Shift

    Authors: Jisheng Bai, Mou Wang, Haohe Liu, Han Yin, Yafei Jia, Siwei Huang, Yutong Du, Dongzhe Zhang, Dongyuan Shi, Woon-Seng Gan, Mark D. Plumbley, Susanto Rahardja, Bin Xiang, Jianfeng Chen

    Abstract: Acoustic scene classification (ASC) is a crucial research problem in computational auditory scene analysis, and it aims to recognize the unique acoustic characteristics of an environment. One of the challenges of the ASC task is the domain shift between training and testing data. Since 2018, ASC challenges have focused on the generalization of ASC models across different recording devices. Althoug…

    Submitted 28 February, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

  26. arXiv:2401.14304  [pdf, other]

    eess.SY

    Constraint-Aware Mesh Refinement Method by Reachability Set Envelope of Curvature Bounded Paths

    Authors: Juho Bae, Ji Hoon Bai, Byung-Yoon Lee, Jun-Yong Lee

    Abstract: This paper presents an enhanced direct-method-based approach for the real-time solution of optimal control problems to handle path constraints, such as obstacles. The principal contributions of this work are twofold: first, the existing methods for constructing reachability sets in the literature are extended to derive the envelope of these sets, which determines the region swept by all feasible t…

    Submitted 4 March, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

    Comments: Preprint submitted to Automatica

  27. arXiv:2401.08992  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Efficient Adapter Finetuning for Tail Languages in Streaming Multilingual ASR

    Authors: Junwen Bai, Bo Li, Qiujia Li, Tara N. Sainath, Trevor Strohman

    Abstract: The end-to-end ASR model is often desired in the streaming multilingual scenario since it is easier to deploy and can benefit from pre-trained speech models such as powerful foundation models. Meanwhile, the heterogeneous nature and imbalanced data abundance of different languages may cause performance degradation, leading to asynchronous peak performance for different languages during training, e…

    Submitted 17 January, 2024; originally announced January 2024.

    Comments: Accepted to ICASSP 2024

  28. arXiv:2401.08678  [pdf, other]

    eess.AS cs.SD

    Sub-band and Full-band Interactive U-Net with DPRNN for Demixing Cross-talk Stereo Music

    Authors: Han Yin, Mou Wang, Jisheng Bai, Dongyuan Shi, Woon-Seng Gan, Jianfeng Chen

    Abstract: This paper presents a detailed description of our proposed methods for the ICASSP 2024 Cadenza Challenge. Experimental results show that the proposed system can achieve better performance than the official baselines.

    Submitted 10 January, 2024; originally announced January 2024.

    Comments: Submitted to ICASSP 2024

  29. arXiv:2312.16772  [pdf, other]

    eess.IV cs.CV cs.LG

    Unsupversied feature correlation model to predict breast abnormal variation maps in longitudinal mammograms

    Authors: Jun Bai, Annie Jin, Madison Adams, Clifford Yang, Sheida Nabavi

    Abstract: Breast cancer continues to be a significant cause of mortality among women globally. Timely identification and precise diagnosis of breast abnormalities are critical for enhancing patient prognosis. In this study, we focus on improving the early detection and accurate diagnosis of breast abnormalities, which is crucial for improving patient outcomes and reducing the mortality rate of breast cancer…

    Submitted 27 December, 2023; originally announced December 2023.

  30. arXiv:2311.14068  [pdf, other]

    eess.AS

    Interactive Dual-Conformer with Scene-Inspired Mask for Soft Sound Event Detection

    Authors: Han Yin, Jisheng Bai, Mou Wang, Dongyuan Shi, Woon-Seng Gan, Jianfeng Chen

    Abstract: Traditional binary hard labels for sound event detection (SED) lack details about the complexity and variability of sound event distributions. Recently, a novel annotation workflow was proposed to generate fine-grained non-binary soft labels, resulting in a new real-life dataset named MAESTRO Real for SED. In this paper, we first propose an interactive dual-conformer (IDC) module, in which a cross-…

    Submitted 7 December, 2023; v1 submitted 23 November, 2023; originally announced November 2023.

    Comments: to be improved (unfinished)

  31. arXiv:2311.12371  [pdf, other]

    eess.AS

    AudioLog: LLMs-Powered Long Audio Logging with Hybrid Token-Semantic Contrastive Learning

    Authors: Jisheng Bai, Han Yin, Mou Wang, Dongyuan Shi, Woon-Seng Gan, Jianfeng Chen, Susanto Rahardja

    Abstract: Previous studies in automated audio captioning have faced difficulties in accurately capturing the complete temporal details of acoustic scenes and events within long audio sequences. This paper presents AudioLog, a large language models (LLMs)-powered audio logging system with hybrid token-semantic contrastive learning. Specifically, we propose to fine-tune the pre-trained hierarchical token-sema…

    Submitted 4 January, 2024; v1 submitted 21 November, 2023; originally announced November 2023.

  32. arXiv:2309.07566  [pdf, other]

    cs.SD cs.AI eess.AS

    Speech-to-Speech Translation with Discrete-Unit-Based Style Transfer

    Authors: Yongqi Wang, Jionghao Bai, Rongjie Huang, Ruiqi Li, Zhiqing Hong, Zhou Zhao

    Abstract: Direct speech-to-speech translation (S2ST) with discrete self-supervised representations has achieved remarkable accuracy, but is unable to preserve the speaker timbre of the source speech. Meanwhile, the scarcity of high-quality speaker-parallel data poses a challenge for learning style transfer during translation. We design an S2ST pipeline with style-transfer capability on the basis of discrete…

    Submitted 19 July, 2024; v1 submitted 14 September, 2023; originally announced September 2023.

    Comments: accepted by ACL SRW 2024

  33. arXiv:2308.08100  [pdf, other]

    eess.SP

    A New Heterogeneous Hybrid Massive MIMO Receiver with An Intrinsic Ability of Removing Phase Ambiguity of DOA Estimation via Machine Learning

    Authors: Feng Shu, Baihua Shi, Yiwen Chen, Jiatong Bai, Yifan Li, Tingting Liu, Zhu Han, Xiaohu You

    Abstract: Massive multiple input multiple output (MIMO) antenna arrays incur huge circuit costs and computational complexity. To satisfy the needs of high precision and low cost in future green wireless communication, the conventional hybrid analog and digital MIMO receive structure emerges as a natural choice. However, it suffers from phase ambiguity in direction of arrival (DOA) estimati…

    Submitted 28 May, 2024; v1 submitted 15 August, 2023; originally announced August 2023.

  34. arXiv:2308.05305  [pdf, other]

    eess.IV cs.CV cs.LG

    From CNN to Transformer: A Review of Medical Image Segmentation Models

    Authors: Wenjian Yao, Jiajun Bai, Wei Liao, Yuheng Chen, Mengjuan Liu, Yao Xie

    Abstract: Medical image segmentation is an important step in medical image analysis, especially as a crucial prerequisite for efficient disease diagnosis and treatment. The use of deep learning for image segmentation has become a prevalent trend. The widely adopted approach currently is U-Net and its variants. Additionally, with the remarkable success of pre-trained models in natural language processing tas…

    Submitted 9 August, 2023; originally announced August 2023.

    Comments: 18 pages, 8 figures

  35. arXiv:2307.08239  [pdf, other]

    eess.AS

    Dynamic Kernel Convolution Network with Scene-dedicate Training for Sound Event Localization and Detection

    Authors: Siwei Huang, Jianfeng Chen, Jisheng Bai, Yafei Jia, Dongzhe Zhang

    Abstract: DNN-based methods have shown high performance in sound event localization and detection (SELD). However, in real spatial sound scenes, reverberation and the imbalanced presence of various sound events increase the complexity of the SELD task. In this paper, we propose an effective SELD system in real spatial scenes. In our approach, a dynamic kernel convolution module is introduced after the convolutio…

    Submitted 17 July, 2023; originally announced July 2023.

    Comments: 11 pages, 6 figures

  36. arXiv:2307.01124  [pdf]

    eess.IV cs.CV

    Cross-modality Attention Adapter: A Glioma Segmentation Fine-tuning Method for SAM Using Multimodal Brain MR Images

    Authors: Xiaoyu Shi, Shurong Chai, Yinhao Li, Jingliang Cheng, Jie Bai, Guohua Zhao, Yen-Wei Chen

    Abstract: According to the 2021 World Health Organization (WHO) Classification scheme for gliomas, glioma segmentation is a very important basis for diagnosis and genotype prediction. In general, 3D multimodal brain MRI is an effective diagnostic tool. In the past decade, there has been an increase in the use of machine learning, particularly deep learning, for medical image processing. Thanks to the devel…

    Submitted 3 July, 2023; originally announced July 2023.

  37. arXiv:2306.10065  [pdf, other]

    eess.AS cs.AI cs.MM cs.SD

    Taming Diffusion Models for Music-driven Conducting Motion Generation

    Authors: Zhuoran Zhao, Jinbin Bai, Delong Chen, Debang Wang, Yubo Pan

    Abstract: Generating the motion of orchestral conductors from a given piece of symphony music is a challenging task since it requires a model to learn semantic music features and capture the underlying distribution of real conducting motion. Prior works have applied Generative Adversarial Networks (GAN) to this task, but the promising diffusion model, which recently showed its advantages in terms of both tr…

    Submitted 13 November, 2023; v1 submitted 14 June, 2023; originally announced June 2023.

    Comments: Accepted by AAAI 2023 Summer Symposium with Best Paper Award

  38. arXiv:2306.04987  [pdf, other]

    eess.AS cs.SD

    Convolutional Recurrent Neural Network with Attention for 3D Speech Enhancement

    Authors: Han Yin, Jisheng Bai, Mou Wang, Siwei Huang, Yafei Jia, Jianfeng Chen

    Abstract: 3D speech enhancement can effectively improve the auditory experience and plays a crucial role in augmented reality technology. However, traditional convolutional-based speech enhancement methods have limitations in extracting dynamic voice information. In this paper, we incorporate a dual-path recurrent neural network block into the U-Net to iteratively extract dynamic audio information in both t…

    Submitted 19 November, 2023; v1 submitted 8 June, 2023; originally announced June 2023.

    Comments: Published in IEEE International Conference on Signal Processing, Communications and Computing (ICSPCC 2023)

  39. arXiv:2306.01303  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    DistilXLSR: A Light Weight Cross-Lingual Speech Representation Model

    Authors: Haoyu Wang, Siyuan Wang, Wei-Qiang Zhang, Jinfeng Bai

    Abstract: Multilingual self-supervised speech representation models have greatly enhanced the speech recognition performance for low-resource languages, and the compression of these huge models has also become a crucial prerequisite for their industrial application. In this paper, we propose DistilXLSR, a distilled cross-lingual speech representation model. By randomly shuffling the phonemes of existing spe…

    Submitted 2 June, 2023; originally announced June 2023.

    Comments: Accepted by INTERSPEECH 2023

  40. arXiv:2302.12186  [pdf, other]

    cs.CV eess.IV

    RSFDM-Net: Real-time Spatial and Frequency Domains Modulation Network for Underwater Image Enhancement

    Authors: Jingxia Jiang, Jinbin Bai, Yun Liu, Junjie Yin, Sixiang Chen, Tian Ye, Erkang Chen

    Abstract: Underwater images typically experience mixed degradations of brightness and structure caused by the absorption and scattering of light by suspended particles. To address this issue, we propose a Real-time Spatial and Frequency Domains Modulation Network (RSFDM-Net) for the efficient enhancement of colors and details in underwater images. Specifically, our proposed conditional network is designed w…

    Submitted 23 February, 2023; originally announced February 2023.

  41. arXiv:2302.01496  [pdf, ps, other]

    cs.CL cs.LG cs.SD eess.AS

    Efficient Domain Adaptation for Speech Foundation Models

    Authors: Bo Li, Dongseong Hwang, Zhouyuan Huo, Junwen Bai, Guru Prakash, Tara N. Sainath, Khe Chai Sim, Yu Zhang, Wei Han, Trevor Strohman, Francoise Beaufays

    Abstract: Foundation models (FMs), which are trained on broad data at scale and are adaptable to a wide range of downstream tasks, have attracted considerable interest in the research community. Benefiting from the diverse data sources such as different modalities, languages and application domains, foundation models have demonstrated strong generalization and knowledge transfer capabilities. In this paper, we presen…

    Submitted 2 February, 2023; originally announced February 2023.

  42. arXiv:2212.12844  [pdf, other]

    eess.IV cs.CV

    Weakly-Supervised Deep Learning Model for Prostate Cancer Diagnosis and Gleason Grading of Histopathology Images

    Authors: Mohammad Mahdi Behzadi, Mohammad Madani, Hanzhang Wang, Jun Bai, Ankit Bhardwaj, Anna Tarakanova, Harold Yamase, Ga Hie Nam, Sheida Nabavi

    Abstract: Prostate cancer is the most common cancer in men worldwide and the second leading cause of cancer death in the United States. One of the prognostic features in prostate cancer is the Gleason grading of histopathology images. The Gleason grade is assigned based on tumor architecture on Hematoxylin and Eosin (H&E) stained whole slide images (WSI) by the pathologists. This process is time-consuming a…

    Submitted 24 December, 2022; originally announced December 2022.

  43. arXiv:2211.04445  [pdf, other]

    cs.CR cs.AI cs.LG eess.SY

    Physics-Constrained Backdoor Attacks on Power System Fault Localization

    Authors: Jianing Bai, Ren Wang, Zuyi Li

    Abstract: The advances in deep learning (DL) techniques have the potential to deliver transformative technological breakthroughs to numerous complex tasks in modern power systems that suffer from increasing uncertainty and nonlinearity. However, the vulnerability of DL has yet to be thoroughly explored in power system tasks under various physical constraints. This work, for the first time, proposes a novel…

    Submitted 7 November, 2022; originally announced November 2022.

  44. arXiv:2211.01087  [pdf, other]

    cs.SD eess.AS

    DSPGAN: a GAN-based universal vocoder for high-fidelity TTS by time-frequency domain supervision from DSP

    Authors: Kun Song, Yongmao Zhang, Yi Lei, Jian Cong, Hanzhao Li, Lei Xie, Gang He, Jinfeng Bai

    Abstract: Recent development of neural vocoders based on the generative adversarial neural network (GAN) has shown obvious advantages of generating raw waveform conditioned on mel-spectrogram with fast inference speed and lightweight networks. Whereas, it is still challenging to train a universal neural vocoder that can synthesize high-fidelity speech from various scenarios with unseen speakers, languages,…

    Submitted 28 May, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

    Comments: Accepted to ICASSP 2023

  45. arXiv:2210.06091  [pdf]

    cs.CL cs.SD eess.AS

    Summary on the ISCSLP 2022 Chinese-English Code-Switching ASR Challenge

    Authors: Shuhao Deng, Chengfei Li, Jinfeng Bai, Qingqing Zhang, Wei-Qiang Zhang, Runyan Yang, Gaofeng Cheng, Pengyuan Zhang, Yonghong Yan

    Abstract: Code-switching automatic speech recognition becomes one of the most challenging and the most valuable scenarios of automatic speech recognition, due to the code-switching phenomenon between multilingual language and the frequent occurrence of code-switching phenomenon in daily life. The ISCSLP 2022 Chinese-English Code-Switching Automatic Speech Recognition (CSASR) Challenge aims to promote the de…

    Submitted 13 October, 2022; v1 submitted 12 October, 2022; originally announced October 2022.

    Comments: accepted by ISCSLP 2022

  46. SSDPT: Self-Supervised Dual-Path Transformer for Anomalous Sound Detection in Machine Condition Monitoring

    Authors: Jisheng Bai, Jianfeng Chen, Mou Wang, Muhammad Saad Ayub, Qingli Yan

    Abstract: Anomalous sound detection for machine condition monitoring has great potential in the development of Industry 4.0. However, these anomalous sounds of machines are usually unavailable in normal conditions. Therefore, the models employed have to learn acoustic representations with normal sounds for training, and detect anomalous sounds while testing. In this article, we propose a self-supervised dua…

    Submitted 5 August, 2022; originally announced August 2022.

  47. Activity Detection in Distributed MIMO: Distributed AMP via Likelihood Ratio Fusion

    Authors: Jianan Bai, Erik G. Larsson

    Abstract: We develop a new algorithm for activity detection for grant-free multiple access in distributed multiple-input multiple-output (MIMO). The algorithm is a distributed version of the approximate message passing (AMP) based on a soft combination of likelihood ratios computed independently at multiple access points. The underpinning theoretical basis of our algorithm is a new observation that we made…

    Submitted 22 September, 2022; v1 submitted 5 August, 2022; originally announced August 2022.

    Comments: 5 pages, 2 figures. This paper has been accepted for publication in IEEE Wireless Communications Letters. Code available at https://github.com/jiananbai/distributed-AMP

  48. arXiv:2206.13135  [pdf]

    cs.CL cs.SD eess.AS

    TALCS: An Open-Source Mandarin-English Code-Switching Corpus and a Speech Recognition Baseline

    Authors: Chengfei Li, Shuhao Deng, Yaoping Wang, Guangjing Wang, Yaguang Gong, Changbin Chen, Jinfeng Bai

    Abstract: This paper introduces a new corpus of Mandarin-English code-switching speech recognition--TALCS corpus, suitable for training and evaluating code-switching speech recognition systems. TALCS corpus is derived from real online one-to-one English teaching scenes in TAL education group, which contains roughly 587 hours of speech sampled at 16 kHz. To our best knowledge, TALCS corpus is the largest wel…

    Submitted 27 June, 2022; originally announced June 2022.

    Comments: accepted by INTERSPEECH 2022

  49. arXiv:2205.05570  [pdf, other]

    cs.CV cs.RO eess.IV physics.optics

    Review on Panoramic Imaging and Its Applications in Scene Understanding

    Authors: Shaohua Gao, Kailun Yang, Hao Shi, Kaiwei Wang, Jian Bai

    Abstract: With the rapid development of high-speed communication and artificial intelligence technologies, human perception of real-world scenes is no longer limited to the use of small Field of View (FoV) and low-dimensional scene detection devices. Panoramic imaging emerges as the next generation of innovative intelligent instruments for environmental perception and measurement. However, while satisfying…

    Submitted 14 October, 2022; v1 submitted 11 May, 2022; originally announced May 2022.

    Comments: Accepted to IEEE Transactions on Instrumentation and Measurement. 34 pages, 15 figures, 420 references

  50. A Squeeze-and-Excitation and Transformer based Cross-task System for Environmental Sound Recognition

    Authors: Jisheng Bai, Jianfeng Chen, Mou Wang, Muhammad Saad Ayub

    Abstract: Environmental sound recognition (ESR) is an emerging research topic in audio pattern recognition. Many tasks are presented to resort to computational models for ESR in real-life applications. However, current models are usually designed for individual tasks, and are not robust and applicable to other tasks. Cross-task models, which promote unified knowledge modeling across various tasks, have not…

    Submitted 21 November, 2023; v1 submitted 15 March, 2022; originally announced March 2022.