
Showing 1–50 of 197 results for author: Zhang, D

Searching in archive eess.
  1. arXiv:2407.05709

    eess.IV cs.CV

    Heterogeneous window transformer for image denoising

    Authors: Chunwei Tian, Menghua Zheng, Chia-Wen Lin, Zhiwu Li, David Zhang

    Abstract: Deep networks usually depend on extracting more structural information to improve denoising results. However, they may ignore correlations between pixels in an image while pursuing better denoising performance. Window transformers can use long- and short-distance modeling to let pixels interact and thereby address this problem. To make a tradeoff between distance modeling and denoising time, we propose a h…

    Submitted 14 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

  2. arXiv:2407.05310

    eess.SP cs.NE cs.SD eess.AS

    Ternary Spike-based Neuromorphic Signal Processing System

    Authors: Shuai Wang, Dehao Zhang, Ammar Belatreche, Yichen Xiao, Hongyu Qing, Wenjie Wei, Malu Zhang, Yang Yang

    Abstract: Deep Neural Networks (DNNs) have been successfully implemented across various signal processing fields, resulting in significant enhancements in performance. However, DNNs generally require substantial computational resources, leading to significant economic costs and posing challenges for their deployment on resource-constrained edge devices. In this study, we take advantage of spiking neural net…

    Submitted 7 July, 2024; originally announced July 2024.

  3. arXiv:2407.04738

    eess.SP cs.LG cs.RO

    A Contrastive Learning Based Convolutional Neural Network for ERP Brain-Computer Interfaces

    Authors: Yuntian Cui, Xinke Shen, Dan Zhang, Chen Yang

    Abstract: ERP-based EEG detection is gaining increasing attention in the field of brain-computer interfaces. However, due to the complexity of ERP signal components, their low signal-to-noise ratio, and significant inter-subject variability, cross-subject ERP signal detection has been challenging. The continuous advancement in deep learning has greatly contributed to addressing this issue. This brief propos…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 5 pages, 2 figures, 2 tables

  4. arXiv:2407.00718

    eess.IV cs.CV

    ASPS: Augmented Segment Anything Model for Polyp Segmentation

    Authors: Huiqian Li, Dingwen Zhang, Jieru Yao, Longfei Han, Zhongyu Li, Junwei Han

    Abstract: Polyp segmentation plays a pivotal role in colorectal cancer diagnosis. Recently, the emergence of the Segment Anything Model (SAM) has introduced unprecedented potential for polyp segmentation, leveraging its powerful pre-training capability on large-scale datasets. However, due to the domain gap between natural and endoscopy images, SAM encounters two limitations in achieving effective performan…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: Accepted by MICCAI2024

  5. arXiv:2406.15985

    eess.SY cs.AI

    Deep-MPC: A DAGGER-Driven Imitation Learning Strategy for Optimal Constrained Battery Charging

    Authors: Jorge Espin, Dong Zhang, Daniele Toti, Andrea Pozzi

    Abstract: In the realm of battery charging, several complex aspects demand meticulous attention, including thermal management, capacity degradation, and the need for rapid charging while maintaining safety and battery lifespan. By employing the imitation learning paradigm, this manuscript introduces an innovative solution to confront the inherent challenges often associated with conventional predictive cont…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: 7 pages, 4 figures, submitted to American Control Conference 2024 (ACC2024)

  6. arXiv:2406.13179

    cs.SD cs.AI cs.NE eess.AS

    Global-Local Convolution with Spiking Neural Networks for Energy-efficient Keyword Spotting

    Authors: Shuai Wang, Dehao Zhang, Kexin Shi, Yuchen Wang, Wenjie Wei, Jibin Wu, Malu Zhang

    Abstract: Thanks to Deep Neural Networks (DNNs), the accuracy of Keyword Spotting (KWS) has made substantial progress. However, as KWS systems are usually implemented on edge devices, energy efficiency becomes a critical requirement besides performance. Here, we take advantage of spiking neural networks' energy efficiency and propose an end-to-end lightweight KWS model. The model consists of two innovative…

    Submitted 18 June, 2024; originally announced June 2024.

  7. arXiv:2406.13145

    eess.SY cs.LG

    Constructing and Evaluating Digital Twins: An Intelligent Framework for DT Development

    Authors: Longfei Ma, Nan Cheng, Xiucheng Wang, Jiong Chen, Yinjun Gao, Dongxiao Zhang, Jun-Jie Zhang

    Abstract: The development of Digital Twins (DTs) represents a transformative advance for simulating and optimizing complex systems in a controlled digital space. Despite their potential, the challenge of constructing DTs that accurately replicate and predict the dynamics of real-world systems remains substantial. This paper introduces an intelligent framework for the construction and evaluation of DTs, spec…

    Submitted 18 June, 2024; originally announced June 2024.

  8. arXiv:2406.08626

    eess.SY

    Safety-Driven Battery Charging: A Fisher Information-guided Adaptive MPC with Real-time Parameter Identification

    Authors: Jorge Espin, Yuichi Kajiura, Dong Zhang

    Abstract: Lithium-ion (Li-ion) batteries are ubiquitous in modern energy storage systems, highlighting the critical need to comprehend and optimize their performance. Yet, battery models often exhibit poor parameter identifiability, which hinders the development of effective battery management strategies and impacts their overall performance, longevity, and safety. This manuscript explores the integration of…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 6 pages, 2 figures, submitted to Modeling, Estimation, and Control Conference (MECC 2024)

  9. Fast-Fading Channel and Power Optimization of the Magnetic Inductive Cellular Network

    Authors: Honglei Ma, Erwu Liu, Zhijun Fang, Rui Wang, Yongbin Gao, Wenjun Yu, Dongming Zhang

    Abstract: The cellular network of magnetic induction (MI) communication holds promise in long-distance underground environments. In traditional MI communication, there is no fast-fading channel, since the MI channel is treated as a quasi-static channel. However, for vehicle (mobile) MI (VMI) communication, unpredictable antenna vibration causes significant fast fading. As such fast-fading cann…

    Submitted 7 July, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

    Comments: This work has been accepted by the IEEE TWC for publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  10. arXiv:2406.04350

    cs.SD cs.AI cs.LG eess.AS

    Prompt-guided Precise Audio Editing with Diffusion Models

    Authors: Manjie Xu, Chenxing Li, Duzhen Zhang, Dan Su, Wei Liang, Dong Yu

    Abstract: Audio editing involves the arbitrary manipulation of audio content through precise control. Although text-guided diffusion models have made significant advancements in text-to-audio generation, they still face challenges in finding a flexible and precise way to modify target events within an audio track. We present a novel approach, referred to as PPAE, which serves as a general module for diffusi…

    Submitted 11 May, 2024; originally announced June 2024.

    Comments: Accepted by ICML 2024

  11. arXiv:2406.00341

    eess.IV cs.CV

    DSCA: A Digital Subtraction Angiography Sequence Dataset and Spatio-Temporal Model for Cerebral Artery Segmentation

    Authors: Qihang Xie, Mengguo Guo, Lei Mou, Dan Zhang, Da Chen, Caifeng Shan, Yitian Zhao, Ruisheng Su, Jiong Zhang

    Abstract: Cerebrovascular diseases (CVDs) remain a leading cause of global disability and mortality. Digital Subtraction Angiography (DSA) sequences, recognized as the gold standard for diagnosing CVDs, can clearly visualize the dynamic flow and reveal pathological conditions within the cerebrovasculature. Therefore, precise segmentation of cerebral arteries (CAs) and classification between their main tru…

    Submitted 1 June, 2024; originally announced June 2024.

  12. arXiv:2405.17659

    eess.IV cs.CV

    Enhancing Global Sensitivity and Uncertainty Quantification in Medical Image Reconstruction with Monte Carlo Arbitrary-Masked Mamba

    Authors: Jiahao Huang, Liutao Yang, Fanwen Wang, Yang Nan, Weiwen Wu, Chengyan Wang, Kuangyu Shi, Angelica I. Aviles-Rivero, Carola-Bibiane Schönlieb, Daoqiang Zhang, Guang Yang

    Abstract: Deep learning has been extensively applied in medical image reconstruction, where Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) represent the predominant paradigms, each possessing distinct advantages and inherent limitations: CNNs exhibit linear complexity with local sensitivity, whereas ViTs demonstrate quadratic complexity with global sensitivity. The emerging Mamba has sh…

    Submitted 25 June, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

  13. arXiv:2405.04867

    eess.IV cs.CV

    MIPI 2024 Challenge on Demosaic for HybridEVS Camera: Methods and Results

    Authors: Yaqi Wu, Zhihao Fan, Xiaofeng Chu, Jimmy S. Ren, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangcheng Zhou, Ruicheng Feng, Yuekun Dai, Peiqing Yang, Chen Change Loy, Senyan Xu, Zhijing Sun, Jiaying Zhu, Yurui Zhu, Xueyang Fu, Zheng-Jun Zha, Jun Cao, Cheng Li, Shu Chen, Liang Ma, Shiyang Zhou, Haijin Zeng, Kai Feng, et al. (24 additional authors not shown)

    Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: MIPI@CVPR2024. Website: https://mipi-challenge.org/MIPI2024/

  14. arXiv:2404.16484

    cs.CV eess.IV

    Real-Time 4K Super-Resolution of Compressed AVIF Images. AIS 2024 Challenge Survey

    Authors: Marcos V. Conde, Zhijun Lei, Wen Li, Cosmin Stejerean, Ioannis Katsavounidis, Radu Timofte, Kihwan Yoon, Ganzorig Gankhuyag, Jiangtao Lv, Long Sun, Jinshan Pan, Jiangxin Dong, Jinhui Tang, Zhiyuan Li, Hao Wei, Chenyang Ge, Dongyang Zhang, Tianle Liu, Huaian Chen, Yi Jin, Menghan Zhou, Yiqiang Yan, Si Gao, Biao Wu, Shaoli Liu, et al. (50 additional authors not shown)

    Abstract: This paper introduces a novel benchmark as part of the AIS 2024 Real-Time Image Super-Resolution (RTSR) Challenge, which aims to upscale compressed images from 540p to 4K resolution (4x factor) in real-time on commercial GPUs. For this, we use a diverse test set containing a variety of 4K images ranging from digital art to gaming and photography. The images are compressed using the modern AVIF cod…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: CVPR 2024, AI for Streaming (AIS) Workshop

  15. arXiv:2404.05600

    cs.CL cs.SD eess.AS

    SpeechAlign: Aligning Speech Generation to Human Preferences

    Authors: Dong Zhang, Zhaowei Li, Shimin Li, Xin Zhang, Pengyu Wang, Yaqian Zhou, Xipeng Qiu

    Abstract: Speech language models have significantly advanced in generating realistic speech, with neural codec language models standing out. However, the integration of human feedback to align speech outputs to human preferences is often neglected. This paper addresses this gap by first analyzing the distribution gap in codec language models, highlighting how it leads to discrepancies between the training a…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: Work in progress

  16. arXiv:2404.01192

    eess.IV cs.CV

    iMD4GC: Incomplete Multimodal Data Integration to Advance Precise Treatment Response Prediction and Survival Analysis for Gastric Cancer

    Authors: Fengtao Zhou, Yingxue Xu, Yanfen Cui, Shenyan Zhang, Yun Zhu, Weiyang He, Jiguang Wang, Xin Wang, Ronald Chan, Louis Ho Shing Lau, Chu Han, Dafu Zhang, Zhenhui Li, Hao Chen

    Abstract: Gastric cancer (GC) is a prevalent malignancy worldwide, ranking as the fifth most common cancer with over 1 million new cases and 700 thousand deaths in 2020. Locally advanced gastric cancer (LAGC) accounts for approximately two-thirds of GC diagnoses, and neoadjuvant chemotherapy (NACT) has emerged as the standard treatment for LAGC. However, the effectiveness of NACT varies significantly among…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: 27 pages, 9 figures, 3 tables (under review)

  17. arXiv:2403.20130

    cs.SD cs.LG eess.AS

    Sound event localization and classification using WASN in Outdoor Environment

    Authors: Dongzhe Zhang, Jianfeng Chen, Jisheng Bai, Mou Wang

    Abstract: Deep learning-based sound event localization and classification is an emerging research area within wireless acoustic sensor networks. However, current methods for sound event localization and classification typically rely on a single microphone array, making them susceptible to signal attenuation and environmental noise, which limits their monitoring range. Moreover, methods using multiple microp…

    Submitted 29 March, 2024; originally announced March 2024.

  18. arXiv:2403.08434

    cs.RO eess.SY

    GRF-based Predictive Flocking Control with Dynamic Pattern Formation

    Authors: Chenghao Yu, Dengyu Zhang, Qingrui Zhang

    Abstract: It is promising but challenging to design flocking control for a robot swarm to autonomously follow changing patterns or shapes in an optimal distributed manner. The optimal flocking control with dynamic pattern formation is, therefore, investigated in this paper. A predictive flocking control algorithm is proposed based on a Gibbs random field (GRF), where bio-inspired potential energies are used…

    Submitted 13 March, 2024; originally announced March 2024.

    Comments: Accepted by ICRA 2024

  19. arXiv:2402.18451

    eess.IV cs.CV

    MambaMIR: An Arbitrary-Masked Mamba for Joint Medical Image Reconstruction and Uncertainty Estimation

    Authors: Jiahao Huang, Liutao Yang, Fanwen Wang, Yang Nan, Angelica I. Aviles-Rivero, Carola-Bibiane Schönlieb, Daoqiang Zhang, Guang Yang

    Abstract: The recent Mamba model has shown remarkable adaptability for visual representation learning, including in medical imaging tasks. This study introduces MambaMIR, a Mamba-based model for medical image reconstruction, as well as its Generative Adversarial Network-based variant, MambaMIR-GAN. Our proposed MambaMIR inherits several advantages, such as linear complexity, global receptive fields, and dyn…

    Submitted 25 June, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

  20. arXiv:2402.17776

    eess.SP

    A New Architecture for Energy Efficient Fault Detection Using Energy Harvesters

    Authors: Dongti Zhang, Patricio Peralta-Braz, Chun Tung Chou, Elena Atroshchenko, Mehrisadat Makki Alamdari, Mahbub Hassan

    Abstract: The current battery-powered fault detection system for vibration monitoring has a rather limited lifetime. This is because the high-frequency sampling (typically tens of kilohertz) required for vibration monitoring results in high energy consumption in both the analog-to-digital converter (ADC) and wireless transmissions. This paper proposes a new fault detection architecture that can significant…

    Submitted 21 February, 2024; originally announced February 2024.

    Comments: 8 pages, 8 figures

  21. arXiv:2402.14213

    q-bio.NC cs.LG eess.SP

    Contrastive Learning of Shared Spatiotemporal EEG Representations Across Individuals for Naturalistic Neuroscience

    Authors: Xinke Shen, Lingyi Tao, Xuyang Chen, Sen Song, Quanying Liu, Dan Zhang

    Abstract: Neural representations induced by naturalistic stimuli offer insights into how humans respond to stimuli in daily life. Understanding neural mechanisms underlying naturalistic stimuli processing hinges on the precise identification and extraction of the shared neural patterns that are consistently present across individuals. Targeting the Electroencephalogram (EEG) technique, known for its rich sp…

    Submitted 13 July, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

    Comments: 54 pages, 17 figures

  22. arXiv:2402.10251

    q-bio.NC cs.AI cs.LG eess.SP

    Brant-2: Foundation Model for Brain Signals

    Authors: Zhizhang Yuan, Daoze Zhang, Junru Chen, Gefei Gu, Yang Yang

    Abstract: Foundational models benefit from pre-training on large amounts of unlabeled data and enable strong performance in a wide variety of applications with a small amount of labeled data. Such models can be particularly effective in analyzing brain signals, as this field encompasses numerous application scenarios, and it is costly to perform large-scale annotation. In this work, we present the largest f…

    Submitted 28 March, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

    Comments: 14 pages, 7 figures

  23. arXiv:2402.09434

    eess.SP cs.LG

    Disentangling Imperfect: A Wavelet-Infused Multilevel Heterogeneous Network for Human Activity Recognition in Flawed Wearable Sensor Data

    Authors: Mengna Liu, Dong Xiang, Xu Cheng, Xiufeng Liu, Dalin Zhang, Shengyong Chen, Christian S. Jensen

    Abstract: The popularity and diffusion of wearable devices provide new opportunities for sensor-based human activity recognition that leverages deep learning-based algorithms. Although impressive advances have been made, two major challenges remain. First, sensor data is often incomplete or noisy due to sensor placement and other issues as well as data transmission failure, calling for imputation of missin…

    Submitted 26 January, 2024; originally announced February 2024.

    Comments: 14 pages, 7 figures

  24. arXiv:2402.09048

    eess.SP

    Sensing in Bi-Static ISAC Systems with Clock Asynchronism: A Signal Processing Perspective

    Authors: Kai Wu, Jacopo Pegoraro, Francesca Meneghello, J. Andrew Zhang, Jesus O. Lacruz, Joerg Widmer, Francesco Restuccia, Michele Rossi, Xiaojing Huang, Daqing Zhang, Giuseppe Caire, Y. Jay Guo

    Abstract: Integrated Sensing and Communication (ISAC) has been identified as a pillar usage scenario for the impending 6G era. Bi-static sensing, a major type of sensing in ISAC, is promising to expedite ISAC in the near future, as it requires minimal changes to the existing network infrastructure. However, a critical challenge for bi-static sensing is clock asynchronism due to the use of different clocks a…

    Submitted 24 June, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

    Comments: 20 pages, 6 figures, 1 table

  25. arXiv:2402.06894

    cs.CL cs.AI cs.LG cs.SD eess.AS

    GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators

    Authors: Yuchen Hu, Chen Chen, Chao-Han Huck Yang, Ruizhe Li, Dong Zhang, Zhehuai Chen, Eng Siong Chng

    Abstract: Recent advances in large language models (LLMs) have advanced multilingual speech and machine translation through reduced representation errors and incorporated external knowledge. However, both translation tasks typically utilize beam search decoding and top-1 hypothesis selection for inference. These techniques struggle to fully exploit the rich information in the divers…

    Submitted 16 May, 2024; v1 submitted 10 February, 2024; originally announced February 2024.

    Comments: 18 pages, Accepted by ACL 2024. This work is open sourced at: https://github.com/YUCHEN005/GenTranslate

  26. arXiv:2402.02694

    eess.AS cs.LG cs.SD

    Description on IEEE ICME 2024 Grand Challenge: Semi-supervised Acoustic Scene Classification under Domain Shift

    Authors: Jisheng Bai, Mou Wang, Haohe Liu, Han Yin, Yafei Jia, Siwei Huang, Yutong Du, Dongzhe Zhang, Dongyuan Shi, Woon-Seng Gan, Mark D. Plumbley, Susanto Rahardja, Bin Xiang, Jianfeng Chen

    Abstract: Acoustic scene classification (ASC) is a crucial research problem in computational auditory scene analysis, and it aims to recognize the unique acoustic characteristics of an environment. One of the challenges of the ASC task is the domain shift between training and testing data. Since 2018, ASC challenges have focused on the generalization of ASC models across different recording devices. Althoug…

    Submitted 28 February, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

  27. arXiv:2402.01271

    eess.AS cs.SD

    An Intra-BRNN and GB-RVQ Based END-TO-END Neural Audio Codec

    Authors: Linping Xu, Jiawei Jiang, Dejun Zhang, Xianjun Xia, Li Chen, Yijian Xiao, Piao Ding, Shenyi Song, Sixing Yin, Ferdous Sohel

    Abstract: Recently, neural networks have proven to be effective in performing speech coding tasks at low bitrates. However, under-utilization of intra-frame correlations and quantizer errors degrade the reconstructed audio quality. To improve the coding quality, we present an end-to-end neural speech codec, namely CBRC (Convolutional and Bidirectional Recurrent neural Codec). An interleave…

    Submitted 2 February, 2024; originally announced February 2024.

    Comments: INTERSPEECH 2023

  28. arXiv:2401.14750

    eess.SY

    Decentralized Zeno-Free Event-Triggered Control For Multiple Networks Subject to Stochastic Network Delays and Poisson Pulsing Attacks

    Authors: Dandan Zhang, Xin Jin, Hongye Su

    Abstract: By designing the decentralized time-regularized (Zeno-free) event-triggered strategies for the state-feedback control law, this paper considers the stochastic stabilization of a class of networked control systems, where two sources of randomness exist in multiple decentralized networks that operate asynchronously and independently: the communication channels are constrained by the stochastic netwo…

    Submitted 11 April, 2024; v1 submitted 26 January, 2024; originally announced January 2024.

    Comments: 18 pages, 13 figures

  29. arXiv:2401.13527

    cs.CL cs.SD eess.AS

    SpeechGPT-Gen: Scaling Chain-of-Information Speech Generation

    Authors: Dong Zhang, Xin Zhang, Jun Zhan, Shimin Li, Yaqian Zhou, Xipeng Qiu

    Abstract: Benefiting from effective speech modeling, current Speech Large Language Models (SLLMs) have demonstrated exceptional capabilities in in-context speech generation and efficient generalization to unseen speakers. However, the prevailing information modeling process is encumbered by certain redundancies, leading to inefficiencies in speech generation. We propose Chain-of-Information Generation (CoIG…

    Submitted 25 January, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

    Comments: work in progress

  30. arXiv:2401.12783

    cs.AI cs.LG eess.SP

    A Review of Deep Learning Methods for Photoplethysmography Data

    Authors: Guangkun Nie, Jiabao Zhu, Gongzheng Tang, Deyun Zhang, Shijia Geng, Qinghao Zhao, Shenda Hong

    Abstract: Photoplethysmography (PPG) is a highly promising device due to its advantages in portability, user-friendly operation, and non-invasive capabilities to measure a wide range of physiological information. Recent advancements in deep learning have demonstrated remarkable outcomes by leveraging PPG signals for tasks related to personal health management and other multifaceted applications. In this rev…

    Submitted 23 January, 2024; originally announced January 2024.

  31. arXiv:2401.11675

    eess.IV

    Rethinking Cross-Attention for Infrared and Visible Image Fusion

    Authors: Lihua Jian, Songlei Xiong, Han Yan, Xiaoguang Niu, Shaowu Wu, Di Zhang

    Abstract: The salient information of an infrared image and the abundant texture of a visible image can be fused to obtain a comprehensive image. Current Transformer-based fusion methods for infrared and visible (IV) images have exhibited promising performance. However, the attention mechanism of previous Transformer-based methods tended to extract common information…

    Submitted 21 January, 2024; originally announced January 2024.

  32. arXiv:2312.15244

    cs.IT eess.SP

    Fluid Antenna Array Enhanced Over-the-Air Computation

    Authors: Deyou Zhang, Sicong Ye, Ming Xiao, Kezhi Wang, Marco Di Renzo, Mikael Skoglund

    Abstract: Over-the-air computation (AirComp) has emerged as a promising technology for fast wireless data aggregation by harnessing the superposition property of wireless multiple-access channels. This paper investigates a fluid antenna (FA) array-enhanced AirComp system, employing the new degrees of freedom achieved by antenna movements. Specifically, we jointly optimize the transceiver design and antenna…

    Submitted 23 December, 2023; originally announced December 2023.

  33. arXiv:2312.11775

    eess.IV cs.CV

    Towards SAMBA: Segment Anything Model for Brain Tumor Segmentation in Sub-Saharan African Populations

    Authors: Mohannad Barakat, Noha Magdy, Jjuuko George William, Ethel Phiri, Raymond Confidence, Dong Zhang, Udunna C Anazodo

    Abstract: Gliomas, the most prevalent primary brain tumors, require precise segmentation for diagnosis and treatment planning. However, this task poses significant challenges, particularly in the African population, where limited access to high-quality imaging data hampers algorithm performance. In this study, we propose an innovative approach combining the Segment Anything Model (SAM) and a voting network f…

    Submitted 18 December, 2023; originally announced December 2023.

    Comments: 13 pages, 6 figures, 2 tables

  34. arXiv:2312.11770

    cs.CV eess.IV

    Bridging the Gap: Generalising State-of-the-Art U-Net Models to Sub-Saharan African Populations

    Authors: Alyssa R. Amod, Alexandra Smith, Pearly Joubert, Confidence Raymond, Dong Zhang, Udunna C. Anazodo, Dodzi Motchon, Tinashe E. M. Mutsvangwa, Sébastien Quetin

    Abstract: A critical challenge for tumour segmentation models is the ability to adapt to diverse clinical settings, particularly when applied to poor-quality neuroimaging data. The uncertainty surrounding this adaptation stems from the lack of representative datasets, leaving top-performing models without exposure to common artifacts found in MRI data throughout Sub-Saharan Africa (SSA). We replicated a fra…

    Submitted 18 December, 2023; originally announced December 2023.

    Comments: 14 pages, 5 figures, 3 tables

  35. arXiv:2312.00308

    cs.CV eess.IV stat.AP

    A knowledge-based data-driven (KBDD) framework for all-day identification of cloud types using satellite remote sensing

    Authors: Longfeng Nie, Yuntian Chen, Mengge Du, Changqi Sun, Dongxiao Zhang

    Abstract: Cloud types, as a type of meteorological data, are of particular significance for evaluating changes in rainfall, heatwaves, water resources, floods and droughts, food security and vegetation cover, as well as land use. In order to effectively utilize high-resolution geostationary observations, a knowledge-based data-driven (KBDD) framework for all-day identification of cloud types based on spectr…

    Submitted 30 November, 2023; originally announced December 2023.

  36. arXiv:2311.18418

    cs.IT eess.SP

    Beamforming Design for Active RIS-Aided Over-the-Air Computation

    Authors: Deyou Zhang, Ming Xiao, Mikael Skoglund, H. Vincent Poor

    Abstract: Over-the-air computation (AirComp) is emerging as a promising technology for wireless data aggregation. However, its performance is hampered by users with poor channel conditions. To mitigate such a performance bottleneck, this paper introduces an active reconfigurable intelligent surface (RIS) into the AirComp system. Specifically, we begin by exploring the ideal RIS model and propose a joint op…

    Submitted 30 November, 2023; originally announced November 2023.

  37. arXiv:2311.16374

    cs.LG eess.SY

    Physics-Informed Neural Network for Discovering Systems with Unmeasurable States with Application to Lithium-Ion Batteries

    Authors: Yuichi Kajiura, Jorge Espin, Dong Zhang

    Abstract: Combining machine learning with physics is a trending approach for discovering unknown dynamics, and one of the most intensively studied frameworks is the physics-informed neural network (PINN). However, PINN often fails to optimize the network due to its difficulty in concurrently minimizing multiple losses originating from the system's governing equations. This problem can be more serious when t…

    Submitted 27 November, 2023; originally announced November 2023.

    Comments: 7 pages, 4 figures, submitted to American Control Conference 2024

  38. arXiv:2311.03982

    cs.IT eess.SP

    Federated Learning via Active RIS Assisted Over-the-Air Computation

    Authors: Deyou Zhang, Ming Xiao, Mikael Skoglund, H. Vincent Poor

    Abstract: In this paper, we propose leveraging an active reconfigurable intelligent surface (RIS) to support reliable gradient aggregation for over-the-air computation (AirComp) enabled federated learning (FL) systems. An analysis of the FL convergence property reveals that minimizing gradient aggregation errors in each training round is crucial for narrowing the convergence gap. As such, we formulate an…

    Submitted 7 November, 2023; originally announced November 2023.

    Comments: This paper was submitted to the IEEE International Conference on Machine Learning for Communication and Networking (ICMLCN), Stockholm, Sweden, 2024

  39. arXiv:2311.03974

    cs.IT eess.SP

    NOMA Enabled Multi-Access Edge Computing: A Joint MU-MIMO Precoding and Computation Offloading Design

    Authors: Deyou Zhang, Meng Wang, Shuo Shi, Ming Xiao

    Abstract: This letter investigates computation offloading and transmit precoding co-design for multi-access edge computing (MEC), where multiple MEC users (MUs) equipped with multiple antennas access the MEC server in a non-orthogonal multiple access manner. We aim to minimize the total energy consumption of all MUs while satisfying the latency constraints by jointly optimizing the computational frequency,…

    Submitted 7 November, 2023; originally announced November 2023.

  40. arXiv:2310.10095

    eess.IV cs.CV cs.LG

    A Multi-Scale Spatial Transformer U-Net for Simultaneously Automatic Reorientation and Segmentation of 3D Nuclear Cardiac Images

    Authors: Yangfan Ni, Duo Zhang, Gege Ma, Lijun Lu, Zhongke Huang, Wentao Zhu

    Abstract: Accurate reorientation and segmentation of the left ventricle (LV) is essential for the quantitative analysis of myocardial perfusion imaging (MPI), in which one critical step is to reorient the reconstructed transaxial nuclear cardiac images into standard short-axis slices for subsequent image processing. Small-scale LV myocardium (LV-MY) region detection and the diverse cardiac structures of i…

    Submitted 16 October, 2023; originally announced October 2023.

    Comments: 17 pages, 7 figures

  41. arXiv:2310.09693  [pdf]

    eess.SY

    Influence of Acceleration and Deceleration Capability on Machine Tool Feed System Performance

    Authors: Xuesong Wang, Yi Zhou, Dongsheng Zhang

    Abstract: With the increasing demand for high speed and high precision machining of machine tools, the problem of which factors of feed system ultimately determine the performance of machine tools is becoming more and more prominent. At present, the feed system is designed mainly by limiting the load inertia ratio. This design method ignores the match between electromechanical system, motion process and con…

    Submitted 13 December, 2023; v1 submitted 14 October, 2023; originally announced October 2023.

  42. arXiv:2310.07405  [pdf, ps, other]

    cs.IT eess.SP

    IRS Assisted Federated Learning A Broadband Over-the-Air Aggregation Approach

    Authors: Deyou Zhang, Ming Xiao, Zhibo Pang, Lihui Wang, H. Vincent Poor

    Abstract: We consider a broadband over-the-air computation empowered model aggregation approach for wireless federated learning (FL) systems and propose to leverage an intelligent reflecting surface (IRS) to combat wireless fading and noise. We first investigate the conventional node-selection based framework, where a few edge nodes are dropped in model aggregation to control the aggregation error. We analy…

    Submitted 11 October, 2023; originally announced October 2023.

    Comments: This paper has been accepted by IEEE Transactions on Wireless Communications

  43. arXiv:2309.15374  [pdf, other]

    eess.IV cs.RO

    DREAM-PCD: Deep Reconstruction and Enhancement of mmWave Radar Pointcloud

    Authors: Ruixu Geng, Yadong Li, Dongheng Zhang, Jincheng Wu, Yating Gao, Yang Hu, Yan Chen

    Abstract: Millimeter-wave (mmWave) radar pointcloud offers attractive potential for 3D sensing, thanks to its robustness in challenging conditions such as smoke and low illumination. However, existing methods failed to simultaneously address the three main challenges in mmWave radar pointcloud reconstruction: specular information lost, low angular resolution, and strong interference and noise. In this paper…

    Submitted 26 September, 2023; originally announced September 2023.

    Comments: 13 pages, 9 figures

  44. arXiv:2308.16692  [pdf, other]

    cs.CL cs.SD eess.AS

    SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models

    Authors: Xin Zhang, Dong Zhang, Shimin Li, Yaqian Zhou, Xipeng Qiu

    Abstract: Current speech large language models build upon discrete speech representations, which can be categorized into semantic tokens and acoustic tokens. However, existing speech tokens are not specifically designed for speech language modeling. To assess the suitability of speech tokens for building speech language models, we established the first benchmark, SLMTokBench. Our results indicate that neith…

    Submitted 22 January, 2024; v1 submitted 31 August, 2023; originally announced August 2023.

    Comments: Accepted by ICLR 2024. Project page is at https://0nutation.github.io/SpeechTokenizer.github.io/

  45. arXiv:2308.14393  [pdf]

    eess.SY cs.RO

    Research on the Influence of Underwater Environment on the Dynamic Performance of the Mechanical Leg of a Deep-sea Crawling and Swimming Robot

    Authors: Lihui Liao, Baoren Li, Dijia Zhang, Luping Gao, Mboulé Ngwa, Jingmin Du

    Abstract: The performance of underwater crawling and adjustment of the body posture for underwater manipulating of the deep-sea crawling and swimming robot (DCSR) is directly influenced by the dynamic performance of the underwater mechanical legs (UWML), as it serves as the executive mechanism of the DCSR. Compared with the mechanical legs of legged robots working on land, the UWML of the DCSR not only poss…

    Submitted 28 August, 2023; originally announced August 2023.

    Comments: conference for 2023 IEEE 9th International Conference on Fluid Power and Mechatronics (FPM2023)

    MSC Class: 93C40 ACM Class: C.5

  46. arXiv:2308.11631  [pdf]

    eess.SP cs.LG stat.AP

    Deep learning-based flow disaggregation for short-term hydropower plant operations

    Authors: Duo Zhang

    Abstract: High temporal resolution data plays a vital role in effective short-term hydropower plant operations. In the majority of the Norwegian hydropower system, inflow data is predominantly collected at daily resolutions through measurement installations. However, for enhanced precision in managerial decision-making within hydropower plants, hydrological data with intraday resolutions, such as hourly dat…

    Submitted 22 September, 2023; v1 submitted 11 August, 2023; originally announced August 2023.

  47. arXiv:2308.02187  [pdf]

    eess.SY

    Decoupling control parameter method to study the coupling characteristics of subsystems in the feed system

    Authors: Dongsheng Zhang, Xuesong Wang, Tingting Zhang

    Abstract: When developing high-speed and high-precision CNC machine tools, subsystem coupling effects must be considered while designing the feed system to maximize its dynamic performance. Currently, the influence of changes in control parameters on the matching characteristics of each subsystem was not yet considered when studying the coupling relationship between subsystems. Therefore, it is difficult to…

    Submitted 14 October, 2023; v1 submitted 4 August, 2023; originally announced August 2023.

  48. arXiv:2308.00247  [pdf, other]

    eess.IV cs.CV

    Unleashing the Power of Self-Supervised Image Denoising: A Comprehensive Review

    Authors: Dan Zhang, Fangfang Zhou, Felix Albu, Yuanzhou Wei, Xiao Yang, Yuan Gu, Qiang Li

    Abstract: The advent of deep learning has brought a revolutionary transformation to image denoising techniques. However, the persistent challenge of acquiring noise-clean pairs for supervised methods in real-world scenarios remains formidable, necessitating the exploration of more practical self-supervised image denoising. This paper focuses on self-supervised image denoising methods that offer effective so…

    Submitted 25 March, 2024; v1 submitted 31 July, 2023; originally announced August 2023.

    Comments: 24 pages

  49. arXiv:2307.08239  [pdf, other]

    eess.AS

    Dynamic Kernel Convolution Network with Scene-dedicate Training for Sound Event Localization and Detection

    Authors: Siwei Huang, Jianfeng Chen, Jisheng Bai, Yafei Jia, Dongzhe Zhang

    Abstract: DNN-based methods have shown high performance in sound event localization and detection (SELD). While in real spatial sound scenes, reverberation and the imbalanced presence of various sound events increase the complexity of the SELD task. In this paper, we propose an effective SELD system in real spatial scenes. In our approach, a dynamic kernel convolution module is introduced after the convolutio…

    Submitted 17 July, 2023; originally announced July 2023.

    Comments: 11 pages, 6 figures

  50. arXiv:2307.07518  [pdf]

    cs.AI cs.CL cs.CV eess.IV

    CephGPT-4: An Interactive Multimodal Cephalometric Measurement and Diagnostic System with Visual Large Language Model

    Authors: Lei Ma, Jincong Han, Zhaoxin Wang, Dian Zhang

    Abstract: Large-scale multimodal language models (LMMs) have achieved remarkable success in general domains. However, the exploration of diagnostic language models based on multimodal cephalometric medical data remains limited. In this paper, we propose a novel multimodal cephalometric analysis and diagnostic dialogue model. Firstly, a multimodal orthodontic medical dataset is constructed, comprising cephal…

    Submitted 1 July, 2023; originally announced July 2023.