
Showing 1–50 of 229 results for author: Zhu, J

Searching in archive eess. Search in all archives.
  1. arXiv:2407.19178  [pdf, other]

    cs.CV eess.SP

    Power-LLaVA: Large Language and Vision Assistant for Power Transmission Line Inspection

    Authors: Jiahao Wang, Mingxuan Li, Haichen Luo, Jinguo Zhu, Aijun Yang, Mingzhe Rong, Xiaohua Wang

    Abstract: The inspection of power transmission lines has made notable progress in the past few years, primarily due to the integration of deep learning technology. However, current inspection approaches continue to encounter difficulties in generalization and intelligence, which restricts their further applicability. In this paper, we introduce Power-LLaVA, the first large language and vision assista…

    Submitted 27 July, 2024; originally announced July 2024.

  2. arXiv:2406.15846  [pdf, other]

    cs.CL eess.AS

    Revisiting Interpolation Augmentation for Speech-to-Text Generation

    Authors: Chen Xu, Jie Wang, Xiaoqian Liu, Qianqian Dong, Chunliang Zhang, Tong Xiao, Jingbo Zhu, Dapeng Man, Wu Yang

    Abstract: Speech-to-text (S2T) generation systems frequently face challenges in low-resource scenarios, primarily due to the lack of extensive labeled datasets. One emerging solution is constructing virtual training samples by interpolating inputs and labels, which has notably enhanced system generalization in other domains. Despite its potential, this technique's application in S2T tasks has remained under…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: ACL 2024 Findings

  3. arXiv:2406.13977  [pdf, other]

    eess.IV cs.CV

    Similarity-aware Syncretic Latent Diffusion Model for Medical Image Translation with Representation Learning

    Authors: Tingyi Lin, Pengju Lyu, Jie Zhang, Yuqing Wang, Cheng Wang, Jianjun Zhu

    Abstract: Non-contrast CT (NCCT) imaging may reduce image contrast and anatomical visibility, potentially increasing diagnostic uncertainty. In contrast, contrast-enhanced CT (CECT) facilitates the observation of regions of interest (ROI). Leading generative models, especially the conditional diffusion model, demonstrate remarkable capabilities in medical image modality transformation. Typical conditional d…

    Submitted 19 June, 2024; originally announced June 2024.

  4. arXiv:2406.09326  [pdf, other]

    cs.SD cs.AI cs.CV cs.MM eess.AS

    PianoMotion10M: Dataset and Benchmark for Hand Motion Generation in Piano Performance

    Authors: Qijun Gan, Song Wang, Shengtao Wu, Jianke Zhu

    Abstract: Recently, artificial intelligence techniques for education have received increasing attention, while designing effective musical instrument instruction systems remains an open problem. Although key presses can be directly derived from sheet music, the transitional movements among key presses require more extensive guidance in piano performance. In this work, we construct a piano-h…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Codes and Dataset: https://agnjason.github.io/PianoMotion-page

  5. arXiv:2406.04881  [pdf, other]

    cs.IT eess.SP

    MIMO Capacity Analysis and Channel Estimation for Electromagnetic Information Theory

    Authors: Jieao Zhu, Vincent Y. F. Tan, Linglong Dai

    Abstract: Electromagnetic information theory (EIT) is an interdisciplinary subject that serves to integrate deterministic electromagnetic theory with stochastic Shannon's information theory. Existing EIT analysis operates in the continuous space domain, which is not aligned with the practical algorithms working in the discrete space domain. This mismatch leads to a significant difficulty in application of E…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Submitted to the IEEE TWC. In this paper, we established the discrete-continuous correspondence for electromagnetic information theory (EIT), thus enabling analytical tools in the continuous space domain to be applied to discrete space MIMO architectures. Simulation codes will be provided at http://oa.ee.tsinghua.edu.cn/dailinglong/publications/publications.html

  6. arXiv:2406.00497  [pdf, ps, other]

    cs.SD cs.AI cs.CL eess.AS

    Recent Advances in End-to-End Simultaneous Speech Translation

    Authors: Xiaoqian Liu, Guoqiang Hu, Yangfan Du, Erfeng He, YingFeng Luo, Chen Xu, Tong Xiao, Jingbo Zhu

    Abstract: Simultaneous speech translation (SimulST) is a demanding task that involves generating translations in real-time while continuously processing speech input. This paper offers a comprehensive overview of the recent developments in SimulST research, focusing on four major challenges. Firstly, the complexities associated with processing lengthy and continuous speech streams pose significant hurdles.…

    Submitted 1 June, 2024; originally announced June 2024.

  7. arXiv:2405.17295  [pdf, other]

    eess.SP

    In-sensor Computing ANN Capacitive Sensors

    Authors: Guihua Zhao, Yating Peng, Jiaxin Zhu, Xin Tang, Zhiyi Yu

    Abstract: This letter proposes an in-sensor computing multiply-and-accumulate (MAC) circuit based on capacitance. The MAC circuits can constitute an artificial neural network (ANN) layer and be operated as ANN classifiers and autoencoders. The proposed circuit is a promising scheme for capacitive ANN image sensors, showing competitively high efficiency and lower power.

    Submitted 27 May, 2024; originally announced May 2024.

  8. arXiv:2405.07777  [pdf, other]

    cs.CV eess.IV

    GMSR:Gradient-Guided Mamba for Spectral Reconstruction from RGB Images

    Authors: Xinying Wang, Zhixiong Huang, Sifan Zhang, Jiawen Zhu, Lin Feng

    Abstract: Mainstream approaches to spectral reconstruction (SR) primarily focus on designing Convolution- and Transformer-based architectures. However, CNN methods often face challenges in handling long-range dependencies, whereas Transformers are constrained by computational efficiency limitations. Recent breakthroughs in state-space models (e.g., Mamba) have attracted significant attention due to their near-l…

    Submitted 13 May, 2024; originally announced May 2024.

  9. arXiv:2405.07281  [pdf, ps, other]

    eess.SP

    Movable Antennas Aided Multicast MISO Communication Systems

    Authors: Zhenqiao Cheng, Nanxi Li, Ruizhe Long, Jianchi Zhu, Chongjun Ouyang, Peng Chen

    Abstract: A novel multicast communication system with movable antennas (MAs) is proposed, where the antenna position optimization is exploited to enhance the transmission rate. Specifically, an MA-assisted two-user multicast multiple-input single-output system is considered. The joint optimization of the transmit beamforming vector and transmit MA positions is studied by modeling the motion of the MA element…

    Submitted 12 May, 2024; originally announced May 2024.

    Comments: 5 pages

  10. arXiv:2405.04867  [pdf, other]

    eess.IV cs.CV

    MIPI 2024 Challenge on Demosaic for HybridEVS Camera: Methods and Results

    Authors: Yaqi Wu, Zhihao Fan, Xiaofeng Chu, Jimmy S. Ren, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangcheng Zhou, Ruicheng Feng, Yuekun Dai, Peiqing Yang, Chen Change Loy, Senyan Xu, Zhijing Sun, Jiaying Zhu, Yurui Zhu, Xueyang Fu, Zheng-Jun Zha, Jun Cao, Cheng Li, Shu Chen, Liang Ma, Shiyang Zhou, Haijin Zeng, Kai Feng, et al. (24 additional authors not shown)

    Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: MIPI@CVPR2024. Website: https://mipi-challenge.org/MIPI2024/

  11. arXiv:2405.03300  [pdf, other]

    cs.IT eess.SP

    Active RIS-Aided Massive MIMO With Imperfect CSI and Phase Noise

    Authors: Zhangjie Peng, Jianchen Zhu, Cunhua Pan, Zaichen Zhang, Daniel Benevides da Costa, Maged Elkashlan, George K. Karagiannidis

    Abstract: Active reconfigurable intelligent surface (RIS) has attracted significant attention as a recently proposed RIS architecture. Owing to its capability to amplify the incident signals, active RIS can mitigate the multiplicative fading effect inherent in the passive RIS-aided system. In this paper, we consider an active RIS-aided uplink multi-user massive multiple-input multiple-output (MIMO) system i…

    Submitted 6 May, 2024; originally announced May 2024.

  12. arXiv:2404.16223  [pdf, other]

    cs.CV eess.IV

    Deep RAW Image Super-Resolution. A NTIRE 2024 Challenge Survey

    Authors: Marcos V. Conde, Florin-Alexandru Vasluianu, Radu Timofte, Jianxing Zhang, Jia Li, Fan Wang, Xiaopeng Li, Zikun Liu, Hyunhee Park, Sejun Song, Changho Kim, Zhijuan Huang, Hongyuan Yu, Cheng Wan, Wending Xiang, Jiamin Lin, Hang Zhong, Qiaosong Zhang, Yue Sun, Xuanwu Yin, Kunlong Zuo, Senyan Xu, Siyuan Jiang, Zhijing Sun, Jiaying Zhu, et al. (10 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 RAW Image Super-Resolution Challenge, highlighting the proposed solutions and results. New methods for RAW Super-Resolution could be essential in modern Image Signal Processing (ISP) pipelines; however, this problem is not as explored as in the RGB domain. The goal of this challenge is to upscale RAW Bayer images by 2x, considering unknown degradations such as nois…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: CVPR 2024 - NTIRE Workshop

  13. arXiv:2404.08285  [pdf]

    cs.CV cs.AI eess.SY

    A Survey of Neural Network Robustness Assessment in Image Recognition

    Authors: Jie Wang, Jun Ai, Minyan Lu, Haoran Su, Dan Yu, Yutao Zhang, Junda Zhu, Jingyu Liu

    Abstract: In recent years, there has been significant attention given to the robustness assessment of neural networks. Robustness plays a critical role in ensuring reliable operation of artificial intelligence (AI) systems in complex and uncertain environments. Deep learning's robustness problem is particularly significant, highlighted by the discovery of adversarial attacks on image classification models.…

    Submitted 15 April, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

    Comments: Corrected typos and grammatical errors in Section 5

  14. arXiv:2404.07556  [pdf, other]

    eess.IV cs.CV

    Attention-Aware Laparoscopic Image Desmoking Network with Lightness Embedding and Hybrid Guided Embedding

    Authors: Ziteng Liu, Jiahua Zhu, Bainan Liu, Hao Liu, Wenpeng Gao, Yili Fu

    Abstract: This paper presents a novel method of smoke removal from laparoscopic images. Due to the heterogeneous nature of surgical smoke, a two-stage network is proposed to estimate the smoke distribution and reconstruct a clear, smoke-free surgical scene. The utilization of the lightness channel plays a pivotal role in providing vital information pertaining to smoke density. The reconstruction of smok…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: ISBI2024

  15. arXiv:2404.01875  [pdf, other]

    eess.SP cs.DC cs.IT cs.LG

    Satellite Federated Edge Learning: Architecture Design and Convergence Analysis

    Authors: Yuanming Shi, Li Zeng, Jingyang Zhu, Yong Zhou, Chunxiao Jiang, Khaled B. Letaief

    Abstract: The proliferation of low-earth-orbit (LEO) satellite networks leads to the generation of vast volumes of remote sensing data which is traditionally transferred to the ground server for centralized processing, raising privacy and bandwidth concerns. Federated edge learning (FEEL), as a distributed machine learning approach, has the potential to address these challenges by sharing only model paramet…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: 16 pages, 15 figures

  16. arXiv:2403.16062  [pdf]

    eess.SP

    Holography inspired self-controlled reconfigurable intelligent surface

    Authors: Jieao Zhu, Ze Gu, Qian Ma, Linglong Dai, Tie Jun Cui

    Abstract: Among various promising candidate technologies for the sixth-generation (6G) wireless communications, recent advances in microwave metasurfaces have sparked a new research area of reconfigurable intelligent surfaces (RISs). By controllably reprogramming the wireless propagation channel, RISs are envisioned to achieve low-cost wireless capacity boosting, coverage extension, and enhanced energy effi…

    Submitted 24 March, 2024; originally announced March 2024.

    Comments: Traditional BS-controlled RISs suffer from complicated control cables. To "cut" the control cables, we propose a self-controlled RIS by leveraging the holographic interference principle, thus realizing autonomous RIS beamforming

  17. arXiv:2403.15029  [pdf]

    eess.SY

    On the Solution Uniqueness of Data-Driven Modeling of Flexible Loads

    Authors: Shuai Lu, Jiayi Ding, Wei Gu, Junpeng Zhu, Yijun Xu, Zhaoyang Dong, Zezheng Sun

    Abstract: This letter first explores the solution uniqueness of the data-driven modeling of price-responsive flexible loads (PFL). The PFL on the demand side is critical in modern power systems. An accurate PFL model is fundamental for system operations. Yet, whether the PFL model can be uniquely and correctly identified from operational data remains unclear. To address this, we analyze the structural and p…

    Submitted 17 July, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

  18. arXiv:2403.12425  [pdf, other]

    cs.CV cs.SD eess.AS

    Multimodal Fusion Method with Spatiotemporal Sequences and Relationship Learning for Valence-Arousal Estimation

    Authors: Jun Yu, Gongpeng Zhao, Yongqi Wang, Zhihong Wei, Yang Zheng, Zerui Zhang, Zhongpeng Cai, Guochen Xie, Jichao Zhu, Wangyuan Zhu

    Abstract: This paper presents our approach for the VA (Valence-Arousal) estimation task in the ABAW6 competition. We devised a comprehensive model by preprocessing video frames and audio segments to extract visual and audio features. Through the utilization of Temporal Convolutional Network (TCN) modules, we effectively captured the temporal and spatial correlations between these features. Subsequently, we…

    Submitted 20 March, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

    Comments: 8 pages, 3 figures

  19. arXiv:2403.12268  [pdf, other]

    cs.IT eess.SP

    Near-Field Channel Modeling for Electromagnetic Information Theory

    Authors: Zhongzhichao Wan, Jieao Zhu, Linglong Dai

    Abstract: Electromagnetic information theory (EIT) is one of the emerging topics for 6G communication due to its potential to reveal the performance limit of wireless communication systems. For EIT, the research foundation is reasonable and accurate channel modeling. Existing channel modeling works for EIT in non-line-of-sight (NLoS) scenarios focus on far-field modeling, which cannot accurately capture the…

    Submitted 26 May, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: In this paper, we propose the near-field channel model for EIT based on electromagnetic scattering theory. Then, we derive the analytical expression of the correlation function of the fields and analyze its characteristics. Finally, we design a channel estimation scheme for the near-field scenario

  20. arXiv:2403.11757  [pdf, other]

    cs.MM cs.LG cs.SD eess.AS

    Efficient Feature Extraction and Late Fusion Strategy for Audiovisual Emotional Mimicry Intensity Estimation

    Authors: Jun Yu, Wangyuan Zhu, Jichao Zhu

    Abstract: In this paper, we present the solution to the Emotional Mimicry Intensity (EMI) Estimation challenge, which is part of the 6th Affective Behavior Analysis in-the-wild (ABAW) Competition. The EMI Estimation challenge task aims to evaluate the emotional intensity of seed videos by assessing them from a set of predefined emotion categories (i.e., "Admiration", "Amusement", "Determination", "Empathic Pain"…

    Submitted 19 March, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

  21. arXiv:2403.08504  [pdf, other]

    cs.CV cs.RO eess.IV

    Offboard Occupancy Refinement with Hybrid Propagation for Autonomous Driving

    Authors: Hao Shi, Song Wang, Jiaming Zhang, Xiaoting Yin, Zhongdao Wang, Guangming Wang, Jianke Zhu, Kailun Yang, Kaiwei Wang

    Abstract: Vision-based occupancy prediction, also known as 3D Semantic Scene Completion (SSC), presents a significant challenge in computer vision. Previous methods, confined to onboard processing, struggle with simultaneous geometric and semantic estimation, continuity across varying viewpoints, and single-view occlusion. Our paper introduces OccFiner, a novel offboard framework designed to enhance the acc…

    Submitted 7 July, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

    Comments: The source code will be made publicly available at https://github.com/MasterHow/OccFiner

  22. arXiv:2402.16349  [pdf, other]

    cs.LG eess.SY

    C-GAIL: Stabilizing Generative Adversarial Imitation Learning with Control Theory

    Authors: Tianjiao Luo, Tim Pearce, Huayu Chen, Jianfei Chen, Jun Zhu

    Abstract: Generative Adversarial Imitation Learning (GAIL) trains a generative policy to mimic a demonstrator. It uses on-policy Reinforcement Learning (RL) to optimize a reward signal derived from a GAN-like discriminator. A major drawback of GAIL is its training instability - it inherits the complex training dynamics of GANs, and the distribution shift introduced by RL. This can cause oscillations during…

    Submitted 26 February, 2024; originally announced February 2024.

  23. arXiv:2402.02688  [pdf, ps, other]

    cs.IT eess.SP eess.SY

    Successive Bayesian Reconstructor for FAS Channel Estimation

    Authors: Zijian Zhang, Jieao Zhu, Linglong Dai, Robert W. Heath Jr

    Abstract: Fluid antenna systems (FASs) can reconfigure their locations freely within a spatially continuous space. To keep favorable antenna positions, the channel state information (CSI) acquisition for FASs is essential. While some techniques have been proposed, most existing FAS channel estimators require several channel assumptions, such as slow variation and angular-domain sparsity. When these assumpti…

    Submitted 4 February, 2024; originally announced February 2024.

    Comments: Accepted by IEEE WCNC 2024. This paper proposes S-BAR as a general solution to estimate FAS channels. More insights can be found in the journal version of this paper: arXiv:2312.06551. arXiv admin note: substantial text overlap with arXiv:2312.06551

  24. arXiv:2401.13276  [pdf, other]

    eess.AS

    SCNet: Sparse Compression Network for Music Source Separation

    Authors: Weinan Tong, Jiaxu Zhu, Jun Chen, Shiyin Kang, Tao Jiang, Yang Li, Zhiyong Wu, Helen Meng

    Abstract: Deep learning-based methods have made significant achievements in music source separation. However, obtaining good results while maintaining a low model complexity remains challenging in super wide-band music source separation. Previous works either overlook the differences in subbands or inadequately address the problem of information loss when generating subband features. In this paper, we propo…

    Submitted 24 January, 2024; originally announced January 2024.

    Comments: Accepted by ICASSP 2024

  25. arXiv:2401.12783  [pdf, other]

    cs.AI cs.LG eess.SP

    A Review of Deep Learning Methods for Photoplethysmography Data

    Authors: Guangkun Nie, Jiabao Zhu, Gongzheng Tang, Deyun Zhang, Shijia Geng, Qinghao Zhao, Shenda Hong

    Abstract: Photoplethysmography (PPG) is a highly promising device due to its advantages in portability, user-friendly operation, and non-invasive capabilities to measure a wide range of physiological information. Recent advancements in deep learning have demonstrated remarkable outcomes by leveraging PPG signals for tasks related to personal health management and other multifaceted applications. In this rev…

    Submitted 23 January, 2024; originally announced January 2024.

  26. arXiv:2401.10494  [pdf, other]

    eess.AS cs.SD

    A Two-Stage Framework in Cross-Spectrum Domain for Real-Time Speech Enhancement

    Authors: Yuewei Zhang, Huanbin Zou, Jie Zhu

    Abstract: The two-stage pipeline is popular in speech enhancement tasks due to its superiority over traditional single-stage methods. Current two-stage approaches usually enhance the magnitude spectrum in the first stage, and further modify the complex spectrum to suppress the residual noise and recover the speech phase in the second stage. The whole process above is performed in the short-time Fourier tran…

    Submitted 19 January, 2024; originally announced January 2024.

    Comments: Accepted by ICASSP 2024

  27. arXiv:2401.06149  [pdf, other]

    cs.CV cs.LG eess.IV

    Image Classifier Based Generative Method for Planar Antenna Design

    Authors: Yang Zhong, Weiping Dou, Andrew Cohen, Dia'a Bisharat, Yuandong Tian, Jiang Zhu, Qing Huo Liu

    Abstract: To make antenna design on printed circuit boards (PCBs) accessible to more interested engineers, we propose a simple method that models PCB antennas with a few basic components. By taking two separate steps to decide their geometric dimensions and positions, antenna prototypes can be facilitated with no experience required. Random sampling statistics related to the quality of dimensions are used in se…

    Submitted 16 December, 2023; originally announced January 2024.

    Comments: 13 pages, 18 figures

  28. arXiv:2401.02670  [pdf]

    eess.SY

    A full-time scale energy management and battery size optimization for off-grid renewable power to hydrogen systems: A battery energy storage-based grid-forming case in Inner Mongolian

    Authors: Jie Zhu

    Abstract: Hydrogen plays an important role in the context of global carbon reduction. For an off-grid renewable power to hydrogen system (OReP2HS), a grid-forming (GFM) source is essential to provide frequency and voltage references. Here, we take the battery as the GFM source, and the OReP2HS we focus on is comprised of solar photovoltaic, wind turbines, and alkaline electrolyzers for hydrogen generation.…

    Submitted 5 January, 2024; originally announced January 2024.

  29. arXiv:2401.01673  [pdf, other]

    cs.IT eess.SP

    Coded Beam Training

    Authors: Tianyue Zheng, Jieao Zhu, Qiumo Yu, Yongli Yan, Linglong Dai

    Abstract: In extremely large-scale multiple input multiple output (XL-MIMO) systems for future sixth-generation (6G) communications, codebook-based beam training stands out as a promising technology to acquire channel state information (CSI). Despite their effectiveness, when the pilot overhead is limited, existing beam training methods suffer from significant achievable rate degradation for remote users wi…

    Submitted 6 March, 2024; v1 submitted 3 January, 2024; originally announced January 2024.

    Comments: In this paper, we introduce channel coding theory into hierarchical beam training and propose a beam training scheme called coded beam training. By leveraging the error-correcting capability of channel codes, the proposed coded beam training method can enable reliable beam training performance for remote users with low SNR, while keeping training overhead low

  30. arXiv:2401.00194  [pdf, ps, other]

    cs.IT eess.SP

    On the Identifiability from Modulo Measurements under DFT Sensing Matrix

    Authors: Qi Zhang, Jiang Zhu, Fengzhong Qu, Zheng Zhu, De Wen Soh

    Abstract: Unlimited sampling was recently introduced to deal with the clipping or saturation of measurements where a modulo operator is applied before sampling. In this paper, we investigate the identifiability of the model where measurements are acquired under a discrete Fourier transform (DFT) sensing matrix first followed by a modulo operator (modulo-DFT). Firstly, based on the theorems of cyclotomic pol…

    Submitted 30 December, 2023; originally announced January 2024.

  31. arXiv:2312.17024  [pdf, other]

    cs.DS cs.IT eess.IV eess.SP

    Selective Run-Length Encoding

    Authors: Xutan Peng, Yi Zhang, Dejia Peng, Jiafa Zhu

    Abstract: Run-Length Encoding (RLE) is one of the most fundamental tools in data compression. However, its compression power drops significantly if the sequence lacks consecutive elements. In extreme cases, the output of the encoder may require more space than the input (aka size inflation). To alleviate this issue, using combinatorics, we quantify RLE's space savings for a given input distribution…

    Submitted 28 December, 2023; originally announced December 2023.

    Comments: Accepted at DCC 2024

  32. arXiv:2312.15653  [pdf, other]

    cs.IT eess.SP

    Index Modulation for Fluid Antenna-Assisted MIMO Communications: System Design and Performance Analysis

    Authors: Jing Zhu, Gaojie Chen, Pengyu Gao, Pei Xiao, Zihuai Lin, Atta Quddus

    Abstract: In this paper, we propose a transmission mechanism for fluid antennas (FAs) enabled multiple-input multiple-output (MIMO) communication systems based on index modulation (IM), named FA-IM, which incorporates the principle of IM into FAs-assisted MIMO system to improve the spectral efficiency (SE) without increasing the hardware complexity. In FA-IM, the information bits are mapped not only to the…

    Submitted 25 December, 2023; originally announced December 2023.

    Comments: 12 pages, 9 figures, publish to TWC

  33. arXiv:2312.14473  [pdf, other]

    math.OC eess.SY

    Coordinated Active-Reactive Power Management of ReP2H Systems with Multiple Electrolyzers

    Authors: Yangjun Zeng, Buxiang Zhou, Jie Zhu, Jiarong Li, Bosen Yang, Jin Lin, Yiwei Qiu

    Abstract: Utility-scale renewable power-to-hydrogen (ReP2H) production typically uses thyristor rectifiers (TRs) to supply power to multiple electrolyzers (ELZs). They exhibit a nonlinear and non-decouplable relation between active and reactive power. The on-off scheduling and load allocation of multiple ELZs simultaneously impact energy conversion efficiency and AC-side active and reactive power flow. Impr…

    Submitted 22 December, 2023; originally announced December 2023.

  34. arXiv:2312.14018  [pdf, ps, other]

    eess.SP

    Enabling Secure Wireless Communications via Movable Antennas

    Authors: Zhenqiao Cheng, Nanxi Li, Jianchi Zhu, Xiaoming She, Chongjun Ouyang, Peng Chen

    Abstract: A pioneering secure transmission scheme is proposed, which harnesses movable antennas (MAs) to optimize antenna positions for augmenting the physical layer security. Particularly, an MA-enabled secure wireless system is considered, where a multi-antenna transmitter communicates with a single-antenna receiver in the presence of an eavesdropper. The beamformer and antenna positions at the transmitte…

    Submitted 21 December, 2023; originally announced December 2023.

    Comments: Accepted by IEEE ICASSP 2024

  35. arXiv:2312.11125  [pdf, other]

    eess.SP

    A Low-Complexity Range Estimation with Adjusted Affine Frequency Division Multiplexing Waveform

    Authors: Jiajun Zhu, Yanqun Tang, Xizhang Wei, Haoran Yin, Jinming Du, Zhengpeng Wang, Yuqinng Liu

    Abstract: Affine frequency division multiplexing (AFDM) is a recently proposed communication waveform for time-varying channel scenarios. As a chirp-based multicarrier modulation technique, it can not only satisfy the needs of multiple scenarios in future mobile communication networks but also achieve good performance in radar sensing by adjusting the built-in parameters, making it a promising air interface…

    Submitted 29 December, 2023; v1 submitted 18 December, 2023; originally announced December 2023.

    Comments: The paper has been submitted to IEEE WCNC 2024 WS-13: Mobile Sensing-Communication-Computation Synergy for 6G Internet of Things

  36. arXiv:2312.10952  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Soft Alignment of Modality Space for End-to-end Speech Translation

    Authors: Yuhao Zhang, Kaiqi Kou, Bei Li, Chen Xu, Chunliang Zhang, Tong Xiao, Jingbo Zhu

    Abstract: End-to-end Speech Translation (ST) aims to convert speech into target text within a unified model. The inherent differences between speech and text modalities often impede effective cross-modal and cross-lingual transfer. Existing methods typically employ hard alignment (H-Align) of individual speech and text segments, which can degrade textual representations. To address this, we introduce Soft A…

    Submitted 18 December, 2023; originally announced December 2023.

    Comments: Accepted to ICASSP2024

  37. arXiv:2312.06551  [pdf, ps, other]

    cs.IT eess.SP eess.SY

    Successive Bayesian Reconstructor for Channel Estimation in Fluid Antenna Systems

    Authors: Zijian Zhang, Jieao Zhu, Linglong Dai, Robert W. Heath Jr

    Abstract: Fluid antenna systems (FASs) can reconfigure their antenna locations freely within a spatially continuous space. To keep favorable antenna positions, the channel state information (CSI) acquisition for FASs is essential. While some techniques have been proposed, most existing FAS channel estimators require several channel assumptions, such as slow variation and angular-domain sparsity. When these…

    Submitted 17 January, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

    Comments: 13 pages, 8 figures. This paper proposes S-BAR as a general solution to estimate FAS channels. Unlike model-based estimators, the proposed S-BAR is prior-aided, which builds the experiential kernel for CSI acquisition. Simulation codes will be provided at: http://oa.ee.tsinghua.edu.cn/dailinglong/publications/publications.html

  38. arXiv:2312.06197  [pdf, other]

    cs.SD cs.MM eess.AS

    MART: Learning Hierarchical Music Audio Representations with Part-Whole Transformer

    Authors: Dong Yao, Jieming Zhu, Jiahao Xun, Shengyu Zhang, Zhou Zhao, Liqun Deng, Wenqiao Zhang, Zhenhua Dong, Xin Jiang

    Abstract: Recent research in self-supervised contrastive learning of music representations has demonstrated remarkable results across diverse downstream tasks. However, a prevailing trend in existing methods involves representing equally-sized music clips in either waveform or spectrogram formats, often overlooking the intrinsic part-whole hierarchies within music. In our quest to comprehend the bottom-up s…

    Submitted 19 April, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

    Comments: Short paper accepted by WWW 2024. This is revised and condensed based on the previous version titled "Music-PAW: Learning Music Representations via Hierarchical Part-whole Interaction and Contrast". For more experimental details and discussions, please refer to the original long paper at arXiv:2312.06197v1

  39. arXiv:2312.03491  [pdf, other]

    cs.LG cs.SD eess.AS

    Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis

    Authors: Zehua Chen, Guande He, Kaiwen Zheng, Xu Tan, Jun Zhu

    Abstract: In text-to-speech (TTS) synthesis, diffusion models have achieved promising generation quality. However, because of the pre-defined data-to-noise diffusion process, their prior distribution is restricted to a noisy representation, which provides little information about the generation target. In this work, we present a novel TTS system, Bridge-TTS, making the first attempt to substitute the noisy Gau…

    Submitted 6 December, 2023; originally announced December 2023.

  40. Synergistic Perception and Control Simplex for Verifiable Safe Vertical Landing

    Authors: Ayoosh Bansal, Yang Zhao, James Zhu, Sheng Cheng, Yuliang Gu, Hyung-Jin Yoon, Hunmin Kim, Naira Hovakimyan, Lui Sha

    Abstract: Perception, Planning, and Control form the essential components of autonomy in advanced air mobility. This work advances the holistic integration of these components to enhance the performance and robustness of the complete cyber-physical system. We adapt Perception Simplex, a system for verifiable collision avoidance amidst obstacle detection faults, to the vertical landing maneuver for autonomou…

    Submitted 5 December, 2023; originally announced December 2023.

    Comments: To appear in AIAA SciTech 2024

    ACM Class: C.3; C.4; J.7

    Journal ref: AIAA SCITECH 2024 Forum, p. 1167

  41. arXiv:2312.01125  [pdf, other]

    cs.IT eess.SP

    Design and Performance Analysis of Index Modulation Empowered AFDM System

    Authors: Jing Zhu, Qu Luo, Gaojie Chen, Pei Xiao, Lixia Xiao

    Abstract: In this letter, we incorporate index modulation (IM) into affine frequency division multiplexing (AFDM), called AFDM-IM, to enhance the bit error rate (BER) and energy efficiency (EE) performance. In this scheme, the information bits are conveyed not only by $M$-ary constellation symbols, but also by the activation of the chirp subcarriers (SCs) indices, which are determined based on the incoming…

    Submitted 2 December, 2023; originally announced December 2023.

  42. arXiv:2311.16155  [pdf, other]

    eess.SP cs.LG

    Deep Learning-Based Frequency Offset Estimation

    Authors: Tao Chen, Shilian Zheng, Jiawei Zhu, Qi Xuan, Xiaoniu Yang

    Abstract: In wireless communication systems, the asynchronization of the oscillators in the transmitter and the receiver along with the Doppler shift due to relative movement may lead to the presence of carrier frequency offset (CFO) in the received signals. Estimation of CFO is crucial for subsequent processing such as coherent demodulation. In this brief, we demonstrate the utilization of deep learning fo…

    Submitted 8 November, 2023; originally announced November 2023.

  43. arXiv:2311.08323  [pdf, other]

    cs.CL cs.SD eess.AS

    The taste of IPA: Towards open-vocabulary keyword spotting and forced alignment in any language

    Authors: Jian Zhu, Changbing Yang, Farhan Samir, Jahurul Islam

    Abstract: In this project, we demonstrate that phoneme-based models for speech processing can achieve strong crosslinguistic generalizability to unseen languages. We curated the IPAPACK, a massively multilingual speech corpora with phonemic transcriptions, encompassing more than 115 languages from diverse language families, selectively checked by linguists. Based on the IPAPACK, we propose CLAP-IPA, a multi…

    Submitted 1 April, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

    Comments: NAACL 2024 Main Conference

  44. arXiv:2311.07001  [pdf, other]

    eess.SY

    Modeling the impact of extreme summer drought on conventional and renewable generation capacity: methods and a case study on the Eastern U.S. power system

    Authors: Hang Shuai, Fangxing Li, Jinxiang Zhu, William Jerome Tingen II, Srijib Mukherjee

    Abstract: The United States has witnessed a growing prevalence of droughts in recent years, posing significant challenges to water supplies and power generation. The resulting impacts on power systems, including reduced capacity and the potential for power outages, underscore the need for accurate assessment methods to ensure the reliable operation of the nation's energy infrastructure. A critical step is t…

    Submitted 12 November, 2023; originally announced November 2023.

    Comments: 15 pages, 16 figures

  45. arXiv:2311.04772  [pdf, other]

    eess.IV cs.CV

    GCS-ICHNet: Assessment of Intracerebral Hemorrhage Prognosis using Self-Attention with Domain Knowledge Integration

    Authors: Xuhao Shan, Xinyang Li, Ruiquan Ge, Shibin Wu, Ahmed Elazab, Jichao Zhu, Lingyan Zhang, Gangyong Jia, Qingying Xiao, Xiang Wan, Changmiao Wang

    Abstract: Intracerebral Hemorrhage (ICH) is a severe condition resulting from damaged brain blood vessel ruptures, often leading to complications and fatalities. Timely and accurate prognosis and management are essential due to its high mortality rate. However, conventional methods heavily rely on subjective clinician expertise, which can lead to inaccurate diagnoses and delays in treatment. Artificial inte…

    Submitted 8 November, 2023; originally announced November 2023.

    Comments: 6 pages, 3 figures, 5 tables, published to BIBM 2023

  46. arXiv:2311.04537  [pdf, other]

    eess.SP cs.IT cs.LG

    Deep Learning Assisted Multiuser MIMO Load Modulated Systems for Enhanced Downlink mmWave Communications

    Authors: Ercong Yu, Jinle Zhu, Qiang Li, Zilong Liu, Hongyang Chen, Shlomo Shamai, H. Vincent Poor

    Abstract: This paper is focused on multiuser load modulation arrays (MU-LMAs) which are attractive due to their low system complexity and reduced cost for millimeter wave (mmWave) multi-input multi-output (MIMO) systems. The existing precoding algorithm for downlink MU-LMA relies on a sub-array structured (SAS) transmitter which may suffer from decreased degrees of freedom and complex system configuration.…

    Submitted 8 November, 2023; originally announced November 2023.

    Comments: 14 pages, Journal, accepted by IEEE TWC

  47. arXiv:2311.03810  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Rethinking and Improving Multi-task Learning for End-to-end Speech Translation

    Authors: Yuhao Zhang, Chen Xu, Bei Li, Hao Chen, Tong Xiao, Chunliang Zhang, Jingbo Zhu

    Abstract: Significant improvements in end-to-end speech translation (ST) have been achieved through the application of multi-task learning. However, the extent to which auxiliary tasks are highly consistent with the ST task, and how much this approach truly helps, have not been thoroughly studied. In this paper, we investigate the consistency between different tasks, considering different times and modules.…

    Submitted 7 November, 2023; originally announced November 2023.

    Comments: Accepted to EMNLP2023 main conference

  48. arXiv:2310.15901  [pdf, other]

    eess.SP

    Enhancing Energy Efficiency for Reconfigurable Intelligent Surfaces with Practical Power Models

    Authors: Zhiyi Li, Jida Zhang, Jieao Zhu, Shi Jin, Linglong Dai

    Abstract: Reconfigurable intelligent surfaces (RISs) are widely considered a promising technology for future wireless communication systems. As an important indicator of RIS-assisted communication systems in green wireless communications, energy efficiency (EE) has recently received intensive research interest as an optimization target. However, most previous works have ignored the different power consumpti…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: Reconfigurable intelligent surface is a promising 6G technology. However, RIS power models are inaccurate. In this paper, we construct a practical power model for RIS communication systems with an SDP-relaxation algorithm, achieving optimal energy efficiency

  49. arXiv:2310.12446  [pdf, other]

    cs.IT eess.SP

    Can Electromagnetic Information Theory Improve Wireless Systems? A Channel Estimation Example

    Authors: Jieao Zhu, Zhongzhichao Wan, Linglong Dai, Tie Jun Cui

    Abstract: Electromagnetic information theory (EIT) is an emerging interdisciplinary subject that integrates classical Maxwell electromagnetics and Shannon information theory. The goal of EIT is to uncover the information transmission mechanisms from an electromagnetic (EM) perspective in wireless systems. Existing works on EIT are mainly focused on the analysis of EM channel characteristics, degrees-of-free…

    Submitted 6 February, 2024; v1 submitted 18 October, 2023; originally announced October 2023.

    Comments: Electromagnetic information theory (EIT) is an emerging interdisciplinary subject, aiming at providing a unified analytical framework for wireless systems as well as guiding practical system design. This paper answers the question: "Can we improve wireless communication systems via EIT?"

  50. arXiv:2310.10089  [pdf, other]

    cs.LG cs.IT eess.SP

    Over-the-Air Federated Learning and Optimization

    Authors: Jingyang Zhu, Yuanming Shi, Yong Zhou, Chunxiao Jiang, Wei Chen, Khaled B. Letaief

    Abstract: Federated learning (FL), as an emerging distributed machine learning paradigm, allows a mass of edge devices to collaboratively train a global model while preserving privacy. In this tutorial, we focus on FL via over-the-air computation (AirComp), which is proposed to reduce the communication overhead for FL over wireless networks at the cost of compromising in the learning performance due to mode…

    Submitted 16 October, 2023; originally announced October 2023.

    Comments: 31 pages, 11 figures