Search | arXiv e-print repository

arXiv:2407.19178 [pdf, other]

Power-LLaVA: Large Language and Vision Assistant for Power Transmission Line Inspection

Authors: Jiahao Wang, Mingxuan Li, Haichen Luo, Jinguo Zhu, Aijun Yang, Mingzhe Rong, Xiaohua Wang

Abstract: The inspection of power transmission line has achieved notable achievements in the past few years, primarily due to the integration of deep learning technology. However, current inspection approaches continue to encounter difficulties in generalization and intelligence, which restricts their further applicability. In this paper, we introduce Power-LLaVA, the first large language and vision assista… ▽ More The inspection of power transmission line has achieved notable achievements in the past few years, primarily due to the integration of deep learning technology. However, current inspection approaches continue to encounter difficulties in generalization and intelligence, which restricts their further applicability. In this paper, we introduce Power-LLaVA, the first large language and vision assistant designed to offer professional and reliable inspection services for power transmission line by engaging in dialogues with humans. Moreover, we also construct a large-scale and high-quality dataset specialized for the inspection task. By employing a two-stage training strategy on the constructed dataset, Power-LLaVA demonstrates exceptional performance at a comparatively low training cost. Extensive experiments further prove the great capabilities of Power-LLaVA within the realm of power transmission line inspection. Code shall be released. △ Less

Submitted 27 July, 2024; originally announced July 2024.

arXiv:2406.15846 [pdf, other]

Revisiting Interpolation Augmentation for Speech-to-Text Generation

Authors: Chen Xu, Jie Wang, Xiaoqian Liu, Qianqian Dong, Chunliang Zhang, Tong Xiao, Jingbo Zhu, Dapeng Man, Wu Yang

Abstract: Speech-to-text (S2T) generation systems frequently face challenges in low-resource scenarios, primarily due to the lack of extensive labeled datasets. One emerging solution is constructing virtual training samples by interpolating inputs and labels, which has notably enhanced system generalization in other domains. Despite its potential, this technique's application in S2T tasks has remained under… ▽ More Speech-to-text (S2T) generation systems frequently face challenges in low-resource scenarios, primarily due to the lack of extensive labeled datasets. One emerging solution is constructing virtual training samples by interpolating inputs and labels, which has notably enhanced system generalization in other domains. Despite its potential, this technique's application in S2T tasks has remained under-explored. In this paper, we delve into the utility of interpolation augmentation, guided by several pivotal questions. Our findings reveal that employing an appropriate strategy in interpolation augmentation significantly enhances performance across diverse tasks, architectures, and data scales, offering a promising avenue for more robust S2T systems in resource-constrained settings. △ Less

Submitted 22 June, 2024; originally announced June 2024.

Comments: ACL 2024 Findings

arXiv:2406.13977 [pdf, other]

Similarity-aware Syncretic Latent Diffusion Model for Medical Image Translation with Representation Learning

Authors: Tingyi Lin, Pengju Lyu, Jie Zhang, Yuqing Wang, Cheng Wang, Jianjun Zhu

Abstract: Non-contrast CT (NCCT) imaging may reduce image contrast and anatomical visibility, potentially increasing diagnostic uncertainty. In contrast, contrast-enhanced CT (CECT) facilitates the observation of regions of interest (ROI). Leading generative models, especially the conditional diffusion model, demonstrate remarkable capabilities in medical image modality transformation. Typical conditional d… ▽ More Non-contrast CT (NCCT) imaging may reduce image contrast and anatomical visibility, potentially increasing diagnostic uncertainty. In contrast, contrast-enhanced CT (CECT) facilitates the observation of regions of interest (ROI). Leading generative models, especially the conditional diffusion model, demonstrate remarkable capabilities in medical image modality transformation. Typical conditional diffusion models commonly generate images with guidance of segmentation labels for medical modal transformation. Limited access to authentic guidance and its low cardinality can pose challenges to the practical clinical application of conditional diffusion models. To achieve an equilibrium of generative quality and clinical practices, we propose a novel Syncretic generative model based on the latent diffusion model for medical image translation (S$^2$LDM), which can realize high-fidelity reconstruction without demand of additional condition during inference. S$^2$LDM enhances the similarity in distinct modal images via syncretic encoding and diffusing, promoting amalgamated information in the latent space and generating medical images with more details in contrast-enhanced regions. However, syncretic latent spaces in the frequency domain tend to favor lower frequencies, commonly locate in identical anatomic structures. Thus, S$^2$LDM applies adaptive similarity loss and dynamic similarity to guide the generation and supplements the shortfall in high-frequency details throughout the training process. Quantitative experiments confirm the effectiveness of our approach in medical image translation. Our code will release lately. △ Less

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.09326 [pdf, other]

PianoMotion10M: Dataset and Benchmark for Hand Motion Generation in Piano Performance

Authors: Qijun Gan, Song Wang, Shengtao Wu, Jianke Zhu

Abstract: Recently, artificial intelligence techniques for education have been received increasing attentions, while it still remains an open problem to design the effective music instrument instructing systems. Although key presses can be directly derived from sheet music, the transitional movements among key presses require more extensive guidance in piano performance. In this work, we construct a piano-h… ▽ More Recently, artificial intelligence techniques for education have been received increasing attentions, while it still remains an open problem to design the effective music instrument instructing systems. Although key presses can be directly derived from sheet music, the transitional movements among key presses require more extensive guidance in piano performance. In this work, we construct a piano-hand motion generation benchmark to guide hand movements and fingerings for piano playing. To this end, we collect an annotated dataset, PianoMotion10M, consisting of 116 hours of piano playing videos from a bird's-eye view with 10 million annotated hand poses. We also introduce a powerful baseline model that generates hand motions from piano audios through a position predictor and a position-guided gesture generator. Furthermore, a series of evaluation metrics are designed to assess the performance of the baseline model, including motion similarity, smoothness, positional accuracy of left and right hands, and overall fidelity of movement distribution. Despite that piano key presses with respect to music scores or audios are already accessible, PianoMotion10M aims to provide guidance on piano fingering for instruction purposes. The dataset and source code can be accessed at https://agnjason.github.io/PianoMotion-page. △ Less

Submitted 13 June, 2024; originally announced June 2024.

Comments: Codes and Dataset: https://agnjason.github.io/PianoMotion-page

arXiv:2406.04881 [pdf, other]

MIMO Capacity Analysis and Channel Estimation for Electromagnetic Information Theory

Authors: Jieao Zhu, Vincent Y. F. Tan, Linglong Dai

Abstract: Electromagnetic information theory (EIT) is an interdisciplinary subject that serves to integrate deterministic electromagnetic theory with stochastic Shannon's information theory. Existing EIT analysis operates in the continuous space domain, which is not aligned with the practical algorithms working in the discrete space domain. This mismatch leads to a significant difficulty in application of E… ▽ More Electromagnetic information theory (EIT) is an interdisciplinary subject that serves to integrate deterministic electromagnetic theory with stochastic Shannon's information theory. Existing EIT analysis operates in the continuous space domain, which is not aligned with the practical algorithms working in the discrete space domain. This mismatch leads to a significant difficulty in application of EIT methodologies to practical discrete space systems, which is called as the discrete-continuous gap in this paper. To bridge this gap, we establish the discrete-continuous correspondence with a prolate spheroidal wave function (PSWF)-based ergodic capacity analysis framework. Specifically, we state and prove some discrete-continuous correspondence lemmas to establish a firm theoretical connection between discrete information-theoretic quantities to their continuous counterparts. With these lemmas, we apply the PSWF ergodic capacity bound to advanced MIMO architectures such as continuous-aperture MIMO (CAP-MIMO) and extremely large-scale MIMO (XL-MIMO). From this PSWF capacity bound, we discover the capacity saturation phenomenon both theoretically and empirically. Although the growth of MIMO performance is fundamentally limited in this EIT-based analysis framework, we reveal new opportunities in MIMO channel estimation by exploiting the EIT knowledge about the channel. Inspired by the PSWF capacity bound, we utilize continuous PSWFs to improve the pilot design of discrete MIMO channel estimators, which is called as the PSWF channel estimator (PSWF-CE). Simulation results demonstrate improved performances of the proposed PSWF-CE, compared to traditional minimum mean squared error (MMSE) and compressed sensing-based estimators. △ Less

Submitted 7 June, 2024; originally announced June 2024.

Comments: Submitted to the IEEE TWC. In this paper, we established the discrete-continuous correspondence for electromagnetic information theory (EIT), thus enabling analytical tools in the continuous space domain to be applied to discrete space MIMO architectures. Simulation codes will be provided at http://oa.ee.tsinghua.edu.cn/dailinglong/publications/publications.html

arXiv:2406.00497 [pdf, ps, other]

Recent Advances in End-to-End Simultaneous Speech Translation

Authors: Xiaoqian Liu, Guoqiang Hu, Yangfan Du, Erfeng He, YingFeng Luo, Chen Xu, Tong Xiao, Jingbo Zhu

Abstract: Simultaneous speech translation (SimulST) is a demanding task that involves generating translations in real-time while continuously processing speech input. This paper offers a comprehensive overview of the recent developments in SimulST research, focusing on four major challenges. Firstly, the complexities associated with processing lengthy and continuous speech streams pose significant hurdles.… ▽ More Simultaneous speech translation (SimulST) is a demanding task that involves generating translations in real-time while continuously processing speech input. This paper offers a comprehensive overview of the recent developments in SimulST research, focusing on four major challenges. Firstly, the complexities associated with processing lengthy and continuous speech streams pose significant hurdles. Secondly, satisfying real-time requirements presents inherent difficulties due to the need for immediate translation output. Thirdly, striking a balance between translation quality and latency constraints remains a critical challenge. Finally, the scarcity of annotated data adds another layer of complexity to the task. Through our exploration of these challenges and the proposed solutions, we aim to provide valuable insights into the current landscape of SimulST research and suggest promising directions for future exploration. △ Less

Submitted 1 June, 2024; originally announced June 2024.

arXiv:2405.17295 [pdf, other]

In-sensor Computing ANN Capacitive Sensors

Authors: Guihua Zhao, Yating Peng, Jiaxin Zhu, Xin Tang, Zhiyi Yu

Abstract: This letter proposes an in-sensor computing multiply-and-accumulate (MAC) circuit based on capacitance. The MAC circuits can constitute an artificial neural network(ANN) layer and be operated as ANN classifiers and autoencoders. The proposed circuit is a promising scheme for capacitive ANN image sensors, showing competitively high efficiency and lower power. This letter proposes an in-sensor computing multiply-and-accumulate (MAC) circuit based on capacitance. The MAC circuits can constitute an artificial neural network(ANN) layer and be operated as ANN classifiers and autoencoders. The proposed circuit is a promising scheme for capacitive ANN image sensors, showing competitively high efficiency and lower power. △ Less

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.07777 [pdf, other]

GMSR:Gradient-Guided Mamba for Spectral Reconstruction from RGB Images

Authors: Xinying Wang, Zhixiong Huang, Sifan Zhang, Jiawen Zhu, Lin Feng

Abstract: Mainstream approaches to spectral reconstruction (SR) primarily focus on designing Convolution- and Transformer-based architectures. However, CNN methods often face challenges in handling long-range dependencies, whereas Transformers are constrained by computational efficiency limitations. Recent breakthroughs in state-space model (e.g., Mamba) has attracted significant attention due to its near-l… ▽ More Mainstream approaches to spectral reconstruction (SR) primarily focus on designing Convolution- and Transformer-based architectures. However, CNN methods often face challenges in handling long-range dependencies, whereas Transformers are constrained by computational efficiency limitations. Recent breakthroughs in state-space model (e.g., Mamba) has attracted significant attention due to its near-linear computational efficiency and superior performance, prompting our investigation into its potential for SR problem. To this end, we propose the Gradient-guided Mamba for Spectral Reconstruction from RGB Images, dubbed GMSR-Net. GMSR-Net is a lightweight model characterized by a global receptive field and linear computational complexity. Its core comprises multiple stacked Gradient Mamba (GM) blocks, each featuring a tri-branch structure. In addition to benefiting from efficient global feature representation by Mamba block, we further innovatively introduce spatial gradient attention and spectral gradient attention to guide the reconstruction of spatial and spectral cues. GMSR-Net demonstrates a significant accuracy-efficiency trade-off, achieving state-of-the-art performance while markedly reducing the number of parameters and computational burdens. Compared to existing approaches, GMSR-Net slashes parameters and FLOPS by substantial margins of 10 times and 20 times, respectively. Code is available at https://github.com/wxy11-27/GMSR. △ Less

Submitted 13 May, 2024; originally announced May 2024.

arXiv:2405.07281 [pdf, ps, other]

Movable Antennas Aided Multicast MISO Communication Systems

Authors: Zhenqiao Cheng, Nanxi Li, Ruizhe Long, Jianchi Zhu, Chongjun Ouyang, Peng Chen

Abstract: A novel multicast communication system with movable antennas (MAs) is proposed, where the antenna position optimization is exploited to enhance the transmission rate. Specifically, an MA-assisted two-user multicast multiple-input single-input system is considered. The joint optimization of the transmit beamforming vector and transmit MA positions is studied by modeling the motion of the MA element… ▽ More A novel multicast communication system with movable antennas (MAs) is proposed, where the antenna position optimization is exploited to enhance the transmission rate. Specifically, an MA-assisted two-user multicast multiple-input single-input system is considered. The joint optimization of the transmit beamforming vector and transmit MA positions is studied by modeling the motion of the MA elements as discrete movements. A low-complexity greedy search-based algorithm is proposed to tackle this non-convex inter-programming problem. A branch-and-bound (BAB)-based method is proposed to achieve the optimal multicast rate with a reduced time complexity than the brute-force search by assuming the two users suffer similar line-of-sight path losses. Numerical results reveal that the proposed MA systems significantly improve the multicast rate compared to conventional fixed-position antennas (FPAs)-based systems. △ Less

Submitted 12 May, 2024; originally announced May 2024.

Comments: 5 pages

arXiv:2405.04867 [pdf, other]

MIPI 2024 Challenge on Demosaic for HybridEVS Camera: Methods and Results

Authors: Yaqi Wu, Zhihao Fan, Xiaofeng Chu, Jimmy S. Ren, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangcheng Zhou, Ruicheng Feng, Yuekun Dai, Peiqing Yang, Chen Change Loy, Senyan Xu, Zhijing Sun, Jiaying Zhu, Yurui Zhu, Xueyang Fu, Zheng-Jun Zha, Jun Cao, Cheng Li, Shu Chen, Liang Ma, Shiyang Zhou, Haijin Zeng, Kai Feng , et al. (24 additional authors not shown)

Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra… ▽ More The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photography and imaging (MIPI). Building on the achievements of the previous MIPI Workshops held at ECCV 2022 and CVPR 2023, we introduce our third MIPI challenge including three tracks focusing on novel image sensors and imaging algorithms. In this paper, we summarize and review the Nighttime Flare Removal track on MIPI 2024. In total, 170 participants were successfully registered, and 14 teams submitted results in the final testing phase. The developed solutions in this challenge achieved state-of-the-art performance on Nighttime Flare Removal. More details of this challenge and the link to the dataset can be found at https://mipi-challenge.org/MIPI2024/. △ Less

Submitted 8 May, 2024; originally announced May 2024.

Comments: MIPI@CVPR2024. Website: https://mipi-challenge.org/MIPI2024/

arXiv:2405.03300 [pdf, other]

Active RIS-Aided Massive MIMO With Imperfect CSI and Phase Noise

Authors: Zhangjie Peng, Jianchen Zhu, Cunhua Pan, Zaichen Zhang, Daniel Benevides da Costa, Maged Elkashlan, George K. Karagiannidis

Abstract: Active reconfigurable intelligent surface (RIS) has attracted significant attention as a recently proposed RIS architecture. Owing to its capability to amplify the incident signals, active RIS can mitigate the multiplicative fading effect inherent in the passive RIS-aided system. In this paper, we consider an active RIS-aided uplink multi-user massive multiple-input multiple-output (MIMO) system i… ▽ More Active reconfigurable intelligent surface (RIS) has attracted significant attention as a recently proposed RIS architecture. Owing to its capability to amplify the incident signals, active RIS can mitigate the multiplicative fading effect inherent in the passive RIS-aided system. In this paper, we consider an active RIS-aided uplink multi-user massive multiple-input multiple-output (MIMO) system in the presence of phase noise at the active RIS. Specifically, we employ a two-timescale scheme, where the beamforming at the base station (BS) is adjusted based on the instantaneous aggregated channel state information (CSI) and the statistical CSI serves as the basis for designing the phase shifts at the active RIS, so that the feedback overhead and computational complexity can be significantly reduced. The aggregated channel composed of the cascaded and direct channels is estimated by utilizing the linear minimum mean square error (LMMSE) technique. Based on the estimated channel, we derive the analytical closed-form expression of a lower bound of the achievable rate. The power scaling laws in the active RIS-aided system are investigated based on the theoretical expressions. When the transmit power of each user is scaled down by the number of BS antennas M or reflecting elements N, we find that the thermal noise will cause the lower bound of the achievable rate to approach zero, as the number of M or N increases to infinity. Moreover, an optimization approach based on genetic algorithms (GA) is introduced to tackle the phase shift optimization problem. Numerical results reveal that the active RIS can greatly enhance the performance of the considered system under various settings. △ Less

Submitted 6 May, 2024; originally announced May 2024.

arXiv:2404.16223 [pdf, other]

Deep RAW Image Super-Resolution. A NTIRE 2024 Challenge Survey

Authors: Marcos V. Conde, Florin-Alexandru Vasluianu, Radu Timofte, Jianxing Zhang, Jia Li, Fan Wang, Xiaopeng Li, Zikun Liu, Hyunhee Park, Sejun Song, Changho Kim, Zhijuan Huang, Hongyuan Yu, Cheng Wan, Wending Xiang, Jiamin Lin, Hang Zhong, Qiaosong Zhang, Yue Sun, Xuanwu Yin, Kunlong Zuo, Senyan Xu, Siyuan Jiang, Zhijing Sun, Jiaying Zhu , et al. (10 additional authors not shown)

Abstract: This paper reviews the NTIRE 2024 RAW Image Super-Resolution Challenge, highlighting the proposed solutions and results. New methods for RAW Super-Resolution could be essential in modern Image Signal Processing (ISP) pipelines, however, this problem is not as explored as in the RGB domain. Th goal of this challenge is to upscale RAW Bayer images by 2x, considering unknown degradations such as nois… ▽ More This paper reviews the NTIRE 2024 RAW Image Super-Resolution Challenge, highlighting the proposed solutions and results. New methods for RAW Super-Resolution could be essential in modern Image Signal Processing (ISP) pipelines, however, this problem is not as explored as in the RGB domain. Th goal of this challenge is to upscale RAW Bayer images by 2x, considering unknown degradations such as noise and blur. In the challenge, a total of 230 participants registered, and 45 submitted results during thee challenge period. The performance of the top-5 submissions is reviewed and provided here as a gauge for the current state-of-the-art in RAW Image Super-Resolution. △ Less

Submitted 24 April, 2024; originally announced April 2024.

Comments: CVPR 2024 - NTIRE Workshop

arXiv:2404.08285 [pdf]

A Survey of Neural Network Robustness Assessment in Image Recognition

Authors: Jie Wang, Jun Ai, Minyan Lu, Haoran Su, Dan Yu, Yutao Zhang, Junda Zhu, Jingyu Liu

Abstract: In recent years, there has been significant attention given to the robustness assessment of neural networks. Robustness plays a critical role in ensuring reliable operation of artificial intelligence (AI) systems in complex and uncertain environments. Deep learning's robustness problem is particularly significant, highlighted by the discovery of adversarial attacks on image classification models.… ▽ More In recent years, there has been significant attention given to the robustness assessment of neural networks. Robustness plays a critical role in ensuring reliable operation of artificial intelligence (AI) systems in complex and uncertain environments. Deep learning's robustness problem is particularly significant, highlighted by the discovery of adversarial attacks on image classification models. Researchers have dedicated efforts to evaluate robustness in diverse perturbation conditions for image recognition tasks. Robustness assessment encompasses two main techniques: robustness verification/ certification for deliberate adversarial attacks and robustness testing for random data corruptions. In this survey, we present a detailed examination of both adversarial robustness (AR) and corruption robustness (CR) in neural network assessment. Analyzing current research papers and standards, we provide an extensive overview of robustness assessment in image recognition. Three essential aspects are analyzed: concepts, metrics, and assessment methods. We investigate the perturbation metrics and range representations used to measure the degree of perturbations on images, as well as the robustness metrics specifically for the robustness conditions of classification models. The strengths and limitations of the existing methods are also discussed, and some potential directions for future research are provided. △ Less

Submitted 15 April, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

Comments: Corrected typos and grammatical errors in Section 5

arXiv:2404.07556 [pdf, other]

Attention-Aware Laparoscopic Image Desmoking Network with Lightness Embedding and Hybrid Guided Embedding

Authors: Ziteng Liu, Jiahua Zhu, Bainan Liu, Hao Liu, Wenpeng Gao, Yili Fu

Abstract: This paper presents a novel method of smoke removal from the laparoscopic images. Due to the heterogeneous nature of surgical smoke, a two-stage network is proposed to estimate the smoke distribution and reconstruct a clear, smoke-free surgical scene. The utilization of the lightness channel plays a pivotal role in providing vital information pertaining to smoke density. The reconstruction of smok… ▽ More This paper presents a novel method of smoke removal from the laparoscopic images. Due to the heterogeneous nature of surgical smoke, a two-stage network is proposed to estimate the smoke distribution and reconstruct a clear, smoke-free surgical scene. The utilization of the lightness channel plays a pivotal role in providing vital information pertaining to smoke density. The reconstruction of smoke-free image is guided by a hybrid embedding, which combines the estimated smoke mask with the initial image. Experimental results demonstrate that the proposed method boasts a Peak Signal to Noise Ratio that is $2.79\%$ higher than the state-of-the-art methods, while also exhibits a remarkable $38.2\%$ reduction in run-time. Overall, the proposed method offers comparable or even superior performance in terms of both smoke removal quality and computational efficiency when compared to existing state-of-the-art methods. This work will be publicly available on http://homepage.hit.edu.cn/wpgao △ Less

Submitted 11 April, 2024; originally announced April 2024.

Comments: ISBI2024

arXiv:2404.01875 [pdf, other]

Satellite Federated Edge Learning: Architecture Design and Convergence Analysis

Authors: Yuanming Shi, Li Zeng, Jingyang Zhu, Yong Zhou, Chunxiao Jiang, Khaled B. Letaief

Abstract: The proliferation of low-earth-orbit (LEO) satellite networks leads to the generation of vast volumes of remote sensing data which is traditionally transferred to the ground server for centralized processing, raising privacy and bandwidth concerns. Federated edge learning (FEEL), as a distributed machine learning approach, has the potential to address these challenges by sharing only model paramet… ▽ More The proliferation of low-earth-orbit (LEO) satellite networks leads to the generation of vast volumes of remote sensing data which is traditionally transferred to the ground server for centralized processing, raising privacy and bandwidth concerns. Federated edge learning (FEEL), as a distributed machine learning approach, has the potential to address these challenges by sharing only model parameters instead of raw data. Although promising, the dynamics of LEO networks, characterized by the high mobility of satellites and short ground-to-satellite link (GSL) duration, pose unique challenges for FEEL. Notably, frequent model transmission between the satellites and ground incurs prolonged waiting time and large transmission latency. This paper introduces a novel FEEL algorithm, named FEDMEGA, tailored to LEO mega-constellation networks. By integrating inter-satellite links (ISL) for intra-orbit model aggregation, the proposed algorithm significantly reduces the usage of low data rate and intermittent GSL. Our proposed method includes a ring all-reduce based intra-orbit aggregation mechanism, coupled with a network flow-based transmission scheme for global model aggregation, which enhances transmission efficiency. Theoretical convergence analysis is provided to characterize the algorithm performance. Extensive simulations show that our FEDMEGA algorithm outperforms existing satellite FEEL algorithms, exhibiting an approximate 30% improvement in convergence rate. △ Less

Submitted 2 April, 2024; originally announced April 2024.

Comments: 16 pages, 15 figures

arXiv:2403.16062 [pdf]

Holography inspired self-controlled reconfigurable intelligent surface

Authors: Jieao Zhu, Ze Gu, Qian Ma, Linglong Dai, Tie Jun Cui

Abstract: Among various promising candidate technologies for the sixth-generation (6G) wireless communications, recent advances in microwave metasurfaces have sparked a new research area of reconfigurable intelligent surfaces (RISs). By controllably reprogramming the wireless propagation channel, RISs are envisioned to achieve low-cost wireless capacity boosting, coverage extension, and enhanced energy effi… ▽ More Among various promising candidate technologies for the sixth-generation (6G) wireless communications, recent advances in microwave metasurfaces have sparked a new research area of reconfigurable intelligent surfaces (RISs). By controllably reprogramming the wireless propagation channel, RISs are envisioned to achieve low-cost wireless capacity boosting, coverage extension, and enhanced energy efficiency. To reprogram the channel, each meta-atom on RIS needs an external control signal, which is usually generated by base station (BS). However, BS-controlled RISs require complicated control cables, which hamper their massive deployments. Here, we eliminate the need for BS control by proposing a self-controlled RIS (SC-RIS), which is inspired by the optical holography principle. Different from the existing BS-controlled RISs, each meta-atom of SC-RIS is integrated with an additional power detector for holographic recording. By applying the classical Fourier-transform processing to the measured hologram, SC-RIS is capable of retrieving the user's channel state information required for beamforming, thus enabling autonomous RIS beamforming without control cables. Owing to this WiFi-like plug-and-play capability without the BS control, SC-RISs are expected to enable easy and massive deployments in the future 6G systems. △ Less

Submitted 24 March, 2024; originally announced March 2024.

Comments: Traditional BS-controlled RISs suffer from complicated control cables. To "cut" the control cables, we propose a self-controlled RIS by leveraging the holographic interference principle, thus realizing autonomous RIS beamforming

arXiv:2403.15029 [pdf]

On the Solution Uniqueness of Data-Driven Modeling of Flexible Loads

Authors: Shuai Lu, Jiayi Ding, Wei Gu, Junpeng Zhu, Yijun Xu, Zhaoyang Dong, Zezheng Sun

Abstract: This letter first explores the solution uniqueness of the data-driven modeling of price-responsive flexible loads (PFL). The PFL on the demand side is critical in modern power systems. An accurate PFL model is fundamental for system operations. Yet, whether the PFL model can be uniquely and correctly identified from operational data remains unclear. To address this, we analyze the structural and p… ▽ More This letter first explores the solution uniqueness of the data-driven modeling of price-responsive flexible loads (PFL). The PFL on the demand side is critical in modern power systems. An accurate PFL model is fundamental for system operations. Yet, whether the PFL model can be uniquely and correctly identified from operational data remains unclear. To address this, we analyze the structural and practical identifiability of the PFL model, deriving the condition for the solution unique-ness. Besides, we point out the practical implications of the results. Numerical tests validate this work. △ Less

Submitted 17 July, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

arXiv:2403.12425 [pdf, other]

Multimodal Fusion Method with Spatiotemporal Sequences and Relationship Learning for Valence-Arousal Estimation

Authors: Jun Yu, Gongpeng Zhao, Yongqi Wang, Zhihong Wei, Yang Zheng, Zerui Zhang, Zhongpeng Cai, Guochen Xie, Jichao Zhu, Wangyuan Zhu

Abstract: This paper presents our approach for the VA (Valence-Arousal) estimation task in the ABAW6 competition. We devised a comprehensive model by preprocessing video frames and audio segments to extract visual and audio features. Through the utilization of Temporal Convolutional Network (TCN) modules, we effectively captured the temporal and spatial correlations between these features. Subsequently, we… ▽ More This paper presents our approach for the VA (Valence-Arousal) estimation task in the ABAW6 competition. We devised a comprehensive model by preprocessing video frames and audio segments to extract visual and audio features. Through the utilization of Temporal Convolutional Network (TCN) modules, we effectively captured the temporal and spatial correlations between these features. Subsequently, we employed a Transformer encoder structure to learn long-range dependencies, thereby enhancing the model's performance and generalization ability. Our method leverages a multimodal data fusion approach, integrating pre-trained audio and video backbones for feature extraction, followed by TCN-based spatiotemporal encoding and Transformer-based temporal information capture. Experimental results demonstrate the effectiveness of our approach, achieving competitive performance in VA estimation on the AffWild2 dataset. △ Less

Submitted 20 March, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

Comments: 8 pages,3 figures

arXiv:2403.12268 [pdf, other]

Near-Field Channel Modeling for Electromagnetic Information Theory

Authors: Zhongzhichao Wan, Jieao Zhu, Linglong Dai

Abstract: Electromagnetic information theory (EIT) is one of the emerging topics for 6G communication due to its potential to reveal the performance limit of wireless communication systems. For EIT, the research foundation is reasonable and accurate channel modeling. Existing channel modeling works for EIT in non-line-of-sight (NLoS) scenario focus on far-field modeling, which can not accurately capture the… ▽ More Electromagnetic information theory (EIT) is one of the emerging topics for 6G communication due to its potential to reveal the performance limit of wireless communication systems. For EIT, the research foundation is reasonable and accurate channel modeling. Existing channel modeling works for EIT in non-line-of-sight (NLoS) scenario focus on far-field modeling, which can not accurately capture the characteristics of the channel in near-field. In this paper, we propose the near-field channel model for EIT based on electromagnetic scattering theory. We model the channel by using non-stationary Gaussian random fields and derive the analytical expression of the correlation function of the fields. Furthermore, we analyze the characteristics of the proposed channel model, e.g., channel degrees of freedom (DoF). Finally, we design a channel estimation scheme for near-field scenario by integrating the electromagnetic prior information of the proposed model. Numerical analysis verifies the correctness of the proposed scheme and shows that it can outperform existing schemes like least square (LS) and orthogonal matching pursuit (OMP). △ Less

Submitted 26 May, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

Comments: In this paper, we propose the near-field channel model for EIT based on electromagnetic scattering theory. Then, we derive the analytical expression of the correlation function of the fields and analyze the characteristics of it. Finally, we design a channel estimation scheme for near-field scenario

arXiv:2403.11757 [pdf, other]

Efficient Feature Extraction and Late Fusion Strategy for Audiovisual Emotional Mimicry Intensity Estimation

Authors: Jun Yu, Wangyuan Zhu, Jichao Zhu

Abstract: In this paper, we present the solution to the Emotional Mimicry Intensity (EMI) Estimation challenge, which is part of 6th Affective Behavior Analysis in-the-wild (ABAW) Competition.The EMI Estimation challenge task aims to evaluate the emotional intensity of seed videos by assessing them from a set of predefined emotion categories (i.e., "Admiration", "Amusement", "Determination", "Empathic Pain"… ▽ More In this paper, we present the solution to the Emotional Mimicry Intensity (EMI) Estimation challenge, which is part of 6th Affective Behavior Analysis in-the-wild (ABAW) Competition.The EMI Estimation challenge task aims to evaluate the emotional intensity of seed videos by assessing them from a set of predefined emotion categories (i.e., "Admiration", "Amusement", "Determination", "Empathic Pain", "Excitement" and "Joy"). To tackle this challenge, we extracted rich dual-channel visual features based on ResNet18 and AUs for the video modality and effective single-channel features based on Wav2Vec2.0 for the audio modality. This allowed us to obtain comprehensive emotional features for the audiovisual modality. Additionally, leveraging a late fusion strategy, we averaged the predictions of the visual and acoustic models, resulting in a more accurate estimation of audiovisual emotional mimicry intensity. Experimental results validate the effectiveness of our approach, with the average Pearson's correlation Coefficient($ρ$) across the 6 emotion dimensionson the validation set achieving 0.3288. △ Less

Submitted 19 March, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

arXiv:2403.08504 [pdf, other]

Offboard Occupancy Refinement with Hybrid Propagation for Autonomous Driving

Authors: Hao Shi, Song Wang, Jiaming Zhang, Xiaoting Yin, Zhongdao Wang, Guangming Wang, Jianke Zhu, Kailun Yang, Kaiwei Wang

Abstract: Vision-based occupancy prediction, also known as 3D Semantic Scene Completion (SSC), presents a significant challenge in computer vision. Previous methods, confined to onboard processing, struggle with simultaneous geometric and semantic estimation, continuity across varying viewpoints, and single-view occlusion. Our paper introduces OccFiner, a novel offboard framework designed to enhance the acc… ▽ More Vision-based occupancy prediction, also known as 3D Semantic Scene Completion (SSC), presents a significant challenge in computer vision. Previous methods, confined to onboard processing, struggle with simultaneous geometric and semantic estimation, continuity across varying viewpoints, and single-view occlusion. Our paper introduces OccFiner, a novel offboard framework designed to enhance the accuracy of vision-based occupancy predictions. OccFiner operates in two hybrid phases: 1) a multi-to-multi local propagation network that implicitly aligns and processes multiple local frames for correcting onboard model errors and consistently enhancing occupancy accuracy across all distances. 2) the region-centric global propagation, focuses on refining labels using explicit multi-view geometry and integrating sensor bias, especially to increase the accuracy of distant occupied voxels. Extensive experiments demonstrate that OccFiner improves both geometric and semantic accuracy across various types of coarse occupancy, setting a new state-of-the-art performance on the SemanticKITTI dataset. Notably, OccFiner elevates vision-based SSC models to a level even surpassing that of LiDAR-based onboard SSC models. Furthermore, OccFiner is the first to achieve automatic annotation of SSC in a purely vision-based approach. Quantitative experiments prove that OccFiner successfully facilitates occupancy data loop-closure in autonomous driving. Additionally, we quantitatively and qualitatively validate the superiority of the offboard approach on city-level SSC static maps. The source code will be made publicly available at https://github.com/MasterHow/OccFiner. △ Less

Submitted 7 July, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

Comments: The source code will be made publicly available at https://github.com/MasterHow/OccFiner

arXiv:2402.16349 [pdf, other]

C-GAIL: Stabilizing Generative Adversarial Imitation Learning with Control Theory

Authors: Tianjiao Luo, Tim Pearce, Huayu Chen, Jianfei Chen, Jun Zhu

Abstract: Generative Adversarial Imitation Learning (GAIL) trains a generative policy to mimic a demonstrator. It uses on-policy Reinforcement Learning (RL) to optimize a reward signal derived from a GAN-like discriminator. A major drawback of GAIL is its training instability - it inherits the complex training dynamics of GANs, and the distribution shift introduced by RL. This can cause oscillations during… ▽ More Generative Adversarial Imitation Learning (GAIL) trains a generative policy to mimic a demonstrator. It uses on-policy Reinforcement Learning (RL) to optimize a reward signal derived from a GAN-like discriminator. A major drawback of GAIL is its training instability - it inherits the complex training dynamics of GANs, and the distribution shift introduced by RL. This can cause oscillations during training, harming its sample efficiency and final policy performance. Recent work has shown that control theory can help with the convergence of a GAN's training. This paper extends this line of work, conducting a control-theoretic analysis of GAIL and deriving a novel controller that not only pushes GAIL to the desired equilibrium but also achieves asymptotic stability in a 'one-step' setting. Based on this, we propose a practical algorithm 'Controlled-GAIL' (C-GAIL). On MuJoCo tasks, our controlled variant is able to speed up the rate of convergence, reduce the range of oscillation and match the expert's distribution more closely both for vanilla GAIL and GAIL-DAC. △ Less

Submitted 26 February, 2024; originally announced February 2024.

arXiv:2402.02688 [pdf, ps, other]

Successive Bayesian Reconstructor for FAS Channel Estimation

Authors: Zijian Zhang, Jieao Zhu, Linglong Dai, Robert W. Heath Jr

Abstract: Fluid antenna systems (FASs) can reconfigure their locations freely within a spatially continuous space. To keep favorable antenna positions, the channel state information (CSI) acquisition for FASs is essential. While some techniques have been proposed, most existing FAS channel estimators require several channel assumptions, such as slow variation and angular-domain sparsity. When these assumpti… ▽ More Fluid antenna systems (FASs) can reconfigure their locations freely within a spatially continuous space. To keep favorable antenna positions, the channel state information (CSI) acquisition for FASs is essential. While some techniques have been proposed, most existing FAS channel estimators require several channel assumptions, such as slow variation and angular-domain sparsity. When these assumptions are not reasonable, the model mismatch may lead to unpredictable performance loss. In this paper, we propose the successive Bayesian reconstructor (S-BAR) as a general solution to estimate FAS channels. Unlike model-based estimators, the proposed S-BAR is prior-aided, which builds the experiential kernel for CSI acquisition. Inspired by Bayesian regression, the key idea of S-BAR is to model the FAS channels as a stochastic process, whose uncertainty can be successively eliminated by kernel-based sampling and regression. In this way, the predictive mean of the regressed stochastic process can be viewed as the maximum a posterior (MAP) estimator of FAS channels. Simulation results verify that, in both model-mismatched and model-matched cases, the proposed S-BAR can achieve higher estimation accuracy than the existing schemes. △ Less

Submitted 4 February, 2024; originally announced February 2024.

Comments: Accepted by IEEE WCNC 2024. This paper proposes S-BAR as a general solution to estimate FAS channels. More insights can be found in the journal version of this paper: arXiv:2312.06551. arXiv admin note: substantial text overlap with arXiv:2312.06551

arXiv:2401.13276 [pdf, other]

SCNet: Sparse Compression Network for Music Source Separation

Authors: Weinan Tong, Jiaxu Zhu, Jun Chen, Shiyin Kang, Tao Jiang, Yang Li, Zhiyong Wu, Helen Meng

Abstract: Deep learning-based methods have made significant achievements in music source separation. However, obtaining good results while maintaining a low model complexity remains challenging in super wide-band music source separation. Previous works either overlook the differences in subbands or inadequately address the problem of information loss when generating subband features. In this paper, we propo… ▽ More Deep learning-based methods have made significant achievements in music source separation. However, obtaining good results while maintaining a low model complexity remains challenging in super wide-band music source separation. Previous works either overlook the differences in subbands or inadequately address the problem of information loss when generating subband features. In this paper, we propose SCNet, a novel frequency-domain network to explicitly split the spectrogram of the mixture into several subbands and introduce a sparsity-based encoder to model different frequency bands. We use a higher compression ratio on subbands with less information to improve the information density and focus on modeling subbands with more information. In this way, the separation performance can be significantly improved using lower computational consumption. Experiment results show that the proposed model achieves a signal to distortion ratio (SDR) of 9.0 dB on the MUSDB18-HQ dataset without using extra data, which outperforms state-of-the-art methods. Specifically, SCNet's CPU inference time is only 48% of HT Demucs, one of the previous state-of-the-art models. △ Less

Submitted 24 January, 2024; originally announced January 2024.

Comments: Accepted by ICASSP 2024

arXiv:2401.12783 [pdf, other]

A Review of Deep Learning Methods for Photoplethysmography Data

Authors: Guangkun Nie, Jiabao Zhu, Gongzheng Tang, Deyun Zhang, Shijia Geng, Qinghao Zhao, Shenda Hong

Abstract: Photoplethysmography (PPG) is a highly promising device due to its advantages in portability, user-friendly operation, and non-invasive capabilities to measure a wide range of physiological information. Recent advancements in deep learning have demonstrated remarkable outcomes by leveraging PPG signals for tasks related to personal health management and other multifaceted applications. In this rev… ▽ More Photoplethysmography (PPG) is a highly promising device due to its advantages in portability, user-friendly operation, and non-invasive capabilities to measure a wide range of physiological information. Recent advancements in deep learning have demonstrated remarkable outcomes by leveraging PPG signals for tasks related to personal health management and other multifaceted applications. In this review, we systematically reviewed papers that applied deep learning models to process PPG data between January 1st of 2017 and July 31st of 2023 from Google Scholar, PubMed and Dimensions. Each paper is analyzed from three key perspectives: tasks, models, and data. We finally extracted 193 papers where different deep learning frameworks were used to process PPG signals. Based on the tasks addressed in these papers, we categorized them into two major groups: medical-related, and non-medical-related. The medical-related tasks were further divided into seven subgroups, including blood pressure analysis, cardiovascular monitoring and diagnosis, sleep health, mental health, respiratory monitoring and analysis, blood glucose analysis, as well as others. The non-medical-related tasks were divided into four subgroups, which encompass signal processing, biometric identification, electrocardiogram reconstruction, and human activity recognition. In conclusion, significant progress has been made in the field of using deep learning methods to process PPG data recently. This allows for a more thorough exploration and utilization of the information contained in PPG signals. However, challenges remain, such as limited quantity and quality of publicly available databases, a lack of effective validation in real-world scenarios, and concerns about the interpretability, scalability, and complexity of deep learning models. Moreover, there are still emerging research areas that require further investigation. △ Less

Submitted 23 January, 2024; originally announced January 2024.

arXiv:2401.10494 [pdf, other]

A Two-Stage Framework in Cross-Spectrum Domain for Real-Time Speech Enhancement

Authors: Yuewei Zhang, Huanbin Zou, Jie Zhu

Abstract: Two-stage pipeline is popular in speech enhancement tasks due to its superiority over traditional single-stage methods. The current two-stage approaches usually enhance the magnitude spectrum in the first stage, and further modify the complex spectrum to suppress the residual noise and recover the speech phase in the second stage. The above whole process is performed in the short-time Fourier tran… ▽ More Two-stage pipeline is popular in speech enhancement tasks due to its superiority over traditional single-stage methods. The current two-stage approaches usually enhance the magnitude spectrum in the first stage, and further modify the complex spectrum to suppress the residual noise and recover the speech phase in the second stage. The above whole process is performed in the short-time Fourier transform (STFT) spectrum domain. In this paper, we re-implement the above second sub-process in the short-time discrete cosine transform (STDCT) spectrum domain. The reason is that we have found STDCT performs greater noise suppression capability than STFT. Additionally, the implicit phase of STDCT ensures simpler and more efficient phase recovery, which is challenging and computationally expensive in the STFT-based methods. Therefore, we propose a novel two-stage framework called the STFT-STDCT spectrum fusion network (FDFNet) for speech enhancement in cross-spectrum domain. Experimental results demonstrate that the proposed FDFNet outperforms the previous two-stage methods and also exhibits superior performance compared to other advanced systems. △ Less

Submitted 19 January, 2024; originally announced January 2024.

Comments: Accepted by ICASSP 2024

arXiv:2401.06149 [pdf, other]

Image Classifier Based Generative Method for Planar Antenna Design

Authors: Yang Zhong, Weiping Dou, Andrew Cohen, Dia'a Bisharat, Yuandong Tian, Jiang Zhu, Qing Huo Liu

Abstract: To extend the antenna design on printed circuit boards (PCBs) for more engineers of interest, we propose a simple method that models PCB antennas with a few basic components. By taking two separate steps to decide their geometric dimensions and positions, antenna prototypes can be facilitated with no experience required. Random sampling statistics relate to the quality of dimensions are used in se… ▽ More To extend the antenna design on printed circuit boards (PCBs) for more engineers of interest, we propose a simple method that models PCB antennas with a few basic components. By taking two separate steps to decide their geometric dimensions and positions, antenna prototypes can be facilitated with no experience required. Random sampling statistics relate to the quality of dimensions are used in selecting among dimension candidates. A novel image-based classifier using a convolutional neural network (CNN) is introduced to further determine the positions of these fixed-dimension components. Two examples from wearable products have been chosen to examine the entire workflow. Their final designs are realistic and their performance metrics are not inferior to the ones designed by experienced engineers. △ Less

Submitted 16 December, 2023; originally announced January 2024.

Comments: 13 pages, 18 figures

arXiv:2401.02670 [pdf]

A full-time scale energy management and battery size optimization for off-grid renewable power to hydrogen systems: A battery energy storage-based grid-forming case in Inner Mongolian

Authors: Jie Zhu

Abstract: Hydrogen plays an important role in the context of global carbon reduction. For an off-grid renewable power to hydrogen system (OReP2HS), a grid-forming (GFM) source is essential to provide frequency and voltage references. Here, we take battery works as a GFM source, and the OReP2HS we focus on is comprised of solar photovoltaic, wind turbines, and alkaline electrolyzers for hydrogen generation.… ▽ More Hydrogen plays an important role in the context of global carbon reduction. For an off-grid renewable power to hydrogen system (OReP2HS), a grid-forming (GFM) source is essential to provide frequency and voltage references. Here, we take battery works as a GFM source, and the OReP2HS we focus on is comprised of solar photovoltaic, wind turbines, and alkaline electrolyzers for hydrogen generation. An elaborative full-time scale energy management strategy (EMS) covers GFM control to system scheduling (from milliseconds to hours) and is proposed to support high-precision production simulations. Loads of electrolysers are designed to track the renewable power closely to partly replace the energy balancing requirement of the battery. A continuous operation strategy during an emergency like a unit drop is also included in the presented EMS. An off-line simulation-based iterative searching algorithm is developed to find the most suitable size of the battery with the minimum levelized cost of hydrogen (LCOH). A practical OReP2H project located in Inner Mongolian, China is taken as a case study. Simulation results demonstrate the feasibility of the proposed method, and the optimized capacity of the battery (3.4MWh) only accounts for 13.6% of the total installed capacity of the sources with the proposed EMS. Sensitivity analysis shows that LOCH rises from 28.829 CYN/kg to 37.814 CYN/kg, with the time-step for fast power regulation of electrolysers changing from 4s to 1 min with a ramp rate of 0.05MW/s. △ Less

Submitted 5 January, 2024; originally announced January 2024.

arXiv:2401.01673 [pdf, other]

Coded Beam Training

Authors: Tianyue Zheng, Jieao Zhu, Qiumo Yu, Yongli Yan, Linglong Dai

Abstract: In extremely large-scale multiple input multiple output (XL-MIMO) systems for future sixth-generation (6G) communications, codebook-based beam training stands out as a promising technology to acquire channel state information (CSI). Despite their effectiveness, when the pilot overhead is limited, existing beam training methods suffer from significant achievable rate degradation for remote users wi… ▽ More In extremely large-scale multiple input multiple output (XL-MIMO) systems for future sixth-generation (6G) communications, codebook-based beam training stands out as a promising technology to acquire channel state information (CSI). Despite their effectiveness, when the pilot overhead is limited, existing beam training methods suffer from significant achievable rate degradation for remote users with low signal-to-noise ratio (SNR). To tackle this challenge, leveraging the error-correcting capability of channel codes, we introduce channel coding theory into hierarchical beam training to extend the coverage area. Specifically, we establish the duality between hierarchical beam training and channel coding, and the proposed coded beam training scheme serves as a general framework. Then, we present two specific implementations exemplified by coded beam training methods based on Hamming codes and convolutional codes, during which the beam encoding and decoding processes are refined respectively to better accommodate the beam training problem. Simulation results have demonstrated that the proposed coded beam training method can enable reliable beam training performance for remote users with low SNR while keeping training overhead low. △ Less

Submitted 6 March, 2024; v1 submitted 3 January, 2024; originally announced January 2024.

Comments: In this paper, we introduce channel coding theory into hierarchical beam training and propose a beam training scheme called coded beam training. By leveraging the error-correcting capability of channel codes, the proposed coded beam training method can enable reliable beam training performance for remote users with low SNR, while keeping training overhead low

arXiv:2401.00194 [pdf, ps, other]

On the Identifiability from Modulo Measurements under DFT Sensing Matrix

Authors: Qi Zhang, Jiang Zhu, Fengzhong Qu, Zheng Zhu, De Wen Soh

Abstract: Unlimited sampling was recently introduced to deal with the clipping or saturation of measurements where a modulo operator is applied before sampling. In this paper, we investigate the identifiability of the model where measurements are acquired under a discrete Fourier transform (DFT) sensing matrix first followed by a modulo operator (modulo-DFT). Firstly, based on the theorems of cyclotomic pol… ▽ More Unlimited sampling was recently introduced to deal with the clipping or saturation of measurements where a modulo operator is applied before sampling. In this paper, we investigate the identifiability of the model where measurements are acquired under a discrete Fourier transform (DFT) sensing matrix first followed by a modulo operator (modulo-DFT). Firstly, based on the theorems of cyclotomic polynomials, we derive a sufficient condition for uniquely identifying the original signal in modulo-DFT. Additionally, for periodic bandlimited signals (PBSs) under unlimited sampling which can be viewed as a special case of modulo-DFT, the necessary and sufficient condition for the unique recovery of the original signal are provided. Moreover, we show that when the oversampling factor exceeds $3(1+1/P)$, PBS is always identifiable from the modulo samples, where $P$ is the number of harmonics including the fundamental component in the positive frequency part. △ Less

Submitted 30 December, 2023; originally announced January 2024.

arXiv:2312.17024 [pdf, other]

Selective Run-Length Encoding

Authors: Xutan Peng, Yi Zhang, Dejia Peng, Jiafa Zhu

Abstract: Run-Length Encoding (RLE) is one of the most fundamental tools in data compression. However, its compression power drops significantly if there lacks consecutive elements in the sequence. In extreme cases, the output of the encoder may require more space than the input (aka size inflation). To alleviate this issue, using combinatorics, we quantify RLE's space savings for a given input distribution… ▽ More Run-Length Encoding (RLE) is one of the most fundamental tools in data compression. However, its compression power drops significantly if there lacks consecutive elements in the sequence. In extreme cases, the output of the encoder may require more space than the input (aka size inflation). To alleviate this issue, using combinatorics, we quantify RLE's space savings for a given input distribution. With this insight, we develop the first algorithm that automatically identifies suitable symbols, then selectively encodes these symbols with RLE while directly storing the others without RLE. Through experiments on real-world datasets of various modalities, we empirically validate that our method, which maintains RLE's efficiency advantage, can effectively mitigate the size inflation dilemma. △ Less

Submitted 28 December, 2023; originally announced December 2023.

Comments: Accepted at DCC 2024

arXiv:2312.15653 [pdf, other]

Index Modulation for Fluid Antenna-Assisted MIMO Communications: System Design and Performance Analysis

Authors: Jing Zhu, Gaojie Chen, Pengyu Gao, Pei Xiao, Zihuai Lin, Atta Quddus

Abstract: In this paper, we propose a transmission mechanism for fluid antennas (FAs) enabled multiple-input multiple-output (MIMO) communication systems based on index modulation (IM), named FA-IM, which incorporates the principle of IM into FAs-assisted MIMO system to improve the spectral efficiency (SE) without increasing the hardware complexity. In FA-IM, the information bits are mapped not only to the… ▽ More In this paper, we propose a transmission mechanism for fluid antennas (FAs) enabled multiple-input multiple-output (MIMO) communication systems based on index modulation (IM), named FA-IM, which incorporates the principle of IM into FAs-assisted MIMO system to improve the spectral efficiency (SE) without increasing the hardware complexity. In FA-IM, the information bits are mapped not only to the modulation symbols, but also the index of FA position patterns. Additionally, the FA position pattern codebook is carefully designed to further enhance the system performance by maximizing the effective channel gains. Then, a low-complexity detector, referred to efficient sparse Bayesian detector, is proposed by exploiting the inherent sparsity of the transmitted FA-IM signal vectors. Finally, a closed-form expression for the upper bound on the average bit error probability (ABEP) is derived under the finite-path and infinite-path channel condition. Simulation results show that the proposed scheme is capable of improving the SE performance compared to the existing FAs-assisted MIMO and the fixed position antennas (FPAs)-assisted MIMO systems while obviating any additional hardware costs. It has also been shown that the proposed scheme outperforms the conventional FA-assisted MIMO scheme in terms of error performance under the same transmission rate. △ Less

Submitted 25 December, 2023; originally announced December 2023.

Comments: 12 pages,9 figures, publish to TWC

arXiv:2312.14473 [pdf, other]

Coordinated Active-Reactive Power Management of ReP2H Systems with Multiple Electrolyzers

Authors: Yangjun Zeng, Buxiang Zhou, Jie Zhu, Jiarong Li, Bosen Yang, Jin Lin, Yiwei Qiu

Abstract: Utility-scale renewable power-to-hydrogen (ReP2H) production typically uses thyristor rectifiers (TRs) to supply power to multiple electrolyzers (ELZs). They exhibit a nonlinear and non-decouplable relation between active and reactive power. The on-off scheduling and load allocation of multiple ELZs simultaneously impact energy conversion efficiency and AC-side active and reactive power flow. Impr… ▽ More Utility-scale renewable power-to-hydrogen (ReP2H) production typically uses thyristor rectifiers (TRs) to supply power to multiple electrolyzers (ELZs). They exhibit a nonlinear and non-decouplable relation between active and reactive power. The on-off scheduling and load allocation of multiple ELZs simultaneously impact energy conversion efficiency and AC-side active and reactive power flow. Improper scheduling may result in excessive reactive power demand, causing voltage violations and increased network losses, compromising safety and economy. To address these challenges, this paper first explores trade-offs between the efficiency and the reactive load of the electrolyzers. Subsequently, we propose a coordinated approach for scheduling the active and reactive power in the ReP2H system. A mixed-integer second-order cone programming (MISOCP) is established to jointly optimize active and reactive power by coordinating the ELZs, renewable energy sources, energy storage (ES), and var compensations. Case studies demonstrate that the proposed method reduces losses by 3.06% in an off-grid ReP2H system while increasing hydrogen production by 5.27% in average. △ Less

Submitted 22 December, 2023; originally announced December 2023.

arXiv:2312.14018 [pdf, ps, other]

Enabling Secure Wireless Communications via Movable Antennas

Authors: Zhenqiao Cheng, Nanxi Li, Jianchi Zhu, Xiaoming She, Chongjun Ouyang, Peng Chen

Abstract: A pioneering secure transmission scheme is proposed, which harnesses movable antennas (MAs) to optimize antenna positions for augmenting the physical layer security. Particularly, an MA-enabled secure wireless system is considered, where a multi-antenna transmitter communicates with a single-antenna receiver in the presence of an eavesdropper. The beamformer and antenna positions at the transmitte… ▽ More A pioneering secure transmission scheme is proposed, which harnesses movable antennas (MAs) to optimize antenna positions for augmenting the physical layer security. Particularly, an MA-enabled secure wireless system is considered, where a multi-antenna transmitter communicates with a single-antenna receiver in the presence of an eavesdropper. The beamformer and antenna positions at the transmitter are jointly optimized under two criteria: power consumption minimization and secrecy rate maximization. For each scenario, a novel suboptimal algorithm was proposed to tackle the resulting nonconvex optimization problem, capitalizing on the approaches of alternating optimization and gradient descent. Numerical results demonstrate that the proposed MA systems significantly improve physical layer security compared to various benchmark schemes relying on conventional fixed-position antennas (FPAs). △ Less

Submitted 21 December, 2023; originally announced December 2023.

Comments: Accepted by IEEE ICASSP 2024

arXiv:2312.11125 [pdf, other]

A Low-Complexity Range Estimation with Adjusted Affine Frequency Division Multiplexing Waveform

Authors: Jiajun Zhu, Yanqun Tang, Xizhang Wei, Haoran Yin, Jinming Du, Zhengpeng Wang, Yuqinng Liu

Abstract: Affine frequency division multiplexing (AFDM) is a recently proposed communication waveform for time-varying channel scenarios. As a chirp-based multicarrier modulation technique it can not only satisfy the needs of multiple scenarios in future mobile communication networks but also achieve good performance in radar sensing by adjusting the built-in parameters, making it a promising air interface… ▽ More Affine frequency division multiplexing (AFDM) is a recently proposed communication waveform for time-varying channel scenarios. As a chirp-based multicarrier modulation technique it can not only satisfy the needs of multiple scenarios in future mobile communication networks but also achieve good performance in radar sensing by adjusting the built-in parameters, making it a promising air interface waveform in integrated sensing and communication (ISAC) applications. In this paper, we investigate an AFDM-based radar system and analyze the radar ambiguity function of AFDM with different built-in parameters, based on which we find an AFDM waveform with the specific parameter c2 owns the near-optimal time-domain ambiguity function. Then a low-complexity algorithm based on matched filtering for high-resolution target range estimation is proposed for this specific AFDM waveform. Through simulation and analysis, the specific AFDM waveform has near-optimal range estimation performance with the proposed low-complexity algorithm while having the same bit error rate (BER) performance as orthogonal time frequency space (OTFS) using simple linear minimum mean square error (LMMSE) equalizer. △ Less

Submitted 29 December, 2023; v1 submitted 18 December, 2023; originally announced December 2023.

Comments: The paper has been submitted to IEEE WCNC 2024 WS-13: Mobile Sensing-Communication-Computation Synergy for 6G Internet of Things

arXiv:2312.10952 [pdf, other]

Soft Alignment of Modality Space for End-to-end Speech Translation

Authors: Yuhao Zhang, Kaiqi Kou, Bei Li, Chen Xu, Chunliang Zhang, Tong Xiao, Jingbo Zhu

Abstract: End-to-end Speech Translation (ST) aims to convert speech into target text within a unified model. The inherent differences between speech and text modalities often impede effective cross-modal and cross-lingual transfer. Existing methods typically employ hard alignment (H-Align) of individual speech and text segments, which can degrade textual representations. To address this, we introduce Soft A… ▽ More End-to-end Speech Translation (ST) aims to convert speech into target text within a unified model. The inherent differences between speech and text modalities often impede effective cross-modal and cross-lingual transfer. Existing methods typically employ hard alignment (H-Align) of individual speech and text segments, which can degrade textual representations. To address this, we introduce Soft Alignment (S-Align), using adversarial training to align the representation spaces of both modalities. S-Align creates a modality-invariant space while preserving individual modality quality. Experiments on three languages from the MuST-C dataset show S-Align outperforms H-Align across multiple tasks and offers translation capabilities on par with specialized translation models. △ Less

Submitted 18 December, 2023; originally announced December 2023.

Comments: Accepted to ICASSP2024

arXiv:2312.06551 [pdf, ps, other]

Successive Bayesian Reconstructor for Channel Estimation in Fluid Antenna Systems

Authors: Zijian Zhang, Jieao Zhu, Linglong Dai, Robert W. Heath Jr

Abstract: Fluid antenna systems (FASs) can reconfigure their antenna locations freely within a spatially continuous space. To keep favorable antenna positions, the channel state information (CSI) acquisition for FASs is essential. While some techniques have been proposed, most existing FAS channel estimators require several channel assumptions, such as slow variation and angular-domain sparsity. When these… ▽ More Fluid antenna systems (FASs) can reconfigure their antenna locations freely within a spatially continuous space. To keep favorable antenna positions, the channel state information (CSI) acquisition for FASs is essential. While some techniques have been proposed, most existing FAS channel estimators require several channel assumptions, such as slow variation and angular-domain sparsity. When these assumptions are not reasonable, the model mismatch may lead to unpredictable performance loss. In this paper, we propose the successive Bayesian reconstructor (S-BAR) as a general solution to estimate FAS channels. Unlike model-based estimators, the proposed S-BAR is prior-aided, which builds the experiential kernel for CSI acquisition. Inspired by Bayesian regression, the key idea of S-BAR is to model the FAS channels as a stochastic process, whose uncertainty can be successively eliminated by kernel-based sampling and regression. In this way, the predictive mean of the regressed stochastic process can be viewed as the maximum a posterior (MAP) estimator of FAS channels. Simulation results verify that, in both model-mismatched and model-matched cases, the proposed S-BAR can achieve higher estimation accuracy than the existing schemes. △ Less

Submitted 17 January, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

Comments: 13 pages, 8 figures. This paper proposes S-BAR as a general solution to estimate FAS channels. Unlike model-based estimators, the proposed S-BAR is prior-aided, which builds the experiential kernel for CSI acquisition. Simulation codes will be provided at: http://oa.ee.tsinghua.edu.cn/dailinglong/publications/publications.html

arXiv:2312.06197 [pdf, other]

MART: Learning Hierarchical Music Audio Representations with Part-Whole Transformer

Authors: Dong Yao, Jieming Zhu, Jiahao Xun, Shengyu Zhang, Zhou Zhao, Liqun Deng, Wenqiao Zhang, Zhenhua Dong, Xin Jiang

Abstract: Recent research in self-supervised contrastive learning of music representations has demonstrated remarkable results across diverse downstream tasks. However, a prevailing trend in existing methods involves representing equally-sized music clips in either waveform or spectrogram formats, often overlooking the intrinsic part-whole hierarchies within music. In our quest to comprehend the bottom-up s… ▽ More Recent research in self-supervised contrastive learning of music representations has demonstrated remarkable results across diverse downstream tasks. However, a prevailing trend in existing methods involves representing equally-sized music clips in either waveform or spectrogram formats, often overlooking the intrinsic part-whole hierarchies within music. In our quest to comprehend the bottom-up structure of music, we introduce MART, a hierarchical music representation learning approach that facilitates feature interactions among cropped music clips while considering their part-whole hierarchies. Specifically, we propose a hierarchical part-whole transformer to capture the structural relationships between music clips in a part-whole hierarchy. Furthermore, a hierarchical contrastive learning objective is crafted to align part-whole music representations at adjacent levels, progressively establishing a multi-hierarchy representation space. The effectiveness of our music representation learning from part-whole hierarchies has been empirically validated across multiple downstream tasks, including music classification and cover song identification. △ Less

Submitted 19 April, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

Comments: Short paper accepted by WWW 2024. This is revised and condensed based on the previous version titled "Music-PAW: Learning Music Representations via Hierarchical Part-whole Interaction and Contrast". For more experimental details and discussions, please refer to the original long paper at arXiv:2312.06197v1

arXiv:2312.03491 [pdf, other]

Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis

Authors: Zehua Chen, Guande He, Kaiwen Zheng, Xu Tan, Jun Zhu

Abstract: In text-to-speech (TTS) synthesis, diffusion models have achieved promising generation quality. However, because of the pre-defined data-to-noise diffusion process, their prior distribution is restricted to a noisy representation, which provides little information of the generation target. In this work, we present a novel TTS system, Bridge-TTS, making the first attempt to substitute the noisy Gau… ▽ More In text-to-speech (TTS) synthesis, diffusion models have achieved promising generation quality. However, because of the pre-defined data-to-noise diffusion process, their prior distribution is restricted to a noisy representation, which provides little information of the generation target. In this work, we present a novel TTS system, Bridge-TTS, making the first attempt to substitute the noisy Gaussian prior in established diffusion-based TTS methods with a clean and deterministic one, which provides strong structural information of the target. Specifically, we leverage the latent representation obtained from text input as our prior, and build a fully tractable Schrodinger bridge between it and the ground-truth mel-spectrogram, leading to a data-to-data process. Moreover, the tractability and flexibility of our formulation allow us to empirically study the design spaces such as noise schedules, as well as to develop stochastic and deterministic samplers. Experimental results on the LJ-Speech dataset illustrate the effectiveness of our method in terms of both synthesis quality and sampling efficiency, significantly outperforming our diffusion counterpart Grad-TTS in 50-step/1000-step synthesis and strong fast TTS models in few-step scenarios. Project page: https://bridge-tts.github.io/ △ Less

Submitted 6 December, 2023; originally announced December 2023.

arXiv:2312.02937 [pdf, other]

doi 10.2514/6.2024-1167

Synergistic Perception and Control Simplex for Verifiable Safe Vertical Landing

Authors: Ayoosh Bansal, Yang Zhao, James Zhu, Sheng Cheng, Yuliang Gu, Hyung-Jin Yoon, Hunmin Kim, Naira Hovakimyan, Lui Sha

Abstract: Perception, Planning, and Control form the essential components of autonomy in advanced air mobility. This work advances the holistic integration of these components to enhance the performance and robustness of the complete cyber-physical system. We adapt Perception Simplex, a system for verifiable collision avoidance amidst obstacle detection faults, to the vertical landing maneuver for autonomou… ▽ More Perception, Planning, and Control form the essential components of autonomy in advanced air mobility. This work advances the holistic integration of these components to enhance the performance and robustness of the complete cyber-physical system. We adapt Perception Simplex, a system for verifiable collision avoidance amidst obstacle detection faults, to the vertical landing maneuver for autonomous air mobility vehicles. We improve upon this system by replacing static assumptions of control capabilities with dynamic confirmation, i.e., real-time confirmation of control limitations of the system, ensuring reliable fulfillment of safety maneuvers and overrides, without dependence on overly pessimistic assumptions. Parameters defining control system capabilities and limitations, e.g., maximum deceleration, are continuously tracked within the system and used to make safety-critical decisions. We apply these techniques to propose a verifiable collision avoidance solution for autonomous aerial mobility vehicles operating in cluttered and potentially unsafe environments. △ Less

Submitted 5 December, 2023; originally announced December 2023.

Comments: To appear in AIAA SciTech 2024

ACM Class: C.3; C.4; J.7

Journal ref: AIAA SCITECH 2024 Forum, p. 1167

arXiv:2312.01125 [pdf, other]

Design and Performance Analysis of Index Modulation Empowered AFDM System

Authors: Jing Zhu, Qu Luo, Gaojie Chen, Pei Xiao, Lixia Xiao

Abstract: In this letter, we incorporate index modulation (IM) into affine frequency division multiplexing (AFDM), called AFDM-IM, to enhance the bit error rate (BER) and energy efficiency (EE) performance. In this scheme, the information bits are conveyed not only by $M$-ary constellation symbols, but also by the activation of the chirp subcarriers (SCs) indices, which are determined based on the incoming… ▽ More In this letter, we incorporate index modulation (IM) into affine frequency division multiplexing (AFDM), called AFDM-IM, to enhance the bit error rate (BER) and energy efficiency (EE) performance. In this scheme, the information bits are conveyed not only by $M$-ary constellation symbols, but also by the activation of the chirp subcarriers (SCs) indices, which are determined based on the incoming bit streams. Then, two power allocation strategies, namely power reallocation (PR) strategy and power saving (PS) strategy, are proposed to enhance BER and EE performance, respectively. Furthermore, the average bit error probability (ABEP) is theoretically analyzed. Simulation results demonstrate that the proposed AFDM-IM scheme achieves better BER performance than the conventional AFDM scheme. △ Less

Submitted 2 December, 2023; originally announced December 2023.

arXiv:2311.16155 [pdf, other]

Deep Learning-Based Frequency Offset Estimation

Authors: Tao Chen, Shilian Zheng, Jiawei Zhu, Qi Xuan, Xiaoniu Yang

Abstract: In wireless communication systems, the asynchronization of the oscillators in the transmitter and the receiver along with the Doppler shift due to relative movement may lead to the presence of carrier frequency offset (CFO) in the received signals. Estimation of CFO is crucial for subsequent processing such as coherent demodulation. In this brief, we demonstrate the utilization of deep learning fo… ▽ More In wireless communication systems, the asynchronization of the oscillators in the transmitter and the receiver along with the Doppler shift due to relative movement may lead to the presence of carrier frequency offset (CFO) in the received signals. Estimation of CFO is crucial for subsequent processing such as coherent demodulation. In this brief, we demonstrate the utilization of deep learning for CFO estimation by employing a residual network (ResNet) to learn and extract signal features from the raw in-phase (I) and quadrature (Q) components of the signals. We use multiple modulation schemes in the training set to make the trained model adaptable to multiple modulations or even new signals. In comparison to the commonly used traditional CFO estimation methods, our proposed IQ-ResNet method exhibits superior performance across various scenarios including different oversampling ratios, various signal lengths, and different channels △ Less

Submitted 8 November, 2023; originally announced November 2023.

arXiv:2311.08323 [pdf, other]

The taste of IPA: Towards open-vocabulary keyword spotting and forced alignment in any language

Authors: Jian Zhu, Changbing Yang, Farhan Samir, Jahurul Islam

Abstract: In this project, we demonstrate that phoneme-based models for speech processing can achieve strong crosslinguistic generalizability to unseen languages. We curated the IPAPACK, a massively multilingual speech corpora with phonemic transcriptions, encompassing more than 115 languages from diverse language families, selectively checked by linguists. Based on the IPAPACK, we propose CLAP-IPA, a multi… ▽ More In this project, we demonstrate that phoneme-based models for speech processing can achieve strong crosslinguistic generalizability to unseen languages. We curated the IPAPACK, a massively multilingual speech corpora with phonemic transcriptions, encompassing more than 115 languages from diverse language families, selectively checked by linguists. Based on the IPAPACK, we propose CLAP-IPA, a multi-lingual phoneme-speech contrastive embedding model capable of open-vocabulary matching between arbitrary speech signals and phonemic sequences. The proposed model was tested on 95 unseen languages, showing strong generalizability across languages. Temporal alignments between phonemes and speech signals also emerged from contrastive training, enabling zeroshot forced alignment in unseen languages. We further introduced a neural forced aligner IPA-ALIGNER by finetuning CLAP-IPA with the Forward-Sum loss to learn better phone-to-audio alignment. Evaluation results suggest that IPA-ALIGNER can generalize to unseen languages without adaptation. △ Less

Submitted 1 April, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

Comments: NAACL 2024 Main Conference

arXiv:2311.07001 [pdf, other]

Modeling the impact of extreme summer drought on conventional and renewable generation capacity: methods and a case study on the Eastern U.S. power system

Authors: Hang Shuai, Fangxing Li, Jinxiang Zhu, William Jerome Tingen II, Srijib Mukherjee

Abstract: The United States has witnessed a growing prevalence of droughts in recent years, posing significant challenges to water supplies and power generation. The resulting impacts on power systems, including reduced capacity and the potential for power outages, underscore the need for accurate assessment methods to ensure the reliable operation of the nation's energy infrastructure. A critical step is t… ▽ More The United States has witnessed a growing prevalence of droughts in recent years, posing significant challenges to water supplies and power generation. The resulting impacts on power systems, including reduced capacity and the potential for power outages, underscore the need for accurate assessment methods to ensure the reliable operation of the nation's energy infrastructure. A critical step is to evaluate the usable capacity of a regional power system's generation fleet, which is a complex undertaking and requires precise modeling of the effects of hydrological and meteorological conditions on diverse generating technologies. This paper proposes a systematic, analytical approach for assessing the impacts of extreme summer drought events on the available capacity of hydro, thermal, and renewable energy generators. More specifically, the systematic framework provides plant-level capacity derating models for hydroelectric, once-through cooling thermoelectric, recirculating cooling thermoelectric, combustion turbine, solar PV, and wind turbine systems. Application of the proposed impact assessment framework to the 2025 generation fleet of the real-world power system in the PJM and SERC regions yields insightful results. By examining the daily usable capacity of 6,055 at-risk generators throughout the study region, we find that in the event of the recurrence of the 2007 southeastern summer drought in the near future, the usable capacity of all at-risk power plants may experience a substantial decrease compared to a typical summer, falling within the range of 71% to 81%. The sensitivity analysis reveals that the usable capacity would experience a more pronounced decline under more severe drought conditions. The findings of this study offer valuable insights, enabling stakeholders to enhance the resilience of power systems against the potential effects of extreme drought in the future. △ Less

Submitted 12 November, 2023; originally announced November 2023.

Comments: 15 pages, 16 figures

arXiv:2311.04772 [pdf, other]

GCS-ICHNet: Assessment of Intracerebral Hemorrhage Prognosis using Self-Attention with Domain Knowledge Integration

Authors: Xuhao Shan, Xinyang Li, Ruiquan Ge, Shibin Wu, Ahmed Elazab, Jichao Zhu, Lingyan Zhang, Gangyong Jia, Qingying Xiao, Xiang Wan, Changmiao Wang

Abstract: Intracerebral Hemorrhage (ICH) is a severe condition resulting from damaged brain blood vessel ruptures, often leading to complications and fatalities. Timely and accurate prognosis and management are essential due to its high mortality rate. However, conventional methods heavily rely on subjective clinician expertise, which can lead to inaccurate diagnoses and delays in treatment. Artificial inte… ▽ More Intracerebral Hemorrhage (ICH) is a severe condition resulting from damaged brain blood vessel ruptures, often leading to complications and fatalities. Timely and accurate prognosis and management are essential due to its high mortality rate. However, conventional methods heavily rely on subjective clinician expertise, which can lead to inaccurate diagnoses and delays in treatment. Artificial intelligence (AI) models have been explored to assist clinicians, but many prior studies focused on model modification without considering domain knowledge. This paper introduces a novel deep learning algorithm, GCS-ICHNet, which integrates multimodal brain CT image data and the Glasgow Coma Scale (GCS) score to improve ICH prognosis. The algorithm utilizes a transformer-based fusion module for assessment. GCS-ICHNet demonstrates high sensitivity 81.03% and specificity 91.59%, outperforming average clinicians and other state-of-the-art methods. △ Less

Submitted 8 November, 2023; originally announced November 2023.

Comments: 6 pages, 3 figures, 5 tables, published to BIBM 2023

arXiv:2311.04537 [pdf, other]

Deep Learning Assisted Multiuser MIMO Load Modulated Systems for Enhanced Downlink mmWave Communications

Authors: Ercong Yu, Jinle Zhu, Qiang Li, Zilong Liu, Hongyang Chen, Shlomo Shamai, H. Vincent Poor

Abstract: This paper is focused on multiuser load modulation arrays (MU-LMAs) which are attractive due to their low system complexity and reduced cost for millimeter wave (mmWave) multi-input multi-output (MIMO) systems. The existing precoding algorithm for downlink MU-LMA relies on a sub-array structured (SAS) transmitter which may suffer from decreased degrees of freedom and complex system configuration.… ▽ More This paper is focused on multiuser load modulation arrays (MU-LMAs) which are attractive due to their low system complexity and reduced cost for millimeter wave (mmWave) multi-input multi-output (MIMO) systems. The existing precoding algorithm for downlink MU-LMA relies on a sub-array structured (SAS) transmitter which may suffer from decreased degrees of freedom and complex system configuration. Furthermore, a conventional LMA codebook with codewords uniformly distributed on a hypersphere may not be channel-adaptive and may lead to increased signal detection complexity. In this paper, we conceive an MU-LMA system employing a full-array structured (FAS) transmitter and propose two algorithms accordingly. The proposed FAS-based system addresses the SAS structural problems and can support larger numbers of users. For LMA-imposed constant-power downlink precoding, we propose an FAS-based normalized block diagonalization (FAS-NBD) algorithm. However, the forced normalization may result in performance degradation. This degradation, together with the aforementioned codebook design problems, is difficult to solve analytically. This motivates us to propose a Deep Learning-enhanced (FAS-DL-NBD) algorithm for adaptive codebook design and codebook-independent decoding. It is shown that the proposed algorithms are robust to imperfect knowledge of channel state information and yield excellent error performance. Moreover, the FAS-DL-NBD algorithm enables signal detection with low complexity as the number of bits per codeword increases. △ Less

Submitted 8 November, 2023; originally announced November 2023.

Comments: 14 pages, Journal, accepted by IEEE TWC

arXiv:2311.03810 [pdf, other]

Rethinking and Improving Multi-task Learning for End-to-end Speech Translation

Authors: Yuhao Zhang, Chen Xu, Bei Li, Hao Chen, Tong Xiao, Chunliang Zhang, Jingbo Zhu

Abstract: Significant improvements in end-to-end speech translation (ST) have been achieved through the application of multi-task learning. However, the extent to which auxiliary tasks are highly consistent with the ST task, and how much this approach truly helps, have not been thoroughly studied. In this paper, we investigate the consistency between different tasks, considering different times and modules.… ▽ More Significant improvements in end-to-end speech translation (ST) have been achieved through the application of multi-task learning. However, the extent to which auxiliary tasks are highly consistent with the ST task, and how much this approach truly helps, have not been thoroughly studied. In this paper, we investigate the consistency between different tasks, considering different times and modules. We find that the textual encoder primarily facilitates cross-modal conversion, but the presence of noise in speech impedes the consistency between text and speech representations. Furthermore, we propose an improved multi-task learning (IMTL) approach for the ST task, which bridges the modal gap by mitigating the difference in length and representation. We conduct experiments on the MuST-C dataset. The results demonstrate that our method attains state-of-the-art results. Moreover, when additional data is used, we achieve the new SOTA result on MuST-C English to Spanish task with 20.8% of the training time required by the current SOTA method. △ Less

Submitted 7 November, 2023; originally announced November 2023.

Comments: Accepted to EMNLP2023 main conference

arXiv:2310.15901 [pdf, other]

Enhancing Energy Efficiency for Reconfigurable Intelligent Surfaces with Practical Power Models

Authors: Zhiyi Li, Jida Zhang, Jieao Zhu, Shi Jin, Linglong Dai

Abstract: Reconfigurable intelligent surfaces (RISs) are widely considered a promising technology for future wireless communication systems. As an important indicator of RIS-assisted communication systems in green wireless communications, energy efficiency (EE) has recently received intensive research interest as an optimization target. However, most previous works have ignored the different power consumpti… ▽ More Reconfigurable intelligent surfaces (RISs) are widely considered a promising technology for future wireless communication systems. As an important indicator of RIS-assisted communication systems in green wireless communications, energy efficiency (EE) has recently received intensive research interest as an optimization target. However, most previous works have ignored the different power consumption between ON and OFF states of the PIN diodes attached to each RIS element. This oversight results in extensive unnecessary power consumption and reduction of actual EE due to the inaccurate power model. To address this issue, in this paper, we first utilize a practical power model for a RIS-assisted multi-user multiple-input single-output (MU-MISO) communication system, which takes into account the difference in power dissipation caused by ON-OFF states of RIS's PIN diodes. Based on this model, we formulate a more accurate EE optimization problem. However, this problem is non-convex and has mixed-integer properties, which poses a challenge for optimization. To solve the problem, an effective alternating optimization (AO) algorithm framework is utilized to optimize the base station and RIS beamforming precoder separately. To obtain the essential RIS beamforming precoder, we develop two effective methods based on maximum gradient search and SDP relaxation respectively. Theoretical analysis shows the exponential complexity of the original problem has been reduced to polynomial complexity. Simulation results demonstrate that the proposed algorithm outperforms the existing ones, leading to a significant increase in EE across a diverse set of scenarios. △ Less

Submitted 24 October, 2023; originally announced October 2023.

Comments: Reconfigurable intelligent surface is a promising 6G technology. However, RIS power models are inaccurate. In this paper, we construct a practical power model for RIS communication systems with an SDP-relaxation algorithm, achieving optimal energy efficiency

arXiv:2310.12446 [pdf, other]

Can Electromagnetic Information Theory Improve Wireless Systems? A Channel Estimation Example

Authors: Jieao Zhu, Zhongzhichao Wan, Linglong Dai, Tie Jun Cui

Abstract: Electromagnetic information theory (EIT) is an emerging interdisciplinary subject that integrates classical Maxwell electromagnetics and Shannon information theory. The goal of EIT is to uncover the information transmission mechanisms from an electromagnetic (EM) perspective in wireless systems. Existing works on EIT are mainly focused on the analysis of EM channel characteristics, degrees-of-free… ▽ More Electromagnetic information theory (EIT) is an emerging interdisciplinary subject that integrates classical Maxwell electromagnetics and Shannon information theory. The goal of EIT is to uncover the information transmission mechanisms from an electromagnetic (EM) perspective in wireless systems. Existing works on EIT are mainly focused on the analysis of EM channel characteristics, degrees-of-freedom, and system capacity. However, these works do not clarify whether EIT can improve wireless communication systems. To fill in this gap, in this paper, we provide a novel example that EIT can improve the performance of classical minimum mean squared error (MMSE) channel estimators by replacing the channel covariance matrix with an EM correlation function (EMCF). Specifically, by averaging the solutions of Maxwell's equations over a tunable angular distribution, we obtain a spatio-temporal correlation function (STCF) of the EM channel, which we name as the EMCF. Since classical MMSE estimators can exploit prior information contained in the channel covariance matrix, the substitution of EMCF for the covariance matrix introduces EM side information into MMSE estimators. Furthermore, we dynamically tune the EMCF parameters to better fit the channel observations. Simulation results show that the proposed EIT-MMSE channel estimator outperforms traditional MMSE estimators, thus proving that EIT is beneficial to wireless communication systems. △ Less

Submitted 6 February, 2024; v1 submitted 18 October, 2023; originally announced October 2023.

Comments: Electromagnetic information theory (EIT) is an emerging interdisciplinary subject, aiming at providing a unified analytical framework for wireless systems as well as guiding practical system design. This paper answers the question: "Whether can we improve wireless communication systems via EIT"?

arXiv:2310.10089 [pdf, other]

Over-the-Air Federated Learning and Optimization

Authors: Jingyang Zhu, Yuanming Shi, Yong Zhou, Chunxiao Jiang, Wei Chen, Khaled B. Letaief

Abstract: Federated learning (FL), as an emerging distributed machine learning paradigm, allows a mass of edge devices to collaboratively train a global model while preserving privacy. In this tutorial, we focus on FL via over-the-air computation (AirComp), which is proposed to reduce the communication overhead for FL over wireless networks at the cost of compromising in the learning performance due to mode… ▽ More Federated learning (FL), as an emerging distributed machine learning paradigm, allows a mass of edge devices to collaboratively train a global model while preserving privacy. In this tutorial, we focus on FL via over-the-air computation (AirComp), which is proposed to reduce the communication overhead for FL over wireless networks at the cost of compromising in the learning performance due to model aggregation error arising from channel fading and noise. We first provide a comprehensive study on the convergence of AirComp-based FedAvg (AirFedAvg) algorithms under both strongly convex and non-convex settings with constant and diminishing learning rates in the presence of data heterogeneity. Through convergence and asymptotic analysis, we characterize the impact of aggregation error on the convergence bound and provide insights for system design with convergence guarantees. Then we derive convergence rates for AirFedAvg algorithms for strongly convex and non-convex objectives. For different types of local updates that can be transmitted by edge devices (i.e., local model, gradient, and model difference), we reveal that transmitting local model in AirFedAvg may cause divergence in the training procedure. In addition, we consider more practical signal processing schemes to improve the communication efficiency and further extend the convergence analysis to different forms of model aggregation error caused by these signal processing schemes. Extensive simulation results under different settings of objective functions, transmitted local information, and communication schemes verify the theoretical conclusions. △ Less

Submitted 16 October, 2023; originally announced October 2023.

Comments: 31 pages, 11 figures

Showing 1–50 of 229 results for author: Zhu, J