
An Explainable Non-local Network for COVID-19 Diagnosis

Authors: Jingfu Yang, Peng Huang, Jing Hu, Shu Hu, Siwei Lyu, Xin Wang, Jun Guo, Xi Wu

Abstract: The CNN has achieved excellent results in the automatic classification of medical images. In this study, we propose a novel deep residual 3D attention non-local network (NL-RAN) that classifies CT images as COVID-19, common pneumonia, or normal to perform rapid and explainable COVID-19 diagnosis. We built a deep residual 3D attention non-local network that could achieve end-to-end training. The network is embedded with a non-local module to capture global information, while a 3D attention module is embedded to focus on the details of the lesion so that it can directly analyze the 3D lung CT and output the classification results. The output of the attention module can be used as a heat map to increase the interpretability of the model. A total of 4079 3D CT scans were included in this study. Each scan had a unique label (novel coronavirus pneumonia, common pneumonia, or normal). The CT scan cohort was randomly split into a training set of 3263 scans, a validation set of 408 scans, and a testing set of 408 scans. We compare NL-RAN with existing mainstream classification methods, such as CovNet, CBAM, and ResNet, and compare its visualization results with those of visualization methods such as CAM. Model performance was evaluated using the Area Under the ROC Curve (AUC), precision, and F1-score. The NL-RAN achieved an AUC of 0.9903, a precision of 0.9473, and an F1-score of 0.9462, surpassing all the classification methods compared. The heat map output by the attention module is also clearer than the heat map output by CAM. Our experimental results indicate that our proposed method performs significantly better than existing methods. In addition, the first attention module outputs a heat map containing detailed outline information to increase the interpretability of the model. Our experiments indicate that the inference of our model is fast. It can provide real-time assistance with diagnosis.

Submitted 8 August, 2024; originally announced August 2024.

arXiv:2406.16943 [pdf, other]

doi 10.1109/CSCAIoT62585.2024.00005

EarDA: Towards Accurate and Data-Efficient Earable Activity Sensing

Authors: Shengzhe Lyu, Yongliang Chen, Di Duan, Renqi Jia, Weitao Xu

Abstract: In the realm of smart sensing with the Internet of Things, earable devices are empowered with the capability of multi-modality sensing and the intelligence of context-aware computing, leading to their wide usage in Human Activity Recognition (HAR). Nonetheless, unlike the movements captured by Inertial Measurement Unit (IMU) sensors placed on the upper or lower body, the motion signals obtained from earable devices show significant changes in amplitudes and patterns, especially in the presence of dynamic and unpredictable head movements, posing a significant challenge for activity classification. In this work, we present EarDA, an adversarial-based domain adaptation system that extracts domain-independent features across different sensor locations. Moreover, while most deep learning methods rely on training with substantial amounts of labeled data to offer good accuracy, the proposed scheme can unlock the potential of publicly available smartphone-based IMU datasets. Furthermore, we explore the feasibility of applying a filter-based data processing method to mitigate the impact of head movement. EarDA, the proposed system, enables more data-efficient and accurate activity sensing. It achieves an accuracy of 88.8% on the HAR task, demonstrating a significant 43% improvement over methods without domain adaptation. This clearly showcases its effectiveness in mitigating domain gaps.

Submitted 18 June, 2024; originally announced June 2024.

Comments: accepted by 2024 IEEE Coupling of Sensing & Computing in AIoT Systems (CSCAIoT)

arXiv:2405.00135 [pdf, other]

Improving Channel Resilience for Task-Oriented Semantic Communications: A Unified Information Bottleneck Approach

Authors: Shuai Lyu, Yao Sun, Linke Guo, Xiaoyong Yuan, Fang Fang, Lan Zhang, Xianbin Wang

Abstract: Task-oriented semantic communications (TSC) enhance radio resource efficiency by transmitting task-relevant semantic information. However, current research often overlooks the inherent semantic distinctions among encoded features. Due to unavoidable channel variations from time and frequency-selective fading, semantically sensitive feature units could be more susceptible to erroneous inference if corrupted by dynamic channels. Therefore, this letter introduces a unified channel-resilient TSC framework via information bottleneck. This framework complements existing TSC approaches by controlling information flow to capture fine-grained feature-level semantic robustness. Experiments on a case study for real-time subchannel allocation validate the framework's effectiveness.

Submitted 30 April, 2024; originally announced May 2024.

Comments: This work has been submitted to the IEEE Communications Letters

arXiv:2311.06712 [pdf, other]

PuzzleTuning: Explicitly Bridge Pathological and Natural Image with Puzzles

Authors: Tianyi Zhang, Shangqing Lyu, Yanli Lei, Sicheng Chen, Nan Ying, Yufang He, Yu Zhao, Yunlu Feng, Hwee Kuan Lee, Guanglei Zhang

Abstract: Pathological image analysis is a crucial field in computer vision. Due to the annotation scarcity in the pathological field, pre-training with self-supervised learning (SSL) is widely applied to learn on unlabeled images. However, the current SSL-based pathological pre-training: (1) does not explicitly explore the essential focuses of the pathological field, and (2) does not effectively bridge with and thus take advantage of the knowledge from natural images. To explicitly address these issues, we propose our large-scale PuzzleTuning framework, containing the following innovations. Firstly, we define three task focuses that can effectively bridge knowledge of the pathological and natural domains: appearance consistency, spatial consistency, and restoration understanding. Secondly, we devise a novel multiple puzzle restoring task, which explicitly pre-trains the model regarding these focuses. Thirdly, we introduce an explicit prompt-tuning process to incrementally integrate the domain-specific knowledge. It builds a bridge to align the large domain gap between natural and pathological images. Additionally, a curriculum-learning training strategy is designed to regulate task difficulty, making the model adaptive to the puzzle restoring complexity. Experimental results show that our PuzzleTuning framework outperforms the previous state-of-the-art methods in various downstream tasks on multiple datasets. The code, demo, and pre-trained weights are available at https://github.com/sagizty/PuzzleTuning.

Submitted 22 April, 2024; v1 submitted 11 November, 2023; originally announced November 2023.

Comments: 13 pages, 9 figures, 8 tables

arXiv:2311.05836 [pdf, other]

UMedNeRF: Uncertainty-aware Single View Volumetric Rendering for Medical Neural Radiance Fields

Authors: Jing Hu, Qinrui Fan, Shu Hu, Siwei Lyu, Xi Wu, Xin Wang

Abstract: In the field of clinical medicine, computed tomography (CT) is an effective medical imaging modality for the diagnosis of various pathologies. Compared with X-ray images, CT images can provide more information, including multi-planar slices and three-dimensional structures for clinical diagnosis. However, CT imaging requires patients to be exposed to large doses of ionizing radiation for a long time, which may cause irreversible physical harm. In this paper, we propose an Uncertainty-aware MedNeRF (UMedNeRF) network based on generated radiation fields. The network can learn a continuous representation of CT projections from 2D X-ray images by obtaining the internal structure and depth information and using adaptive loss weights to ensure the quality of the generated images. Our model is trained on publicly available knee and chest datasets, and we show the results of CT projection rendering with a single X-ray and compare our method with other methods based on generated radiation fields.

Submitted 1 March, 2024; v1 submitted 9 November, 2023; originally announced November 2023.

arXiv:2310.17902 [pdf]

CPIA Dataset: A Comprehensive Pathological Image Analysis Dataset for Self-supervised Learning Pre-training

Authors: Nan Ying, Yanli Lei, Tianyi Zhang, Shangqing Lyu, Chunhui Li, Sicheng Chen, Zeyu Liu, Yu Zhao, Guanglei Zhang

Abstract: Pathological image analysis is a crucial field in computer-aided diagnosis, where deep learning is widely applied. Transfer learning using pre-trained models initialized on natural images has effectively improved the downstream pathological performance. However, the lack of sophisticated domain-specific pathological initialization hinders their potential. Self-supervised learning (SSL) enables pre-training without sample-level labels, which has great potential to overcome the challenge of expensive annotations. Thus, studies focusing on pathological SSL pre-training call for a comprehensive and standardized dataset, similar to the ImageNet in computer vision. This paper presents the comprehensive pathological image analysis (CPIA) dataset, a large-scale SSL pre-training dataset combining 103 open-source datasets with extensive standardization. The CPIA dataset contains 21,427,877 standardized images, covering over 48 organs/tissues and about 100 kinds of diseases, which includes two main data types: whole slide images (WSIs) and characteristic regions of interest (ROIs). A four-scale WSI standardization process is proposed based on the uniform resolution in microns per pixel (MPP), while the ROIs are divided into three scales artificially. This multi-scale dataset is built with the diagnosis habits under the supervision of experienced senior pathologists. The CPIA dataset facilitates a comprehensive pathological understanding and enables pattern discovery explorations. Additionally, to launch the CPIA dataset, several state-of-the-art (SOTA) baselines of SSL pre-training and downstream evaluation are specially conducted. The CPIA dataset along with baselines is available at https://github.com/zhanglab2021/CPIA_Dataset.

Submitted 27 October, 2023; originally announced October 2023.

arXiv:2307.14491 [pdf, other]

A Unified Framework for Modality-Agnostic Deepfakes Detection

Authors: Cai Yu, Peng Chen, Jiahe Tian, Jin Liu, Jiao Dai, Xi Wang, Yesheng Chai, Shan Jia, Siwei Lyu, Jizhong Han

Abstract: As AI-generated content (AIGC) thrives, deepfakes have expanded from single-modality falsification to cross-modal fake content creation, where either audio or visual components can be manipulated. While using two unimodal detectors can detect audio-visual deepfakes, cross-modal forgery clues could be overlooked. Existing multimodal deepfake detection methods typically establish correspondence between the audio and visual modalities for binary real/fake classification, and require the co-occurrence of both modalities. However, in real-world multi-modal applications, missing modality scenarios may occur where either modality is unavailable. In such cases, audio-visual detection methods are less practical than two independent unimodal methods. Consequently, the detector cannot always obtain the number or type of manipulated modalities beforehand, necessitating a fake-modality-agnostic audio-visual detector. In this work, we introduce a comprehensive framework that is agnostic to fake modalities, which facilitates the identification of multimodal deepfakes and handles situations with missing modalities, regardless of the manipulations embedded in audio, video, or even cross-modal forms. To enhance the modeling of cross-modal forgery clues, we employ audio-visual speech recognition (AVSR) as a preliminary task. This efficiently extracts speech correlations across modalities, a feature challenging for deepfakes to replicate. Additionally, we propose a dual-label detection approach that follows the structure of AVSR to support the independent detection of each modality. Extensive experiments on three audio-visual datasets show that our scheme outperforms state-of-the-art detection methods with promising performance on modality-agnostic audio/video deepfakes.

Submitted 24 October, 2023; v1 submitted 26 July, 2023; originally announced July 2023.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2305.05813 [pdf, other]

Change Detection Methods for Remote Sensing in the Last Decade: A Comprehensive Review

Authors: Guangliang Cheng, Yunmeng Huang, Xiangtai Li, Shuchang Lyu, Zhaoyang Xu, Qi Zhao, Shiming Xiang

Abstract: Change detection is an essential and widely utilized task in remote sensing that aims to detect and analyze changes occurring in the same geographical area over time, which has broad applications in urban development, agricultural surveys, and land cover monitoring. Detecting changes in remote sensing images is a complex challenge due to various factors, including variations in image quality, noise, registration errors, illumination changes, complex landscapes, and spatial heterogeneity. In recent years, deep learning has emerged as a powerful tool for feature extraction and addressing these challenges. Its versatility has resulted in its widespread adoption for numerous image-processing tasks. This paper presents a comprehensive survey of significant advancements in change detection for remote sensing images over the past decade. We first introduce some preliminary knowledge for the change detection task, such as problem definition, datasets, evaluation metrics, and transformer basics, as well as provide a detailed taxonomy of existing algorithms from three different perspectives: algorithm granularity, supervision modes, and learning frameworks in the methodology section. This survey enables readers to gain systematic knowledge of change detection tasks from various angles. We then summarize the state-of-the-art performance on several dominant change detection datasets, providing insights into the strengths and limitations of existing algorithms. Based on our survey, some future research directions for change detection in remote sensing are well identified. This survey paper will shed some light on the community and inspire further research efforts in the change detection task.

Submitted 9 May, 2023; originally announced May 2023.

Comments: 21 pages, 4 figures, 10 tables

arXiv:2304.13085 [pdf, other]

AI-Synthesized Voice Detection Using Neural Vocoder Artifacts

Authors: Chengzhe Sun, Shan Jia, Shuwei Hou, Siwei Lyu

Abstract: Advancements in AI-synthesized human voices have created a growing threat of impersonation and disinformation, making it crucial to develop methods to detect synthetic human voices. This study proposes a new approach to identifying synthetic human voices by detecting artifacts of vocoders in audio signals. Most DeepFake audio synthesis models use a neural vocoder, a neural network that generates waveforms from temporal-frequency representations like mel-spectrograms. By identifying neural vocoder processing in audio, we can determine if a sample is synthesized. To detect synthetic human voices, we introduce a multi-task learning framework for a binary-class RawNet2 model that shares the feature extractor with a vocoder identification module. By treating vocoder identification as a pretext task, we constrain the feature extractor to focus on vocoder artifacts and provide discriminative features for the final binary classifier. Our experiments show that the improved RawNet2 model based on vocoder identification achieves high classification performance on the binary task overall.

Submitted 27 April, 2023; v1 submitted 25 April, 2023; originally announced April 2023.

Comments: Paper accepted in CVPRW 2023. Codes and data can be found at https://github.com/csun22/Synthetic-Voice-Detection-Vocoder-Artifacts. arXiv admin note: substantial text overlap with arXiv:2302.09198

arXiv:2302.09198 [pdf, other]

Exposing AI-Synthesized Human Voices Using Neural Vocoder Artifacts

Authors: Chengzhe Sun, Shan Jia, Shuwei Hou, Ehab AlBadawy, Siwei Lyu

Abstract: The advancements of AI-synthesized human voices have introduced a growing threat of impersonation and disinformation. It is therefore of practical importance to develop detection methods for synthetic human voices. This work proposes a new approach to detect synthetic human voices based on identifying artifacts of neural vocoders in audio signals. A neural vocoder is a specially designed neural network that synthesizes waveforms from temporal-frequency representations, e.g., mel-spectrograms. The neural vocoder is a core component in most DeepFake audio synthesis models. Hence the identification of neural vocoder processing implies that an audio sample may have been synthesized. To take advantage of the vocoder artifacts for synthetic human voice detection, we introduce a multi-task learning framework for a binary-class RawNet2 model that shares the front-end feature extractor with a vocoder identification module. We treat the vocoder identification as a pretext task to constrain the front-end feature extractor to focus on vocoder artifacts and provide discriminative features for the final binary classifier. Our experiments show that the improved RawNet2 model based on vocoder identification achieves an overall high classification performance on the binary task.

Submitted 27 April, 2023; v1 submitted 17 February, 2023; originally announced February 2023.

Comments: Dataset and codes will be available at https://github.com/csun22/LibriVoc-Dataset

arXiv:2112.13513 [pdf]

MSHT: Multi-stage Hybrid Transformer for the ROSE Image Analysis of Pancreatic Cancer

Authors: Tianyi Zhang, Yunlu Feng, Yu Zhao, Guangda Fan, Aiming Yang, Shangqin Lyu, Peng Zhang, Fan Song, Chenbin Ma, Yangyang Sun, Youdan Feng, Guanglei Zhang

Abstract: Pancreatic cancer is one of the most malignant cancers in the world, which deteriorates rapidly with very high mortality. The rapid on-site evaluation (ROSE) technique innovates the workflow by immediately analyzing the fast stained cytopathological images with on-site pathologists, which enables faster diagnosis in this time-pressured process. However, the wider expansion of ROSE diagnosis has been hindered by the lack of experienced pathologists. To overcome this problem, we propose a hybrid high-performance deep learning model to enable the automated workflow, thus freeing up the valuable time of pathologists. By firstly introducing the Transformer block into this field with our particular multi-stage hybrid design, the spatial features generated by the convolutional neural network (CNN) significantly enhance the Transformer global modeling. Using multi-stage spatial features as global attention guidance, this design combines the robustness from the inductive bias of CNN with the sophisticated global modeling power of Transformer. A dataset of 4240 ROSE images is collected to evaluate the method in this unexplored field. The proposed multi-stage hybrid Transformer (MSHT) achieves 95.68% in classification accuracy, which is distinctively higher than the state-of-the-art models. Facing the need for interpretability, MSHT outperforms its counterparts with more accurate attention regions. The results demonstrate that the MSHT can distinguish cancer samples accurately at an unprecedented image scale, laying the foundation for deploying automatic decision systems and enabling the expansion of ROSE in clinical practice. The code and records are available at: https://github.com/sagizty/Multi-Stage-Hybrid-Transformer.

Submitted 27 December, 2021; originally announced December 2021.

Comments: 12 pages, 10 figures

arXiv:2112.07415 [pdf, ps, other]

Stochastic Planner-Actor-Critic for Unsupervised Deformable Image Registration

Authors: Ziwei Luo, Jing Hu, Xin Wang, Shu Hu, Bin Kong, Youbing Yin, Qi Song, Xi Wu, Siwei Lyu

Abstract: Large deformations of organs, caused by diverse shapes and nonlinear shape changes, pose a significant challenge for medical image registration. Traditional registration methods need to iteratively optimize an objective function via a specific deformation model along with meticulous parameter tuning, but have limited capabilities in registering images with large deformations. While deep learning-based methods can learn the complex mapping from input images to their respective deformation field, they are regression-based and are prone to getting stuck at local minima, particularly when large deformations are involved. To this end, we present Stochastic Planner-Actor-Critic (SPAC), a novel reinforcement learning-based framework that performs step-wise registration. The key notion is warping a moving image successively at each time step to finally align it to a fixed image. Considering that it is challenging to handle high dimensional continuous action and state spaces in the conventional reinforcement learning (RL) framework, we introduce a new concept 'Plan' to the standard Actor-Critic model, which is of low dimension and can help the actor generate a tractable high dimensional action. The entire framework is based on unsupervised training and operates in an end-to-end manner. We evaluate our method on several 2D and 3D medical image datasets, some of which contain large deformations. Our empirical results highlight that our work achieves consistent, significant gains and outperforms state-of-the-art methods.

Submitted 30 April, 2022; v1 submitted 14 December, 2021; originally announced December 2021.

Comments: Accepted by AAAI 2022

arXiv:2112.03099 [pdf, other]

VocBench: A Neural Vocoder Benchmark for Speech Synthesis

Authors: Ehab A. AlBadawy, Andrew Gibiansky, Qing He, Jilong Wu, Ming-Ching Chang, Siwei Lyu

Abstract: Neural vocoders, used for converting the spectral representations of an audio signal to the waveforms, are a commonly used component in speech synthesis pipelines. They focus on synthesizing waveforms from low-dimensional representations, such as Mel-Spectrograms. In recent years, different approaches have been introduced to develop such vocoders. However, it becomes more challenging to assess these new vocoders and compare their performance to previous ones. To address this problem, we present VocBench, a framework that benchmarks the performance of state-of-the-art neural vocoders. VocBench uses a systematic study to evaluate different neural vocoders in a shared environment that enables a fair comparison between them. In our experiments, we use the same setup for datasets, training pipeline, and evaluation metrics for all neural vocoders. We perform a subjective and objective evaluation to compare the performance of each vocoder along a different axis. Our results demonstrate that the framework is capable of showing the competitive efficacy and the quality of the synthesized samples for each vocoder. The VocBench framework is available at https://github.com/facebookresearch/vocoder-benchmark.

Submitted 6 December, 2021; originally announced December 2021.

Comments: To appear in ICASSP 2022

arXiv:2109.06638 [pdf, other]

Learnable Discrete Wavelet Pooling (LDW-Pooling) For Convolutional Networks

Authors: Bor-Shiun Wang, Jun-Wei Hsieh, Ming-Ching Chang, Ping-Yang Chen, Lipeng Ke, Siwei Lyu

Abstract: Pooling is a simple but essential layer in modern deep CNN architectures for feature aggregation and extraction. Typical CNN design focuses on the conv layers and activation functions, while leaving the pooling layers with fewer options. We introduce the Learning Discrete Wavelet Pooling (LDW-Pooling) that can be applied universally to replace standard pooling operations to better extract features with improved accuracy and efficiency. Motivated by wavelet theory, we adopt the low-pass (L) and high-pass (H) filters horizontally and vertically for pooling on a 2D feature map. Feature signals are decomposed into four (LL, LH, HL, HH) subbands to retain features better and avoid information dropping. The wavelet transform ensures features after pooling can be fully preserved and recovered. We next adopt an energy-based attention learning to fine-select crucial and representative features. LDW-Pooling is effective and efficient when compared with other state-of-the-art pooling techniques such as WaveletPooling and LiftPooling. Extensive experimental validation shows that LDW-Pooling can be applied to a wide range of standard CNN architectures and consistently outperforms standard (max, mean, mixed, and stochastic) pooling operations.

Submitted 20 October, 2021; v1 submitted 13 September, 2021; originally announced September 2021.

Comments: Accepted by BMVC 2021

arXiv:2002.02909 [pdf, other]

Domain Embedded Multi-model Generative Adversarial Networks for Image-based Face Inpainting

Authors: Xian Zhang, Xin Wang, Bin Kong, Canghong Shi, Youbing Yin, Qi Song, Siwei Lyu, Jiancheng Lv, Canghong Shi, Xiaojie Li

Abstract: Prior knowledge of face shape and structure plays an important role in face inpainting. However, traditional face inpainting methods mainly focus on the generated image resolution of the missing portion without explicitly considering the particularities of the human face, and generally produce discordant facial parts. To solve this problem, we present a domain embedded multi-model generative adversarial model for inpainting of face images with large cropped regions. We first represent only face regions using the latent variable as the domain knowledge and combine it with the textures of the non-face parts to generate high-quality face images with plausible contents. Two adversarial discriminators are finally used to judge whether the generated distribution is close to the real distribution or not. It can not only synthesize novel image structures but also explicitly utilize the embedded face domain knowledge to generate better predictions with consistency on structures and appearance. Experiments on both CelebA and CelebA-HQ face datasets demonstrate that our proposed approach achieves state-of-the-art performance and generates higher quality inpainting results than existing ones.

Submitted 20 June, 2020; v1 submitted 5 February, 2020; originally announced February 2020.

arXiv:2001.05763 [pdf, ps, other]

GMD-Based Hybrid Beamforming for Large Reconfigurable Intelligent Surface Assisted Millimeter-Wave Massive MIMO

Authors: Keke Ying, Zhen Gao, Shanxiang Lyu, Yongpeng Wu, Hua Wang, Mohamed-Slim Alouini

Abstract: Reconfigurable intelligent surface (RIS) is considered to be an energy-efficient approach to reshape the wireless environment for improved throughput. Its passive feature greatly reduces the energy consumption, which makes RIS a promising technique for enabling the future smart city. Existing beamforming designs for RIS mainly focus on optimizing the spectral efficiency for single carrier systems. To avoid the complicated bit allocation on different spatial domain subchannels in MIMO systems, in this paper, we propose a geometric mean decomposition-based beamforming for RIS-assisted millimeter wave (mmWave) hybrid MIMO systems so that multiple parallel data streams in the spatial domain can be considered to have the same channel gain. Specifically, by exploiting the common angular-domain sparsity of mmWave massive MIMO channels over different subcarriers, a simultaneous orthogonal matching pursuit algorithm is utilized to obtain the optimal multiple beams from an oversampling 2D-DFT codebook. Moreover, by only leveraging the angle of arrival and angle of departure associated with the line of sight (LoS) channels, we further design the phase shifters for RIS by maximizing the array gain for the LoS channel. Simulation results show that the proposed scheme can achieve better BER performance than conventional approaches. Our work is an initial attempt to discuss the broadband hybrid beamforming for RIS-assisted mmWave hybrid MIMO systems.

Submitted 16 January, 2020; v1 submitted 16 January, 2020; originally announced January 2020.

Comments: 8 pages, 6 figures, accepted by IEEE Access. This is an initial attempt to discuss the broadband hybrid beamforming for RIS-assisted mmWave hybrid MIMO systems

arXiv:1912.06278 [pdf, ps, other]

doi 10.1109/TSP.2019.2959194

On Low-complexity Lattice Reduction Algorithms for Large-scale MIMO Detection: the Blessing of Sequential Reduction

Authors: Shanxiang Lyu, Jinming Wen, Jian Weng, Cong Ling

Abstract: Lattice reduction is a popular preprocessing strategy in multiple-input multiple-output (MIMO) detection. In a quest for developing a low-complexity reduction algorithm for large-scale problems, this paper investigates a new framework called sequential reduction (SR), which aims to reduce the lengths of all basis vectors. The performance upper bounds of the strongest reduction in SR are given when the lattice dimension is no larger than 4. The proposed new framework enables the implementation of a hash-based low-complexity lattice reduction algorithm, which becomes especially tempting when applied to large-scale MIMO detection. Simulation results show that, compared to other reduction algorithms, the hash-based SR algorithm exhibits the lowest complexity while maintaining comparable error performance.

Submitted 12 December, 2019; originally announced December 2019.

Comments: Sequential reduction is not in the LLL family, but generalizes greedy reduction (Nguyen and Stehle) and element-based reduction (Zhou and Ma)

Journal ref: IEEE Transactions on Signal Processing, Online ISSN: 1941-0476

arXiv:1909.12962 [pdf, other]

Celeb-DF: A Large-scale Challenging Dataset for DeepFake Forensics

Authors: Yuezun Li, Xin Yang, Pu Sun, Honggang Qi, Siwei Lyu

Abstract: AI-synthesized face-swapping videos, commonly known as DeepFakes, are an emerging problem threatening the trustworthiness of online information. The need to develop and evaluate DeepFake detection algorithms calls for large-scale datasets. However, current DeepFake datasets suffer from low visual quality and do not resemble DeepFake videos circulated on the Internet. We present a new large-scale challenging DeepFake video dataset, Celeb-DF, which contains 5,639 high-quality DeepFake videos of celebrities generated using an improved synthesis process. We conduct a comprehensive evaluation of DeepFake detection methods and datasets to demonstrate the escalated level of challenges posed by Celeb-DF.

Submitted 16 March, 2020; v1 submitted 27 September, 2019; originally announced September 2019.

arXiv:1603.05557 [pdf, ps, other]

doi 10.1109/TAC.2019.2922450

Dynamic Modularity Approach to Adaptive Inner/Outer Loop Control of Robotic Systems

Authors: Hanlei Wang, Wei Ren, Chien Chern Cheah, Yongchun Xie, Shangke Lyu

Abstract: Modern applications of robotics typically involve a robot control system with an inner PI (proportional-integral) or PID (proportional-integral-derivative) control loop and an outer user-specified control loop. The existing outer loop controllers, however, do not take into consideration the dynamic effects of robots and their effectiveness relies on the ad hoc assumption that the inner PI or PID control loop is fast enough, and other torque-based control algorithms cannot be implemented in robotics with closed architecture. This paper investigates the adaptive control of robotic systems with an inner/outer loop structure, taking into full account the effects of the dynamics and the system uncertainties, and both the task-space control and joint-space control are considered. We propose a dynamic modularity approach to resolve this issue, together with a class of adaptive outer loop control schemes whose role is to dynamically generate the joint velocity (or position) command for the low-level joint servoing loop. Without relying on the ad hoc assumption that the joint servoing is fast enough or the modification of the low-level joint controller structure, we rigorously show that the proposed outer loop controllers can ensure the stability and convergence of the closed-loop system. We also propose the outer loop versions of several standard joint-space direct/composite adaptive controllers for rigid or flexible-joint robots, and a promising conclusion may be that most torque-based adaptive controllers for robots can be designed to fit the inner/outer loop structure by using the new definition of the joint velocity (or position) command. Simulation results are provided to show the performance of various adaptive outer loop controllers, using a three-DOF (degree-of-freedom) manipulator, and experimental results using the UR10 robotic system are also presented.

Submitted 7 January, 2017; v1 submitted 17 March, 2016; originally announced March 2016.

Comments: This version is mainly for including the experimental results

Journal ref: IEEE Transactions on Automatic Control, 65(6): 2760-2767, 2020

Showing 1–19 of 19 results for author: Lyu, S