Search | arXiv e-print repository

Cardiovascular Disease Detection from Multi-View Chest X-rays with BI-Mamba

Authors: Zefan Yang, Jiajin Zhang, Ge Wang, Mannudeep K. Kalra, Pingkun Yan

Abstract: Accurate prediction of Cardiovascular disease (CVD) risk in medical imaging is central to effective patient health management. Previous studies have demonstrated that imaging features in computed tomography (CT) can help predict CVD risk. However, CT entails notable radiation exposure, which may result in adverse health effects for patients. In contrast, chest X-ray emits significantly lower level… ▽ More Accurate prediction of Cardiovascular disease (CVD) risk in medical imaging is central to effective patient health management. Previous studies have demonstrated that imaging features in computed tomography (CT) can help predict CVD risk. However, CT entails notable radiation exposure, which may result in adverse health effects for patients. In contrast, chest X-ray emits significantly lower levels of radiation, offering a safer option. This rationale motivates our investigation into the feasibility of using chest X-ray for predicting CVD risk. Convolutional Neural Networks (CNNs) and Transformers are two established network architectures for computer-aided diagnosis. However, they struggle to model very high resolution chest X-ray due to the lack of large context modeling power or quadratic time complexity. Inspired by state space sequence models (SSMs), a new class of network architectures with competitive sequence modeling power as Transfomers and linear time complexity, we propose Bidirectional Image Mamba (BI-Mamba) to complement the unidirectional SSMs with opposite directional information. BI-Mamba utilizes parallel forward and backwark blocks to encode longe-range dependencies of multi-view chest X-rays. We conduct extensive experiments on images from 10,395 subjects in National Lung Screening Trail (NLST). Results show that BI-Mamba outperforms ResNet-50 and ViT-S with comparable parameter size, and saves significant amount of GPU memory during training. Besides, BI-Mamba achieves promising performance compared with previous state of the art in CT, unraveling the potential of chest X-ray for CVD risk prediction. △ Less

Submitted 28 May, 2024; originally announced May 2024.

Comments: Early accepted paper for MICCAI 2024

arXiv:2403.00274 [pdf, other]

CustomListener: Text-guided Responsive Interaction for User-friendly Listening Head Generation

Authors: Xi Liu, Ying Guo, Cheng Zhen, Tong Li, Yingying Ao, Pengfei Yan

Abstract: Listening head generation aims to synthesize a non-verbal responsive listener head by modeling the correlation between the speaker and the listener in dynamic conversion.The applications of listener agent generation in virtual interaction have promoted many works achieving the diverse and fine-grained motion generation. However, they can only manipulate motions through simple emotional labels, but… ▽ More Listening head generation aims to synthesize a non-verbal responsive listener head by modeling the correlation between the speaker and the listener in dynamic conversion.The applications of listener agent generation in virtual interaction have promoted many works achieving the diverse and fine-grained motion generation. However, they can only manipulate motions through simple emotional labels, but cannot freely control the listener's motions. Since listener agents should have human-like attributes (e.g. identity, personality) which can be freely customized by users, this limits their realism. In this paper, we propose a user-friendly framework called CustomListener to realize the free-form text prior guided listener generation. To achieve speaker-listener coordination, we design a Static to Dynamic Portrait module (SDP), which interacts with speaker information to transform static text into dynamic portrait token with completion rhythm and amplitude information. To achieve coherence between segments, we design a Past Guided Generation Module (PGG) to maintain the consistency of customized listener attributes through the motion prior, and utilize a diffusion-based structure conditioned on the portrait token and the motion prior to realize the controllable generation. To train and evaluate our model, we have constructed two text-annotated listening head datasets based on ViCo and RealTalk, which provide text-video paired labels. Extensive experiments have verified the effectiveness of our model. △ Less

Submitted 29 March, 2024; v1 submitted 29 February, 2024; originally announced March 2024.

Comments: Accepted by CVPR 2024

arXiv:2312.06462 [pdf, other]

Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation

Authors: Qi Yang, Xing Nie, Tong Li, Pengfei Gao, Ying Guo, Cheng Zhen, Pengfei Yan, Shiming Xiang

Abstract: Recently, an audio-visual segmentation (AVS) task has been introduced, aiming to group pixels with sounding objects within a given video. This task necessitates a first-ever audio-driven pixel-level understanding of the scene, posing significant challenges. In this paper, we propose an innovative audio-visual transformer framework, termed COMBO, an acronym for COoperation of Multi-order Bilateral… ▽ More Recently, an audio-visual segmentation (AVS) task has been introduced, aiming to group pixels with sounding objects within a given video. This task necessitates a first-ever audio-driven pixel-level understanding of the scene, posing significant challenges. In this paper, we propose an innovative audio-visual transformer framework, termed COMBO, an acronym for COoperation of Multi-order Bilateral relatiOns. For the first time, our framework explores three types of bilateral entanglements within AVS: pixel entanglement, modality entanglement, and temporal entanglement. Regarding pixel entanglement, we employ a Siam-Encoder Module (SEM) that leverages prior knowledge to generate more precise visual features from the foundational model. For modality entanglement, we design a Bilateral-Fusion Module (BFM), enabling COMBO to align corresponding visual and auditory signals bi-directionally. As for temporal entanglement, we introduce an innovative adaptive inter-frame consistency loss according to the inherent rules of temporal. Comprehensive experiments and ablation studies on AVSBench-object (84.7 mIoU on S4, 59.2 mIou on MS3) and AVSBench-semantic (42.1 mIoU on AVSS) datasets demonstrate that COMBO surpasses previous state-of-the-art methods. Code and more results will be publicly available at https://yannqi.github.io/AVS-COMBO/. △ Less

Submitted 7 April, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

Comments: CVPR 2024 Highlight. 13 pages, 10 figures

arXiv:2311.03679 [pdf, other]

Unsupervised convolutional neural network fusion approach for change detection in remote sensing images

Authors: Weidong Yan, Pei Yan, Li Cao

Abstract: With the rapid development of deep learning, a variety of change detection methods based on deep learning have emerged in recent years. However, these methods usually require a large number of training samples to train the network model, so it is very expensive. In this paper, we introduce a completely unsupervised shallow convolutional neural network (USCNN) fusion approach for change detection.… ▽ More With the rapid development of deep learning, a variety of change detection methods based on deep learning have emerged in recent years. However, these methods usually require a large number of training samples to train the network model, so it is very expensive. In this paper, we introduce a completely unsupervised shallow convolutional neural network (USCNN) fusion approach for change detection. Firstly, the bi-temporal images are transformed into different feature spaces by using convolution kernels of different sizes to extract multi-scale information of the images. Secondly, the output features of bi-temporal images at the same convolution kernels are subtracted to obtain the corresponding difference images, and the difference feature images at the same scale are fused into one feature image by using 1 * 1 convolution layer. Finally, the output features of different scales are concatenated and a 1 * 1 convolution layer is used to fuse the multi-scale information of the image. The model parameters are obtained by a redesigned sparse function. Our model has three features: the entire training process is conducted in an unsupervised manner, the network architecture is shallow, and the objective function is sparse. Thus, it can be seen as a kind of lightweight network model. Experimental results on four real remote sensing datasets indicate the feasibility and effectiveness of the proposed approach. △ Less

Submitted 6 November, 2023; originally announced November 2023.

arXiv:2309.01207 [pdf, other]

Spectral Adversarial MixUp for Few-Shot Unsupervised Domain Adaptation

Authors: Jiajin Zhang, Hanqing Chao, Amit Dhurandhar, Pin-Yu Chen, Ali Tajer, Yangyang Xu, Pingkun Yan

Abstract: Domain shift is a common problem in clinical applications, where the training images (source domain) and the test images (target domain) are under different distributions. Unsupervised Domain Adaptation (UDA) techniques have been proposed to adapt models trained in the source domain to the target domain. However, those methods require a large number of images from the target domain for model train… ▽ More Domain shift is a common problem in clinical applications, where the training images (source domain) and the test images (target domain) are under different distributions. Unsupervised Domain Adaptation (UDA) techniques have been proposed to adapt models trained in the source domain to the target domain. However, those methods require a large number of images from the target domain for model training. In this paper, we propose a novel method for Few-Shot Unsupervised Domain Adaptation (FSUDA), where only a limited number of unlabeled target domain samples are available for training. To accomplish this challenging task, first, a spectral sensitivity map is introduced to characterize the generalization weaknesses of models in the frequency domain. We then developed a Sensitivity-guided Spectral Adversarial MixUp (SAMix) method to generate target-style images to effectively suppresses the model sensitivity, which leads to improved model generalizability in the target domain. We demonstrated the proposed method and rigorously evaluated its performance on multiple tasks using several public datasets. △ Less

Submitted 3 September, 2023; originally announced September 2023.

Comments: Accepted by MICCAI 2023

arXiv:2307.14634 [pdf, other]

Fact-Checking of AI-Generated Reports

Authors: Razi Mahmood, Ge Wang, Mannudeep Kalra, Pingkun Yan

Abstract: With advances in generative artificial intelligence (AI), it is now possible to produce realistic-looking automated reports for preliminary reads of radiology images. This can expedite clinical workflows, improve accuracy and reduce overall costs. However, it is also well-known that such models often hallucinate, leading to false findings in the generated reports. In this paper, we propose a new m… ▽ More With advances in generative artificial intelligence (AI), it is now possible to produce realistic-looking automated reports for preliminary reads of radiology images. This can expedite clinical workflows, improve accuracy and reduce overall costs. However, it is also well-known that such models often hallucinate, leading to false findings in the generated reports. In this paper, we propose a new method of fact-checking of AI-generated reports using their associated images. Specifically, the developed examiner differentiates real and fake sentences in reports by learning the association between an image and sentences describing real or potentially fake findings. To train such an examiner, we first created a new dataset of fake reports by perturbing the findings in the original ground truth radiology reports associated with images. Text encodings of real and fake sentences drawn from these reports are then paired with image encodings to learn the mapping to real/fake labels. The utility of such an examiner is demonstrated for verifying automatically generated reports by detecting and removing fake sentences. Future generative AI approaches can use the resulting tool to validate their reports leading to a more responsible use of AI in expediting clinical workflows. △ Less

Submitted 27 July, 2023; originally announced July 2023.

Comments: 10 pages, 3 figures, 3 tables

arXiv:2304.12513 [pdf, other]

A fast and flexible algorithm for microstructure reconstruction combining simulated annealing and deep learning

Authors: Zhenchuan Ma, Xiaohai He, Pengcheng Yan, Fan Zhang, Qizhi Teng

Abstract: The microstructure analyses of porous media have considerable research value for the study of macroscopic properties. As the premise of conducting these analyses, the accurate reconstruction of microstructure digital model is also an important component of the research. Computational reconstruction algorithms of microstructure have attracted much attention due to their low cost and excellent perfo… ▽ More The microstructure analyses of porous media have considerable research value for the study of macroscopic properties. As the premise of conducting these analyses, the accurate reconstruction of microstructure digital model is also an important component of the research. Computational reconstruction algorithms of microstructure have attracted much attention due to their low cost and excellent performance. However, it is still a challenge for computational reconstruction algorithms to achieve faster and more efficient reconstruction. The bottleneck lies in computational reconstruction algorithms, they are either too slow (traditional reconstruction algorithms) or not flexible to the training process (deep learning reconstruction algorithms). To address these limitations, we proposed a fast and flexible computational reconstruction algorithm, neural networks based on improved simulated annealing framework (ISAF-NN). The proposed algorithm is flexible and can complete training and reconstruction in a short time with only one two-dimensional image. By adjusting the size of input, it can also achieve reconstruction of arbitrary size. Finally, the proposed algorithm is experimentally performed on a variety of isotropic and anisotropic materials to verify the effectiveness and generalization. △ Less

Submitted 24 April, 2023; originally announced April 2023.

arXiv:2304.02649 [pdf, other]

Specialty-Oriented Generalist Medical AI for Chest CT Screening

Authors: Chuang Niu, Qing Lyu, Christopher D. Carothers, Parisa Kaviani, Josh Tan, Pingkun Yan, Mannudeep K. Kalra, Christopher T. Whitlow, Ge Wang

Abstract: Modern medical records include a vast amount of multimodal free text clinical data and imaging data from radiology, cardiology, and digital pathology. Fully mining such big data requires multitasking; otherwise, occult but important aspects may be overlooked, adversely affecting clinical management and population healthcare. Despite remarkable successes of AI in individual tasks with single-modal… ▽ More Modern medical records include a vast amount of multimodal free text clinical data and imaging data from radiology, cardiology, and digital pathology. Fully mining such big data requires multitasking; otherwise, occult but important aspects may be overlooked, adversely affecting clinical management and population healthcare. Despite remarkable successes of AI in individual tasks with single-modal data, the progress in developing generalist medical AI remains relatively slow to combine multimodal data for multitasks because of the dual challenges of data curation and model architecture. The data challenge involves querying and curating multimodal structured and unstructured text, alphanumeric, and especially 3D tomographic scans on an individual patient level for real-time decisions and on a scale to estimate population health statistics. The model challenge demands a scalable and adaptable network architecture to integrate multimodal datasets for diverse clinical tasks. Here we propose the first-of-its-kind medical multimodal-multitask foundation model (M3FM) with application in lung cancer screening and related tasks. After we curated a comprehensive multimodal multitask dataset consisting of 49 clinical data types including 163,725 chest CT series and 17 medical tasks involved in LCS, we develop a multimodal question-answering framework as a unified training and inference strategy to synergize multimodal information and perform multiple tasks via free-text prompting. M3FM consistently outperforms the state-of-the-art single-modal task-specific models, identifies multimodal data elements informative for clinical tasks and flexibly adapts to new tasks with a small out-of-distribution dataset. As a specialty-oriented generalist medical AI model, M3FM paves the way for similar breakthroughs in other areas of medicine, closing the gap between specialists and the generalist. △ Less

Submitted 24 April, 2024; v1 submitted 3 April, 2023; originally announced April 2023.

arXiv:2208.14598 [pdf]

doi 10.3389/fenrg.2022.985600

Intelligent detect for substation insulator defects based on CenterMask

Authors: Bo Ye, Feng Li, Mingxuan Li, Peipei Yan, Huiting Yang, Lihua Wang

Abstract: With the development of intelligent operation and maintenance of substations, the daily inspection of substations needs to process massive video and image data. This puts forward higher requirements on the processing speed and accuracy of defect detection. Based on the end-to-end learning paradigm, this paper proposes an intelligent detection method for substation insulator defects based on Center… ▽ More With the development of intelligent operation and maintenance of substations, the daily inspection of substations needs to process massive video and image data. This puts forward higher requirements on the processing speed and accuracy of defect detection. Based on the end-to-end learning paradigm, this paper proposes an intelligent detection method for substation insulator defects based on CenterMask. First, the backbone network VoVNet is improved according to the residual connection and eSE module, which effectively solves the problems of deep network saturation and gradient information loss. On this basis, an insulator mask generation method based on a spatial attentiondirected mechanism is proposed. Insulators with complex image backgrounds are accurately segmented. Then, three strategies of pixel-wise regression prediction, multi-scale features and centerness are introduced. The anchor-free single-stage target detector accurately locates the defect points of insulators. Finally, an example analysis is carried out with the substation inspection image of a power supply company in a certain area to verify the effectiveness and robustness of the proposed method. △ Less

Submitted 30 August, 2022; originally announced August 2022.

Comments: 3 figures,1 table

Journal ref: Frontiers in Energy Research 10 (2022) 985600

arXiv:2208.06127 [pdf, other]

An investigation on selecting audio pre-trained models for audio captioning

Authors: Peiran Yan, Shengchen Li

Abstract: Audio captioning is a task that generates description of audio based on content. Pre-trained models are widely used in audio captioning due to high complexity. Unless a comprehensive system is re-trained, it is hard to determine how well pre-trained models contribute to audio captioning system. To prevent the time consuming and energy consuming process of retraining, it is necessary to propose a p… ▽ More Audio captioning is a task that generates description of audio based on content. Pre-trained models are widely used in audio captioning due to high complexity. Unless a comprehensive system is re-trained, it is hard to determine how well pre-trained models contribute to audio captioning system. To prevent the time consuming and energy consuming process of retraining, it is necessary to propose a preditor of performance for the pre-trained model in audio captioning. In this paper, a series of pre-trained models are investigated for the correlation between extracted audio features and the performance of audio captioning. A couple of predictor is proposed based on the experiment results.The result demonstrates that the kurtosis and skewness of audio features extracted may act as an indicator of the performance of audio captioning systems for pre-trained audio due to the high correlation between kurtosis and skewness of audio features and the performance of audio captioning systems. △ Less

Submitted 12 August, 2022; originally announced August 2022.

Comments: 5 pages, 7 figures

arXiv:2207.05231 [pdf, other]

Regression Metric Loss: Learning a Semantic Representation Space for Medical Images

Authors: Hanqing Chao, Jiajin Zhang, Pingkun Yan

Abstract: Regression plays an essential role in many medical imaging applications for estimating various clinical risk or measurement scores. While training strategies and loss functions have been studied for the deep neural networks in medical image classification tasks, options for regression tasks are very limited. One of the key challenges is that the high-dimensional feature representation learned by e… ▽ More Regression plays an essential role in many medical imaging applications for estimating various clinical risk or measurement scores. While training strategies and loss functions have been studied for the deep neural networks in medical image classification tasks, options for regression tasks are very limited. One of the key challenges is that the high-dimensional feature representation learned by existing popular loss functions like Mean Squared Error or L1 loss is hard to interpret. In this paper, we propose a novel Regression Metric Loss (RM-Loss), which endows the representation space with the semantic meaning of the label space by finding a representation manifold that is isometric to the label space. Experiments on two regression tasks, i.e. coronary artery calcium score estimation and bone age assessment, show that RM-Loss is superior to the existing popular regression losses on both performance and interpretability. Code is available at https://github.com/DIAL-RPI/Regression-Metric-Loss. △ Less

Submitted 11 July, 2022; originally announced July 2022.

Comments: Accepted by MICCAI2022

arXiv:2206.07156 [pdf, other]

doi 10.1109/TMI.2023.3270140

Federated Multi-organ Segmentation with Inconsistent Labels

Authors: Xuanang Xu, Hannah H. Deng, Jaime Gateno, Pingkun Yan

Abstract: Federated learning is an emerging paradigm allowing large-scale decentralized learning without sharing data across different data owners, which helps address the concern of data privacy in medical image analysis. However, the requirement for label consistency across clients by the existing methods largely narrows its application scope. In practice, each clinical site may only annotate certain orga… ▽ More Federated learning is an emerging paradigm allowing large-scale decentralized learning without sharing data across different data owners, which helps address the concern of data privacy in medical image analysis. However, the requirement for label consistency across clients by the existing methods largely narrows its application scope. In practice, each clinical site may only annotate certain organs of interest with partial or no overlap with other sites. Incorporating such partially labeled data into a unified federation is an unexplored problem with clinical significance and urgency. This work tackles the challenge by using a novel federated multi-encoding U-Net (Fed-MENU) method for multi-organ segmentation. In our method, a multi-encoding U-Net (MENU-Net) is proposed to extract organ-specific features through different encoding sub-networks. Each sub-network can be seen as an expert of a specific organ and trained for that client. Moreover, to encourage the organ-specific features extracted by different sub-networks to be informative and distinctive, we regularize the training of the MENU-Net by designing an auxiliary generic decoder (AGD). Extensive experiments on six public abdominal CT datasets show that our Fed-MENU method can effectively obtain a federated learning model using the partially labeled datasets with superior performance to other models trained by either localized or centralized learning methods. Source code is publicly available at https://github.com/DIAL-RPI/Fed-MENU. △ Less

Submitted 25 May, 2023; v1 submitted 14 June, 2022; originally announced June 2022.

Comments: v1: 10 pages, 5 figures; v2: 14 pages, 5 figures, accepted by IEEE Transactions on Medical Imaging (TMI), published version available at https://doi.org/10.1109/TMI.2023.3270140, source code available at https://github.com/DIAL-RPI/Fed-MENU

arXiv:2205.08278 [pdf, other]

doi 10.1016/j.cageo.2023.105356

Multiscale reconstruction of porous media based on multiple dictionaries learning

Authors: Pengcheng Yan, Qizhi Teng, Xiaohai He, Zhenchuan Ma, Ningning Zhang

Abstract: Digital modeling of the microstructure is important for studying the physical and transport properties of porous media. Multiscale modeling for porous media can accurately characterize macro-pores and micro-pores in a large-FoV (field of view) high-resolution three-dimensional pore structure model. This paper proposes a multiscale reconstruction algorithm based on multiple dictionaries learning, i… ▽ More Digital modeling of the microstructure is important for studying the physical and transport properties of porous media. Multiscale modeling for porous media can accurately characterize macro-pores and micro-pores in a large-FoV (field of view) high-resolution three-dimensional pore structure model. This paper proposes a multiscale reconstruction algorithm based on multiple dictionaries learning, in which edge patterns and micro-pore patterns from homology high-resolution pore structure are introduced into low-resolution pore structure to build a fine multiscale pore structure model. The qualitative and quantitative comparisons of the experimental results show that the results of multiscale reconstruction are similar to the real high-resolution pore structure in terms of complex pore geometry and pore surface morphology. The geometric, topological and permeability properties of multiscale reconstruction results are almost identical to those of the real high-resolution pore structures. The experiments also demonstrate the proposal algorithm is capable of multiscale reconstruction without regard to the size of the input. This work provides an effective method for fine multiscale modeling of porous media. △ Less

Submitted 16 May, 2022; originally announced May 2022.

arXiv:2204.02450 [pdf, ps, other]

Federated Cross Learning for Medical Image Segmentation

Authors: Xuanang Xu, Hannah H. Deng, Tianyi Chen, Tianshu Kuang, Joshua C. Barber, Daeseung Kim, Jaime Gateno, James J. Xia, Pingkun Yan

Abstract: Federated learning (FL) can collaboratively train deep learning models using isolated patient data owned by different hospitals for various clinical applications, including medical image segmentation. However, a major problem of FL is its performance degradation when dealing with data that are not independently and identically distributed (non-iid), which is often the case in medical images. In th… ▽ More Federated learning (FL) can collaboratively train deep learning models using isolated patient data owned by different hospitals for various clinical applications, including medical image segmentation. However, a major problem of FL is its performance degradation when dealing with data that are not independently and identically distributed (non-iid), which is often the case in medical images. In this paper, we first conduct a theoretical analysis on the FL algorithm to reveal the problem of model aggregation during training on non-iid data. With the insights gained through the analysis, we propose a simple yet effective method, federated cross learning (FedCross), to tackle this challenging problem. Unlike the conventional FL methods that combine multiple individually trained local models on a server node, our FedCross sequentially trains the global model across different clients in a round-robin manner, and thus the entire training procedure does not involve any model aggregation steps. To further improve its performance to be comparable with the centralized learning method, we combine the FedCross with an ensemble learning mechanism to compose a federated cross ensemble learning (FedCrossEns) method. Finally, we conduct extensive experiments using a set of public datasets. The experimental results show that the proposed FedCross training strategy outperforms the mainstream FL methods on non-iid data. In addition to improving the segmentation performance, our FedCrossEns can further provide a quantitative estimation of the model uncertainty, demonstrating the effectiveness and clinical significance of our designs. Source code is publicly available at https://github.com/DIAL-RPI/FedCross. △ Less

Submitted 22 May, 2023; v1 submitted 5 April, 2022; originally announced April 2022.

Comments: v1: 10 pages, 4 figures; v2: 12 pages, 5 figures, accepted by Medical Imaging with Deep Learning (MIDL) 2023 conference, camera-ready version available at https://openreview.net/forum?id=DrZbwobH_zo , source code available at https://github.com/DIAL-RPI/FedCross

arXiv:2203.13118 [pdf, other]

X-ray Dissectography Improves Lung Nodule Detection

Authors: Chuang Niu, Giridhar Dasegowda, Pingkun Yan, Mannudeep K. Kalra, Ge Wang

Abstract: Although radiographs are the most frequently used worldwide due to their cost-effectiveness and widespread accessibility, the structural superposition along the x-ray paths often renders suspicious or concerning lung nodules difficult to detect. In this study, we apply "X-ray dissectography" to dissect lungs digitally from a few radiographic projections, suppress the interference of irrelevant str… ▽ More Although radiographs are the most frequently used worldwide due to their cost-effectiveness and widespread accessibility, the structural superposition along the x-ray paths often renders suspicious or concerning lung nodules difficult to detect. In this study, we apply "X-ray dissectography" to dissect lungs digitally from a few radiographic projections, suppress the interference of irrelevant structures, and improve lung nodule detectability. For this purpose, a collaborative detection network is designed to localize lung nodules in 2D dissected projections and 3D physical space. Our experimental results show that our approach can significantly improve the average precision by 20+% in comparison with the common baseline that detects lung nodules from original projections using a popular detection network. Potentially, this approach could help re-design the current X-ray imaging protocols and workflows and improve the diagnostic performance of chest radiographs in lung diseases. △ Less

Submitted 24 March, 2022; originally announced March 2022.

arXiv:2107.06449 [pdf, other]

End-to-end Ultrasound Frame to Volume Registration

Authors: Hengtao Guo, Xuanang Xu, Sheng Xu, Bradford J. Wood, Pingkun Yan

Abstract: Fusing intra-operative 2D transrectal ultrasound (TRUS) image with pre-operative 3D magnetic resonance (MR) volume to guide prostate biopsy can significantly increase the yield. However, such a multimodal 2D/3D registration problem is a very challenging task. In this paper, we propose an end-to-end frame-to-volume registration network (FVR-Net), which can efficiently bridge the previous research g… ▽ More Fusing intra-operative 2D transrectal ultrasound (TRUS) image with pre-operative 3D magnetic resonance (MR) volume to guide prostate biopsy can significantly increase the yield. However, such a multimodal 2D/3D registration problem is a very challenging task. In this paper, we propose an end-to-end frame-to-volume registration network (FVR-Net), which can efficiently bridge the previous research gaps by aligning a 2D TRUS frame with a 3D TRUS volume without requiring hardware tracking. The proposed FVR-Net utilizes a dual-branch feature extraction module to extract the information from TRUS frame and volume to estimate transformation parameters. We also introduce a differentiable 2D slice sampling module which allows gradients backpropagating from an unsupervised image similarity loss for content correspondence learning. Our model shows superior efficiency for real-time interventional guidance with highly competitive registration accuracy. △ Less

Submitted 13 July, 2021; originally announced July 2021.

Comments: Early accepted by MICCAI-2021

arXiv:2103.13557 [pdf, other]

Task-Oriented Low-Dose CT Image Denoising

Authors: Jiajin Zhang, Hanqing Chao, Xuanang Xu, Chuang Niu, Ge Wang, Pingkun Yan

Abstract: The extensive use of medical CT has raised a public concern over the radiation dose to the patient. Reducing the radiation dose leads to increased CT image noise and artifacts, which can adversely affect not only the radiologists judgement but also the performance of downstream medical image analysis tasks. Various low-dose CT denoising methods, especially the recent deep learning based approaches… ▽ More The extensive use of medical CT has raised a public concern over the radiation dose to the patient. Reducing the radiation dose leads to increased CT image noise and artifacts, which can adversely affect not only the radiologists judgement but also the performance of downstream medical image analysis tasks. Various low-dose CT denoising methods, especially the recent deep learning based approaches, have produced impressive results. However, the existing denoising methods are all downstream-task-agnostic and neglect the diverse needs of the downstream applications. In this paper, we introduce a novel Task-Oriented Denoising Network (TOD-Net) with a task-oriented loss leveraging knowledge from the downstream tasks. Comprehensive empirical analysis shows that the task-oriented loss complements other task agnostic losses by steering the denoiser to enhance the image quality in the task related regions of interest. Such enhancement in turn brings general boosts on the performance of various methods for the downstream task. The presented work may shed light on the future development of context-aware image denoising methods. △ Less

Submitted 10 July, 2021; v1 submitted 24 March, 2021; originally announced March 2021.

Comments: Paper accepted by MICCAI-2021

arXiv:2102.09615 [pdf, other]

Noise Entangled GAN For Low-Dose CT Simulation

Authors: Chuang Niu, Ge Wang, Pingkun Yan, Juergen Hahn, Youfang Lai, Xun Jia, Arjun Krishna, Klaus Mueller, Andreu Badal, KyleJ. Myers, Rongping Zeng

Abstract: We propose a Noise Entangled GAN (NE-GAN) for simulating low-dose computed tomography (CT) images from a higher dose CT image. First, we present two schemes to generate a clean CT image and a noise image from the high-dose CT image. Then, given these generated images, an NE-GAN is proposed to simulate different levels of low-dose CT images, where the level of generated noise can be continuously co… ▽ More We propose a Noise Entangled GAN (NE-GAN) for simulating low-dose computed tomography (CT) images from a higher dose CT image. First, we present two schemes to generate a clean CT image and a noise image from the high-dose CT image. Then, given these generated images, an NE-GAN is proposed to simulate different levels of low-dose CT images, where the level of generated noise can be continuously controlled by a noise factor. NE-GAN consists of a generator and a set of discriminators, and the number of discriminators is determined by the number of noise levels during training. Compared with the traditional methods based on the projection data that are usually unavailable in real applications, NE-GAN can directly learn from the real and/or simulated CT images and may create low-dose CT images quickly without the need of raw data or other proprietary CT scanner information. The experimental results show that the proposed method has the potential to simulate realistic low-dose CT images. △ Less

Submitted 18 February, 2021; originally announced February 2021.

arXiv:2008.06997 [pdf, other]

doi 10.1038/s41467-021-23235-4

Deep Learning Predicts Cardiovascular Disease Risks from Lung Cancer Screening Low Dose Computed Tomography

Authors: Hanqing Chao, Hongming Shan, Fatemeh Homayounieh, Ramandeep Singh, Ruhani Doda Khera, Hengtao Guo, Timothy Su, Ge Wang, Mannudeep K. Kalra, Pingkun Yan

Abstract: Cancer patients have a higher risk of cardiovascular disease (CVD) mortality than the general population. Low dose computed tomography (LDCT) for lung cancer screening offers an opportunity for simultaneous CVD risk estimation in at-risk patients. Our deep learning CVD risk prediction model, trained with 30,286 LDCTs from the National Lung Cancer Screening Trial, achieved an area under the curve (… ▽ More Cancer patients have a higher risk of cardiovascular disease (CVD) mortality than the general population. Low dose computed tomography (LDCT) for lung cancer screening offers an opportunity for simultaneous CVD risk estimation in at-risk patients. Our deep learning CVD risk prediction model, trained with 30,286 LDCTs from the National Lung Cancer Screening Trial, achieved an area under the curve (AUC) of 0.871 on a separate test set of 2,085 subjects and identified patients with high CVD mortality risks (AUC of 0.768). We validated our model against ECG-gated cardiac CT based markers, including coronary artery calcification (CAC) score, CAD-RADS score, and MESA 10-year risk score from an independent dataset of 335 subjects. Our work shows that, in high-risk patients, deep learning can convert LDCT for lung cancer screening into a dual-screening quantitative tool for CVD risk estimation. △ Less

Submitted 29 March, 2021; v1 submitted 16 August, 2020; originally announced August 2020.

arXiv:2008.01846 [pdf]

Stabilizing Deep Tomographic Reconstruction

Authors: Weiwen Wu, Dianlin Hu, Wenxiang Cong, Hongming Shan, Shaoyu Wang, Chuang Niu, Pingkun Yan, Hengyong Yu, Varut Vardhanabhuti, Ge Wang

Abstract: Tomographic image reconstruction with deep learning is an emerging field, but a recent landmark study reveals that several deep reconstruction networks are unstable for computed tomography (CT) and magnetic resonance imaging (MRI). Specifically, three kinds of instabilities were reported: (1) strong image artefacts from tiny perturbations, (2) small features missing in a deeply reconstructed image… ▽ More Tomographic image reconstruction with deep learning is an emerging field, but a recent landmark study reveals that several deep reconstruction networks are unstable for computed tomography (CT) and magnetic resonance imaging (MRI). Specifically, three kinds of instabilities were reported: (1) strong image artefacts from tiny perturbations, (2) small features missing in a deeply reconstructed image, and (3) decreased imaging performance with increased input data. On the other hand, compressed sensing (CS) inspired reconstruction methods do not suffer from these instabilities because of their built-in kernel awareness. For deep reconstruction to realize its full potential and become a mainstream approach for tomographic imaging, it is thus critically important to meet this challenge by stabilizing deep reconstruction networks. Here we propose an Analytic Compressed Iterative Deep (ACID) framework to address this challenge. ACID synergizes a deep reconstruction network trained on big data, kernel awareness from CS-inspired processing, and iterative refinement to minimize the data residual relative to real measurement. Our study demonstrates that the deep reconstruction using ACID is accurate and stable, and sheds light on the converging mechanism of the ACID iteration under a Bounded Relative Error Norm (BREN) condition. In particular, the study shows that ACID-based reconstruction is resilient against adversarial attacks, superior to classic sparsity-regularized reconstruction alone, and eliminates the three kinds of instabilities. We anticipate that this integrative data-driven approach will help promote development and translation of deep tomographic image reconstruction networks into clinical applications. △ Less

Submitted 13 September, 2021; v1 submitted 4 August, 2020; originally announced August 2020.

Comments: 78 pages, 30 figures, 149 references

arXiv:2007.10416 [pdf, other]

Integrative Analysis for COVID-19 Patient Outcome Prediction

Authors: Hanqing Chao, Xi Fang, Jiajin Zhang, Fatemeh Homayounieh, Chiara D. Arru, Subba R. Digumarthy, Rosa Babaei, Hadi K. Mobin, Iman Mohseni, Luca Saba, Alessandro Carriero, Zeno Falaschi, Alessio Pasche, Ge Wang, Mannudeep K. Kalra, Pingkun Yan

Abstract: While image analysis of chest computed tomography (CT) for COVID-19 diagnosis has been intensively studied, little work has been performed for image-based patient outcome prediction. Management of high-risk patients with early intervention is a key to lower the fatality rate of COVID-19 pneumonia, as a majority of patients recover naturally. Therefore, an accurate prediction of disease progression… ▽ More While image analysis of chest computed tomography (CT) for COVID-19 diagnosis has been intensively studied, little work has been performed for image-based patient outcome prediction. Management of high-risk patients with early intervention is a key to lower the fatality rate of COVID-19 pneumonia, as a majority of patients recover naturally. Therefore, an accurate prediction of disease progression with baseline imaging at the time of the initial presentation can help in patient management. In lieu of only size and volume information of pulmonary abnormalities and features through deep learning based image segmentation, here we combine radiomics of lung opacities and non-imaging features from demographic data, vital signs, and laboratory findings to predict need for intensive care unit (ICU) admission. To our knowledge, this is the first study that uses holistic information of a patient including both imaging and non-imaging data for outcome prediction. The proposed methods were thoroughly evaluated on datasets separately collected from three hospitals, one in the United States, one in Iran, and another in Italy, with a total 295 patients with reverse transcription polymerase chain reaction (RT-PCR) assay positive COVID-19 pneumonia. Our experimental results demonstrate that adding non-imaging features can significantly improve the performance of prediction to achieve AUC up to 0.884 and sensitivity as high as 96.1%, which can be valuable to provide clinical decision support in managing COVID-19 patients. Our methods may also be applied to other lung diseases including but not limited to community acquired pneumonia. The source code of our work is available at https://github.com/DIAL-RPI/COVID19-ICUPrediction. △ Less

Submitted 16 September, 2020; v1 submitted 20 July, 2020; originally announced July 2020.

Comments: This paper has been accepted by Medical Image Analysis. The source code of this work is available at https://github.com/DIAL-RPI/COVID19-ICUPrediction

arXiv:2006.07694 [pdf, other]

Sensorless Freehand 3D Ultrasound Reconstruction via Deep Contextual Learning

Authors: Hengtao Guo, Sheng Xu, Bradford Wood, Pingkun Yan

Abstract: Transrectal ultrasound (US) is the most commonly used imaging modality to guide prostate biopsy and its 3D volume provides even richer context information. Current methods for 3D volume reconstruction from freehand US scans require external tracking devices to provide spatial position for every frame. In this paper, we propose a deep contextual learning network (DCL-Net), which can efficiently exp… ▽ More Transrectal ultrasound (US) is the most commonly used imaging modality to guide prostate biopsy and its 3D volume provides even richer context information. Current methods for 3D volume reconstruction from freehand US scans require external tracking devices to provide spatial position for every frame. In this paper, we propose a deep contextual learning network (DCL-Net), which can efficiently exploit the image feature relationship between US frames and reconstruct 3D US volumes without any tracking device. The proposed DCL-Net utilizes 3D convolutions over a US video segment for feature extraction. An embedded self-attention module makes the network focus on the speckle-rich areas for better spatial movement prediction. We also propose a novel case-wise correlation loss to stabilize the training process for improved accuracy. Highly promising results have been obtained by using the developed method. The experiments with ablation studies demonstrate superior performance of the proposed method by comparing against other state-of-the-art methods. Source code of this work is publicly available at https://github.com/DIAL-RPI/FreehandUSRecon. △ Less

Submitted 13 June, 2020; originally announced June 2020.

Comments: Provisionally accepted by MICCAI 2020

arXiv:1910.11456 [pdf, other]

Unified Multi-scale Feature Abstraction for Medical Image Segmentation

Authors: Xi Fang, Bo Du, Sheng Xu, Bradford J. Wood, Pingkun Yan

Abstract: Automatic medical image segmentation, an essential component of medical image analysis, plays an importantrole in computer-aided diagnosis. For example, locating and segmenting the liver can be very helpful in livercancer diagnosis and treatment. The state-of-the-art models in medical image segmentation are variants ofthe encoder-decoder architecture such as fully convolutional network (FCN) and U… ▽ More Automatic medical image segmentation, an essential component of medical image analysis, plays an importantrole in computer-aided diagnosis. For example, locating and segmenting the liver can be very helpful in livercancer diagnosis and treatment. The state-of-the-art models in medical image segmentation are variants ofthe encoder-decoder architecture such as fully convolutional network (FCN) and U-Net.1A major focus ofthe FCN based segmentation methods has been on network structure engineering by incorporating the latestCNN structures such as ResNet2and DenseNet.3In addition to exploring new network structures for efficientlyabstracting high level features, incorporating structures for multi-scale image feature extraction in FCN hashelped to improve performance in segmentation tasks. In this paper, we design a new multi-scale networkarchitecture, which takes multi-scale inputs with dedicated convolutional paths to efficiently combine featuresfrom different scales to better utilize the hierarchical information. △ Less

Submitted 24 October, 2019; originally announced October 2019.

Comments: Abstract of SPIE Medical Imaging (Oral)

arXiv:1908.10245 [pdf]

Feature Exploration for Knowledge-guided and Data-driven Approach Based Cuffless Blood Pressure Measurement

Authors: Xiaorong Ding, Bryan P Yan, Yuan-Ting Zhang, Jing Liu, Peng Su, Ni Zhao

Abstract: This study explores extended feature space that is indicative of blood pressure (BP) changes for better estimation of continuous BP in an unobtrusive way. A total of 222 features were extracted from noninvasively acquired electrocardiogram (ECG) and photoplethysmogram (PPG) signals with the subject undergoing coronary angiography and/or percutaneous coronary intervention, during which intra-arteri… ▽ More This study explores extended feature space that is indicative of blood pressure (BP) changes for better estimation of continuous BP in an unobtrusive way. A total of 222 features were extracted from noninvasively acquired electrocardiogram (ECG) and photoplethysmogram (PPG) signals with the subject undergoing coronary angiography and/or percutaneous coronary intervention, during which intra-arterial BP was recorded simultaneously with the subject at rest and while administering drugs to induce BP variations. The association between the extracted features and the BP components, i.e. systolic BP (SBP), diastolic BP (DBP), mean BP (MBP), and pulse pressure (PP) were analyzed and evaluated in terms of correlation coefficient, cross sample entropy, and mutual information, respectively. Results show that the most relevant indicator for both SBP and MBP is the pulse full width at half maximum, and for DBP and PP, the amplitude between the peak of the first derivative of PPG (dPPG) to the valley of the second derivative of PPG (sdPPG) and the time interval between the peak of R wave and the sdPPG, respectively. As potential inputs to either the knowledge-guided model or data-driven method for cuffless BP calibration, the proposed expanded features are expected to improve the estimation accuracy of cuffless BP. △ Less

Submitted 27 August, 2019; originally announced August 2019.

Comments: 4 pages, 4 figures, 2 tables

arXiv:1903.02026 [pdf, other]

doi 10.1007/s00138-020-01060-x

Deep Learning in Medical Image Registration: A Survey

Authors: Grant Haskins, Uwe Kruger, Pingkun Yan

Abstract: The establishment of image correspondence through robust image registration is critical to many clinical tasks such as image fusion, organ atlas creation, and tumor growth monitoring, and is a very challenging problem. Since the beginning of the recent deep learning renaissance, the medical imaging research community has developed deep learning based approaches and achieved the state-of-the-art in… ▽ More The establishment of image correspondence through robust image registration is critical to many clinical tasks such as image fusion, organ atlas creation, and tumor growth monitoring, and is a very challenging problem. Since the beginning of the recent deep learning renaissance, the medical imaging research community has developed deep learning based approaches and achieved the state-of-the-art in many applications, including image registration. The rapid adoption of deep learning for image registration applications over the past few years necessitates a comprehensive summary and outlook, which is the main scope of this survey. This requires placing a focus on the different research areas as well as highlighting challenges that practitioners face. This survey, therefore, outlines the evolution of deep learning based medical image registration in the context of both research challenges and relevant innovations in the past few years. Further, this survey highlights future research directions to show how this field may be possibly moved forward to the next level. △ Less

Submitted 21 January, 2020; v1 submitted 5 March, 2019; originally announced March 2019.

Comments: Accepted for publication by Machine Vision and Applications on January 8, 2020

Showing 1–25 of 25 results for author: Yan, P