-
MAEEG: Masked Auto-encoder for EEG Representation Learning
Authors:
Hsiang-Yun Sherry Chien,
Hanlin Goh,
Christopher M. Sandino,
Joseph Y. Cheng
Abstract:
Decoding information from bio-signals such as EEG using machine learning has been a challenge due to small datasets and the difficulty of obtaining labels. We propose a reconstruction-based self-supervised learning (SSL) model, the masked auto-encoder for EEG (MAEEG), for learning EEG representations by learning to reconstruct the masked EEG features using a transformer architecture. We found that MAEEG can learn representations that significantly improve sleep stage classification (~5% accuracy increase) when only a small number of labels are given. We also found that input sample lengths and different ways of masking during reconstruction-based SSL pretraining have a huge effect on downstream model performance. Specifically, learning to reconstruct a larger proportion and more concentrated masked signal results in better performance on sleep classification. Our findings provide insight into how reconstruction-based SSL could help representation learning for EEG.
Submitted 27 October, 2022;
originally announced November 2022.
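Note on the masking comparison in the abstract above: the difference between a scattered mask and a more concentrated mask of the same proportion can be sketched as follows. This is a minimal illustration, not the authors' implementation. The function names, the token-level view of the EEG features, and the choice to score reconstruction only on masked positions are all assumptions made for the example.

```python
import numpy as np

def scattered_mask(n_tokens, mask_ratio, rng):
    """Mask individual tokens chosen uniformly at random (hypothetical helper)."""
    n_masked = int(round(n_tokens * mask_ratio))
    mask = np.zeros(n_tokens, dtype=bool)
    mask[rng.choice(n_tokens, size=n_masked, replace=False)] = True
    return mask

def concentrated_mask(n_tokens, mask_ratio, rng):
    """Mask one contiguous block covering the same fraction of tokens."""
    n_masked = int(round(n_tokens * mask_ratio))
    start = rng.integers(0, n_tokens - n_masked + 1)
    mask = np.zeros(n_tokens, dtype=bool)
    mask[start:start + n_masked] = True
    return mask

def masked_mse(target, prediction, mask):
    """Mean squared reconstruction error on masked tokens only (an assumption)."""
    return float(np.mean((target[mask] - prediction[mask]) ** 2))
```

Both helpers hide the same number of tokens; only the layout differs, which is the variable the abstract reports as affecting downstream performance.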
-
Position Prediction as an Effective Pretraining Strategy
Authors:
Shuangfei Zhai,
Navdeep Jaitly,
Jason Ramapuram,
Dan Busbridge,
Tatiana Likhomanenko,
Joseph Yitan Cheng,
Walter Talbott,
Chen Huang,
Hanlin Goh,
Joshua Susskind
Abstract:
Transformers have gained increasing popularity in a wide range of applications, including Natural Language Processing (NLP), Computer Vision and Speech Recognition, because of their powerful representational capacity. However, harnessing this representational capacity effectively requires a large amount of data, strong regularization, or both, to mitigate overfitting. Recently, the power of the Transformer has been unlocked by self-supervised pretraining strategies based on masked autoencoders, which rely on reconstructing masked inputs, directly or contrastively, from unmasked content. This pretraining strategy, which has been used in BERT models in NLP, Wav2Vec models in Speech and, recently, MAE models in Vision, forces the model to learn about relationships between the content in different parts of the input using autoencoding-related objectives. In this paper, we propose a novel but surprisingly simple alternative to content reconstruction: predicting locations from content, without providing positional information for it. Doing so requires the Transformer to understand the positional relationships between different parts of the input, from their content alone. This amounts to an efficient implementation where the pretext task is a classification problem among all possible positions for each input token. We experiment on both Vision and Speech benchmarks, where our approach brings improvements over strong supervised training baselines and is comparable to modern unsupervised/self-supervised pretraining methods. Our method also enables Transformers trained without position embeddings to outperform ones trained with full position information.
Submitted 15 July, 2022;
originally announced July 2022.
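Note on the pretext task in the abstract above: "a classification problem among all possible positions for each input token" can be written as an N-way cross-entropy, where each token's label is the position it came from. The sketch below is only an illustration under stated assumptions. The random linear head stands in for a Transformer, and the shapes, names, and shuffling step are hypothetical rather than taken from the paper.

```python
import numpy as np

def position_prediction_loss(logits, true_positions):
    """Mean cross-entropy of an N-way position classifier.

    logits: (n_tokens, n_positions) scores; true_positions: (n_tokens,) ints.
    """
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    rows = np.arange(len(true_positions))
    return float(-log_probs[rows, true_positions].mean())

rng = np.random.default_rng(0)
n_tokens, dim = 16, 8
tokens = rng.normal(size=(n_tokens, dim))   # token contents, no position info
perm = rng.permutation(n_tokens)
shuffled = tokens[perm]                     # shuffled[i] came from position perm[i]

# Stand-in for the Transformer plus classifier head (assumption: random weights).
head = rng.normal(size=(dim, n_tokens))
loss = position_prediction_loss(shuffled @ head, perm)
```

A model that cannot tell positions apart scores log(n_tokens) on this loss, so values below that indicate that positional structure is being recovered from content.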
-
Speech Emotion: Investigating Model Representations, Multi-Task Learning and Knowledge Distillation
Authors:
Vikramjit Mitra,
Hsiang-Yun Sherry Chien,
Vasudha Kowtha,
Joseph Yitan Cheng,
Erdrin Azemi
Abstract:
Estimating dimensional emotions, such as activation, valence and dominance, from acoustic speech signals has been widely explored over the past few years. While accurate estimation of activation and dominance from speech seems to be possible, the same for valence remains challenging. Previous research has shown that the use of lexical information can improve valence estimation performance. Lexical information can be obtained from pre-trained acoustic models, where the learned representations can improve valence estimation from speech. We investigate the use of pre-trained model representations to improve valence estimation from the acoustic speech signal. We also explore fusion of representations to improve emotion estimation across all three emotion dimensions: activation, valence and dominance. Additionally, we investigate whether representations from pre-trained models can be distilled into models trained with low-level features, resulting in models with fewer parameters. We show that fusion of pre-trained model embeddings results in a 79% relative improvement in concordance correlation coefficient (CCC) on valence estimation compared to a standard acoustic feature baseline (mel-filterbank energies), while distillation from pre-trained model embeddings to lower-dimensional representations yielded a relative 12% improvement. Such performance gains were observed over two evaluation sets, indicating that our proposed architecture generalizes across those evaluation sets. We report new state-of-the-art "text-free" acoustic-only dimensional emotion estimation CCC values on two MSP-Podcast evaluation sets.
Submitted 2 July, 2022;
originally announced July 2022.
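Note on the metric in the abstract above: the concordance correlation coefficient (CCC) has the standard definition 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²). The sketch below uses population (biased) variances, which is one common convention and an assumption here, since the abstract does not say which estimator was used. The function name is hypothetical.

```python
import numpy as np

def concordance_cc(y_true, y_pred):
    """Concordance correlation coefficient using population statistics."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mean_t, mean_p = y_true.mean(), y_pred.mean()
    cov = np.mean((y_true - mean_t) * (y_pred - mean_p))
    return float(2.0 * cov / (y_true.var() + y_pred.var() + (mean_t - mean_p) ** 2))
```

Unlike Pearson correlation, CCC penalizes a constant offset: predictions that track the labels perfectly but are shifted by a bias score below 1.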
-
Spectral Decomposition in Deep Networks for Segmentation of Dynamic Medical Images
Authors:
Edgar A. Rios Piedra,
Morteza Mardani,
Frank Ong,
Ukash Nakarmi,
Joseph Y. Cheng,
Shreyas Vasanawala
Abstract:
Dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) is a multi-phase technique widely used in clinical practice. DCE and similar datasets of dynamic medical data tend to contain redundant information in the spatial and temporal components that may not be relevant for detection of the object of interest and result in unnecessarily complex computer models with long training times that may also under-perform at test time due to the abundance of noisy heterogeneous data. This work attempts to increase the training efficacy and performance of deep networks by determining redundant information in the spatial and spectral components, and shows that segmentation accuracy can be maintained and potentially improved. Reported experiments include the evaluation of training/testing efficacy on a heterogeneous dataset composed of abdominal images of pediatric DCE patients, showing that drastic data reduction (higher than 80%) can preserve the dynamic information and performance of the segmentation model, while effectively suppressing noise and unwanted portions of the images.
Submitted 29 September, 2020;
originally announced October 2020.
-
Subject-Aware Contrastive Learning for Biosignals
Authors:
Joseph Y. Cheng,
Hanlin Goh,
Kaan Dogrusoz,
Oncel Tuzel,
Erdrin Azemi
Abstract:
Datasets for biosignals, such as electroencephalogram (EEG) and electrocardiogram (ECG), often have noisy labels and a limited number of subjects (<100). To handle these challenges, we propose a self-supervised approach based on contrastive learning to model biosignals with a reduced reliance on labeled data and with fewer subjects. In this regime of limited labels and subjects, intersubject variability negatively impacts model performance. Thus, we introduce subject-aware learning through (1) a subject-specific contrastive loss, and (2) adversarial training to promote subject-invariance during the self-supervised learning. We also develop a number of time-series data augmentation techniques to be used with the contrastive loss for biosignals. Our method is evaluated on publicly available datasets of two different biosignals with different tasks: EEG decoding and ECG anomaly detection. The embeddings learned using self-supervision yield competitive classification results compared to entirely supervised methods. We show that subject-invariance improves representation quality for these tasks, and observe that subject-specific loss increases performance when fine-tuning with supervised labels.
Submitted 30 June, 2020;
originally announced July 2020.
-
Analysis of Deep Complex-Valued Convolutional Neural Networks for MRI Reconstruction
Authors:
Elizabeth K. Cole,
Joseph Y. Cheng,
John M. Pauly,
Shreyas S. Vasanawala
Abstract:
Many real-world signal sources are complex-valued, having real and imaginary components. However, the vast majority of existing deep learning platforms and network architectures do not support the use of complex-valued data. MRI data is inherently complex-valued, so existing approaches discard the richer algebraic structure of the complex data. In this work, we investigate end-to-end complex-valued convolutional neural networks - specifically, for image reconstruction in lieu of two-channel real-valued networks. We apply this to magnetic resonance imaging reconstruction for the purpose of accelerating scan times and determine the performance of various promising complex-valued activation functions. We find that complex-valued CNNs with complex-valued convolutions provide superior reconstructions compared to real-valued convolutions with the same number of trainable parameters, over a variety of network architectures and datasets.
Submitted 11 May, 2020; v1 submitted 3 April, 2020;
originally announced April 2020.
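Note on complex-valued convolutions in the abstract above: a complex convolution can be assembled from four real convolutions through the identity (a + ib) * (c + id) = (a*c − b*d) + i(a*d + b*c). The 1D sketch below only illustrates that identity. It is not the authors' network layer, and the function name is hypothetical. It uses `np.convolve`, which flips the kernel, whereas deep learning layers typically compute cross-correlation; the same decomposition applies in that case as long as no conjugation is introduced.

```python
import numpy as np

def complex_conv1d(x_re, x_im, w_re, w_im):
    """Complex convolution of x = x_re + i*x_im with w = w_re + i*w_im,
    computed from four real-valued convolutions."""
    y_re = np.convolve(x_re, w_re) - np.convolve(x_im, w_im)
    y_im = np.convolve(x_re, w_im) + np.convolve(x_im, w_re)
    return y_re, y_im
```

A two-channel real-valued network, by contrast, learns independent weights for all four real-to-real paths and does not tie them together through this structure.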
-
Diagnostic Image Quality Assessment and Classification in Medical Imaging: Opportunities and Challenges
Authors:
Jeffrey Ma,
Ukash Nakarmi,
Cedric Yue Sik Kin,
Christopher Sandino,
Joseph Y. Cheng,
Ali B. Syed,
Peter Wei,
John M. Pauly,
Shreyas Vasanawala
Abstract:
Magnetic Resonance Imaging (MRI) suffers from several artifacts, the most common of which are motion artifacts. These artifacts often yield images that are of non-diagnostic quality. To detect such artifacts, images are prospectively evaluated by experts for their diagnostic quality, which necessitates patient revisits and rescans whenever non-diagnostic quality scans are encountered. This motivates the need to develop an automated framework capable of assessing medical image quality and detecting diagnostic and non-diagnostic images. In this paper, we explore several convolutional neural network-based frameworks for medical image quality assessment and investigate several challenges therein.
Submitted 5 December, 2019;
originally announced December 2019.
-
Accelerating cardiac cine MRI using a deep learning-based ESPIRiT reconstruction
Authors:
Christopher M. Sandino,
Peng Lai,
Shreyas S. Vasanawala,
Joseph Y. Cheng
Abstract:
A novel neural network architecture, known as DL-ESPIRiT, is proposed to reconstruct rapidly acquired cardiac MRI data without field-of-view limitations which are present in previously proposed deep learning-based reconstruction frameworks. Additionally, a novel convolutional neural network based on separable 3D convolutions is integrated into DL-ESPIRiT to more efficiently learn spatiotemporal priors for dynamic image reconstruction. The network is trained on fully-sampled 2D cardiac cine datasets collected from eleven healthy volunteers with IRB approval. DL-ESPIRiT is compared against a state-of-the-art parallel imaging and compressed sensing method known as $l_1$-ESPIRiT. The reconstruction accuracy of both methods is evaluated on retrospectively undersampled datasets (R=12) with respect to standard image quality metrics as well as automatic deep learning-based segmentations of left ventricular volumes. Feasibility of this approach is demonstrated in reconstructions of prospectively undersampled data which were acquired in a single heartbeat per slice.
Submitted 18 May, 2020; v1 submitted 13 November, 2019;
originally announced November 2019.
-
Reconstruction of Undersampled 3D Non-Cartesian Image-Based Navigators for Coronary MRA Using an Unrolled Deep Learning Model
Authors:
Mario O. Malavé,
Corey A. Baron,
Srivathsan P. Koundinyan,
Christopher M. Sandino,
Frank Ong,
Joseph Y. Cheng,
Dwight G. Nishimura
Abstract:
Purpose: To rapidly reconstruct undersampled 3D non-Cartesian image-based navigators (iNAVs) using an unrolled deep learning (DL) model for non-rigid motion correction in coronary magnetic resonance angiography (CMRA).
Methods: An unrolled network is trained to reconstruct beat-to-beat 3D iNAVs acquired as part of a CMRA sequence. The unrolled model incorporates a non-uniform FFT operator to perform the data consistency operation, and the regularization term is learned by a convolutional neural network (CNN) based on the proximal gradient descent algorithm. The training set includes 6,000 3D iNAVs acquired from 7 different subjects and 11 scans using a variable-density (VD) cones trajectory. For testing, 3D iNAVs from 4 additional subjects are reconstructed using the unrolled model. To validate reconstruction accuracy, global and localized motion estimates from DL model-based 3D iNAVs are compared with those extracted from 3D iNAVs reconstructed with $\textit{l}_{1}$-ESPIRiT. Then, the high-resolution coronary MRA images motion corrected with autofocusing using the $\textit{l}_{1}$-ESPIRiT and DL model-based 3D iNAVs are assessed for differences.
Results: 3D iNAVs reconstructed using the DL model-based approach and conventional $\textit{l}_{1}$-ESPIRiT generate similar global and localized motion estimates and provide equivalent coronary image quality. Reconstruction with the unrolled network completes in a fraction of the time compared to CPU and GPU implementations of $\textit{l}_{1}$-ESPIRiT (20x and 3x speed increases, respectively).
Conclusion: We have developed a deep neural network architecture to reconstruct undersampled 3D non-Cartesian VD cones iNAVs. Our approach decreases reconstruction time for 3D iNAVs, while preserving the accuracy of non-rigid motion information offered by them for correction.
Submitted 24 October, 2019;
originally announced October 2019.
-
Compressed Sensing: From Research to Clinical Practice with Data-Driven Learning
Authors:
Joseph Y. Cheng,
Feiyu Chen,
Christopher Sandino,
Morteza Mardani,
John M. Pauly,
Shreyas S. Vasanawala
Abstract:
Compressed sensing in MRI enables high subsampling factors while maintaining diagnostic image quality. This technique enables shortened scan durations and/or improved image resolution. Further, compressed sensing can increase the diagnostic information and value from each scan performed. Overall, compressed sensing has significant clinical impact in improving the diagnostic quality and patient experience for imaging exams. However, a number of challenges exist when moving compressed sensing from research to the clinic. These challenges include hand-crafted image priors, sensitive tuning parameters, and long reconstruction times. Data-driven learning provides a solution to address these challenges. As a result, compressed sensing can have greater clinical impact. In this tutorial, we will review the compressed sensing formulation and outline steps needed to transform this formulation to a deep learning framework. Supplementary open source code in Python will be used to demonstrate this approach with open databases. Further, we will discuss considerations in applying data-driven compressed sensing in the clinical setting.
Submitted 19 March, 2019;
originally announced March 2019.
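Note on the formulation mentioned in the abstract above: a common way to write compressed sensing reconstruction is min_x ½‖Ax − y‖² + λ‖x‖₁, solved by alternating a gradient step on the data term with a soft-thresholding (proximal) step. The sketch below is not the tutorial's supplementary code. It assumes a single-coil Cartesian model A = M·F (sampling mask times orthonormal 2D FFT) and sparsity directly in the image domain, both simplifications chosen for brevity, and all names are hypothetical.

```python
import numpy as np

def soft_threshold(x, tau):
    """Proximal operator of tau * ||x||_1; works for real or complex x."""
    mag = np.abs(x)
    return x * np.maximum(1.0 - tau / np.maximum(mag, 1e-12), 0.0)

def ista_step(x, y, mask, step, lam):
    """One proximal-gradient iteration for 0.5*||M F x - y||^2 + lam*||x||_1.

    y is the measured k-space, zero wherever mask is zero.
    """
    residual = mask * np.fft.fft2(x, norm="ortho") - y       # A x - y
    grad = np.fft.ifft2(mask * residual, norm="ortho")       # A^H (A x - y)
    return soft_threshold(x - step * grad, step * lam)
```

In an unrolled deep learning version of this loop, a fixed number of iterations is kept and the hand-crafted `soft_threshold` step is replaced by a trained network, while the data-consistency gradient step retains the known imaging model.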
-
Deep Residual Network for Off-Resonance Artifact Correction with Application to Pediatric Body Magnetic Resonance Angiography with 3D Cones
Authors:
David Y Zeng,
Jamil Shaikh,
Dwight G Nishimura,
Shreyas S Vasanawala,
Joseph Y Cheng
Abstract:
Purpose: Off-resonance artifact correction by deep learning, to facilitate rapid pediatric body imaging with a scan-time-efficient 3D cones trajectory.
Methods: A residual convolutional neural network to correct off-resonance artifacts (Off-ResNet) was trained with a prospective study of 30 pediatric magnetic resonance angiography exams. Each exam acquired a short-readout scan (1.18 ± 0.38 ms) and a long-readout scan (3.35 ± 0.74 ms) at 3T. Short-readout scans, with longer scan times but negligible off-resonance blurring, were used as reference images and augmented with additional off-resonance for supervised training examples. Long-readout scans, with greater off-resonance artifacts but shorter scan time, were corrected by autofocus and Off-ResNet and compared to short-readout scans by normalized root-mean-square error (NRMSE), structural similarity index (SSIM), and peak signal-to-noise ratio (PSNR). Scans were also compared by scoring on eight anatomical features by two radiologists, using analysis of variance with post-hoc Tukey's test. Reader agreement was determined with intraclass correlation.
Results: Long-readout scans were on average 59.3% shorter than short-readout scans. Images from Off-ResNet had superior NRMSE, SSIM, and PSNR compared to uncorrected images across ±1 kHz off-resonance (P<0.01). The proposed method had superior NRMSE over -677 Hz to +1 kHz and superior SSIM and PSNR over ±1 kHz compared to autofocus (P<0.01). Radiologic scoring demonstrated that long-readout scans corrected with Off-ResNet were non-inferior to short-readout scans (P<0.01).
Conclusion: The proposed method can correct off-resonance artifacts from rapid long-readout 3D cones scans to a non-inferior image quality compared to diagnostically standard short-readout scans.
Submitted 28 September, 2018;
originally announced October 2018.
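Note on the image-quality metrics named in the abstract above: NRMSE and PSNR have several conventions in the literature. The sketch below picks one of each as an assumption (NRMSE normalized by the L2 norm of the reference, PSNR using the peak magnitude of the reference); the abstract does not say which conventions the authors used, and the function names are hypothetical.

```python
import numpy as np

def nrmse(reference, estimate):
    """Root-mean-square error normalized by the reference's L2 norm."""
    return float(np.linalg.norm(estimate - reference) / np.linalg.norm(reference))

def psnr(reference, estimate):
    """Peak signal-to-noise ratio in dB, with the peak taken from the reference."""
    rmse = np.sqrt(np.mean(np.abs(estimate - reference) ** 2))
    return float(20.0 * np.log10(np.abs(reference).max() / rmse))
```

Lower NRMSE and higher PSNR both indicate that the corrected image is closer to the reference scan.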
-
Highly Scalable Image Reconstruction using Deep Neural Networks with Bandpass Filtering
Authors:
Joseph Y. Cheng,
Feiyu Chen,
Marcus T. Alley,
John M. Pauly,
Shreyas S. Vasanawala
Abstract:
To increase the flexibility and scalability of deep neural networks for image reconstruction, a framework is proposed based on bandpass filtering. For many applications, sensing measurements are performed indirectly. For example, in magnetic resonance imaging, data are sampled in the frequency domain. The introduction of bandpass filtering enables leveraging known imaging physics while ensuring that the final reconstruction is consistent with actual measurements to maintain reconstruction accuracy. We demonstrate this flexible architecture for reconstructing subsampled datasets of MRI scans. The resulting high subsampling rates increase the speed of MRI acquisitions and enable the visualization of rapid hemodynamics.
Submitted 26 November, 2018; v1 submitted 8 May, 2018;
originally announced May 2018.
-
Deep Generative Adversarial Networks for Compressed Sensing Automates MRI
Authors:
Morteza Mardani,
Enhao Gong,
Joseph Y. Cheng,
Shreyas Vasanawala,
Greg Zaharchuk,
Marcus Alley,
Neil Thakur,
Song Han,
William Dally,
John M. Pauly,
Lei Xing
Abstract:
Magnetic resonance imaging (MRI) reconstruction is a severely ill-posed linear inverse task demanding time- and resource-intensive computations that can substantially trade off accuracy for speed in real-time imaging. In addition, state-of-the-art compressed sensing (CS) analytics are not cognizant of the image diagnostic quality. To cope with these challenges we put forth a novel CS framework that draws on the benefits of generative adversarial networks (GAN) to train a (low-dimensional) manifold of diagnostic-quality MR images from historical patients. Leveraging a mixture of least-squares (LS) GANs and pixel-wise $\ell_1$ cost, a deep residual network with skip connections is trained as the generator that learns to remove the aliasing artifacts by projecting onto the manifold. LSGAN learns the texture details, while $\ell_1$ controls the high-frequency noise. A multilayer convolutional neural network is then jointly trained based on diagnostic quality images to discriminate the projection quality. The test phase performs feed-forward propagation over the generator network that demands a very low computational overhead. Extensive evaluations are performed on a large contrast-enhanced MR dataset of pediatric patients. In particular, ratings by expert radiologists corroborate that GANCS retrieves high-contrast images with detailed texture relative to conventional CS and pixel-wise schemes. In addition, it offers reconstruction under a few milliseconds, two orders of magnitude faster than state-of-the-art CS-MRI schemes.
△ Less
Submitted 31 May, 2017;
originally announced June 2017.