-
Interpretation of Intracardiac Electrograms Through Textual Representations
Authors:
William Jongwon Han,
Diana Gomez,
Avi Alok,
Chaojing Duan,
Michael A. Rosenberg,
Douglas Weber,
Emerson Liu,
Ding Zhao
Abstract:
Understanding the irregular electrical activity of atrial fibrillation (AFib) has been a key challenge in electrocardiography. For serious cases of AFib, catheter ablations are performed to collect intracardiac electrograms (EGMs). EGMs offer intricately detailed and localized electrical activity of the heart and are an ideal modality for interpretable cardiac studies. Recent advancements in artif…
▽ More
Understanding the irregular electrical activity of atrial fibrillation (AFib) has been a key challenge in electrocardiography. For serious cases of AFib, catheter ablations are performed to collect intracardiac electrograms (EGMs). EGMs offer intricately detailed and localized electrical activity of the heart and are an ideal modality for interpretable cardiac studies. Recent advancements in artificial intelligence (AI) has allowed some works to utilize deep learning frameworks to interpret EGMs during AFib. Additionally, language models (LMs) have shown exceptional performance in being able to generalize to unseen domains, especially in healthcare. In this study, we are the first to leverage pretrained LMs for finetuning of EGM interpolation and AFib classification via masked language modeling. We formulate the EGM as a textual sequence and present competitive performances on AFib classification compared against other representations. Lastly, we provide a comprehensive interpretability study to provide a multi-perspective intuition of the model's behavior, which could greatly benefit the clinical use.
△ Less
Submitted 11 April, 2024; v1 submitted 1 February, 2024;
originally announced February 2024.
-
An Adaptable Approach for Successful SIEM Adoption in Companies
Authors:
Maximilian Rosenberg,
Bettina Schneider,
Christopher Scherb,
Petra Maria Asprion
Abstract:
In corporations around the world, the topic of cybersecurity and information security is becoming increasingly important as the number of cyberattacks on themselves continues to grow. Nowadays, it is no longer just a matter of protecting against cyberattacks, but rather of detecting such attacks at an early stage and responding accordingly. There is currently no generic methodological approach for…
▽ More
In corporations around the world, the topic of cybersecurity and information security is becoming increasingly important as the number of cyberattacks on themselves continues to grow. Nowadays, it is no longer just a matter of protecting against cyberattacks, but rather of detecting such attacks at an early stage and responding accordingly. There is currently no generic methodological approach for the implementation of Security Information and Event Management (SIEM) systems that takes academic aspects into account and can be applied independently of the product or developers of the systems. Applying Hevner's design science research approach, the goal of this paper is to develop a holistic procedure model for implementing respective SIEM systems in corporations. According to the study during the validation phase, the procedure model was verified to be applicable. As desire for future research, the procedure model should be applied in various implementation projects in different enterprises to analyze its applicability and completeness.
△ Less
Submitted 2 August, 2023;
originally announced August 2023.
-
Automated Cardiovascular Record Retrieval by Multimodal Learning between Electrocardiogram and Clinical Report
Authors:
Jielin Qiu,
Jiacheng Zhu,
Shiqi Liu,
William Han,
Jingqi Zhang,
Chaojing Duan,
Michael Rosenberg,
Emerson Liu,
Douglas Weber,
Ding Zhao
Abstract:
Automated interpretation of electrocardiograms (ECG) has garnered significant attention with the advancements in machine learning methodologies. Despite the growing interest, most current studies focus solely on classification or regression tasks, which overlook a crucial aspect of clinical cardio-disease diagnosis: the diagnostic report generated by experienced human clinicians. In this paper, we…
▽ More
Automated interpretation of electrocardiograms (ECG) has garnered significant attention with the advancements in machine learning methodologies. Despite the growing interest, most current studies focus solely on classification or regression tasks, which overlook a crucial aspect of clinical cardio-disease diagnosis: the diagnostic report generated by experienced human clinicians. In this paper, we introduce a novel approach to ECG interpretation, leveraging recent breakthroughs in Large Language Models (LLMs) and Vision-Transformer (ViT) models. Rather than treating ECG diagnosis as a classification or regression task, we propose an alternative method of automatically identifying the most similar clinical cases based on the input ECG data. Also, since interpreting ECG as images is more affordable and accessible, we process ECG as encoded images and adopt a vision-language learning paradigm to jointly learn vision-language alignment between encoded ECG images and ECG diagnosis reports. Encoding ECG into images can result in an efficient ECG retrieval system, which will be highly practical and useful in clinical applications. More importantly, our findings could serve as a crucial resource for providing diagnostic services in underdeveloped regions.
△ Less
Submitted 6 November, 2023; v1 submitted 13 April, 2023;
originally announced April 2023.
-
Transfer Knowledge from Natural Language to Electrocardiography: Can We Detect Cardiovascular Disease Through Language Models?
Authors:
Jielin Qiu,
William Han,
Jiacheng Zhu,
Mengdi Xu,
Michael Rosenberg,
Emerson Liu,
Douglas Weber,
Ding Zhao
Abstract:
Recent advancements in Large Language Models (LLMs) have drawn increasing attention since the learned embeddings pretrained on large-scale datasets have shown powerful ability in various downstream applications. However, whether the learned knowledge by LLMs can be transferred to clinical cardiology remains unknown. In this work, we aim to bridge this gap by transferring the knowledge of LLMs to c…
▽ More
Recent advancements in Large Language Models (LLMs) have drawn increasing attention since the learned embeddings pretrained on large-scale datasets have shown powerful ability in various downstream applications. However, whether the learned knowledge by LLMs can be transferred to clinical cardiology remains unknown. In this work, we aim to bridge this gap by transferring the knowledge of LLMs to clinical Electrocardiography (ECG). We propose an approach for cardiovascular disease diagnosis and automatic ECG diagnosis report generation. We also introduce an additional loss function by Optimal Transport (OT) to align the distribution between ECG and language embedding. The learned embeddings are evaluated on two downstream tasks: (1) automatic ECG diagnosis report generation, and (2) zero-shot cardiovascular disease detection. Our approach is able to generate high-quality cardiac diagnosis reports and also achieves competitive zero-shot classification performance even compared with supervised baselines, which proves the feasibility of transferring knowledge from LLMs to the cardiac domain.
△ Less
Submitted 4 February, 2023; v1 submitted 21 January, 2023;
originally announced January 2023.
-
GeoECG: Data Augmentation via Wasserstein Geodesic Perturbation for Robust Electrocardiogram Prediction
Authors:
Jiacheng Zhu,
Jielin Qiu,
Zhuolin Yang,
Douglas Weber,
Michael A. Rosenberg,
Emerson Liu,
Bo Li,
Ding Zhao
Abstract:
There has been an increased interest in applying deep neural networks to automatically interpret and analyze the 12-lead electrocardiogram (ECG). The current paradigms with machine learning methods are often limited by the amount of labeled data. This phenomenon is particularly problematic for clinically-relevant data, where labeling at scale can be time-consuming and costly in terms of the specia…
▽ More
There has been an increased interest in applying deep neural networks to automatically interpret and analyze the 12-lead electrocardiogram (ECG). The current paradigms with machine learning methods are often limited by the amount of labeled data. This phenomenon is particularly problematic for clinically-relevant data, where labeling at scale can be time-consuming and costly in terms of the specialized expertise and human effort required. Moreover, deep learning classifiers may be vulnerable to adversarial examples and perturbations, which could have catastrophic consequences, for example, when applied in the context of medical treatment, clinical trials, or insurance claims. In this paper, we propose a physiologically-inspired data augmentation method to improve performance and increase the robustness of heart disease detection based on ECG signals. We obtain augmented samples by perturbing the data distribution towards other classes along the geodesic in Wasserstein space. To better utilize domain-specific knowledge, we design a ground metric that recognizes the difference between ECG signals based on physiologically determined features. Learning from 12-lead ECG signals, our model is able to distinguish five categories of cardiac conditions. Our results demonstrate improvements in accuracy and robustness, reflecting the effectiveness of our data augmentation method.
△ Less
Submitted 10 August, 2022; v1 submitted 1 August, 2022;
originally announced August 2022.
-
Cardiac Disease Diagnosis on Imbalanced Electrocardiography Data Through Optimal Transport Augmentation
Authors:
Jielin Qiu,
Jiacheng Zhu,
Mengdi Xu,
Peide Huang,
Michael Rosenberg,
Douglas Weber,
Emerson Liu,
Ding Zhao
Abstract:
In this paper, we focus on a new method of data augmentation to solve the data imbalance problem within imbalanced ECG datasets to improve the robustness and accuracy of heart disease detection. By using Optimal Transport, we augment the ECG disease data from normal ECG beats to balance the data among different categories. We build a Multi-Feature Transformer (MF-Transformer) as our classification…
▽ More
In this paper, we focus on a new method of data augmentation to solve the data imbalance problem within imbalanced ECG datasets to improve the robustness and accuracy of heart disease detection. By using Optimal Transport, we augment the ECG disease data from normal ECG beats to balance the data among different categories. We build a Multi-Feature Transformer (MF-Transformer) as our classification model, where different features are extracted from both time and frequency domains to diagnose various heart conditions. Learning from 12-lead ECG signals, our model is able to distinguish five categories of cardiac conditions. Our results demonstrate 1) the classification models' ability to make competitive predictions on five ECG categories; 2) improvements in accuracy and robustness reflecting the effectiveness of our data augmentation method.
△ Less
Submitted 16 February, 2023; v1 submitted 24 January, 2022;
originally announced February 2022.
-
Improving Problem Identification via Automated Log Clustering using Dimensionality Reduction
Authors:
Carl Martin Rosenberg,
Leon Moonen
Abstract:
Goal: We consider the problem of automatically grouping logs of runs that failed for the same underlying reasons, so that they can be treated more effectively, and investigate the following questions: (1) Does an approach developed to identify problems in system logs generalize to identifying problems in continuous deployment logs? (2) How does dimensionality reduction affect the quality of automa…
▽ More
Goal: We consider the problem of automatically grouping logs of runs that failed for the same underlying reasons, so that they can be treated more effectively, and investigate the following questions: (1) Does an approach developed to identify problems in system logs generalize to identifying problems in continuous deployment logs? (2) How does dimensionality reduction affect the quality of automated log clustering? (3) How does the criterion used for merging clusters in the clustering algorithm affect clustering quality?
Method: We replicate and extend earlier work on clustering system log files to assess its generalization to continuous deployment logs. We consider the optional inclusion of one of these dimensionality reduction techniques: Principal Component Analysis (PCA), Latent Semantic Indexing (LSI), and Non-negative Matrix Factorization (NMF). Moreover, we consider three alternative cluster merge criteria (Single Linkage, Average Linkage, and Weighted Linkage), in addition to the Complete Linkage criterion used in earlier work. We empirically evaluate the 16 resulting configurations on continuous deployment logs provided by our industrial collaborator.
Results: Our study shows that (1) identifying problems in continuous deployment logs via clustering is feasible, (2) including NMF significantly improves overall accuracy and robustness, and (3) Complete Linkage performs best of all merge criteria analyzed.
Conclusions: We conclude that problem identification via automated log clustering is improved by including dimensionality reduction, as it decreases the pipeline's sensitivity to parameter choice, thereby increasing its robustness for handling different inputs.
△ Less
Submitted 7 September, 2020;
originally announced September 2020.
-
Spectrum-Based Log Diagnosis
Authors:
Carl Martin Rosenberg,
Leon Moonen
Abstract:
We present and evaluate Spectrum-Based Log Diagnosis (SBLD), a method to help developers quickly diagnose problems found in complex integration and deployment runs. Inspired by Spectrum-Based Fault Localization, SBLD leverages the differences in event occurrences between logs for failing and passing runs, to highlight events that are stronger associated with failing runs.
Using data provided by…
▽ More
We present and evaluate Spectrum-Based Log Diagnosis (SBLD), a method to help developers quickly diagnose problems found in complex integration and deployment runs. Inspired by Spectrum-Based Fault Localization, SBLD leverages the differences in event occurrences between logs for failing and passing runs, to highlight events that are stronger associated with failing runs.
Using data provided by our industrial partner, we empirically investigate the following questions: (i) How well does SBLD reduce the effort needed to identify all failure-relevant events in the log for a failing run? (ii) How is the performance of SBLD affected by available data? (iii) How does SBLD compare to searching for simple textual patterns that often occur in failure-relevant events? We answer (i) and (ii) using summary statistics and heatmap visualizations, and for (iii) we compare three configurations of SBLD (with resp. minimum, median and maximum data) against a textual search using Wilcoxon signed-rank tests and the Vargha-Delaney measure of stochastic superiority.
Our evaluation shows that (i) SBLD achieves a significant effort reduction for the dataset used, (ii) SBLD benefits from additional logs for passing runs in general, and it benefits from additional logs for failing runs when there is a proportional amount of logs for passing runs in the data. Finally, (iii) SBLD and textual search are roughly equally effective at effort-reduction, while textual search has a slightly better recall. We investigate the cause, and discuss how it is due to the characteristics of a specific part of our data.
We conclude that SBLD shows promise as a method for diagnosing failing runs, that its performance is positively affected by additional data, but that it does not outperform textual search on the dataset considered. Future work includes investigating SBLD's generalizability on additional datasets.
△ Less
Submitted 16 August, 2020;
originally announced August 2020.
-
Enabling Continuous Operations for UAVs with an Autonomous Service Network Infrastructure
Authors:
Michael Rosenberg,
John Henry Burns,
Deeraj Nagothu,
Yu Chen
Abstract:
One of the major restrictions on the practical applications of unmanned aerial vehicles (UAV) is their incomplete self-sufficiency, which makes continuous operations infeasible without human oversights. The more oversight UAVs require, the less likely they are going to be commercially advantageous when compared to their alternatives. As an autonomous system, how much human interaction is needed to…
▽ More
One of the major restrictions on the practical applications of unmanned aerial vehicles (UAV) is their incomplete self-sufficiency, which makes continuous operations infeasible without human oversights. The more oversight UAVs require, the less likely they are going to be commercially advantageous when compared to their alternatives. As an autonomous system, how much human interaction is needed to function is one of the best indicators evaluating the limitations and inefficiencies of the UAVs. Popular UAV related research areas, such as path planning and computer vision, have enabled substantial advances in the ability of drones to act on their own. This research is dedicated to in-flight operations, in which there is not much reported effort to tackle the problem from the aspect of the supportive infrastructure. In this paper, an Autonomous Service network infrastructure (AutoServe) is proposed. Aiming at increasing the future autonomy of UAVs, the AutoServe system includes a service-oriented landing platform and a customized communication protocol. This supportive AutoServe infrastructure will autonomize many tasks currently done manually by human operators, such as battery replacement. A proof-of-concept prototype has been built and the simulation experimental study validated the design.
△ Less
Submitted 11 April, 2020;
originally announced May 2020.
-
MS MARCO: A Human Generated MAchine Reading COmprehension Dataset
Authors:
Payal Bajaj,
Daniel Campos,
Nick Craswell,
Li Deng,
Jianfeng Gao,
Xiaodong Liu,
Rangan Majumder,
Andrew McNamara,
Bhaskar Mitra,
Tri Nguyen,
Mir Rosenberg,
Xia Song,
Alina Stoica,
Saurabh Tiwary,
Tong Wang
Abstract:
We introduce a large scale MAchine Reading COmprehension dataset, which we name MS MARCO. The dataset comprises of 1,010,916 anonymized questions---sampled from Bing's search query logs---each with a human generated answer and 182,669 completely human rewritten generated answers. In addition, the dataset contains 8,841,823 passages---extracted from 3,563,535 web documents retrieved by Bing---that…
▽ More
We introduce a large scale MAchine Reading COmprehension dataset, which we name MS MARCO. The dataset comprises of 1,010,916 anonymized questions---sampled from Bing's search query logs---each with a human generated answer and 182,669 completely human rewritten generated answers. In addition, the dataset contains 8,841,823 passages---extracted from 3,563,535 web documents retrieved by Bing---that provide the information necessary for curating the natural language answers. A question in the MS MARCO dataset may have multiple answers or no answers at all. Using this dataset, we propose three different tasks with varying levels of difficulty: (i) predict if a question is answerable given a set of context passages, and extract and synthesize the answer as a human would (ii) generate a well-formed answer (if possible) based on the context passages that can be understood with the question and passage context, and finally (iii) rank a set of retrieved passages given a question. The size of the dataset and the fact that the questions are derived from real user search queries distinguishes MS MARCO from other well-known publicly available datasets for machine reading comprehension and question-answering. We believe that the scale and the real-world nature of this dataset makes it attractive for benchmarking machine reading comprehension and question-answering models.
△ Less
Submitted 31 October, 2018; v1 submitted 28 November, 2016;
originally announced November 2016.