Search | arXiv e-print repository

Hypergraph: A Unified and Uniform Definition with Application to Chemical Hypergraph and More

Abstract: The conventional definition of hypergraph has two major issues: (1) there is not a standard definition of directed hypergraph and (2) there is not a formal definition of nested hypergraph. To resolve these issues, we propose a new definition of hypergraph that unifies the concepts of undirected, directed and nested hypergraphs, and that is uniform in using hyperedge as a single construct for representing high-order correlations among things, i.e., nodes and hyperedges. Specifically, we define a hyperedge to be a simple hyperedge, a nesting hyperedge, or a directed hyperedge. With this new definition, a hypergraph is nested if it has nesting hyperedge(s), and is directed if it has directed hyperedge(s). Otherwise, a hypergraph is a simple hypergraph. The uniformity and power of this new definition, with visualization, should facilitate the use of hypergraph for representing (hierarchical) high-order correlations in general and chemical systems in particular. Graph has been widely used as a mathematical structure for machine learning on molecular structures and 3D molecular geometries. However, graph has a major limitation: it can represent only pairwise correlations between nodes. Hypergraph extends graph with high-order correlations among nodes. This extension is significant or essential for machine learning on chemical systems. For molecules, this is significant as it allows the direct, explicit representation of multicenter bonds and molecular substructures. For chemical reactions, this is essential since most chemical reactions involve multiple participants. We propose the use of chemical hypergraph, a multilevel hypergraph with simple, nesting and directed hyperedges, as a single mathematical structure for representing chemical systems. We apply the new definition of hypergraph to chemical hypergraph and, as simplified versions, molecular hypergraph and chemical reaction hypergraph.

Submitted 21 August, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

Comments: arXiv admin note: text overlap with arXiv:2310.03623 by other authors
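
The three hyperedge kinds named in the abstract above can be sketched as a small data structure. This is only an illustrative sketch, not the paper's formal definition: the class names, the `tail`/`head` fields, the `classify` helper, and the choice to let a hypergraph be both nested and directed at once are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Iterable, Set


# All names below are hypothetical; the abstract only names the three kinds.
@dataclass(frozen=True)
class SimpleHyperedge:
    members: frozenset  # a set of nodes (here: string labels)


@dataclass(frozen=True)
class NestingHyperedge:
    members: frozenset  # nodes and/or other hyperedges


@dataclass(frozen=True)
class DirectedHyperedge:
    tail: frozenset  # assumed "source" side
    head: frozenset  # assumed "target" side


def classify(hyperedges: Iterable[object]) -> Set[str]:
    """Label a hypergraph by the kinds of hyperedge it contains.

    Follows the abstract: nested if any nesting hyperedge is present,
    directed if any directed hyperedge is present, otherwise simple.
    Returning both labels together is an assumption of this sketch.
    """
    edges = list(hyperedges)
    kinds = set()
    if any(isinstance(e, NestingHyperedge) for e in edges):
        kinds.add("nested")
    if any(isinstance(e, DirectedHyperedge) for e in edges):
        kinds.add("directed")
    return kinds or {"simple"}


# A bond-like simple hyperedge, a hyperedge that nests it, and a
# reaction-like directed hyperedge with two reactants and one product.
h2 = SimpleHyperedge(frozenset({"H_a", "H_b"}))
group = NestingHyperedge(frozenset({h2, "O"}))
reaction = DirectedHyperedge(tail=frozenset({"H2", "Cl2"}), head=frozenset({"HCl"}))
```

Because the dataclasses are frozen, hyperedges are hashable and can themselves be members of other hyperedges, which is what makes the nesting case expressible with plain `frozenset` values.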

arXiv:2405.12144 [pdf]

Alterations of electrocortical activity during hand movements induced by motor cortex glioma

Authors: Yihan Wu, Tao Chang, Siliang Chen, Xiaodong Niu, Yu Li, Yuan Fang, Lei Yang, Yixuan Zong, Yaoxin Yang, Yuehua Li, Mengsong Wang, Wen Yang, Yixuan Wu, Chen Fu, Xia Fang, Yuxin Quan, Xilin Peng, Qiang Sun, Marc M. Van Hulle, Yanhui Liu, Ning Jiang, Dario Farina, Yuan Yang, Jiayuan He, Qing Mao

Abstract: Glioma cells can reshape functional neuronal networks by hijacking neuronal synapses, leading to partial or complete neurological dysfunction. These mechanisms have been previously explored for language functions. However, the impact of glioma on sensorimotor functions is still unknown. Therefore, we recruited a control group of patients with unaffected motor cortex and a group of patients with glioma-infiltrated motor cortex, and recorded high-density electrocortical signals during finger movement tasks. The results showed that glioma suppresses task-related synchronization in the high-gamma band and reduces the power across all frequency bands. The resulting atypical motor information transmission model with discrete signaling pathways and delayed responses disrupts the stability of neuronal encoding patterns for finger movement kinematics across various temporal-spatial scales. These findings demonstrate that gliomas functionally invade neural circuits within the motor cortex. This result advances our understanding of motor function processing in chronic disease states, which is important to advance the surgical strategies and neurorehabilitation approaches for patients with malignant gliomas.

Submitted 20 May, 2024; originally announced May 2024.

arXiv:2309.10128 [pdf, other]

Markov Chain-Guided Graph Construction and Sampling Depth Optimization for EEG-Based Mental Disorder Detection

Authors: Yihan Wu, Tao Chang, Peng Xu, Yangsong Zhang

Abstract: Graph Neural Networks (GNNs) have received considerable attention since their introduction. They have been widely applied in various fields due to their ability to represent graph-structured data. However, the application of GNNs is constrained by two main issues. Firstly, the "over-smoothing" problem restricts the use of deeper network structures. Secondly, GNNs' applicability is greatly limited when nodes and edges are not clearly defined and expressed, as is the case with EEG data. In this study, we proposed an innovative approach that harnesses the distinctive properties of the graph structure's Markov Chain to optimize the sampling depth of deep graph convolution networks. We introduced a tailored method for constructing graph structures specifically designed for analyzing EEG data, alongside the development of a vertex-level GNN classification model for precise detection of mental disorders. In order to verify the method's performance, we conduct experiments on two disease datasets using a subject-independent experiment scenario. For the Schizophrenia (SZ) data, our method achieves an average accuracy of 100% using only the first 300 seconds of data from each subject. Similarly, for Major Depressive Disorder (MDD) data, the method yields average accuracies of over 99%. These experiments demonstrate the method's ability to effectively distinguish between healthy control (HC) subjects and patients with mental disorders. We believe this method shows great promise for clinical diagnosis.

Submitted 18 September, 2023; originally announced September 2023.

Comments: 5 figures, 4 tables

arXiv:2304.09981 [pdf, other]

Interpretable (not just posthoc-explainable) heterogeneous survivor bias-corrected treatment effects for assignment of postdischarge interventions to prevent readmissions

Authors: Hongjing Xia, Joshua C. Chang, Sarah Nowak, Sonya Mahajan, Rohit Mahajan, Ted L. Chang, Carson C. Chow

Abstract: We used survival analysis to quantify the impact of postdischarge evaluation and management (E/M) services in preventing hospital readmission or death. Our approach avoids a specific pitfall of applying machine learning to this problem, which is an inflated estimate of the effect of interventions, due to survivors bias -- where the magnitude of inflation may be conditional on heterogeneous confounders in the population. This bias arises simply because in order to receive an intervention after discharge, a person must not have been readmitted in the intervening period. After deriving an expression for this phantom effect, we controlled for this and other biases within an inherently interpretable Bayesian survival framework. We identified case management services as being the most impactful for reducing readmissions overall.

Submitted 3 August, 2023; v1 submitted 19 April, 2023; originally announced April 2023.

Comments: Submitted

Journal ref: PMLR 219:884-905, 2023

arXiv:2207.08023 [pdf]

Distance-Geometric Graph Attention Network (DG-GAT) for 3D Molecular Geometry

Authors: Daniel T. Chang

Abstract: Deep learning for molecular science has so far mainly focused on 2D molecular graphs. Recently, however, there has been work to extend it to 3D molecular geometry, due to its scientific significance and critical importance in real-world applications. The 3D distance-geometric graph representation (DG-GR) adopts a unified scheme (distance) for representing the geometry of 3D graphs. It is invariant to rotation and translation of the graph, and it reflects pair-wise node interactions and their generally local nature, particularly relevant for 3D molecular geometry. To facilitate the incorporation of 3D molecular geometry in deep learning for molecular science, we adopt the new graph attention network with dynamic attention (GATv2) for use with DG-GR and propose the 3D distance-geometric graph attention network (DG-GAT). GATv2 is a great fit for DG-GR since the attention can vary by node and by distance between nodes. Experimental results of DG-GAT for the ESOL and FreeSolv datasets show major improvement (31% and 38%, respectively) over those of the standard graph convolution network based on 2D molecular graphs. The same is true for the QM9 dataset. Our work demonstrates the utility and value of DG-GAT for deep learning based on 3D molecular geometry.

Submitted 16 July, 2022; originally announced July 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:2006.01785, arXiv:2007.03513

arXiv:2102.13276 [pdf, other]

Spectral Top-Down Recovery of Latent Tree Models

Authors: Yariv Aizenbud, Ariel Jaffe, Meng Wang, Amber Hu, Noah Amsel, Boaz Nadler, Joseph T. Chang, Yuval Kluger

Abstract: Modeling the distribution of high dimensional data by a latent tree graphical model is a prevalent approach in multiple scientific domains. A common task is to infer the underlying tree structure, given only observations of its terminal nodes. Many algorithms for tree recovery are computationally intensive, which limits their applicability to trees of moderate size. For large trees, a common approach, termed divide-and-conquer, is to recover the tree structure in two steps. First, recover the structure separately of multiple, possibly random subsets of the terminal nodes. Second, merge the resulting subtrees to form a full tree. Here, we develop Spectral Top-Down Recovery (STDR), a deterministic divide-and-conquer approach to infer large latent tree models. Unlike previous methods, STDR partitions the terminal nodes in a non random way, based on the Fiedler vector of a suitable Laplacian matrix related to the observed nodes. We prove that under certain conditions, this partitioning is consistent with the tree structure. This, in turn, leads to a significantly simpler merging procedure of the small subtrees. We prove that STDR is statistically consistent and bound the number of samples required to accurately recover the tree with high probability. Using simulated data from several common tree models in phylogenetics, we demonstrate that STDR has a significant advantage in terms of runtime, with improved or similar accuracy.

Submitted 7 December, 2021; v1 submitted 25 February, 2021; originally announced February 2021.

arXiv:2012.04171 [pdf, other]

Sparse encoding for more-interpretable feature-selecting representations in probabilistic matrix factorization

Authors: Joshua C. Chang, Patrick Fletcher, Jungmin Han, Ted L. Chang, Shashaank Vattikuti, Bart Desmet, Ayah Zirikly, Carson C. Chow

Abstract: Dimensionality reduction methods for count data are critical to a wide range of applications in medical informatics and other fields where model interpretability is paramount. For such data, hierarchical Poisson matrix factorization (HPF) and other sparse probabilistic non-negative matrix factorization (NMF) methods are considered to be interpretable generative models. They consist of sparse transformations for decoding their learned representations into predictions. However, sparsity in representation decoding does not necessarily imply sparsity in the encoding of representations from the original data features. HPF is often incorrectly interpreted in the literature as if it possesses encoder sparsity. The distinction between decoder sparsity and encoder sparsity is subtle but important. Due to the lack of encoder sparsity, HPF does not possess the column-clustering property of classical NMF -- the factor loading matrix does not sufficiently define how each factor is formed from the original features. We address this deficiency by self-consistently enforcing encoder sparsity, using a generalized additive model (GAM), thereby allowing one to relate each representation coordinate to a subset of the original data features. In doing so, the method also gains the ability to perform feature selection. We demonstrate our method on simulated data and give an example of how encoder sparsity is of practical use in a concrete application of representing inpatient comorbidities in Medicare patients.

Submitted 29 December, 2020; v1 submitted 7 December, 2020; originally announced December 2020.

Comments: Fixed typo in Eq 2

Report number: ICLR 2021

arXiv:1910.11390 [pdf]

Deep Learning for Molecular Graphs with Tiered Graph Autoencoders and Graph Prediction

Authors: Daniel T. Chang

Abstract: Tiered graph autoencoders provide the architecture and mechanisms for learning tiered latent representations and latent spaces for molecular graphs that explicitly represent and utilize groups (e.g., functional groups). This enables the utilization and exploration of tiered molecular latent spaces, either individually - the node (atom) tier, the group tier, or the graph (molecule) tier - or jointly, as well as navigation across the tiers. In this paper, we discuss the use of tiered graph autoencoders together with graph prediction for molecular graphs. We show features of molecular graphs used, and groups in molecular graphs identified for some sample molecules. We briefly review graph prediction and the QM9 dataset for background information, and discuss the use of tiered graph embeddings for graph prediction, particularly weighted group pooling. We find that functional groups and ring groups effectively capture and represent the chemical essence of molecular graphs (structures). Further, tiered graph autoencoders and graph prediction together provide effective, efficient and interpretable deep learning for molecular graphs, with the former providing unsupervised, transferable learning and the latter providing supervised, task-optimized learning.

Submitted 1 July, 2021; v1 submitted 24 October, 2019; originally announced October 2019.

Comments: arXiv admin note: text overlap with arXiv:1806.08804 by other authors

arXiv:1809.02910 [pdf, other]

Localization Algorithm with Circular Representation in 2D and its Similarity to Mammalian Brains

Authors: Tsang-Kai Chang, Shengkang Chen, Ankur Mehta

Abstract: Extended Kalman filter (EKF) does not guarantee consistent mean and covariance under linearization, even though it is the main framework for robotic localization. While Lie group improves the modeling of the state space in localization, the EKF on Lie group still relies on the arbitrary Gaussian assumption in face of nonlinear models. We instead use von Mises filter for orientation estimation together with the conventional Kalman filter for position estimation, and thus we are able to characterize the first two moments of the state estimates. Since the proposed algorithm holds a solid probabilistic basis, it is fundamentally relieved from the inconsistency problem. Furthermore, we extend the localization algorithm to fully circular representation even for position, which is similar to grid patterns found in mammalian brains and in recurrent neural networks. The applicability of the proposed algorithms is substantiated not only by strong mathematical foundation but also by the comparison against other common localization methods.

Submitted 24 January, 2019; v1 submitted 8 September, 2018; originally announced September 2018.

Comments: 8 pages, 2 figures, submitted to the IEEE Robotics and Automation Letters (RA-L) journal with the option for presentation at RSS

arXiv:1612.03403 [pdf, other]

Mechanisms of stochastic onset and termination of atrial fibrillation episodes: Insights using a cellular automaton model

Authors: Yen Ting Lin, Eugene TY Chang, Julie Eatock, Tobias Galla, Richard H Clayton

Abstract: Mathematical models of cardiac electrical excitation are increasingly complex, with multiscale models seeking to represent and bridge physiological behaviours across temporal and spatial scales. The increasing complexity of these models makes it computationally expensive to both evaluate long term (>60 seconds) behaviour and determine sensitivity of model outputs to inputs. This is particularly relevant in models of atrial fibrillation (AF), where individual episodes last from seconds to days, and inter-episode waiting times can be minutes to months. Potential mechanisms of transition between sinus rhythm and AF have been identified but are not well understood, and it is difficult to simulate AF for long periods of time using state-of-the-art models. In this study, we implemented a Moe-type cellular automaton on a novel, topologically correct surface geometry of the left atrium. We used the model to simulate stochastic initiation and spontaneous termination of AF, arising from bursts of spontaneous activation near pulmonary veins. The simplified representation of atrial electrical activity reduced computational cost, and so permitted us to investigate AF mechanisms in a probabilistic setting. We computed large numbers (~10^5) of sample paths of the model, to infer stochastic initiation and termination rates of AF episodes using different model parameters. By generating statistical distributions of model outputs, we demonstrated how to propagate uncertainties of inputs within our microscopic level model up to a macroscopic level. Lastly, we investigated spontaneous termination in the model and found a complex dependence on its past AF trajectory, the mechanism of which merits future investigation.

Submitted 11 December, 2016; originally announced December 2016.

Comments: 14 pages, 7 figures

arXiv:1510.00576 [pdf, other]

Assessing Measures of Atrial Fibrillation Clustering via Stochastic Models of Episode Recurrence and Disease Progression

Authors: Julie Eatock, Yen Ting Lin, Eugene T. Y. Chang, Tobias Galla, Richard H. Clayton

Abstract: Atrial fibrillation (AF) is a leading cause of morbidity and mortality. AF prevalence increases with age, which is attributed to pathophysiological changes that aid AF initiation and perpetuation. Current state-of-the-art models are only capable of simulating short periods of atrial activity at high spatial resolution, whilst the majority of clinical recordings are based on infrequent temporal datasets of limited spatial resolution. Being able to estimate disease progression informed by both modelling and clinical data would be of significant interest. In addition, an analysis of the temporal distribution of recorded fibrillation episodes (AF density) can provide insights into recurrence patterns. We present an initial analysis of the AF density measure using a simplified idealised stochastic model of a binary time series representing AF episodes. The future aim of this work is to develop robust clinical measures of progression which will be tested on models that generate long-term synthetic data. These measures would then be of clinical interest in deciding treatment strategies.

Submitted 2 October, 2015; originally announced October 2015.

Comments: 4 pages, 4 figures, submitted to Computing in Cardiology 2015

arXiv:1507.07358 [pdf, ps, other]

Modelling the progression of atrial fibrillation: A stochastic individual-based approach

Authors: Eugene TY Chang, Yen Ting Lin, Tobias Galla, Richard H Clayton, Julie Eatock

Abstract: We propose a stochastic individual-based model of the progression of atrial fibrillation (AF). The model operates at patient level over a lifetime and is based on elements of the physiology and biophysics of AF, making contact with existing mechanistic models. The outputs of the model are times when the patient is in normal rhythm and AF, and we carry out a population-level analysis of the statistics of disease progression. While the model is stylised at present and not directly predictive, future improvements are proposed to tighten the gap between existing mechanistic models of AF, and epidemiological data, with a view towards model-based personalised medicine.

Submitted 28 July, 2015; v1 submitted 27 July, 2015; originally announced July 2015.

Comments: 14 pages, 6 figures

Showing 1–12 of 12 results for author: Chang, T