-
KG-Hub -- Building and Exchanging Biological Knowledge Graphs
Authors:
J Harry Caufield,
Tim Putman,
Kevin Schaper,
Deepak R Unni,
Harshad Hegde,
Tiffany J Callahan,
Luca Cappelletti,
Sierra AT Moxon,
Vida Ravanmehr,
Seth Carbon,
Lauren E Chan,
Katherina Cortes,
Kent A Shefchek,
Glass Elsarboukh,
James P Balhoff,
Tommaso Fontana,
Nicolas Matentzoglu,
Richard M Bruskiewich,
Anne E Thessen,
Nomi L Harris,
Monica C Munoz-Torres,
Melissa A Haendel,
Peter N Robinson,
Marcin P Joachimiak,
Christopher J Mungall
, et al. (1 additional author not shown)
Abstract:
Knowledge graphs (KGs) are a powerful approach for integrating heterogeneous data and making inferences in biology and many other domains, but a coherent solution for constructing, exchanging, and facilitating the downstream use of knowledge graphs is lacking. Here we present KG-Hub, a platform that enables standardized construction, exchange, and reuse of knowledge graphs. Features include a simple, modular extract-transform-load (ETL) pattern for producing graphs compliant with Biolink Model (a high-level data model for standardizing biological data), easy integration of any OBO (Open Biological and Biomedical Ontologies) ontology, cached downloads of upstream data sources, versioned and automatically updated builds with stable URLs, web-browsable storage of KG artifacts on cloud infrastructure, and easy reuse of transformed subgraphs across projects. Current KG-Hub projects span use cases including COVID-19 research, drug repurposing, microbial-environmental interactions, and rare disease research. KG-Hub is equipped with tooling to easily analyze and manipulate knowledge graphs. KG-Hub is also tightly integrated with graph machine learning (ML) tools which allow automated graph machine learning, including node embeddings and training of models for link prediction and node classification.
Submitted 31 January, 2023;
originally announced February 2023.
-
GRAPE for Fast and Scalable Graph Processing and random walk-based Embedding
Authors:
Luca Cappelletti,
Tommaso Fontana,
Elena Casiraghi,
Vida Ravanmehr,
Tiffany J. Callahan,
Carlos Cano,
Marcin P. Joachimiak,
Christopher J. Mungall,
Peter N. Robinson,
Justin Reese,
Giorgio Valentini
Abstract:
Graph Representation Learning (GRL) methods have opened new avenues for addressing complex, real-world problems represented by graphs. However, many graphs used in these applications comprise millions of nodes and billions of edges and are beyond the capabilities of current methods and software implementations. We present GRAPE, a software resource for graph processing and embedding that can scale with big graphs by using specialized and smart data structures, algorithms, and a fast parallel implementation of random walk-based methods. Compared with state-of-the-art software resources, GRAPE shows an improvement of orders of magnitude in empirical space and time complexity, as well as competitive edge and node label prediction performance. GRAPE comprises about 1.7 million well-documented lines of Python and Rust code and provides 69 node embedding methods, 25 inference models, a collection of efficient graph processing utilities and over 80,000 graphs from the literature and other sources. Standardized interfaces allow seamless integration of third-party libraries, while ready-to-use and modular pipelines permit easy evaluation of GRL methods, therefore also positioning GRAPE as a software resource to perform a fair comparison between methods and libraries for graph processing and embedding.
Submitted 7 May, 2023; v1 submitted 12 October, 2021;
originally announced October 2021.
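To illustrate the random walks that the abstract above refers to, here is a minimal plain-Python sketch of a uniform first-order random walk. It is not GRAPE's API or implementation (the abstract describes a fast parallel Python and Rust implementation); the function name `random_walk` and the adjacency-list dictionary are assumptions made for this example only.

```python
import random


def random_walk(adjacency, start, length, rng):
    """Return a walk of at most `length` nodes starting at `start`.

    `adjacency` maps each node to a list of its neighbors (hypothetical
    input format chosen for this sketch). At each step the next node is
    drawn uniformly at random from the neighbors of the current node.
    """
    walk = [start]
    for _ in range(length - 1):
        neighbors = adjacency[walk[-1]]
        if not neighbors:
            # Dead end: a node without outgoing edges stops the walk early.
            break
        walk.append(rng.choice(neighbors))
    return walk


# Small undirected example graph stored as adjacency lists.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
walk = random_walk(graph, start=0, length=6, rng=random.Random(42))
```

Walks like these are typically collected from many start nodes and passed to a skip-gram style model to learn node embeddings; that step is omitted here.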
-
Check-hybrid GLDPC Codes: Systematic Elimination of Trapping Sets and Guaranteed Error Correction Capability
Authors:
Vida Ravanmehr,
Mehrdad Khatami,
David Declercq,
Bane Vasic
Abstract:
In this paper, we propose a new approach to construct a class of check-hybrid generalized low-density parity-check (CH-GLDPC) codes which are free of small trapping sets. The approach is based on converting some selected check nodes involving a trapping set into super checks corresponding to a 2-error correcting component code. Specifically, we pursue two main goals in constructing the check-hybrid codes. First, based on the knowledge of the trapping sets of the global LDPC code, single parity checks are replaced by super checks to disable the trapping sets. We show that by converting specified single check nodes, denoted as critical checks, to super checks in a trapping set, the parallel bit flipping (PBF) decoder corrects the errors on a trapping set and hence eliminates the trapping set. The second goal is to minimize the rate loss caused by introducing super checks, by finding the minimum number of such critical checks. We also present an algorithm to find critical checks in a trapping set of a column-weight 3 LDPC code and then provide upper bounds on the minimum number of such critical checks such that the decoder corrects all error patterns on elementary trapping sets. Moreover, we provide a fixed set for a class of constructed check-hybrid codes. The guaranteed error correction capability of the CH-GLDPC codes is also studied. We show that a CH-GLDPC code in which each variable node is connected to 2 super checks corresponding to a 2-error correcting component code corrects up to 5 errors. The results are also extended to column-weight 4 LDPC codes. Finally, we investigate the elimination of trapping sets of a column-weight 3 LDPC code using the Gallager B decoding algorithm and generalize the results obtained for the PBF decoder to the Gallager B decoding algorithm.
Submitted 18 April, 2016;
originally announced April 2016.
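For readers unfamiliar with the parallel bit flipping (PBF) decoder named in the abstract above, the following is a minimal sketch of the textbook majority-rule variant on ordinary single parity checks. It does not model the super checks or critical checks of the paper; the function name `parallel_bit_flip`, the check-list input format, and the strict-majority flipping rule are assumptions made for this example.

```python
def parallel_bit_flip(checks, received, max_iters=10):
    """Decode `received` bits with a majority-rule parallel bit flipping pass.

    `checks` is a list of parity checks, each given as a list of the
    variable indices it involves (hypothetical input format). In every
    iteration, all variables for which a strict majority of neighboring
    checks are unsatisfied are flipped simultaneously.
    """
    bits = list(received)
    n = len(bits)

    # Build the variable-to-check adjacency once.
    var_checks = [[] for _ in range(n)]
    for c, members in enumerate(checks):
        for v in members:
            var_checks[v].append(c)

    for _ in range(max_iters):
        # A check is unsatisfied when its bits sum to an odd number.
        unsat = [sum(bits[v] for v in members) % 2 == 1 for members in checks]
        if not any(unsat):
            break  # All parity checks hold: a codeword was reached.
        flips = [
            v for v in range(n)
            if 2 * sum(unsat[c] for c in var_checks[v]) > len(var_checks[v])
        ]
        if not flips:
            break  # Decoder is stuck; no variable meets the flipping rule.
        for v in flips:
            bits[v] ^= 1
    return bits


# Toy code in which every variable participates in exactly two checks.
toy_checks = [[0, 1, 2], [0, 3, 4], [1, 3, 5], [2, 4, 5]]
decoded = parallel_bit_flip(toy_checks, [1, 0, 0, 0, 0, 0])
```

In this toy example a single error on bit 0 leaves both of its checks unsatisfied, so only that bit is flipped and the all-zero codeword is recovered. Error patterns on which such a decoder stalls or oscillates are the kind of failure that trapping sets describe.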
-
Paired Threshold Graphs
Authors:
Vida Ravanmehr,
Gregory J. Puleo,
Sadegh Bolouki,
Olgica Milenkovic
Abstract:
Threshold graphs are recursive deterministic network models that have been proposed for describing certain economic and social interactions. One drawback of this graph family is that it has limited generative attachment rules. To mitigate this problem, we introduce a new class of graphs termed Paired Threshold (PT) graphs described through vertex weights that govern the existence of edges via two inequalities. One inequality imposes the constraint that the sum of weights of adjacent vertices has to exceed a specified threshold. The second inequality ensures that adjacent vertices have a weight difference upper bounded by another threshold. We provide a conceptually simple characterization and decomposition of PT graphs, analyze their forbidden induced subgraphs and present a method for performing vertex weight assignments on PT graphs that satisfy the defining constraints. Furthermore, we describe a polynomial-time algorithm for recognizing PT graphs. We conclude our exposition with an analysis of the intersection number, diameter and clustering coefficient of PT graphs.
Submitted 23 May, 2018; v1 submitted 31 March, 2016;
originally announced March 2016.
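The two inequalities that define Paired Threshold graphs in the abstract above can be sketched as a direct edge test over all vertex pairs. This is an illustration only, not the recognition algorithm from the paper; the function name `paired_threshold_edges`, the parameter names, and the choice of a strict inequality for the sum ("exceed") and a non-strict one for the difference ("upper bounded") are assumptions.

```python
from itertools import combinations


def paired_threshold_edges(weights, sum_threshold, diff_threshold):
    """Return the edge list of a graph built from vertex weights.

    Vertices u and v are joined when both conditions hold:
      1. weights[u] + weights[v] >  sum_threshold
      2. |weights[u] - weights[v]| <= diff_threshold
    Strictness of each inequality is an assumption of this sketch.
    """
    edges = []
    for u, v in combinations(range(len(weights)), 2):
        total = weights[u] + weights[v]
        gap = abs(weights[u] - weights[v])
        if total > sum_threshold and gap <= diff_threshold:
            edges.append((u, v))
    return edges


# Four vertices with weights 1, 2, 3, 6.
edges = paired_threshold_edges([1, 2, 3, 6], sum_threshold=4, diff_threshold=3)
```

With these example values, vertices 1 and 2 are joined (sum 5, difference 1) and vertices 2 and 3 are joined (sum 9, difference 3), while vertices 1 and 3 are not, because their difference of 4 violates the second inequality even though their sum is large.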