Validating module network learning algorithms using simulated data
Discovery of efficient anti-cancer drug combinations is a major challenge, since experimental testing of all possible combinations is clearly impossible. Recent efforts to computationally predict drug combination responses retain this experimental search space, as model definitions typically rely on extensive drug perturbation data. We developed a dynamical model representing a cell fate decision network in the AGS gastric cancer cell line, relying on background knowledge extracted from literature and databases. We defined a set of logical equations recapitulating AGS data observed in cells in their baseline proliferative state. Using the modeling software GINsim, model reduction and simulation compression techniques were applied to cope with the vast state space of large logical models and enable simulations of pairwise applications of specific signaling inhibitory chemical substances. Our simulations predicted synergistic growth inhibitory action of five combinations from a to...
Background: The biosciences increasingly face the challenge of integrating a wide variety of available data, information and knowledge in order to gain an understanding of biological systems. Data integration is supported by a diverse series of tools, but the lack of a consistent terminology to label these data still presents significant hurdles. As a consequence, much of the available biological data remains disconnected or worse: becomes misconnected. The need to address this terminology problem has spawned the building of a large number of bio-ontologies. OBOF, RDF and OWL are among the most used ontology formats to capture terms and relationships in the Life Sciences, opening the potential to use the Semantic Web to support data integration and further exploitation of integrated resources via automated retrieval and reasoning procedures. Methods: We extended the Perl suite ONTO-PERL and functionally integrated it into the Galaxy platform. The resulting ONTO-ToolKit supports the ...
DNA binding transcription factors: Setting the stage for a large-scale curation effort
Scientific progress is increasingly dependent on knowledge in computation-ready forms. In the life sciences, among others, many scientists carefully extract and structure knowledge from the scientific literature. In a process called manual curation, they enter knowledge into spreadsheets, or into databases where it serves their and many others' research. Valuable as these curation efforts are, the range and detail of what can practically be captured and shared remains limited, because of the constraints of current curation tools. Many important contextual aspects of observations described in literature simply do not fit in the form defined by these tools, and thus cannot be captured. Here we present the design of an easy-to-use, general-purpose method and interface, that enables the precise semantic capture of virtually unlimited types of information and details, using only a minimal set of building blocks. Scientists from any discipline can use this to convert any complex knowl...
The Semantic Web standards OWL and RDF are often used to represent biomedical information as Linked Data; however, the OWL/RDF syntax, which combines both, was never optimised for querying. By combining two formal paradigms for modelling Linked Data, namely multi-digraphs and Description Logic, many precise terms for relations have emerged that are defined in the Metarel relation ontology. They are especially useful in Linked Data and RDF knowledge bases that (1) rely on SPARQL querying and (2) require semantic support for chains of relations. Metarel-described multi-digraphs were used for knowledge integration and reasoning in three RDF knowledge bases in the domain of genome biology: BioGateway, Cell Cycle Ontology and Gene Expression Knowledge Base. These knowledge bases integrate both data, like KEGG, and ontologies, like Gene Ontology, in the same RDF graphs. Their libraries with biomedically relevant SPARQL queries show the practical benefits of this semantic paradigm. In addition ...
Genome-scale 'omics' data constitute a potentially rich source of information about biological systems and their function. There is a plethora of tools and methods available to mine omics data. However, the diversity and complexity of different omics data types is a stumbling block for multi-data integration, hence there is a dire need for additional methods to exploit potential synergy from integrated orthogonal data. Rough Sets provide an efficient means to use complex information in classification approaches. Here, we set out to explore the possibilities of Rough Sets to incorporate diverse information sources in a functional classification of unknown genes. We explored the use of Rough Sets for a novel data integration strategy where gene expression data, protein features and Gene Ontology (GO) annotations were combined to describe general and biologically relevant patterns represented by If-Then rules. The descriptive rules were used to predict the function of unknown genes in Arabidopsis thaliana and Schizosaccharomyces pombe. The If-Then rule models showed success rates of up to 0.89 (discriminative and predictive power for both modeled organisms), whereas models built solely from one data type (protein features or gene expression data) yielded success rates varying from 0.68 to 0.78. Our models were applied to generate classifications for many unknown genes, of which a sizeable number were confirmed either by PubMed literature reports or electronically inferred annotations. Finally, we studied cell cycle protein-protein interactions derived from both tandem affinity purification experiments and in silico experiments in the BioGRID interactome database and found strong experimental evidence for the predictions generated by our models. The results show that our approach can be used to build very robust models that create synergy from integrating gene expression data and protein features.
The Rough Set-based method is implemented in the Rosetta toolkit kernel version 1.0.1 available at: http://rosetta.lcb.uu.se/
Summary: The BioGateway App is a Cytoscape (version 3) plugin designed to provide easy query access to the BioGateway Resource Description Framework triple store, which contains functional and interaction information for proteins from several curated resources. For explorative network building, we have added a comprehensive dataset with regulatory relationships of mammalian DNA-binding transcription factors and their target genes, compiled both from curated resources and from a text mining effort. Query results are visualized using the inherent flexibility of the Cytoscape framework, and network links can be checked against curated database records or against the original publication. Availability and implementation: Install through the Cytoscape application manager or visit www.biogateway.eu for download and tutorial documents. Supplementary information: Supplementary information is available at Bioinformatics online.
In recent years, several authors have used probabilistic graphical models to learn expression modules and their regulatory programs from gene expression data. Here, we demonstrate the use of the synthetic data generator SynTReN for the purpose of testing and comparing module network learning algorithms. We introduce a software package for learning module networks, called LeMoNe, which incorporates a novel strategy for learning regulatory programs. Novelties include the use of a bottom-up Bayesian hierarchical clustering to construct the regulatory programs, and the use of a conditional entropy measure to assign regulators to the regulation program nodes. Using SynTReN data, we test the performance of LeMoNe in a completely controlled situation and assess the effect of the methodological changes we made with respect to an existing software package, namely Genomica. Additionally, we assess the effect of various parameters, such as the size of the data set and the amount of noise, on t...
The Mauriceville and Varkud plasmids are retroid elements that propagate in the mitochondria of some Neurospora spp. strains. Previous studies of endogenous reactions in ribonucleoprotein particle preparations suggested that the plasmids use a novel mechanism of reverse transcription that involves synthesis of a full-length minus-strand DNA beginning at the 3' end of the plasmid transcript, which has a 3' tRNA-like structure (M. T. R. Kuiper and A. M. Lambowitz, Cell 55:693-704, 1988). In this study, we developed procedures for releasing the Mauriceville plasmid reverse transcriptase from mitochondrial ribonucleoprotein particles and partially purifying it by heparin-Sepharose chromatography. By using these soluble preparations, we show directly that the Mauriceville plasmid reverse transcriptase synthesizes full-length cDNA copies of in vitro transcripts beginning at the 3' end and has a preference for transcripts having the 3' tRNA-like structure. Further, unlike r...
Scientific progress is increasingly dependent on knowledge in computation-ready forms. In the life sciences, among others, many scientists therefore extract and structure knowledge from the literature. In a process called manual curation, they enter knowledge into spreadsheets, or into databases where it serves their and many others’ research. Valuable as these curation efforts are, the range and detail of what can practically be captured and shared remains limited, because of the constraints of current curation tools. Many important contextual aspects of observations described in literature simply do not fit in the form defined by these tools, and thus cannot be captured. Here we present the design of an easy-to-use, general-purpose method and interface, that enables the precise semantic capture of virtually unlimited types of information and details, using only a minimal set of building blocks. Scientists from any discipline can use this to convert any complex knowledge into a form...
Life Science information is increasingly available on the Semantic Web and this poses a demand for new tools and methodologies if it is to fulfill its potential to advance research in all areas of life sciences, including biomedicine. Life science information is obtained by traditionally distinct and varied scientific disciplines which explains why it is heterogeneous in its representation and in its semantics. The exploitation of this information by users relies only to a limited extent on well understood and shared formats, relations and metaphors; the interaction of users with biomedical information resources is an integral part of the definition and interpretation of the information that they provide. The need for this interactivity is reflected by current biomedical research practice. A range of life science software tools and methodologies focus on the analysis of biological networks and pathways. They provide interactive environments where relations among biological entities ...
Many different clustering algorithms have been developed to detect structure in data sets in an unsupervised way. As user intervention for these methods should be kept to a minimum, robustness with respect to user-defined initial conditions is of crucial importance. In a previous study, we have shown how the robustness of a hard clustering algorithm can be increased by the removal of what we called unstable data elements. Although robustness is a main characteristic of any clustering tool, the most important feature is still the quality of the produced clusterings. This paper experimentally investigates how the removal of unstable data elements from a data set affects the quality of produced clusterings, as measured by the mutual information index, using three biological gene expression data sets. Keywords: hard clustering; cluster quality; unstable elements; mutual information context; microarray data.
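As an illustration of how a mutual-information score can compare a produced clustering with a reference labeling, here is a minimal sketch. It computes plain (unnormalized) mutual information in nats; the abstract does not specify which variant of the index the authors used, so the function name and the choice of formula are assumptions made for illustration only.

```python
import math
from collections import Counter


def mutual_information(labels_a, labels_b):
    """Mutual information (in nats) between two labelings of the same items.

    Illustrative only: the exact index used in the paper is not stated
    in the abstract, so this is the textbook definition.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have equal length")
    n = len(labels_a)
    count_a = Counter(labels_a)               # cluster sizes in labeling A
    count_b = Counter(labels_b)               # cluster sizes in labeling B
    count_ab = Counter(zip(labels_a, labels_b))  # joint co-occurrence counts
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        # p(a,b) * log( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (n_ab / n) * math.log(n_ab * n / (count_a[a] * count_b[b]))
    return mi
```

Two labelings that agree up to a renaming of clusters score the entropy of the labeling (log 2 for two equal-sized clusters), while statistically independent labelings score zero.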
Recurrent Neural Networks (RNN) have been used in multiple tasks such as speech recognition, music composition and protein homology detection. In particular, they have shown superior performance in predicting structure in time series data. To our knowledge, RNN have not been used on DNA methylation data. Methylation patterns on chromosomal DNA represent an important form of epigenetic imprinting, a form of epigenetics that results in heritable gene expression and phenotype changes. DNA methylation is one of the mechanisms that a cell uses to fine-tune the expression levels of its individual genes, and it has been shown to affect very specific areas around specific genes. The methylation state of the human chromosomal DNA can be readily assessed with microarray technology, allowing the determination of the methylation status of thousands of positions along the individual chromosomes of the genome. With RNN analysis, we show that these methylation patterns have substantial structure, ...
Background: Treating patients with combinations of drugs that have synergistic effects has become widespread practice in the clinic. Drugs work synergistically when the observed effect of a drug combination is larger than the effect predicted by the reference model. The reference model is a theoretical null model that returns the combined effect of given doses of drugs under the assumption that these drugs do not interact. There is ongoing debate on what it means for drugs to not interact. The controversy transcends mathematical punctuality, as different non-interaction principles result in different reference models. A famous and long-established reference model is Loewe’s reference model. Loewe’s vision on non-interaction was purely intuitive: two drugs do not interact if all combinations of doses that result in a certain given effect lie on a straight line. Results: We show that Loewe’s reference model can be obtained from much more fundamental princip...
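The straight-line condition described above is commonly written as d1/D1 + d2/D2 = 1, where D1 and D2 are the doses of each drug alone that produce the given effect. The sketch below evaluates the left-hand side for one dose pair. The function and parameter names are hypothetical, and this is the classical Loewe isobole equation, not the derivation the authors present in their paper.

```python
def loewe_combination_index(d1, d2, D1, D2):
    """Left-hand side of the classical Loewe isobole equation.

    d1, d2: doses of drug 1 and drug 2 in the combination.
    D1, D2: doses of each drug ALONE that give the same effect as the
            combination (assumed known, e.g. from fitted dose-response curves).
    """
    if D1 <= 0 or D2 <= 0:
        raise ValueError("single-drug equivalent doses must be positive")
    return d1 / D1 + d2 / D2


# Hypothetical numbers: half of each single-drug dose reaches the same effect,
# so the point lies exactly on the straight isobole.
index = loewe_combination_index(d1=1.0, d2=2.0, D1=2.0, D2=4.0)
```

Under the usual convention, a value of 1 means the combination sits on the non-interaction line, a value below 1 is read as synergy, and a value above 1 as antagonism.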
The BioGateway App is a plugin for the Cytoscape network editor, allowing users to interactively build biological networks by querying the Biogateway Resource Description Framework (RDF) triple store. BioGateway contains information from several curated resources including UniProtKB, IntAct, Gene Ontology Annotations, various datasets containing transcription‐factor regulatory relations to specific target genes, and more. The BioGateway App facilitates the step‐by‐step creation of complex SPARQL queries through an intuitive Graphical User Interface, allowing users to build and explore biological interaction networks to assess, among other things, gene regulatory relationships, gene ontology annotations, and protein‐protein interactions. As the BioGateway information content is most abundant for human proteins and genes, this article describes the utility of the tool through a series of use cases on these human data, starting from the most basic levels and then detailing applications that address some of the rich complexity of the integrated data. Network refinement and display can be further optimized via the selection and filtering possibilities that the Cytoscape framework provides. The use cases also provide examples to explore network information in other species, as they become supported by BioGateway. © 2020 The Authors.
DNA-binding transcription factors recognise genomic addresses, specific sequence motifs in gene regulatory regions, to control gene transcription. A complete and reliable catalogue of all DNA-binding transcription factors is key to investigating the delicate balance of gene regulation in response to environmental and developmental stimuli. The need for such a catalogue of proteins is demonstrated by the many lists of DNA-binding transcription factors that have been produced over the past decade. The COST Action Gene Regulation Ensemble Effort for the Knowledge Commons (GREEKC) Consortium brought together experts in the field of transcription with the aim of providing high quality and interoperable gene regulatory data. The Gene Ontology (GO) Consortium provides strict definitions for gene product function, including factors that regulate transcription. The collaboration between the GREEKC and GO Consortia has enabled the application of those definitions to produce a new curated catal...
VSM is a recently introduced method for entering and displaying any type of knowledge, in a form that is both semantically precise for computation and intuitive for human understanding. VSM is the combination of a new semantic model, and the design for a dedicated user interface to support it. Here we present the implementation of this user interface, as a sophisticated HTML-element, <vsm-box>, that can be embedded in any web-based curation app. We show how developers can use it for biocuration projects, customize it to particular end-user needs, and contribute to its growth. Vsm-box is open-source at https://github.com/vsmjs/vsm-box under the AGPL license, as a JavaScript (ES6) Vue.js web-component that runs in all modern web browsers. It is the capstone of the Vsmjs organization at https://github.com/vsmjs that groups its supporting modules. Extensive supplementary material on VSM and vsm-box is available at https://vsmjs.github.io.
A large variety of molecular interactions occurs between biomolecular components in cells. When one or a cascade of molecular interactions results in a regulatory effect, by one component onto a downstream component, a so-called ‘causal interaction’ takes place. Causal interactions constitute the building blocks in our understanding of larger regulatory networks in cells. These causal interactions and the biological processes they enable (e.g., gene regulation) need to be described with a careful appreciation of molecular interactions that occur between entities. A proper description of this information enables archiving, sharing, and reuse by humans and for computational science. Various representations of causal relationships between biological components are currently used in a variety of resources. Here, we propose a checklist that accommodates current representations, and call it the Minimum Information about a Molecular Interaction CAusal STatement (MI2CAST). This checklist de...
We have two theses about scientific knowledge in the age of computation. Our general claim is that scientific Knowledge Management practices emerge as second-order practices whose aim is to systematically collect, take care of and mobilise first-hand disciplinary knowledge and data. Our specific thesis is that knowledge management practices are transforming biological research in at least three ways. We argue that scientific Knowledge Management a. operates with founded concepts of biological knowledge as explicated and computable, b. enables new outputs and ways of knowing within biology, and c. risks enforcing objectivist epistemologies of knowledge as some one objective thing.
It is commonplace to determine the effectiveness of the combination of drugs by comparing the observed effects to a reference model that describes the combined effect under the assumption that the drugs do not interact. Depending on what is to be understood by non-interacting behavior, several reference models have been developed in the literature. One of them is the celebrated Bliss independence model, which assimilates non-interaction with statistical independence. Intuitively, this requires the dose-response curves to have zero as minimal effect and one as maximal effect, a restriction that was indeed adopted by Bliss. However, we show how non-interaction can be interpreted in terms of statistical independence, while nevertheless allowing arbitrary values for the minimal and the maximal effect. Furthermore, our reference model allows the maximal effects of the dose-response curves to be different. In a first step, we construct a basic reference model for the case of two drugs and...
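For reference, the classical Bliss independence model mentioned above can be sketched in a few lines. It assumes fractional effects scaled to the interval from zero to one, which is exactly the restriction the abstract says Bliss adopted; the generalized model proposed by the authors is not reproduced here, and the function names are hypothetical.

```python
def bliss_expected_effect(effect_a, effect_b):
    """Expected combined effect under classical Bliss independence.

    Effects are fractions in [0, 1] (0 = no effect, 1 = maximal effect),
    treated like probabilities of two independent events.
    """
    for effect in (effect_a, effect_b):
        if not 0.0 <= effect <= 1.0:
            raise ValueError("classical Bliss needs effects in [0, 1]")
    return effect_a + effect_b - effect_a * effect_b


def bliss_excess(observed, effect_a, effect_b):
    """Observed minus expected effect; positive values suggest synergy."""
    return observed - bliss_expected_effect(effect_a, effect_b)


# Hypothetical example: two drugs that each give a 50% effect on their own.
expected = bliss_expected_effect(0.5, 0.5)
```

The range check makes the limitation discussed in the abstract visible: dose-response curves whose minimal or maximal effects differ from zero and one fall outside what the classical formula can handle.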
Gene ontology annotations have become an essential resource for biological interpretations of experimental findings. The process of gathering basic annotation information in tables that link gene sets with specific gene ontology terms can be cumbersome, in particular if it requires above average computer skills or bioinformatics expertise. We have therefore developed Genes2GO, an intuitive R-based web application. Genes2GO uses the biomaRt package of Bioconductor in order to retrieve custom sets of gene ontology annotations for any list of genes from organisms covered by the Ensembl database. Genes2GO produces a binary matrix file, indicating for each gene the presence or absence of specific annotations. It should be noted that other GO tools do not offer this user-friendly access to annotations. Genes2GO is freely available and listed under http://www.semantic-systems-biology.org/tools/externaltools/.
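To make the output format concrete, the sketch below builds a gene-by-term binary matrix from a small in-memory mapping. It is written in Python for illustration and does not reproduce Genes2GO, which is R-based and queries Ensembl through biomaRt; the gene names, the input structure and the function name are all hypothetical.

```python
def binary_annotation_matrix(annotations):
    """Build a gene x GO-term presence/absence matrix.

    annotations: dict mapping a gene identifier to a set of GO term IDs.
    Returns (genes, terms, matrix) where matrix[i][j] is 1 if genes[i]
    is annotated with terms[j], else 0.
    """
    genes = sorted(annotations)
    terms = sorted(set().union(*annotations.values()))
    matrix = [[1 if term in annotations[gene] else 0 for term in terms]
              for gene in genes]
    return genes, terms, matrix


# Hypothetical input: gene names are placeholders, not real annotations.
example = {
    "geneA": {"GO:0003677", "GO:0006355"},
    "geneB": {"GO:0003677"},
}
genes, terms, matrix = binary_annotation_matrix(example)
```

Sorting both axes keeps the row and column order reproducible between runs, which matters when the matrix is written to a file and compared or merged later.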
A large gap remains between the amount of knowledge in scientific literature and the fraction that gets curated into standardized databases, despite many curation initiatives. Yet the availability of comprehensive knowledge in databases is crucial for exploiting existing background knowledge, both for designing follow-up experiments and for interpreting new experimental data. Structured resources also underpin the computational integration and modeling of regulatory pathways, which further aids our understanding of regulatory dynamics. We argue how cooperation between the scientific community and professional curators can increase the capacity of capturing precise knowledge from literature. We demonstrate this with a project in which we mobilize biological domain experts who curate large amounts of DNA binding transcription factors, and show that they, although new to the field of curation, can make valuable contributions by harvesting reported knowledge from scientific papers. Such...
The gastrointestinal peptide hormones cholecystokinin and gastrin exert their biological functions via cholecystokinin receptors CCK1R and CCK2R respectively. Gastrin, a central regulator of gastric acid secretion, is involved in growth and differentiation of gastric and colonic mucosa, and there is evidence that it is pro-carcinogenic. Cholecystokinin is implicated in digestion, appetite control and body weight regulation, and may play a role in several digestive disorders. We performed a detailed analysis of the literature reporting experimental evidence on signaling pathways triggered by CCK1R and CCK2R, in order to create a comprehensive map of gastrin and cholecystokinin-mediated intracellular signaling cascades. The resulting signaling map captures 413 reactions involving 530 molecular species, and incorporates the currently available knowledge into one integrated signaling network. The decomposition of the signaling map into sub-networks revealed 18 modules that represent hig...
Biological networks are exploited in many ways for gaining new knowledge about biological systems. Graph analysis of networks may provide useful characteristics about the design principles and mechanisms of pathways and regulation processes. Building networks as an object of scientific study, however, may prove to be a painstaking task, calling for elaborate database and literature surveying in order to get a comprehensive network representation in a topologically correct format. We have used such elaborate approaches for instance for building logical models with predictive power for anti-cancer drug efficacy. Alternatively, the Semantic Web brings promises of enhanced sharing and use of biological knowledge. Semantic Systems Biology (SSB) aims to utilise semantic web resources as an additional toolkit for integrative and modeling approaches aiming to analyse and understand biological systems. The SSB group at the Norwegian University of Science and Technology works towards ways to reach out to end-users/biologists in order to create some user-pull to direct further implementations of semantic web resources. One of our efforts resulted in the construction of a resource for gene expression regulation analysis: the Gene eXpression Knowledge Base GeXKB. GeXKB provides a resource for finding novel network candidates potentially involved in gene expression regulation. The construction of GeXKB prompted us to start efforts in the direction of ‘semantifying’ data from the source: the curation of Transcription Factor information from scientific literature. This resulted in the TFcheckpoint database (www.tfcheckpoint.org), and the publication of a set of curation guidelines for other volunteer curators to join in this effort. This work inspired us to see if we could bring together the global community interested in the domain of transcription regulation research, and we are in the process of initiating GRECO: the Gene Regulation Consortium.
GRECO aims to facilitate communication between resource and technology providers, paving the way to develop one virtual integrated high quality knowledge resource that could be used for instance in the field of regulatory network building and analysis.
Transcriptional regulation of gene expression is an important mechanism in many biological processes. Aberrations in this mechanism have been implicated in cancer and other diseases. Effective investigation of gene expression mechanisms requires a system-wide integration and assessment of all available knowledge of the underlying molecular networks. This calls for a method that effectively manages and integrates the available data. We have built a semantic web based knowledge system that constitutes a significant step in this direction: the Gene Expression Knowledge Base (GeXKB). The GeXKB encompasses three application ontologies: the Gene Expression Ontology (GeXO), the Regulation of Gene Expression Ontology (ReXO), and the Regulation of Transcription Ontology (ReTO). These three ontologies, respectively, integrate gene expression information that is increasingly more specific, yet decreasing in coverage, from a variety of sources. The system is capable of answering complex biolo...
The last decade has seen the emergence of Systems Biology: an integrative approach to understand how biological systems are built and operate. In parallel, the Semantic Web has started to offer new technologies that make data on the web comprehensible for computers. The merger of the two, in Semantic Systems Biology, offers new opportunities for data integration, sharing and analysis through computational querying and reasoning.
