Phenotypes resulting from mutations in genetic model organisms can help reveal candidate genes for evolutionarily important phenotypic changes in related taxa. Although testing candidate gene hypotheses experimentally in non-model organisms is typically difficult, ontology-driven information systems can help generate testable hypotheses about developmental processes in experimentally tractable organisms. Here, we tested candidate gene hypotheses suggested by expert use of the Phenoscape Knowledgebase, specifically looking for genes that are candidates responsible for evolutionarily interesting phenotypes in the ostariophysan fishes that bear resemblance to mutant phenotypes in zebrafish. For this, we searched ZFIN for genetic perturbations that result in either loss of basihyal element or loss of scales phenotypes, because these are the ancestral phenotypes observed in catfishes (Siluriformes). We tested the identified candidate genes by examining their endogenous expression patterns in the channel catfish, Ictalurus punctatus. The experimental results were consistent with the hypotheses that these features evolved via disruption in developmental pathways at, or upstream of, brpf1 and eda/edar for the ancestral losses of basihyal element and scales, respectively. These results demonstrate that ontological annotations of the phenotypic effects of genetic alterations in model organisms, when aggregated within a knowledgebase, can be used effectively to generate testable, and useful, hypotheses about evolutionary changes in morphology.
The reality of larger and larger molecular databases and the need to integrate data scalably have presented a major challenge for the use of phenotypic data. Morphology is currently primarily described in discrete publications, entrenched in non-computer readable text, and requires enormous investments of time and resources to integrate across large numbers of taxa and studies. Here we present a new methodology, using ontology-based reasoning systems working with the Phenoscape Knowledgebase (KB; kb.phenoscape.org), to automatically integrate large amounts of evolutionary character state descriptions into a synthetic character matrix of neomorphic (presence/absence) data. Using the KB, which includes more than 55 studies of sarcopterygian taxa, we generated a synthetic supermatrix of 639 variable characters scored for 1051 taxa, resulting in over 145,000 populated cells. Of these characters, over 76% were made variable through the addition of inferred presence/absence states derived...
Classification of human tumors according to their primary anatomical site of origin is fundamental for the optimal treatment of patients with cancer. Here we describe the use of large-scale RNA profiling and supervised machine learning algorithms to construct a first-generation molecular classification scheme for carcinomas of the prostate, breast, lung, ovary, colorectum, kidney, liver, pancreas, bladder/ureter, and gastroesophagus, which collectively account for approximately 70% of all cancer-related deaths in the United States. The classification scheme was based on identifying gene subsets whose expression typifies each cancer class, and we quantified the extent to which these genes are characteristic of a specific tumor type by accurately and confidently predicting the anatomical site of tumor origin for 90% of 175 carcinomas, including 9 of 12 metastatic lesions. The predictor gene subsets include those whose expression is typical of specific types of normal epithelial differ...
Proceedings / ... International Conference on Intelligent Systems for Molecular Biology ; ISMB. International Conference on Intelligent Systems for Molecular Biology, 2000
In this paper we address the problem of reliably fitting parametric and semi-parametric models to spots in high density spot array images obtained in gene expression experiments. The goal is to measure the amount of label bound to an array element. A lot of spots can be modelled accurately by a Gaussian shape. In order to deal with highly overlapping spots we use robust M-estimators. When the parametric method fails (which can be detected automatically) we use a novel, robust semi-parametric method which can handle spots of different shapes accurately. The introduced techniques are evaluated experimentally.
Nearly invariably, phenotypes are reported in the scientific literature in meticulous detail, utilizing the full expressivity of natural language. Often it is particularly these detailed observations (facts) that are of interest, and thus specific to the research questions that motivated observing and reporting them. However, research aiming to synthesize or integrate phenotype data across many studies or even fields is often faced with the need to abstract from detailed observations so as to construct phenotypic concepts that are common across many datasets rather than specific to a few. Yet, observations or facts that would fall under such abstracted concepts are typically not directly asserted by the original authors, usually because they are "obvious" according to common domain knowledge, and thus asserting them would be deemed redundant by anyone with sufficient domain knowledge. For example, a phenotype describing the length of a manual digit for an organism implicit...
Despite complete sequencing of the human and mouse genomes, functional annotation of novel gene function still remains a major challenge in mammalian biology. Emerging strategies to help elucidate unknown gene function include the analysis of tissue-specific patterns of mRNA expression. A recent study investigated the steady-state mRNA expression profiling of the vast majority of protein-encoding human and mouse genes across a panel of 79 human and 61 mouse nonredundant tissues. The microarray data from this study constitutes the Genomics Institute of Novartis Foundation (GNF) Human and Mouse Gene Atlases and is publicly available for exploration through the SymAtlas web-application (http://symatlas.gnf.org/). We have recently reported the use of these data and hierarchical clustering algorithms to generate a global overview of the distribution of Rabs, SNAREs, and coat machinery components, as well as their respective adaptors, effectors, and regulators. This systems biology approach led us to propose Rab-centric protein activity hubs as a framework for an integrated coding system, the membrome network, which orchestrates the dynamics of specialized membrane architecture of differentiated cells. Here, we describe the use of the SymAtlas web-application and the Membrome datasets to help explore trafficking GTPase function. The human and mouse membrome datasets are available through the Membrome homepage (http://www.membrome.org/) and correspond to subsets of the SymAtlas content restricted to known membrane trafficking components.
Considering the fragmentary nature of the current reductionist approaches in elucidating trafficking component functions, the membrome datasets provide a more focused systems biology perspective that not only complements our current understanding of transport in complex tissues but also provides an integrated perspective of Rab activity in controlling membrane architecture.
The application of semantic technologies to the integration of biological data and the interoperability of bioinformatics analysis and visualization tools has been the common theme of a series of annual BioHackathons hosted in Japan for the past five years. Here we provide a review of the activities and outcomes from the BioHackathons held in 2011 in Kyoto and 2012 in Toyama. In order to efficiently implement semantic technologies in the life sciences, participants formed various sub-groups and worked on the following topics: Resource Description Framework (RDF) models for specific domains, text mining of the literature, ontology development, essential metadata for biological databases, platforms to enable efficient Semantic Web technology development and interoperability, and the development of applications for Semantic Web data. In this review, we briefly introduce the themes covered by these sub-groups. The observations made, conclusions drawn, and software development projects that emerged from these activities are discussed.
Papers by Hilmar Lapp