HIPBI-RD (Harmonising phenomics information for a better interoperability in the rare disease field) is a three-year project, started in 2016 and funded through the E-Rare 3 ERA-NET program. The project builds on three resources widely adopted by the rare disease (RD) community: Orphanet and its ontology ORDO (the Orphanet Rare Disease Ontology), HPO (the Human Phenotype Ontology), and the PhenoTips software for capturing and sharing structured phenotypic data for RD patients. It is further supported by resources developed by the European Bioinformatics Institute and the Garvan Institute. HIPBI-RD aims to provide the community with an integrated, RD-specific bioinformatics ecosystem that will harmonise the way phenomics information is stored in databases and patient files worldwide, and thereby contribute to interoperability. This ecosystem will consist of a suite of tools and ontologies, optimised to work together, and made available through commonly used software repos...
The integration of cellular and molecular structural data is key to understanding the function of macromolecular assemblies and complexes in their in vivo context. Here we report on the outcomes of a workshop that discussed how to integrate structural data from a range of public archives. The workshop identified two main priorities: the development of tools and file formats to support segmentation (that is, the decomposition of a three-dimensional volume into regions that can be associated with defined objects), and the development of tools to support the annotation of biological structures.
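To make the definition of segmentation concrete, the sketch below represents a segmentation as a label volume: an array with the same shape as the 3D map, where each voxel holds an integer region label (0 for background), plus a separate table that associates each label with a defined object. This is a minimal illustration only; the toy volume, the label values and the object names are hypothetical and do not correspond to any file format discussed at the workshop.

```python
from collections import defaultdict

# Toy 4x4x4 segmentation: each voxel holds a region label (0 = background).
# The labels and annotations below are hypothetical, for illustration only.
Z, Y, X = 4, 4, 4
labels = [[[0] * X for _ in range(Y)] for _ in range(Z)]
for z in range(2):
    for y in range(2):
        for x in range(2):
            labels[z][y][x] = 1  # a 2x2x2 block assigned to region 1
for z in range(2, 4):
    labels[z][3][3] = 2  # a thin two-voxel column assigned to region 2

# Associate each region with a defined object (hypothetical names).
annotations = {1: "mitochondrion", 2: "microtubule segment"}


def summarise_regions(labels, annotations):
    """Return {object name: (voxel count, bounding box)} for each labelled region."""
    counts = defaultdict(int)
    boxes = {}
    for z, plane in enumerate(labels):
        for y, row in enumerate(plane):
            for x, lab in enumerate(row):
                if lab == 0:
                    continue  # skip background voxels
                counts[lab] += 1
                voxel = (z, y, x)
                lo, hi = boxes.get(lab, (voxel, voxel))
                boxes[lab] = (
                    tuple(min(a, b) for a, b in zip(lo, voxel)),
                    tuple(max(a, b) for a, b in zip(hi, voxel)),
                )
    return {
        annotations.get(lab, f"unannotated region {lab}"): (counts[lab], boxes[lab])
        for lab in sorted(counts)
    }


summary = summarise_regions(labels, annotations)
for name, (n_voxels, box) in summary.items():
    print(name, n_voxels, box)
```

Keeping the label volume and the annotation table separate means the same segmentation can be re-annotated (for example, against an ontology) without touching the voxel data.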
The Human Induced Pluripotent Stem Cell Initiative (HipSci) is establishing a large catalogue of human iPSC lines, arguably the best-characterised collection to date. The HipSci portal enables researchers to choose the right cell line for their experiment and makes HipSci's rich catalogue of assay data easy to discover and reuse. Each cell line has genomic, transcriptomic, proteomic and cellular phenotyping data. Data are deposited in the appropriate EMBL-EBI archives, including the European Nucleotide Archive (ENA), the European Genome-phenome Archive (EGA), ArrayExpress and the PRoteomics IDEntifications (PRIDE) database. The project will produce 500 cell lines from healthy individuals, as well as lines from 150 patients with rare genetic diseases; these will be available through the European Collection of Authenticated Cell Cultures (ECACC). As of August 2016, 238 cell lines are available for purchase. Project data are presented through the HipSci data portal (http://www.hipsci.org/lines) and...
Authoring bio-ontologies is a task that has traditionally been undertaken by skilled experts trained in complex languages such as the Web Ontology Language (OWL), using tools designed for such experts. As requests for new terms are made, the need for expert ontologists creates a bottleneck in the development process. Furthermore, rigorously enforcing ontology design patterns in large, collaboratively developed ontologies is difficult with existing ontology authoring software. We present Webulous, an application suite for supporting ontology creation by design patterns. Webulous provides infrastructure to specify templates for populating ontology design patterns, which are then transformed into OWL assertions in a target ontology. Webulous provides programmatic access to the template server, and a client application for Google Sheets allows templates to be loaded, populated and resubmitted to the Webulous server for processing. The developme...
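The general idea of template-driven ontology population can be sketched as follows: each populated row of a design-pattern template is validated and then expanded into a fixed set of OWL axioms, here written in OWL 2 Functional-Style Syntax. This is not the Webulous API or its template format; the "X is a Y that is part of Z" pattern, the `PartOfRow` and `expand_row` names, the `:part_of` property and the default `:` namespace are all hypothetical. A complete ontology document would additionally need a `Prefix(:=<...>)` declaration and a declaration of `:part_of` as an object property.

```python
import re
from dataclasses import dataclass
from typing import List

# Identifiers must be prefixed names in a hypothetical default namespace ":".
ID_PATTERN = re.compile(r"^:[A-Za-z_][A-Za-z0-9_]*$")


@dataclass(frozen=True)
class PartOfRow:
    """One populated row of a hypothetical 'X is a Y that is part of Z' template."""
    term_id: str  # the new class, e.g. ":heart_ventricle"
    label: str    # human-readable label for the new class
    parent: str   # its superclass
    whole: str    # the structure the new class is part of


def expand_row(row: PartOfRow) -> List[str]:
    """Validate a template row, then expand it into OWL Functional Syntax axioms."""
    # Enforce the pattern: every identifier slot must hold a well-formed prefixed name.
    for field_name in ("term_id", "parent", "whole"):
        value = getattr(row, field_name)
        if not ID_PATTERN.match(value):
            raise ValueError(f"{field_name} {value!r} is not a valid prefixed identifier")
    if not row.label.strip():
        raise ValueError("label must not be empty")
    # Escape backslashes and double quotes inside the quoted label literal.
    literal = row.label.replace("\\", "\\\\").replace('"', '\\"')
    return [
        f"Declaration(Class({row.term_id}))",
        f'AnnotationAssertion(rdfs:label {row.term_id} "{literal}")',
        f"SubClassOf({row.term_id} {row.parent})",
        f"SubClassOf({row.term_id} ObjectSomeValuesFrom(:part_of {row.whole}))",
    ]


rows = [PartOfRow(":heart_ventricle", "heart ventricle", ":cardiac_chamber", ":heart")]
for row in rows:
    for axiom in expand_row(row):
        print(axiom)
```

Because every row passes through the same expansion function, contributors who only fill in spreadsheet cells cannot produce axioms that deviate from the pattern; malformed rows are rejected before anything reaches the target ontology.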
Motivation: The recent crop of biomedical standards has promoted the use of ontologies for describing data and for use in database applications. The standards-compliant ArrayExpress database contains data from >200 species and >110,000 samples used in genotyping, gene expression and other functional genomics experiments. We considered two possible approaches to employing ontologies in ArrayExpress: select as many ontologies as are needed to cover the species, technology and sample diversity, choosing where there are non-orthogonal resources, and attempt to make them interoperable; or build an extensible, interoperable application ontology. Here we describe the development of an application-focused Experimental Factor Ontology (EFO) and its use at ArrayExpress. www.ebi.ac.uk/ontology-lookup/browse.do?ontName=EFO
ArrayExpress is a public resource for microarray data that has two major goals: to serve as an archive providing access to microarray data supporting publications, and to build a knowledge base of gene expression profiles. ArrayExpress consists of two tightly integrated databases: the ArrayExpress repository, which is an archive, and the ArrayExpress data warehouse, which contains reannotated data and is optimized for queries. As of December 2005, ArrayExpress contains gene expression and other microarray data from almost 35,000 hybridizations, comprising over 1200 studies and covering 70 different species. Most data are related to peer-reviewed publications. Password-protected access to prepublication data is provided for reviewers and authors. Data in the repository can be queried by various parameters such as species, authors, or words used in the experiment description. The data warehouse supports a wide range of queries, including queries based on gene and sample properties, and can retrieve data combined from different studies. The ArrayExpress resource also includes Expression Profiler (EP), a microarray data mining, analysis, and visualization tool, and MIAMExpress, an online data submission tool. This chapter describes all major ArrayExpress components from the user perspective: how to submit data to ArrayExpress, and how to retrieve and analyze data from it.
ArrayExpress at the European Bioinformatics Institute is a public database for MIAME-compliant microarray and transcriptomics data. It consists of two parts: the ArrayExpress Repository, which is a public archive of microarray data, and the ArrayExpress Warehouse of Gene Expression Profiles, which contains additionally curated subsets of data from the Repository. Archived experiments can be queried by experimental attributes, such as keywords, species, array platform, publication details, or accession numbers. Gene expression profiles can be queried by gene name and by gene properties such as Gene Ontology terms, and the resulting profiles can be visualized. The data can be exported and analyzed using the online data analysis tool Expression Profiler. Data preprocessing, filtering, identification of differentially expressed genes, clustering, ordination-based techniques and other statistical tools are all available in Expression Profiler through its integration with the statistical package R.
Despite a large and multifaceted effort to understand the vast landscape of phenotypic data, the current form of these data inhibits productive analysis. The lack of a community-wide, consensus-based, human- and machine-interpretable language for describing phenotypes and their genomic and environmental contexts is perhaps the most pressing scientific bottleneck to integration across many key fields in biology, including genomics, systems biology, development, medicine, evolution, ecology, and systematics. Here we survey the current phenomics landscape, including data resources and handling, and the progress that has been made to accurately capture relevant data descriptions for phenotypes. We present an example of the kind of integration across domains that computable phenotypes would enable, and we call upon the broader biology community, publishers, and relevant funding agencies to support efforts to surmount…
Papers by Helen Parkinson