BIOINFORMATICS
ORIGINAL PAPER
Databases and ontologies
Vol. 27 no. 22 2011, pages 3193–3199
doi:10.1093/bioinformatics/btr531
Advance Access publication September 22, 2011
PIDO: the primary immunodeficiency disease ontology
Nico Adams1,2,∗,†, Robert Hoehndorf1, Georgios V. Gkoutos1, Gesine Hansen3 and
Christian Hennig3
1 Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, 2 European Bioinformatics
Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK and 3 Department of Paediatric
Pneumology, Allergology, and Neonatology, Hannover Medical School, Carl-Neuberg-Strasse 1, D-30625 Hannover,
Germany
Associate Editor: Jonathan Wren
ABSTRACT
Motivation: Primary immunodeficiency diseases (PIDs) are
Mendelian conditions of high phenotypic complexity and low
incidence. They usually manifest in toddlers and infants, although
they can also occur much later in life. Information about PIDs
is often widely scattered throughout the clinical and research
literature and is hard to find for generalists and experienced
clinicians alike. Semantic Web technologies coupled to
clinical information systems can go some way toward addressing
this problem. Ontologies are a central component of such
a system, containing and centralizing knowledge about primary
immunodeficiencies in both a human- and computer-comprehensible
form. The development of an ontology of PIDs is therefore a central
step toward developing informatics tools, which can support the
clinician in the diagnosis and treatment of these diseases.
Results: We present PIDO, the primary immunodeficiency disease
ontology. PIDO characterizes PIDs in terms of the phenotypes
commonly observed by clinicians during a diagnosis process.
Phenotype terms in PIDO are formally defined using complex
definitions based on qualities, functions, processes and structures.
We provide mappings to biomedical reference ontologies to ensure
interoperability with ontologies in other domains. Based on PIDO, we
developed the PIDFinder, an ontology-driven software prototype that
can facilitate clinical decision support. PIDO connects immunological
knowledge across resources within a common framework and
thereby enables translational research and the development of
medical applications for the domain of immunology and primary
immunodeficiency diseases.
Availability: The Primary Immunodeficiency Disease Ontology is
available under a Creative Commons Attribution 3.0 (CC-BY 3.0)
licence at http://code.google.com/p/pido/. The most recent public
release of the ontology can always be found at
http://purl.org/scimantica/pido/owl/pid.owl. An instance of the
PIDFinder software can be found at http://pidfinder.appspot.com
Contact: nico.adams@csiro.au
Received on May 13, 2011; revised on September 13, 2011;
accepted on September 18, 2011
∗ To whom correspondence should be addressed.
† Present address: CSIRO Materials Science and Engineering, Bayview
Avenue, Clayton, VIC 3168, Australia.
1 INTRODUCTION
1.1 Immunological and clinical motivation
Primary immunodeficiency diseases (PIDs) are Mendelian diseases
of low incidence, caused by defects in genes involved in the
development, maintenance and regulation of the immune system.
PIDs most often affect toddlers and infants, but can also manifest
much later in life and into adulthood (Riminton and Limaye,
2004). As a disease group, PIDs are extremely heterogeneous and
according to the most recent classification of the International Union
of Immunological Societies Primary Immunodeficiency Disease
Classification Committee, >150 distinct forms of PID have been
identified (Geha et al., 2007), although >200 PID genes are known
at the time of writing (Keerthikumar et al., 2009).
These facts make primary immunodeficiencies a challenging
group of diseases for both the practicing clinician and the biomedical
researcher alike. The first stumbling block is the comparatively low
incidence of these diseases: a recent study suggests that the average
prevalence of a PID in US households is ∼1 in 1200 persons (Boyle
and Buckley, 2007), although other sources suggest incidences of up
to 1 in 2 000 000. Many general practitioners as well as clinicians
have little or no familiarity with PIDs and consequently the time
that elapses from the first manifestation of symptoms to a confirmed
diagnosis is often long: a recent study investigating children with
Common Variable Immunodeficiency Disorders (CVID) found that
the mean time between the manifestation of symptoms and the
induction of immunoglobulin substitution therapy is 5.8 years
(Urschel et al., 2009). A second stumbling block is the fact that
the phenotypic variation associated with PIDs is usually very
high: in the case of patients with defects in the Wiskott–Aldrich
Syndrome Protein (WASP) gene, for example, the exact nature of
the gene defect (e.g. deletion versus missense or nonsense mutation,
precise location of splice-site abnormalities) will determine whether
patients exhibit fully developed Wiskott–Aldrich Syndrome (WAS)
(X-linked thrombocytopenia, B-cell lymphoma, frequent bacterial
and fungal infections, eczema, small platelet sizes, etc.) or a milder
form (X-linked thrombocytopenia or neutropenia), which, in turn,
gives rise to a less-complex phenotype. A final stumbling block
for clinicians is associated with information retrieval: there are
very few information resources containing structured and both
human and machine-comprehensible information related to PIDs
and their phenotypes. Examples of dedicated domain databases
include the ImmunoDeficiency Resource (Väliaho et al., 2005),
Info4PI (Samarghitean and Vihinen, 2009) and the ESID Registry
© The Author 2011. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
Downloaded from http://bioinformatics.oxfordjournals.org/ by guest on December 22, 2016
N.Adams et al.
have adopted a method for formally defining phenotypes (Hoehndorf et al.,
2010b) for the development and axiomatization of PIDO.
In particular, our method of defining phenotypes enables basic
interoperability with ontologies of anatomy and physiology, and we have
included links to the FMA and the GO for this purpose. PIDO does not
develop or use a classification of PIDs such as the one proposed by the
International Union of Immunological Societies’ Primary Immunodeficiency
Disease Classification Committee (Geha et al., 2007). However, combining
PIDO’s phenotype and disease definitions with anatomy ontologies can be
used for the generation of a novel PID classification based on the axioms
in an anatomy or physiology ontology. For example, we can create a class
called ‘Agammaglobulinemias’ based on a combination of ‘Having Primary
Immunodeficiency Disease’ and ‘Having Agammaglobulinemia’, and infer
which particular PIDs satisfy the definition of ‘Agammaglobulinemias’. Generating
PID classifications from complex class descriptions in this way enables
flexible and expressive access to PIDs and their associated phenotypes.
2.3 Relation to other ontologies
The ontology of PIDs draws on terms from many different domains such
as genetics, anatomy, chemistry and proteins. The PID ontology only uses
those terms which are necessary to achieve the desired expressivity and
coverage and provides mappings to multiple established domain ontologies.
The mappings are constructed in such a way as to map class names and
definitions, but without importing axiomatizations.
The PID ontology’s classes have been mapped to key resources in
the biomedical domain to facilitate interoperability with other ontologies.
In particular, classes describing anatomical parts have been mapped to
the corresponding classes in the Foundational Model of Anatomy (FMA)
(Rosse and Mejino, 2003) and the NCI Thesaurus (Sioutos et al., 2007).
Genes have been annotated with official gene symbols, alternative gene
symbols, gene names and corresponding associated Mendelian Inheritance
in Man (MIM) (Hamosh et al., 2005) phenotypes, all of which were
retrieved from Entrez Gene (Maglott et al., 2007) as well as NCI Thesaurus
terms. Phenotypes have been annotated with corresponding terms from
the Human Phenotype Ontology (HPO) (Robinson and Mundlos, 2010).
The manual annotation with corresponding phenotype terms derived from
Phenotype and (plays_role some BiomarkerRole)
BiomarkerRole SubClassOf:
(role_of some Clinical_Diagnosis_Process)
By specifying the type of clinical diagnosis process, the type of biomarker
may be further specified. An imaging biomarker, for example, is a biomarker
that is observed in a radiological observation process (e.g. projection
radiography or Computed Tomography Scanning). By analogy, a cellular
biomarker is a biomarker observed during a cytometric experiment. The PID
ontology provides an extensive hierarchy of biomarkers, which is useful for
the further classification of phenotypes and any one phenotype will be able
to assume multiple biomarker roles. In the first instance, the classification
in terms of biomarkers will mirror the way in which most clinicians classify
phenotypes. The formal integration of biomarkers and diagnostic processes
in the PID ontology is the subject of future work.
Typical biomarkers are, for example, genomic biomarkers, cell functional
biomarkers, laboratory findings, etc. Collectively, the phenotypes playing
the role of biomarkers form the phenotype of a PID. All biomarkers were
manually extracted from the recent primary clinical and research literature
by domain experts. Incorporated biomarkers have been annotated with the
PubMed identifier of the manuscript from which they were taken.
2 SYSTEMS AND METHODS
2.1 Ontological motivation
In order to bridge the genotype–phenotype gap and to develop successful
computable representations of primary immunodeficiencies, a number of
domain as well as granularity boundaries must be traversed and integrated.
PIDs arise from complex interactions between gene products, pathways,
tissues, organs and the interaction with the environment. The description
and representation of these interactions should be computable and contain
an adequate theory of biological, chemical and immunological functions and
functionings. Consequently, an ontology of disease and phenotype should
provide the means to define phenotypes based on the processes, objects
and functions which give rise to a particular phenotype. For example, it
should be possible to infer from a phenotype such as agammaglobulinemia
that a patient does not have gamma-globulin as part and therefore that the
biological function of gamma-globulin cannot be realized in this patient.
To obtain these inferences and achieve interoperability with other relevant
ontologies such as anatomy, process, phenotype and disease ontologies, we
2.2 Biomarkers
PIDO characterizes PIDs based on both their phenotypes and associated
biomarkers. We define a biomarker as a role (Loebe, 2005) that a phenotype
plays within a clinical diagnosis process. The biomarker role is a role in an
observation process:
(Guzman et al., 2007). Valuable information concerning primary
immunodeficiencies is also contained in general bioinformatics
databases such as the Online Mendelian Inheritance in Man database
(Hamosh et al., 2005), which contains descriptions of phenotypes
associated with Mendelian diseases in free text form, ArrayExpress
(Parkinson et al., 2009) or the Gene Expression Omnibus (Barrett
et al., 2009) for functional genomics data, UniProt (Consortium,
2010) (information about proteins), IntAct (Aranda et al., 2010)
(protein/protein interactions) or KEGG (Kanehisa et al., 2010)
(protein interactions and pathways). In practice, however, these
resources are difficult to use for the average clinician and only
present fragmented information, rather than an integrated picture,
which is often needed by the clinical practitioner.
Biomedical researchers, wishing to engage in research in the
field, are faced with similar problems. On the basis of literature
searches, it is currently extremely difficult to even identify
which primary immunodeficiencies exist, what their phenotypes
are and to find and integrate all the relevant information across
bioinformatics resources. For example, patients with suspected
PIDs of unclear origin are often given the diagnosis CVID
resulting in a large and diverse group of diseases bearing a
common label. As such, the problems faced by clinicians and
biomedical researchers significantly overlap. Computational support
for both the clinician involved in the care of PID patients as
well as the biomedical researcher is therefore highly desirable and
necessary. Specifically, such support should lead to ‘knowledge
centralisation’ in two distinct ways. First, knowledge centralization
in the sense of data integration integrates PID-related data across
the various resources mentioned above and presents a unified
view to the end user. Second, knowledge centralization in the sense of
the development of a representation of information about PIDs that is both
human-comprehensible and machine-computable can, for example, lead to
the development of expert systems that utilize the represented
information. Ontologies are both human-comprehensible and
machine-computable specifications of the knowledge in a domain of
discourse and, as such, are well suited to play a ‘centralising’ role.
Here, we present the Ontology of Primary
Immunodeficiency Diseases (PIDO), describe how it interacts with
other relevant ontologies and demonstrate its application in a clinical
decision support system.
PIDO
Table 1. Ontologies and domains currently cross-referenced within PIDO

Ontology                                    Domain
FMA                                         Anatomy
NCI Thesaurus                               General Medical Vocabulary
HPO                                         Human Phenotypes
Chemical Entities of Biological Interest    Chemical Entities
GO                                          Genes and Processes
PO                                          Proteins
3 IMPLEMENTATION
The PID ontology has been formalized in the Web Ontology
Language (OWL). The ontology has been edited using either
the Protege 4 (Knublauch et al., 2004) or TopBraid Composer
(http://www.topbraidcomposer.com) OWL editors as well as the
Manchester OWL API (Horridge and Bechhofer, 2009).
The PIDFinder Web Application was developed in the Java
programming language using the Google Web Toolkit (GWT,
http://code.google.com/webtoolkit/) and Google App Engine (GAE,
http://code.google.com/appengine/) frameworks.
4 DISCUSSION
4.1 Canonical disease phenotypes
In the context of PIDO, we focus on the representation of the
canonical phenotype of PIDs: a canonical disease phenotype
consists of every observed phenotypic manifestation of the disease.
We emphasize, however, that patients do not commonly exhibit
all manifestations that are associated with a canonical disease
phenotype. As a consequence, we intend to explore the relation
between the observed phenotype in a patient and the canonical
phenotype of a disease using a measure of semantic similarity that is
able to account for incomplete and noisy information. Including all
possible observations associated with a disease in the description
of canonical disease phenotypes will enable us to increase the
similarity, if patients and disease phenotype overlap in any of these
phenotypes, and to decrease the similarity if they do not. We would
then count patient and disease as phenotypically most similar when
they fully overlap in their phenotypes. This method has already been
applied successfully for the identification of novel gene–disease
association in the PhenomeNET system (Hoehndorf et al., 2011).
4.2 Use-case: WAS-related PIDs
WAS-related primary immunodeficiencies may arise as a
consequence of defects in the WAS gene. Several recent studies have
established good correlations between the type of gene defect and the
corresponding human phenotype (Imai et al., 2004; Jin et al., 2004).
Patients with insertions into the WAS gene or complete deletions of
the gene have been found to develop full Wiskott-Aldrich Syndrome,
whereas those with splice abnormalities develop either the full
form or a milder version, depending on the exact nature of the
splice abnormality. In PIDO, the following (clinical) phenotypes
are associated with Wiskott–Aldrich Syndrome: presence of anti-DNA
antibody, inflammation of the joints and colon, autoimmune
hemolytic anemia, autoimmune thrombocytopenia, B-cell and
hematopoietic cell neoplasms, bacterial, viral and fungal infections,
bloody diarrhea, CD8+ T-cell dependent cytotoxicity defect,
eczema, immune-complex glomerulonephritis, myelodysplastic
syndrome, petechiae and small platelet size.
We use Wiskott–Aldrich Syndrome as an example to show
how this complex phenotype can be modeled in terms of more basic
phenotypes.
One commonly occurring phenotype in WAS patients is an
elevated level of anti-DNA antibodies present in the serum. We
formalize this as a phenotype of patients that have anti-DNA
antibodies as part which are present in an increased concentration:
Phenotype and phenotype_of some
(has_part some
(Anti-DNA_Antibody and (has_property
some Increased_Concentration)))
This enables the inference that entities with this phenotype have
Anti-DNA Antibodies as part and therefore enables interoperability
with anatomy ontologies that also use the has_part and part_of
relations (Hoehndorf et al., 2010b). Based on this assertion, further
inferences across ontologies become possible. For example, anti-DNA
antibodies realize certain functions under a given set of
conditions. If the inference of presence or absence of a part is
subsequently combined with an ontology specifying, for example,
the Mammalian Phenotype Ontology (Smith and Eppig, 2010) is currently
ongoing, and annotations will be released in future versions of the ontology.
Furthermore, the mapping of terms referring to cellular components and
biological processes, to proteins, cell types and qualities is still in progress,
and mappings to the Gene Ontology (GO) (Ashburner et al., 2000), the
Protein Ontology (PO) (Natale et al., 2011), the Cell Type Ontology (CL)
(Bard et al., 2005) and the Phenotypic Attribute and Trait Ontology (PATO)
(Gkoutos et al., 2005) will be created (Table 1). The mappings are constructed
either manually or semi-automatically by using fuzzy string matches between
ontologies. Fuzzy matches are determined using the Levenshtein (Navarro,
2001) or Needleman–Wunsch (Needleman and Wunsch, 1970) algorithms
and validated by a human curator.
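The semi-automatic step described above can be sketched as follows. This is a minimal illustration of Levenshtein-based candidate generation, not the code used to build PIDO: the function names, the lower-casing of labels and the relative-distance threshold of 0.2 are assumptions made for the example, and every candidate is still meant to be checked by a curator.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def candidate_mappings(label, target_labels, max_relative_distance=0.2):
    """Return (target_label, distance) pairs close enough to show to a curator.

    The threshold is a hypothetical cut-off; labels are compared case-insensitively.
    """
    query = label.strip().lower()
    candidates = []
    for target in target_labels:
        normalised = target.strip().lower()
        distance = levenshtein(query, normalised)
        longest = max(len(query), len(normalised), 1)
        if distance / longest <= max_relative_distance:
            candidates.append((target, distance))
    # Closest matches first; ties broken alphabetically for a stable order.
    return sorted(candidates, key=lambda pair: (pair[1], pair[0]))


if __name__ == "__main__":
    print(candidate_mappings("Lymph Node", ["Liver", "Lymph nodes", "lymph node"]))
```

Normalizing the distance by label length keeps short labels from matching too liberally. Any threshold of this kind would have to be tuned against curated mappings.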
To demonstrate interoperability with other biomedical ontologies, in
particular phenotype and anatomy ontologies, we use automated reasoning
over phenotype and anatomy ontologies to derive representations of PID
phenotypes in the phenotype ontologies for other species. Such mappings
are of considerable use even for organisms that are evolutionarily distant
from humans, such as worms: simple model organisms that can
be used to develop comprehensive experimental analyses of the genetic
and molecular makeup of complex phenotypic traits have been the staple
of biomedical research for some time. For example, much work has been
done in the past to study the immune system of nematodes as a model
system for innate immunity (Schulenburg et al., 2004). In particular, we
used the PhenomeBLAST software (http://phenomeblast.googlecode.com)
to generate mappings to the mouse, fly and worm phenotype ontologies, and
we make these mappings available on our website.
PIDO uses the General Formal Ontology (GFO) (Herre et al., 2006) as
an upper ontology, because the GFO facilitates the integration of objects
and processes, contains an expressive theory of relational and processual
roles (Loebe, 2005) and provides expressive axioms in its OWL version.
The taxonomic structure of the GFO is presented in Figure 1.
This strategy facilitates the reuse of existing resources and ontologies
while at the same time avoiding reasoning problems that arise due to the
large size of many reference ontologies. Furthermore, a single ontology that
combines the necessary fragments of relevant biomedical domain ontologies
is often easier to maintain and use (Bard and Rhee, 2004).
the function of the part, then inference over presence or absence of
function—although not currently implemented in PIDO—also now
becomes a possibility (Hoehndorf et al., 2010a).
Another phenotype often associated with Wiskott–Aldrich
patients is frequently recurring bacterial, viral and fungal
infections. Infections are processes in the GFO. Within GFO,
processes can be characterized by the modes in which entities
participate in a process, and these modes of participation are
considered to be the processual roles of a process (Herre et al.,
2006; Loebe, 2005). For the process of infection, at least two such
processual roles may be identified: the role of Infectious Agent and
the role of Infected Entity. Formally, we formulate the following
axioms:
Infection SubClassOf: gfo:Process and
(has_role some Infectious_Agent) and
(has_role some Infected_Entity)
Infectious_Agent SubClassOf:
gfo:Processual_role
and (role_of some gfo:Process)
Processual roles can be played by entities. Therefore, we can
define a Bacterial Infection as an infection in which the role of the
Infectious agent is played by a Bacterium:
Bacterial_Infection EquivalentTo:
(Infection and (has_role some
(Infectious_Agent and
(played_by some Bacterium))))
Analogously, we are able to specify an infection by site rather than
by causative agent. For example, we can define a Gastrointestinal
Tract Infection:
Gastrointestinal_Tract_Infection EquivalentTo:
(Infection and (has_role some
(Infected_Entity and played_by
some Gastrointestinal_Tract)))
This definition states that a Gastrointestinal Tract Infection is an
Infection in which the role of the Infected Entity is played by the
Gastrointestinal Tract. Based on these defined classes, we can then
define Bacterial Gastrointestinal Tract Infection as follows:
Bacterial_Gastrointestinal_Tract_Infection EquivalentTo:
(Infection and (has_role some
(Infected_Entity and played_by
some Gastrointestinal_Tract)) and
(has_role some (Infectious_Agent and
played_by some Bacterium)))
After defining the process Bacterial Infection, we are
able to define the phenotype Having Bacterial Infection:
Having_Bacterial_Infection EquivalentTo:
phenotype_of some (plays_role some
(Infected_Entity and
role_of some Bacterial_Infection))
This definition states that the phenotype Having Bacterial
Infection is a phenotype of things that play the role of the Infected
entity within a Bacterial Infection process.
All process-based phenotype definitions in PIDO follow the
same pattern. As a future extension to these phenotype definition
patterns, we could further define phenotypes such as ‘Having
Frequent Recurring Infections’ by explicitly referring to an
increase in the rate of occurrence of infections. However, while
definitions following such a pattern are used in other phenotype
ontologies and their inclusion will enable a basic form of
interoperability with these ontologies, a full definition of the
intended meaning of ‘frequently recurring’ is a substantial challenge
for ontology representation languages that may be addressed in the
future.
We can combine basic phenotypes to describe complex
phenomena such as syndromes by asserting the class representing
the complex phenomena as equivalent to or a subclass of all the
phenotypes that characterize it. For example, the Omenn Syndrome
is commonly associated with primary immunodeficiencies arising
due to defects of the RAG1 or RAG2 genes (Santagata et al., 2000).
Patients presenting with the syndrome either have very large or
very small lymph nodes, hepatosplenomegaly, lymphocytosis and
Fig. 1. Simplified schematic of the GFO and top-level classes of PIDO. The GFO is an ontology both of categories and individuals. Individuals are divided
into abstract individuals (such as numbers), concrete individuals (such as processes and material objects) and spatio-temporal individuals (regions of space or
time). Processes are subclasses of concrete individuals and phenotypes are considered to be relational roles. GFO classes are shown in gray.
alopecia. In PIDO, this can be formalized as:
Having_Omenn_Syndrome SubClassOf:
Having_Alopecia and
Having_Hepatosplenomegaly and
Having_Lymphocytosis and
(Having_Small_LymphNode or
Having_Large_LymphNode)
Canonical representations of primary immunodeficiency diseases
can be developed in an analogous manner by describing them as
intersections of simpler phenotypes.
4.3 Description Logic-based querying
Having_Primary_Immunodeficiency_Disease and
Having_Thrombocytopenia
to be Common Variable Immunodeficiency caused by CD81 Gene
Defect, Good's Syndrome, Hyper-IgM Syndrome Type 2, ORAI1
Defect, RAG1/RAG2 SCID Phenotype with Expansion of Gamma-Delta T-Cells, STIM1-Defect and WAS. If the patient subsequently
goes on to develop a B-cell lymphoma, the query can be expanded to
Having_Primary_Immunodeficiency_Disease and
Having_Thrombocytopenia and
Having_B_Cell_Lymphocytic_Neoplasm
In this case, the results narrow to the Wiskott-Aldrich Syndrome
alone. Queries of this type can contribute to a diagnosis based
on the phenotypes observed in a patient. Using the same query,
a researcher might also determine which genes are commonly
associated with Thrombocytopenia in PIDO. Because of the
particular axiomatization of phenotypes that includes possible gene
defects, the genetic causes underlying a PID can be retrieved using
querying in Description Logics, thereby leading to an integration
across levels of granularity.
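The effect of narrowing a conjunctive query can be illustrated without a reasoner. The sketch below is an approximation under stated assumptions: each disease is reduced to a flat set of named phenotype classes, and a query is answered by set containment. A Description Logic reasoner instead computes subsumption over full class expressions, so this toy only mirrors the simplest case. The dictionary is deliberately tiny; the two real disease names carry only the phenotypes mentioned in the text, and `Hypothetical_PID` is invented for the example.

```python
# Toy stand-in for asserted phenotype axioms (NOT the content of PIDO).
CANONICAL_PHENOTYPES = {
    "Wiskott-Aldrich Syndrome": {
        "Having_Thrombocytopenia",
        "Having_B_Cell_Lymphocytic_Neoplasm",
    },
    "Hyper-IgM Syndrome Type 2": {"Having_Thrombocytopenia"},
    "Hypothetical_PID": {"Having_Alopecia"},  # invented entry for illustration
}


def diseases_matching(query, canonical=CANONICAL_PHENOTYPES):
    """Return the diseases whose phenotype set contains every queried phenotype.

    This mimics retrieving the subclasses of a conjunction of named classes.
    """
    wanted = set(query)
    return sorted(name for name, phenotypes in canonical.items()
                  if wanted <= phenotypes)


if __name__ == "__main__":
    print(diseases_matching({"Having_Thrombocytopenia"}))
    print(diseases_matching({"Having_Thrombocytopenia",
                             "Having_B_Cell_Lymphocytic_Neoplasm"}))
```

Adding a conjunct can only shrink the result set, which is why each additional observation narrows the list of candidate diseases.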
4.4 The PID Finder
A prime motivation for PIDO’s development is to assist clinicians in
gaining a rapid overview of existing relevant PID knowledge as
well as to contribute to the diagnosis of PIDs. To demonstrate PIDO’s
utility in clinical decision support, we developed the PIDFinder
(Fig. 2). The PIDFinder is a prototype web application, which
presents the information contained in the ontology in an easily
accessible manner and allows, apart from access to PIDO’s content,
the phenotypic comparison of PIDs and the generation of diagnosis
hypotheses. One central feature of PIDFinder is to allow a physician
to specify a set of phenotypes that are observed in a patient with a
suspected primary immunodeficiency disorder. The phenotypes are
displayed in a faceted manner using the biomarker classification
as facets. The faceted classification of phenotypes is automatically
inferred based on axioms in PIDO (Fig. 2c).
4.5 Limitations and future research
Future research related to PIDO will pursue several different
directions. First, the continued enrichment of the ontology with
content and cross-references to other ontologies and terminologies
will be of the highest priority. As such, we will continue to extract
and critically evaluate phenotypic information from the primary
research literature and incorporate it into the ontology. A second area
of future extension is enriching the ontology with data from other
biomedical databases and ontologies. In particular, we intend to
fully incorporate the FMA and GO ontologies and utilize them for
the classification of PIDs and their phenotypes. Third, we intend
to work with the developers and maintainers of PID registries
The phenotype formalism developed here allows the use of queries
based on Description Logic. For example, when confronted with a
patient suspected of suffering from a primary immunodeficiency and
presenting with thrombocytopenia, a clinician might be interested
in PIDs that are associated with this phenotype. PIDO can facilitate
such queries. Currently, a reasoner will determine the subclasses of
the class
Due to the axioms in PIDO, phenotypes can appear in several
facets: Hepatomegaly, for example, is inferred to be both an Imaging
Biomarker as it can be observed in an imaging process, as well
as a Liver Biomarker as it describes biological variation of the
liver. The interface also offers a simple text-based suggest-box
mechanism as an alternative search mode. Once phenotypes have
been selected, they are collected into a set and form the observed
phenotype of the patient. The patient phenotype set is subsequently
compared with the canonical set of phenotypes characterizing each
PID using the Tanimoto distance metric (Fig. 2d). The greater the
calculated Tanimoto coefficient for an observed patient phenotype
set compared with a PID phenotype set, the greater the phenotypic
similarity between the observed phenotype of the patient and the
canonical disease phenotype. The result of the comparison is a
ranking of possible PIDs that are phenotypically similar to the
patient phenotype. The results of the comparison are subsequently
visualized and returned to the user either in a graph-based form
or a side-by-side comparison of overlapping and non-overlapping
phenotypes.
The Tanimoto distance is further used to compare phenotypic
similarity between PIDs themselves. The result is visualized in the
form of a heatmap (Fig. 2a) as well as a side-by-side
comparison of overlap and non-overlap between PID phenotypes
(Fig. 2b).
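The set comparison described above can be sketched as follows. For two phenotype sets A and B, the Tanimoto coefficient is |A ∩ B| / |A ∪ B|, which is 1 for identical sets and 0 for disjoint ones. The function names and the example phenotype sets are assumptions made for illustration. The paper does not show the PIDFinder implementation, so this is only one plausible reading of the ranking step.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) coefficient of two collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)


def rank_diseases(patient_phenotypes, canonical):
    """Rank diseases by decreasing similarity to the observed patient phenotypes.

    `canonical` maps a disease name to its canonical phenotype set.
    """
    scores = [(name, tanimoto(patient_phenotypes, phenotypes))
              for name, phenotypes in canonical.items()]
    # Highest coefficient first; ties broken by name for a stable order.
    return sorted(scores, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical canonical sets, for illustration only.
    canonical = {
        "Disease_A": {"eczema", "petechiae", "small_platelet_size"},
        "Disease_B": {"eczema"},
        "Disease_C": {"alopecia"},
    }
    for name, score in rank_diseases({"eczema", "petechiae"}, canonical):
        print(f"{name}: {score:.2f}")
```

Because the union appears in the denominator, canonical phenotypes that the patient does not show lower the score. This matches the behaviour described for canonical disease phenotypes, where non-overlapping phenotypes decrease similarity.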
It should be noted that the PID Finder is only one of several use-cases for PIDO and as such does not leverage all the possibilities
that the use of expressive and well-axiomatized ontologies offers.
For example, we have not implemented the ability to run DL queries
in the current version of the software and are also not currently using
formal reasoning to arrive at diagnosis suggestions, but have rather
opted for a semantic similarity approach using Tanimoto distances.
However, this does not mean that the development of a well-axiomatized ontology is a wasted effort: the PIDFinder, for example,
makes use of reasoning to generate the multifaceted presentation of
phenotypes. The use of classifiers and formal descriptions also helps
to deal with ambiguity in the specification of observed phenotypes:
if, for example, a clinician only selects the ‘Arthritis’ phenotype,
a classifier will infer that there are more specialized subforms
of arthritis (e.g. polyarthritis, oligoarthritis, etc.) in the ontology
and expand the selection made by the clinician in the PIDFinder
user interface to include these phenotypes in the selection unless a
more defined subtype is subsequently chosen. Finally, implementing
functionality allowing OWL DL Queries of the type discussed in the
previous section in the PID Finder also remains a possibility.
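The expansion of an unspecific selection to its subforms can be sketched as a traversal of the inferred class hierarchy. The hierarchy below contains only the arthritis example from the text, and the function name and data layout are assumptions. In PIDFinder the subclass relations would come from a classifier run over PIDO rather than from a hand-written dictionary.

```python
# Direct subclass relations, as a classifier might report them (toy excerpt).
SUBCLASSES = {
    "Arthritis": ["Polyarthritis", "Oligoarthritis"],
}


def expand_selection(selected, subclasses=SUBCLASSES):
    """Return the selected phenotypes together with all of their descendants."""
    expanded = set()
    stack = list(selected)
    while stack:
        term = stack.pop()
        if term in expanded:
            continue  # already visited; also guards against cycles
        expanded.add(term)
        stack.extend(subclasses.get(term, []))
    return expanded


if __name__ == "__main__":
    print(sorted(expand_selection(["Arthritis"])))
    print(sorted(expand_selection(["Polyarthritis"])))
```

Selecting a specific subtype leaves the selection unchanged, which corresponds to the clinician choosing a more defined subtype.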
to both apply PIDO for the classification and analysis of PID-related information, and to extend PIDO with classes that are
relevant within the established PID resources. Finally, we intend
to further address the challenges involved in describing canonicity
and non-canonicity in biomedical ontologies (Hoehndorf et al.,
2007, 2010b).
The appropriate definition of phenotypes such as Having Small
Platelet Size (a phenotype for Wiskott–Aldrich Syndrome) or
Thrombocytopenia remains an open issue. The former refers to the
fact that the average of the platelet size distribution in a patient
is shifted to lower values with respect to that which is considered
normal, and thrombocytopenia denotes the situation in which the
number of platelets in the blood of a patient is reduced with respect
to the number that is considered normal.
4.6 Conclusions
We have developed PIDO, which characterizes PIDs by defining
a semantically rich representation of the basic observable
characteristics in organisms. The characteristics are based on
descriptions of quality, function, structure or process and are
interoperable with other biomedical domain ontologies. Based
on PIDO, we have developed PIDFinder, a prototypical web-based clinical decision support system that enables access to the
knowledge contained in the ontology and is capable of generating
diagnosis hypotheses. PIDO connects immunological knowledge
across resources within a common framework and thereby enables
translational research and the development of medical applications
for the domain of immunology and PIDs.
ACKNOWLEDGEMENT
We thank Dr Ulrich Baumann (Hannover Medical School) for the
helpful discussions and support.
Funding: BBSRC Grant (BB/G022747/1 to N.A.); European
Commission’s 7th Framework Programme, RICORDO project,
Grant (248502 to R.H.); BBSRC Grant (BB/G0043581 to G.G.);
Hannover Medical School C4-Professorship to G.H.; and Deutsche
Forschungsgemeinschaft (DFG) Grant (SFB587, TP14 to C.H.).
Conflict of Interest: none declared.
REFERENCES
Aranda,B. et al. (2010) The IntAct molecular interaction database in 2010. Nucleic
Acids Res., 38, D525–D531.
Ashburner,M. et al. (2000) Gene Ontology: tool for the unification of biology. Nat.
Genet., 25, 25–29.
Fig. 2. Screenshots of the PIDFinder Web Application: (a) heatmap characterizing the phenotypic similarity of PIDs; (b) phenotype-by-phenotype comparison
of PIDs; (c) selection of phenotypes observed in a patient and (d) PID diagnosis suggestion based on a set of observed phenotypes.
Keerthikumar,S. et al. (2009) Prediction of candidate primary immunodeficiency
disease genes using a support vector machine learning approach. DNA Res., 16,
345–351.
Knublauch,H. et al. (2004) The Protégé OWL plugin: an open development environment
for semantic web applications. Design, 3298, 229–243.
Loebe,F. (2005) Abstract vs social roles: a refined top-level ontological analysis.
In Boella,G. et al. (eds) Proceedings of the 2005 AAAI Fall Symposium ‘Roles,
an Interdisciplinary Perspective: Ontologies, Languages and Multiagent Systems’.
AAAI Press, Menlo Park, California, USA.
Maglott,D. et al. (2007) Entrez Gene: gene-centered information at NCBI. Nucleic Acids
Res., 35, D26–D31.
Natale,D.A. et al. (2011) The Protein Ontology: a structured representation of protein
forms and complexes. Nucleic Acids Res., 39, D539–D545.
Navarro,G. (2001) A guided tour to approximate string matching. ACM Comput. Surv.,
33, 31–88.
Needleman,S.B. and Wunsch,C.D. (1970) A general method applicable to the search
for similarities in the amino acid sequence of two proteins. J. Mol. Biol., 48,
443–453.
Parkinson,H. et al. (2009) ArrayExpress update–from an archive of functional genomics
experiments to the atlas of gene expression. Nucleic Acids Res., 37, D868–D872.
Riminton,D.S. and Limaye,S. (2004) Primary immunodeficiency diseases in adulthood.
Intern. Med. J., 34, 348–354.
Robinson,P.N. and Mundlos,S. (2010) The human phenotype ontology. Clin. Genet.,
77, 525–534.
Rosse,C. and Mejino,J.L.V. (2003) A reference ontology for biomedical informatics:
the Foundational Model of Anatomy. J. Biomed. Informat., 36, 478–500.
Samarghitean,C. and Vihinen,M. (2009) Bioinformatics services related to diagnosis of
primary immunodeficiencies. Curr. Opin. Allergy Clin. Immunol., 9, 531–536.
Santagata,S. et al. (2000) The genetic and biochemical basis of Omenn syndrome.
Immunol. Rev., 178, 64–74.
Schulenburg,H. et al. (2004) Evolution of the innate immune system: the worm
perspective. Immunol. Rev., 198, 36–58.
Sioutos,N. et al. (2007) NCI Thesaurus: a semantic model integrating cancer-related
clinical and molecular information. J. Biomed. Informat., 40, 30–43.
Smith,C.L. and Eppig,J.T. (2010) The mammalian phenotype ontology: enabling robust
annotation and comparative analysis. Wiley Interdisc. Rev. Syst. Biol. Med., 1,
390–399.
Urschel,S. et al. (2009) Common variable immunodeficiency disorders in children:
delayed diagnosis despite typical clinical presentation. J. Pediatr., 154, 888–894.
Väliaho,J. et al. (2005) Distribution of
immunodeficiency fact files with XML – from Web to WAP. BMC Med.
Informat. Decis. Mak., 11, 1–11.
Bard,J.B.L. and Rhee,S.Y. (2004) Ontologies in biology: design, applications and future
challenges. Nat. Rev. Genet., 5, 213–222.
Bard,J. et al. (2005) An ontology for cell types. Genome Biol., 6, R21.
Barrett,T. et al. (2009) NCBI GEO: archive for high-throughput functional genomic
data. Nucleic Acids Res., 37, D885–D890.
Boyle,J.M. and Buckley,R.H. (2007) Population prevalence of diagnosed primary
immunodeficiency diseases in the United States. J. Clin. Immunol., 27, 497–502.
Consortium,U. (2010) The Universal Protein Resource (UniProt) in 2010. Nucleic Acids
Res., 38, D142–D148.
Geha,R. et al. (2007) Primary immunodeficiency diseases: an update from the
International Union of Immunological Societies Primary Immunodeficiency
Diseases Classification Committee. J. Allergy Clin. Immunol., 120, 776–794.
Gkoutos,G.V. et al. (2005) Using ontologies to describe mouse phenotypes. Genome
Biol., 6, R8.
Guzman,D. et al. (2007) The ESID Online Database network. Bioinformatics, 23,
654–655.
Hamosh,A. et al. (2005) Online Mendelian Inheritance in Man (OMIM), a
knowledgebase of human genes and genetic disorders. Nucleic Acids Res., 33,
D514–D517.
Herre,H. et al. (2006) General Formal Ontology (GFO): a foundational ontology
integrating objects and processes. Technical Onto-Med Report No. 8, Institute of
Medical Informatics, Statistics and Epidemiology, University of Leipzig, Leipzig.
Hoehndorf,R. et al. (2007) Representing default knowledge in biomedical ontologies:
application to the integration of anatomy and phenotype ontologies. BMC
Bioinformatics, 8, 377.
Hoehndorf,R. et al. (2010a) Applying the functional abnormality ontology pattern to
anatomical functions. J. Biomed. Semant., 1, 4.
Hoehndorf,R. et al. (2010b) Interoperability between phenotype and anatomy
ontologies. Bioinformatics, 26, 3112–3118.
Hoehndorf,R. et al. (2011) PhenomeNET: a whole-phenome approach to disease gene
discovery. Nucleic Acids Res., 39, e119.
Horridge,M. and Bechhofer,S. (2009) The OWL API: a Java API for working with OWL
2 ontologies. In OWLED 2009, 6th OWL Experiences and Directions Workshop.
Chantilly, Virginia.
Imai,K. et al. (2004) Clinical course of patients with WASP gene mutations. Blood,
103, 456–464.
Jin,Y. et al. (2004) Mutations of the Wiskott-Aldrich Syndrome Protein (WASP):
hotspots, effect on transcription, and translation and phenotype/genotype
correlation. Blood, 104, 4010–4019.
Kanehisa,M. et al. (2010) KEGG for representation and analysis of
molecular networks involving diseases and drugs. Nucleic Acids Res., 38,
D355–D360.