
Biomedical Ontologies

Peter L. Elkin (ed.), Terminology, Ontology and their Implementations, Cham, Switzerland: Springer Nature, 2023
Barry Smith

Preprint version of Chapter 5 of P. L. Elkin (ed.), Terminology, Ontology and their Implementations, Springer Nature Switzerland AG 2022, https://doi.org/10.1007/978-3-031-11039-9

We begin at the beginning, with an outline of Aristotle’s views on ontology and with a discussion of the influence of these views on Linnaeus. We move from there to consider the data standardization initiatives launched in the nineteenth century and then turn to investigate how the idea of computational ontologies developed in the AI and knowledge representation communities in the closing decades of the twentieth century. We show how aspects of this idea, particularly those relating to the use of the term “concept” in ontology development, influenced SNOMED CT and other medical terminologies. Against this background, we then show how the Foundational Model of Anatomy, the Gene Ontology, Basic Formal Ontology, and other OBO Foundry ontologies came into existence and discuss their role in the development of contemporary biomedical informatics.

Keywords: biomedical ontology, Aristotle, Linnaeus, SNOMED CT, Foundational Model of Anatomy, Gene Ontology, Basic Formal Ontology.

From Aristotle to Linnaeus

The term “ontology” (ontologia) is a neologism, coined in the seventeenth century as an alternative to the Greek “metaphysics,” referring to the branch of philosophy that is engaged in the study of what exists or of “what it is to be” in the most general sense. An ontology in the modern sense, which includes the biomedical ontologies discussed in what follows, is a taxonomy-like artifact that is structured in such a way as to be useful not only to humans but also to computers. The connection between the two senses of “ontology” turns on the fact that the foundation of any taxonomy is the idea of what is general, and this idea lies at the center of both ancient metaphysics and contemporary ontology.
We are all implicitly familiar with the idea of what is general through our use of general terms (nouns and noun phrases) in both everyday and scientific contexts. In a tradition which goes back to Plato and Aristotle, such general terms are said to refer to universals, where each universal is associated with the collection of those particulars in reality which are its instances.¹ A universal, sometimes also called a kind or type, is in some sense that which all its instances have in common. Aristotle held that we acquire knowledge of universals by observing the particulars that instantiate them. He himself displayed a life-long interest in descriptive biology, and his metaphysics is rooted in his experience of the world as a place populated by instances of universals in the biological domain [1–3]. Science, from Aristotle’s point of view—which he saw as “natural philosophy”—is thereby focused on what is qualitative in nature, which is to say on describing how specific instances of universals such as human or bird are at specific times running or sitting or perspiring or rotting. The process of mathematicization of science initiated by Galileo has of course diminished the attraction of qualitative, descriptive views of science of the sort embraced by Aristotle. But elements of such views still survive today, not least in the world of biomedicine. Even the idea that we gain knowledge of universals by examining their instances in reality is still very much a part of science. This is because the terms scientists use in formulating scientific laws are precisely terms representing universals. Laws link universals. And we gain knowledge of such laws by performing experiments in which we engage with the instances of the universals linked.

Science and Common Sense

Aristotle himself studied anatomy, astronomy, embryology, geography, geology, meteorology, physics, and zoology.² His starting point for exploring such domains scientifically consists in establishing what kinds of entities they contain and thus in mapping the corresponding domain universals.
This implies also creating a terminology consisting of the general terms with which to represent those universals. Understanding “what it is to be” for the entities in a domain means acquiring knowledge of what the particulars instantiating any given universal in that domain have in common. Aristotle assumed that human beings are in harmony with the world we find around us in the sense that, when we observe reality, we are able to grasp in our minds the universals instantiated by the things we see and touch. He does not seek deeper theories of what lies “behind” or “beyond” appearances, because to seek such theories would be to assume that the world is not as it appears to be. Certainly, there is room for error on the Aristotelian view. This relates, however, to particular perceptions only; it leaves the general features of perceptual knowledge untouched. Thus, the Aristotelian—like the commonsensical—way of thinking about the world will never concede that it is false throughout. Error is a local phenomenon, it does not distort our entire outlook. Modern science, on the other hand (and the Platonic and Democritian philosophies it absorbed) postulated just such global distortions. ([6], p. 148)

We return to this way of thinking in the section on “The FMA as a Canonical Ontology”, when we investigate the idea of a canonical ontology and thereby explore the reasons why biomedical ontologies have evolved in such a way as to include elements deriving both from Aristotelian thinking and from modern, data-driven science.

¹ “Particular” means: an entity that exists in some unique region of space and time. Where universals are repeatable in indefinitely many instances, particulars are unrepeatable.

² He also wrote on aesthetics, ethics, logic, metaphysics, government, politics, economics, psychology, rhetoric, and theology [4]. In his article on Aristotle’s biology in the Stanford Encyclopedia of Philosophy, Lennox [5] reports that in 1837 the anatomist Richard Owen introduced a survey of Aristotle’s zoological studies by declaring that “Zoological Science sprang from [Aristotle’s] labours, we may almost say, like Minerva from the Head of Jove, in a state of noble and splendid maturity.”

Aristotle on Definitions

The treatises compiled by Aristotle’s students and promulgated in his name constitute a virtual encyclopedia of Greek knowledge. Typically, each treatise begins with a review of earlier contributions to the study of the relevant domain and then presents Aristotle’s own view with the aid of definitions of one or more central terms. Book II of the Physics, for example, begins with the definition of “nature” and Book II of On the Soul with the definition of “soul.”

The term “definition” is used in a variety of ways in the Aristotelian corpus, but the core of his account rests on the idea that there is a hierarchy (or multiple hierarchies) of more and less general universals, with each universal in the hierarchy relating as species to its genus (in other words, to its immediate parent in the hierarchy). A species is then defined by stipulating what it is about some instances of its genus that makes them instances of the species. The way this works is illustrated in Fig. 5.1, which is based on the tree-form representation of universals attributed to Porphyry, whose third-century introduction to Aristotle’s Categories (see [8]) served as the standard textbook in logic for at least a millennium after his death. At the apex of the tree is the universal Substance, which divides into the branches of Material and Immaterial.
The former is then divided into Living and Nonliving, and Living Substance is divided further into the Sentient and the Non-sentient. Sentient Substance, finally, is divided into the Rational and the Irrational, which then provides the means to define a Human Being as a Rational Sentient Living Material Substance or—taking advantage of the way in which definitions at lower levels logically incorporate the contents of definitions at higher levels—as a Rational Animal (an Animal being a Sentient Living Material Substance).³ This feature of incorporation allows ever more complex thoughts to be expressed by means of expressions that remain relatively compact at each stage. It is exploited in mathematics and in all the sciences to enable human beings to deploy ever more complex ideas in still understandable ways.

Fig. 5.1 Tree of Porphyry (from [7])

We can now understand why the Aristotelian rule for creating definitions is referred to by Aristotle’s interpreters as definition per genus et differentiam. What this means is that a definition of a species consists of two parts specifying, respectively, its genus and an associated specific difference. The instances of the species that is being defined are then all and only those instances of the genus that satisfy the specific difference. We shall return to this rule for formulating definitions in the section on “Aristotelian Definitions” below.

³ In Aristotle’s Posterior Analytics, we find a definition of human being as “animal, mortal, footed, biped, wingless” (92a1-2) and alternatively as “animal, tame, biped” (96b31); in Metaphysics Z “biped” and “animal” are said to “constitute the definition of man” (1038a3). In his Politics (1.1253a) Aristotle defines an animal of the human kind as a zoon politikon, or in other words as a social animal, an animal that naturally lives in a polis.

Aristotle’s Table of Categories

Aristotle created a number of more or less fragmentary examples of what we would nowadays call domain ontologies.
At the same time, he also worked on creating a top-level ontology, which would bind those domain ontologies together within a single logically and ontologically coherent framework by providing a common starting point—a common set of highest genera—for the definitions of their respective terms. The treatise compiled by Aristotle’s students under the title Metaphysics addresses issues arising at this highest level of generality. Aristotle’s ideas on such matters are, as we shall see, still of considerable influence today. But he was of course not the only philosopher in the ancient world to have developed an influential approach to metaphysics. His competitors in this respect included his own teacher Plato, who conceived universals as inhabiting a timeless, ideal world, remote from the world of things we encounter in our everyday experience. They included also Democritus, who propagated the doctrine according to which all that exists are “atoms and the void,” and Heraclitus, who embraced a process philosophy according to which “everything is flux.” In the battle of grand theories, however, it is clear that Aristotle was—at least until the seventeenth century—the overwhelming victor. This was in part a result of the fact that Aristotelian metaphysics formed the foundations of Christian theology. But it also reflects the relative faithfulness of many of Aristotle’s ideas, when compared with those embraced by his competitors, to much of what is accepted by common sense: that human beings (for example) exist; that they grow and develop through time; that they have qualities, habits, and dispositions of various sorts; that they stand to each other in various relations; that they occupy places; and so forth.
As will already be clear, one main starting point of Aristotle’s metaphysics is the term “substance.” At Metaphysics 1030b6–12, for example, he asserts that “a proper definition states the essence of an entity, by which is meant ‘substance’.” It would be a difficult task to provide here an account of what Aristotle understood by this term. Suffice it to state here that in almost all the contexts where he provides examples of substances, he refers to organisms (in particular to humans and horses). Substance is in Aristotle’s terminology one of the categories, which means that it is one of those most general universals which go to form his top-level ontology. They are what we call “primitives” in virtue of the fact that they cannot be defined by the method of genus and specific difference because, lying at the very top, they lack a genus. In some places, Aristotle suggests that substance is the only category. There is, however, another strain in Aristotle’s thinking according to which there are multiple universals on this highest level. First is the category of substance, which is marked by the fact that it and its subordinate universals are instantiated in every case as a matter of necessity. What this means is that if a being is, for example, a horse at any time in its existence, then it is a horse at all times in its existence. Second is a collection of accident categories, which are those highest-level universals which hold of their bearer not essentially but as a matter of accident. This means that they can be gained and lost during the course of the bearer’s existence. Aristotle presents different versions of his list of categories. 
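The method of definition per genus et differentiam can be illustrated with a minimal Python sketch of one branch of the Tree of Porphyry. So can the point just made, that the highest categories are primitives without a genus. The Universal class, its method names, and node labels such as "Material Substance" and "Animal" are illustrative assumptions, not part of any established ontology library. The expanded definition shows the incorporation of higher-level definitions at work: replacing each genus by its own definition turns Rational Animal into Rational Sentient Living Material Substance.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Universal:
    """A node in a Porphyry-style tree: a universal, its genus, and its differentia.

    Illustrative sketch only; the class and field names are hypothetical.
    """
    name: str
    genus: Optional["Universal"] = None  # immediate parent; None for a primitive
    differentia: Optional[str] = None    # what marks it off from its sibling species

    def definition(self) -> str:
        """Definition per genus et differentiam: specific difference + genus."""
        if self.genus is None:
            # A highest genus has nothing above it, so it cannot be defined this way.
            raise ValueError(f"{self.name} is a primitive: it has no genus")
        return f"{self.differentia} {self.genus.name}"

    def expanded_definition(self) -> str:
        """Replace each genus by its own definition until a primitive is reached."""
        differentiae: List[str] = []
        node = self
        while node.genus is not None:
            differentiae.append(node.differentia)
            node = node.genus
        return " ".join(differentiae + [node.name])


# One branch of the tree, following the divisions described in the text.
# Node names such as "Material Substance" and "Animal" are illustrative choices.
substance = Universal("Substance")
material = Universal("Material Substance", substance, "Material")
living = Universal("Living Material Substance", material, "Living")
animal = Universal("Animal", living, "Sentient")
human = Universal("Human Being", animal, "Rational")

print(human.definition())           # Rational Animal
print(human.expanded_definition())  # Rational Sentient Living Material Substance
```

Calling substance.definition() raises an error. This mirrors the observation that a category lying at the very top lacks a genus and so cannot be defined by this method.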
Figure 5.2 presents the version described at Cat., 2a34-35, 2b3-5, 2b15-17, in which, along with substance, nine accident categories are distinguished, examples for each of which are (from Cat., 1b25-2a4):

• Position: is lying, is sitting
• Action: cutting, burning
• Passion: being cut, being burned
• Time: yesterday, last year
• Quality: white, grammatical
• Quantity: four foot, five foot
• Relation: double, half, larger
• Having: has shoes on, has armor on
• Place: in the Lyceum, in the marketplace

Fig. 5.2 Aristotle’s table of categories in the arrangement proposed by Jansen [9]. Categories are picked out in bold. The significance of the divisions marked by the italicized labels added by Jansen will become clear in section “Basic Formal Ontology (BFO)”.

This table should not be seen as being complete. Further categories can be added to make the categorical structure of the world more explicit. Indeed, as Jansen [9] points out, the project presented by Aristotle in the Categories “seems to be rather a working report on an ongoing research project than something ultimate and completed.” One candidate additional category might be that of hole, an entity which can potentially be occupied or filled by something material, as for example a womb may be occupied by a fetus, or a trench be filled with water. In his treatise On the Parts of Animals, Aristotle refers to the esophagus as “the channel through which food is conveyed to the stomach” (III, 3). At the same time, he refers to it as being “of a flesh-like character.” It is then the flesh-like entity, rather than the channel, which forms a part in the sense relevant to this treatise. This turns on the fact that Aristotle’s system of categories has no clear room for places or locations which are not occupied, nor for channels, or cavities, or voids—or, more generally, for holes which are not filled (see [10] and section “Canonical Relations” below). The table of categories in Fig. 5.2 may be extended also by recognizing universals in addition to those which have particulars as their instances. On some views, for example, there can be higher-level universals, which themselves have universals at lower levels as their instances. One example of such a higher-level universal would be the universal universal [11].

Linnaeus’s Scala Naturæ and Genera Morborum

We can imagine Aristotle’s ten categories as forming the top of a much larger hierarchy formed by universals belonging to successively more specific orders of being at lower levels. This idea was elaborated in the doctrines of the Scala Naturæ created by medieval philosophers, where each kind of entity is slotted into its own proper place in what is seen as a Great Chain of Being. This in turn formed one starting point for what we understand today as the tree of life, first systematically documented in Linnaeus’s Systema Naturæ, the title of the 10th edition of which is: System of nature through the three kingdoms of nature, according to classes, orders, genera and species, with characters, differences, synonyms, places [12]. Here, the “three kingdoms” are minerals, plants, and animals. In his Genera Morborum, Linnaeus applied this taxonomical method to the realm of diseases, distinguishing 11 classes, 37 orders, and 325 species of human disease, a small selection of which is presented in Fig. 5.3.

Fig. 5.3 Fragment from Linnaeus, Genera Morborum [13], selecting from Orders 1–3 of the 5th Class (Mental diseases), extracted from Munsche and Whitaker [14]; see also Egdahl [15]

Physics

For some 2000 years after its basic ideas were first set forth by Aristotle and his early commentators, the discipline of metaphysics advanced hardly at all, so much so that it formed the central part of what was habitually referred to as philosophia perennis.
The preeminent role of (Aristotelian) metaphysics in the pantheon of philosophical disciplines began to be challenged, however, from around the time of Descartes and Kant, who awarded pride of place to epistemology, which deals not with being but rather with knowledge. At the same time, the preeminent role of philosophy itself among the disciplines was challenged by the rise of the empirical sciences. During the eighteenth century, physics, in particular, evolved from its status as a qualitative and primarily descriptive discipline to become a quantitative and by degrees predictive discipline rooted in the mathematics-based study of observational and experimental results. This led, however, to an increasingly urgent practical need for a standardized terminology for communicating such results in a way that would allow international scientific and technological collaboration. Initiatives to address this need culminated in 1875 with the ratification of the Metre Convention, a landmark treaty which created the International Bureau of Weights and Measures (BIPM). This led in turn to the SI international standard system of units, which has served since 1960 as the universally accepted specification of the units of measure for physical quantities. The SI standard incorporates controlled vocabularies not only for the representation of the physical units of measure but also for the kinds (universals) of physical magnitudes for whose measurement these units are employed. Both units and the corresponding magnitudes are divided into two classes, base and derived (see Fig. 5.4), where the latter are defined in terms of the former.
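The division into base and derived units can be sketched under one common simplifying assumption. Each unit is represented by the integer exponents it carries on the seven SI base units (metre, kilogram, second, ampere, kelvin, mole, candela), so that a derived unit is simply a product of powers of base units. The helper names below are hypothetical. The sketch tracks dimensions only, not the numerical definitions of the units. It derives the newton, the unit of force, from mass times acceleration.

```python
from typing import Dict

# The seven SI base units, in a fixed order used for display.
BASE_UNITS = ("m", "kg", "s", "A", "K", "mol", "cd")

# Assumption for this sketch: a unit is a map from base-unit symbol to exponent.
Unit = Dict[str, int]


def base(symbol: str) -> Unit:
    """Return a base unit, e.g. base("kg") for the kilogram."""
    if symbol not in BASE_UNITS:
        raise ValueError(f"{symbol!r} is not an SI base unit")
    return {symbol: 1}


def mul(a: Unit, b: Unit) -> Unit:
    """Product of two units: exponents of matching base units are added."""
    out = dict(a)
    for sym, exp in b.items():
        out[sym] = out.get(sym, 0) + exp
    return {sym: exp for sym, exp in out.items() if exp != 0}


def power(a: Unit, n: int) -> Unit:
    """Raise a unit to an integer power: every exponent is multiplied by n."""
    return {sym: exp * n for sym, exp in a.items() if exp * n != 0}


def fmt(u: Unit) -> str:
    """Render a unit as a product of base units, e.g. 'm·kg·s^-2'."""
    parts = []
    for sym in BASE_UNITS:
        exp = u.get(sym, 0)
        if exp == 1:
            parts.append(sym)
        elif exp != 0:
            parts.append(f"{sym}^{exp}")
    return "·".join(parts) if parts else "1"


metre, kilogram, second = base("m"), base("kg"), base("s")
acceleration = mul(metre, power(second, -2))  # metre per second squared
newton = mul(kilogram, acceleration)          # force = mass × acceleration
joule = mul(newton, metre)                    # energy = force × distance

print(fmt(newton))  # m·kg·s^-2
print(fmt(joule))   # m^2·kg·s^-2
```

This bookkeeping cannot tell apart quantities of different kinds that share the same exponents. Energy and torque, for instance, both come out as m^2·kg·s^-2, which is why in practice the unit of torque is written N·m rather than J.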
Force, for example, is defined as mass times acceleration, and a newton, the unit of force, is defined as the force needed to accelerate one kilogram of mass at the rate of one meter per second squared in the direction of the force applied.⁴

The success of the SI system is one important reason why the need for ontology has made itself felt hardly at all in the domain of physics. But there is a second, and no less important, reason, which turns on the fact that the kinds of entities and relationships with which physics is concerned are rigorously defined using mathematical equations.⁵ The language of mathematics thereby serves in the physical domain as the lingua franca for the communication of scientific knowledge. These two factors together effectively ensure the mutual exploitability of both theories and results across all physical subdisciplines and all application areas, a feature of the physical domain that has shown itself to be indispensable to the success of all modern technology.

Similar advances in standardization of language and taxonomy were made also in chemistry, beginning with Mendeleev’s Periodic Table in 1869 and culminating in the work of the International Union of Pure and Applied Chemistry (IUPAC). The latter has defined rules for naming and classifying organic and inorganic compounds and created thereby a similar level of mutual exploitability of chemical knowledge across all chemical disciplines, both pure and applied. Comparable advances in biological standardization in the wake of Linnaeus were primarily in the anatomical domain, with the Nomina Anatomica, dating from 1895, replaced by the Terminologia Anatomica in 1998, though it was clear already in 2001 that the latter left much to be desired from the perspective of the new, information-driven approaches in medical science [19].

⁴ New definitions of the base units were proposed in BIPM [16].
Johansson points out in [17] that these new definitions appear to be circular, given that symbols for what is to be defined appear also in the defining expressions. He is indeed able to show that there are substantially noncircular definitions underneath these circularities, but he also identifies certain problems that still remain.

⁵ Less well defined are the terms used to represent both the different types of magnitude and the ontological relations between (a) these magnitudes themselves, (b) the symbols appearing in mathematical equations, and (c) the measurement results formulated in terms of SI units. For a treatment of these matters, see Landgrebe and Smith [18].

Fig. 5.4 Examples of base and derived quantities and units in the SI system of units

Ontology, Logic, and Artificial Intelligence

Ontological Commitment

Before moving to the special case of biomedical ontologies, and to the revolution in the handling of biomedical data which has come in the wake of the human and other “model organism” genome projects, we need to take account of the rise of ontology in the computer science disciplines, which occurred in the closing decades of the last century. The beginnings of this episode in the history of ontology can be traced to the 1948 paper “On What There Is” by the prominent philosopher-logician Willard Van Orman Quine [20]. This paper advanced a new conception of the proper method of ontology, one which at the same time gave new respectability to the term “ontology” itself, not least through its influence on the work of John McCarthy, creator of the term “artificial intelligence.” According to Quine, the ontologist’s task is to establish not what there is but rather what are the kinds of entities to which scientists are committed in their theorizing. The ontologist studies the world, on this conception, by drawing conclusions from the theories developed in the natural sciences.
Each natural science has its own preferred repertoire of entities to the existence of which it is committed, a repertoire which is revealed in its vocabulary. In applying his method for identifying ontological commitments, however, Quine turns not to the controlled vocabularies for physics or chemistry established by organizations such as the BIPM or IUPAC. Rather, he looks to a fictional future state of science, in which scientific theories would have been subjected to a quite different sort of regimentation, resting not on consistency in the use of terms but rather on the formalization of scientific propositions using the language of first-order logic (FOL).

First-Order Logic (FOL)

FOL is the logical framework of choice for philosophers and others engaged in the formalization of many different sorts of theories. It grew out of the attempts by Frege—the founder of modern logic—and then by Whitehead, Russell, and others, to use logic to formalize the whole of mathematics. Building on the success of this work, Rudolf Carnap and other members of the Vienna Circle conceived the project of a “unified science” [21], based—in one version at least—on the idea of a FOL-based axiomatization of all scientific theories. In his The Logical Structure of the World [22], Carnap used logic as a vehicle for the design of “linguistic frameworks supplying all of the (names for) objects and concepts required by science” [23]. The use of FOL brings great benefits, including the following:

• It allows us to capture in a single formal system many features of our reasoning not only in science and mathematics but also in our everyday affairs.
• It has a mature and sophisticated model-based semantics, which is used in all contemporary ontology applications.
• It exists in a number of semantically equivalent varieties of standardized syntax optimized for specific uses, including computational use in support of ontologies [24].
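The model-based semantics mentioned in the list above can be made concrete with a small sketch. It assumes a finite model, consisting of a domain of individuals plus an extension for each one-place predicate. It evaluates the quantified forms ∃x φ(x) ("for some x") and ∀x φ(x) ("for all x") by letting the variable range over that domain. The individuals Kibo and Rex and the helper names (holds, exists, forall, witnesses) are hypothetical. Teco and the formula is-a-bonobo(x) & x = Teco come from the example discussed below.

```python
from typing import Callable, Dict, FrozenSet, List

Individual = str
Formula = Callable[[Individual], bool]

# A hypothetical finite model: a domain of individuals, plus an extension
# (the set of individuals it is true of) for each one-place predicate.
# Only individuals are in DOMAIN; predicate names are mere keys, never values.
DOMAIN: FrozenSet[Individual] = frozenset({"Teco", "Kibo", "Rex"})
EXTENSIONS: Dict[str, FrozenSet[Individual]] = {
    "is-a-bonobo": frozenset({"Teco", "Kibo"}),
    "is-an-animal": frozenset({"Teco", "Kibo", "Rex"}),
}


def holds(predicate: str, x: Individual) -> bool:
    """An atomic formula predicate(x) is true iff x is in the predicate's extension."""
    return x in EXTENSIONS[predicate]


def exists(phi: Formula) -> bool:
    """∃x φ(x): true iff some value of x in the domain satisfies φ."""
    return any(phi(x) for x in DOMAIN)


def forall(phi: Formula) -> bool:
    """∀x φ(x): true iff every value of x in the domain satisfies φ."""
    return all(phi(x) for x in DOMAIN)


def witnesses(phi: Formula) -> List[Individual]:
    """The values of the variable that satisfy φ, sorted for stable output."""
    return sorted(x for x in DOMAIN if phi(x))


def teco_is_a_bonobo(x: Individual) -> bool:
    """The open formula is-a-bonobo(x) & x = Teco."""
    return holds("is-a-bonobo", x) and x == "Teco"


print(exists(teco_is_a_bonobo))                    # True
print(witnesses(teco_is_a_bonobo))                 # ['Teco']
print(forall(lambda y: holds("is-an-animal", y)))  # True
print(forall(lambda y: holds("is-a-bonobo", y)))   # False (Rex is a counterexample)
```

The only values the variable ever takes are members of DOMAIN. In this first-order setting a predicate such as is-a-bonobo contributes an extension but is never itself the value of a variable. Checking a formula in one given finite model is entirely mechanical. The undecidability discussed next concerns whether a set of formulas is consistent at all. That question ranges over every possible model, including infinite ones, and no finite enumeration of this kind can settle it in general.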
When it comes to computer applications, however, FOL has the shortcoming that it is not decidable. This means there is no effective procedure for determining, given a consistent set T of FOL formulas—which might, for example, be the set of axioms of an ontology—whether an additional FOL formula A can be added to T in such a way as to preserve consistency. As we shall see in section “The Web Ontology Language (OWL)”, it was the attempt to rectify this shortcoming which led to the development of OWL, the Web Ontology Language.

Ontological Commitment Again

At the heart of FOL is the idea of quantified statements, for example of the form ∃xPx and ∀yQy, which mean, respectively, that some value of the variable x satisfies the predicate P and that every value of the variable y satisfies the predicate Q. Thus, “∃” stands for “for some” and “∀” for “for all.” To determine the ontological commitments of a scientific theory formalized in FOL, in Quine’s approach, means to determine which entities belong to the ranges of those variables over which the formulas of the theory quantify—an idea that is captured in Quine’s maxim: “To be is to be the value of a variable.” Imagine, for example, that we wish to formalize in FOL the sentence

(1) Teco is a bonobo

as part of the evidence base for a scientific theory. A typical FOL rendering of this sentence is

(2) ∃x(is-a-bonobo(x) & x = Teco)

or, translated back into English,

(3) there is some x which is a bonobo and which is identical to Teco

It is Teco alone which is the value of a variable in either (2) or (3). This means that it is to Teco alone that we are ontologically committed in making either of these assertions. But could we not reformulate (1) in such a way that, say, bonobohood would serve as the value of a variable, for example by writing

(4) ∃P(instantiates(P, Teco) & P = bonobohood)?
The problem with (4) is that it is standardly interpreted as belonging not to first-order but rather to higher-order logic, which is defined precisely by the fact that it allows quantification over predicates.6 Quine’s use of FOL to determine ontological commitment thus leads to an ontology in which only particulars exist, in other words to a nominalist doctrine, according to which particulars belong to the realm of what exists, but generals (universals) belong only to the realm of what can be said. For Aristotle, in contrast, as for his successors in the camp of what we shall henceforth call “ontological realism” [26], there is in addition to Teco a second something that exists and that contributes to making true the sentence “Teco is a bonobo,” namely some feature or way of being, some species or natural kind to which Teco belongs, or some structure or pattern of DNA in Teco’s genome. From the ontologically realist perspective, (1) then asserts a relation between Teco and this second something.

The Vienna Circle Project to Unify Science

In a series of groundbreaking contributions around the turn of the last century, Frege and others demonstrated that at least a large portion of mathematics could be unified by showing that the corresponding mathematical truths are ultimately truths of logic. The Vienna Circle project was much less successful [28]. But its underlying idea was influential nonetheless, above all through its effect on the work of John McCarthy and others in the field of artificial intelligence.

6 Serious problems, such as Russell’s paradox [25], arise in a logic that allows unrestricted quantification over predicates. Smith [26, 27] describes a simple, paradox-free, alternative (nonstandard) FOL reading of sentences like (4), which involves quantification not over predicates but over universals.
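The contrast between the nominalist and realist readings of (4) can be sketched in the same way. The toy model below is only a loose illustration of the idea of quantifying over universals in a first-order setting, not a rendering of Smith’s formal system: universals are simply placed in the domain alongside particulars, and instantiates becomes an ordinary two-place predicate. Everything except Teco and bonobohood is hypothetical.

```python
# Toy model of a realist first-order reading of (4): universals are ordinary members
# of the domain, and instantiates is an ordinary two-place predicate.
# All names except Teco and bonobohood are hypothetical.
from typing import Iterable, List, Set, Tuple

particulars: Set[str] = {"Teco", "individual_a"}
universals: Set[str] = {"bonobohood", "primatehood"}
domain: Set[str] = particulars | universals

# Extension of instantiates(P, x): pairs of (universal, particular).
instantiates: Set[Tuple[str, str]] = {
    ("bonobohood", "Teco"),
    ("primatehood", "Teco"),
    ("primatehood", "individual_a"),
}


def body_4(p: str) -> bool:
    # instantiates(P, Teco) & P = bonobohood, with P read as a first-order variable
    return (p, "Teco") in instantiates and p == "bonobohood"


def sentence_4(dom: Iterable[str]) -> bool:
    """∃P body_4(P), evaluated over whatever domain the variable ranges over."""
    return any(body_4(p) for p in dom)


witnesses_4: List[str] = sorted(p for p in domain if body_4(p))
print(sentence_4(domain), sentence_4(particulars), witnesses_4)
```

Evaluated over a domain of particulars alone, (4) comes out false, mirroring the nominalist restriction; once universals are admitted as values of variables, bonobohood becomes a witness, and hence an ontological commitment.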
McCarthy was a leading figure in the first, logicist, or “symbolic” (also called Good Old-Fashioned AI or “GOFAI”7) wave of AI research, contributing inter alia to that strand in GOFAI which sought to use FOL-based approaches (including modal logic, situation calculus, and so on) to capture information about the world in a formal way, for example to support the building of an intelligent robot programmed with the ontology of common sense that is used by humans. It was in this context that McCarthy recognized the overlap between work done in philosophical ontology and the activity of building logical theories for AI systems. McCarthy affirmed as early as 1980 that builders of logic-based intelligent systems must first “list everything that exists, building an ontology of our world” [29]. This view, inspired by McCarthy’s reading of Quine,8 was advanced also by McCarthy’s collaborator Patrick Hayes in his “Naive Physics: Ontology for Liquids” [32], the first work to use in its title the word “ontology” in the new, computer-oriented sense of the term. As Hayes writes, looking back on the question of early uses of ontology in AI:

As far as I recall, my use in the title of the 1978 paper was original. I used it deliberately to suggest/imply that the KR problem in AI was connected with philosophical ontology. The background to this was my reading Carnap’s Logical Structure of the World as an undergraduate, probably some time in 1964. Reading this blew my mind and first got me excited about the idea of using logic to describe the real world. When I got into AI and read McCarthy’s “Situations, actions and causal laws” … I was immediately struck by the similarity both of goals and even in places of formal (what would now be called ‘ontological’) techniques.
[33]

The Web Ontology Language (OWL)

The sort of ontology practiced by Hayes was of considerable influence, as is illustrated for example by the Hobbs and Moore collection Formal Theories of the Commonsense World, published in 1985 [34–36]. Work of this sort in AI has of course been eclipsed in recent years by approaches centered on deep neural networks and related stochastic methods.9 In the world outside AI, however, the work of McCarthy and Hayes was just one initial strand in the burgeoning of work in ontology and knowledge representation (or “KR”) that took place from the 1980s onwards, in a movement which received considerable further impetus from the release, in 1999, of Protégé 1.0, a freely available software tool for building ontologies (applied not least in the biomedical realm).

7 A term coined by Haugeland in [29].

8 To quote from McCarthy [30]: “In philosophy, ontology is the branch that studies what things exist. W.V.O. Quine’s view is that the ontology is what the variables range over. Ontology has been used variously in AI, but I think Quine’s usage is best for AI.”

At about this time, the drive to find a computationally tractable language for developing formal ontologies led to the exploration of subsets of FOL, especially in the family of so-called description logics [38]. This culminated in the standardization by the World Wide Web Consortium (W3C) of the Web Ontology Language, or “OWL,” in 2004, which is currently the most widely used logical framework for ontology development (https://www.w3.org/OWL/). The new ontology languages were optimized for computer use, though unfortunately this came at the price of sacrifices in expressiveness [39, 40]. The new ease with which ontologies could be built led accordingly to an upsurge of overlapping and often mutually inconsistent efforts, as different groups sought in different ways to overcome the barriers of low expressivity.
The results are illustrated, for example, by the way in which the fashion for agent-based modeling around the turn of the millennium led to the development of some 30 “agent ontologies,” under headings such as action, actions, activity, agent, agents, agent architecture, agent communication, and agent framework.10 Many in the KR community seem to have assumed that the development of many, many ontologies is something positive. On this view, it is necessary only that each ontology should be associated in the minds of its developers with some potential use case, an idea promulgated for example by Noy and McGuinness in their influential ontology manual [41] in their assertion that “Deciding whether a particular concept is a class in an ontology or an individual instance depends on what the potential applications of the ontology are.” The multiplication of ontologies derived also from the fact that, during the period in question, grant funding was available only for the development of novel ontologies. Efforts to establish the sorts of principles of best practice that might point ontology in a more scientific direction were, on the other hand, neglected. Ontology development in this period, not surprisingly, gained a bad reputation: the results, it was said, were “brittle,” “unsustainable,” and “unscalable” and rested on oversimplified (and thus often unscientific) models of the relevant subject matters.

9 Something like GOFAI may be enjoying a mild reawakening, for example in the recent book on general AI by Marcus and Davis [37], where a logic-based framework of commonsense AI is seen as a possible avenue for gluing together various narrow stochastic AIs within a single general framework.

10 These are listed in the catalog of ontologies developed using DAML, the DARPA Agent Markup Language (http://www.daml.org/ontologies/, last accessed July 30, 2021), which was one of the precursors of the OWL language.
The Concept Orientation

The idea that we should seek to focus on the development of well-grounded reference ontologies was rejected in many circles, since it was taken to imply that the authors of such ontologies would aspire to some kind of God’s-eye perspective. Since such a perspective is unavailable, it was assumed that the best we can achieve is ontologies based on the ontological commitments of specific languages, theories, systems of beliefs, or what we shall encounter below as “conceptualizations.” The discipline of knowledge representation has to deal, after all, not with reality but rather with the knowledge (and thus the concepts) in people’s minds. This view is thus very likely to lead to a situation in which there is a plurality of ontologies covering the same topic, since there is a plurality of knowers whose knowledge is being captured and a plurality of uses to which this knowledge is being put.

KR researchers who invested in trying to develop more generally applicable methodologies focused their efforts especially on approaches to ontology development based on model-theoretic semantics, making important contributions not least to the development of the ideas underlying OWL [38]. There too, however, a presupposition often reigned to the effect that we can never understand what a given language or theory is really about, so that we can never compare an ontology to any independent reality beyond. We can, though, build abstract (set-theoretic) models, which we can usefully manipulate, for example in checking the consistency of a set of definitions and axioms.

From around 1990, however, a few researchers acknowledged the need for a common framework of high-generality terms, axioms, and definitions, which would promote ontology reusability by building ontologies in a way that would ensure correspondence to the things and processes in reality they were designed to represent.
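Consistency checking with set-theoretic models can be illustrated by brute-force finite model search. The Python sketch below is our own illustration under strong simplifying assumptions: only one-place predicates, tiny domains, and a hypothetical three-predicate mini-ontology. Finding a model establishes that the axioms are consistent; failing to find one within `max_size` does not in general prove inconsistency, and production OWL reasoners use dedicated decision procedures rather than enumeration.

```python
# Brute-force search for a finite model of axioms over one-place predicates.
# The predicates and axioms form a hypothetical mini-ontology.
from itertools import product
from typing import Callable, Dict, FrozenSet, Optional, Sequence, Tuple

Interpretation = Dict[str, FrozenSet[int]]
Axiom = Callable[[range, Interpretation], bool]


def find_model(predicates: Sequence[str], axioms: Sequence[Axiom],
               max_size: int = 3) -> Optional[Tuple[int, Interpretation]]:
    """Return (domain size, interpretation) of the first model found, or None."""
    for n in range(1, max_size + 1):
        domain = range(n)
        # Every subset of the domain is a candidate extension for a predicate.
        subsets = [frozenset(i for i in domain if (mask >> i) & 1) for mask in range(2 ** n)]
        for extensions in product(subsets, repeat=len(predicates)):
            interp = dict(zip(predicates, extensions))
            if all(axiom(domain, interp) for axiom in axioms):
                return n, interp
    return None


def neurons_are_cells(dom: range, interp: Interpretation) -> bool:
    return interp["Neuron"] <= interp["Cell"]            # ∀x (Neuron(x) → Cell(x))


def cells_are_not_spaces(dom: range, interp: Interpretation) -> bool:
    return not (interp["Cell"] & interp["Space"])        # ∀x ¬(Cell(x) ∧ Space(x))


def some_neuron(dom: range, interp: Interpretation) -> bool:
    return any(x in interp["Neuron"] for x in dom)       # ∃x Neuron(x)


def neurons_are_spaces(dom: range, interp: Interpretation) -> bool:
    return interp["Neuron"] <= interp["Space"]           # ∀x (Neuron(x) → Space(x))


PREDICATES = ["Neuron", "Cell", "Space"]
base_axioms = [neurons_are_cells, cells_are_not_spaces, some_neuron]
consistent = find_model(PREDICATES, base_axioms)
inconsistent = find_model(PREDICATES, base_axioms + [neurons_are_spaces])
```

Here the second search fails for a principled reason: any neuron would have to be both a cell and a space, which the disjointness axiom forbids.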
Researchers who acknowledged the need for a common top-level framework thus embraced the need for just one agent/action ontology, just one software ontology, and so forth, and they started to ask questions like: “What is an object/process/attribute/relation? What is a transaction, a person, an organization? How do they depend on each other? How are they related?” [42]. It was in this way that early examples of top-level ontologies began to be developed with the goal of unifying and systematizing the development of domain ontologies at lower levels, though at first with little traction in the broader world of KR scientists in general or of ontology developers in particular.

In 1993, Tom Gruber published the first credible attempt at defining what an ontology, in the new, computer-science sense of this term, is:

An ontology is an explicit specification of a conceptualization. The term is borrowed from philosophy, where an ontology is a systematic account of Existence. For knowledge-based systems, what “exists” is exactly that which can be represented. When the knowledge of a domain is represented in a declarative formalism, the set of objects that can be represented is called the universe of discourse. This set of objects, and the describable relationships among them, are reflected in the representational vocabulary with which a knowledge-based program represents knowledge. [43]

Gruber’s definition was rapidly adopted in multiple subfields of computer science. The definition still, of course, leaves open what a “conceptualization” might be and how the crucial sentence “For knowledge-based systems, what ‘exists’ is exactly that which can be represented” is to be interpreted.11 What can be said with certainty, however, is that the vast majority of those following in Gruber’s footsteps assumed that his definition meant that an ontology is a representation not of entities in reality but of concepts.
In the same year, on the other hand, the first International Workshop on Formal Ontology in Conceptual Analysis and Knowledge Representation was held in Padua, bringing together ontologists such as Gruber from the computer science field with philosophers embracing a more traditional approach to understanding the meaning of “exists.”12

The Case of SNOMED

The dominance of what is sometimes referred to in terminology circles as the “concept orientation” made itself manifest especially in the rapidly expanding field of medical terminology research, which is also the field in which the problems associated with concept-based approaches have been deliberated upon most persistently. The immediate need of medical terminologists was to address the problems raised by the huge numbers of synonymous and quasi-synonymous terms in the various medical disciplines. This problem is of minor importance where only humans are involved in the application of medical terms, for human experts in any given field know full well how to handle synonymy. When computers enter the picture, however, matters are different.

The (for a long time) most influential solution to the problem of how computers are to handle synonymy was presented by Cimino in his classic paper “Desiderata for Controlled Medical Vocabularies in the Twenty-First Century” [46]. Cimino’s thesis is that, in the medical domain,

most systems that report using controlled vocabulary are actually dealing with the notion of concepts. Authors are becoming more explicit now in stating that they need vocabularies in which the unit of symbolic processing is the concept—an embodiment of a particular meaning. Concept orientation means that terms must correspond to at least one meaning (“nonvagueness”) and no more than one meaning (“nonambiguity”), and that meanings correspond to no more than one term (“nonredundancy”).

The definition of “concept” here provided, as “an embodiment of a particular meaning,” is, however, difficult to parse. Embodied in what? Moreover, to understand this definition, we would need to understand already the meaning of the term “meaning,” which philosophers have long recognized as a difficult nut to crack. The central problem not addressed by Cimino, however, is that the meaning of the word “concept” itself varies drastically from one community to the next (and sometimes from one paragraph to the next). Certainly, there are acceptable and scientifically well-defined uses of this term, for instance in the study of conceptual change in developmental psychology [47]. But constructing a medical terminology (for example a gargantuan terminology such as SNOMED CT13) is not an exercise in empirical psychology.

11 This sentence is, for someone with a background in philosophical ontology, deeply problematic.

12 This event served as the launchpad for the subsequent FOIS (Formal Ontology in Information Systems, https://iaoa.org/index.php/fois/fois-history/) conference series, which remains the premier event in ontology (science). Its organizers, Nicola Guarino and Roberto Poli, describe the meeting as “probably the first interdisciplinary initiative in this area, aiming to explore the connections between philosophers belonging to the tradition of Brentano and Husserl, philosophers of language, and people working on principles of knowledge representation and engineering” ([44], p. 624). On Guarino’s unique intermediating role in these developments, see his [45].
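Cimino’s three constraints can be read as conditions on a table relating terms to meanings. The checker below is a hypothetical sketch that applies a literal reading of the quoted wording; the table, the identifiers C001 to C003, and the function name are invented for illustration.

```python
# Literal reading of Cimino's three desiderata over a hypothetical term-to-meaning table.
from collections import defaultdict
from typing import Dict, List, Set


def check_desiderata(term_to_meanings: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Report violations of nonvagueness, nonambiguity, and nonredundancy."""
    # Nonvagueness: every term corresponds to at least one meaning.
    vague = sorted(t for t, ms in term_to_meanings.items() if not ms)
    # Nonambiguity: no term corresponds to more than one meaning.
    ambiguous = sorted(t for t, ms in term_to_meanings.items() if len(ms) > 1)
    # Nonredundancy: no meaning corresponds to more than one term.
    meaning_to_terms: Dict[str, Set[str]] = defaultdict(set)
    for term, meanings in term_to_meanings.items():
        for meaning in meanings:
            meaning_to_terms[meaning].add(term)
    redundant = sorted(m for m, ts in meaning_to_terms.items() if len(ts) > 1)
    return {"vague": vague, "ambiguous": ambiguous, "redundant": redundant}


# Hypothetical table; C001-C003 are invented identifiers.
table: Dict[str, Set[str]] = {
    "myocardial infarction": {"C001"},
    "heart attack": {"C001"},
    "cold": {"C002", "C003"},  # e.g. common cold vs. sensation of cold
    "unmapped term": set(),
}
report = check_desiderata(table)
print(report)
```

On this literal reading, the synonyms “heart attack” and “myocardial infarction” are flagged as redundant, even though handling synonymy was the very problem the concept orientation was meant to solve; the sketch makes no claim about how any actual terminology resolves this.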
Unfortunately, SNOMED CT is itself pervaded (quite literally from top to bottom) by the concept orientation, as is seen in the fact that SNOMED CT Concept is the topmost node of the entire SNOMED CT taxonomical hierarchy and thus subsumes the entire SNOMED CT universe. Thus, we have:

clinical finding is_a SNOMED CT Concept,
environment or geographical location is_a SNOMED CT Concept,
pharmaceutical product is_a SNOMED CT Concept,
social context is_a SNOMED CT Concept,

and many more. Given, therefore, the standard reading of is_a (the reading accepted in other contexts by SNOMED CT itself), it follows that when you are suffering from a headache, then what you are suffering from is_a (is a) SNOMED CT concept.

The SNOMED CT community does indeed go to great lengths to explain what it means by “concept,” for example in its Editorial Guide from 2018,14 where we read:

5. SNOMED CT concepts should name classes of things.
6. A concept is defined as a clinical idea to which a unique concept identifier has been assigned. Concepts are associated with descriptions that contain human-readable terms describing the concept.
7. A term is defined as a human-readable phrase that names or describes a concept.

13 Formerly known as the “Systematized Nomenclature of Medicine,” though this label has been dropped since the SNOMED International Organization no longer sees SNOMED as a nomenclature. “CT” stands for “clinical terms.” The first versions of SNOMED were developed under the leadership of Roger A. Côté, originally under the name “Systematized Nomenclature of Pathology” (SNOP). When Côté visited the Vatican to present to the Vatican Library a copy of the four volumes of what was then titled SNOMED International: The Systematized Nomenclature of Human and Veterinary Medicine, he also met Pope John Paul II, who is said by some to have remarked, “Do you realize that this spells ‘DEMONS’ backwards?”
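The inference about headaches drawn above depends only on the transitivity of is_a. A small Python sketch makes this explicit: computing the ancestors of a node by following is_a edges reaches SNOMED CT Concept from headache. The hierarchy fragment is simplified and hypothetical; in particular, the intermediate node “pain finding” is illustrative and not taken from a SNOMED CT release.

```python
# Transitive closure of is_a over a simplified, hypothetical SNOMED CT fragment.
from typing import Dict, Set


def ancestors(node: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Every node reachable from `node` by following one or more is_a edges."""
    found: Set[str] = set()
    stack = list(parents.get(node, set()))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(parents.get(parent, set()))
    return found


is_a: Dict[str, Set[str]] = {
    "headache": {"pain finding"},  # "pain finding" is an illustrative intermediate node
    "pain finding": {"clinical finding"},
    "clinical finding": {"SNOMED CT Concept"},
    "pharmaceutical product": {"SNOMED CT Concept"},
    "social context": {"SNOMED CT Concept"},
}

headache_ancestors = ancestors("headache", is_a)
print(sorted(headache_ancestors))
```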
We note that in (5) concepts are identified as names, while in (6) they are identified as clinical ideas and explicitly distinguished from terms. In (7), terms are identified as names or descriptions of concepts.15 Thus, the term “Myocardial Infarction” (for example) describes not myocardial infarction (the clinical phenomenon that appears in certain patients) but rather the concept Myocardial Infarction. Terms in medical terminologies à la Cimino are about Cimino’s “embodied meanings.”

Already in 2010, SNOMED CT had responded to criticisms of the ambiguities in its use of the term by publishing, as the Glossary entry for “Concept” in its Technical Reference Guide, the following warning:

Concept
An ambiguous term. Depending on the context, it may refer to:
• a clinical idea to which a unique ConceptId has been assigned;
• the ConceptId itself, which is the key of the Concepts Table (in this case, it is less ambiguous to use the term “concept code”);
• the real-world referent(s) of the ConceptId, that is, the class of entities in reality which the ConceptId represents (in this case, it is less ambiguous to use the term “meaning” or “code meaning”)

But as Ceusters [49] notes, merely pointing out this problem does not imply that the problem has been solved. Indeed, the very same Glossary still contains, for example, an assertion to the effect that a SNOMED CT term is “a text string that represents the Concept.”16 So what is it then that is represented by a term: (1) the clinical idea, (2) (less likely, but nevertheless in line with the expressed ambiguity) the ConceptID, or (3) the real-world referent(s)? The same question must then be asked for the several hundred occurrences of the word ‘concept’ throughout the SNOMED CT documentation. In some cases, readers can infer from the context which meaning is intended, but in most cases, only the SNOMED CT authors can provide the answer by rewriting the entire documentation.
([48]; see also Sect. 2.1 of [50])

14 https://confluence.ihtsdotools.org/download/attachments/75337342/doc_EditorialGuide_Current-en-US_INT_20180731.pdf/

15 Friends of the concept orientation can of course criticize those who use the term “term” as an alternative to “concept” by pointing out that the term “term,” too, can be misused in a way that involves a blurring of the distinction between “entities of the domain” and “entities of language.” This occurs, for example, when Bauer et al. [48] describe a new software tool called Ontologizer as “a Java application that can be used to perform statistical analysis for overrepresentation of Gene Ontology (GO) terms in sets of genes or proteins derived from an experiment.”

The most recent SNOMED CT documentation reveals no advance on this front, suggesting now that “concept” and “term” are interchangeable and introducing an entirely new characterization of concepts as “clinically relevant thoughts”:

Concepts, or terms, are represented by unique codes and human readable descriptions. Each concept is a unique clinically relevant thought, across a wide range like abscess, zygote, measurement procedure, or substance, as examples. (https://www.imohealth.com/ideas/article/snomed-ct-101-a-guide-to-the-international-terminology-system/)

Curing an abscess, then, means curing a clinically relevant thought.

Defenders of the concept orientation have nonetheless offered several arguments in its favor. First is what we might call the argument from intellectual modesty, which can be summarized as follows:

It is medical domain experts who must answer for the truth of whatever theories the medical terminology is intended to mirror. Since domain experts themselves will sometimes disagree, any given terminology should embrace no claims as to what the world is like, but reflect, rather, some abstract conceptual substitute derived, somehow, from the different concepts used by different experts.
Against this, however, it can be pointed out that communities of experts working on common domains in the medical as in other scientific fields in fact accept a massive and ever-growing body of consensus truths about the entities in these domains. Where conflicts do arise in the course of scientific development, these are highly localized and pertain in medicine primarily to specific mechanisms, for example of drug action or disease development. But the latter can serve as the targets of conflicting beliefs only against the background of a large body of shared presuppositions. Moreover, we can think of no scenario under which it would make sense to postulate special entities called “concepts” as the entities to which terms subject to scientific dispute would refer. For either, for any such term, the dispute is eventually resolved in favor of one side or the other, and then it is the corresponding real-world entity that has served as its referent all along. Or it is ultimately established that the term in question is non-designating, and then this term is no longer a candidate for inclusion in whatever is the active version of the relevant terminology.

The proposal from SNOMED and other defenders of the concept approach, however, is much more radical. It is that we provide guaranteed referents called “concepts” not only for terms identified as problematic but also for every single term in the terminology. The realist alternative is, in contrast, more modest. It is simply to treat any terminology as subject to a process of evolution [56, 57]. Even terms still subject to dispute can be incorporated into the terminology alongside other terms already accepted as referring to real-world entities, but in such a way that they are marked as still subject to dispute and thereby treated logically as unavailable for use in certain sorts of inferences.18

17 See for example Ceusters et al. [50, 51]; Bodenreider et al. [52]; Bona and Ceusters [53]; Ceusters and Mullin [54]; and Guarino et al. [55].

Another argument in favor of the concept orientation is the argument from negative findings. Consider, for example, the case where a clinician reports a finding of “absent nipple.” The defender of the concept orientation will argue that there is no real-world entity denoted by this expression, and therefore that the expression must refer to something like a concept. Certainly, clinicians need to record such findings. But from the realist point of view, their findings are precisely that a nipple is absent, not that a special kind of (“absent,” conceptual) nipple is present [59].

Next is the argument from hypertension. The subject matters of biology and medicine are, it is held, replete with entities which do not exist in reality but are rather convenient fictions, as in the case of the entities designated by expressions such as “hypertension” or “obesity” or “abnormal curvature of spine.” Such abstractions are, it is held, “mere concepts,” since they reflect not joints in reality but rather certain more or less arbitrary human decisions (which may indeed vary over time). From the realist point of view, in contrast, such terms are analogous to, for example, “Poland” or “the Middle Ages.” That is, they represent full-fledged entities in the real world, but entities whose boundaries are precisely the results of decisions made by human beings. The meter, the kilogram, and the second, too, are the results of fiat demarcations of this sort, and so also is hypertension, which rests on a (periodically readjusted) fiat threshold established by consensus among physicians.
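The realist reading of hypertension as a fiat boundary can be illustrated with a threshold that is a parameter of the classification rather than a property of the patient. In this Python sketch, the two cut-offs are illustrative values chosen for the example, not a statement of any clinical guideline, and the class and method names are hypothetical.

```python
# Fiat thresholds: the boundary is a consensus parameter, not a feature of the reading.
from dataclasses import dataclass


@dataclass(frozen=True)
class FiatThreshold:
    """A consensus cut-off for blood pressure readings in mmHg (illustrative values only)."""
    label: str
    systolic: int
    diastolic: int

    def classifies_as_hypertensive(self, systolic: int, diastolic: int) -> bool:
        # A reading falls on the hypertensive side if either value meets its cut-off.
        return systolic >= self.systolic or diastolic >= self.diastolic


earlier = FiatThreshold("earlier cut-off (illustrative)", 140, 90)
readjusted = FiatThreshold("readjusted cut-off (illustrative)", 130, 80)

reading = (134, 84)  # one and the same measured quantity
print(earlier.classifies_as_hypertensive(*reading),
      readjusted.classifies_as_hypertensive(*reading))
```

The blood pressure reading is the same real-world quantity in both cases; only the consensus boundary has moved.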
Finally, we can mention what we might call the argument from administration, which asserts that, for many of the purposes for which medical terminologies are devised, a focus on something like Aristotelian universals would be far too restrictive. Consider the ICD (International Classification of Diseases) term:

8. Tuberculosis of adrenal glands, tubercle bacilli not found (in sputum) by microscopy, but found by bacterial culture.

There is no ontological difference between tuberculosis diagnosed by microscopy and tuberculosis diagnosed by bacterial culture, any more than there is such a difference between tuberculosis diagnosed on a Wednesday and tuberculosis diagnosed on a Thursday, or while wearing socks. For the administrative purposes of the ICD and its many users, however, it is important that differences such as those expressed in (8) should be accounted for terminologically. Perhaps, then, a term like (8) should be acknowledged as representing a concept? But no, and yet again: no. (8) is about tuberculosis; indeed, it is about tuberculosis of adrenal glands (and thus it is also about glands), and similarly it is also about sputum [60]. Wherever (8) occurs in any document prepared by some clinician user of ICD, we can be sure that the author of this document is quite clear in her mind that that is what this term is about. She is not using this term to refer to, for example, someone’s clinical thought. Combination terms like (8) involve the mixing together of properly ontological terms (representing universals in the domains of disease, anatomy, and species taxonomy) with epistemological terms relating to how particular instances of a disease were discovered to exist, a matter of how, in this case, reality is understood by health professionals.

18 Compare the strategy based on the Modal Relation Ontology outlined by Rudnicki [58].
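One way to make the separation of ontological and epistemological components in (8) explicit is to decompose the combination term into fields. The data model below is a hypothetical sketch, not the structure of the ICD itself: the disease, anatomical site, and organism fields stand for terms that would come from reference ontologies, while the detection results record how an instance was found. The specimen for the culture is recorded as “not stated” because (8) does not name it.

```python
# Hypothetical decomposition of a combination term into ontological and epistemic parts.
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class DetectionResult:
    """Epistemological component: how (and whether) the organism was detected."""
    assay: str
    specimen: str
    organism_found: bool


@dataclass(frozen=True)
class CombinationTerm:
    label: str
    disease: str          # terms that would come from reference ontologies
    anatomical_site: str
    organism: str
    detection: Tuple[DetectionResult, ...]  # administrative / epistemic qualifiers

    def ontological_part(self) -> Tuple[str, str, str]:
        return (self.disease, self.anatomical_site, self.organism)

    def entities_referred_to(self) -> FrozenSet[str]:
        # The term is about the disease, site, and organism, and also any named specimen.
        specimens = {d.specimen for d in self.detection if d.specimen != "not stated"}
        return frozenset(self.ontological_part()) | specimens


term_8 = CombinationTerm(
    label=("Tuberculosis of adrenal glands, tubercle bacilli not found (in sputum) "
           "by microscopy, but found by bacterial culture"),
    disease="tuberculosis",
    anatomical_site="adrenal gland",
    organism="tubercle bacillus",
    detection=(
        DetectionResult("microscopy", "sputum", organism_found=False),
        DetectionResult("bacterial culture", "not stated", organism_found=True),
    ),
)

# A hypothetical companion term differing only in how the disease was detected.
term_microscopy = CombinationTerm(
    label="Tuberculosis of adrenal glands, tubercle bacilli found (in sputum) by microscopy",
    disease="tuberculosis",
    anatomical_site="adrenal gland",
    organism="tubercle bacillus",
    detection=(DetectionResult("microscopy", "sputum", organism_found=True),),
)

print(term_8.ontological_part() == term_microscopy.ontological_part(),
      term_8 == term_microscopy)
```

Two terms that differ only in their detection results are distinct administrative terms but share the same ontological part, which is the point made above about microscopy versus culture.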
Manipulation of such combinations is an indispensable part of information-driven medical research, and so there is certainly no objection to developing ontologies whose terms would capture distinctions such as that between a bacterial culture test and a microscopy assay. Such ontologies are indeed already being developed (see for example Bandrowski et al. [62] and Gurcan [63]). Needed, too, are ontological resources which allow the representation of what we might think of as administrative aspects of medical or scientific discourse. Consider a term such as:

9. Subject in clinical trial SwEaTB for Diagnosing of Acute Tuberculosis.

Here, we have a term that is not intended to represent a universal or the extension of a universal (in anything like the Aristotelian sense). Rather, it is intended to capture what we can think of as a convenience combination (also called a “defined class” [64]). We then need to distinguish two kinds of ontologies: on the one hand, what we might call “reference ontologies” (dealt with in the sections on “The Foundational Model of Anatomy” and “The Open Biological and Biomedical Ontologies (OBO) Foundry” below), which are designed to be of global reach and application neutral and thus to capture universals; and, on the other hand, “application ontologies,” which result from combining terms from reference ontologies with terms such as (9) developed for local, application-specific purposes [65]. Building this sort of bridge between application ontologies and reference ontologies is by no means a trivial matter [66]. Experience strongly suggests, however, that it is the only course that will avoid the sort of destructive proliferation witnessed in the ontology field in the 1990s.

The Foundational Model of Anatomy

From around 2013, a paradigm shift away from the concept orientation has been occurring in biomedical terminology and ontology development circles [67].
Attempts have since been made, for example, to create an ontologically robust upper-level structure for SNOMED CT [68]. The first biomedical ontology to be developed in the spirit of ontological realism, however, came much earlier. This was the Foundational Model of Anatomy (FMA) [69], which addresses the need for a generalizable anatomy ontology that could be used and adapted by any computer-based application that requires anatomical information. The FMA is a domain ontology that represents a coherent body of explicit declarative knowledge about human anatomy. It has the potential to enable many digital applications involving reference to and manipulation of information about anatomical entities, for instance in education, particularly distance learning, and as the basis for computer models, for example of human anatomical development. Its ontological framework can be applied and extended to all other species, and it provides the template for CARO (the Common Anatomy Reference Ontology) [70] and much of the content for the UBERON integrated cross-species anatomy ontology [71].

The FMA as a Canonical Ontology

The FMA is very large, comprising some 120,000 terms and over 2.1 million assertions of relationships between the entities represented by these terms. Yet for all its size, it addresses only what we can provisionally think of as the normal, healthy human being. This is because the attempt to do justice to, for example, all possible types of variants and pathologies affecting human anatomy would lead to an explosion in size, which would make the result unmanageable and probably also of little utility. Rather, the strategy of the FMA is to constitute a canonical ontology, ranging over types (universals) which are in a sense idealizations of the human organism’s body and of its component parts.
More precisely, the FMA represents all material objects, all portions of substance, and all spaces that result from the coordinated expression of the structural genes of the human organism (to a good approximation: all parts of a normally developed human body, from the macromolecular to the macroscopic levels of granularity). Canonical anatomy is thus distinct from instantiated anatomy, which comprises anatomical data about individual organisms. Though it does not itself comprise such data, the FMA serves as a valuable framework for capturing and storing instance-level anatomical data in computable form, by providing the vocabulary for describing those ways in which instantiated anatomical structures can depart from what is canonical [72].

Canonical Relations

To capture the meanings of its terms in a computer-parsable form, the FMA ontology, like the other biomedical ontologies that have followed in its wake, consists primarily of statements of the form “A rel B,” where “rel” stands for a relational expression such as “constitutional_part_of,” “has_regional_part,” “is_member_of,” “is_tributary_of,” and most importantly “is_a” (meaning either: is a subtype of or is a subclass of), which is the relation used to determine the backbone taxonomy of every ontology. The upper part of the FMA backbone taxonomy is represented in Fig. 5.5. (“Anatomical Space,” here, refers to the sorts of channels and cavities referred to in section “Aristotle’s Table of Categories” above.)

Fig. 5.5 Upper-level structure of the Foundational Model of Anatomy (arrows express is_a relations)

The now standard way of defining part_of and other such relations between types in ontologies is by reference to the relations that hold between the corresponding instances of these types, using the FOL device of quantification. Two major types of definitions are then required: for relations between types of processes, on the one hand, and for relations between objects and their parts and aggregates, on the other.
For the former, we have:

X has_part Y = def. For any instance x of the process type X, there is some instance y of the process type Y such that y instance-level-part-of x.

Example: Development of Spleen has_part Development of Splenic Lobules.

For the latter, however, we need to take account of time, in order to do justice to the fact that objects can gain and lose parts while preserving their identity:

X has_part Y = def. For any time t and for any instance x of the object type X at t, there is some instance y of the object type Y at t such that y instance-level-part-of x at t.

Example: Set of Teeth has_part Left Maxillary Dentition.

On the basis of a set of definitions modeled on the above, a group of leaders of different groups of biomedical ontology developers, including not only the FMA and Gene Ontology groups but also the GALEN group around Alan Rector, developed the Relation Ontology (RO) [73]. This provides a basis for the formal definition of the relations used by biomedical ontology developers in a way that promotes interoperability of the ontologies that use them and thereby allows new types of automated reasoning both within and across ontologies.19

In some domains, universal parthood assertions of the above-mentioned sort are unproblematic. This holds for example for relations between molecules and their parts in chemistry. It also holds for certain anatomical relations, such as Neuron has_constitutional_part Plasma Membrane.20 In biology in general and in medicine in particular, however, such universal assertions are problematic because there are variants (for example, some humans have a middle lobe of left lung) and pathologies (for example tumors, or missing teeth). Many assertions of relations in the FMA hold, therefore, as a matter of canonical ontology. This means that an FMA statement such as Skin of Thumb has_regional_part Nail of Thumb is not an empirical assertion.
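The object-type definition of has_part given above can be checked mechanically against instance data. In this hypothetical Python sketch, each particular records, for each time at which it exists, the ids of its parts at that time; the class, field, and function names are ours and are not part of the Relation Ontology. A single thumb whose nail is removed after t = 0 is enough to make the universally quantified reading of Skin of Thumb has_regional_part Nail of Thumb (treated here simply as parthood) fail once later times are included.

```python
# Checking the object-type has_part definition against hypothetical instance data.
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Instance:
    """A particular; it exists as an instance of `kind` at the times keyed in parts_at."""
    id: str
    kind: str
    parts_at: Dict[int, Set[str]] = field(default_factory=dict)  # time -> part ids


def type_has_part(x_kind: str, y_kind: str, instances: List[Instance], times: List[int]) -> bool:
    """For every time t and every instance x of x_kind at t,
    some instance y of y_kind is instance-level-part-of x at t."""
    kind_of = {i.id: i.kind for i in instances}
    for t in times:
        for x in instances:
            if x.kind != x_kind or t not in x.parts_at:
                continue  # x is not an instance of x_kind existing at t
            if not any(kind_of.get(p) == y_kind for p in x.parts_at[t]):
                return False  # counterexample: an x at t with no y-part at t
    return True


instances = [
    Instance("skin1", "Skin of Thumb", {0: {"nail1"}, 1: {"nail1"}}),
    Instance("nail1", "Nail of Thumb"),
    Instance("skin2", "Skin of Thumb", {0: {"nail2"}, 1: set()}),  # nail removed after t=0
    Instance("nail2", "Nail of Thumb"),
]
holds_at_0 = type_has_part("Skin of Thumb", "Nail of Thumb", instances, [0])
holds_at_0_and_1 = type_has_part("Skin of Thumb", "Nail of Thumb", instances, [0, 1])
print(holds_at_0, holds_at_0_and_1)
```

This is why the FMA reads such statements canonically rather than as universal empirical generalizations. The process-type definition has the same shape without the time index; for simplicity, the sketch does not separately check that the part y itself exists at t.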
Thus, this statement is not falsified by the existence of human thumbs from which the nail has been removed. Rather, it expresses how Nail of Thumb and Skin of Thumb are supposed to relate to each other in virtue of the workings of the underlying structural genes of the human organism.

19 The current version of the Relation Ontology can be found at http://www.obofoundry.org/ontology/ro.html. An expanded set of upper-level relations, developed to deal with the problem documented by Grewe et al. [74], is provided in part 2 of [75].

20 The two main types of part in the FMA are constitutional parts and regional parts. Constitutional parts are genetically determined, as in Hand has_constitutional_part Skeleton of Hand. Regional parts are the results of fiat delineation using arbitrary landmarks, as in Hand has_regional_part Digit [76].

Aristotelian Definitions

A further crucial contribution of the FMA to the subsequent development of biomedical ontologies lies in the field of definitions. The goal of a dictionary definition is to provide an explanation of the meaning of an expression that is useful to humans. In the ideal case, the dictionary provides an explanation built out of terms that are more familiar and simpler in meaning than the term to be defined. Often, however, dictionary definitions amount to mere paraphrases. They may also be circular, either directly or indirectly (as when term A is defined using term B, but term B is defined using term A). Often, too, multiple, mutually inconsistent definitions are provided for a single term. To reach its goal of providing a tool to support logical reasoning, the FMA requires a set of logically consistent definitions, with at most one definition for each term. Each definition must also be structured so as to state individually necessary and jointly sufficient conditions for the correct application of the term defined [77, 78].
To address these needs, Rosse and his collaborators introduced what, drawing on the ideas of Aristotle discussed above, they called “Aristotelian definitions.” Consider a definition of the form

S = def. a G which Ds,

where S stands for species, G for genus, and D for differentia(e). Such a definition tells us that, if we know that something is a G which Ds, then we know that it is an S. Conversely, if we know that something is an S, then we also know that it is a G which Ds. Here, G is the immediate parent of S in the backbone taxonomy of the relevant ontology, and D is what sets apart those Gs which are Ss from the rest of the Gs. An example from the FMA ontology is:

Anatomical structure [S] = def. Material anatomical entity [G] which ⌈is generated by coordinated expression of the organism’s own genes that guide its morphogenesis; has inherent 3D shape; is such that its parts are connected and spatially related to one another in patterns determined by coordinated gene expression⌉ [D] [69]

Here ⌈ ⌉ marks out the collection of sufficient conditions that forms the specific difference. Together, G and D specify the essential characteristics of any S. And a “group of entities that share the same set of essential characteristics constitutes a class of the ontology” [79].

The Gene Ontology (GO)

Background

In 1977, Frederick Sanger and his collaborators sequenced the first complete DNA genome, that of the bacteriophage (virus) phiX174. Since then, the biological and biomedical sciences have been in upheaval. The cause is the need to take account of the gigantic amounts of molecular assay data generated in the wake of the human genome project and of the various fly, mouse, fish, yeast, and other model organism genome projects.
Practically all aspects of what we might call “old biology” were destined to be transformed as biologists and clinical scientists worked out how to take account of these new data. They had to deal scientifically not only with the many new kinds of entity being disclosed at the molecular (and finer) levels through the advance of science. They also had to deal with all the already recognized phenomena at coarser levels of granularity (cell, tissue, organ, organism, population) upon which biology and medicine had hitherto been based [80]. But how could the gigantic quantities of new data be made discoverable and usable by biologists when the primary source data live in many independently developed biological databases? How could these many efforts be transformed into a single cooperative force?

Among the very earliest repositories for the new data, created already in the 1970s, were the first protein structure database (Protein Data Bank, https://www.rcsb.org/pages/pdb50/) and the first mammalian genetics database (created at the Mouse Genome Informatics (MGI) resource of the Jackson Lab, http://www.informatics.jax.org/). These were followed in 1981 by the first repository for nucleotide sequences, established at the European Molecular Biology Laboratory (EMBL) in Heidelberg (https://www.embl.org/about/history/). Each of these contributed to the strategy of using molecular assay data derived from model organisms to advance our understanding of human health and disease. The idea was that clinical scientists could harvest the results of experiments carried out on model organisms and exploit cross-species homologies to draw conclusions relevant to humans. Gene (and corresponding protein) sequences are similar between organisms because of their descent from a common ancestor.
When the GO was founded, it was widely hypothesized (and is now supported by a great deal of evidence) that function is also generally conserved. On this hypothesis, an experiment that elucidates an aspect of the function of a gene in the mouse, or in yeast, could tell us about the function of related genes in humans. The GO made it possible to test this hypothesis computationally at large scale. More importantly, it made it possible to infer the functions of human genes by studying other, more experimentally tractable systems. It was this idea that provided the initial impetus for the development of the GO.

By the turn of the millennium, the number of biological databases had become unmanageable. Attempts to create a federated system failed, not least because it was so hard to get the many groups involved to agree on how the data should be structured and labeled. The fear, too, was that such a federated system would create what Suzanna Lewis refers to as “a technological behemoth that would be unable to respond to new requirements when they inevitably occurred.”

The most fundamental questions for the biologists served by the model organism databases revolved around the genes. … One essential aspect of this, which everyone agreed was necessary, was systematically recording the molecular functions and biological roles of every gene. ([81], emphasis added)

The Origins of the GO

In the 1990s, Michael Ashburner began assembling classifications of molecular functions and biological processes, originally to serve the requirements of FlyBase, the database for Drosophila genetics and molecular biology. At around this time, different model organism communities began to see that they could solve a significant portion of their data integration issues if a cross-species functional classification system were created.
The goal was to get two groups to agree on how this should be done in a way that would work for all organism communities. The first group was the developers of databases focused on sequence (nucleic acid or protein). The second was the developers of other specialty biological databases built for different ranges of organisms. Against this background, Lewis describes how the GO came into being in 1998:

In July of that year, Michael Ashburner presented a proposal at the Montreal International conference on Intelligent Systems for Molecular Biology (ISMB) bio-ontologies workshop to use a simple hierarchical controlled vocabulary; his proposal was dismissed by other participants as naïve. But later, in the hotel bar, representatives of FlyBase [Ashburner], SGD [the Saccharomyces (yeast) Genome Database] (Steve Chervitz), and MGI (Judith Blake) embraced the proposal and agreed jointly to apply the same vocabulary to describe the molecular functions and biological roles for every gene in our respective databases. Thus we founded the Gene Ontology Consortium. ([81]; compare [82, 83])

Note that the vision was not to create a database covering all functions of all genes in all organisms. Rather—and here lay the brilliant insight of Ashburner, Lewis, and their collaborators—it was to create a controlled vocabulary for representing types of molecular functions. This vocabulary would be used to annotate (or “tag”) occurrences of references to corresponding genes or gene products in literature or in data. The annotations would make the latter discoverable by third parties from different branches of biology. The GO became, in effect, an engine for finding in literature and data what had been mostly hidden from outsider communities because it was inadequately or inconsistently described. It was based on annotations created by human beings (PhD biologists), pioneers in the new discipline of biocuration. The GO itself was to a large degree populated through the work of such biocurators.
The annotations themselves would then be compiled, in conjunction with the UniProt protein sequence repository [84], to form the GO Annotation database (GOA) [85]. Then came more sophisticated software tools such as the AmiGO browser (http://amigo.geneontology.org/). AmiGO allowed a significant fraction of the world’s biological literature and data to be subjected to filtered search. An investigator studying the process of muscle development in Bos taurus (cow), for example, can immediately find three things:

- all proteins documented as involved in this process;
- all the articles in which this involvement is documented;
- the source and nature of the evidence that each of these articles provides.21

21 http://amigo.geneontology.org/amigo/search/annotation?q=muscle%20development

The result was called “Gene Ontology” merely because “ontology” was, in 1998, the word du jour. It was not an upshot of the work on ontologies that had grown out of KR and other computer-associated disciplines in the preceding years. The KR ontologies were in many cases, as we saw, products of the view that every new project needs a new ontology. The more ontologies, after all, the better. But the results, for all their bells and whistles, proved (not surprisingly) useless as soon as their authors moved on to the next project. The GO, in contrast, resulted from the insight that a simple controlled vocabulary could unite the many sequence-data-driven projects springing forth on all sides. It started out not as a sophisticated computer artifact, but rather as just a simple directed acyclic graph that could serve as the basis for an indefinitely extendable project of annotation.
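To make the idea of a directed acyclic graph used for annotation concrete, here is a minimal Python sketch. It uses the two relation types, is_a and part_of, that the GO initially used. The term labels, gene product names, and links are hypothetical and are not copied from the GO.

We assume, as in the GO's practice of propagating annotations (often called the true path rule), that an annotation to a term also counts as an annotation to every term above it. This is what lets a query for a general process such as muscle development return products annotated only to more specific terms.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Toy fragment: child term -> [(relation, parent term)].
# Labels and links are illustrative only, not taken from the GO.
EDGES: Dict[str, List[Tuple[str, str]]] = {
    "myoblast fusion": [("part_of", "skeletal muscle development")],
    "skeletal muscle development": [("is_a", "muscle development")],
    "cardiac muscle development": [("is_a", "muscle development")],
    "muscle development": [("is_a", "developmental process")],
}

# Hypothetical direct annotations: gene product -> terms assigned by a curator.
ANNOTATIONS: Dict[str, Set[str]] = {
    "proteinA": {"myoblast fusion"},
    "proteinB": {"cardiac muscle development"},
    "proteinC": {"developmental process"},
}

def ancestors(term: str) -> Set[str]:
    """All terms reachable upward from `term` via is_a or part_of edges."""
    seen: Set[str] = set()
    queue = deque([term])
    while queue:
        for relation, parent in EDGES.get(queue.popleft(), []):
            if relation in ("is_a", "part_of") and parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen

def products_involved_in(term: str) -> Set[str]:
    """Gene products annotated to `term` itself or to any term below it."""
    return {
        product for product, terms in ANNOTATIONS.items()
        if any(t == term or term in ancestors(t) for t in terms)
    }

print(sorted(products_involved_in("muscle development")))  # ['proteinA', 'proteinB']
```

proteinC is not returned because it is annotated only to a more general term. Annotations propagate upward, never downward.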
The nodes of the graph are terms22 (again: nouns and noun phrases, albeit now associated with alphanumeric identifiers, definitions, URIs, and so forth), and its edges are relations (initially just is_a and part_of).23 The GO was developed and maintained primarily by experts in molecular biology. This initially led to a certain animosity between the GO community and those who had been developing ontologies on the basis of their computer expertise. Two developments have since largely dissolved this animosity. One is the GO community's eventual adoption of OWL as its ontology development language. The other is the ever-increasing number of powerful software tools, algorithms, and research methodologies made possible by the existence of the GO and its sister ontologies.

Initially, too, there were skeptics on the biology side, above all Sydney Brenner, who shared the 2002 Nobel Prize in Physiology or Medicine for discoveries concerning programmed cell death. In the same year, Brenner published a paper entitled “Life sentences: Ontology recapitulates philology.” In it he charged the GO Consortium with the desire to transform genomics into what he called “genamics.” To do serious theoretical work, Brenner held, the network we should be interested in is not the network of names but the network of the objects themselves.

The language of these objects is not the Oxford Dictionary of Molecular Biology … but the language of molecular biology itself. [There the] objects have their own names: they are chemical names written in the language of DNA sequences and the arrangements of amino acids on protein surfaces. [87]

What Brenner failed to see was this: even if all of us became fluent in the language of chemical names, we would still need to connect what we can say in this language with what we need to say in all the languages of old biology. These include, not least, the languages of clinical medicine.
22 Stefan Schulz (personal communication) points out that “label” is in some ways preferable to “term.” A text string such as “Primary malignant neoplasm of lung (disorder),” for example, would never be used by any human author of scientific text. In the end, however, he favors over “term” the expression “representational unit,” whose advantages are outlined by Smith et al. [86].

23 For the current set of relations in GO, see http://geneontology.org/docs/ontology-relations/

Since its inception, indeed, the GO has gone from strength to strength. It is today by far the world’s most successful scientific ontology on any of the following measures:

- the number and variety of associated software applications;
- the quantities of data and literature annotated using its terms;
- the number, size, and degree of utilization of major databases incorporating these terms;
- the numbers of experiments performed with its aid;
- and so forth.

There are multiple drivers of this success. One of the main ones is that the GO and the GO annotations are hand-built by human curators, who use the scientific literature as the basis for their work. The result is an extract of biological knowledge captured using GO (and sister ontology) terms and relations, which has proved to be of tremendous utility. There have, to be sure, been a number of proposals to leave the population of the GO to machine learning. The problem with this approach is that it is not possible to create an algorithm that can extract knowledge from scientific literature automatically [88]. Algorithms can be used for the sort of approximate text translation that is made available by Google Translate. But they cannot achieve results with the sort of accuracy that is required for the scientific purposes of the GO [88]. One very fruitful application of the GO is in what is called the enrichment analysis of gene (product) datasets.
Consider intervention studies (for example, genetic or pharmacological interventions) or time-series analyses. Here the GO can be used to obtain an overview of the cellular locations, functions, and biological processes in which the gene products are involved. Such an overview helps in developing hypotheses about the dependent variables or outcomes analyzed in these experiments. The GO can also be used to classify and assess the status of independent variables in order to identify confounding effects (hidden covariables). Powerful software applications have been developed for these purposes, including the GO-Figure! visualization tool developed by Reijnders and Waterhouse [89].24

Another reason for the GO’s success is that it makes possible certain sorts of investigations that would not be possible without it. The point is not just that genomic data are annotated with the same shared ontology, nor that this enables such data to be exchanged and integrated. Still more important, the resulting huge and ever-growing unified knowledge base about the functions of genes makes it possible to interpret large-scale measurements of gene expression (or other -omics measurements) in relation to an unending series of biological phenomena. Consider a random selection of studies published just in recent weeks. They use the GO to identify pathways implicated in suicidal behavior, breast cancer survival, autism spectrum disorders, the involvement of calcium signaling in schizophrenia, association signals of dental caries, disease modeling in C. elegans, and much more (Fig. 5.6).

24 At the same time, care must be taken to avoid misuse of the GO annotation data. Examples of misuse include failing to take account of the ontological structure of the GO itself, or ignoring the evidence codes, which record the methods by which the data expressed in annotations were obtained [90].
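Enrichment analysis asks whether a GO term is annotated to more genes in a study set (for example, genes that respond to an intervention) than chance would predict, given a background set. One common statistical choice is a one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test. The Python sketch below implements that test. It is not the method of any particular tool cited above, and the gene and term names are hypothetical.

Real analyses would also need three further steps, in line with the cautions in footnote 24:

- propagate annotations up the graph;
- filter by evidence code;
- correct for testing many terms at once.

```python
from math import comb
from typing import Dict, Set

def hypergeom_upper_tail(N: int, K: int, n: int, k: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws).

    N: genes in the background set
    K: background genes annotated to the term
    n: genes in the study set (drawn from the background)
    k: study genes annotated to the term
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

def go_enrichment(study: Set[str], background: Set[str],
                  annotations: Dict[str, Set[str]]) -> Dict[str, float]:
    """One-sided enrichment p-value for every term annotated in the background.

    Assumes `study` is a subset of `background` and that `annotations`
    (gene -> set of terms) already includes any propagated ancestor terms.
    """
    terms = set().union(*(annotations.get(g, set()) for g in background))
    pvals = {}
    for term in terms:
        K = sum(1 for g in background if term in annotations.get(g, set()))
        k = sum(1 for g in study if term in annotations.get(g, set()))
        pvals[term] = hypergeom_upper_tail(len(background), K, len(study), k)
    return pvals

# Hypothetical example: 10 background genes, 3 of them annotated to "T1".
background = {f"g{i}" for i in range(10)}
annotations = {"g0": {"T1"}, "g1": {"T1"}, "g2": {"T1"}, "g5": {"T2"}}
study = {"g0", "g1", "g2"}
pvals = go_enrichment(study, background, annotations)
```

Here pvals["T1"] is 1/120 (about 0.0083): all three study genes carry T1, although only 3 of the 10 background genes do. By contrast, pvals["T2"] is 1.0.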
The GO Table of Categories

When you discover a new gene product or complex, you want answers to three questions:

1. What does it do at the molecular level of granularity?
2. To what downstream biological processes does it contribute?
3. Where is it located in the cell?

The GO is accordingly divided into three sub-ontologies. The original GO paper [92] refers to their root nodes as the “Three categories of GO” and defines them as follows:

Molecular function = def. Biochemical activity (including specific binding to ligands or structures) of a gene product. This definition also applies to the capability that a gene product (or gene product complex) carries as a potential. Examples: ‘enzyme’, ‘transporter’, ‘ligand’.

Biological process = def. Biological objective to which the gene or gene product contributes. A process is accomplished via one or more ordered assemblies of molecular functions. Processes often involve a chemical or physical transformation, in the sense that something goes into a process and something different comes out of it. Examples: ‘cell growth and maintenance’, ‘signal transduction’.

Cellular component = def. Place in the cell where a gene product is active. Examples: ‘ribosome’, ‘nuclear membrane’, ‘Golgi apparatus’.

Fig. 5.6 Fragment from the GO Biological Process Ontology, from [91]

Function in the GO

The GO has retained its original modular architecture and its general structure and methodology over its more than 20-year history. But throughout this entire period it has been subject to considerable revisions at lower levels. These revisions consist primarily of the deprecation of terms deemed obsolete, the revision of definitions, and the addition of new terms and even of new families of terms. Some of the new families cover hitherto underrepresented domains, such as immunology [93].
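The three-way division of the GO can be made operational by tracing each term up its is_a links to one of the three roots. The Python sketch below does this for a toy fragment built from the example terms in the three category definitions. The is_a links themselves are our simplification, not the GO's actual structure.

The sketch flags any term that reaches more than one root. Such a term would violate the original GO assumption that is_a relations never span the boundaries between the three sub-ontologies.

```python
from typing import Dict, List, Set

ROOTS = {"molecular_function", "biological_process", "cellular_component"}

# Toy is_a fragment (child -> parents) using example terms from the definitions.
IS_A: Dict[str, List[str]] = {
    "enzyme": ["molecular_function"],
    "transporter": ["molecular_function"],
    "signal transduction": ["biological_process"],
    "cell growth and maintenance": ["biological_process"],
    "ribosome": ["cellular_component"],
    "nuclear membrane": ["cellular_component"],
    "Golgi apparatus": ["cellular_component"],
}

def roots_of(term: str, is_a: Dict[str, List[str]] = IS_A) -> Set[str]:
    """Sub-ontology roots reachable from `term` by following is_a upward.

    Assumes the is_a graph is acyclic, as in the GO.
    """
    if term in ROOTS:
        return {term}
    found: Set[str] = set()
    for parent in is_a.get(term, []):
        found |= roots_of(parent, is_a)
    return found

def terms_spanning_aspects(is_a: Dict[str, List[str]] = IS_A) -> List[str]:
    """Terms whose is_a ancestry reaches more than one of the three roots."""
    return sorted(t for t in is_a if len(roots_of(t, is_a)) > 1)

print(terms_spanning_aspects())  # []
bad = {**IS_A, "ribosome": ["cellular_component", "enzyme"]}
print(terms_spanning_aspects(bad))  # ['ribosome']
```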
The passage of time has also seen revisions to the original definitions of the three GO categories. It is especially the GO’s definition of “function” that has given rise to controversy. The current definition of molecular function reads as follows:

10. Molecular function = def. Molecular process that can be carried out by the action of a single macromolecular machine, usually via direct physical interactions with other molecular entities. Function in this sense denotes an action, or activity, that a gene product (or a complex) performs. (http://geneontology.org/, as of August 7, 2021)

At the start, the assumption was built into the GO worldview that is_a relations can never span the boundaries between the three GO sub-ontologies. The functions at the (chemical) level of granularity of molecules in the GO thus stand in some way opposed both to the processes occurring at higher (“biological”) levels of granularity and to the locations in the cell. Many of those who first encounter the GO are therefore confused, quite rightly I believe, by the fact that its molecular function ontology is populated primarily with terms designating types of activities. Examples are “ion channel regulator activity,” “regulation of lysozyme activity,” “ceramide floppase activity,” and “regulation of phosphatidate phosphatase activity.” The new definition of molecular function (10) tells us, indeed, that molecular function is_a molecular process. In normal usage, however, and also in the usage of many philosophers, functions are not a special type of process. Rather, they are certain sorts of historically grounded potentials or capabilities in things that can be realized in processes when suitable circumstances obtain. As the term “function” is normally understood, a function can fail entirely to be realized, or it can be misrealized.
A well-oiled machine, for example, will indeed perform its function in normal circumstances. But when things go wrong, it can behave (act) in all sorts of nonfunctional (or, as we might also say, noncanonical) ways. In a sense, the functions of the macromolecular machines inside an organism are being continuously realized, just as the principal function of the organism’s heart is being continuously realized for as long as the organism is alive. But macromolecular machines change their activity patterns. For example, the sleeping brain is biochemically very active, but its pattern of activity differs from the waking pattern. It differs again if one suppresses the normal rest pattern by taking sleeping pills or alcohol or both. Strictly speaking, of course, these latter cases are irrelevant to the GO. The GO, too, is a canonical ontology. The scope of its molecular function ontology is determined by those molecular-level processes that the organism evolved to perform because they allowed it to better survive and reproduce. This is the meaning of “canonical” for molecular (and indeed for all) functions. The GO is canonical also in that it does not deal, for example, with processes that are induced experimentally. Unrealized functions at the molecular level are also out of scope. However, because many activities in organisms are cyclic, the referent of “activity” will, even in many canonical cases, differ from one phase to the next. The referent of “function,” in contrast, will always be the same.

Defining precisely the meaning of the term “function” is a nontrivial matter, and philosophers and others have proposed various alternative definitions. In the Gene Ontology Handbook, Paul Thomas provides an account of the GO’s usage of “function.” According to this account, the GO intends the standard etiological or “selected effect” definition of function [94].
We believe that his arguments for this interpretation—an interpretation which we also defend [95]—are sound. Unfortunately, as we shall see, their strength is diminished because he adopts the terminological convention already at work in the original GO definition of “molecular function” provided in the previous section of this chapter. He neglects an important distinction, namely that between a function (program, capability), on the one hand, and its corresponding activity (realization/execution/performance), on the other. In a simplified version of the selected effect account (based on Millikan [96], which Thomas also cites), a function is defined as follows:

11. A has function F = def. A originated as a reproduction (for instance as offspring, or as copy) of some prior item or items that, due in part to possession of the properties reproduced, have actually performed F in the past, and A exists because (causally historically because) of this or these performances.

Very briefly: it is the function of my heart to pump blood because my ancestors’ hearts’ pumping blood through their bodies kept them alive, and because I exist because of this. We note that, according to this definition, it would still be the function of my heart to pump blood even if it were currently unable to do so (for example, because I am connected to a heart-lung machine). It would still be the function of my screen to display pixels even if it were currently unable to do so (for example, because my machine is switched off). As Thomas correctly points out, the selected effect approach has an advantage: it explicitly incorporates evolutionary considerations by requiring that the function of any biological entity ultimately derives from its history of natural selection.
The approach thereby provides a method for determining which of the myriad potential effects of the actions of a particular entity are properly to be considered the exercise of its function. One effect of my heart’s pumping, for example, is to produce sound. But this is not part of the function of my heart, because this effect was not selected for. The terminology of the GO has been built so as to do justice to the selected effect account of function, but in such a way that most subtypes of “function” are labeled “activity.” This is not because Thomas and others failed to appreciate the difference between function and activity. Rather, in the canonical world of the GO, function and activity go so tightly hand in hand that it would be terminologically redundant to provide representations of both. We would then need both to catalyze and catalytic activity, both to regulate and regulating activity, and so forth (compare also the function and process columns in Table 5.1). The simplest solution would be to rename the GO “molecular function” ontology. It could be called instead the “molecular activity ontology” or, following a practice which I understand is already favored in certain GO circles, the “molecular functioning ontology.” The renamed ontology would then explain that molecular “activity”/“functioning” means “the exercise of a function of a macromolecular machine,” and would provide a suitable definition of “function” as part of its glossary. It is important to keep both “activity” and “function” in circulation, however. In the canonical world of the GO, it may well be trivial that any activity of Xing is also the realization of a function: to X. This holds for any activity of Xing that is realized under a particular set of conditions, which is in practice how evidence is obtained to assert that a gene product is an instance of a given GO class.
But this is no longer true when the GO is being used in areas where there are departures from what is canonical. There are multicellular systems in my heart that have the function to contract. This function remains one and the same even under noncanonical conditions, where my heart is not functioning very well and where contraction activities therefore depart from the canonical. Thomas summarizes his account of molecular function in two places, as follows:

12. In the GO, a molecular function is a process that can be carried out by the action of a single macromolecular machine, via direct physical interactions with other molecular entities. Function in this sense denotes an action, or activity, that a gene product performs.

13. A function as conceived by molecular biologists (in what could be called the ‘molecular biology paradigm’) refers to specific, coordinated activities that have the appearance of having been designed for a purpose. That apparent purpose is their function.

To do justice to the ontological distinction between functions and the processes/activities that realize them, these would need to be amended to read:

14. In the GO, a molecular function *is realized in* a process that can be carried out by the action of a single macromolecular machine, via direct physical interactions with other molecular entities. *This realization is* an action, or activity, that a gene product performs.

15. A function as conceived by molecular biologists (in what could be called the ‘molecular biology paradigm’) *is what is involved where* specific, coordinated activities have the appearance of having been designed for a purpose. That apparent purpose is their function.

Extending the GO

Interestingly, in elucidating his account of function, Thomas draws on the idea of teleonomy from Jacques Monod’s Chance and Necessity [97].
Monod defines this as “the characteristic of objects endowed with a purpose or project, which at the same time they exhibit through their structure and carry out through their performances” (p. 9). This applies to artifacts such as screwdrivers, which are designed to have a certain purpose. For living systems, however, we cannot talk of design. Rather, as Thomas writes, “what appears to be a future-goal-oriented action by a living organism is, in fact, only a blind repetition of a genetic program that evolved in the past.” More completely, however—for there are two series of blind repetitions here—he should write:

1. The program is copied over and over again through (blind) biological processes of copying the relevant entity (for instance the relevant macromolecular machine).
2. The execution of each copy of the program is repeated in the successive realizations of that entity’s function.

Teleonomy, for Monod, is present at all levels of a biological system. It ranges from proteins (which he calls “the essential molecular agents of teleonomic performance”) to “systems providing large scale coordination of the organism’s performances … [such as] the endocrine and nervous systems” (op. cit., p. 62). At all levels, indeed, we have objects, and systems and parts of objects, performing (activities) which realize apparent purposes (functions). Examples are pumping blood, regulating chemical levels in the blood, and removing damaged cells from the blood. In each of these cases, we have to deal with two kinds of function. There are functions of the body involving groups of cells interacting via molecules or ions. There are also functions of parts of the body at higher levels of granularity than molecules, which we might therefore call biological functions.
And interestingly, Thomas’s [98] paper deals almost exclusively with functions at the molecular level, yet its title is “The Gene Ontology and the Meaning of Biological Function.” By “biological function” he means not the functions of organs such as the heart or lungs, but rather the functions of systems of macromolecular machines. Monod sees such systems as analogues of cybernetic systems. This reflects the way in which biologists today conceptualize the feedback loops constructed from multiple molecular activities.

Table 5.1 In this table the GO ontology is extended by modules to accommodate biological functions, and their bearers, at higher levels of granularity. The table is adapted from the presentation described in footnote 27, but incorporates amendments by Paul Thomas (personal communication).

Level of granularity | Function | Process | Object
Molecular system, cell, organ, organism | To pump blood. Biological program for heart contraction: set of molecular functions that, if executed in the proper context, would result in heart contraction | Pumping blood. GO:0060047 heart contraction: multicellular organismal process in which the heart decreases in volume in a characteristic way to propel blood through the body | Heart
Molecule | To catalyze a biochemical reaction | Catalysis of a chemical reaction | Catalyst

(Function, Process, and Object are the ontological categories.)

Table 5.1 depicts, on this basis, the GO architecture extended in such a way that an explicit division is drawn between levels of granularity along the vertical axis and kinds of entity along the horizontal. The shaded cells correspond (roughly) to the coverage domain of the original GO.

Thomas now describes his own position as follows (personal communication): ‘of course for GO, it is all ultimately at the level of molecules. It is the Gene Ontology—it is a conceptualization of how genes (technically, gene products, which are molecule types that are encoded by genes) function at the molecular level and at the system level.
Essentially, the system level for molecular biologists is conceptualized as a highly integrated, coordinated execution of individual molecular activities. So in GO, the system level (biological process) is also represented in terms of gene products and their activities/functions. GO was not constructed for describing the functions of higher order objects like the heart, though of course in practice, it is natural to describe some biological programs in terms of higher order objects. For example, GO describes the genetic programs (BP), carried out by the activities of gene products (MF), that result in heart contraction. GO also describes the genetic programs, carried out by the activities of gene products, that result in the construction of the higher order objects themselves (e.g., heart development). But GO biological processes also include subcellular processes: genetic programs that transmit a message (in the form of molecules of a given type) from outside a single cell to the cell nucleus (e.g., the Wnt signaling pathway), which is executed only by a set of molecule types, not a higher order object.’

The Open Biological and Biomedical Ontologies (OBO) Foundry

The Birth of OBO

As we learn from the subtitle of the landmark paper [92], the GO was originally conceived as a “tool for the unification of biology.” In other words, the GO was built to foster the melding together of biological datasets and techniques across four dimensions:

- across disciplines;
- across levels of granularity in the organism;
- across species;
- across geographically dispersed communities of originators and compilers of data and of researchers using these data.

An example of success in this regard is the way in which the GO enables communication across all the disciplines collaborating in a field such as aging research. This field involves the study of model systems of human aging in organisms as diverse as yeast, reptiles, and whales.
Already in 2001, the trail laid by the GO opened the way to a series of controlled, cross-species vocabularies for neighboring areas of biology through the creation of a public ontology repository, originally (we imagine for a very short time) referred to as “GOBO,” for “Global Open Biomedical Ontologies” [99], and subsequently dubbed the “OBO Library.” The rules for building ontologies for this library are set out in a tutorial on “Principles of Biomedical Ontology Construction” presented by Ashburner and Lewis at the Intelligent Systems for Molecular Biology (ISMB) conference in 2005 (http://bit.ly/2GUkpoh) [100]. The most important rules are that the ontology must be shared without limit, and thus must be in the public domain and easily findable;25 that it must be applied to actual instances of important scientific data; and that it must be maintained in such a way that, where such application reveals errors and gaps, these are promptly rectified. In the case of the GO, this strategy produced a positive snowball effect, making the GO increasingly attractive to successive cohorts of new users, who themselves identified new errors and gaps, giving rise to a regimen of continual improvement of a sort unknown to ontologies before the GO.

The Birth of the OBO Foundry

The OBO Foundry was first conceived at a meeting held in Leipzig in 2004 on the topic of “The Formal Architecture of the GO.” Other groups from the KR and OWL communities had earlier attempted to interest the GO community in the benefits of a more ambitious approach to ontology development, especially as concerns the treatment of logic and definitions. Where these earlier efforts had failed,26 some of the new arguments presented at this meeting, drawing on the perspective of ontological realism, met with greater success.27

25 The OBO community here anticipates the modern FAIR approach [101].
No less important was the introduction of new rules to promote coordinated ontology development, the idea being that ontologies would be admitted as members of the Foundry initiative only if their developers had committed in advance to certain principles: for instance, to working within set boundaries (such as those of proteins or of cell types) and to collaborating on terms relating to entities in areas where these boundaries overlap. The details of Foundry organization were then worked out in a series of meetings, some of them under the auspices of the then newly established National Center for Biomedical Ontology (NCBO).28 Building the Foundry was viewed as a matter of distinguishing, within the original OBO Library, an inner compartment comprising, at any given stage, those ontologies certified as satisfying both the Library principles and a series of additional Foundry-specific principles designed to advance the quality and interoperability of the included ontologies [103]. The core goal of the OBO Foundry (where “OBO” is now understood as meaning “open biological and biomedical ontologies”), like that of the OBO Library, is to create a situation in which ontologies support efficient knowledge accumulation in the life sciences by providing recommended sets of terms for annotating data in each life science domain: thus one set of terms for proteins [104], one for small molecules (or chemical entities of biological interest [105]), one for plants [106], and so forth. The terms in each ontology would be accompanied not only by natural language definitions, designed to ensure that the terms are used correctly by the humans involved in annotating biological literature and data, but also by formal definitions, designed to promote computer-aided reasoning over the resulting annotated data.
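Because each term is meant to carry both a natural language definition for human annotators and a formal definition for machine reasoning, conformance to rules of this kind lends itself to automated checking. The following Python sketch is hypothetical throughout: the `Term` record, the rule wording, and the identifier pattern (a prefix, a colon, then digits, the shape of IDs such as GO:0060047) are assumptions made for illustration, not the Foundry’s actual review tooling.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed identifier shape: an alphabetic prefix, a colon, then digits
# (as in GO:0060047). Real ID conventions vary from ontology to ontology.
ID_PATTERN = re.compile(r"^[A-Za-z][A-Za-z0-9_]*:\d+$")


@dataclass
class Term:
    """Hypothetical minimal record for one ontology term."""
    term_id: str
    label: str
    text_definition: Optional[str] = None    # for human annotators
    formal_definition: Optional[str] = None  # for computer-aided reasoning


def check_term(term: Term) -> list[str]:
    """Return the rule violations for a single term (empty list if none)."""
    problems = []
    if not ID_PATTERN.match(term.term_id):
        problems.append(f"{term.term_id}: malformed identifier")
    if not term.label.strip():
        problems.append(f"{term.term_id}: missing label")
    if not term.text_definition:
        problems.append(f"{term.term_id}: missing natural language definition")
    if not term.formal_definition:
        problems.append(f"{term.term_id}: missing formal definition")
    return problems


def check_ontology(terms: list[Term]) -> list[str]:
    """Run the per-term checks and flag identifiers that are used twice."""
    problems = []
    seen = set()
    for term in terms:
        if term.term_id in seen:
            problems.append(f"{term.term_id}: duplicate identifier")
        seen.add(term.term_id)
        problems.extend(check_term(term))
    return problems


if __name__ == "__main__":
    terms = [
        Term("XYZ:0000001", "example term", "A text definition.", "a formal definition"),
        Term("XYZ:0000002", "another term", "A text definition."),
    ]
    for problem in check_ontology(terms):
        print(problem)  # XYZ:0000002: missing formal definition
```

Keeping the per-term rules separate from the ontology-wide uniqueness check mirrors the distinction between what a single term must satisfy and what only a whole artifact can be checked for.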
For the Foundry ontologies, a layer of governance was introduced (in some ways analogous to the editorial board of a scientific journal), and a review process was established to certify conformance to the Foundry principles. The current set of principles includes the requirements to use a standard ontology language (currently OWL) and to use Basic Formal Ontology (see below) as the shared top level. In fact, BFO [64] makes itself manifest already in the terminology used in the top two rows of Table 5.2, which depicts the initial structure proposed for the OBO Foundry, a structure which in effect extends the GO architecture to include also ontologies external to the GO.

26 One member of the OWL community remarked to me at the time that “a meeting on the formal architecture of the GO? Well … that would have to be a very short meeting, then.”
27 These arguments were summarized in a presentation by the author entitled “STOP!” (for Smart Terminologies through Ontological Principles, http://ontology.buffalo.edu/04/STOP_GO_5_04.ppt), whose goal was to show how the realist perspective can help in the identification of errors in the GO. See also [75, 102].

In a parallel development, there arose at about the same time what would become a much larger biomedical ontology repository extending the original OBO Library idea, namely the NCBO BioPortal (https://bioportal.bioontology.org/), which had the advantage of providing access to ontology-structured versions of SNOMED CT, HL7, MeSH, and other major resources from the world of medical terminology.
The BioPortal adopted a very liberal strategy of accepting ontologies, in a sense the opposite extreme from the strategy of the OBO Foundry.29 This, however, created for the BioPortal a problem of redundancy and lack of consistency among the ontologies listed (now of the order of 500), a problem further exacerbated as new ontologies were developed that reused terms and definitions from multiple established ontologies but supplied them with new term identifiers and new URIs. This conflicts with the OBO Foundry goal of creating a set of mutually consistent and non-redundant ontologies for the life sciences that would promote, for each term, a unique recommended natural language definition, formal definition, and URI.30

Basic Formal Ontology (BFO)

BFO was adopted as the required top level for ontologies in the Foundry in order to make available a common set of categories (highest-level universals) that would serve as the shared starting point for the definitions of the lower-level universals in the coverage domains of the separate Foundry ontologies. BFO itself was developed as a very small representational artifact with the narrowly focused task of providing a top-level ontology that could support the integration of domain ontologies developed for purposes of scientific research. As Tables 5.1 and 5.2 make clear, the structure of the set of ontology modules of the OBO Foundry is derived, in effect, by taking the cross-product of BFO’s top-level categories with the multiple levels of granularity (of molecule, cell, organ, organism, population) relevant to biology. The FMA was the first extensively populated ontology to take advantage of the theoretical foundations of such a top-level ontology and thereby extend the latter into the biomedical domain [72, 108].
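The cross-product construction can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Foundry tooling: the category names follow the column headings of Table 5.2, the levels are those listed in the text, and assigning the table’s coarser rows (such as “organ and organism”) to single levels is an assumption made only for the example.

```python
from itertools import product

# Column headings of Table 5.2 (BFO top-level categories).
CATEGORIES = ("independent continuant", "dependent continuant", "occurrent")

# Levels of granularity listed in the text.
LEVELS = ("molecule", "cell", "organ", "organism", "population")


def module_grid(levels, categories):
    """One (initially empty) module slot per (level, category) pair."""
    return {cell: [] for cell in product(levels, categories)}


grid = module_grid(LEVELS, CATEGORIES)

# Illustrative placements adapted from Table 5.2; that table uses coarser rows
# (e.g. "organ and organism"), so mapping them to single levels is an assumption.
grid[("molecule", "independent continuant")].append("ChEBI")
grid[("molecule", "dependent continuant")].append("GO molecular function")
grid[("molecule", "occurrent")].append("GO molecular process")
grid[("cell", "independent continuant")].append("CL")
grid[("organism", "independent continuant")].append("NCBI Taxonomy")
grid[("organism", "occurrent")].append("GO biological process")

unfilled = [cell for cell, ontologies in grid.items() if not ontologies]
print(f"{len(grid)} module slots, {len(unfilled)} not yet covered")
# 15 module slots, 9 not yet covered
```

Read this way, each empty slot marks a place where a further domain ontology would be needed; the population level, for instance, appears in the text’s list of levels but has no row in Table 5.2.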
28 http://ncorwiki.buffalo.edu/index.php/NCBO_Sponsored_Dissemination_and_Training_Events_2005-2015
29 Later there arose the Ontobee portal (http://www.ontobee.org/) [107], a biomedical ontology repository optimized for the purposes of the OBO Foundry. Ontobee is a linked ontology data server supporting ontology term dereferencing, linkage, query, analysis, and integration.
30 The most recent versions of the BioPortal go some way to solving this problem by generating search results in such a way that the original source definition of a term is returned first in the list of all the ontologies that use that term with that definition.

Relation to time | Continuant | Continuant | Occurrent
Granularity | Independent | Dependent |
Organ and organism | Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO) | Organ Function; Phenotypic Quality (PaTO) | Biological Process (GO)
Cell and cellular component | Cell (CL); Cellular Component (FMA, GO) | Cellular Function (GO) |
Molecule | Molecule (ChEBI, SO, RNAO, PRO) | Molecular Function (GO) | Molecular Process (GO)
Table 5.2 The projected structure of the OBO Foundry from around 2005 (shaded regions correspond to the three original GO ontologies)

Adopting BFO allowed:
1. Explicit formulation of aspects of the development methodology and architectural structure of the OBO Foundry (and other) ontologies, in ways that have helped steer their subsequent development [64]
2. A readily applicable technique for formulating definitions of terms in these ontologies [78]
3. Formalization of relations [73]
4. Support for the strategy of cross-product definitions [109], whereby definitions in one OBO Foundry ontology draw on terms already defined in other such ontologies, for example in the case of those GO terms whose definitions incorporate representations of molecules drawn from the ChEBI chemistry ontology, as described by Hill et al. [110]
5. Formal encoding of the OBO Foundry principles as operational rules, and application of the resultant checks across the full OBO suite of ontologies, thereby demonstrating how a sizable federated community can be organized and evaluated on objective criteria that help improve overall quality, interoperability, and sustainability [111]

The Evolution of BFO

There have been four releases of BFO thus far.31 Version 1 was released in 2001, and the influence of Aristotle’s table of categories on this first version can be seen in the similarity of terminology and structure between the upper rows of Fig. 5.2 and those of Fig. 5.7. Another influence was the top-level ontology DOLCE [112]. From the very start, BFO shared with DOLCE an architecture based on two orthogonal divisions of entities into disjoint categories: (1) continuant vs. occurrent and (2) independent vs. dependent. Material objects (Aristotle’s substances) are independent continuants; qualities are dependent continuants; and processes are occurrents. Entities in all of these categories exist on the level of both universals and instances.32 The release of BFO 1.1 in 2007 was prompted by the need to cover information artifacts, nucleotide sequences, and similar (copyable) entities, a need which arose with the birth of two new ontologies: the Ontology for Biomedical Investigations (OBI) in 2006 [62, 115]33 and the Information Artifact Ontology (IAO; see [60]), which provided terms used to represent entities such as publications, footnotes, protocols, and databases. The release of BFO 2 in 2015 reflects the transition from an OWL DL to an OWL 2 formalization, as well as the addition of term IDs and of temporalized relations. A preliminary version was released for review at the 2012 meeting of the International Conference on Biomedical Ontology.
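The two orthogonal divisions can be made concrete with a small fragment of the BFO-2020 is_a hierarchy. The Python sketch below hard-codes a handful of BFO-2020 class labels (it is not the official OWL or Common Logic artifact) and derives each class’s position on the continuant/occurrent and independent/dependent axes purely by walking up its is_a links. Following Table 5.2, it applies the independent/dependent split only to continuants; because the BFO-2020 hierarchy places dependent continuants in two classes, specifically dependent continuant and generically dependent continuant, the sketch treats both as “dependent.”

```python
# Fragment of the BFO-2020 is_a hierarchy, recorded as child -> parent.
# Only a few classes are listed; ISO/IEC 21838-2 gives the full hierarchy.
IS_A = {
    "continuant": "entity",
    "occurrent": "entity",
    "independent continuant": "continuant",
    "specifically dependent continuant": "continuant",
    "generically dependent continuant": "continuant",
    "material entity": "independent continuant",
    "object": "material entity",
    "quality": "specifically dependent continuant",
    "realizable entity": "specifically dependent continuant",
    "disposition": "realizable entity",
    "function": "disposition",
    "process": "occurrent",
}

DEPENDENT_ROOTS = ("specifically dependent continuant", "generically dependent continuant")


def ancestors(cls: str) -> list[str]:
    """The class itself followed by every class above it, up to the root."""
    chain = [cls]
    while chain[-1] in IS_A:
        chain.append(IS_A[chain[-1]])
    return chain


def is_a(cls: str, parent: str) -> bool:
    """Subsumption by following is_a links transitively."""
    return parent in ancestors(cls)


def classify(cls: str) -> tuple:
    """Position on the two axes: (relation to time, dependence)."""
    if is_a(cls, "occurrent"):
        # Following Table 5.2, only continuants are split by dependence here.
        return ("occurrent", None)
    if is_a(cls, "independent continuant"):
        return ("continuant", "independent")
    if any(is_a(cls, root) for root in DEPENDENT_ROOTS):
        return ("continuant", "dependent")
    return ("continuant" if is_a(cls, "continuant") else None, None)


for name in ("object", "quality", "function", "process"):
    print(name, classify(name))
# object ('continuant', 'independent')
# quality ('continuant', 'dependent')
# function ('continuant', 'dependent')
# process ('occurrent', None)
```

Nothing about a class’s category is stored directly: it follows from its place in the hierarchy, which is the sense in which a shared top level gives every domain term a fixed starting point.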
By the year 2020, BFO had come to serve as something of a stable attractor for ontology developers (http://basic-formal-ontology.org/users.html, [117]), giving rise to powerful network effects analogous to those brought by the QWERTY keyboard and the TCP/IP internet protocol, whereby each successive new user of BFO raises the value of the artifacts created on its basis by earlier users, in another positive feedback loop. As a consequence of these developments, the Joint Technical Committee on Information Technology (JTC 1) of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC) approved in 2021 the ISO/IEC 21838: Top-Level Ontologies (TLO) standard. Part 1 of this standard sets forth the requirements for being a top-level ontology. Part 2 documents BFO in a way that demonstrates satisfaction of these requirements. The release of BFO-2020 includes a more careful treatment of definitions. All non-primitive terms have been provided with English language definitions, that is, statements of individually necessary and jointly sufficient conditions. All primitive terms have been provided with elucidations, that is, statements of necessary conditions together with examples of use. Additional improvements concern the logical formalization of BFO. Along with an OWL version of BFO-2020, the ISO standard also provides an axiomatization in Common Logic (BFO-2020-CL) and a translation thereof into first-order logic (FOL). A proof of consistency of BFO-2020-CL is provided, together with a proof that BFO-2020-OWL is derivable therefrom.
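The difference between a definition and an elucidation has a direct computational reading: a definition can both include and exclude a candidate, whereas an elucidation of a primitive term can only exclude one. In the minimal Python sketch below, all class, variable, and attribute names are hypothetical, entities are represented as plain dictionaries purely for illustration, the definition paraphrases the GO text definition of heart contraction (GO:0060047) in genus-differentia form, and the elucidation’s single condition is invented for the example rather than taken from BFO-2020.

```python
from dataclasses import dataclass
from typing import Callable

# An "entity" here is just a dict of hypothetical attributes; a condition tests it.
Condition = Callable[[dict], bool]


@dataclass
class Definition:
    """Conditions that are individually necessary and jointly sufficient."""
    term: str
    conditions: list[Condition]

    def applies_to(self, entity: dict) -> bool:
        # Passing every condition settles membership; failing one excludes it.
        return all(condition(entity) for condition in self.conditions)


@dataclass
class Elucidation:
    """Necessary conditions only, as provided for primitive terms."""
    term: str
    necessary: list[Condition]

    def rules_out(self, entity: dict) -> bool:
        # Failing a necessary condition excludes the entity. Passing them all
        # does not establish membership, so no applies_to() is offered.
        return not all(condition(entity) for condition in self.necessary)


# Genus-differentia paraphrase of the GO text definition of heart contraction.
heart_contraction = Definition("heart contraction", [
    lambda e: e.get("genus") == "multicellular organismal process",  # genus
    lambda e: e.get("participant") == "heart",                        # differentia
    lambda e: e.get("change") == "decrease in volume",                # differentia
])

# Hypothetical necessary condition for a primitive term.
continuant = Elucidation("continuant", [lambda e: e.get("persists_through_time") is True])

beat = {"genus": "multicellular organismal process",
        "participant": "heart", "change": "decrease in volume"}
print(heart_contraction.applies_to(beat))                       # True
print(continuant.rules_out({"persists_through_time": False}))  # True
```

The asymmetry is deliberate: `Elucidation` offers no positive membership test, since satisfying a set of merely necessary conditions leaves open whether the entity falls under the term.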
English language definitions and elucidations provided in the standard are formulated in such a way as to be as close as possible to BFO-2020-CL while at the same time serving as an access route to the content of BFO-2020 for human users.

31 Successive versions are available through http://ontology.buffalo.edu/bfo
32 A further influence was the lessons learned from work on a framework that would link the quantitative data studied in the new field of Geographic Information Science with qualitative data pertaining to the hills and valleys, and rivers and lakes, that form the subject matter of what we might call old geography ([113]; compare also [114]).
33 OBI is now an OBO Foundry ontology. It was created as a generalization of the Functional Genomics Investigation Ontology (FuGO) [116].

Fig. 5.7 BFO-2020 is_a hierarchy from ISO/IEC 21838-2 (https://www.iso.org/standard/74572.html)

Conclusion

Like the Gene Ontology, and like the Planteome ontologies, which are seen by their developers as “integrative tools for plant science” [118], the OBO Foundry as a whole is a tool for the unification of biology. Indeed, all the OBO Foundry ontologies continue in their way the project of the Vienna Circle to achieve the unification of science. They do this, however, not from the starting point of logic and philosophy, but rather from the starting point of biology and ontology. And they do this more successfully, because their project of unification is deeply interwoven, through multiple sorts of multidirectional interactions, with ongoing developments in biology and, increasingly, in the clinical sciences.

Acknowledgments

Thanks go to Werner Ceusters, Janna Hastings, Patrick Hayes, Yongqun (Oliver) He, Jobst Landgrebe, Suzanna Lewis, Cornelius Rosse, Stephan Schulz, and Paul Thomas. Work on this chapter was supported by the NIH/NCATS 1UL1TR001412 Buffalo Clinical and Translational Research Center CTSA Award.

References

1. Grene M. A portrait of Aristotle. London: Faber and Faber; 1963.
2. Lennox JG. Marjorie Grene, Aristotle’s philosophy of science and Aristotle’s biology. Proc Biennial Meeting Philos Sci Assoc. 1984;2:365–77.
3. Leroi AM. The lagoon: how Aristotle invented science. London: Penguin Books; 2014.
4. Sallam HN. Aristotle, godfather of evidence-based medicine. Facts, Views and Visions. 2010;2(1):11–9.
5. Lennox JG. Aristotle’s biology. In: Zalta EN, editor. The Stanford encyclopedia of philosophy. Stanford: Stanford University; 2021.
6. Feyerabend P. In defence of Aristotle: comments on the condition of content increase. In: Radnitzky G, Andersson G, editors. Progress and rationality. Dordrecht: Reidel; 1978. p. 143–80.
7. Brunczwik A. Tractatus in Aristotelis logicam. 1748. https://classic.europeana.eu/portal/en/record/2048128/39246. Accessed 26 July 2021.
8. Barnes J, editor. Porphyry introduction. Oxford: Oxford University Press; 2006.
9. Jansen L. Aristotle’s categories. Topoi. 2007;26:153–8.
10. Casati R, Varzi AC. Holes and other superficialities. Cambridge, MA: MIT Press; 1994.
11. Botti Benevides A, Bourguet JR, Guizzardi G, Penaloza R, Almeida JP. Representing a reference foundational ontology of events in SROIQ. Appl Ontol. 2019;14(3):293–334.
12. Linnaeus C. Systema naturæ per regna tria naturæ, secundum classes, ordines, genera, species, cum characteribus, differentiis, synonymis, locis. Stockholm: Laurentii Salvii; 1758.
13. Linnaeus C. Genera morborum. Upsalla: Steinert; 1759.
14. Munsche H, Whitaker HA. Eighteenth century classification of mental illness: Linnaeus, de Sauvages, Vogel, and Cullen. Cogn Behav Neurol. 2012;25(4):224–39.
15. Egdahl A. Linnaeus’ Genera Morborum, and some of his other medical works. Medical Library Hist J. 1907;5(3):185–93.
16. BIPM. International Standard System of Units. 9th ed. Sèvres, France; 2019.
17. Johansson I. Quantities as metrical coordinative definitions and as counts: on some definitional structures in the new SI brochure. J Gen Philos Sci. 2021;2021:1–23.
18. Landgrebe J, Smith B. Mathematics and Physics Ontology. Draft manuscript; in preparation.
19. Rosse C. Terminologia Anatomica; considered from the perspective of next-generation knowledge sources. Clin Anat. 2001;14(2):120–33.
20. Quine WVO. On what there is. Rev Metaphys. 1948;2(5):21–38.
21. Neurath O, Carnap R, Morris C. Foundations of the unity of science: toward an International Encyclopedia of Unified Science, 2 volumes. Chicago: University of Chicago Press; 1938–1968.
22. Carnap R. Der logische Aufbau der Welt. Berlin: Weltkreis; 1928. Translated by RA George as The logical structure of the world. Berkeley, CA: University of California Press; 1967.
23. Leitgeb H, Carus A. Rudolf Carnap, Supplement A: Aufbau. In: Zalta EN, editor. The Stanford encyclopedia of philosophy. Stanford: Stanford University; 2021. https://plato.stanford.edu/archives/sum2021/entries/carnap/aufbau.html/.
24. ISO/IEC 24707. Information Technology – Common Logic (CL): A Framework for a Family of Logic-Based Languages. Geneva: International Organization for Standardization; 2018.
25. Moore GH. The emergence of first-order logic. In: Kitcher P, Aspray W, editors. History and philosophy of modern mathematics, vol. 11. Minneapolis: University of Minnesota Press; 1988. p. 95–135.
26. Smith B, Ceusters W. Ontological realism as a methodology for coordinated evolution of scientific ontologies. Appl Ontol. 2010;5:139–88.
27. Smith B. Against fantology. In: Marek JC, Reicher ME, editors. Experience and analysis. Vienna: HPT&ÖBV; 2005. p. 153–70.
28. Dahms HJ. Mission accomplished? Unified science and logical empiricism at the 1935 Paris Congress and afterwards. Philosophia Scientiæ Travaux d’histoire et de philosophie des sciences. 2018;22–23:289–305.
29. Haugeland J. Artificial intelligence, the very idea. Cambridge, MA: MIT Press; 1985.
30. McCarthy J. Circumscription – a form of non-monotonic reasoning. Artif Intell. 1980;13(1–2):27–39.
31. McCarthy J. Concepts of logical AI. In: Logic-based Artificial Intelligence. New York: Springer; 2000. p. 37–56.
32. Hayes PJ. Naive physics I: ontology for liquids. Working Papers, No. 35. Geneva: Dalle Molle Institute; 1978. 66 pp.
33. Hayes PJ. Early use of the word ‘ontology’ in AI (via John Sowa). 2013. http://ontolog.cim3.net/forum/ontolog-forum/2013-11/msg00016.html. Accessed 4 Jul 2021.
34. Hobbs JR, Moore RC, editors. Formal theories of the commonsense world, Ablex series in artificial intelligence. Cambridge, MA: Intellect Books; 1985.
35. Hayes PJ. The second naive physics manifesto. In: Hobbs JR, Moore RC, editors. Formal theories of the common-sense world. Norwood, NJ: Ablex; 1985. p. 1–36.
36. Hayes PJ. Naïve physics I: Ontology for liquids. In: Hobbs JR, Moore RC, editors. Formal theories of the common-sense world. Norwood, NJ: Ablex; 1985a. p. 71–108.
37. Marcus G, Davis E. Rebooting AI: building Artificial Intelligence we can trust. New York: Vintage; 2019.
38. Baader F, Horrocks I, Sattler U. Description logics. Foundations Artif Intell. 2008;3:135–79.
39. Ceusters W, Smith B. Ontology and medical terminology: why description logics are not enough. In: Proceedings of the Conference Towards an Electronic Patient Record (TEPR 2003), San Antonio, 10–14 May 2003; 2003. (Electronic publication).
40. Schulz S, Stenzhorn H, Boeker M, Smith B. Strengths and limitations of formal ontologies in the biomedical domain. Electron J Commun Inf Innov Health (Special Issue on Ontologies, Semantic Web and Health). 2009;3(1):31–45. https://doi.org/10.3395/reciis.v3i1.241en.
41. Noy N, McGuinness DL. Ontology development 101. Knowledge Systems Laboratory. Stanford: Stanford University; 2001.
42. Smith B, Welty C. Ontology: towards a new synthesis. In: Formal Ontology in information systems. New York: ACM Press; 2001. p. 3–9.
43. Gruber TR. A translation approach to portable ontology specifications. Knowl Acquis. 1993;5:199–220.
44. Guarino N, Poli R, editors. Proceedings of the International Workshop on Formal Ontology in Conceptual Analysis and Knowledge Representation. Int J Hum Comput Stud. 1995;43(5–6):623–965.
45. Guarino N. BFO and DOLCE: so far, so close…. Cosmos + Taxis. 2017;4(4):10–8.
46. Cimino JJ. Desiderata for controlled medical vocabularies in the twenty-first century. Methods Inf Med. 1998;37(4–5):394–403.
47. Carey S. The origin of concepts. Oxford: Oxford University Press; 2009.
48. Bauer S, Grossmann S, Vingron M, Robinson PN. Ontologizer 2.0: a multifunctional tool for GO term enrichment analysis and data exploration. Bioinformatics. 2008;24(14):1650–1.
49. Ceusters W. SNOMED CT’s RF2: is the future bright? Stud Health Technol Inform. 2011;169:829–33.
50. Ceusters W. The place of Referent Tracking in biomedical informatics. In: Terminology, ontology and their implementations. Switzerland: Springer Nature; 2022 (this volume).
51. Ceusters W, Smith B, Kumar A, Dhaen C. Mistakes in medical ontologies: where do they come from and how can they be detected? In: Pisanelli DM, editor. Ontologies in medicine. Proceedings of the Workshop on Medical Ontologies, Rome, October 2003. Stud Health Technol Inform, vol. 102. Amsterdam: IOS Press; 2004. p. 145–64.
52. Ceusters W, Smith B, Kumar A, Dhaen C. Ontology-based error detection in SNOMED-CT®. MEDINFO. Amsterdam: IOS Press; 2004a. p. 482–6.
53. Bodenreider O, Smith B, Kumar A, Burgun A. Investigating subsumption in DL-based terminologies: a case study in SNOMED CT. Artif Intell Med. 2007;39:183–95.
54. Bona JP, Ceusters W. Mismatches between major subhierarchies and semantic tags in SNOMED CT. J Biomed Informatics. 2018;81:1–15.
55. Ceusters W, Mullin S. Expanding evolutionary terminology auditing with historic formal and linguistic intensions: a case study in SNOMED CT. Stud Health Technol Inform. 2019;264:65–9.
56. Smith B, Kusnierczyk W, Ceusters W. Towards a reference terminology for ontology research and development in the biomedical domain. In: Proceedings of KR-MED, CEUR, vol. 222; 2006. p. 57–65.
57. Smith B. From concepts to clinical reality: an essay on the benchmarking of biomedical terminologies. J Biomed Informatics. 2006;39(3):288–98.
58. Smith B. Ontology (science). In: Eschenbach C, Grüninger M, editors. Formal Ontology in information systems. Proceedings of the Fifth International Conference (FOIS 2008). Amsterdam: IOS Press; 2008. p. 21–35.
59. Rudnicki R. An overview of the Common Core Ontologies. Buffalo: CUBRC; 2019.
60. Ceusters W, Elkin P, Smith B. Negative findings in electronic health records and biomedical ontologies: a realist approach. Int J Med Informatics. 2007;76:S326–33.
61. Ceusters W, Smith B. Aboutness: towards foundations for the Information Artifact Ontology. In: Proceedings of the Sixth International Conference on Biomedical Ontology (ICBO). CEUR 1515; 2015. p. 1–5.
62. Bandrowski A, Brinkman R, Brochhausen M, Brush MH, Bug B, et al. The Ontology for Biomedical Investigations. PLoS One. 2016;11(4):e0154556.
63. Gurcan MN, Tomaszewski J, Overton JA, Doyle S, Ruttenberg A, Smith B. Developing the Quantitative Histopathology Image Ontology (QHIO): a case study using the hot spot detection problem. J Biomed Informatics. 2017;66:129–35.
64. Arp R, Smith B, Spear A. Building ontologies with Basic Formal Ontology. Cambridge, MA: MIT Press; 2015.
65. Shaw M, Detwiler LT, Brinkley JF, Suciu D. Generating application ontologies from reference ontologies. In: Proceedings of AMIA Annual Symposium; 2008. p. 672–6.
66. Schulz S, Steffel J, Polster P, Palchuk M, Daumke P. Aligning an Administrative Procedure Coding System with SNOMED CT. In: JOWO Joint Ontologies Workshops, 2019 (CEUR 2519); 2019.
67. Schulz S, Balkanyi L, Cornet R, Bodenreider O. From concept representations to ontologies: a paradigm shift in health informatics? Healthcare Informatics Res. 2013;19(4):235–42.
68. Schulz S, Martínez-Costa C. Harmonizing SNOMED CT with BioTopLite: an exercise in principled ontology alignment. In: MEDINFO 2015: eHealth-enabled Health 2. Amsterdam: IOS Press; 2015. p. 832–6.
69. Rosse C, Mejino JV Jr. A reference ontology for bioinformatics: The Foundational Model of Anatomy. J Biomed Informatics. 2003;36:478–500.
70. Haendel MA, Neuhaus F, Osumi-Sutherland D, Mabee PM, Mejino JL, Mungall CJ, Smith B. CARO – the Common Anatomy Reference Ontology. In: Burger A, et al., editors. Anatomy ontologies for bioinformatics: principles and practice. London: Springer; 2008. p. 327–49.
71. Mungall CJ, Torniai C, Gkoutos GV, Lewis SE, Haendel MA. Uberon, an integrative multispecies anatomy ontology. Genome Biol. 2012;13(1):1–20.
72. Rosse C, Mejino JV Jr. The Foundational Model of Anatomy Ontology. In: Burger A, et al., editors. Anatomy ontologies for bioinformatics: principles and practice. New York: Springer; 2007. p. 59–117.
73. Smith B, Ceusters W, Klagges B, Köhler J, Kumar A, Lomax J, Mungall C, Neuhaus F, Rector AL, Rosse C. Relations in biomedical ontologies. Genome Biol. 2005;6(5):1–5.
74. Grewe N, Jansen L, Smith B. Permanent generic relatedness and silent change. In: Formal Ontology in information systems. Proceedings of the Ninth International Conference (FOIS 2016) Ontology Competition (CEUR 1660); 2016. p. 1–5.
75. ISO/IEC 21838. Information Technology – Top-Level Ontology (TLO), Part 1: Requirements, Part 2: Basic Formal Ontology. Geneva: International Organization for Standardization; 2021.
76. Mejino JV Jr, Agoncillo AV, Rickard KL, Rosse C. Representing complexity in part-whole relationships within the Foundational Model of Anatomy. In: AMIA Annual Symposium Proceedings; 2003. p. 450–4.
77. Köhler J, Munn K, Rüegg A, Skusa A, Smith B. Quality control for terms and definitions in ontologies and taxonomies. BMC Bioinformatics. 2006;7(1):1–12.
78. Seppälä S, Ruttenberg A, Smith B. Guidelines for writing definitions in ontologies. Ciência da Informação. 2017;46(1):73–88.
79. Michael J, Mejino JV Jr, Rosse C. The role of definitions in biomedical concept representation. In: AMIA Annual Symposium Proceedings; 2001. p. 463–7.
80. Kumar A, Smith B, Novotny DD. Biomedical informatics and granularity. Compar Funct Genomics. 2004;5(6–7):501–8.
81. Lewis SE. Gene Ontology: looking backwards and forwards. Genome Biol. 2004;6:103.
82. Ashburner M. On the representation of “gene function” in databases. Discussion paper for ISMB, Montreal, 1998. Version 1.2, June 19, 1998. http://biomirror.aarnet.edu.au/biomirror/geneontology/docs/gene_ontology_discussion.html
83. Stevens H. Life out of sequence. Chicago: University of Chicago Press; 2013.
84. UniProt Consortium. The Universal Protein Resource (UniProt). Nucleic Acids Res. 2008;36(Database issue):D190–5.
85. Camon E, Magrane M, Barrell D. The Gene Ontology Annotation (GOA) Database: sharing knowledge in Uniprot with Gene Ontology. Nucleic Acids Res. 2004;32(Database issue):D262–6.
86. Guarino N, Oberle D, Staab S. What is an ontology? In: Handbook on ontologies. Berlin: Springer; 2009. p. 1–17.
87. Brenner S. Life sentences: ontology recapitulates philology. Genome Biol. 2002;3(4):1006.1–2. https://doi.org/10.1186/gb-2002-3-4-comment1006.
88. Landgrebe J, Smith B. Making AI meaningful again. Synthese. 2021;198(3):2061–81.
89. Reijnders MJMF, Waterhouse RM. Summary visualizations of Gene Ontology terms with GO-Figure! Front Bioinformatics. 2021. https://doi.org/10.3389/fbinf.2021.638255.
90. Rhee SY, Wood V, Dolinski K, Draghici S. Use and misuse of the Gene Ontology annotations. Nat Rev Genet. 2008;9(7):509–15.
91. Li X, et al. Pop’s Pipes: poplar gene expression data analysis pipelines. Tree Genetics & Genomes. 2014;10(4):1093–101.
92. Ashburner M, Ball CA, Blake JA, et al. Gene Ontology: tool for the unification of biology. Nat Genet. 2000;25(1):25–9.
93. Diehl AD, Lee JA, Scheuermann RH, Blake JA. Ontology development for biological systems: immunology. Bioinformatics. 2007;23(7):913–5.
94. Thomas PD. The Gene Ontology and the meaning of biological function. In: The Gene Ontology handbook. New York: Humana; 2017. p. 15–24.
95. Spear AD, Ceusters W, Smith B. Functions in Basic Formal Ontology. Appl Ontol. 2016;11(2):103–28.
96. Millikan RG. In defense of proper functions. Philos Sci. 1989;56:288–302.
97. Monod J. Chance and necessity. New York: Alfred Knopf; 1971.
98. Thomas PD, Hill DP, et al. Gene Ontology Causal Activity Modeling (GO-CAM) moves beyond GO annotations to structured descriptions of biological functions and systems. Nat Genet. 2019;51(10):1429–33.
99. Ashburner M, Lewis SE. On ontologies for biologists: The Gene Ontology – untangling the web. In: Bock GR, Goode JA, editors. “In Silico” simulation of biological processes. New York: Wiley; 2003.
100. Ashburner M, Lewis SE. Principles of biomedical ontology construction, Tutorial. Detroit, MI: Intelligent Systems for Molecular Biology (ISMB); 2005. http://bit.ly/2GUkpoh/.
101. Wilkinson MD, et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data. 2016;3(1):1–9.
102. Smith B, Köhler J, Kumar A. On the application of formal principles to life science data: a case study in the Gene Ontology. In: Rahm E, editor. Data Integration in the Life Sciences, First International Workshop, DILS 2004, Leipzig, Germany, March 25–26, 2004. Lecture Notes in Computer Science 2994. Springer; 2004. p. 79–94.
103. Smith B, Ashburner M, Rosse C, Bard J, Bug W, Ceusters W, Goldberg LJ, Eilbeck K, Ireland A, Mungall CJ, Leontis N, et al. The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nat Biotechnol. 2007;25(11):1251–5.
104. Chen C, et al. Protein ontology on the semantic web for knowledge discovery. Sci Data. 2020;7:337. https://doi.org/10.1038/s41597-020-00679-9.
105. Hastings J, et al. The ChEBI reference database and ontology for biologically relevant chemistry: enhancements for 2013. Nucleic Acids Res. 2012;41(D1):D456–63.
106. Cooper L, et al. The Planteome database: an integrated resource for reference ontologies, plant genomics and phenomics. Nucleic Acids Res. 2018;46(D1):D1168–80.
107. Ong E, Xiang Z, Zhao B, Liu Y, Lin Y, Zheng J, Mungall C, Courtot M, Ruttenberg A, He Y. Ontobee: a linked ontology data server to support ontology term dereferencing, linkage, query and integration. Nucleic Acids Res. 2017;45(D1):D347–52.
108. Rosse C, Kumar A, et al. A strategy for improving and integrating biomedical ontologies. In: AMIA Annual Symposium Proceedings; 2005. p. 639–43.
109. Mungall CJ, Bada M, Berardini TZ, Deegan J, Ireland A, Harris MA, Hill DP, Lomax J. Cross-product extensions of the Gene Ontology. J Biomed Informatics. 2011;44(1):80–6.
110. Hill DP, Adams N, Bada M, Batchelor C, Berardini TZ, Dietze H, Drabkin HJ, Ennis M, Foulger RE, Harris MA, Hastings J. Dovetailing biology and chemistry: integrating the Gene Ontology with the ChEBI chemical ontology. BMC Genomics. 2013;14(1):1.
111. Jackson RC, Matentzoglu N, Overton JA, Vita R, Balhoff JP, Buttigieg PL, Carbon S, Courtot M, Diehl AD, Dooley D, Duncan W, et al. OBO Foundry in 2021: operationalizing open data principles to evaluate ontologies. bioRxiv. 2021. https://doi.org/10.1101/2021.06.01.446587.
112. Masolo C, Borgo S, Gangemi A, Guarino N, Oltramari A. WonderWeb Deliverable D18: Ontology Library. 2004. http://wonderweb.semanticweb.org/deliverables/documents/D18.pdf.
113. Mark DM, Smith B. A science of topography: bridging the qualitative-quantitative divide. In: Geographic information science and mountain geomorphology. Chichester, England: Springer-Praxis; 2004. p. 75–100.
114. Dolan ME, Holden CC, Beard MK, Bult CJ. Genomes as geography: using GIS technology to build interactive genome feature maps. BMC Bioinformatics. 2006;7(1):1–8.
115. Vita R, Zheng J, Jackson R, Dooley D, Overton JA, Miller MA, Berrios DC, Scheuermann RH, He Y, McGinty HK, Brochhausen M. Standardization of assay representation in the Ontology for Biomedical Investigations. Database. 2021. https://doi.org/10.1093/database/baab040.
116. Whetzel PL, et al. Development of FuGO: an ontology for functional genomics investigations. OMICS. 2006;10(2):199–204. https://doi.org/10.1089/omi.2006.10.199.
117. Haller A, Polleres A. Are we better off with just one ontology on the Web? Semantic Web. 2020;11(1):87–99.
118. Walls RL, et al. Ontologies as integrative tools for plant science. Am J Bot. 2012;99(8):1263–75. https://doi.org/10.3732/ajb.1200222.