Ontologies: A High-Level Discussion

An ontology is a representational artifact, comprising a taxonomy as proper part, whose representations are intended to designate some combination of universals, defined classes, and certain relations between them.

This definition employs a number of terms that are themselves in need of defining. Understanding these terms and the rationale behind their inclusion in the definition will take us a long way toward understanding what an ontology is.

The first term, taxonomy, we can define as follows (where here and in all that follows universal and type are used as synonyms): A taxonomy is a hierarchy consisting of terms denoting types (or universals or classes) linked by subtype relations.

The most familiar kinds of taxonomies are those we find in biology (taxonomies of organisms into genera and species). But taxonomies can be found also in any domain where it is possible to group things together into types or universals based on common features. We will discuss taxonomies in greater detail further on.

By hierarchy we mean a graph-theoretic structure consisting of nodes and edges with a single top-most node (the root) connected to all other nodes through unique branches (thus all nodes beneath the root have exactly one parent node).
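The structural conditions just stated (a single root, unique branches, exactly one parent per non-root node, no cycles) can be checked mechanically. The following is a minimal sketch in Python; the function name and the example type names are illustrative, not drawn from any actual ontology:

```python
# Minimal sketch: testing whether a set of subtype edges forms a hierarchy
# in the sense defined above (a rooted tree).

def is_hierarchy(edges):
    """edges: list of (child, parent) pairs denoting subtype links."""
    parents = {}
    for child, parent in edges:
        if child in parents:          # a node with two parents violates uniqueness
            return False
        parents[child] = parent
    nodes = set(parents) | set(parents.values())
    roots = nodes - set(parents)      # nodes with no parent
    if len(roots) != 1:               # exactly one top-most node required
        return False
    root = roots.pop()
    for node in nodes:                # every node must reach the root, acyclically
        seen = set()
        while node != root:
            if node in seen:
                return False
            seen.add(node)
            node = parents[node]
    return True

edges = [("mammal", "animal"), ("dog", "mammal"), ("cat", "mammal")]
print(is_hierarchy(edges))                      # True
print(is_hierarchy(edges + [("dog", "cat")]))   # False: dog would have two parents
```

A forest with two roots, or a cycle among nodes, likewise fails the test, since in either case there is no single top-most node connected to all others by unique branches.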

By types or universals we mean the entities in the world referred to by the nodes in a hierarchy. As we will regularly use the term entity in a broad and generic sense, we here provisionally define it as follows: An entity is anything that exists, including objects, processes, and qualities. Entity thus comprehends also representations, models, images, beliefs, utterances, documents, observations, and so on.

Ontologies Are Representational Artifacts

Ontologies represent (or seek to represent) reality, and they do so in such a way that many different persons can understand the terms they contain and so learn about the entities in reality that these terms represent. Ontologies in the sense that is important to us here are designed to support the development, testing, and application of scientific theories, and so they will to a large degree be about the same sorts of entities as are represented by the general terms in scientific textbooks. Ontologies consist of terms arranged together in a certain way, and terms are an important subtype of representations:

A representation is an entity (for example, a term, an idea, an image, a label, a description, an essay) that refers to some other entity or entities.

When John remembers the Tower Bridge in London, then there is a representation in his mind that is about or refers to an entity other than itself, namely a certain bridge over the River Thames. Similarly, when Sally looks through a microscope at bacteria arrayed on a glass slide, then there are thoughts running through her mind to the effect that these are E. coli that I am seeing. These thoughts involve representations that point beyond themselves and make reference to certain entities on the side of reality, in this case bacteria on the slide. It is one of the most basic features of human thought that beliefs, desires, and experiences in general point beyond themselves to certain entities that they are about. Note that a representation (for example, your memory of your grandmother) can be of or about a given entity even though it leaves out many aspects of its target. Note, too, that a representation may be vague or ambiguous, and it may rest on error.

An artifact is something that is deliberately designed (or, in certain borderline cases, selected) by human beings to address a particular purpose.

Artifact comes from the Latin ars, meaning human skill or product. Artifacts include such things as knives, clothing, paperweights, automobiles, and hard drives. All artifacts are public entities in the sense that they can at least in principle be available to and used by multiple individuals in a community.

A representational artifact is an artifact whose purpose is one of representation.

Thus a representational artifact is an artifact that has been designed and made to be about something (some portion of reality) and using some public form or format. Representational artifacts include things such as signs, books, diagrams, drawings, maps, and databases.

A key feature of representational artifacts of the sorts important to us here is that they come with rules for their interpretation. Maps do not come merely color coded, they also come with a legend or table that makes it possible to interpret their color coding as representing certain kinds of entities (countries, oceans, mountain ranges, etc.). Such legends have many of the features of ontologies, including the feature of supporting information integration; for example, maps that use a common legend can be more easily compared and combined.

A simple kind of representational artifact would be a drawing made by Sally of Tower Bridge based on her memory of how it looked when she visited London some years earlier. Sally's memory, and the images in her mind, are cognitive representations. Her drawing, in contrast, is a representational artifact that exists independently of such cognitive representations and transforms them into something that is publicly observable and inspectable. Just as Sally's memory of Tower Bridge can be better or worse, more or less accurate, so also the representational artifact that she creates on the basis of this memory can be better or worse, and more or less accurate as a representation of the entity to which it is intended to refer.

An ontology is an artifact, since it is something that has been deliberately produced or constructed by human beings to achieve certain purposes, and there is a sense in which, by analogy to Sally's drawing, it serves to make public the mental representations of its human creators. While not all representational artifacts are ontologies, all ontologies are representational artifacts, and thus everything that holds of representational artifacts in general holds also of ontologies.

Representational units and composite representations are very common types of representations-encompassing practically the whole world of documents, which use written or printed language to represent things in the world. For example, the composite representation John is drinking a glass of water, asserted by someone who is watching John, picks out a process in the world. The representational units in this composite representation include John and glass; these are the smallest referring bits of language contained within the sentence (J, w, and so on do not refer to or represent anything). Other examples of representational units include icons, names, simple word forms, or the sorts of alphanumeric identifiers we might find in patient records or automobile parts catalogs. Thus, a representational unit is a representation no proper part of which is a representational unit.

A composite representation is a representation built out of constituent subrepresentations as its parts, in the way in which paragraphs are built out of sentences and sentences out of words.

Note that many images are not composite representations in the sense here defined, since they are not built out of smallest representational units in the way in which molecules are built out of atoms. (Pixels are not representational units since they are not representations.) Maps are typically built out of parts that include both representational units (for example, names of towns or hills) and image-like elements (for instance, shading used to represent inclines).
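The recursive relation between units and composites described above can be sketched as a pair of hypothetical classes (these class names are illustrative, not any standard library's API):

```python
# Minimal sketch of the unit/composite distinction: a representational unit
# has no proper part that is itself a representation; a composite is built
# out of subrepresentations, as paragraphs are built out of sentences.

from dataclasses import dataclass, field


@dataclass
class Unit:
    """A representational unit: a smallest referring bit of language."""
    label: str


@dataclass
class Composite:
    """A composite representation built from constituent subrepresentations."""
    parts: list = field(default_factory=list)


def units(rep):
    """Collect the representational units contained in a representation."""
    if isinstance(rep, Unit):
        return [rep.label]
    collected = []
    for part in rep.parts:
        collected.extend(units(part))
    return collected


sentence = Composite([Unit("John"), Unit("is drinking"), Unit("glass")])
paragraph = Composite([sentence, Composite([Unit("He"), Unit("smiled")])])
print(units(paragraph))  # ['John', 'is drinking', 'glass', 'He', 'smiled']
```

Note that, on this sketch, an image would have to be modeled as something other than a Composite, since it has no smallest representational parts out of which it is built.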

A Note on 'Term'

In the following pages we will often make use of the word term to refer to the singular nouns and singular noun phrases that form the representational units and composite representations in an ontology. The terms in an ontology are the linguistic expressions used in the ontology to represent the world, and they are drawn as nearly as possible from the standard terminologies used by human experts in the corresponding discipline. (Thus terms are distinct from identifiers of the sorts used in programming languages or from the alphanumeric IDs used in serial numbers or on credit cards.) Examples of terms in our sense include: aorta, blood pressure, smoking behaviour, and patient.

Terms in this sense can refer to single entities, collections of entities, or types of entities.

The question of what terms an ontology should include is determined (a) by the selected scope of the ontology (which is determined in turn by the purpose the ontology is designed to address), (b) by the available resources for population of the ontology, (c) by the structure of the domain that the ontology is intended to represent, and (d) by consensus among scientific experts about what the relevant entities are in that domain and about what they are to be called.

Ontology, Terminology, Conceptology

In our approach to ontology we assume that it is uncontroversial that ontologies should be understood as a kind of representational artifact, and that the entities represented are entities in reality-such as cells, molecules, organisms, planets, and so forth. Some ontologies contain terms which do not refer to any entities at all because-unknown to the developers-some type of error has been made. But even in those cases the terms in question are included in the ontology with the intention that they should refer. (Something like this was true, in former times, in the case of terms such as phlogiston and ether.)

The relation between term and referent is to be understood by analogy with the relation of external directedness that is involved, for instance, when we assert that Oxford refers to Oxford, or Ronald Reagan refers to Ronald Reagan. This is true even where, as in ontologies such as the Mental Functioning Ontology (MFO), terms refer to entities-for example, mental processes-that are internal to the mind or brain of human beings. Terms such as mental process too, as they appear in ontologies, are intended to refer to portions of reality in just the same sense as do terms referring to physical entities such as molecules or planets.

Confusion arises here in virtue of the fact that, in addition to the relation of reference or aboutness between terms in MFO and their mental targets in reality, there is another sort of relation between language and mind, which we can call the relation from term to concept. This latter relation holds in virtue of the fact that, when people use terms, they may associate these terms with mental representations-sometimes called concepts-of various sorts.

Concepts

ISO Standard 1087-1, for example, sees terms as denotations of concepts, defining concept as follows: A unit of thought constituted through abstraction on the basis of characteristics common to a set of objects.

The background to this definition is a view of concept acquisition rooted in the phenomenalist ideas of the Vienna Circle. Concepts are acquired, on this view, in virtue of the fact that, as we sense objects in our surroundings, we detect certain similarities-for instance between one horse and another, or between one red thing and another. We then learn to conceive the characteristics responsible for such similarities in abstraction from the objects that possess them.

Concepts are then formed through combination of such characteristics. Characteristics can be combined into concepts in many ways (for instance: {red, spherical}, {diseased, female, nonsmoker}, {with tomato sauce, with mozzarella, with pepperoni}), and for each such combination of characteristics there is, in principle at least, a corresponding concept. Equivalence between terms in different languages is a matter of correspondence between the corresponding bundles of characteristics. The terms are equivalent, according to ISO, if and only if they denote one and the same concept.
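The ISO picture just described can be sketched directly: a concept as a set of characteristics, and term equivalence as identity of the denoted sets. All terms and characteristic names below are illustrative:

```python
# Minimal sketch of the ISO view: a concept is a combination (here, a
# frozenset) of characteristics; two terms are equivalent if and only if
# they denote one and the same concept.

concepts = {
    "red ball":   frozenset({"red", "spherical"}),
    "roter Ball": frozenset({"red", "spherical"}),   # German term, same concept
    "red cube":   frozenset({"red", "cubical"}),
}


def equivalent(term_a, term_b):
    """Terms are equivalent iff they denote the same bundle of characteristics."""
    return concepts[term_a] == concepts[term_b]


print(equivalent("red ball", "roter Ball"))  # True
print(equivalent("red ball", "red cube"))    # False
```

What this sketch makes vivid is also what the criticism below targets: nothing in it explains how the sets of characteristics themselves are accessed or fixed independently of the languages in which the terms occur.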

What ISO leaves out of account-and what is left out of account by the ontologists who have been inspired by ISO-is the question of how we gain access to such concepts, entities that are alleged to exist at some language-independent level. Note, too, that ISO's own approach to standardization does not consistently follow an approach on the level of concepts of this sort. ISO Standard 3166-1, for example, defines a widely used set of codes for identifying countries and related entities. Currently ISO 3166-1 assigns official two-letter codes to 249 countries, dependent territories, and areas of geographical interest. The code assigned to France, for example, is ISO 3166-2:FR. And the code is assigned to France itself-to the country that is otherwise referred to as Frankreich or Ranska. It is not assigned to the concept of France (whatever that might be).

The Concept Orientation

We do not deny that mental representations have a role to play in the world of ontologies. When, for example, human biocurators use an ontology to tag data or literature or museum catalogs, then they will have certain thoughts or images in their minds. And if concept is used to refer to their understanding of the meanings of the terms they are using, then they can also be said to have concepts in their minds. Doctors, similarly, can be said to have concepts in their minds when diagnosing patients. Indeed, when a doctor misdiagnoses a patient then it is tempting to say that there was only the concept in his mind-and that there was nothing on the side of the patient to which this concept would correspond.

For this and other reasons, including the influence of ISO, the view of ontologies as representations of concepts has predominated especially in the field of medical or health informatics. More recently, however, this concept orientation has been challenged by the realist orientation that is defended here. The goal of ontology for the realist is not to describe the concepts in people's heads. Rather, ontology is an instrument of science, and the ontologist, like the scientist, is interested in terms or labels or codes-all of which are seen as linguistic entities-only insofar as they represent entities in reality. The goal of ontology is to describe and adequately represent those structures of reality that correspond to the general terms used by scientists.