RDF: Resource Description Framework

The RDF data model is similar to classical conceptual modeling approaches (such as entity–relationship or class diagrams). It is based on the idea of making statements about resources (in particular web resources) in expressions of the form subject–predicate–object, known as triples. The subject denotes the resource; the predicate denotes traits or aspects of the resource, and expresses a relationship between the subject and the object.

For example, one way to represent the notion "The sky has the color blue" in RDF is as the triple: a subject denoting "the sky", a predicate denoting "has the color", and an object denoting "blue". Therefore, RDF uses subject instead of object (or entity) in contrast to the typical approach of an entity–attribute–value model in object-oriented design: entity (sky), attribute (color), and value (blue).

RDF is an abstract model with several serialization formats, which are essentially specialized notations. In addition, the particular encoding for resources or triples can vary from format to format.

A collection of RDF statements intrinsically represents a labeled (predicates), directed multigraph. This makes an RDF data model better suited to certain kinds of knowledge representation than other relational or ontological models.
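
As a rough illustration of the graph view, the sketch below stores a few statements as plain Python tuples and indexes them as a labeled, directed multigraph. The `ex:` names are made up for the example, and a real application would normally use an RDF library rather than bare tuples.

```python
from collections import defaultdict

# Each statement is a (subject, predicate, object) tuple.
# The ex: identifiers are hypothetical names used only for illustration.
triples = [
    ("ex:sky", "ex:hasColor", "ex:blue"),
    ("ex:sky", "ex:isAbove", "ex:earth"),
    ("ex:sky", "ex:surrounds", "ex:earth"),  # second edge between the same two nodes
]

# Index the triples as a directed multigraph: node -> list of (edge label, target node).
graph = defaultdict(list)
for s, p, o in triples:
    graph[s].append((p, o))

# Two distinct, differently labeled edges connect ex:sky to ex:earth,
# which is what makes this a multigraph rather than a simple graph.
edges_to_earth = [p for p, o in graph["ex:sky"] if o == "ex:earth"]
print(edges_to_earth)
```

The predicate plays the role of the edge label, and the direction of each edge runs from subject to object.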

Getting Started with RDF Data

The Resource Description Framework (RDF) is the base language of the Semantic Web. It's a language used for describing data, metadata, and even other data languages. RDF uses a graph data format, in contrast to relational data formats (such as most databases) and hierarchical data formats (such as XML). Any data model or data language that uses RDF is a part of the Semantic Web.

The RDF graph is based on the idea that every data item should have a unique Web identifier, called a URI (Uniform Resource Identifier), and that every data item can be connected to every other item. A URI is different from a URL (Uniform Resource Locator) in that a URI may refer to either a Web name or a location; a URL may refer only to actual Web locations. RDF makes URI relationships between data items the central attribute of the overall data model. Semantic Web programmers create data with URIs and link them together using relationships that are also named with URIs. In this way, an interconnected set of data may be distributed at global scale across the Internet.

Making a statement (or two!)

In the listing below, you can see a basic RDF structure (in RDF/XML). Like anybody who is learning a new programming language, you may find it difficult to understand all the syntax at first, but don't be intimidated. After you get past some of the initial syntax questions, you'll see that RDF can be pretty easy.

<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/">

  <rdf:Description rdf:about="http://me.jtpollock.us/">
    <dc:title>Jeff’s Homepage!</dc:title>
  </rdf:Description>
  <rdf:Description rdf:about="http://me.jtpollock.us/">
    <dc:creator
      rdf:resource="http://me.jtpollock.us/foaf.rdf#me"/>
  </rdf:Description>
</rdf:RDF>

When you load the data structure in the listing above into any RDF-capable system, you get two new RDF data items. Each item is saying something about my relationship to a Web page. These RDF data items are usually called triples, or statements. (You can use the terms interchangeably.)
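
To make the "two triples" concrete, here is a minimal sketch that pulls them out of the listing using only Python's standard XML parser. The helper name `naive_triples` is made up for this example, and it understands only the simple `rdf:Description` pattern shown here; it is not a general RDF/XML parser.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

rdf_xml = """<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://me.jtpollock.us/">
    <dc:title>Jeff's Homepage!</dc:title>
  </rdf:Description>
  <rdf:Description rdf:about="http://me.jtpollock.us/">
    <dc:creator rdf:resource="http://me.jtpollock.us/foaf.rdf#me"/>
  </rdf:Description>
</rdf:RDF>"""

def naive_triples(xml_text):
    """Extract triples from the simple rdf:Description pattern used above.

    Hypothetical helper: it handles only this one document shape.
    """
    triples = []
    root = ET.fromstring(xml_text)
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree reports tags as "{namespace}local"; join them into one URI.
            predicate = prop.tag[1:].replace("}", "", 1)
            # An rdf:resource attribute means the object is a URI; otherwise use the text.
            obj = prop.get(f"{{{RDF_NS}}}resource") or prop.text
            triples.append((subject, predicate, obj))
    return triples

for triple in naive_triples(rdf_xml):
    print(triple)
```

Each printed tuple corresponds to one statement: the page URI as subject, a Dublin Core property as predicate, and either a literal title or another URI as object.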

The first RDF statement is

<rdf:Description rdf:about="http://me.jtpollock.us/">
  <dc:title>Jeff’s Homepage!</dc:title>
</rdf:Description>

It says that there is a Web page at the address http://me.jtpollock.us/, the title of which is Jeff's Homepage!

The second RDF statement is similar:

<rdf:Description rdf:about="http://me.jtpollock.us/">
  <dc:creator
    rdf:resource="http://me.jtpollock.us/foaf.rdf#me"/>
</rdf:Description>

This statement says that there is a Web page at http://me.jtpollock.us/ whose creator is http://me.jtpollock.us/foaf.rdf#me. With RDF, every part of a statement may be a URI that points to another location. In this statement, the creator data simply points to another RDF resource that has a collection of data about me.

When taken together, these RDF triples provide two individual statements about the relationship between a particular Web page and some other data resources on the Web. The first triple simply names the page with a title, and the second triple identifies the creator by pointing to another set of data that describes me.

You can try validating this RDF yourself. Validating RDF is a lot like validating XML, HTML, or any other programming language: The validator simply checks to see whether there are any issues with your code. Navigate your Web browser to the W3C (World Wide Web Consortium) RDF validation service at www.w3.org/RDF/Validator. At this site, type in the code in the Listing, choose the Triples and Graph setting from the Web page, and click the Parse RDF button. You should see results that look like... surprise! You'll see how the W3C Validation Service can take the example code you've copied, identify the two individual triples, and build a simple picture of how they're related in a graph.

Behold: A federated data graph

The simple set of two RDF triples in the listing is actually a somewhat sophisticated data graph. The syntax of the RDF example uses several keywords to reference other parts of the example and also other parts of the Web. RDF statements that reference data vocabularies hosted in other parts of the Web are called federated graphs. Take, for example, the following statements:

xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dc="http://purl.org/dc/elements/1.1/"

These two lines of code declare prefix variables that may be used elsewhere in the example. For instance, the xmlns prefix stands for XML Namespace, and you can use this keyword to create short-hand variables throughout your RDF documents. Throughout this chapter, I refer to it as simply a namespace.

In the example, you can see that the keyword prefix rdf is made equal to the URI http://www.w3.org/1999/02/22-rdf-syntax-ns#. The keyword dc is declared equal to the URI http://purl.org/dc/elements/1.1/.

With the RDF example shown in the listing, you can see tags that contain the two keyword prefixes dc and rdf. Whenever you see the shorthand prefix, you know that the computer will replace the shorthand with a fully qualified statement. For example, the tag <dc:creator> will be interpreted as <http://purl.org/dc/elements/1.1/creator>. This method is how you link data and data semantics across the Web.
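
The prefix substitution described above can be sketched in a few lines of Python. The function name `expand` and the `PREFIXES` table are assumptions made for this illustration; real RDF tools perform the same expansion internally.

```python
# Prefix-to-namespace table taken from the xmlns declarations in the listing.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def expand(qname, prefixes=PREFIXES):
    """Replace the shorthand prefix of a name like 'dc:creator' with its namespace URI."""
    prefix, _, local = qname.partition(":")
    if prefix not in prefixes:
        raise KeyError(f"undeclared prefix: {prefix!r}")
    return prefixes[prefix] + local

print(expand("dc:creator"))
print(expand("rdf:about"))
```

Because the expansion is plain string concatenation, two documents that declare different prefixes for the same namespace still end up referring to the same URI.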

The word creator means something to a human, but it doesn't mean anything to an XML parser. Typically, without the Semantic Web, a programmer would have to encode specific matching logic in a software program to interpret and react to the word creator when it appears in data. But in the Semantic Web, words can be defined as part of a vocabulary, providing context, definitions, and a model for interpreting the meaning of those words.

In the example, the word creator is defined to be a part of the XML Namespace http://purl.org/dc/elements/1.1/. This URI is a directory service that points to a vocabulary about publishing provided by the Dublin Core initiative.

Thus, the definition of the word creator is provided by the Dublin Core vocabulary.

...

Reification

It is not unusual for someone who is building a model in RDF for the first time to feel a bit limited by the simple subject/predicate/object form of the RDF triple. They don't want to just say that Shakespeare wrote Hamlet; they want to qualify this statement and say that Shakespeare wrote Hamlet in 1601, or that Wikipedia states that Shakespeare wrote Hamlet in 1601. In general, these are cases in which it is, or at least seems, desirable to make a statement about another statement. This process is called reification.

Reification is not a problem specific to Semantic Web modeling; the same issue arises in other data modeling contexts like relational databases and object systems. In fact, one approach to reification in the Semantic Web is to simply borrow the standard solution that is commonly used in relational database schemas, using the conventional mapping from relational tables to RDF. In a relational database, it is possible to simply create a table with more columns to hold additional information about a statement. So the statement Shakespeare wrote Hamlet is expressed in a single row of a table, where there is a column for the author of a work and another column for its title. Any further information about this event is stored in another column. When this is converted to RDF, the row is represented by a number of triples, one triple per column in the database. The subject of all of these triples is the same: a single resource that corresponds to the row in the table.

This approach works well for examples like Shakespeare wrote Hamlet in 1601, in which we want to express more information about some event or statement. It doesn't work so well in cases like Wikipedia says Shakespeare wrote Hamlet, in which we are expressing information about the statement itself, Shakespeare wrote Hamlet. This kind of metadata about statements often takes the form of provenance (information about the source of a statement, as in this example), likelihood (expressed in some quantitative form like probability, such as It is 90 percent probable that Shakespeare wrote Hamlet), context (specific information about a project setting in which a statement holds, such as Kenneth Branagh played Hamlet in the movie), or time frame (Hamlet plays on Broadway January 11 through March 12). In such cases, it is useful to explicitly make a statement about a statement. This process, called explicit reification, is supported by the W3C RDF standard with three resources called rdf:subject, rdf:predicate, and rdf:object.

Let's take the example of Wikipedia says Shakespeare wrote Hamlet. Using the RDF standard, we can refer to a triple (q:n1) as follows:

{q:n1, rdf:subject, lit:Shakespeare}
{q:n1, rdf:predicate, lit:wrote}
{q:n1, rdf:object, lit:Hamlet}

Then we can express the relation of Wikipedia to this statement as follows:

{web:Wikipedia, m:says, q:n1}

Notice that just because we have asserted the reification triples about q:n1, it is not necessarily the case that we have also asserted the triple itself:

{lit:Shakespeare, lit:wrote, lit:Hamlet}

This is as it should be; after all, if an application does not trust information from Wikipedia, then it should not behave as though that triple has been asserted. An application that does trust Wikipedia will want to behave as though it had.
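
The distinction between describing a statement and asserting it can be sketched with plain Python sets. The identifiers follow the example above; the helper `statements_said_by` is a hypothetical name introduced only for this illustration.

```python
# Names such as q:n1, lit:Shakespeare and m:says follow the example above.
triples = {
    ("q:n1", "rdf:subject", "lit:Shakespeare"),
    ("q:n1", "rdf:predicate", "lit:wrote"),
    ("q:n1", "rdf:object", "lit:Hamlet"),
    ("web:Wikipedia", "m:says", "q:n1"),
}

def statements_said_by(source, triples):
    """Rebuild the triples that `source` makes claims about via m:says."""
    def value(node, prop):
        # Look up the single value of a reification property for a statement node.
        return next(o for s, p, o in triples if s == node and p == prop)

    return {
        (value(stmt, "rdf:subject"), value(stmt, "rdf:predicate"), value(stmt, "rdf:object"))
        for s, p, stmt in triples
        if s == source and p == "m:says"
    }

claim = ("lit:Shakespeare", "lit:wrote", "lit:Hamlet")
print(claim in triples)  # the reified triple itself has not been asserted

# An application that chooses to trust Wikipedia can add the claims explicitly.
trusted = triples | statements_said_by("web:Wikipedia", triples)
print(claim in trusted)
```

Whether to take the second step is an application-level decision; the reification triples on their own only describe the statement.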

Naming RDF Graphs

Informally, a graph is a diagram with nodes and edges. In RDF, this corresponds directly to a set of triples.

When we manage data sets, we might just refer to all the triples in our data, as we have done with all the examples in this chapter so far. For most situations, this is fine. But we might want to single out a set of triples (i.e., a graph) and give that a name. Since this is the Web, that name will be in the form of a URI. The RDF standard provides a means for doing this—it is called the named graph.

The idea of a named graph is quite simple; we refer to a set of triples with a name, which itself is a URI.

Why would we want to name a graph? There are a few basic use cases:

One file, one graph: So far, we have seen examples of how we can extract RDF data from spreadsheets. We can extract RDF data from other sources as well, and indeed, we can create data natively as RDF. Soon enough, we'll see how to write down RDF data into a plain text file. When we load this data into an RDF data store, we might want to keep data from different sources separate. A convenient way to do this is to put all the data from one source into a single named graph. The name of the graph (as a URI) can even give information as to where we can find that source.
Reification: We have seen the need for higher-order relationships, in which we want to make statements about statements. Named graphs provide another way to accomplish this. We put a set of triples about which we want to make some statement into a named graph, and make the statement about that graph.
Context: Sometimes when we have a set of triples, we would like to consider them in some context; for example, earlier we considered the fact Kenneth Branagh played Hamlet in the movie. In this example, in the movie (where by the movie we are referring specifically to https://www.imdb.com/title/tt0116477/) represents a context for the assertion Kenneth Branagh played Hamlet.

Named graphs are a simple extension to the RDF formalism, and really don't change any of the basics; RDF still links one named resource to another, where each name is global in scope (i.e., on the Web). Named graphs simply allow us to manage sets of these links, and to name them as well. Sometimes when we are using named graphs, we refer to quads instead of triples; this is because it is possible to represent a triple and its graph as a four-tuple. The fourth entry in the quad is usually called the graph, but is sometimes referred to as the context, anticipating a particular use for the named graph.
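
A minimal sketch of the quad idea, using plain Python tuples: the fourth element names the graph, and grouping by it recovers the individual named graphs. All `ex:` identifiers and the `example.org` graph name are hypothetical.

```python
from collections import defaultdict

# Each quad is (subject, predicate, object, graph name).
# All identifiers below are hypothetical and exist only for this illustration.
quads = [
    ("ex:Branagh", "ex:played", "ex:Hamlet", "https://www.imdb.com/title/tt0116477/"),
    ("ex:Shakespeare", "ex:wrote", "ex:Hamlet", "http://example.org/graphs/wikipedia"),
    ("ex:Shakespeare", "ex:wrote", "ex:Macbeth", "http://example.org/graphs/wikipedia"),
]

# Group the triples by the graph they belong to.
named_graphs = defaultdict(set)
for s, p, o, g in quads:
    named_graphs[g].add((s, p, o))

# A statement about a whole graph uses the graph's name as its subject.
metadata = [("http://example.org/graphs/wikipedia", "ex:source", "web:Wikipedia")]

for name, triples in named_graphs.items():
    print(name, len(triples))
```

Note that the `metadata` triple says something about every statement in the graph at once, without reifying any of them individually.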

Vocabulary

The vocabulary discussed in the following four subsections is used as a foundation for RDF Schema, where it is extended.

RDF Classes

rdf:XMLLiteral: the class of XML literal values
rdf:Property: the class of properties
rdf:Statement: the class of RDF statements
rdf:Alt, rdf:Bag, rdf:Seq: containers of alternatives, unordered containers, and ordered containers (rdfs:Container is a super-class of the three)
rdf:List: the class of RDF Lists
rdf:nil: an instance of rdf:List representing the empty list

RDFS Classes

rdfs:Resource: the class resource, everything
rdfs:Literal: the class of literal values, e.g. strings and integers
rdfs:Class: the class of classes
rdfs:Datatype: the class of RDF datatypes
rdfs:Container: the class of RDF containers
rdfs:ContainerMembershipProperty: the class of container membership properties, rdf:_1, rdf:_2, ..., all of which are sub-properties of rdfs:member

RDF Properties

rdf:type: an instance of rdf:Property used to state that a resource is an instance of a class
rdf:first: the first item in the subject RDF list
rdf:rest: the rest of the subject RDF list after rdf:first
rdf:value: idiomatic property used for structured values
rdf:subject: the subject of the RDF statement
rdf:predicate: the predicate of the RDF statement
rdf:object: the object of the RDF statement
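
To show how rdf:first, rdf:rest, and rdf:nil fit together, the following sketch encodes a Python list as a chain of triples. The function name `as_rdf_list` and the generated `_:b0`, `_:b1`, ... labels are assumptions made for this example.

```python
def as_rdf_list(items):
    """Encode a Python list as rdf:first / rdf:rest triples ending in rdf:nil.

    Returns (head, triples). The blank node labels are generated locally
    and are an assumption of this sketch.
    """
    if not items:
        # The empty list is represented by rdf:nil itself.
        return "rdf:nil", []
    labels = [f"_:b{i}" for i in range(len(items))]
    triples = []
    for i, item in enumerate(items):
        # The last cell points to rdf:nil; every other cell points to the next cell.
        rest = labels[i + 1] if i + 1 < len(items) else "rdf:nil"
        triples.append((labels[i], "rdf:first", item))
        triples.append((labels[i], "rdf:rest", rest))
    return labels[0], triples

head, triples = as_rdf_list(["ex:a", "ex:b"])
print(head)
for triple in triples:
    print(triple)
```

Each list cell contributes exactly two triples, so a list of n items needs 2n triples plus whatever triple points at the head.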

Note that rdf:Statement, rdf:subject, rdf:predicate, and rdf:object are used for reification.

RDFS Properties

rdfs:subClassOf: the subject is a subclass of a class
rdfs:subPropertyOf: the subject is a subproperty of a property
rdfs:domain: a domain of the subject property
rdfs:range: a range of the subject property
rdfs:label: a human-readable name for the subject
rdfs:comment: a description of the subject resource
rdfs:member: a member of the subject resource
rdfs:seeAlso: further information about the subject resource
rdfs:isDefinedBy: the definition of the subject resource

RDF Notations aka Serialization Formats

Several common formats are in use, including:

Turtle, a compact, human-friendly format.
TriG, an extension of Turtle to datasets.
N-Triples, a very simple, easy-to-parse, line-based format that is not as compact as Turtle.
N-Quads, a superset of N-Triples, for serializing multiple RDF graphs.
JSON-LD, a JSON-based serialization.
N3 or Notation3, a non-standard serialization that is very similar to Turtle, but has some additional features, such as the ability to define inference rules.
RDF/XML, an XML-based syntax that was the first standard format for serializing RDF.
RDF/JSON, an alternative syntax for expressing RDF triples using a simple JSON notation.
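
To give a feel for why N-Triples is described as easy to parse, here is a minimal sketch of a writer for it in Python. The `IRI` marker class and the function names are assumptions of this example, and the sketch covers only IRIs and plain string literals (no blank nodes, datatypes, or language tags).

```python
class IRI(str):
    """Marker type (an assumption of this sketch) distinguishing IRIs from literals."""

def term(value):
    """Render one term in N-Triples syntax."""
    if isinstance(value, IRI):
        return f"<{value}>"
    # Escape the characters that must be escaped inside a string literal.
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                    .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'

def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples, one statement per line."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)

page = IRI("http://me.jtpollock.us/")
triples = [
    (page, IRI("http://purl.org/dc/elements/1.1/title"), "Jeff's Homepage!"),
    (page, IRI("http://purl.org/dc/elements/1.1/creator"),
     IRI("http://me.jtpollock.us/foaf.rdf#me")),
]
print(to_ntriples(triples))
```

Every line is a complete statement with fully spelled-out IRIs, which is what makes the format simple to process line by line and also less compact than Turtle.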

An XML Notation for RDF: `RDF/XML`

RDF/XML is sometimes misleadingly called simply RDF because it was introduced among the other W3C specifications defining RDF and it was historically the first W3C standard RDF serialization format. However, it is important to distinguish the RDF/XML format from the abstract RDF model itself. Although the RDF/XML format is still in use, other RDF serializations are now preferred by many RDF users, both because they are more human-friendly, and because some RDF graphs are not representable in RDF/XML due to restrictions on the syntax of XML QNames.

Resource Identification

The subject of an RDF statement is either a uniform resource identifier (URI) or a blank node, both of which denote resources. Resources indicated by blank nodes are called anonymous resources. They are not directly identifiable from the RDF statement. The predicate is a URI which also indicates a resource, representing a relationship. The object is a URI, blank node or a Unicode string literal. As of RDF 1.1, resources are identified by Internationalized Resource Identifiers (IRIs); IRIs are a generalization of URIs.

In Semantic Web applications, and in relatively popular applications of RDF like RSS and FOAF (Friend of a Friend), resources tend to be represented by URIs that intentionally denote, and can be used to access, actual data on the World Wide Web. But RDF, in general, is not limited to the description of Internet-based resources. In fact, the URI that names a resource does not have to be dereferenceable at all. For example, a URI that begins with "http:" and is used as the subject of an RDF statement does not necessarily have to represent a resource that is accessible via HTTP, nor does it need to represent a tangible, network-accessible resource; such a URI could represent absolutely anything. However, there is broad agreement that a bare URI (without a # symbol) which returns a 200-level (success) response when used in an HTTP GET request should be treated as denoting the internet resource that it succeeds in accessing.

Therefore, producers and consumers of RDF statements must agree on the semantics of resource identifiers. Such agreement is not inherent to RDF itself, although there are some controlled vocabularies in common use, such as Dublin Core Metadata, which is partially mapped to a URI space for use in RDF. The intent of publishing RDF-based ontologies on the Web is often to establish, or circumscribe, the intended meanings of the resource identifiers used to express data in RDF. For example, the URI:

http://www.w3.org/TR/2004/REC-owl-guide-20040210/wine#Merlot

is intended by its owners to refer to the class of all Merlot red wines by vintner (i.e., instances of the above URI each represent the class of all wine produced by a single vintner), a definition which is expressed by the OWL ontology—itself an RDF document—in which it occurs. Without careful analysis of the definition, one might erroneously conclude that an instance of the above URI was something physical, instead of a type of wine.

Note that this is not a bare resource identifier, but is rather a URI reference, containing the '#' character and ending with a fragment identifier.
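
The split between the bare URI and its fragment identifier can be checked with Python's standard library, as in this small sketch:

```python
from urllib.parse import urldefrag

uri_ref = "http://www.w3.org/TR/2004/REC-owl-guide-20040210/wine#Merlot"

# urldefrag splits a URI reference into the bare URI and its fragment identifier.
bare_uri, fragment = urldefrag(uri_ref)
print(bare_uri)
print(fragment)
```

The bare URI names the ontology document, while the fragment picks out one term defined inside it.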

Blank Nodes

In RDF, a blank node (also called a bnode) is a node in an RDF graph representing a resource for which a URI or literal is not given. It lets us say things about that resource within a knowledge graph without having to name it. The resource represented by a blank node is also called an anonymous resource.

According to the RDF standard a blank node can only be used as subject or object of an RDF triple.

For instance, we might know that John’s mother was Spanish and a skilled cook, but nothing else.

We can represent this information in RDF as follows:

:john :hasMother [ :hasNationality "Spanish" ;
     :isSkilledAt :cooking ] .

This can also be expressed as:

:john :hasMother _:_john_s_mother .

_:_john_s_mother :hasNationality "Spanish" ;
     :isSkilledAt :cooking .

We will then be able to query all the people who have a Spanish mother like this:

SELECT ?person ?mother
WHERE {
?person :hasMother ?mother .
?mother :hasNationality "Spanish" .
}

In John’s case, however, the binding returned for the ?mother variable will not tell us much, as we didn’t know much to begin with. So some triplestores may store _:_john_s_mother as something different, e.g. _:_bnode_123, and they would still be meeting the W3C standard.
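
The behaviour described above can be imitated in plain Python. In this sketch the `BNode` class and the function `people_with_spanish_mother` are hypothetical stand-ins for what a triplestore and SPARQL engine do; the point is only that the blank node's label plays no part in the answer.

```python
class BNode:
    """A blank node: it has identity inside this program but no global name."""
    def __init__(self, label):
        self.label = label  # local label only; a store is free to change it

def people_with_spanish_mother(triples):
    """Mimic the SELECT query above with a simple join over the triples."""
    return [
        (person, mother)
        for person, p1, mother in triples
        if p1 == ":hasMother"
        for s2, p2, nationality in triples
        if s2 == mother and p2 == ":hasNationality" and nationality == "Spanish"
    ]

mother = BNode("_john_s_mother")
triples = [
    (":john", ":hasMother", mother),
    (mother, ":hasNationality", "Spanish"),
    (mother, ":isSkilledAt", ":cooking"),
]

results = people_with_spanish_mother(triples)
print([person for person, _ in results])

# Relabeling the blank node, as a triplestore may do, leaves the answer unchanged.
mother.label = "_bnode_123"
print([person for person, _ in people_with_spanish_mother(triples)])
```

The join matches on the node's identity within the graph, not on its label, which is why the label can be rewritten freely.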

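Keep in mind that a blank node label has no meaning outside the document or store in which it appears, so you cannot use it to look up a specific node. In a SPARQL query pattern, a blank node behaves like a variable, so: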

SELECT ?p ?o
WHERE {
_:_john_s_mother ?p ?o
}

is actually treated exactly like:

SELECT ?p ?o
WHERE {
?s ?p ?o
}

A blank node can also occur as the subject of a triple, written as a pair of empty square brackets. Say a stranger has stolen a car, is blond, and is 175 cm tall:

[] a :Person ;
  :hasCommittedCrime :theft_car_123 ;
  :hasHairColour "Blond" ;
  :hasHeight "175cm".

Statement Reification and Context

The body of knowledge modeled by a collection of statements may be subjected to reification, in which each statement (that is, each subject-predicate-object triple taken as a whole) is assigned a URI and treated as a resource about which additional statements can be made, as in "Jane says that John is the author of document X". Reification is sometimes important in order to deduce a level of confidence or degree of usefulness for each statement.

In a reified RDF database, each original statement, being a resource itself, most likely has at least three additional statements made about it: one to assert that its subject is some resource, one to assert that its predicate is some resource, and one to assert that its object is some resource or literal. More statements about the original statement may also exist, depending on the application's needs.

Borrowing from concepts available in logic, some RDF model implementations acknowledge that it is sometimes useful to group statements according to different criteria, called situations, contexts, or scopes, as discussed in articles by RDF specification co-editor Graham Klyne. For example, a statement can be associated with a context, named by a URI, in order to assert an "is true in" relationship. As another example, it is sometimes convenient to group statements by their source, which can be identified by a URI, such as the URI of a particular RDF/XML document. Then, when updates are made to the source, corresponding statements can be changed in the model as well.

Implementation of scopes does not necessarily require fully reified statements. Some implementations allow a single scope identifier to be associated with a statement that has not itself been assigned a URI. Likewise, named graphs, in which a set of triples is named by a URI, can represent context without the need to reify the triples.

Constraints

In knowledge graphs, and databases more generally, a constraint ensures that data conforms to a certain structure, called a shape.

For instance, when storing data about persons, there might be some fields which are always needed (social security number (SSN), date of birth, etc.), and others that may or may not apply (their spouse’s social security number, their children’s date of birth, etc—they may not have either).

We can impose a constraint to ensure that no person is stored who does not have a social security number, and we will be able to tell our database not to accept any data that violates the constraint.

In knowledge graphs, the most widely used language for constraints is SHACL (Shapes Constraint Language). See the W3C definition for a technical outline of SHACL.

This essentially lets us specify shapes that the data should conform to, and then run a check to verify this. Here is an example:

:PersonShape a sh:NodeShape ;
sh:targetClass :Person ;
sh:property [
  sh:path :hasSocialSecurityNumber ;
  sh:maxCount 1 ;
  sh:minCount 1
] .

This tells us that every member of the class :Person must have exactly one object for the :hasSocialSecurityNumber predicate.

We can go further however, with a shape like this one, which also tells us what datatype the SSN should be, and what it should look like:

:PersonShape a sh:NodeShape ;
sh:targetClass :Person ;
sh:property [
  sh:path :hasSocialSecurityNumber ;
  sh:maxCount 1 ;
  sh:minCount 1 ;
  sh:datatype xsd:string ;
  sh:pattern "^\\d{3}-\\d{2}-\\d{4}$"
] .

That is, the value must consist of 3 digits, followed by a hyphen, then 2 digits, then another hyphen, and finally 4 digits.

If we validate the data against these shapes, we will be able to also find out which constraints were violated.

So if Alice has no SSN, the validation report will tell us so.

If Bob has two SSNs, this will similarly be flagged.

And if Charlie has one SSN but it looks wrong, e.g. "12345-88", this will also be clear from the validation report.
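
The three kinds of violation can be imitated with a hand-written check in Python. This is not a SHACL processor, only a sketch that mirrors the minCount, maxCount, and pattern constraints of the shape; the function name `check_person`, the message wording, and the sample data (including the valid case "Dana") are assumptions made for the example.

```python
import re

SSN_PATTERN = re.compile(r"^\d{3}-\d{2}-\d{4}$")

def check_person(name, ssns):
    """Return a list of violation messages for one person's SSN values.

    This hand-written check mirrors the minCount, maxCount, and pattern
    constraints of the shape above; it is not a SHACL processor.
    """
    violations = []
    if len(ssns) < 1:
        violations.append(f"{name}: minCount violated (no SSN)")
    if len(ssns) > 1:
        violations.append(f"{name}: maxCount violated ({len(ssns)} SSNs)")
    for ssn in ssns:
        if not SSN_PATTERN.match(ssn):
            violations.append(f"{name}: pattern violated ({ssn!r})")
    return violations

# Hypothetical data matching the Alice, Bob, and Charlie cases described above.
people = {
    "Alice": [],
    "Bob": ["123-45-6789", "987-65-4321"],
    "Charlie": ["12345-88"],
    "Dana": ["111-22-3333"],
}

for name, ssns in people.items():
    for message in check_person(name, ssns):
        print(message)
```

A real SHACL engine produces a structured validation report instead of strings, but the information it carries is the same: which node failed, and which constraint it failed.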

Software for Handling RDF

Ubuntu Software for RDF

eye

swish

Swish is a framework for performing deductions in RDF data using a variety of techniques.

Swish is conceived as a toolkit for experimenting with RDF inference, and for implementing stand-alone RDF file processors (usable in similar style to CWM, but with a view to being extensible in declarative style through added Haskell function and data value declarations).

It explores Haskell as "a scripting language for the Semantic Web".

Swish is a work-in-progress, and currently incorporates:

Turtle, Notation3 and NTriples input and output. The N3 support is incomplete (no handling of @forAll).
RDF graph isomorphism testing and merging.
Display of differences between RDF graphs.
Inference operations in forward chaining, backward chaining and proof-checking modes.
Simple Horn-style rule implementations, extendable through variable binding modifiers and filters.
Class restriction rule implementation, primarily for datatype inferences.
RDF formal semantics entailment rule implementation.
Complete, ready-to-run, command-line and script-driven programs.

libjs-n3

The N3.js library is an implementation of the RDF.js low-level specification at https://rdf.js.org/ that lets you handle RDF in JavaScript easily.

It offers:

Parsing triples/quads from Turtle, TriG, N-Triples, N-Quads, and Notation3 (N3)
Writing triples/quads to Turtle, TriG, N-Triples, and N-Quads
Storage of triples/quads in memory

Parsing and writing is:

asynchronous – triples arrive as soon as possible
streaming – streams are parsed as data comes in, so you can parse files larger than memory
fast – by far the fastest spec-compatible parser in JavaScript

raptor

Raptor is a C library providing a set of parsers and serializers for Resource Description Framework (RDF) triples by parsing syntaxes or serializing the triples into a syntax.

The parsing syntaxes are RDF/JSON, RDF/XML, N-Triples, Turtle, GRDDL and RSS tag soup including Atom 0.3 and Atom 1.0. The serializing syntaxes are RDF/XML, N-Quads, N-Triples, RSS 1.0 and Atom 1.0. Raptor can handle RDF/XML as used by RDF applications such as RSS 1.0, FOAF, DOAP, Dublin Core and OWL.

Raptor is designed for performance, flexibility and embedding (no memory leaks) and to closely match the revised RDF/XML specification.

rasqal

Rasqal is a C library providing support for querying the Resource Description Framework (RDF) including parsing query syntaxes, constructing the queries, executing them, returning result bindings and formatting results. It supports the SPARQL RDF Query Language, RDF Data Query Language (RDQL) and LAQRS experimental query language extending SPARQL.

redland

Redland is a library that provides a high-level interface for RDF (Resource Description Framework) implemented in an object-based API. It is modular and supports different RDF/XML parsers, storage mechanisms and other elements. Redland is designed for applications developers to provide RDF support in their applications as well as for RDF developers to experiment with the technology.

serd

Serd is a lightweight C library for RDF syntax which supports reading and writing Turtle and NTriples.

Serd is not intended to be a swiss-army knife of RDF syntax, but rather is suited to resource limited applications, or situations where a simple reader/writer with minimal dependencies is ideal (e.g. in LV2 hosts or plugins).

Serd is:

small: Serd is implemented in under 2500 lines of standard C code.
portable and dependency-free: Serd uses only the C standard library, and has no external dependencies, making it a lightweight dependency in every sense.
fast and lightweight: Serd (and the included serdi tool) can be used to stream abbreviated Turtle (unlike many other tools, which cannot stream since they must first build an internal model to abbreviate). In other words, Serd can re-serialise an unbounded amount of Turtle using a fixed amount of memory, preserving the abbreviations in the input.
conformant and well-tested: Serd is written to the Turtle, NTriples and URI specifications, and includes a comprehensive test suite which includes all the normative examples from the Turtle specification, all the "normal" examples from the URI specification, and additional tests added specifically for Serd. The test suite has over 96% coverage (by line), and runs with zero memory errors or leaks.

sord

Sord is a lightweight C library for storing Resource Description Framework (RDF) data in memory.

Sord includes man pages for the library (man sord) and a simple command line utility (man sordi).

morla

With Morla you can manage multiple RDF documents simultaneously, visualize graphs, use templates for quick writing, and execute SPARQL/RDQL queries.

You can import RDFS documents and use their content to write new RDF triples. Templates are also RDF documents, and they make Morla easily customizable and expandable. You can embed JavaScript code in your templates so you can validate and change user inputs.

perlrdf