RDF, Semantic Web, and Semantic Databases


RDF (Resource Description Framework) is a model for representing data, and more specifically meaning, on the web. It differs from XML in that XML is a markup language for adding tags to otherwise unstructured data, whereas RDF is a model for expressing data, entities, and their relationships. RDF can be serialized as XML (RDF/XML), although many other serializations are possible. For a discussion about the differences between XML and RDF see here. To a first approximation, XML is for data and RDF is for meaning.

R (Resource)
  • Can be anything
  • Must be uniquely identified and referenceable
  • Identified simply by a URI (Uniform Resource Identifier)
D (Description)
  • Description of resources
  • Represents properties of resources and relationships among them
  • Relationships can be represented as graphs
F (Framework)
  • A combination of web-based protocols (URI, HTTP, XML...)
  • Based on formal models (semantics)
  • Defines all allowed relationships among resources

At the heart of RDF is the assertion, a subject-predicate-object relationship. For example, some facts about Bloomington could be expressed very simply in RDF text format as:
Bloomington is_a City
Bloomington has_population 81381
 
We can then have assertions that relate to these, for instance
David_Wild lives_in Bloomington
Each of these assertions is called an RDF triple (subject-predicate-object). But note that there is a lot of ambiguity here: the data is still too unstructured to be useful. How do we define a city (versus a town, etc.)? What if we really meant Bloomington, Illinois, not Bloomington, Indiana, in some of these? What if there are two David Wilds?

This relates closely to the problem in relational databases of needing a primary key to uniquely identify each entity/tuple. On the web, we also need to uniquely identify each entity, and we do that using Uniform Resource Identifiers (URIs). URIs usually look a lot like URLs, but they need not map to an actual web address. Let's say I own the domain uri123.com. I could then use URIs to uniquely identify each of the entities, and use these in the RDF. The URI may "point" to a description of the resource. So we may define the following:
http://uri123.com/popcenters/City
http://uri123.com/popcenters/Town
http://uri123.com/states/Indiana
http://uri123.com/popcenters/has_population
http://uri123.com/popcenters/lives_in
http://uri123.com/cities/Indiana/Bloomington
http://uri123.com/people/David_Wild
So we can then make an unambiguous statement (within a particular namespace) such as:
http://uri123.com/people/David_Wild   http://uri123.com/popcenters/lives_in   http://uri123.com/cities/Indiana/Bloomington
To make this less verbose, we can specify a default RDF namespace. For more on this, follow through the RDF-Primer example.
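
As a rough illustration of what a namespace buys us, the sketch below stores the triples as plain Python tuples of full URIs and prints them in a shortened prefix:local form. The prefix labels (people, pop, in) are made up for this example and are not part of any standard.

```python
# Namespace taken from the uri123.com example in the text.
BASE = "http://uri123.com/"

# Each assertion is a (subject, predicate, object) tuple of full URIs.
# The population is kept as a plain literal value.
triples = [
    (BASE + "people/David_Wild",
     BASE + "popcenters/lives_in",
     BASE + "cities/Indiana/Bloomington"),
    (BASE + "cities/Indiana/Bloomington",
     BASE + "popcenters/has_population",
     "81381"),
]

# Prefix table: short label -> namespace URI (labels are illustrative only).
PREFIXES = {
    "people": BASE + "people/",
    "pop": BASE + "popcenters/",
    "in": BASE + "cities/Indiana/",
}


def shorten(term):
    """Return prefix:local if a known namespace matches, else the term unchanged."""
    for label, namespace in PREFIXES.items():
        if term.startswith(namespace):
            return label + ":" + term[len(namespace):]
    return term


for s, p, o in triples:
    print(shorten(s), shorten(p), shorten(o))
```

The full URIs remain the real identifiers; the short form is only a display and typing convenience.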

Let's take an example

David Wild has an email address, djwild@indiana.edu, and writes a blog at http://allhazards.blogspot.com. Let's see how this looks as a graph in the RDF Validator:
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:pers="http://cheminfov.informatics.indiana/Personal#">
<rdf:Description rdf:about="http://cheminfov.informatics.indiana/DavidWild">
<pers:hasEmail rdf:resource="mailto:djwild@indiana.edu" />
</rdf:Description>
<rdf:Description rdf:about="http://cheminfov.informatics.indiana/DavidWild">
<pers:writesBlog rdf:resource="http://allhazards.blogspot.com" />
</rdf:Description>
</rdf:RDF>
 
or, equivalently, with both properties under a single rdf:Description:
 
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:pers="http://cheminfov.informatics.indiana/Personal#">
<rdf:Description rdf:about="http://cheminfov.informatics.indiana/DavidWild">
<pers:hasEmail rdf:resource="mailto:djwild@indiana.edu" />
<pers:writesBlog rdf:resource="http://allhazards.blogspot.com" />
</rdf:Description>
</rdf:RDF>
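
To check that the two serializations really carry the same assertions, here is a minimal sketch that pulls triples out of each with Python's standard XML parser. It handles only the simple pattern used in this example (rdf:about on the description, rdf:resource on each property) and is not a general RDF/XML parser. The email is written as a mailto: URI, since rdf:resource expects a URI reference, and the XML declaration is left out to keep the strings short.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

SPLIT = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:pers="http://cheminfov.informatics.indiana/Personal#">
<rdf:Description rdf:about="http://cheminfov.informatics.indiana/DavidWild">
<pers:hasEmail rdf:resource="mailto:djwild@indiana.edu" />
</rdf:Description>
<rdf:Description rdf:about="http://cheminfov.informatics.indiana/DavidWild">
<pers:writesBlog rdf:resource="http://allhazards.blogspot.com" />
</rdf:Description>
</rdf:RDF>"""

COMBINED = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:pers="http://cheminfov.informatics.indiana/Personal#">
<rdf:Description rdf:about="http://cheminfov.informatics.indiana/DavidWild">
<pers:hasEmail rdf:resource="mailto:djwild@indiana.edu" />
<pers:writesBlog rdf:resource="http://allhazards.blogspot.com" />
</rdf:Description>
</rdf:RDF>"""


def extract_triples(rdf_xml):
    """Collect (subject, predicate, object) from top-level rdf:Description elements."""
    root = ET.fromstring(rdf_xml)
    triples = set()
    for desc in root.findall("{%s}Description" % RDF_NS):
        subject = desc.get("{%s}about" % RDF_NS)
        for prop in desc:
            # ElementTree names elements as {namespace}local; joining the
            # two parts gives the predicate URI.
            predicate = prop.tag[1:].replace("}", "", 1)
            obj = prop.get("{%s}resource" % RDF_NS)
            triples.add((subject, predicate, obj))
    return triples


for triple in sorted(extract_triples(COMBINED)):
    print(triple)
print(extract_triples(SPLIT) == extract_triples(COMBINED))
```

Both documents reduce to the same set of two triples, which is why the RDF model, and not the XML layout, is what matters.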
 

Making RDF Useful


The "Semantic Web" is really, from a technical perspective, RDF, plus three other technologies that make RDF really useful:

Triple Stores, for storing databases of RDF (equivalent of an RDBMS)
SPARQL, for searching RDF (equivalent of SQL)
Ontologies (in OWL), for describing and mapping RDF data

How Semantic Databases relate to Relational Databases


There are two main differences between relational databases and triple stores:

First, semantic databases separate the data from its structure, whilst relational databases tightly couple the two. This makes it easy to add new cross-silo structure, and to develop tools and algorithms that are not tied to the structure of a particular silo. For instance, you can merge several datasets and easily map dataset-level descriptions to higher-level ontologies (e.g. "this is an Amazon book; this is a Google Book; they are both books"). We can then issue intuitive, dataset-independent queries in one statement, e.g. "find me all of the books written by J.K. Rowling".
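
A toy sketch of this idea follows. Two small triple lists stand in for the two silos, a third holds the ontology mapping, and merging is nothing more than list concatenation. Every identifier here (amazon:item42, google:vol7, ex:Book, ex:author and so on) is hypothetical, and the subclass handling covers only one level, which is far less than a real OWL reasoner does.

```python
# All identifiers below are hypothetical and exist only for illustration.
amazon = [
    ("amazon:item42", "rdf:type", "amazon:Book"),
    ("amazon:item42", "ex:author", "J.K. Rowling"),
]
google = [
    ("google:vol7", "rdf:type", "google:Book"),
    ("google:vol7", "ex:author", "J.K. Rowling"),
    ("google:vol9", "rdf:type", "google:Book"),
    ("google:vol9", "ex:author", "Jane Austen"),
]
# The mapping to a shared, higher-level class is itself just more triples.
ontology = [
    ("amazon:Book", "rdfs:subClassOf", "ex:Book"),
    ("google:Book", "rdfs:subClassOf", "ex:Book"),
]

# Merging the datasets needs no schema migration: concatenate the triples.
store = amazon + google + ontology


def instances_of(cls, triples):
    """Subjects typed as cls directly or through one rdfs:subClassOf step."""
    classes = {cls} | {
        s for s, p, o in triples if p == "rdfs:subClassOf" and o == cls
    }
    return {s for s, p, o in triples if p == "rdf:type" and o in classes}


def books_by(author, triples):
    """All ex:Book instances, from any dataset, with the given author."""
    books = instances_of("ex:Book", triples)
    return sorted(
        s for s, p, o in triples
        if p == "ex:author" and o == author and s in books
    )


print(books_by("J.K. Rowling", store))
```

The query function never mentions Amazon or Google; it relies only on the shared ex:Book class, which is the point of the mapping.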

Second, a semantic database is a network database: all the RDF triples, in aggregate, form a (usually huge) network, or graph, of nodes (subjects and objects) and edges (predicates). This is a hugely important property, as it enables all kinds of interesting searching and prediction on the graph (for example, shortest path, subgraph isomorphism, and so on).
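
To make the graph view concrete, the sketch below turns a handful of triples into an adjacency map and runs a breadth-first search for a shortest path between two nodes. The predicates located_in and part_of are invented for the example, and edges are treated as undirected, which is a simplification.

```python
from collections import deque

# Toy triples; located_in and part_of are hypothetical predicates.
triples = [
    ("David_Wild", "lives_in", "Bloomington"),
    ("Bloomington", "is_a", "City"),
    ("Bloomington", "located_in", "Indiana"),
    ("Indiana", "part_of", "USA"),
]


def build_graph(triples):
    """Map each node to its neighbours, ignoring edge direction and label."""
    graph = {}
    for s, _predicate, o in triples:
        graph.setdefault(s, set()).add(o)
        graph.setdefault(o, set()).add(s)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns a list of nodes, or None if unreachable."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in sorted(graph[node]):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


graph = build_graph(triples)
print(shortest_path(graph, "David_Wild", "USA"))
```

A production triple store would use indexes and keep the predicate on each edge, but the principle of walking from node to node along triples is the same.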

See article: Will triple stores replace relational databases?

Searching with SPARQL

Let's look at the power of SPARQL.
I want to find the actors in the movies in which two popular male actors, Arnold Schwarzenegger and Sylvester Stallone, worked together.
Try googling it? Not so easy. Now let's see what the Semantic Web offers.

Now let's go to the SPARQL query editor and run the query given below:
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX : <http://dbpedia.org/resource/>
PREFIX dbpedia2: <http://dbpedia.org/property/>
PREFIX dbpedia: <http://dbpedia.org/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
 
SELECT * WHERE {
?film dbpedia2:starring :Arnold_Schwarzenegger.
?film dbpedia2:starring :Sylvester_Stallone.
?film dbpedia2:starring ?actors.
}
 
ORDER BY ?film
What do you get?
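
To see what the query engine is doing with those three patterns, here is a small sketch of pattern matching over an in-memory triple list. Terms starting with ? are variables, and a variable that appears in several patterns must take the same value in all of them, which is what forces ?film to be a film starring both actors. The film data is made up (Film_A, Film_B, Actor_X), and a real SPARQL engine is far more sophisticated than this nested loop.

```python
# Made-up data: Film_A stars both actors, Film_B only one of them.
triples = [
    ("Film_A", "starring", "Arnold_Schwarzenegger"),
    ("Film_A", "starring", "Sylvester_Stallone"),
    ("Film_A", "starring", "Actor_X"),
    ("Film_B", "starring", "Arnold_Schwarzenegger"),
    ("Film_B", "starring", "Actor_Y"),
]

# The three triple patterns from the WHERE clause of the query.
patterns = [
    ("?film", "starring", "Arnold_Schwarzenegger"),
    ("?film", "starring", "Sylvester_Stallone"),
    ("?film", "starring", "?actors"),
]


def match(pattern, triple, binding):
    """Extend binding so that pattern matches triple; return None on conflict."""
    new_binding = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable keeps its earlier value if it already has one.
            if new_binding.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new_binding


def query(patterns, triples):
    """Join the patterns one at a time, carrying variable bindings along."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


for row in query(patterns, triples):
    print(row["?film"], row["?actors"])
```

Only Film_A survives the first two patterns, so the third pattern lists its full cast, including the two actors we started from.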

Welcome to the Semantic Web

We will work through a couple of DBPedia examples taken from the W3C site using the DBPedia SPARQL End Point

Find 50 example concepts in DBPedia
SELECT DISTINCT ?concept
WHERE {
    ?s a ?concept .
} LIMIT 50
Find all landlocked countries with a population greater than 15 million
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX type: <http://dbpedia.org/class/yago/>
PREFIX prop: <http://dbpedia.org/property/>
SELECT ?country_name ?population
WHERE {
    ?country a type:LandlockedCountries ;
             rdfs:label ?country_name ;
             prop:populationEstimate ?population .
    FILTER (?population > 15000000) .
}
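
Queries like these do not have to be pasted into a web form. The SPARQL protocol lets a client send the query text as the query parameter of an HTTP GET request, so a sketch of building such a request with Python's standard library might look like the following. The endpoint URL is an assumption (the text does not give one), and the block only builds the request; the commented lines at the end show how it could be sent when network access is available.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint URL; substitute the endpoint you are actually using.
ENDPOINT = "https://dbpedia.org/sparql"

QUERY = """SELECT DISTINCT ?concept
WHERE {
    ?s a ?concept .
} LIMIT 50"""


def build_request(endpoint, query):
    """Build a GET request carrying the query in the 'query' parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for results in the standard SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request(ENDPOINT, QUERY)
print(request.full_url)

# To run the query (requires network access):
#   import json, urllib.request
#   with urllib.request.urlopen(request) as response:
#       rows = json.load(response)["results"]["bindings"]
```

Requesting JSON through the Accept header, instead of an endpoint-specific format parameter, keeps the sketch closer to the protocol than to any one server.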
Let's try some other examples using IU's own Chem2Bio2RDF SPARQL Endpoint. Note the use of the OWL ontology.

What are the side effects of the diabetes drug Troglitazone?
PREFIX c2b2r: <http://chem2bio2rdf.org/chem2bio2rdf.owl#>
PREFIX bp: <http://www.biopax.org/release/biopax-level3.owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
 
SELECT *
FROM <http://chem2bio2rdf.org/owl#>
WHERE
{
?chemical rdfs:label "Troglitazone"^^xsd:string ;
          c2b2r:causeSideEffect [bp:name ?side_effect] .
}
What are the diseases that can be treated by Troglitazone?
PREFIX c2b2r: <http://chem2bio2rdf.org/chem2bio2rdf.owl#>
PREFIX bp: <http://www.biopax.org/release/biopax-level3.owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
 
SELECT *
FROM <http://chem2bio2rdf.org/owl#>
WHERE
{
?chemical rdfs:label "Troglitazone"^^xsd:string ;
          c2b2r:treatDisease [bp:name ?disease] .
}
Which drugs interact with Troglitazone, and what are their effects?
PREFIX c2b2r: <http://chem2bio2rdf.org/chem2bio2rdf.owl#>
PREFIX bp: <http://www.biopax.org/release/biopax-level3.owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
 
SELECT *
FROM <http://chem2bio2rdf.org/owl#>
WHERE
{
?chemical rdfs:label "Troglitazone"^^xsd:string ;
          c2b2r:hasDrugDrugInteraction [c2b2r:hasPart [bp:name ?name];
                                        c2b2r:description ?description] .
FILTER (str(?name)!="Troglitazone") .
}

How it's coming together


LinkedOpenData - see Vimeo Video
Google Knowledge Graph
Facebook Graph Search
Web 1.0-Web 3.0
OpenPHACTS

Semantic searching is going mainstream. Next is semantic reasoning. Here's what could be coming....



And this is the progress we have made (not quite the Semantic Web yet)



Here is Tim Berners-Lee on the next generation of the Web




IU is researching this too - see tools at http://djwild.info

To learn much more about the Semantic Web, check out Semantic University