The Semantic Web: An Introduction
The Semantic Web is a mesh of information linked up in such a way as to be easily processable by machines, on a global scale. You can think of it as being an efficient way of representing data on the World Wide Web, or as a globally linked database.The Semantic Web was thought up by Tim Berners-Lee, inventor of the WWW, URIs, HTTP, and HTML. There is a dedicated team of people at the World Wide Web consortium (W3C) working to improve, extend and standardize the system, and many languages, publications, tools and so on have already been developed. However, Semantic Web technologies are still very much in their infancies, and although the future of the project in general appears to be bright, there seems to be little consensus about the likely direction and characteristics of the early Semantic Web.
What’s the rationale for such a system? Data that is geneally hidden away in HTML files is often useful in some contexts, but not in others. The problem with the majority of data on the Web that is in this form at the moment is that it is difficult to use on a large scale, because there is no global system for publishing data in such a way as it can be easily processed by anyone. For example, just think of information about local sports events, weather information, plane times, Major League Baseball statistics, and television guides… all of this information is presented by numerous sites, but all in HTML. The problem with that is that, is some contexts, it is difficult to use this data in the ways that one might want to do so.
So the Semantic Web can be seen as a huge engineering solution… but it is more than that. We will find that as it becomes easier to publish data in a repurposable form, so more people will want to pubish data, and there will be a knock-on or domino effect. We may find that a large number of Semantic Web applications can be used for a variety of different tasks, increasing the modularity of applications on the Web. But enough subjective reasoning… onto how this will be accomplished.
The Semantic Web is generally built on syntaxes which use URIs to represent data, usually in triples based structures: i.e. many triples of URI data that can be held in databases, or interchanged on the world Wide Web using a set of particular syntaxes developed especially for the task. These syntaxes are called “Resource Description Framework” syntaxes.
URI - Uniform Resource Identifier
A URI is simply a Web identifier: like the strings starting with “http:” or “ftp:” that you often find on the World Wide Web. Anyone can create a URI, and the ownership of them is clearly delegated, so they form an ideal base technology with which to build a global Web on top of. In fact, the World Wide Web is such a thing: anything that has a URI is considered to be “on the Web”.
The syntax of URIs is carefully governed by the IETF, who published RFC 2396 as the general URI specification. The W3C maintains a list of URI schemes.
DF - Resource Description Framework
A triple can simply be described as three URIs. A language which utilises three URIs in such a way is called RDF: the W3C have developed an XML serialization of RDF, the “Syntax” in the RDF Model and Syntax recommendation. RDF XML is considered to be the standard interchange format for RDF on the Semantic Web, although it is not the only format. For example, Notation3 (which we shall be going through later on in this article) is an excellent plain text alternative serialization.
Once information is in RDF form, it becomes easy to process it, since RDF is a generic format, which already has many parsers. XML RDF is quite a verbose specification, and it can take some getting used to (for example, to learn XML RDF properly, you need to understand a little about XML and namespaces beforehand…), but let’s take a quick look at an example of XML RDF right now:-
<rdf:RDF xmlns:rdf=”http://www.w3.org/1999/02/22-rdf-syntax-ns#”
xmlns:dc=”http://purl.org/dc/elements/1.1/”
xmlns:foaf=”http://xmlns.com/0.1/foaf/” >
<rdf:Description rdf:about=”">
<dc:creator rdf:parseType=”Resource”>
<foaf:name>Sean B. Palmer</foaf:name>
</dc:creator>
<dc:title>The Semantic Web: An Introduction</dc:title>
</rdf:Description>
</rdf:RDF>
This piece of RDF basically says that this article has the title “The Semantic Web: An Introduction”, and was written by someone whose name is “Sean B. Palmer”. Here are the triples that this RDF produces:-
<> <http://purl.org/dc/elements/1.1/creator> _:x0 .
this <http://purl.org/dc/elements/1.1/title> “The Semantic Web: An Introduction” .
_:x0 <http://xmlns.com/0.1/foaf/name> “Sean B. Palmer” .
This format is actually a plain text serialization of RDF called “Notation3″, which we shall be covering later on. Note that some people actually prefer using XML RDF to Notation3, but it is generally accepted that Notation3 is easier to use, and is of course convertable to XML RDF anyway.
Source: Infomesh
