
SimpleRDFElement class makes it easier to handle RDF XML

6 Jul 2011 (GPL3)
The SimpleXMLElement class bundled with PHP does not handle namespaces or RDF documents well. This extension class helps.

Introduction

Resource Description Framework (RDF) is a method for expressing "triples" of knowledge (statements in subject-predicate-object format) in a way that is easily serialized as XML. Different "terms" in RDF are defined by different vocabularies that people make available online in RDF schema documents, and the system is specifically designed to let people build on each other's vocabulary definitions: you can assign different vocabularies as namespaces in your RDF XML document and use the terms that they define. As a result, a typical RDF document includes elements from a large number of namespaces. Because of the way that RDF information is represented in XML, it is common for every single element tag to be qualified by a namespace prefix and every parent element to have child elements from multiple namespaces.

This presents a problem when you try to use PHP's SimpleXML extension to parse RDF XML. SimpleXML provides a SimpleXMLElement class that is easy and fun to use, as long as you are not dealing with namespaces. It can be adequate when the namespace handling is very simple: for example, when the child elements of an element all belong to the same namespace. But there is no easy way to get the namespace prefix or namespace URI of a particular element, and it is difficult to handle elements whose children come from multiple namespaces. This makes what should be a very simple task, converting RDF XML into the subject-predicate-object "triples" it represents, mind-numbingly complex.

As a result, I present the SimpleRDFElement class: a class that extends the built-in PHP SimpleXMLElement class with a few extra methods designed to make it easier to use when working with RDF XML.

(Note: The rest of this article is written with the assumption that you are familiar with the basics of RDF, XML, and PHP, and know what terms like "triple", "namespace", and "object method" mean and how they are represented.)

Background

As part of a project that I have been working on, I needed to be able to convert strings of RDF/XML text into objects representing each of the tags (or "nodes") in the XML, and then to determine what RDF triples were represented by those XML elements and their sub-trees. I wanted to make use of the built-in functionality of the SimpleXML module in PHP, but when I tried, I encountered a number of problems. This is just a brief list of some of the issues I encountered when trying to use the SimpleXMLElement class to represent RDF/XML:

  • Because all children of the root element are qualified with namespace prefixes, they cannot be accessed as object properties using the -> operator
  • Because of the way that the child nodes array is created, qualified elements also do not show up when you dump the object with functions like print_r()
  • Because the children() method, when called without arguments, only returns unqualified (i.e., no namespace prefix) elements, it returns nothing for a typical RDF element and so cannot be iterated over
  • As a result, the object appears to be completely empty; it even evaluates as "false" if you use it as a Boolean (e.g., when you add an "or die()" clause after the assignment)
  • When you call the children() method with a namespace argument (a namespace URI, or a prefix if you pass true as the second argument), it only retrieves children (and their sub-children) in that namespace
  • As a result, if you expect the children of an element to come from any of a number of namespaces, you have to iterate over every namespace
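The first few problems can be reproduced with the built-in SimpleXMLElement alone; no SimpleRDFElement is involved here. The tiny document is made up for illustration:

```php
<?php
$rdfNs = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#';

// A minimal RDF document whose only child element carries the rdf: prefix.
$xml = simplexml_load_string(
    '<rdf:RDF xmlns:rdf="' . $rdfNs . '">'
    . '<rdf:Description rdf:about="#bob"/>'
    . '</rdf:RDF>'
);

// children() with no argument only sees elements without a namespace prefix.
$plain = $xml->children()->count();

// Property access fails for the same reason.
$viaProperty = isset($xml->Description);

// Passing the namespace URI finds the qualified child.
$qualified = $xml->children($rdfNs)->count();

printf("%d %s %d\n", $plain, var_export($viaProperty, true), $qualified);
```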

(If you are curious, I have a detailed blog post about some of my failed attempts and problems I was experiencing here: http://talkingowlproject.blogspot.com/2011/06/simplexml-and-namespace-quirks.html.)

After extensive Google searching, I found nothing that fit my needs. Either I could download extensive RDF "frameworks" that require installing a dozen or more PHP class files (but all I want to do is parse an RDF string into triples! I don't need all that!), or I could follow the suggestions of "hacks" that were simply unworkable. (For example, one person suggested replacing the ":" character in the RDF string with "_" in order to get rid of namespaces entirely. This doesn't work because the namespace prefixes in an XML document are arbitrary: they are merely "shortcuts" for the longer URIs declared in the header of the document. Different people can use different prefixes to represent the same namespace URI, and it should not make a difference.)
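The point about arbitrary prefixes can be checked with plain SimpleXML. In this sketch, two made-up documents bind the same RDF namespace URI to different prefixes (rdf and r). Selecting children by URI works for both, while selecting by prefix depends on which prefix the author happened to choose:

```php
<?php
$rdfNs = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#';

// Two documents that differ only in the prefix bound to the RDF namespace.
$a = simplexml_load_string('<rdf:RDF xmlns:rdf="' . $rdfNs . '"><rdf:Description/></rdf:RDF>');
$b = simplexml_load_string('<r:RDF xmlns:r="' . $rdfNs . '"><r:Description/></r:RDF>');

// Selecting by namespace URI is prefix-independent.
$byUriA = $a->children($rdfNs)->count();
$byUriB = $b->children($rdfNs)->count();

// Selecting by prefix (second argument true) only matches the literal prefix.
$byPrefixA = $a->children('rdf', true)->count();
$byPrefixB = $b->children('rdf', true)->count();

printf("by URI: %d %d, by prefix: %d %d\n", $byUriA, $byUriB, $byPrefixA, $byPrefixB);
```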

So I decided to create my own solution as a "lightweight" alternative. It is literally one file with one main class (the SimpleRDFElement class) and one helper class (the SimpleRDFTriple class). All it really does is add a few helper methods to the built-in SimpleXMLElement class in PHP. But these methods make all the difference in the world when you are handling RDF XML.

Because this solution is short and simple, there is a lot it doesn't do. That is on purpose: it is not supposed to do a lot. It is a simple solution to a simple problem. It will let you parse an RDF document as an object and will let you access namespace information. It also gives you a method that will extract triples from the top-level element represented by the object and its direct children. (This method is not recursive, so you will have to do any recursion yourself.) 

I cannot guarantee that it will work correctly for every valid RDF/XML document. However, I am open to making (some) additions and improvements, and to fixing anything that you find broken. Please contact me with your comments, suggestions, and complaints.

Using the Code

This code is a single file that contains two PHP class definitions.

The first class is merely a helper class, SimpleRDFTriple: a class with no methods and three properties, tripleSubject, triplePredicate, and tripleObject. The only reason this class is here is so that the SimpleRDFElement class can have a method, getTriples(), that returns an array of objects of that type.

The second class, SimpleRDFElement, extends the class SimpleXMLElement which is built into PHP as part of the SimpleXML library.

Because the class extends the SimpleXMLElement, you can create a new SimpleRDFElement from a string variable that contains RDF/XML text using the built-in function simplexml_load_string():

PHP
$xmlobj = simplexml_load_string($xmltext,'SimpleRDFElement');

The first parameter is the variable containing the RDF/XML text that you want to parse, and the second parameter is a string: the name of our extended class, SimpleRDFElement. This returns an object of type SimpleRDFElement, which means that it can be manipulated exactly like a SimpleXMLElement object, but that you can also use the new methods provided by the extension class.
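Passing a class name to simplexml_load_string() is standard SimpleXML behaviour, and the child objects SimpleXML hands back are created from the same class, which is what lets every node in the tree use the extra methods. The class below (TagNameElement and its shoutName() method) is a hypothetical toy used only to show the mechanism; it is not part of SimpleRDFElement:

```php
<?php
// Hypothetical toy subclass; not part of SimpleRDFElement.
class TagNameElement extends SimpleXMLElement
{
    // Returns this element's tag name in upper case.
    public function shoutName(): string
    {
        return strtoupper($this->getName());
    }
}

// The second argument names the class to instantiate; it must extend SimpleXMLElement.
$xml = simplexml_load_string('<note><to>Bob</to></note>', 'TagNameElement');

echo get_class($xml), "\n";        // TagNameElement
echo $xml->shoutName(), "\n";      // NOTE
echo get_class($xml->to), "\n";    // TagNameElement: children use the same class
echo $xml->to->shoutName(), "\n";  // TO
```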

The new methods provided by SimpleRDFElement are:

PHP
$xmlobj->getPrefix()

Returns the namespace prefix of the root element of the object, based on the namespace declarations in the XML text.

PHP
$xmlobj->getNamespace()

Returns the full URI of the namespace of the root element of the object, based on the namespace declarations in the XML text.

PHP
$xmlobj->getFullName()

Returns the fully qualified name of the root element, using the prefix-colon-tagname format, e.g., rdfs:Class.

PHP
$xmlobj->getFullURI()

Returns the full URI of the root element, using the expanded URI of the namespace followed by the element tag name, e.g. http://www.w3.org/2000/01/rdf-schema#Class.
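For readers who want to see how such methods could work, here is one possible implementation of the four naming methods. It is a sketch under two assumptions: the DOM extension is available (it is used through dom_import_simplexml() to read the element's prefix and namespace URI), and the class name NamedRdfElement is hypothetical. The author's actual code may do this differently:

```php
<?php
// Sketch only: NamedRdfElement is a hypothetical class, and the DOM
// extension (dom_import_simplexml) is assumed to be available.
class NamedRdfElement extends SimpleXMLElement
{
    // Namespace prefix of this element, or '' if it has none.
    public function getPrefix(): string
    {
        return (string) dom_import_simplexml($this)->prefix;
    }

    // Namespace URI of this element, or '' if it has none.
    public function getNamespace(): string
    {
        return (string) dom_import_simplexml($this)->namespaceURI;
    }

    // Qualified name in prefix:localname form, e.g. "rdfs:Class".
    public function getFullName(): string
    {
        $prefix = $this->getPrefix();
        return $prefix === '' ? $this->getName() : $prefix . ':' . $this->getName();
    }

    // Namespace URI followed by the local name.
    public function getFullURI(): string
    {
        return $this->getNamespace() . $this->getName();
    }
}

$doc = simplexml_load_string(
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"'
    . ' xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">'
    . '<rdfs:Class rdf:about="#Animal"/>'
    . '</rdf:RDF>',
    'NamedRdfElement'
);
$rdfsClass = $doc->children('http://www.w3.org/2000/01/rdf-schema#')->Class;

echo $rdfsClass->getFullName(), "\n";  // rdfs:Class
echo $rdfsClass->getFullURI(), "\n";   // http://www.w3.org/2000/01/rdf-schema#Class
```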

PHP
$xmlobj->getChildNodes()

Returns an array of all of the child elements (as SimpleRDFElement objects) of the current top-level element. Unlike the built-in children() method, this returns all child elements regardless of namespace.
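One way to collect children "regardless of namespace" while keeping document order is to step through the underlying DOM node and convert each element back with simplexml_import_dom(). The function name all_child_elements() is hypothetical, the DOM extension is assumed to be available, and this is not necessarily how getChildNodes() is written:

```php
<?php
// Sketch only: all_child_elements() is a hypothetical helper, not the
// article's getChildNodes(). Requires the DOM extension.
function all_child_elements(SimpleXMLElement $el): array
{
    $children = [];
    foreach (dom_import_simplexml($el)->childNodes as $node) {
        // Skip text, comments, etc.; keep elements from any namespace.
        if ($node->nodeType === XML_ELEMENT_NODE) {
            // Re-wrap with the same class as $el, so subclasses are preserved.
            $children[] = simplexml_import_dom($node, get_class($el));
        }
    }
    return $children;
}

$desc = simplexml_load_string(
    '<rdf:Description xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"'
    . ' xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" rdf:about="#bob">'
    . '<rdfs:label>Bob</rdfs:label>'
    . '<rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>'
    . '</rdf:Description>'
);

foreach (all_child_elements($desc) as $child) {
    echo $child->getName(), "\n";  // label, then type
}
```

The alternative of calling children($uri) once per namespace returned by getDocNamespaces() also works, but it groups the children by namespace instead of keeping their original order.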

PHP
$xmlobj->getAttributes()

Returns an array of all of the attributes (as individual SimpleRDFElement objects) of the current top-level element. Unlike the built-in attributes() method, this returns all attributes regardless of namespace.
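Attributes are harder to bridge through DOM, because simplexml_import_dom() only accepts element nodes. A SimpleXML-only sketch can instead call attributes() once with no argument and then once per declared namespace URI. The function name all_attributes() is hypothetical; note that the xml: namespace (used by xml:lang) is never declared in a document, so the sketch adds it by hand:

```php
<?php
// Sketch only: all_attributes() is a hypothetical helper, not the article's
// getAttributes(). It asks SimpleXML for attributes one namespace at a time.
function all_attributes(SimpleXMLElement $el): array
{
    $found = [];

    // With no argument, attributes() returns only unprefixed attributes.
    foreach ($el->attributes() as $attr) {
        $found[] = $attr;
    }

    // Every namespace URI declared in the document, plus the implicit
    // xml: namespace, which documents never need to declare.
    $uris = array_values($el->getDocNamespaces(true));
    $uris[] = 'http://www.w3.org/XML/1998/namespace';

    foreach (array_unique($uris) as $uri) {
        foreach ($el->attributes($uri) as $attr) {
            $found[] = $attr;
        }
    }
    return $found;
}

$desc = simplexml_load_string(
    '<rdf:Description xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"'
    . ' rdf:about="#bob" xml:lang="en" note="example"/>'
);

foreach (all_attributes($desc) as $attr) {
    echo $attr->getName(), ' = ', (string) $attr, "\n";
}
```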

PHP
$xmlobj->getTriples()

Returns an array of SimpleRDFTriple objects. This is a simple helper class that defines an object with three properties: tripleSubject, triplePredicate, tripleObject. This method parses the top level element and constructs triples based on that element, its attributes, and its immediate child elements. It is not recursive.
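The following is a deliberately simplified sketch of what a non-recursive triple extractor can involve; it is not the author's implementation. Assumptions: the subject comes from rdf:about (or rdf:ID, turned into a "#" fragment), a typed node such as foaf:Person contributes an rdf:type triple, namespaced attributes outside rdf: and xml: are property attributes, a child with rdf:resource has a URI object, and any other child's trimmed text is a literal object. Blank nodes, rdf:parseType, and nested descriptions are ignored. TripleSketch and triples_sketch() are hypothetical names (TripleSketch simply mirrors the three properties of SimpleRDFTriple), and the DOM extension is assumed to be available:

```php
<?php
// Hypothetical stand-in mirroring the three properties of SimpleRDFTriple.
class TripleSketch
{
    public $tripleSubject;
    public $triplePredicate;
    public $tripleObject;
}

// Non-recursive sketch: reads one node element, its attributes, and its
// immediate child elements only. Blank nodes and rdf:parseType are ignored.
function triples_sketch(SimpleXMLElement $el): array
{
    $rdf = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#';
    $xmlNs = 'http://www.w3.org/XML/1998/namespace';
    $node = dom_import_simplexml($el);

    // Subject: rdf:about as written, or rdf:ID turned into a "#id" fragment.
    if ($node->hasAttributeNS($rdf, 'about')) {
        $subject = $node->getAttributeNS($rdf, 'about');
    } elseif ($node->hasAttributeNS($rdf, 'ID')) {
        $subject = '#' . $node->getAttributeNS($rdf, 'ID');
    } else {
        return [];
    }

    $pairs = [];
    // A typed node such as <foaf:Person> implies an rdf:type statement.
    $typeUri = $node->namespaceURI . $node->localName;
    if ($typeUri !== $rdf . 'Description') {
        $pairs[] = [$rdf . 'type', $typeUri];
    }
    // Property attributes: any namespaced attribute outside rdf: and xml:.
    foreach ($node->attributes as $attr) {
        $ns = $attr->namespaceURI;
        if ($ns !== null && $ns !== $rdf && $ns !== $xmlNs) {
            $pairs[] = [$ns . $attr->localName, $attr->value];
        }
    }
    // Property elements: rdf:resource gives a URI object, otherwise the text.
    foreach ($node->childNodes as $child) {
        if ($child->nodeType !== XML_ELEMENT_NODE) {
            continue;
        }
        $object = $child->hasAttributeNS($rdf, 'resource')
            ? $child->getAttributeNS($rdf, 'resource')
            : trim($child->textContent);
        $pairs[] = [$child->namespaceURI . $child->localName, $object];
    }

    $triples = [];
    foreach ($pairs as $pair) {
        $t = new TripleSketch();
        $t->tripleSubject = $subject;
        $t->triplePredicate = $pair[0];
        $t->tripleObject = $pair[1];
        $triples[] = $t;
    }
    return $triples;
}

$desc = simplexml_load_string(
    '<rdf:Description xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"'
    . ' xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" rdf:about="#someperson">'
    . '<rdfs:label>Bob</rdfs:label>'
    . '<rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>'
    . '</rdf:Description>'
);

foreach (triples_sketch($desc) as $t) {
    echo $t->tripleSubject, ' ', $t->triplePredicate, ' ', $t->tripleObject, "\n";
}
```

For the sample description in the block, this yields two triples: an rdfs:label triple with the literal Bob, and an rdf:type triple pointing to the FOAF Person class.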

Most of the methods are simple and their usage is self-evident if you are familiar with RDF, XML, and namespaces.

The only complex method is getTriples(), which returns an array of SimpleRDFTriple objects based on the root element represented by $xmlobj.

Keep in mind that getTriples() is not recursive. It assumes that the root element of the object is an RDF element that identifies the subject of the triples, and that its immediate child elements (and attributes) supply the predicate and object information about that subject. This means that if you have created $xmlobj from a full RDF/XML document, so that the root element is the rdf:RDF element, you will have to iterate over its children to extract triples.

For example, the following code parses a very simple RDF/XML string and extracts all of its triples:

PHP
$xmltext =
'<rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
<rdf:Description rdf:about="#someperson">
<rdfs:label>Bob</rdfs:label>
<rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person" />
</rdf:Description>
</rdf:RDF>';

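// Parse the string as a SimpleRDFElement (assumes the file defining the
// SimpleRDFElement class has already been included).
$xmlobj = simplexml_load_string($xmltext, 'SimpleRDFElement');

// Compare with === false: a successfully parsed RDF root element can still
// evaluate as false in a Boolean context, so "or die()" is not reliable.
if ($xmlobj === false) {
    die('Could not parse the RDF/XML text');
}

// The root element is rdf:RDF, so call getTriples() on each of its children.
foreach ($xmlobj->getChildNodes() as $child)
{
    foreach ($child->getTriples() as $trip)
    {
        print_r($trip);
    }
}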

This will produce the following output text:

PHP
SimpleRDFTriple Object
(
    [tripleSubject] => #someperson
    [triplePredicate] => http://www.w3.org/2000/01/rdf-schema#label
    [tripleObject] => Bob
)
SimpleRDFTriple Object
(
    [tripleSubject] => #someperson
    [triplePredicate] => http://www.w3.org/1999/02/22-rdf-syntax-ns#type
    [tripleObject] => http://xmlns.com/foaf/0.1/Person
)
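The loop above only looks one level below the root, and because getTriples() is not recursive, documents with nested descriptions need a walk of their own. In RDF/XML, node elements and property elements alternate (sometimes called "striping"), so below rdf:RDF the node elements sit at odd depths. The walker below is a hypothetical helper (walk_elements() is not part of the article's class, and the DOM extension is assumed) that visits every element depth-first, whatever its namespace; a caller could invoke getTriples() on the node elements it reaches:

```php
<?php
// Sketch only: walk_elements() is a hypothetical helper. It visits every
// element depth-first, regardless of namespace, and passes each one (with
// its depth) to a callback.
function walk_elements(SimpleXMLElement $el, callable $visit, int $depth = 0): void
{
    $visit($el, $depth);
    foreach (dom_import_simplexml($el)->childNodes as $node) {
        if ($node->nodeType === XML_ELEMENT_NODE) {
            // Keep the caller's class (e.g. a SimpleXMLElement subclass).
            walk_elements(simplexml_import_dom($node, get_class($el)), $visit, $depth + 1);
        }
    }
}

$doc = simplexml_load_string(
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"'
    . ' xmlns:foaf="http://xmlns.com/foaf/0.1/">'
    . '<foaf:Person rdf:about="#bob">'
    . '<foaf:knows><foaf:Person rdf:about="#alice"/></foaf:knows>'
    . '</foaf:Person>'
    . '</rdf:RDF>'
);

// Collect each element's expanded name (namespace URI + local name), indented by depth.
$names = [];
walk_elements($doc, function (SimpleXMLElement $el, int $depth) use (&$names) {
    $dom = dom_import_simplexml($el);
    $names[] = str_repeat('  ', $depth) . $dom->namespaceURI . $dom->localName;
});
echo implode("\n", $names), "\n";
```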

Points of Interest

The code in the source file is deliberately kept very simple, so that instead of simply using it like some kind of "black box", you can see exactly how it is done and (if you would like) modify it.

If you come up with a particularly clever extension or additional method, let me know about it and I will add it (and your name, with credit) to the downloadable source code.

History

Updates on this class or anything related to it will appear on the blog: http://talkingowlproject.blogspot.com/.

License

This article, along with any associated source code and files, is licensed under The GNU General Public License (GPLv3).


Written By
Web Developer
United States
I'm just some guy.
