Click here to Skip to main content
15,891,607 members
Articles / Programming Languages / XML
Tip/Trick

XML Database and XQuery to the Rescue

Rate me:
Please Sign up or sign in to vote.
5.00/5 (1 vote)
26 Jan 2015CPOL5 min read 13K   80   3  
How to process very large XML files

This article appears in the Third Party Products and Tools section. Articles in this section are for the members only and must not be used to promote or advertise products in any way, shape or form. Please report any spam or advertising.

Introduction

Regardless of the platform and languages, processing a very large XML  (for example, a size > 8G) is a daunting task. Traditionally XML is loaded into memory and using DOM the XML is processed. As you could imagine processing a XML of size of 10G using DOM is almost impossible or will result in an unacceptable performance. One could argue that SAX is the alternative solution but remember SAX is forward only and a read-only parser.

So on this situation the XML Database and XQuery comes to developerss' rescue. In this article , I'll show you how to use a XML database and run Xquery in the XML database.

Choosing a XML Database

There are many free and commercial XML databases out there. Choosing a XML database is entirely based on your own requirements. I have been using BaseX and eXist-db for my work and personal use. Both of them are free to use. In this article, to demonstrate the XQuery examples, I have used eXist-db but you could use any XML database of your choice.

You might go with the database that you are already familiar with, for instance MS SQL Server, but be aware SQL Server doesn't honour the W3C's full specification of XQuery. Because MS SQL Server has very limited support for XQuery, you should check first if SQL Server's support is sufficient for your solution and also for future extensions. For example, MS SQL does not support import namespace in XQueries which is a very powerful XQuery feature. If it is very simple solution and you are not planning on doing any further extension, then you could consider it.

What is XQuery

XQuery is to XML DB/XML what SQL is to relational database/tables and XPath is the key to working with XQuery. The structure of an XQuery is called FLWOR (pronounced “flower”). Every XQuery should satisfy the FLWOR struture.

F – For (Optional)

L – Let (Optional)

W – Where (Optional)

O – Ordered By (Optional)

R – Return

The following example returns all the employees from HR department in alphabetical order with the remaining holidays.

for $emp in $model//employees

let $remainHol := $emp/@totHol - $emp/@usedHol

where $emp/@dept = "HR"

ordered By $emp/@name

return <Employee id="{data($emp/@id)}" name="{data($emp/@name)}" remainingHol="{$remainHol}" />

As the aim of this article is to demonstrate the power of XML Database and XQuery, we are not going to look into XQuery in much detail. Please refer to some good XQuery tutorials for learning about XQuery in detail.

This link http://cs.au.dk/~amoeller/XML/querying/languages.html is worth reading to learn XQuery.

Capabilities Of XQuery

  • Selecting information based on specific criteria (Xpath expressions)
  • Filtering unwanted data (Predicates/Where clause)
  • Joining data from multiple documents(Let)
  • Sorting, grouping, and aggregating data (Order By , (Group By), Funtions)
  • Transforming XML data into another XML structure
  • Performing arithmetic calculations on numbers and dates
  • Manipulating strings to reformat text

Running XQuery in XML Database

You could use either BaseX or eXist-db. I have used eXist-db for this article, which can be downloaded from http://www.exist-db.org. Download eXist-db from the link and follow the instructions for installing it in your machine.  Once you have started eXist-db, you will see an icon in your system tray as shown below. Right-click       on this icon.

Image 1

 

To write an XQuery , click on 'Open eXide' to launch the XQuery IDE (eXide) from eXist-db and you will see the IDE as shown below. You can start writing XQuery in the IDE as you would do in SSMS for MS SQL Server or SQL Developer for Oracle DB.

Image 2

You could download the XML file (books.xml) and the XQueries that I have used for this article (link can be found at the bottom of this article) or you could create a new XQuery file or open the one that you have saved in the XML db before.  In the XQuery, the XML is referenced as a file (you have to change c:\XmlInAction to wherever you have saved the file in your system). In fact the file is treated as DB. In BaseX, you have to load the XML file as DB before querying it.

Example : Basic XQuery

The following XQuery example shows a simple example to illustrate the structure of a XQuery. As you could see the top pane is used for the XQuery and the bottom pane for the result from the XQuery.

 

Image 3

Example  : Using Aggregation

(: Aggregation in XQuery :)

for $author in distinct-values(doc('file:///C:\XQueryInAction\books.xml')/catalog/book/author)
let $booksByAuthor := doc('file:///C:\XQueryInAction\books.xml')/catalog/book[author = $author]
order by $author
return <author name="{$author}"
               noOfBooks="{count(distinct-values($booksByAuthor/@id))}"
               totalPrice="{sum($booksByAuthor/price)}" />

Example : Group By  in XQuery V<3

(: Group by author:)

for $author in distinct-values(doc('file:///C:\XQueryInAction\books.xml')/catalog/book/author)
let $booksByAuthor := doc('file:///C:\XQueryInAction\books.xml')/catalog/book[author = $author]
order by $author
return <author name="{$author}">{
         for $book in $booksByAuthor
         order by $book/title
         return $book
}</author>

 

Example : Using Group By in XQuery V3.0

xquery version "3.0";

for $author in distinct-values(doc('file:///C:\XQueryInAction\books.xml')/catalog/book/author)
let $authorName := $author
for $book in doc('file:///C:\XQueryInAction\books.xml')/catalog/book
where $book/author = $author
group by $authorName
return <author name="{$authorName}">{$book/title}</author>

This article is just to help you understand how powerful the XML database and XQuery when it comes to a very large XML and how to use the IDE for XQuery to write and run your XQueries. Right now we have got the XQuery to extract the information required for the processing in the IDE. Now the question is: "How can a program execute the XQuery from an application?"

Well, I have been using BaseX at my workplace and it provides various APIs for different platforms and languages. You could deploy the XQueries as REST services in a web server and the XQueries can be called by simple HTTP request. It also provides other APIs that you can choose based on your problem at hand. The link for the BaseX home page is http://www.basex.org/. There are other XML databases as free and commerical software that are very powerful when you deal with XML including eXist-db.

Points Of Interest

  • Indexes can be applied to XML DB for better performance
  • More than one XML DB can be joined in a XQuery
  • Client/Server artchitecture support
  • Full Text index support
  • Use of XPath expressions
  • XQuery Editors
  • REST/HTTP support
  • Third party xquery modules support such as functx (http://www.xqueryfunctions.com)

Download the XML and XQuery samples

Now you can XQuery any sized XML with ease. Happy XQuering!

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Engineer
United Kingdom United Kingdom
Areas of Expertise:

.NET, MVC, SQL Server, XML , SOA and WEB solutions.

* Microsoft certified in C# Programming.
* Microsoft certified Techology Specialist (ASP.NET)
* IBM certified in XML Technology
* Sun certified Java Programmer
* Sun certified Web Solution Developer

Comments and Discussions

 
-- There are no messages in this forum --