This article introduces a "Delta Calculation Component" (DCC for short) that calculates the delta between two XML structures and is implemented in XSLT.
DCC takes an XML document as input. The document contains two sections, each holding a different XML structure. DCC runs its delta calculation logic to extract the changes between the two sections. The result is a single XML section containing the consolidated changes between the input sections; unchanged content is omitted. The output uses flags to mark the deleted, added, and updated elements relative to the input file. The values of these flags are set in the component’s configuration file.
The first section of the input XML document is called the "AS IS" structure and the second is called the "TO BE" structure. Each section is wrapped separately in a parent element with the same name, which the user defines in the configuration. A sample input file and a more detailed description appear in the following sections.
The XSLT implementation of DCC compares these two sections according to the configuration.xml file, which holds the user-defined settings. XSLT is purpose-built for transforming XML, so the comparison can run directly inside an XSLT processor, which can benefit the performance of the user's application. The result is built as a single section wrapped in a "DeltaCalculationOutputWrapper" XML element. This section contains the differences between the elements of the two input sections. A flag (the action attribute) is attached to each XML element according to how it changed from "AS IS" to "TO BE": deleted from "AS IS", added to "TO BE", or changed between the two. Elements in the ASIS and TOBE sections are matched using the primary attribute defined for each input element. These points and the configuration are described in detail in the following sections.
DCC Functionality Details
In Figure 1, the input consists of two "structure" elements under the same "ABC" parent element. These two "structure" elements are the ones DCC uses for the delta calculation. Under each "structure" element there are book elements (they could be anything else, depending on the user's input file). The elements under "structure" are the ones evaluated. Each element, as described later, has a primary key attribute. For a book, say, "book_name" is the primary key, so book elements are compared by this attribute's value. The primary key for each element is defined by the user. If the user has not defined a primary key for an element, any such element in ASIS matches the element of the same name in TOBE at the same level of the hierarchy, even if their other attributes differ.
Figure 1: Delta Calculation Sample Input
Once the elements in the two "structure" elements have been matched this way, DCC determines which elements were added or removed. Any parent with an added or removed child is considered updated.
Matching elements with no updates are absent from the generated output; DCC presents only the changes.
For the input in Figure 1, the DCC output reports the following:
- "Pascal Language" book is removed with all its child elements.
- "C++ Language" book is added with all its child elements.
- "Java Language" book is updated: "Java_New_Author" is added and "Java_Old_Author" is removed ("Java_Author" is not mentioned at all, since neither it nor its children changed).
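The matching rules described above can be sketched outside XSLT to make them concrete. The Java class below is not the DCC implementation; it is a minimal illustration that assumes an input shaped like the prose description of Figure 1. The class name DeltaSketch, the "author" element, its "author_name" attribute, and the sample document are all hypothetical. The sketch only detects added and removed children and marks their ancestors as updated; it does not compare other attributes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DeltaSketch {
    // Assumed primary keys (element name -> identifying attribute); "author_name" is hypothetical.
    static final Map<String, String> PRIMARY_KEYS =
            Map.of("book", "book_name", "author", "author_name");

    // Assumed sample input, modeled on the prose description of Figure 1.
    static final String SAMPLE = "<ABC><structure>"
            + "<book book_name='Pascal Language'><author author_name='Pascal_Author'/></book>"
            + "<book book_name='Java Language'><author author_name='Java_Author'/>"
            + "<author author_name='Java_Old_Author'/></book>"
            + "</structure><structure>"
            + "<book book_name='Java Language'><author author_name='Java_Author'/>"
            + "<author author_name='Java_New_Author'/></book>"
            + "<book book_name='C++ Language'><author author_name='Cpp_Author'/></book>"
            + "</structure><DEF/></ABC>";

    // Index child elements by "name[primaryKeyValue]"; elements without a
    // configured primary key get an empty key, so they match by name alone.
    static Map<String, Element> index(Element parent) {
        Map<String, Element> byKey = new LinkedHashMap<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                Element e = (Element) n;
                String attr = PRIMARY_KEYS.get(e.getTagName());
                String key = attr == null ? "" : e.getAttribute(attr);
                byKey.put(e.getTagName() + "[" + key + "]", e);
            }
        }
        return byKey;
    }

    // Record removed/added children and mark a matched child as updated when
    // anything beneath it changed. Returns true if any change was found.
    static boolean diff(Element asIs, Element toBe, String path, List<String> out) {
        Map<String, Element> oldKids = index(asIs);
        Map<String, Element> newKids = index(toBe);
        boolean changed = false;
        for (Map.Entry<String, Element> entry : oldKids.entrySet()) {
            String childPath = path + "/" + entry.getKey();
            Element match = newKids.get(entry.getKey());
            if (match == null) {
                out.add("removed " + childPath);
                changed = true;
            } else {
                int mark = out.size();
                if (diff(entry.getValue(), match, childPath, out)) {
                    out.add(mark, "updated " + childPath); // parent listed before its children
                    changed = true;
                }
            }
        }
        for (String key : newKids.keySet()) {
            if (!oldKids.containsKey(key)) {
                out.add("added " + path + "/" + key);
                changed = true;
            }
        }
        return changed;
    }

    // Compare the first two "structure" elements; anything outside them is ignored.
    static List<String> delta(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
            NodeList sections = root.getElementsByTagName("structure");
            List<String> changes = new ArrayList<>();
            diff((Element) sections.item(0), (Element) sections.item(1), "", changes);
            return changes;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse input", e);
        }
    }

    public static void main(String[] args) {
        delta(SAMPLE).forEach(System.out::println);
    }
}
```

Run on the assumed sample, the sketch reports the Pascal book as removed, the Java book as updated with one author removed and one added, and the C++ book as added. The unchanged "Java_Author" does not appear, which mirrors the list above.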
Figure 2: Delta Calculation Architecture
As shown in Figure 2, DCC includes three types of artifacts:
- Yellow blocks: the user-defined input file and the generated output file.
- Green blocks: user-exposed artifacts, namely a configuration file and the DCC entry point XSL file.
- Red blocks: DCC internal artifacts that contain the delta calculation logic. These files should not be touched by the user.
The input XML file is the user-defined input file. As in Figure 1, it contains two "structure" elements. The "structure" element, as described later, is the wrapper of the input "ASIS" and "TOBE" sections. Beneath this element, the structure follows the user's specific problem. Elements outside it are out of scope for the delta calculation logic; the "DEF" element in Figure 1 is an example.
The Out_XXX XML file (where XXX is the input file’s name) is the generated output file. As presented in Figure 3, it contains a "DeltaCalculationOutputWrapper" element, under which the changed, added, and removed elements appear. Content that is the same in both inputs is not shown in the output. Any added element is marked as added and any removed element is marked as removed. In both cases, all ancestor elements are marked as updated. As shown in Figure 3, a book with at least one added or removed author is treated as an updated book.
Figure 3: Delta Calculation sample output xml file
The DCC_EntryPoint XSL file is the entry point for DCC. The user calls this XSL file after setting the path of the input file. The call can be made from code (a Java implementation, for example) or with a tool such as Altova XMLSpy. The user does not need to change this file, only to call it.
The DCC_Configuration XML file is the configuration file and deserves the user’s attention. In it the user defines the following:
- "DC-ComparedElement" defines the name of the input wrapper element ("structure" in our sample scenario).
- "Component" is where the user defines each component that needs to be compared. In each "component" element the user states the name of the problem-specific element and the attribute of that element used to decide whether two elements are identical. This "primarykey" element plays the role of a primary key in a database, which distinguishes rows.
- "DC-UpdatedElement" defines the flag that marks updated elements in the output file.
- "DC-RemovedElement" defines the flag that marks removed elements in the output file.
- "DC-AddedElement" defines the flag that marks added elements in the output file.
Figure 4: DCC_Configuration xml sample file
The DCC_Delta_Calculation_Logic XSL file is an internal XSL file that contains the logic needed for the calculation. This file should not be modified by the user.
The DCC_Utility XSL file is an internal XSL file that contains utility logic needed for the calculation. This file should not be modified by the user.
Getting Started to Use DCC without User Interface
To use the component internally in your application, follow these steps:
- Configure the "DCC_Configuration.xml" file according to your input schema, as described earlier in this article.
- Put all the DCC files (XSLs and configuration XML) in the same directory.
- From your implementation, call the "DCC_EntryPoint.xsl" file directly, passing it the input file (no specific template call is needed).
- Collect the output of the call; it is the delta calculation output.
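The call in the third step could look like the following Java sketch, which uses the standard JAXP API (javax.xml.transform). It is an illustration under assumptions rather than DCC's own launcher: the class name DccRunner is hypothetical, and it assumes the input file is supplied as the source document of the transformation. If the DCC stylesheets require XSLT 2.0, the processor built into the JDK will not be enough and a processor such as Saxon would need to be on the classpath.

```java
import java.io.File;
import java.io.StringWriter;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class DccRunner {
    // Apply a stylesheet to an input document and return the result as text.
    static String run(Source stylesheet, Source input) {
        try {
            // newInstance() uses whichever XSLT processor JAXP finds
            // (the JDK default, or typically Saxon when its jar is on the classpath).
            Transformer transformer = TransformerFactory.newInstance().newTransformer(stylesheet);
            StringWriter out = new StringWriter();
            transformer.transform(input, new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        if (args.length < 2) {
            System.err.println("usage: DccRunner <path to DCC_EntryPoint.xsl> <path to input XML>");
            return;
        }
        // A StreamSource built from a File carries the file's location as its
        // system ID, so relative xsl:import/xsl:include references resolve
        // against the stylesheet's own directory.
        Source stylesheet = new StreamSource(new File(args[0]));
        Source input = new StreamSource(new File(args[1]));
        System.out.println(run(stylesheet, input));
    }
}
```

Building the stylesheet source from a File is one reason the steps above ask for all DCC files to sit in the same directory: the entry point can then locate the other files by relative reference.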
Getting Started to Use DCC with a User Interface
Figure 5: Delta Calculation GUI Interface
Figure 5 shows the first form that appears when you run the delta calculation component jar. The form has three fields: the first holds the path of the DCC_EntryPoint.xslt file, the second holds the path of the input file, and the third holds the path of the folder where the output file should be written.
Figure 6: Delta calculation filled form with correct paths.
Figure 6 shows the form with its three fields filled with correct paths.
Figure 7: Delta calculation completed successfully
After the fields are filled with correct paths and the start button is clicked, the delta calculation runs and a message box reports success. The generated file is then ready to check.
Figure 8: Delta calculation failed scenario
If an incorrect input is used, a message box appears with a failure message describing what is wrong (as in Figure 8).
Figure 9: Delta Calculation with missing field
As in Figure 9, if the start button is clicked while a path is still empty, a message box asks the user to complete it.
DCC is a good fit for middleware applications that need this feature implemented in XSLT. Delta calculation is easy to implement in any object-oriented or structured programming language; the new idea here is implementing such complex logic in XSLT, which can perform well for XML-to-XML processing compared with hand-written code in a general-purpose language (Java, .NET, ...).
Although DCC fits middleware applications best, it can also be used in desktop, web, and enterprise applications, provided the application has the libraries needed to execute XSLT, such as the Saxon jars.
The GUI part of DCC (the jar file) lets non-technical users get the same service through a graphical interface, with the support of the configuration file.