
Tutorial: Compare JSON Documents and Apply Patches in TerminusDB and MongoDB

2 Mar 2022 | CPOL | 4 min read
Save time comparing JSON docs and build collaboration features into your applications
In this demo tutorial, we show how the diff and patch operations can be applied to monitor changes in TerminusDB schemas, TerminusDB documents, JSON schemas, and other document databases such as MongoDB. Save time comparing large JSON documents, and build data collaboration features into your applications with this free, open-source tool.

A Little Background on JSON diff and patch

A fundamental tool in Git’s strategy for distributed management of source code is the pair of operations diff and patch. These foundational operations are what make Git possible. A diff compares two versions of an object and produces a patch; applying that patch to the original version yields the new version.

But what about structured data? Do similar situations arise with structured data that require diff and patch operations? Sure they do.
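As a rough illustration of what diff and patch mean for structured data, here is a minimal Python sketch. It is not TerminusDB's implementation: the helper names json_diff and json_patch are hypothetical, and the sketch only records values that change under keys present in both objects (added or removed keys are out of scope). Each change is stored as a SwapValue operation with @before and @after fields, the same shape the JavaScript example at the end of this article reads from TerminusDB patches.

```python
def json_diff(before: dict, after: dict) -> dict:
    """Return a patch describing how `before` changed into `after`.

    Only keys present in both objects are compared; added or removed
    keys are out of scope for this sketch.
    """
    patch = {}
    for key in sorted(before.keys() & after.keys()):
        old, new = before[key], after[key]
        if isinstance(old, dict) and isinstance(new, dict):
            # Recurse into nested objects, keeping only non-empty sub-patches
            sub_patch = json_diff(old, new)
            if sub_patch:
                patch[key] = sub_patch
        elif old != new:
            patch[key] = {"@op": "SwapValue", "@before": old, "@after": new}
    return patch


def json_patch(obj: dict, patch: dict) -> dict:
    """Apply a patch produced by json_diff, returning a new object."""
    result = dict(obj)  # copy so the input object is left untouched
    for key, op in patch.items():
        if op.get("@op") == "SwapValue":
            result[key] = op["@after"]
        else:
            # A nested patch: apply it to the nested object
            result[key] = json_patch(obj[key], op)
    return result


jane = {"name": "Jane", "age": 18, "address": {"city": "Leeds"}}
janine = {"name": "Janine", "age": 18, "address": {"city": "York"}}

p = json_diff(jane, janine)
print(p)
assert json_patch(jane, p) == janine
```

The unchanged age field does not appear in the patch at all, which is what keeps a patch small enough to review.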

In applications where two or more people update the same object, such as a product in an online store, this kind of curation is often handled by locking the object, which means only one person can win. Locks are a massive source of pain: not only do they block otherwise perfectly reasonable concurrent operations, but you also risk stale locks and have to figure out when to release them.

When more than one person is working on a dataset, conflicts are common. Without an adequate workflow and conflict-handling measures, someone’s change often gets squashed, and the data gradually becomes inaccurate. In the long run, this causes all sorts of issues with reporting, customer service, and business intelligence. This is where diff and patch come in: users can see the before and after states each time they submit changes to the database, any conflicts can be flagged, and a human reviewer can oversee the changes to keep the data accurate in the long run. Better data, better decisions.
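Flagging conflicts works because a patch records the value each change expected to find. The hypothetical sketch below (apply_with_conflicts is an illustrative name, not a TerminusDB API) applies a flat SwapValue patch only where the stored value still equals @before; any field that someone else has changed in the meantime is left alone and reported for human review, with no lock involved.

```python
from __future__ import annotations


def apply_with_conflicts(current: dict, patch: dict) -> tuple[dict, list[str]]:
    """Apply a flat SwapValue patch to `current` without mutating it.

    Returns the updated object and the list of keys whose stored value no
    longer equals the patch's "@before" value. Those keys are left
    unchanged so that a person can review them.
    """
    updated = dict(current)
    conflicts = []
    for key, op in patch.items():
        if current.get(key) == op["@before"]:
            updated[key] = op["@after"]
        else:
            conflicts.append(key)
    return updated, conflicts


# Alice reviewed a patch computed when the blender cost 340...
alice_patch = {"price": {"@op": "SwapValue", "@before": 340, "@after": 450}}
# ...but Bob has since changed the stored price to 400.
stored = {"item_name": "Blender", "price": 400}

updated, conflicts = apply_with_conflicts(stored, alice_patch)
print(conflicts)  # ['price']
print(updated == stored)  # True: nothing was overwritten
```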

Using Diff and Patch with TerminusDB Python

Prerequisites

You will need to install the TerminusDB Python client (check out the installation instructions here).

Make sure the TerminusDB Docker container is running on localhost (the examples below connect to http://localhost:6363/).

In this script, we show how diff returns a Patch object and how you can apply that patch to modify an object. We demonstrate this for TerminusDB schemas, TerminusDB documents, and JSON schemas.

In TerminusDB, documents and schemas are represented in JSON-LD format. With diff and patch, we can easily compare any documents and schemas to see what has been changed.

Let us look at a document as a Python object:

Python
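from pprint import pprint
from terminusdb_client import WOQLClient
from terminusdb_client.woqlschema import DocumentTemplate

client = WOQLClient("http://localhost:6363/")
client.connect()  # assumes the default local admin credentials

class Person(DocumentTemplate):
    name: str
    age: int

jane = Person(name="Jane", age=18)
janine = Person(name="Janine", age=18)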

You can directly apply a diff to get a patch object:

Python
result_patch = client.diff(jane, janine)
pprint(result_patch.content)

With the patch object (result_patch here), you can either review its content or apply it to an object to get the updated ("after") object back.

Python
after_patch = client.patch(jane, result_patch)
pprint(after_patch)
assert after_patch == janine._obj_to_dict()

As you can see, the after_patch object (document) is the same as janine. You can put this document back in the database using replace_document to commit this change.

Diff and patch also work with JSON-LD documents:

Python
jane = {"@id": "Person/Jane", "@type": "Person", "name": "Jane"}
janine = {"@id": "Person/Jane", "@type": "Person", "name": "Janine"}
result_patch = client.diff(jane, janine)
pprint(result_patch.content)

Diff and patch are not limited to JSON-LD documents, either; they also work with schemas:

Python
from terminusdb_client.woqlschema import WOQLSchema

class Company(DocumentTemplate):
    name: str
    director: Person

schema1 = WOQLSchema()
schema1.add_obj("Person", Person)
schema2 = WOQLSchema()
schema2.add_obj("Person", Person)
schema2.add_obj("Company", Company)
result_patch = client.diff(schema1, schema2)
pprint(result_patch.content)

Note that diff and patch will work on most JSON formats.

Another application is comparing two JSON schemas:

Python
schema1 = {
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "birthday": { "type": "string", "format": "date" },
    "address": { "type": "string" },
  }
}
schema2 = {
  "type": "object",
  "properties": {
    "first_name": { "type": "string" },
    "last_name": { "type": "string" },
    "birthday": { "type": "string", "format": "date" },
    "address": {
      "type": "object",
      "properties": {
        "street_address": { "type": "string" },
        "city": { "type": "string" },
        "state": { "type": "string" },
        "country": { "type" : "string" }
      }
    }
  }
}
result_patch = client.diff(schema1, schema2)
pprint(result_patch.content)
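When reviewing a schema change like this one, a quick summary of which properties were added, removed, or redefined can help before reading the full patch. The helper below, summarize_properties, is a hypothetical sketch (not part of the TerminusDB client) that compares only the top-level properties. It is run here on copies of the two schemas above, with the nested address properties trimmed for brevity.

```python
def summarize_properties(old_schema: dict, new_schema: dict) -> dict:
    """Compare the top-level "properties" of two JSON Schema objects."""
    old_props = old_schema.get("properties", {})
    new_props = new_schema.get("properties", {})
    return {
        "added": sorted(new_props.keys() - old_props.keys()),
        "removed": sorted(old_props.keys() - new_props.keys()),
        "changed": sorted(
            key for key in old_props.keys() & new_props.keys()
            if old_props[key] != new_props[key]
        ),
    }


schema1 = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "birthday": {"type": "string", "format": "date"},
        "address": {"type": "string"},
    },
}
schema2 = {
    "type": "object",
    "properties": {
        "first_name": {"type": "string"},
        "last_name": {"type": "string"},
        "birthday": {"type": "string", "format": "date"},
        "address": {"type": "object"},  # nested properties trimmed
    },
}

print(summarize_properties(schema1, schema2))
# {'added': ['first_name', 'last_name'], 'removed': ['name'], 'changed': ['address']}
```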

See the full script here

Using Diff and Patch with MongoDB

In this script, we demonstrate how diff and patch can be used in your MongoDB workflow. The first part of the script follows the MongoDB tutorial on how to use PyMongo; the second part adds an extra step that lets you review the changes before applying a patch to your MongoDB collection.

As we saw in the last section, diff and patch work on most JSON formats. Since MongoDB also stores its data as JSON-like documents, we can use diff and patch in the same way.

Here we use the PyMongo tutorial as an example:

Python
import datetime as dt
import os

from pymongo import MongoClient

client = MongoClient(os.environ["MONGO_CONNECTION_STRING"])
# Create the database for our example
# (we will use the same database throughout the tutorial)
connection = client['user_shopping_list']
collection_name = connection["user_1_items"]
item_1 = {
    "_id": "U1IT00001",
    "item_name": "Blender",
    "max_discount": "10%",
    "batch_number": "RR450020FRG",
    "price": 340,
    "category": "kitchen appliance"
}
item_2 = {
    "_id": "U1IT00002",
    "item_name": "Egg",
    "category": "food",
    "quantity": 12,
    "price": 36,
    "item_description": "brown country eggs"
}
collection_name.insert_many([item_1, item_2])
expiry_date = '2021-07-13T00:00:00.000'
expiry = dt.datetime.fromisoformat(expiry_date)
item_3 = {
    "item_name": "Bread",
    "quantity": 2,
    "ingredients": "all-purpose flour",
    "expiry_date": expiry
}
collection_name.insert_one(item_3)

Imagine we want to change item_1:

Python
new_item_1 = {
    "_id": "U1IT00001",
    "item_name": "Blender",
    "max_discount": "50%",
    "batch_number": "RR450020FRG",
    "price": 450,
    "category": "kitchen appliance"
}

We can compare the old and new versions of item_1 with diff:

Python
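from pprint import pprint
from terminusdb_client import WOQLClient
tbd_endpoint = WOQLClient("http://localhost:6363/")
tbd_endpoint.connect()  # assumes the default local admin credentials
# Find the item in the database again in case someone has already changed it
item_1 = collection_name.find_one({"item_name": "Blender"})
patch = tbd_endpoint.diff(item_1, new_item_1)
pprint(patch.content)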

Once we have reviewed the patch, we can apply the change in MongoDB:

Python
collection_name.update_one(patch.before, {"$set": patch.update})
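Because the filter passed to update_one is patch.before, the update only matches if the stored item still holds the old values you reviewed. If someone else changed it in the meantime, nothing is written, and PyMongo's UpdateResult reports a matched_count of 0. A hedged sketch of checking for that case, where apply_reviewed_patch is a hypothetical wrapper name:

```python
def apply_reviewed_patch(collection, before: dict, update: dict) -> bool:
    """Apply a reviewed patch with PyMongo's update_one, detecting conflicts.

    `before` (the old values) is used as the filter, so if another user has
    modified any of those fields since the diff was computed, no document
    matches and nothing is written.
    Returns True if a document matched, False on a conflict.
    """
    result = collection.update_one(before, {"$set": update})
    # UpdateResult.matched_count is 0 when the filter matched no document
    return result.matched_count > 0


# Usage with the objects from the previous step (requires a running MongoDB):
# if not apply_reviewed_patch(collection_name, patch.before, patch.update):
#     print("item_1 changed since it was read; fetch it again and re-run the diff")
```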

Here is a slightly more complicated example:

Python
expiry_date = '2021-07-15T00:00:00.000'
expiry = dt.datetime.fromisoformat(expiry_date)
new_item_3 = {
    "item_name": "Bread",
    "quantity": 5,
    "ingredients": "all-purpose flour",
    "expiry_date": expiry
}
item_3 = collection_name.find_one({"item_name": "Bread"})
item_id = item_3.pop('_id')  # We want to pop it out; optionally we can add it back later
patch = tbd_endpoint.diff(item_3, new_item_3)
pprint(patch.content)
# Add _id back into the filter, though the update still works without it
before = patch.before
before['_id'] = item_id
collection_name.update_one(before, {"$set": patch.update})
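Note that pop modifies the document returned by find_one. If you would rather keep it intact, two small hypothetical helpers (split_id and filter_with_id, not part of PyMongo or the TerminusDB client) can copy the document without _id for diffing and then pin the update filter to that _id. In practice the _id is a bson ObjectId; a string stands in for it here so the sketch runs without MongoDB.

```python
from __future__ import annotations

from typing import Any


def split_id(doc: dict) -> tuple[dict, Any]:
    """Return a copy of `doc` without "_id", plus the removed "_id" (None if absent)."""
    body = {key: value for key, value in doc.items() if key != "_id"}
    return body, doc.get("_id")


def filter_with_id(before: dict, doc_id: Any) -> dict:
    """Build an update filter from the patch's old values, pinned to one _id."""
    flt = dict(before)
    if doc_id is not None:
        flt["_id"] = doc_id
    return flt


stored = {"_id": "abc123", "item_name": "Bread", "quantity": 2}
body, doc_id = split_id(stored)
print(body)  # {'item_name': 'Bread', 'quantity': 2}
print(filter_with_id({"quantity": 2}, doc_id))  # {'quantity': 2, '_id': 'abc123'}
```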

See the full script here.

Using Diff and Patch with MongoDB JavaScript

As in the previous section, diff and patch can be used with the JavaScript client to compare documents and schemas and see what has changed.

The script linked at the end of this section demonstrates this.

We created a function called mongoPatch that turns a TerminusDB patch into a MongoDB query and a $set update:

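JavaScript
// Convert a TerminusDB patch into a [query, set] pair for MongoDB's updateOne.
// ObjectId is assumed to be in scope (it is a global in mongosh).
const mongoPatch = function (patch) {
    let query = {};
    let set = {};
    if (patch !== null && 'object' === typeof patch) {
        for (const key in patch) {
            const entry = patch[key];
            if (entry && entry['@op'] === 'SwapValue') {
                // Match on the old value, set the new one
                query[key] = entry['@before'];
                set[key] = entry['@after'];
            } else if (key === '_id') {
                query[key] = ObjectId(entry);
            } else {
                // Recurse into nested objects
                let [sub_query, sub_set] = mongoPatch(entry);
                query[key] = sub_query;
                if (sub_set !== null) {
                    set[key] = sub_set;
                }
            }
        }
        return [query, set];
    } else {
        return [patch, null];
    }
}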

We then call it on the patch returned by getDiff and use the result to update the data in MongoDB:

JavaScript
// client is a TerminusDB client and db a MongoDB database handle,
// assumed to be set up as in the full script
let patchPromise = client.getDiff(jane, janine, {});
patchPromise.then(patch => {
    let [q, s] = mongoPatch(patch);
    console.log([q, s]);
    const res = db.inventory.updateOne(q, { $set: s });
    console.log(res);
    if (res.modifiedCount === 1) {
        console.log("yay!");
    } else {
        console.log("boo!");
    }
    console.log(patch);
});
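For readers working in Python, here is a rough, hypothetical port of the mongoPatch idea (mongo_patch is an illustrative name, not a library function). It splits each SwapValue into a query part (@before) and an update part (@after). Unlike the JavaScript above, it leaves _id as a plain value instead of converting it to an ObjectId, and it flattens nested changes into MongoDB dot notation (for example "address.city") so that $set touches only the changed sub-fields rather than replacing the whole sub-document. The sample patch is shaped like the item_1 change from the previous section; the exact content TerminusDB returns may differ.

```python
from __future__ import annotations


def mongo_patch(patch: dict, prefix: str = "") -> tuple[dict, dict]:
    """Convert a SwapValue patch into a (query, update) pair for update_one."""
    query: dict = {}
    update: dict = {}
    for key, entry in patch.items():
        path = f"{prefix}{key}"
        if isinstance(entry, dict) and entry.get("@op") == "SwapValue":
            # The old value must still be present for the update to apply
            query[path] = entry["@before"]
            update[path] = entry["@after"]
        elif isinstance(entry, dict):
            # Nested object: recurse and use MongoDB dot notation
            sub_query, sub_update = mongo_patch(entry, prefix=f"{path}.")
            query.update(sub_query)
            update.update(sub_update)
        else:
            # Plain value such as _id: match it exactly, don't change it
            query[path] = entry
    return query, update


patch = {
    "_id": "U1IT00001",
    "max_discount": {"@op": "SwapValue", "@before": "10%", "@after": "50%"},
    "price": {"@op": "SwapValue", "@before": 340, "@after": 450},
}
query, update = mongo_patch(patch)
print(query)   # {'_id': 'U1IT00001', 'max_discount': '10%', 'price': 340}
print(update)  # {'max_discount': '50%', 'price': 450}
# collection.update_one(query, {"$set": update}) would then apply the change.
```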

See the full script here.


We hope you found this tutorial useful. We’ve included some additional links below for further reading:

History

  • 2nd March, 2022: Initial version

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
United States
