Click here to Skip to main content
15,867,330 members
Articles / Database Development / NoSQL

Storing Tree like Hierarchy Structures With MongoDB

Rate me:
Please Sign up or sign in to vote.
5.00/5 (11 votes)
16 Jan 2013CC (ASA 3U)6 min read 72K   809   24   1
Three approaches to store tree like structures with NoSQL databases in MongoDB
This article demonstrates three approaches for storing tree like structures with NoSQL databases with example of MongoDB.

Introduction

In real life, almost any project deals with the tree structures. Different kinds of taxonomies, site structures, etc. require modeling of hierarchy relations. In this article, I will illustrate this using the first three of five typical approaches of operating with hierarchy data with an example of the MongoDB database. Those approaches are:

  • Model Tree Structures with Child References
  • Model Tree Structures with Parent References
  • Model Tree Structures with an Array of Ancestors
  • Model Tree Structures with Materialized Paths
  • Model Tree Structures with Nested Sets

Note: This article is inspired by another article 'Model Tree Structures in MongoDB' by 10gen, but does not copy it. It provides additional examples on typical operations with tree management. Please refer to 10gen's article to get a more solid understanding of the approach.

Background

As a demo dataset, I use some fake eshop goods taxonomy.

Image 1

Challenges to Address

In a typical site scenario, we should be able to:

  • operate with tree (insert new node under specific parent, update/remove existing node, move node across the tree)
  • get path to node (for example, in order to be build the breadcrumb section)
  • get all node descendants (in order to be able, for example, to select goods from more general category, like 'Cell Phones and Accessories' which should include goods from all subcategories.

In each of the examples below we:

  • add new node called 'LG' under electronics
  • move 'LG' node under Cell_Phones_And_Smartphones node
  • remove 'LG' node from the tree
  • get child nodes of Electronics node
  • get path to 'Nokia' node
  • get all descendants of the 'Cell_Phones_and_Accessories' node

Please refer to the image above for a visual representation.

Tree Structure With Parent Reference

This is the most commonly used approach. For each node, we store (ID, ParentReference, Order).

Image 2

Operating With Tree

Pretty simple, but changing the position of the node within siblings will require additional calculations. You might want to set high numbers like item position * 10^6 for order in order to be able to set new node order as trunc (lower sibling order - higher sibling order)/2 - this will give you enough operations, until you will need to traverse the whole tree and set the order defaults to big numbers again.

Adding New Node

Good points: requires only one insert operation to introduce the node.

JavaScript
var existingelemscount = db.categoriesPCO.find({parent:'Electronics'}).count();
var neworder = (existingelemscount+1)*10;
db.categoriesPCO.insert({_id:'LG', parent:'Electronics', 
                         someadditionalattr:'test', order:neworder})
//{ "_id" : "LG", "parent" : "Electronics", 
//      "someadditionalattr" : "test", "order" : 40 } 

Updating/Moving the Node

Good points: as during insert - requires only one update operation to amend the node:

JavaScript
existingelemscount = db.categoriesPCO.find({parent:'Cell_Phones_and_Smartphones'}).count();
neworder = (existingelemscount+1)*10;
db.categoriesPCO.update({_id:'LG'},
{$set:{parent:'Cell_Phones_and_Smartphones', order:neworder}});
//{ "_id" : "LG", "order" : 60, "parent" : 
//          "Cell_Phones_and_Smartphones", "someadditionalattr" : "test" }  

Node Removal

Good points: requires single operation to remove the node from tree

JavaScript
db.categoriesPCO.remove({_id:'LG'});  

Getting Node Children, Ordered

Good points: all childs can be retrieved from database and ordered using single call.

JavaScript
db.categoriesPCO.find({$query:{parent:'Electronics'}, $orderby:{order:1}})
//{ "_id" : "Cameras_and_Photography", "parent" : "Electronics", "order" : 10 }
//{ "_id" : "Shop_Top_Products", "parent" : "Electronics", "order" : 20 }
//{ "_id" : "Cell_Phones_and_Accessories", "parent" : "Electronics", "order" : 30 }

Getting All Node Descendants

Bad points: unfortunately, requires recursive calls to database.

JavaScript
var descendants=[]
var stack=[];
var item = db.categoriesPCO.findOne({_id:"Cell_Phones_and_Accessories"});
stack.push(item);
while (stack.length>0){
    var currentnode = stack.pop();
    var children = db.categoriesPCO.find({parent:currentnode._id});
    while(true === children.hasNext()) {
        var child = children.next();
        descendants.push(child._id);
        stack.push(child);
    }
}

descendants.join(",")
//Cell_Phones_and_Smartphones,Headsets,Batteries,Cables_And_Adapters,
//Nokia,Samsung,Apple,HTC,Vyacheslav

Getting Path to Node

Bad points: unfortunately also require recursive operations to get the path.

JavaScript
var path=[]
var item = db.categoriesPCO.findOne({_id:"Nokia"})
while (item.parent !== null) {
    item=db.categoriesPCO.findOne({_id:item.parent});
    path.push(item._id);
}

path.reverse().join(' / ');
//Electronics / Cell_Phones_and_Accessories / Cell_Phones_and_Smartphones 

Indexes

Recommended index is on fields parent and order.

JavaScript
db.categoriesPCO.ensureIndex( { parent: 1, order:1 } )  

Tree Structure with Childs Reference

For each node, we store (ID, ChildReferences).

Image 3

Please note that in this case, we do not need order field, because Childs collection already provides this information. Most of the languages respect the array order. If this is not in case for your language, you might consider additional coding to preserve order, however this will make things more complicated.

Adding New Node

Note: Requires one insert operation and one update operation to insert the node.

JavaScript
db.categoriesCRO.insert({_id:'LG', childs:[]});
db.categoriesCRO.update({_id:'Electronics'},{  $addToSet:{childs:'LG'}});
//{ "_id" : "Electronics", "childs" : 
//[     "Cameras_and_Photography", "Shop_Top_Products", "Cell_Phones_and_Accessories", "LG" ] } 

Updating/Moving the Node

Requires single update operation to change node order within same parent, requires two update operations, if node is moved under another parent.

Rearranging order under the same parent:

JavaScript
db.categoriesCRO.update({_id:'Electronics'},
{$set:{"childs.1":'LG',"childs.3":'Shop_Top_Products'}});
//{ "_id" : "Electronics", "childs" : 
//[     "Cameras_and_Photography",     "LG",     
//"Cell_Phones_and_Accessories",     "Shop_Top_Products" ] } 

moving the node:

JavaScript
db.categoriesCRO.update({_id:'Cell_Phones_and_Smartphones'},{  $addToSet:{childs:'LG'}});
db.categoriesCRO.update({_id:'Electronics'},{$pull:{childs:'LG'}});
//{ "_id" : "Cell_Phones_and_Smartphones", 
//"childs" : [ "Nokia", "Samsung", "Apple", "HTC", "Vyacheslav", "LG" ] } 

Node Removal

Node removal also requires two operations: one update and one remove.

JavaScript
db.categoriesCRO.update({_id:'Cell_Phones_and_Smartphones'},{$pull:{childs:'LG'}})
db.categoriesCRO.remove({_id:'LG'}); 

Getting Node Children, Ordered

Bad points: requires additional client side sorting by parent array sequence. Depending on result set, it may affect speed of your code.

JavaScript
var parent = db.categoriesCRO.findOne({_id:'Electronics'})
db.categoriesCRO.find({_id:{$in:parent.childs}}) 

Result set:

JavaScript
{ "_id" : "Cameras_and_Photography", "childs" : [     "Digital_Cameras",     
"Camcorders",     "Lenses_and_Filters",     "Tripods_and_supports",     
"Lighting_and_studio" ] }
{ "_id" : "Cell_Phones_and_Accessories", "childs" : [     "Cell_Phones_and_Smartphones",
"Headsets",     "Batteries",     "Cables_And_Adapters" ] }
{ "_id" : "Shop_Top_Products", "childs" : [ "IPad", "IPhone", "IPod", "Blackberry" ] }

//parent:
{
    "_id" : "Electronics",
    "childs" : [
        "Cameras_and_Photography",
        "Cell_Phones_and_Accessories",
        "Shop_Top_Products"
    ]
} 

As you see, we have ordered array childs, which can be used to sort the result set on a client.

Getting All Node Descendants

Note: Also recursive operations, but we need less selects to databases comparing to the previous approach.

JavaScript
var descendants=[]
var stack=[];
var item = db.categoriesCRO.findOne({_id:"Cell_Phones_and_Accessories"});
stack.push(item);
while (stack.length>0){
    var currentnode = stack.pop();
    var children = db.categoriesCRO.find({_id:{$in:currentnode.childs}});

    while(true === children.hasNext()) {
        var child = children.next();
        descendants.push(child._id);
        if(child.childs.length>0){
          stack.push(child);
        }
    }
}

//Batteries,Cables_And_Adapters,Cell_Phones_and_Smartphones,Headsets,Apple,HTC,Nokia,Samsung
descendants.join(",") 

Getting Path to Node

Path is calculated recursively, so we need to issue number of sequential calls to database.

JavaScript
var path=[]
var item = db.categoriesCRO.findOne({_id:"Nokia"})
while ((item=db.categoriesCRO.findOne({childs:item._id}))) {
    path.push(item._id);
}

path.reverse().join(' / ');
//Electronics / Cell_Phones_and_Accessories / Cell_Phones_and_Smartphones 

Indexes

Recommended index is putting index on childs:

JavaScript
db.categoriesCRO.ensureIndex( { childs: 1 } ) 

Tree Structure Using an Array of Ancestors

For each node, we store (ID, ParentReference, AncestorReferences):

Image 4

Adding New Node

You need one insert operation to introduce new node, however you need to invoke select to prepare the data for insert:

JavaScript
var ancestorpath = db.categoriesAAO.findOne({_id:'Electronics'}).ancestors;
ancestorpath.push('Electronics')
db.categoriesAAO.insert({_id:'LG', parent:'Electronics',ancestors:ancestorpath});
//{ "_id" : "LG", "parent" : "Electronics", "ancestors" : [ "Electronics" ] } 

Updating/Moving the Node

Moving the node requires one select and one update operation:

JavaScript
ancestorpath = db.categoriesAAO.findOne({_id:'Cell_Phones_and_Smartphones'}).ancestors;
ancestorpath.push('Cell_Phones_and_Smartphones')
db.categoriesAAO.update({_id:'LG'},
{$set:{parent:'Cell_Phones_and_Smartphones', ancestors:ancestorpath}});
//{ "_id" : "LG", "ancestors" : [     "Electronics",     "Cell_Phones_and_Accessories",
//     "Cell_Phones_and_Smartphones" ], "parent" : "Cell_Phones_and_Smartphones" } 

Node Removal

Node removal is done with single operation:

JavaScript
db.categoriesAAO.remove({_id:'LG'}); 

Getting Node Children, Unordered

Note unless you introduce the order field, it is impossible to get ordered list of node children. You should consider another approach if you need order.

JavaScript
db.categoriesAAO.find({$query:{parent:'Electronics'}}) 

Getting All Node Descendants

There are two options to get all node descendants. One is classic through recursion:

JavaScript
var ancestors = db.categoriesAAO.find({ancestors:"Cell_Phones_and_Accessories"},{_id:1});
while(true === ancestors.hasNext()) {
       var elem = ancestors.next();
       descendants.push(elem._id);
   }
descendants.join(",")
//Cell_Phones_and_Smartphones,Headsets,Batteries,Cables_And_Adapters,
//Nokia,Samsung,Apple,HTC,Vyacheslav

The second is using aggregation framework introduced in MongoDB 2.2:

JavaScript
 var aggrancestors = db.categoriesAAO.aggregate([
    {$match:{ancestors:"Cell_Phones_and_Accessories"}},
    {$project:{_id:1}},
    {$group:{_id:{},ancestors:{$addToSet:"$_id"}}}
])

descendants = aggrancestors.result[0].ancestors
descendants.join(",")
//Vyacheslav,HTC,Samsung,Cables_And_Adapters,Batteries,Headsets,
//Apple,Nokia,Cell_Phones_and_Smartphones 

Getting Path to Node

This operation is done with single call to database, which is the advantage of this approach.

JavaScript
var path=[]
var item = db.categoriesAAO.findOne({_id:"Nokia"})
item
path=item.ancestors;
path.join(' / ');
//Electronics / Cell_Phones_and_Accessories / Cell_Phones_and_Smartphones 

Indexes

Recommended index is putting index on ancestors:

JavaScript
db.categoriesAAO.ensureIndex( { ancestors: 1 } ) 

Code in Action

Code can be downloaded from repository https://github.com/Voronenko/Storing_TreeView_Structures_WithMongoDB

All files are packaged according to the following naming convention:

  • MODELReference.js - initialization file with tree data for MODEL approach
  • MODELReference_operating.js - add/update/move/remove/get children examples
  • MODELReference_pathtonode.js - code illustrating how to obtain path to node
  • MODELReference_nodedescendants.js - code illustrating how to retrieve all the descendants of the node

All files are ready to use in mongo shell. You can run examples by invoking mongo < file_to_execute, or, if you want, interactively in the shell or with RockMongo web shell.

Points of Interest

Please note, that MongoDB does not provide ACID transactions. This means, that for update operations, split into separate update commands, your application should implement additional code to support your code specific transactions.

Formal advise from 10gen is following:

  • The Parent Reference pattern provides a simple solution to tree storage, but requires multiple queries to retrieve subtrees.
  • The Child References pattern provides a suitable solution to tree storage as long as no operations on subtrees are necessary. This pattern may also provide a suitable solution for storing graphs where a node may have multiple parents.
  • The Array of Ancestors pattern - no specific advantages unless you constantly need to get path to the node

You are free to mix patterns (by introducing order field, etc.) to match the data operations required to your application.

I have prepared the next set of examples for approaches on storing tree structures in MongoDB.

You might want to check this article, describing nested sets, Materialized path and combined approach.

History

  • 4th January, 2013: Initial version

License

This article, along with any associated source code and files, is licensed under The Creative Commons Attribution-Share Alike 3.0 Unported License


Written By
Web Developer
Ukraine Ukraine
Web Developer, interested in bleeding age web technologies and projects.

Experienced and interested in:
- High load web projects, bespoke software development
- DevOps: Chef, Ansible, Vagrant
- NoSQL (mongodb)
- Client stack (javascript core, jquery, AngularJS, HTML5 apis)
- *AAS (Amazon beanstalk, Redhat openshift)
- MEAN & Pure JS stack (Javascript, AngularJS, Node.JS, MongoDB)


-> DevOps inquiries
-> Other inquiries
-> Follow me on Github

Comments and Discussions

 
QuestionImplementation for managing a family tree Pin
victorbos27-Dec-20 11:26
victorbos27-Dec-20 11:26 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.