Click here to Skip to main content
15,867,308 members
Articles / General Programming / Performance

B-Tree Sorted Dictionary

Rate me:
Please Sign up or sign in to vote.
4.96/5 (48 votes)
26 Jan 2024MIT8 min read 179.6K   4.8K   140   56
In-memory B-Tree sorted dictionary

Introduction

B-Tree is a balanced m-way tree. This discussion from Wiki is a good material to introduce one to the characteristics and node layout of a B-Tree data structure: http://en.wikipedia.org/wiki/B-tree.

This article discusses an in-memory B-Tree implementation. Although B-Tree is typically used in file/database system indexing (see Open Source B-Tree On Disk @: https://github.com/SharedCode/sop), there is a significant value in implementing a B-Tree based collection or dictionary in the .NET framework or in any language/platform for that matter.

Advantages of B-Tree In-memory

Typical in-memory sorted dictionary data structures today are based on the Binary Tree algorithm, which is not to be confused with B-Tree. Each node of a Binary Tree can contain a single item, whereas a B-Tree can contain a user defined number of items per node, and its nodes are kept balanced. This is a very important differentiation. Being that each node has a single item, storing a large number of items in a Binary Tree will generate a tall and narrow node graph with numerous nodes.

In a corresponding B-Tree implementation, the graph will tend to be shorter and wider with a lot fewer nodes. This is because a node in a B-Tree is typically configured to have numerous items; e.g., a B-Tree dictionary with 12 items will require a single node to contain the items, and a Binary Tree will require 12 nodes which can be scattered around in the heap (memory). Increasing our item sampling to thousands, hundreds of thousands, or millions, we're talking about a significant differentiation and significant optimization that a corresponding B-Tree based sorted dictionary can provide. A million single item node of a Binary Tree vs. around eighty three thousand of a B-Tree, for a 12 item node setup, and even far less if the user specifies more items per node than mentioned.

With this characterization, it is easy to imagine that a Binary Tree will tend to use more memory, and will tend to generate more memory fragmentation than a respective dictionary utilizing the B-Tree algorithm. And in production environments, we can't afford to have a system that is prone to memory fragmentation as in time, the application will degrade in performance and can cause out of memory conditions due to lack of contiguous allocable space.

Implementation

In this article and associated code implementation, I've selected C# and the .NET Framework as the language and platform of implementation. Since B-Tree is a well established algorithm, I will focus my discussion on the high level design/implementation issues including the interface contract. The complete source code is available for download.

IBTreeDictionary: Extending the Generics IDictionary

Generics is a great feature for enforcing coding and compile time data type checks. Thus, it is a great feature to support it in this implementation of B-Tree based Sorted Dictionary (Note: non-Generics implementation is also supported). The Sorted Dictionary implements the Generics IDictionary interface in order to provide the capability mentioned.

In order to expose B-Tree, specific features such as duplicate Key is allowed and thus requiring an explicit Item Iterator to give fine grained control in item traversal, I am extending the IDictionary interface to add methods for the mentioned feature, giving birth to the IBTreeDictionary interface. Here is how the said interface looks like:

C#
public interface IBaseCollection<T> : ICloneable
{
  bool MoveNext();
  bool MovePrevious();
  bool MoveFirst();
  bool MoveLast();
  bool Search(T Value);
}

public interface IBTreeDictionary<TKey, TValue> : 
       System.Collections.Generic.IDictionary<TKey, 
       TValue>, IBaseCollection<TKey>
{
  Sop.Collections.BTree.SortOrderType SortOrder{ get;set; }
  BTreeItem<TKey, TValue> CurrentEntry{ get; }
  TKey CurrentKey{ get; }
  TValue CurrentValue{ get; set; }
  void Remove();
}

In order to consume items even if they have duplicate keys, the dictionary requires a feature to be able to set an item pointer to a desired position, then traverse a range while consuming the values of the items in the range. This is why the Movexxx methods and the CurrentKey/CurrentValue item pointers are required.

Also, B-Tree provides a feature to allow a user to traverse items either in ascending or descending sort order, thus the SortOrder attribute was provided.

Management and Search Functions

Following are selected B-Tree management and search functions to illustrate some behavior which characterizes and gives the user the idea of this flavor implementation of the very useful B-Tree algorithm:

Add Method

Adding an item to a B-Tree requires determination of the appropriate node where to add the item. This is required because items of the tree are maintained sorted as they get inserted or removed. Traversing the tree in order to find the node can be done recursively or via a looping construct such as a while loop. The latter is preferred than recursion as in this situation, it removes the requirement of stack usage, i.e., when done, there is no implicit stack (pop) operations as the innermost function unwinds/returns to the main caller function.

This block illustrates a while loop that is used to traverse the tree to find the target node. It uses BinarySearch for each node of the tree to further optimize the search per node. This is possible since each item in the node is sorted and thus, BinarySearch allows an optimal searching and minimal comparison of keys.

C#
TreeNode CurrentNode = this;
int Index = 0;
while (true)
{
    Index = 0;
    byte NoOfOccupiedSlots = CurrentNode.count;
    if (NoOfOccupiedSlots > 1)
    {
        if (ParentBTree.SlotsComparer != null)
            Index = Array.BinarySearch(CurrentNode.Slots, 0, 
                    NoOfOccupiedSlots, Item, ParentBTree.SlotsComparer);
        else
            Index = Array.BinarySearch(CurrentNode.Slots, 0, 
                                       NoOfOccupiedSlots, Item);
        if (Index < 0)
            Index = ~Index;
    }
    else if (NoOfOccupiedSlots == 1)
    {
        if (ParentBTree.Comparer != null)
        {
            if (ParentBTree.Comparer.Compare(CurrentNode.Slots[0].Key, Item.Key) < 0)
                Index = 1;
        }
        else
        {
            try
            {
                if (System.Collections.Generic.Comparer<TKey>.Default.Compare(
                                 CurrentNode.Slots[0].Key, Item.Key) < 0)
                    Index = 1;
            }
            catch (Exception)
            {
                if (CurrentNode.Slots[0].Key.ToString().CompareTo(Item.Key.ToString()) < 0)
                    Index = 1;
            }
        }
    }
    if (CurrentNode.Children != null)
    // if not an outermost node, let next lower level node do the 'Add'.
        CurrentNode = CurrentNode.Children[Index];
    else
        break;
}

At this point, the correct node to add the item to is found. Now, invoke the actual Add function:

C#
CurrentNode.Add(ParentBTree, Item, Index);
CurrentNode = null;

Here is the Add function that does the actual item insert to the found node. It maintains the items of the node to be sorted, and manages the breakup of the node when it is full, and other B-Tree maintenance actions required such as a broke up node's parent promotion, etc.:

C#
void Add(BTreeAlgorithm<TKey, TValue> ParentBTree,
                BTreeItem<TKey, TValue> Item, int Index)
{
    byte NoOfOccupiedSlots = count;
    // Add. check if node is not yet full..
    if (NoOfOccupiedSlots < ParentBTree.SlotLength)
    {
        ShiftSlots(Slots, (byte)Index, (byte)NoOfOccupiedSlots);
        Slots[Index] = Item;
        count++;
        return;
    }
    else
    {
        Slots.CopyTo(ParentBTree.TempSlots, 0);
        byte SlotsHalf = (byte)(ParentBTree.SlotLength >> 1);
        if (Parent != null)
        {
            bool bIsUnBalanced = false;
            int iIsThereVacantSlot = 0;
            if (IsThereVacantSlotInLeft(ParentBTree, ref bIsUnBalanced))
                iIsThereVacantSlot = 1;
            else if (IsThereVacantSlotInRight(ParentBTree, ref bIsUnBalanced))
                iIsThereVacantSlot = 2;
            if (iIsThereVacantSlot > 0)
            {
                // copy temp buffer contents to the actual slots.
                byte b = (byte)(iIsThereVacantSlot == 1 ? 0 : 1);
                CopyArrayElements(ParentBTree.TempSlots, b, 
                                  Slots, 0, ParentBTree.SlotLength);
                if (iIsThereVacantSlot == 1)
                    // Vacant in left, "skud over"
                    // the leftmost node's item to parent and left.
                    DistributeToLeft(ParentBTree, 
                       ParentBTree.TempSlots[ParentBTree.SlotLength]);
                else if (iIsThereVacantSlot == 2)
                    // Vacant in right, move the rightmost
                    // node item into the vacant slot in right.
                    DistributeToRight(ParentBTree, ParentBTree.TempSlots[0]);
                return;
            }
            else if (bIsUnBalanced)
            { // if this branch is unbalanced..
                // _BreakNode
                // Description :
                // -copy the left half of the slots
                // -copy the right half of the slots
                // -zero out the current slot.
                // -copy the middle slot
                // -allocate memory for children node *s
                // -assign the new children nodes.
                TreeNode LeftNode;
                TreeNode RightNode;
                try
                {
                    RightNode = ParentBTree.GetRecycleNode(this);
                    LeftNode = ParentBTree.GetRecycleNode(this);
                    CopyArrayElements(ParentBTree.TempSlots, 0, 
                                      LeftNode.Slots, 0, SlotsHalf);
                    LeftNode.count = SlotsHalf;
                    CopyArrayElements(ParentBTree.TempSlots, 
                       (ushort)(SlotsHalf + 1), RightNode.Slots, 0, SlotsHalf);
                    RightNode.count = SlotsHalf;
                    ResetArray(Slots, null);
                    count = 1;
                    Slots[0] = ParentBTree.TempSlots[SlotsHalf];
                    Children = new TreeNode[ParentBTree.SlotLength + 1];
                    ResetArray(Children, null);
                    Children[(int)Sop.Collections.BTree.ChildNodes.LeftChild] = LeftNode;
                    Children[(int)Sop.Collections.BTree.ChildNodes.RightChild] = RightNode;
                    //SUCCESSFUL!
                    return;
                }
                catch (Exception)
                {
                    Children = null;
                    LeftNode = null;
                    RightNode = null;
                    throw;
                }
            }
            else
            {
                // All slots are occupied in this and other siblings' nodes..
                TreeNode RightNode;
                try
                {
                    // prepare this and the right node sibling
                    // and promote the temporary parent node(pTempSlot).
                    RightNode = ParentBTree.GetRecycleNode(Parent);
                    // zero out the current slot.
                    ResetArray(Slots, null);
                    // copy the left half of the slots to left sibling
                    CopyArrayElements(ParentBTree.TempSlots, 0, Slots, 0, SlotsHalf);
                    count = SlotsHalf;
                    // copy the right half of the slots to right sibling
                    CopyArrayElements(ParentBTree.TempSlots, 
                       (ushort)(SlotsHalf + 1), RightNode.Slots, 0, SlotsHalf);
                    RightNode.count = SlotsHalf;
                    // copy the middle slot to temp parent slot.
                    ParentBTree.TempParent = ParentBTree.TempSlots[SlotsHalf];
                    // assign the new children nodes.
                    ParentBTree.TempParentChildren[
                      (int)Sop.Collections.BTree.ChildNodes.LeftChild] = this;
                    ParentBTree.TempParentChildren[
                      (int)Sop.Collections.BTree.ChildNodes.RightChild] = RightNode;
                    ParentBTree.PromoteParent = (TreeNode)Parent;
                    ParentBTree.PromoteIndexOfNode = GetIndexOfNode(ParentBTree);
                    //TreeNode o = (TreeNode)Parent;
                    //o.Promote(ParentBTree, GetIndexOfNode(ParentBTree));
                    //SUCCESSFUL!
                    return;
                }
                catch (Exception)
                {
                    RightNode = null;
                    throw;
                }
            }
        }
        else
        {
            // _BreakNode
            // Description :
            // -copy the left half of the temp slots
            // -copy the right half of the temp slots
            // -zero out the current slot.
            // -copy the middle of temp slot to 1st elem of current slot
            // -allocate memory for children node *s
            // -assign the new children nodes.
            TreeNode LeftNode;
            TreeNode RightNode;
            try
            {
                RightNode = ParentBTree.GetRecycleNode(this);
                LeftNode = ParentBTree.GetRecycleNode(this);
                CopyArrayElements(ParentBTree.TempSlots, 0, 
                                  LeftNode.Slots, 0, SlotsHalf);
                LeftNode.count = SlotsHalf;
                CopyArrayElements(ParentBTree.TempSlots, 
                   (ushort)(SlotsHalf + 1), RightNode.Slots, 0, SlotsHalf);
                RightNode.count = SlotsHalf;
                ResetArray(Slots, null);
                Slots[0] = ParentBTree.TempSlots[SlotsHalf];
                count = 1;
                Children = new TreeNode[ParentBTree.SlotLength + 1];
                Children[(int)Sop.Collections.BTree.ChildNodes.LeftChild] = LeftNode;
                Children[(int)Sop.Collections.BTree.ChildNodes.RightChild] = RightNode;
                return; // successful
            }
            catch (Exception)
            {
                LeftNode = null;
                RightNode = null;
                RightNode = null;
                throw;
            }
        }
    }
}

Remove Method

There are two versions of Remove in a B-Tree; one that will remove an item given a Key, and the other that will remove the currently selected item of the tree. The former will cause the B-Tree to search for the item, given a key, and position the item pointer to the item with the matching key in the tree. The same node determination listed in the Add block is used to find the node and item to delete. After the item pointer is positioned at the right one, the item is actually deleted from the tree. Here is the code that illustrates item deletion.

When the item for deletion is not in the outermost node, the tree moves the next item from the outermost node to overwrite the currently selected item's slot, causing removal of the said item from the tree.

C#
internal protected bool Remove(BTreeAlgorithm<TKey, TValue> ParentBTree)
{
    if (Children != null)
    {
        byte byIndex = ParentBTree.CurrentItem.NodeItemIndex;
        MoveNext(ParentBTree);
        Slots[byIndex] = 
          ParentBTree.CurrentItem.Node.Slots[ParentBTree.CurrentItem.NodeItemIndex];
    }
    return true;
}

Then it takes care of managing the outermost node either to set to null the slot vacated by the moved item, or to maintain the balance of the tree by pulling an item from either the left or right sibling node.

C#
internal void FixTheVacatedSlot(BTreeAlgorithm<TKey, TValue> ParentBTree)
{
    sbyte c;
    c = (sbyte)count;
    if (c > 1) // if there are more than 1 items in slot then..
    { //***** We don't fix the children since there are no children at this scenario.
        if (ParentBTree.CurrentItem.NodeItemIndex < c - 1)
            MoveArrayElements(Slots,
                (ushort)(ParentBTree.CurrentItem.NodeItemIndex + 1),
                ParentBTree.CurrentItem.NodeItemIndex,
                (ushort)(c - 1 - ParentBTree.CurrentItem.NodeItemIndex));
        count--;
        Slots[count] = null; // nullify the last slot.
    }
    else
    { // only 1 item in slot
        if (Parent != null)
        {
            byte ucIndex;
            // if there is a pullable item from sibling nodes.
            if (SearchForPullableItem(ParentBTree, out ucIndex))
            {
                if (ucIndex < GetIndexOfNode(ParentBTree))
                    PullFromLeft(ParentBTree); // pull an item from left
                else
                    PullFromRight(ParentBTree); // pull an item from right
            }
            else
            { // Parent has only 2 children nodes..
                if (Parent.Children[0] == this)
                { // this is left node
                    TreeNode RightSibling = GetRightSibling(ParentBTree);
                    Parent.Slots[1] = RightSibling.Slots[0];
                    Parent.count = 2;
                    ParentBTree.AddRecycleNode(RightSibling);
                    RightSibling = null;
                }
                else
                { // this is right node
                    Parent.Slots[1] = Parent.Slots[0];
                    TreeNode LeftSibling = GetLeftSibling(ParentBTree);
                    Parent.Slots[0] = LeftSibling.Slots[0];
                    Parent.count = 2;
                    ParentBTree.AddRecycleNode(LeftSibling);
                    LeftSibling = null;
                }
                // nullify Parent's children will cause this
                // tree node instance to be garbage collected
                // as this is child of parent!
                Parent.Children[0] = null;
                Parent.Children[1] = null;
                Parent.Children = null;
                Clear();
            }
        }
        else
        { // only 1 item in root node !
            Slots[0] = null; // just nullIFY the slot.
            count = 0;
            ParentBTree.SetCurrentItemAddress(null, 0);
            // Point the current item pointer to end of tree
        }
    }
}

Search Method

The Search method contains code that has the same logic as the first portion of the Add method listed above. It traverses the tree from the root down the children nodes, until it finds the target node and item using an iterative approach (while loop). But instead of adding an item, when it is found, Search merely positions the current item pointer to the found node and item in the node.

Having an item pointer combined with the Search and Movexxx methods is a powerful feature. E.g., doing a range query is very easy in this combination of methods. Position the item pointer to the (start) item you'd like to query, then use either the MoveNext or MovePrevious method to sequentially traverse the nearby items, until you'd read all the items in range that you are interested about.

Using the Code

Download the source code zip file, uncompress it to your desired target folder, and open the B-Tree.sln solution inside VS 2008. Browse through the project SampleUsage source code in order to see how to use the BTreeSortedDictionary, run it, experiment to use other methods available, and reuse your experience using the Generic SortedDictionary. Enjoy!

Comparison Results

1 Million Inserts on .Net SortedDictionary: 12 sec 308 MB peak 286 MB end
1 Million Inserts on BTreeDictionary:       9 sec 300 MB peak 277 MB end
1 Million Search on .Net SortedDictionary:  7 sec 309 MB peak 286 MB end
1 Million Search on BTreeDictionary:        7 sec 309 MB peak 277 MB end

Conclusion: This B-Tree sorted dictionary outperforms the .NET Framework's SortedDictionary in both memory utilization and speed of operations. Inserting 1 million items, this BTree outperforms .NET's by 3 seconds, and utilized ~9MB of less memory.

In searching, both are of the same speed, but then again, in memory utilization, BTree dictionary outperforms .NET's SortedDictionary by using 9MB of less memory. Multiplying the load 10X and using both in production environment, that is the true test, and I anticipate you will be very satisfied with this BTree dictionary due to architectural reasons I discussed at the top of this article.

Here is the stress program used:

C#
static void Main(string[] args)
{
    SortedDictionary<string, string> l1 = 
              new SortedDictionary<string, string>();
    StressAdd(l1, 1000000);
    StressSearch(l1, 1000000);
    //** uncomment to run the BTreeDictionary test
    //   and comment out the above block to turn off SortedDictionary test
    //BTreeDictionary<string, string> l2 = 
    //            new BTreeDictionary<string, string>();
    //StressAdd(l2, 1000000);
    //StressSearch(l2, 1000000);
    Console.ReadLine();
    return;
}

static void StressAdd(IDictionary<string, string> l, int Max)
{
    Console.WriteLine("Add: {0}", DateTime.Now);
    for (int i = 0; i < Max; i++)
    {
        l.Add(string.Format("{0} key kds vidsbnvibsnv siuv " + 
              "siuvsiubvisubviusvifsudjcnjdnc", i),
              "value kds vidsbnvibsnv siuv " + 
              "siuvsiubvisubviusvifsudjcnjdnccscs");
    }
    Console.WriteLine("Add: {0}", DateTime.Now);
}

static void StressSearch(IDictionary<string, string> l, int Max)
{
    Console.WriteLine("Search: {0}", DateTime.Now);
    for (int i = 0; i < Max; i++)
    {
        string v = l[string.Format("{0} key kds vidsbnvibsnv siuv " + 
                     "siuvsiubvisubviusvifsudjcnjdnc", i)];
    }
    Console.WriteLine("Search: {0}", DateTime.Now);
}

Points of Interest

I really learnt a lot and enjoyed authoring this B-Tree implementation, which I did the C++ version back in the mid-90s; in fact, I enjoyed it so much, it propelled me to work on the on-disk version that is available now as Open Sourced software in GitHub (https://github.com/SharedCode/sop). Pls. feel free to join, participate and use the new "on disk" version.

Hello, "sop" project had been relocated to GitHub: https://github.com/SharedCode/sop
And, current version V2 that accomplished all of "on disk" features advertised, was implemented in Golang. The B-Tree also got optimized, where on "deletes", node load balancing was turned off. It turned out to be a very cool, ACID transactions, Objects persistence enabler/adaptor on top of Cassandra & Redis. Check it out and join the project, it awaits your "creative talents"!

History

  • 23rd July, 2010 - Initial submission
  • 24th July, 2010 - Added performance comparisons with .NET Framework's SortedDictionary
  • 14th August, 2012 - Modified Points of Interest section to mention just Open Sourced B-Tree Gold (v4.5) product
  • 25th January, 2024 - Added link to Golang port: https://github.com/SharedCode/sop

License

This article, along with any associated source code and files, is licensed under The MIT License


Written By
United States United States
Ex Google, ex Microsoft
I'm just fresh off from a software eng'g gig & is looking forward to my next.
Pls. do drop me a line @gerardorecinto@yahoo.com if interested or having any question/feedback on these Open Source projects.

I'm excited to help/volunteer my services.
Have a great day!

Comments and Discussions

 
QuestionWhy not in C? Pin
Sanmayce24-May-15 5:48
Sanmayce24-May-15 5:48 
AnswerRe: Why not in C? Pin
Gerardo Recinto26-May-15 11:42
Gerardo Recinto26-May-15 11:42 
GeneralRe: Why not in C? Pin
Sanmayce26-May-15 13:17
Sanmayce26-May-15 13:17 
GeneralRe: Why not in C? Pin
Gerardo Recinto13-Oct-15 5:27
Gerardo Recinto13-Oct-15 5:27 
GeneralRe: Why not in C? Pin
Sanmayce14-Oct-15 8:02
Sanmayce14-Oct-15 8:02 
GeneralRe: Why not in C? Pin
Gerardo Recinto14-Oct-15 13:58
Gerardo Recinto14-Oct-15 13:58 
GeneralRe: Why not in C? Pin
Sanmayce15-Oct-15 7:36
Sanmayce15-Oct-15 7:36 
GeneralRe: Why not in C? Pin
Gerardo Recinto15-Oct-15 8:12
Gerardo Recinto15-Oct-15 8:12 
GeneralRe: Why not in C? Pin
Gerardo Recinto16-Oct-15 20:56
Gerardo Recinto16-Oct-15 20:56 
QuestionDuplicates Looping and Deleting Pin
runfastman27-Feb-14 10:47
runfastman27-Feb-14 10:47 
AnswerRe: Duplicates Looping and Deleting Pin
Gerardo Recinto28-Feb-14 8:42
Gerardo Recinto28-Feb-14 8:42 
QuestionWrite to file Pin
Shargon_8516-Jul-13 2:58
Shargon_8516-Jul-13 2:58 
How to write the index in file?
AnswerRe: Write to file Pin
Gerardo Recinto25-Feb-14 21:21
Gerardo Recinto25-Feb-14 21:21 
QuestionIntroduction paragraph correction Pin
Yuriy Loginov20-May-13 10:11
professionalYuriy Loginov20-May-13 10:11 
AnswerRe: Introduction paragraph correction Pin
Gerardo Recinto20-May-13 18:50
Gerardo Recinto20-May-13 18:50 
SuggestionRe: Introduction paragraph correction Pin
Yuriy Loginov21-May-13 2:35
professionalYuriy Loginov21-May-13 2:35 
GeneralRe: Introduction paragraph correction Pin
Gerardo Recinto28-Feb-14 11:28
Gerardo Recinto28-Feb-14 11:28 
QuestionGood learning material Pin
enjaychang17-Jan-13 2:05
enjaychang17-Jan-13 2:05 
AnswerRe: Good learning material Pin
Gerardo Recinto18-Feb-13 12:23
Gerardo Recinto18-Feb-13 12:23 
GeneralRe: Good learning material Pin
enjaychang6-Mar-13 3:53
enjaychang6-Mar-13 3:53 
QuestionMemory usage Pin
Lcng9-Oct-12 4:52
Lcng9-Oct-12 4:52 
AnswerRe: Memory usage Pin
Gerardo Recinto9-Oct-12 20:12
Gerardo Recinto9-Oct-12 20:12 
QuestionB tree inserion of records in C language Pin
Member 94559374-Oct-12 3:13
Member 94559374-Oct-12 3:13 
AnswerRe: B tree inserion of records in C language Pin
Gerardo Recinto5-Oct-12 17:28
Gerardo Recinto5-Oct-12 17:28 
GeneralMy vote of 5 Pin
Christian Amado28-Aug-12 12:53
professionalChristian Amado28-Aug-12 12:53 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.