Click here to Skip to main content
15,867,594 members
Articles / General Programming / Performance

B-Tree Sorted Dictionary

Rate me:
Please Sign up or sign in to vote.
4.96/5 (48 votes)
26 Jan 2024MIT8 min read 179.6K   4.8K   140   56
In-memory B-Tree sorted dictionary

Introduction

B-Tree is a balanced m-way tree. This discussion from Wiki is a good material to introduce one to the characteristics and node layout of a B-Tree data structure: http://en.wikipedia.org/wiki/B-tree.

This article discusses an in-memory B-Tree implementation. Although B-Tree is typically used in file/database system indexing (see Open Source B-Tree On Disk @: https://github.com/SharedCode/sop), there is a significant value in implementing a B-Tree based collection or dictionary in the .NET framework or in any language/platform for that matter.

Advantages of B-Tree In-memory

Typical in-memory sorted dictionary data structures today are based on the Binary Tree algorithm, which is not to be confused with B-Tree. Each node of a Binary Tree can contain a single item, whereas a B-Tree can contain a user defined number of items per node, and its nodes are kept balanced. This is a very important differentiation. Being that each node has a single item, storing a large number of items in a Binary Tree will generate a tall and narrow node graph with numerous nodes.

In a corresponding B-Tree implementation, the graph will tend to be shorter and wider with a lot fewer nodes. This is because a node in a B-Tree is typically configured to have numerous items; e.g., a B-Tree dictionary with 12 items will require a single node to contain the items, and a Binary Tree will require 12 nodes which can be scattered around in the heap (memory). Increasing our item sampling to thousands, hundreds of thousands, or millions, we're talking about a significant differentiation and significant optimization that a corresponding B-Tree based sorted dictionary can provide. A million single item node of a Binary Tree vs. around eighty three thousand of a B-Tree, for a 12 item node setup, and even far less if the user specifies more items per node than mentioned.

With this characterization, it is easy to imagine that a Binary Tree will tend to use more memory, and will tend to generate more memory fragmentation than a respective dictionary utilizing the B-Tree algorithm. And in production environments, we can't afford to have a system that is prone to memory fragmentation as in time, the application will degrade in performance and can cause out of memory conditions due to lack of contiguous allocable space.

Implementation

In this article and associated code implementation, I've selected C# and the .NET Framework as the language and platform of implementation. Since B-Tree is a well established algorithm, I will focus my discussion on the high level design/implementation issues including the interface contract. The complete source code is available for download.

IBTreeDictionary: Extending the Generics IDictionary

Generics is a great feature for enforcing coding and compile time data type checks. Thus, it is a great feature to support it in this implementation of B-Tree based Sorted Dictionary (Note: non-Generics implementation is also supported). The Sorted Dictionary implements the Generics IDictionary interface in order to provide the capability mentioned.

In order to expose B-Tree, specific features such as duplicate Key is allowed and thus requiring an explicit Item Iterator to give fine grained control in item traversal, I am extending the IDictionary interface to add methods for the mentioned feature, giving birth to the IBTreeDictionary interface. Here is how the said interface looks like:

C#
public interface IBaseCollection<T> : ICloneable
{
  bool MoveNext();
  bool MovePrevious();
  bool MoveFirst();
  bool MoveLast();
  bool Search(T Value);
}

public interface IBTreeDictionary<TKey, TValue> : 
       System.Collections.Generic.IDictionary<TKey, 
       TValue>, IBaseCollection<TKey>
{
  Sop.Collections.BTree.SortOrderType SortOrder{ get;set; }
  BTreeItem<TKey, TValue> CurrentEntry{ get; }
  TKey CurrentKey{ get; }
  TValue CurrentValue{ get; set; }
  void Remove();
}

In order to consume items even if they have duplicate keys, the dictionary requires a feature to be able to set an item pointer to a desired position, then traverse a range while consuming the values of the items in the range. This is why the Movexxx methods and the CurrentKey/CurrentValue item pointers are required.

Also, B-Tree provides a feature to allow a user to traverse items either in ascending or descending sort order, thus the SortOrder attribute was provided.

Management and Search Functions

Following are selected B-Tree management and search functions to illustrate some behavior which characterizes and gives the user the idea of this flavor implementation of the very useful B-Tree algorithm:

Add Method

Adding an item to a B-Tree requires determination of the appropriate node where to add the item. This is required because items of the tree are maintained sorted as they get inserted or removed. Traversing the tree in order to find the node can be done recursively or via a looping construct such as a while loop. The latter is preferred than recursion as in this situation, it removes the requirement of stack usage, i.e., when done, there is no implicit stack (pop) operations as the innermost function unwinds/returns to the main caller function.

This block illustrates a while loop that is used to traverse the tree to find the target node. It uses BinarySearch for each node of the tree to further optimize the search per node. This is possible since each item in the node is sorted and thus, BinarySearch allows an optimal searching and minimal comparison of keys.

C#
TreeNode CurrentNode = this;
int Index = 0;
while (true)
{
    Index = 0;
    byte NoOfOccupiedSlots = CurrentNode.count;
    if (NoOfOccupiedSlots > 1)
    {
        if (ParentBTree.SlotsComparer != null)
            Index = Array.BinarySearch(CurrentNode.Slots, 0, 
                    NoOfOccupiedSlots, Item, ParentBTree.SlotsComparer);
        else
            Index = Array.BinarySearch(CurrentNode.Slots, 0, 
                                       NoOfOccupiedSlots, Item);
        if (Index < 0)
            Index = ~Index;
    }
    else if (NoOfOccupiedSlots == 1)
    {
        if (ParentBTree.Comparer != null)
        {
            if (ParentBTree.Comparer.Compare(CurrentNode.Slots[0].Key, Item.Key) < 0)
                Index = 1;
        }
        else
        {
            try
            {
                if (System.Collections.Generic.Comparer<TKey>.Default.Compare(
                                 CurrentNode.Slots[0].Key, Item.Key) < 0)
                    Index = 1;
            }
            catch (Exception)
            {
                if (CurrentNode.Slots[0].Key.ToString().CompareTo(Item.Key.ToString()) < 0)
                    Index = 1;
            }
        }
    }
    if (CurrentNode.Children != null)
    // if not an outermost node, let next lower level node do the 'Add'.
        CurrentNode = CurrentNode.Children[Index];
    else
        break;
}

At this point, the correct node to add the item to is found. Now, invoke the actual Add function:

C#
CurrentNode.Add(ParentBTree, Item, Index);
CurrentNode = null;

Here is the Add function that does the actual item insert to the found node. It maintains the items of the node to be sorted, and manages the breakup of the node when it is full, and other B-Tree maintenance actions required such as a broke up node's parent promotion, etc.:

C#
void Add(BTreeAlgorithm<TKey, TValue> ParentBTree,
                BTreeItem<TKey, TValue> Item, int Index)
{
    byte NoOfOccupiedSlots = count;
    // Add. check if node is not yet full..
    if (NoOfOccupiedSlots < ParentBTree.SlotLength)
    {
        ShiftSlots(Slots, (byte)Index, (byte)NoOfOccupiedSlots);
        Slots[Index] = Item;
        count++;
        return;
    }
    else
    {
        Slots.CopyTo(ParentBTree.TempSlots, 0);
        byte SlotsHalf = (byte)(ParentBTree.SlotLength >> 1);
        if (Parent != null)
        {
            bool bIsUnBalanced = false;
            int iIsThereVacantSlot = 0;
            if (IsThereVacantSlotInLeft(ParentBTree, ref bIsUnBalanced))
                iIsThereVacantSlot = 1;
            else if (IsThereVacantSlotInRight(ParentBTree, ref bIsUnBalanced))
                iIsThereVacantSlot = 2;
            if (iIsThereVacantSlot > 0)
            {
                // copy temp buffer contents to the actual slots.
                byte b = (byte)(iIsThereVacantSlot == 1 ? 0 : 1);
                CopyArrayElements(ParentBTree.TempSlots, b, 
                                  Slots, 0, ParentBTree.SlotLength);
                if (iIsThereVacantSlot == 1)
                    // Vacant in left, "skud over"
                    // the leftmost node's item to parent and left.
                    DistributeToLeft(ParentBTree, 
                       ParentBTree.TempSlots[ParentBTree.SlotLength]);
                else if (iIsThereVacantSlot == 2)
                    // Vacant in right, move the rightmost
                    // node item into the vacant slot in right.
                    DistributeToRight(ParentBTree, ParentBTree.TempSlots[0]);
                return;
            }
            else if (bIsUnBalanced)
            { // if this branch is unbalanced..
                // _BreakNode
                // Description :
                // -copy the left half of the slots
                // -copy the right half of the slots
                // -zero out the current slot.
                // -copy the middle slot
                // -allocate memory for children node *s
                // -assign the new children nodes.
                TreeNode LeftNode;
                TreeNode RightNode;
                try
                {
                    RightNode = ParentBTree.GetRecycleNode(this);
                    LeftNode = ParentBTree.GetRecycleNode(this);
                    CopyArrayElements(ParentBTree.TempSlots, 0, 
                                      LeftNode.Slots, 0, SlotsHalf);
                    LeftNode.count = SlotsHalf;
                    CopyArrayElements(ParentBTree.TempSlots, 
                       (ushort)(SlotsHalf + 1), RightNode.Slots, 0, SlotsHalf);
                    RightNode.count = SlotsHalf;
                    ResetArray(Slots, null);
                    count = 1;
                    Slots[0] = ParentBTree.TempSlots[SlotsHalf];
                    Children = new TreeNode[ParentBTree.SlotLength + 1];
                    ResetArray(Children, null);
                    Children[(int)Sop.Collections.BTree.ChildNodes.LeftChild] = LeftNode;
                    Children[(int)Sop.Collections.BTree.ChildNodes.RightChild] = RightNode;
                    //SUCCESSFUL!
                    return;
                }
                catch (Exception)
                {
                    Children = null;
                    LeftNode = null;
                    RightNode = null;
                    throw;
                }
            }
            else
            {
                // All slots are occupied in this and other siblings' nodes..
                TreeNode RightNode;
                try
                {
                    // prepare this and the right node sibling
                    // and promote the temporary parent node(pTempSlot).
                    RightNode = ParentBTree.GetRecycleNode(Parent);
                    // zero out the current slot.
                    ResetArray(Slots, null);
                    // copy the left half of the slots to left sibling
                    CopyArrayElements(ParentBTree.TempSlots, 0, Slots, 0, SlotsHalf);
                    count = SlotsHalf;
                    // copy the right half of the slots to right sibling
                    CopyArrayElements(ParentBTree.TempSlots, 
                       (ushort)(SlotsHalf + 1), RightNode.Slots, 0, SlotsHalf);
                    RightNode.count = SlotsHalf;
                    // copy the middle slot to temp parent slot.
                    ParentBTree.TempParent = ParentBTree.TempSlots[SlotsHalf];
                    // assign the new children nodes.
                    ParentBTree.TempParentChildren[
                      (int)Sop.Collections.BTree.ChildNodes.LeftChild] = this;
                    ParentBTree.TempParentChildren[
                      (int)Sop.Collections.BTree.ChildNodes.RightChild] = RightNode;
                    ParentBTree.PromoteParent = (TreeNode)Parent;
                    ParentBTree.PromoteIndexOfNode = GetIndexOfNode(ParentBTree);
                    //TreeNode o = (TreeNode)Parent;
                    //o.Promote(ParentBTree, GetIndexOfNode(ParentBTree));
                    //SUCCESSFUL!
                    return;
                }
                catch (Exception)
                {
                    RightNode = null;
                    throw;
                }
            }
        }
        else
        {
            // _BreakNode
            // Description :
            // -copy the left half of the temp slots
            // -copy the right half of the temp slots
            // -zero out the current slot.
            // -copy the middle of temp slot to 1st elem of current slot
            // -allocate memory for children node *s
            // -assign the new children nodes.
            TreeNode LeftNode;
            TreeNode RightNode;
            try
            {
                RightNode = ParentBTree.GetRecycleNode(this);
                LeftNode = ParentBTree.GetRecycleNode(this);
                CopyArrayElements(ParentBTree.TempSlots, 0, 
                                  LeftNode.Slots, 0, SlotsHalf);
                LeftNode.count = SlotsHalf;
                CopyArrayElements(ParentBTree.TempSlots, 
                   (ushort)(SlotsHalf + 1), RightNode.Slots, 0, SlotsHalf);
                RightNode.count = SlotsHalf;
                ResetArray(Slots, null);
                Slots[0] = ParentBTree.TempSlots[SlotsHalf];
                count = 1;
                Children = new TreeNode[ParentBTree.SlotLength + 1];
                Children[(int)Sop.Collections.BTree.ChildNodes.LeftChild] = LeftNode;
                Children[(int)Sop.Collections.BTree.ChildNodes.RightChild] = RightNode;
                return; // successful
            }
            catch (Exception)
            {
                LeftNode = null;
                RightNode = null;
                RightNode = null;
                throw;
            }
        }
    }
}

Remove Method

There are two versions of Remove in a B-Tree; one that will remove an item given a Key, and the other that will remove the currently selected item of the tree. The former will cause the B-Tree to search for the item, given a key, and position the item pointer to the item with the matching key in the tree. The same node determination listed in the Add block is used to find the node and item to delete. After the item pointer is positioned at the right one, the item is actually deleted from the tree. Here is the code that illustrates item deletion.

When the item for deletion is not in the outermost node, the tree moves the next item from the outermost node to overwrite the currently selected item's slot, causing removal of the said item from the tree.

C#
internal protected bool Remove(BTreeAlgorithm<TKey, TValue> ParentBTree)
{
    if (Children != null)
    {
        byte byIndex = ParentBTree.CurrentItem.NodeItemIndex;
        MoveNext(ParentBTree);
        Slots[byIndex] = 
          ParentBTree.CurrentItem.Node.Slots[ParentBTree.CurrentItem.NodeItemIndex];
    }
    return true;
}

Then it takes care of managing the outermost node either to set to null the slot vacated by the moved item, or to maintain the balance of the tree by pulling an item from either the left or right sibling node.

C#
internal void FixTheVacatedSlot(BTreeAlgorithm<TKey, TValue> ParentBTree)
{
    sbyte c;
    c = (sbyte)count;
    if (c > 1) // if there are more than 1 items in slot then..
    { //***** We don't fix the children since there are no children at this scenario.
        if (ParentBTree.CurrentItem.NodeItemIndex < c - 1)
            MoveArrayElements(Slots,
                (ushort)(ParentBTree.CurrentItem.NodeItemIndex + 1),
                ParentBTree.CurrentItem.NodeItemIndex,
                (ushort)(c - 1 - ParentBTree.CurrentItem.NodeItemIndex));
        count--;
        Slots[count] = null; // nullify the last slot.
    }
    else
    { // only 1 item in slot
        if (Parent != null)
        {
            byte ucIndex;
            // if there is a pullable item from sibling nodes.
            if (SearchForPullableItem(ParentBTree, out ucIndex))
            {
                if (ucIndex < GetIndexOfNode(ParentBTree))
                    PullFromLeft(ParentBTree); // pull an item from left
                else
                    PullFromRight(ParentBTree); // pull an item from right
            }
            else
            { // Parent has only 2 children nodes..
                if (Parent.Children[0] == this)
                { // this is left node
                    TreeNode RightSibling = GetRightSibling(ParentBTree);
                    Parent.Slots[1] = RightSibling.Slots[0];
                    Parent.count = 2;
                    ParentBTree.AddRecycleNode(RightSibling);
                    RightSibling = null;
                }
                else
                { // this is right node
                    Parent.Slots[1] = Parent.Slots[0];
                    TreeNode LeftSibling = GetLeftSibling(ParentBTree);
                    Parent.Slots[0] = LeftSibling.Slots[0];
                    Parent.count = 2;
                    ParentBTree.AddRecycleNode(LeftSibling);
                    LeftSibling = null;
                }
                // nullify Parent's children will cause this
                // tree node instance to be garbage collected
                // as this is child of parent!
                Parent.Children[0] = null;
                Parent.Children[1] = null;
                Parent.Children = null;
                Clear();
            }
        }
        else
        { // only 1 item in root node !
            Slots[0] = null; // just nullIFY the slot.
            count = 0;
            ParentBTree.SetCurrentItemAddress(null, 0);
            // Point the current item pointer to end of tree
        }
    }
}

Search Method

The Search method contains code that has the same logic as the first portion of the Add method listed above. It traverses the tree from the root down the children nodes, until it finds the target node and item using an iterative approach (while loop). But instead of adding an item, when it is found, Search merely positions the current item pointer to the found node and item in the node.

Having an item pointer combined with the Search and Movexxx methods is a powerful feature. E.g., doing a range query is very easy in this combination of methods. Position the item pointer to the (start) item you'd like to query, then use either the MoveNext or MovePrevious method to sequentially traverse the nearby items, until you'd read all the items in range that you are interested about.

Using the Code

Download the source code zip file, uncompress it to your desired target folder, and open the B-Tree.sln solution inside VS 2008. Browse through the project SampleUsage source code in order to see how to use the BTreeSortedDictionary, run it, experiment to use other methods available, and reuse your experience using the Generic SortedDictionary. Enjoy!

Comparison Results

1 Million Inserts on .Net SortedDictionary: 12 sec 308 MB peak 286 MB end
1 Million Inserts on BTreeDictionary:       9 sec 300 MB peak 277 MB end
1 Million Search on .Net SortedDictionary:  7 sec 309 MB peak 286 MB end
1 Million Search on BTreeDictionary:        7 sec 309 MB peak 277 MB end

Conclusion: This B-Tree sorted dictionary outperforms the .NET Framework's SortedDictionary in both memory utilization and speed of operations. Inserting 1 million items, this BTree outperforms .NET's by 3 seconds, and utilized ~9MB of less memory.

In searching, both are of the same speed, but then again, in memory utilization, BTree dictionary outperforms .NET's SortedDictionary by using 9MB of less memory. Multiplying the load 10X and using both in production environment, that is the true test, and I anticipate you will be very satisfied with this BTree dictionary due to architectural reasons I discussed at the top of this article.

Here is the stress program used:

C#
static void Main(string[] args)
{
    SortedDictionary<string, string> l1 = 
              new SortedDictionary<string, string>();
    StressAdd(l1, 1000000);
    StressSearch(l1, 1000000);
    //** uncomment to run the BTreeDictionary test
    //   and comment out the above block to turn off SortedDictionary test
    //BTreeDictionary<string, string> l2 = 
    //            new BTreeDictionary<string, string>();
    //StressAdd(l2, 1000000);
    //StressSearch(l2, 1000000);
    Console.ReadLine();
    return;
}

static void StressAdd(IDictionary<string, string> l, int Max)
{
    Console.WriteLine("Add: {0}", DateTime.Now);
    for (int i = 0; i < Max; i++)
    {
        l.Add(string.Format("{0} key kds vidsbnvibsnv siuv " + 
              "siuvsiubvisubviusvifsudjcnjdnc", i),
              "value kds vidsbnvibsnv siuv " + 
              "siuvsiubvisubviusvifsudjcnjdnccscs");
    }
    Console.WriteLine("Add: {0}", DateTime.Now);
}

static void StressSearch(IDictionary<string, string> l, int Max)
{
    Console.WriteLine("Search: {0}", DateTime.Now);
    for (int i = 0; i < Max; i++)
    {
        string v = l[string.Format("{0} key kds vidsbnvibsnv siuv " + 
                     "siuvsiubvisubviusvifsudjcnjdnc", i)];
    }
    Console.WriteLine("Search: {0}", DateTime.Now);
}

Points of Interest

I really learnt a lot and enjoyed authoring this B-Tree implementation, which I did the C++ version back in the mid-90s; in fact, I enjoyed it so much, it propelled me to work on the on-disk version that is available now as Open Sourced software in GitHub (https://github.com/SharedCode/sop). Pls. feel free to join, participate and use the new "on disk" version.

Hello, "sop" project had been relocated to GitHub: https://github.com/SharedCode/sop
And, current version V2 that accomplished all of "on disk" features advertised, was implemented in Golang. The B-Tree also got optimized, where on "deletes", node load balancing was turned off. It turned out to be a very cool, ACID transactions, Objects persistence enabler/adaptor on top of Cassandra & Redis. Check it out and join the project, it awaits your "creative talents"!

History

  • 23rd July, 2010 - Initial submission
  • 24th July, 2010 - Added performance comparisons with .NET Framework's SortedDictionary
  • 14th August, 2012 - Modified Points of Interest section to mention just Open Sourced B-Tree Gold (v4.5) product
  • 25th January, 2024 - Added link to Golang port: https://github.com/SharedCode/sop

License

This article, along with any associated source code and files, is licensed under The MIT License


Written By
United States United States
Ex Google, ex Microsoft
I'm just fresh off from a software eng'g gig & is looking forward to my next.
Pls. do drop me a line @gerardorecinto@yahoo.com if interested or having any question/feedback on these Open Source projects.

I'm excited to help/volunteer my services.
Have a great day!

Comments and Discussions

 
QuestionWhy not in C? Pin
Sanmayce24-May-15 5:48
Sanmayce24-May-15 5:48 
AnswerRe: Why not in C? Pin
Gerardo Recinto26-May-15 11:42
Gerardo Recinto26-May-15 11:42 
GeneralRe: Why not in C? Pin
Sanmayce26-May-15 13:17
Sanmayce26-May-15 13:17 
GeneralRe: Why not in C? Pin
Gerardo Recinto13-Oct-15 5:27
Gerardo Recinto13-Oct-15 5:27 
GeneralRe: Why not in C? Pin
Sanmayce14-Oct-15 8:02
Sanmayce14-Oct-15 8:02 
GeneralRe: Why not in C? Pin
Gerardo Recinto14-Oct-15 13:58
Gerardo Recinto14-Oct-15 13:58 
GeneralRe: Why not in C? Pin
Sanmayce15-Oct-15 7:36
Sanmayce15-Oct-15 7:36 
Good good, however your project uses new languages/technologies that I know nothing about.
I visited quickly your project at github and what I like is the "benchmark", you see, when I talk speed I always give realworld examples, not knowing e.g. your insertion rate is unacceptable, one example has to be given no matter multi-threaded, client-server or whatever, in my case I will give you, not right away because I have only two laptops constantly loaded at max so I'm short of computing power, my numbers on internal/external rate of ripping Wikipedia for 1-grams, just downloaded 2015-09-01 dump.
On that note, it will help if you describe (in lame terms) what C# and all these pleiades of object-oriented interfaces offer more than good old plain C.
As far as I see, it is the client-server framework that allows multiple accesses at once, but what about single users like me who want just fast&reliable structure to house those 10-900 million unique words/phrases!
Basically we target two very different scenarios, mine being amateurish while yours - PRO.

 Store "multiple reader, single writer" feature seems feature complete…

…. :)

Massive parallel clients' simulation shows great stability.
* 250 parallel clients w/ 75 inserting 760,000 entities, 175 reading 175,000 entries randomly.
* Doing all of these parallel IOs in 6 minutes.
* same hardware. :)


Here is the two latest&final Leprechaun tools - the first root folder is for x-grams, the nested one is for dumping unique sequences of x-bytes in hexadecimal:
www.sanmayce.com/Downloads/Leprechaun_x-leton_32bit_64bit.7z[^]

The obsession of quick checking a given English phrase against huge corpus 400+ million strong is getting even stronger down the years, for a year or two I "froze" my idea of uploading 100% FREE PHRASE-CHECKER due to my limited computational resources - only ripping needs ~2TB then some maintenance and simplification of presentation, the PAGODA structure is so dear to me, however there crating one only pagoda takes an hour and 1+GB per word - scary, I need at least some 200,000 words - horrific.

Just to see what my unfinished/frozen idea is about:
http://www.sanmayce.com/MSKR/[^]

One of ideas which you may consider is to put as first line of offense not a B-tree but a myriad of B-trees, in case below 67 million B-trees, that's the beauty of order 3 - very economic memory usage especially good for internal i.e. RAM housing, below I give the option 'y' which places the B-tree(s) into RAM whereas option 'z' would put them on file - very good SSD IOPS test by the way.
At the moment I am on one old laptop with Core 2 @2GHz running Win 32bit, so I downloaded quickly English Wiktionary, very good resource as well, here we go, 11 million unique words (1-grams) out of 447 million ripped at rate 03,340,952 Phrases/s:
D:\Downloads\Z\Leprechaun_x-leton_32bit_64bit>dir
 Volume in drive D is FINA_V5
 Volume Serial Number is DC46-F9CC

 Directory of D:\Downloads\Z\Leprechaun_x-leton_32bit_64bit

10/15/2015  07:53 PM    <DIR>          .
10/15/2015  07:53 PM    <DIR>          ..
10/15/2015  07:36 PM     4,013,738,793 enwiktionary-20150901-pages-articles.xml
10/15/2015  07:36 PM       489,105,807 enwiktionary-20150901-pages-articles.xml.bz2
10/01/2015  02:02 AM           330,850 Leprechaun_x-leton.c
10/15/2015  07:24 PM             3,508 Leprechaun_x-leton_COMPILE-ALL_Microsoft_32BIT.BAT
               4 File(s)  4,503,178,958 bytes
               2 Dir(s)  27,527,008,256 bytes free

D:\Downloads\Z\Leprechaun_x-leton_32bit_64bit>cl /Ox /TcLeprechaun_x-leton.c /FaLeprechaun_x-leton /w /FAcs -Dsingleton -D_1p -D_WIN32_ENVIRONMENT_
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 16.00.40219.01 for 80x86
Copyright (C) Microsoft Corporation.  All rights reserved.

Leprechaun_x-leton.c
Microsoft (R) Incremental Linker Version 10.00.40219.01
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:Leprechaun_x-leton.exe
Leprechaun_x-leton.obj

D:\Downloads\Z\Leprechaun_x-leton_32bit_64bit>ren Leprechaun_x-leton.exe Leprechaun_x-leton_32bit_Intel_01_1p.exe

D:\Downloads\Z\Leprechaun_x-leton_32bit_64bit>dir enwiktionary-20150901-pages-articles.xml/b>riplist

D:\Downloads\Z\Leprechaun_x-leton_32bit_64bit>Leprechaun_x-leton_32bit_Intel_01_1p.exe riplist enwiktionary-20150901-pages-articles.xml.1-gram 1100123 y
Leprechaun_singleton (Fast-In-Future Greedy n-gram-Ripper), rev. 16FIXFIXfix, written by Svalqyatchx.
Purpose: Rips all distinct 1-grams (1-word phrases) with length 1..31 chars from incoming texts.
Feature1: All words within x-lets/n-grams are in range 1..31 chars inclusive.
Feature2: In this revision 512MB 1-way hash is used which results in 67,108,864 external B-Trees of order 3.
Feature3: In this revision, 1 pass is to be made.
Feature4: If the external memory has latency 99+microseconds then !(look no further), IOPS(seek-time) rules.
Pass #1 of 1:
Size of input file with files for Leprechauning: 42
Allocating HASH memory 536,870,977 bytes ... OK
Allocating memory 1075MB ... OK
Size of Input TEXTual file: 4,013,738,793
|; 03,340,952P/s; Phrase count: 447,687,666 of them 11,747,711 distinct; Done: 64/64
Bytes per second performance: 29,953,274B/s
Phrases per second performance: 3,340,952P/s
Time for putting phrases into trees: 134 second(s)
Flushing UNsorted phrases: 100%; Shaking trees performance: 02,610,602P/s
Time for shaking phrases from trees: 9 second(s)
Leprechaun: Current pass done.

Total memory needed for one pass: 1,019,835KB
Total distinct phrases: 11,747,711
Total time: 144 second(s)
Total performance: 3,108,942P/s i.e. phrases per second
Leprechaun: Done.

D:\Downloads\Z\Leprechaun_x-leton_32bit_64bit>dir en*
 Volume in drive D is FINA_V5
 Volume Serial Number is DC46-F9CC

 Directory of D:\Downloads\Z\Leprechaun_x-leton_32bit_64bit

10/15/2015  07:36 PM     4,013,738,793 enwiktionary-20150901-pages-articles.xml
10/15/2015  08:01 PM       118,690,720 enwiktionary-20150901-pages-articles.xml.1-gram
10/15/2015  07:36 PM       489,105,807 enwiktionary-20150901-pages-articles.xml.bz2
               3 File(s)  4,621,535,320 bytes
               0 Dir(s)  27,407,519,744 bytes free

D:\Downloads\Z\Leprechaun_x-leton_32bit_64bit>type enwiktionary-20150901-pages-articles.xml.1-gram|more
ofvsf
byjnf
divezzat
hytlkz
gxtyvev
zoppeggiaste
fotometrick
qjobqz
gtbwyaadbtqqxdtohzx
ynumdc
vjerenik
dqta
qyooku
monoamorous
arulja
evxduue
emiw
onhn
titanosaurids
ephmf
evropo
inurebamus
hhvkz
nzvnub
nttxaybq
vsouu
kbfroqx
desapontastes
bnlojf
adwku
anomalepidids
jlttz
zwttdzan
zfswue
eeetcg
tnercmcoqw
lynehn
uwng
expergefaciebat
ltims
stzhsuihp
^C
D:\Downloads\Z\Leprechaun_x-leton_32bit_64bit>


Cheers
GeneralRe: Why not in C? Pin
Gerardo Recinto15-Oct-15 8:12
Gerardo Recinto15-Oct-15 8:12 
GeneralRe: Why not in C? Pin
Gerardo Recinto16-Oct-15 20:56
Gerardo Recinto16-Oct-15 20:56 
QuestionDuplicates Looping and Deleting Pin
runfastman27-Feb-14 10:47
runfastman27-Feb-14 10:47 
AnswerRe: Duplicates Looping and Deleting Pin
Gerardo Recinto28-Feb-14 8:42
Gerardo Recinto28-Feb-14 8:42 
QuestionWrite to file Pin
Shargon_8516-Jul-13 2:58
Shargon_8516-Jul-13 2:58 
AnswerRe: Write to file Pin
Gerardo Recinto25-Feb-14 21:21
Gerardo Recinto25-Feb-14 21:21 
QuestionIntroduction paragraph correction Pin
Yuriy Loginov20-May-13 10:11
professionalYuriy Loginov20-May-13 10:11 
AnswerRe: Introduction paragraph correction Pin
Gerardo Recinto20-May-13 18:50
Gerardo Recinto20-May-13 18:50 
SuggestionRe: Introduction paragraph correction Pin
Yuriy Loginov21-May-13 2:35
professionalYuriy Loginov21-May-13 2:35 
GeneralRe: Introduction paragraph correction Pin
Gerardo Recinto28-Feb-14 11:28
Gerardo Recinto28-Feb-14 11:28 
QuestionGood learning material Pin
enjaychang17-Jan-13 2:05
enjaychang17-Jan-13 2:05 
AnswerRe: Good learning material Pin
Gerardo Recinto18-Feb-13 12:23
Gerardo Recinto18-Feb-13 12:23 
GeneralRe: Good learning material Pin
enjaychang6-Mar-13 3:53
enjaychang6-Mar-13 3:53 
QuestionMemory usage Pin
Lcng9-Oct-12 4:52
Lcng9-Oct-12 4:52 
AnswerRe: Memory usage Pin
Gerardo Recinto9-Oct-12 20:12
Gerardo Recinto9-Oct-12 20:12 
QuestionB tree inserion of records in C language Pin
Member 94559374-Oct-12 3:13
Member 94559374-Oct-12 3:13 
AnswerRe: B tree inserion of records in C language Pin
Gerardo Recinto5-Oct-12 17:28
Gerardo Recinto5-Oct-12 17:28 
GeneralMy vote of 5 Pin
Christian Amado28-Aug-12 12:53
professionalChristian Amado28-Aug-12 12:53 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.