Bi-Directional SkipList in C#

David Jeske

Rate me:

4.45/5 (7 votes)

20 Dec 2010Apache5 min read

25.8K

331

A C# sorted map which can be efficiently subset scanned in both directions

Download source-20101210 - 11.39 KB

Introduction

BDSkipList is a generic bi-directional thread-safe sorted-map implemented as a SkipList in C#. It contains an efficient subset scanning interface, IScanner/IScannable.

Background

SortedDictionary is a generic collection provided to keep a sorted map of keys to values, while allowing quick lookups to any key. However, its enumerator only scans through the entire sorted-map and only in a forward direction. Once a map becomes large, it is often useful to scan through a subset of that map (given start/end keys) and do-so in either direction. BDSkipList provides this capability.

For background on how a skip list is structured, review this excellent SkipList in C# article or Wikipedia's skip list entry. BDSkipList operates in a similar manner, but it contains skip-lists in both directions, to facilitate scanning and finding entries efficiently in either direction.

Using the Code

At a basic level, using BDSkipList is similar to using other collections. For example, the following code will create a map and insert elements, internally keeping them in sorted order.

BDSkipList<string,int> mylist = new BDSkipList<string,int>();
mylist["abc"] = 1;
mylist["def"] = 2;
mylist["ghi"] = 3;

One can then use the default enumeration to walk through all elements in the list.

foreach (KeyValuePair<string,int> kvp in mylist) {
   Console.WriteLine(" {0} = {1}",kvp.Key,kvp.Value);
}

The above operations can all be performed in the same manner on a standard SortedDictionary. However, the IScannable interface allows us to efficiently search and scan ranges of values.

public interface IScannable<K,V> {
   KeyValuePair<K,V< FindNext(IComparable<K> keytest,bool equal_ok);
   KeyValuePair<K,V> FindPrev(IComparable<L> keytest,bool equal_ok);
   IEnumerable<KeyValuePair<K,V>> scanForward(IScanner<K> scanner);
   IEnumerable<KeyValuePair<K,V>> scanBackward(IScanner<K> scanner);
}

Using this interface, we can find the next tuple after the key "b" in our sample list is ("def" = 2), and we can efficiently perform a scan over all keys between "b" and "z" in either direction. Lookup operations, such as FindNext, FindPrev, and the first result of a scan approach (log-N) time for a list with N elements, while iterating to the next element in a scan is O(1) because it makes use of the internal linked list to quickly move to the next entry.

Let's consider the code to scan over all keys in reverse, and then scan forward efficiently over the subset of keys between "b" and "z".

// scan backwards through keys
foreach ( KeyValuePair<string,int> kvp in
   mylist.scanBackward(ScanRange<string>.All()) ) {
       Console.Writeline(" {0} = {1} ", kvp.Key, kvp.Value);
}

// scan over a subset of keys between b and z
foreach (KeyValuePair<string,int> kvp in
    mylist.scanForward(new ScanRange<string>("b","z")) {
       Console.Writeline(" {0} = {1} ", kvp.Key, kvp.Value);
}

If our list contains thousands, or even millions of entries, these scans are proportional to the number of entries returned by the scan, not the number of entries in the list, as it would be if we used a simple GetEnumerator to scan over all elements in the list.

You may have noticed above that FindPrev and FindNext make use not of a key, but of an IComparable<K> to find the next entry. IScanner operates in the same manner, which allows some convenient handling of special conditions of certain types of keys. For example, it is possible to perform scans using special "max keys" or "min keys". While it may seem trivial to do this for the beginning or end of the list, this comes in handy when handling structured keys. In one application, the key-type is a hierarcial set of parts, and a scan may be performed between "A.B.C" and AfterPrefix("A.B.C").

Points of Interest

The IScanner interface has proven both capable and at-times, inconvenient. For example, it always performs scans inclusive of the end-point ranges. While a match-to function calback is provided to qualify entries, this is an inconvenient way to express endpoint inclusion/exclusion. A future version of the IScanner interface may more directly provide support for inclusive or exclusive endpoint ranges. Further, the use of IComparable<K> was intended to enable flexible handling of prefixes and other issues in composite key types. However, the power is greater than is valid in the context of a lookup. As the log-n initial lookup is performed, it's important that the start-key maintain a constant view of its place in the sorted-map. For example, if an IComparable is provided which simply returns a random value for the Comparison, the results are quite unpredictable. Another possibility is to require the scan specification use hard-keys with specific LT, GT, LTE, GTE operators. Perhaps a future version will move in this direction.

Rather than keep a skip-list in each direction, we could alternately keep a skip-list in one direction and add bidirectional links only to the 'full' skiplist lane. While writing it, the mirror-structure of allowing FindPrev to be expressed as the mirror of FindNext seemed simple to conceive. In fact, it should not be necessary to keep both skip lists. The initial lookup approaches log(n) in either skip-list direction, regardless of the key we use or the direction we wish to scan after we do our initial lookup. Making this change would require FindPrev to be internally expressed as a FindNext followed by a backtrack to the previous internal node. Perhaps a future version will perform this optimization.

BDSkipList performs conservative coarse granularity locks around operations to be threadsafe. These locks add cost to operations and could be made smaller and faster. One alternative is to use SpinLocks, which are faster than normal locks in situations where the lock is held for a very short period of time. Another alternative is to alter the skiplist to be lock-free. The way nodes are removed may be threadsafe without locks. A removed node is left still pointing into the list, so any thread walking through a removed node would simply continue to the next node regardless of when the remove occurred.

Because enumerators through the list are created by walking a possibly changing link-list, enumerators are not guaranteed to remain constant through their enumeration. However, because of the way nodes are removed and added, an enumeration will safely finish a proper scan regardless of add or remove operations that occur concurrently. Perhaps one of the few undesirable enumerator characteristics is that if an enumerator sits for a long time on a node that is removed from the list, when it resumes scanning, it may miss adjacent nodes which were added to the sequence after its current node was removed. One way to remedy this would be to leave nodes in the list as they are deleted, and simply 'mark' them as deleted. Only when all enumerators had let go of a node would it be truly removed. However, this would add complexity to scanning, skip-list maintenance, and lookups, which would all need to specially-handle nodes which are marked deleted. This is likely not worth the trouble unless the above case is extremely common.

History

Initial article December 20, 2010

License

This article, along with any associated source code and files, is licensed under The Apache License, Version 2.0

Written By

David Jeske

United States

David Jeske is an Entrepreneur and Computer Programmer, currently living in San Francisco, California.

He earned his B.S. Computer Engineering at University of Illnois at Champaign/Urbana (UIUC), and has worked at several Silicon Valley companies, including Google, Yahoo, eGroups.com, 3dfx, and Akklaim Entertainment. He has managed and architected extremely high-traffic websites, including Yahoo Groups and orkut.com, and has experience in a broad spectrum of technology areas including scalability, databases, drivers, system software, and 3d graphics.

You can contact him at davidj -a-t- gmail (dot) com for personal messages about this article.