Building a Generic Range class

Sean Hederman

4.97/5 (21 votes)

Jun 1, 2007

12 min read

64139

1270

This article discusses the creation of a generic Range class, and goes into the decisions and thoughts about many of the design aspects. Touches on lambdas, iterators, and generics.

Download source files - 22.7 KB

Introduction

A Range is often useful in order to express the idea of a large set of values, without actually having all of those values around. It can be used for such diverse applications as schedulers, or caching algorithms.

Background

I'm going to present the Range class I wrote, but I will also go into some of the decisions I made and why I made them. First off, I'd like to thank Jay Bazuzi and his readers for his article and its follow-up where they created a simple Range class.

Using the code

The code is wrapped up in a simple C# Orcas project called Sanity. You can either extract the Range.cs file directly or just reference the project, and use a simple:

using Sanity;

at the top of your code. Please note that if you decide to extract the Range.cs file, it does depend on some other files inside the project for validation, so you'd need to edit it accordingly, or just include the Assert.cs file as well. The Range<T>, Range<TKey, TValue>, and RangeArray<T> classes are all used as standard generic classes.

Range<T> - Stores a range from LowerBound to UpperBound.
Range<TKey, TValue> - Stores a range from LowerBound to UpperBound, along with a Value.
RangeArray<TKey> - Stores a range of integers from LowerBound to UpperBound, along with an array called Values.

The classes all support the following operations:

Contains\IsContainedBy - Determines if a range or a value falls within the range.
Overlaps - Determines if a range is overlapped/touched by another range.
IsContiguousWith - Determines if two ranges touch each other at all.
CompareTo - Allows you to compare two ranges, or your range with a value.
Intersect - Gives you the intersection of two ranges, the range of where they both overlap.
Union - Gives you a range that represents a merging of the two ranges together.
Complement - Gives you a range that represents the range with another range removed out of it.
Split - Gives you a two ranges (well, possibly one), your range split at a point.
Iterate - Allows you to iterate from the LowerBound to the UpperBound of your range.
ReverseIterate - Allows you to iterate from the UpperBound to the LowerBound of your range.
==, !=, >, >=, <, <= - Standard comparison operators.
^, |, & - Represents Complement, Union, Intersect respectively.

The creation of these classes is accomplished by factory methods, called Create on the static Range class. This class also contains the methods that operate on IEnumerable objects of the Range... classes. Each method has overloads supporting covariance (as discussed below).

The methods on the Range static class are:

Create - Creates a range. There are 4 overloads of this, and they create the applicable Range type.
Contains - Indicates if a point or range is contained within a list of ranges.
Sort - Sorts a list of ranges.
Coalesce - Takes a list of ranges, and returns a new list, where contiguous ranges are joined together.
Overlapped - Takes a list of ranges, and returns those ranges that overlap a given range.
Find - Takes a list of ranges, and applies a predicate to find the ranges you're looking for.
FindFirst - Takes a list of ranges, and applies a predicate to find a range you're looking for.

Note that most of these methods are implemented as extension methods on IEnumerable<Range...> objects. In addition, all the methods that have IEnumerable returns are implemented as iterators, and thus will only perform the minimum work required to get the items you ask for.

Points of Interest

Generic or non-Generic?

Well-this question was easy enough in its way; it makes sense that we make a Range class generic. However, there was an implication of this. Obviously any kind of range will need to be able to compare its items, but which IComparable should we support? IComparable<T> has the advantage of being more strongly typed, but IComparable is probably more widely supported. It took me a while to decide, but I finally settled on IComparable<T>, largely because strong type safety is a good thing. In any case, the main Framework classes support it, and for custom classes, it's not difficult to add.

Class or Struct?

This was a question that took a while to decide. I originally settled on struct, but then switched to class when I decided I wanted to return null from some methods. Then I changed my mind again and instead returned Nullable<Range<T>>. Finally, I made the decision based on two design factors:

I wanted to have child classes that inherited from Range<T>
I wanted to ensure that the Range<T> is created by a set of factory methods only.

Since you cannot inherit from structs, and anyone can create a struct using its default constructor, I settled (by default) on a class.

Mutable or Immutable?

This was another tough call. By making the Range<T> mutable, I would certainly simplify my life, but there were some points against this too. First off I was at that point considering a RangeList<T> that would keep a set of sorted ranges, trying to keep contiguous ranges together. If we could change the values of the range underneath the collection, it would become very difficult to keep in a consistent state. What finally hit the nail in the coffin though were the two child classes that I created. Let's look at the classes:

Class diagram of the Range&ltT> and subclasses

In this diagram you can see that I decided on a Range<T> base class that would contain the UpperBound and LowerBound properties, and handle all the comparisons and modifications necessary. Sometimes we would want to associate a range with something other than just the range, some extra data. To cater to this possibility, I created the Range<TKey, TValue> class, which inherited from the base Range<T>, but added a Value property. Finally, I also handled a special case, a case where the range represented an array, similar to Range<TKey, TValue>, but different in two respects. Firstly, it inherited from Range<int> specifically, and secondly the value was an array just for easier access to the elements. To be honest I'm still not sure about this one.

Anyway, these subclasses decided the mutability question for me. It's all very well to talk about splitting a Range<T>, but what exactly does it mean to split a Range<TKey, TValue>? What should I do with the value? If I didn't know what it was, how could I meaningfully handle joining or splitting ranges containing such values? I couldn't, so I decided to make the ranges immutable. All the "modification" methods on Range<T> you see in that diagram return completely new ranges. More particularly a call to Range<TKey, TValue> classes Union would return a Range<TKey>, "without the associated value". All of the main methods here return simple ranges, without associated extra data. They are used, in a sense to tell the developer what to do:

var start = Range.Create(0, 5, "Hello ");
var end = Range.Create(6, 10, "World");
var join = start.Union(end);   // 0->10

In the example above, the last Range tells us that the union of start and end is a range from 0 to 10; it doesn't know to concatenate the two strings though. That would be for the programmer to determine, perhaps something like:

var final = Range.Create(join.LowerBound, join.UpperBound, 
                    start.Value + end.Value);

So, as with the class versus struct decision, in a very real way it was previous decisions that forced my hand on this one. The lesson is important I think. Quite often decisions made much earlier in the process constrain your choices later on. In a sense, this is the point of refactoring, to once again open up your available options.

Interfaces and Overloads

The one interface to implement was pretty obvious, IComparable<Range<T>>;. A complete no-brainer there. I also thought it was important to provide comparison against T, so that's why IComparable<T>. This leads nicely into quite an interesting digression. Given a range {0->10}, is it bigger or smaller than 5? The comparisons were to give me nightmares. I created all sorts of spaghetti code, made even more monstrous by a doomed decision to try and support not only infinities, but also limits, values that were "not quite" the given value. So we could have a range {[5]->10} which meant that the range extended from just after 5 up to and including 10. Now, ask yourself this question; forget the simple comparison I gave you earlier, which is "now" larger: {[5]->10} or {1->[5]}, or what about {∞->[5]} vs. {5->[7]}? I found myself in a nightmare world in which comparisons changed their meaning based on which side of the comparison and which side of the range they were on.

I had forgotten the cardinal rule of software development: Keep it Simple Stupid. For goodness sakes if I needed infinities and limits, I could bloody well implement another class that handled them, and just pass it in as the type parameter to Range! Trying to implement a range that did everything was an exercise in futility. More particularly, what if the developer using my class didn't want to have infinities, or the hassle of figuring out obscure comparison rules?

Anyway, after a fair bit of time exercising the horrific comparison code and unit tests, I settled on a simple solution, comparisons would be against the LowerBound. So {5->10} < 6. Just for jollies I decided to also support IComparable, where I just tested the type of the incoming object and passed it to the relevant CompareTo method.

I also overloaded Equals. Now, I decided that Equals would only ever return true for two ranges. I'm not fully decided on this one. It means I have the unusual situation where {5->10}.CompareTo(5) == 0, but {5->10}.Equals(5) == false. However, as I see it Equals is a much stricter comparison than CompareTo. If you disagree, let me know.

To make life a bit easier, I also provided a simple little overload of ToString that gives me my nice {x->y} in the debugger.

Operators

This was fun. I normally don't do operator overloading, simply because there isn't much call for it. Let's face it, what exactly does it mean to add two Customer objects? I implemented the usual comparison operators, and also implemented three modification operators:

^ - Which I took to mean Complement, removing a range from another.
| - Which I took to mean Union, coalescing two ranges into one range that spans both.
& - Which I took to mean Intersect, providing a range for the areas where both ranges overlap.

Again, I'm not 100% sure I've done the right thing by providing the modification operators, but I guess I just got carried away.

Iterators

This was quite interesting. I wanted to be able to iterate through a range. In other words (for integers), given {0->5}, to iterate 0, 1, 2, 3, 4, 5. However, since I'm dealing in generics, I don't really know what the T is, and I especially don't know what it means to iterate from one T to another. Even for types I do know about, how would I iterate them? I mean, if it was DateTime, would I iterate the seconds, minutes, years? So what I did was take a delegate in the Iterate method called incrementor. It's a Func<TArg0, TResult>> and so can be used with a lambda expression. So you can do this for example:

Range<int> range = Range.Create(0, 5);
foreach (int i in range.Iterate(x => x + 1))
{
    Debug.WriteLine(i);
}

I thought it was a nice touch. Just for completeness I provided a ReverseIterate that takes a decrementor. So, the meaning of a range is left up to the person supplying the incrementor, and a single range can mean different things at different times. Cool!

Factory class

A nice touch from the original article by Jay was to have a static Range class that takes the methods that operate on multiple ranges. I decided not to completely go that route, but all the stuff that operated on lists of ranges, as well as the factory methods I put in this class:

Class diagram of the static Range class

Covariance/Contravariance

C# does not support covariance for generics. What this means is that if I have a method like this:

IEnumerable<Range<T>> Find<T>(this IEnumerable<Range<T>> ranges, 
                    Func<Range<T>, bool> predicate);

I cannot pass a List<RangeArray<object>> into the ranges parameter. Even though the list implements IEnumerable<T>, and the T in this case is inherited from Range<T>, it does not see these two types as compatible. Apparently there is good reason for this, but I don't know what it is.

So, in order to get around this problem, I originally decided to create a RangeList<T> and a RangeList<TKey, TValue> and a RangeArrayList<T>, all of which would implement IEnumerable<Range<T>>, and thus the methods in the static class would accept them, and there would be celebrations throughout the land. Well, I did, and it worked fine, but it niggled at me. I realized that the only reason those collection classes existed was to get around the covariance problem, but that they also constrained the solution. What if someone wanted a Dictionary<Range<T>, string>, and wanted to pass the Keys enumerator to one of my methods? Well, they wouldn't be able to, all because of bloody covariance. So, if you cast your eye up to that class diagram above, you'll see my solution.

First I made all the constructors of my classes internal, ensuring that no-one could inherit them, just for good measure I marked Range<TKey, TValue> and RangeArray<T> as sealed. Then, I created two overloaded private methods on Range, called MakeCovariant. One took an IEnumerable<Range<TKey, TValue>> and the other an IEnumerable<RangeArray<T>>, and they both returned IEnumerable<Range<T>>. Then I created similar overloads for each of the static methods that called MakeCovariant on their arguments and passed the IEnumerable<Range<T>> to the "master" method. The upshot of all this was that the lack of covariance had now ceased to be a problem as far as my static Range was concerned. This would only work as long as my three classes were all that there was, and no-one created new subclasses, which I've ensured.

I guess this wouldn't be necessary if I had just decided to stick with having the Range class as a struct, but I like having the three Range options, and now I don't have to have three extra collection classes supporting them.

Anyway, one side-effect of all this is that the Find method is a bit interesting. Have a look at the three Find signatures:

... Find<T>           (this IEnumerable<Range<T>> ranges,            
                Func<Range<T>, bool> predicate)    ...
... Find<TKey, TValue>(this IEnumerable<Range<TKey, TValue>> ranges, 
                Func<Range<TKey>, bool> predicate) ...
... Find<T>           (this IEnumerable<RangeArray<T>> ranges,       
                Func<Range<int>, bool> predicate)

Note that, in all three cases, the predicate takes the base Range<T> class. What this means is that if you attempt something like:

IEnumerable<RangeArray<T>> ranges = /* Create Range */
Range.Find(ranges, x => (x.Values.Count == 0));

You'll find that the lambda won't compile. The simple reason is that, despite the fact that the collection contains RangeArray classes, I've only written the predicate to accept Range<T>. I guess I could change it, but it would mean duplicating the Find functionality, and I'm lazy.

So, have fun with the class, which you can find here. Please note that this is an Orcas project. It doesn't take too much effort to back-port it to Whidbey though. Let me know if you have any problems, improvements or anything. It has not been rigorously tested, but I've included the unit tests I've done so far.