Introduction
The following article presents a generic solution that mimics SQL's ORDER BY
clause functionally, but operates on strongly typed lists, in memory. The presented solution can be used to sort strongly-typed lists of business objects using any number of properties or fields, ascending or descending. The solution is generic in that no significant change to your class or business-object code is required to sort lists of your own custom type. The sort routine uses Marc Gravell's HyperPropertyDescriptor for fast property value retrieval.
Background
I am currently working on a Web project designed to help a large organization keep track of membership, publish fuzzy-searchable online directories, print mailing labels, manage online music clips for student auditions, and plan and schedule events.
The back-end database used to store our data consists of approximately 40 tables. Since we will eventually be handing this project over to a possibly less-experienced programmer when we are finished, one of our goals is to keep data access simple and minimize needless programmer fiddling at the SQL/table level. With this in mind, we decided to take a multi-tiered approach to data access. Skip ahead several months, and a fairly elegant data abstraction layer (DAL) has been built and is working nicely, serving as the go-between for database tables and in-memory business objects.
One of the many bonuses of having a data abstraction layer is the ability to cache query results in-memory, cutting down significantly on the number of database round-trips and general database load. However, for obvious reasons, lists (of people, locations, etc.) must be returned in different sort orders on different Web pages and for different users. For example, one user may want to sort by Last Name, First Name, while another user wants to sort by ID. These sort requirements could easily counteract any benefit of using cached data if the programmer must re-issue new SQL queries with changed ORDER BY
clauses each time we would like to sort by a different set of fields. An alternate that would avoid increased database traffic and allow for in-memory sorting would be to add property comparison routines to each business object type.
I wasn't thrilled with either of these options, so (neither wanting to rewrite the business objects nor to fiddle with existing class definitions), I decided to implement a generic solution that would mimic SQL's ORDER BY
clause on in-memory lists. Specifically, I needed a solution that could sort any list of strongly-typed business objects by any number of properties or fields, in any order.
Before I continue, I would like to give due credit to Marc Gravell for his article on HyperPropertyDescriptors. Without his code, retrieval of Property values would be excruciatingly slow, and the reflection used in this article would not be a reasonable means to quickly sort large lists. So, thank you Marc!
Using the Code
The solution is wrapped in our company's namespace: BinaryNorthwest
. Within the namespace, the static
class named Sorting
allows easy and quick access to sorting functionality. Here are a few usage examples:
Example 1: In-Place Sorting by Property Name
List<People> rgPeople = RetrieveListOfAllPeople();
SortInPlace<People>(rgPeople, new SortPropOrFieldAndDirection("LastName"));
Example 2: Sort Using the Values of Multiple Properties
List<SortPropOrFieldAndDirection> sortCriteria =
new List<SortPropOrFieldAndDirection>();
sortCriteria.Add(new SortPropOrFieldAndDirection("BirthDate", true));
sortCriteria.Add(new SortPropOrFieldAndDirection("SSN", false));
List<People> sortedPeopleList = Sorting.MultiSort<People>(rgPeople, sortCriteria);
Example 3: Force a Numeric Sort on a String Property (Manual Sort Type Override)
SortInPlace<People>(rgPeople, new SortPropOrFieldAndDirection
("AuditionRanking", SortType.eInteger));
How Does it Work?
One easy way to sort a list of objects of type T
(List<T>
, where T
is a class with multiple fields and/or properties) is to use the Sort(IComparer<T>
comparer) method. This method takes one parameter of type IComparer<T>
. This delegate's task is to perform comparisons on the field or property of interest for any two items in the list and return results of the comparison. The delegate function (int IComparer<T>.Compare(T x, T y))
is passed two parameters, both instances of class T
(T x
, T y
) and is responsible for performing a comparison and returning -1
(iff x < y), 0
(iff x == y), or 1
(iff x > y).
The SortPropOrFieldAndDirection
class is responsible for taking a property (or field) name and a sort direction and for constructing a class that appropriately implements IComparer<T>
. The generated IComparer<T>
-derived class retrieves and compares property values for the specified property name (or field name).
To obtain an appropriate IComparer<T>
, MultiSort
uses public class SortPropOrFieldAndDirection
to specify sort order and direction, and to supply an appropriate CompareProperties
or CompareFields
class instance (both CompareProperties
and CompareFields
implement IComparer<T>
).
public IComparer<T> GetComparer<T>() where T : class
{
if (NameIsPropertyName)
{
CompareProperties<T> comp = new CompareProperties<T>
(sPropertyOrFieldName, fSortDescending, sortType);
comp.pi = pi;
comp.Initialize();
return comp;
}
else
{
CompareFields<T> comp = new CompareFields<T>
(sPropertyOrFieldName, fSortDescending, sortType);
comp.fi = fi;
comp.Initialize();
return comp;
}
}
Note: SortPropertyAndDirection
and SortFieldAndDirection
derive from SortPropOrFieldAndDirection
and can be used for coding clarity (to disambiguate between field-value-based versus property-value-based sorting criteria).
CompareFields
and CompareProperties
both implement IComparer<T>
, and their respective Initialize()
methods are responsible for setting up the property and/or field comparison logic. Here is the implementation for CompareProperties<T>.Initialize()
:
public void Initialize()
{
if (fFoundProperty == false)
{
fFoundProperty = true;
if (pi == null)
{
PropertyDescriptorCollection props = TypeDescriptor.GetProperties(typeof(T));
property = props[sPropertyName];
pi = typeof(T).GetProperty(sPropertyName);
if (pi == null)
{
throw new Exception("Property name " + sPropertyName +
" not found while trying to compare objects of type " +
typeof(T).Name);
}
}
typ = pi.PropertyType;
if (sortType == SortType.eUsePropertyOrFieldType)
{
sortType = Sorting.GetSortTypeEnumForType(typ);
if (typ == typeof(string))
{
if (stringComparisonToUse == StringComparison.Ordinal)
DoCompare = StringCompareOrdinal;
else
DoCompare = StringCompare;
}
else if (typ == typeof(int) && !fSortDescending) DoCompare = CompareInt;
else if (typ == typeof(int)) DoCompare = CompareIntDesc;
else if (typ == typeof(DateTime)) DoCompare = CompareDates;
else if (typ == typeof(long)) DoCompare = CompareTypeSensitive<long>;
else if (typ == typeof(double)) DoCompare = CompareTypeSensitive<double>;
else if (typ == typeof(float)) DoCompare = CompareTypeSensitive<float>;
else if (typ == typeof(short)) DoCompare = CompareTypeSensitive<short>;
else if (typ == typeof(byte)) DoCompare = CompareTypeSensitive<byte>>;
else if (typ == typeof(bool)) DoCompare = CompareTypeSensitive<bool>>;
else DoCompare = CompareUsingToString;
}
else
{
if (sortType == SortType.eString) DoCompare = CompareUsingToString;
else if (sortType == SortType.eByte) DoCompare = CompareUsingToByte;
else if (sortType == SortType.eDateTime) DoCompare = CompareUsingToDate;
else if (sortType == SortType.eInteger) DoCompare = CompareUsingToInt;
else if (sortType == SortType.eLong) DoCompare = CompareUsingToInt64;
else if (sortType ==
SortType.eDoubleOrFloat) DoCompare = CompareUsingToDouble;
else DoCompare = CompareUsingToString;
}
}
}
Each property comparison delegate is fairly compact and contains code similar to the following (the below code compares two integers):
public int CompareInt(T x, T y)
{
int oX = (int)property.GetValue(x);
int oY = (int)property.GetValue(y);
return (oX < oY) ? -1 : ((oX == oY) ? 0 : 1);
}
Summary: Sorting Using a Single Field/Property
Step #1: Creation of SortPropOrFieldAndDirection
:
SortPropertyAndDirection sortBy = new SortPropertyAndDirection("AssignedTo");
Step #2: After we have specified criteria by which to sort and wish to do an in-place, single-property sort, simply call the Sorting.SortInPlace
method:
Sorting.SortInPlace<WorkItem>(AllWorkItems, sortBy);
Under the covers, the SortInPlace
method performs the following actions:
- Retrieves an appropriate
CompareProperties<T>
(implements IComparer<T>
) for the Property named AssignedTo
- Calls
CompareProperties<T>.Initialize()
to look up and cache the TypeDescriptor
that will be used to retrieve property values from the AssignedTo
property - Calls the .NET
Sort(IComparer<T>)
method to perform the sort using our own criteria
These steps are executed by the following code:
IComparer<T> compare = sortBy.GetComparer<T>();
lock (ListToBeSorted)
{
ListToBeSorted.Sort(compare);
}
"ORDER-BY"-like Sorting (Multi-Property Sorting)
In brief, multi-property sorting first sorts a list using the primary sort criterion, and then sorts each sorted sub-list using the secondary criterion, and then sorts each sorted sub-sub-list using the ternary criterion, etc, etc, etc. Consider the following code:
List<SortPropOrFieldAndDirection> sortCriteria = new List<SortPropOrFieldAndDirection>();
sortCriteria.Add(new SortPropOrFieldAndDirection("BirthDate", true));
sortCriteria.Add(new SortPropOrFieldAndDirection("SSN", false));
List<People> sortedPeopleList = Sorting.MultiSort<People>(rgPeople, sortCriteria);
The above code is similar to adding an ORDER BY BirthDate DESC, SSN
to the end of a T-SQL statement; the resulting (sorted) list contains all entries sorted descending by BirthDate
. If any two (or more) people share the same birthday, those two (or more) people will be sorted by SSN
.
We've already gone over the SortPropOrFieldAndDirection<T>
class; it has the same purpose in MultiSort
: to return an applicable implementation of IComparer<T>
. The MultiSort
method itself executes as follows:
- Retrieve an appropriate
IComparer<T>
for the first criteria specified and sort the entire list (by BirthDate
, descending). - Break the sorted list into sublists so that objects in each sublist have the same property value for the property we just sorted. (In the above example, each sublist would contain all people who share the same
BirthDate
). - Sort each sublist using the next sort criteria (
SSN
). - If more sort criteria exist, return to step #2, creating smaller sublists for each sorted sublist.
- Combine all sublists into a final, (now) fully sorted list.
This is not the most memory-efficient solution, nor is the process of breaking a list into sublists as optimized as it could be. However, adding these optimizations significantly increases the complexity of the code while offering only minimal performance gain under normal use conditions (sorting lists of 20,000-50,000 objects on 2-3 properties). Simply put, the code is pretty darn speedy as it is.
Without further ado, the code for multi-sort is as follows:
public static List<T> MultiSort<T>(List<T> ListToBeSorted,
List<SortPropOrFieldAndDirection> rgSortBy) where T : class
{
List<T> results;
lock (ListToBeSorted)
{
results = new List<T>(ListToBeSorted);
}
if (rgSortBy.Count == 1)
{
SortInPlace<T>(results, rgSortBy[0]);
return results;
}
try
{
List<List<T>> rgCopies = new List<List<T>>(1);
rgCopies.Add(results);
int sortByCount = rgSortBy.Count;
for (int i = 0; i < sortByCount; i++)
{
SortPropOrFieldAndDirection sortBy = rgSortBy[i];
if (string.IsNullOrEmpty(sortBy.sPropertyOrFieldName))
throw new Exception("MultiSort parameter rgSortBy was passed
an empty field name in rgSortBy [" + i.ToString() + "]");
IComparer<T> compare = sortBy.GetComparer<T>();
foreach (List<T> lst in rgCopies) { lst.Sort(compare); }
if (i < sortByCount - 1)
{
List<List<T>> rgNewCopies = new List<List<T>>(rgCopies.Count * 8);
for (int n = 0; n < rgCopies.Count; n++)
{
List<T> rgList = rgCopies[n];
List<T> rgSublist = new List<T>(rgList.Count / 32);
for (int j = 0; j < rgList.Count; j++)
{
T item = rgList[j];
if (j > 0)
{
T itemprev = rgList[j - 1];
if (compare.Compare(item, itemprev) == 0)
{
rgSublist.Add(item);
}
else
{
rgNewCopies.Add(rgSublist);
rgSublist = new List<T>(rgList.Count / 32);
rgSublist.Add(item);
}
}
else
{
rgSublist.Add(item);
}
}
rgNewCopies.Add(rgSublist);
}
rgCopies = rgNewCopies;
}
}
results.Clear();
foreach (List<T> rgList in rgCopies) { results.AddRange(rgList); }
return results;
}
catch (Exception ex)
{
throw new Exception
("Exception in MultiSort while sorting a list of " + typeof(T).Name, ex);
}
}
Isn't it Slow Using Reflection?
Using regular reflection (PropertyInfo.GetValue()
) is generally fairly slow and therefore is not appropriate for comparing hundreds of thousands of values. Consider the following code:
public int CompareInt(T x, T y)
{
int oX = (int)property.GetValue(x);
int oY = (int)property.GetValue(y);
return (oX < oY) ? -1 : ((oX == oY) ? 0 : 1);
}
This code may be called thousands or hundreds of thousands of times to compare property values of two objects. To alleviate the performance bottleneck, I am using Marc Gravell's HyperPropertyDescriptor
projected, posted here on CodeProject. In summary, it enables extremely lightweight and fast property retrieval by generating code to access the class' properties on-the-fly. Since MultiSortLib
uses TypeDescriptor
s and not PropertyInfo
for retrieving property values, simply attach [TypeDescriptionProvider(typeof(Hyper.ComponentModel.HyperTypeDescriptionProvider))]
onto your class definition for incredibly fast property value lookups via reflection.
The graph below shows time taken to sort lists of various length using either value-retrieval by Field (FieldInfo.GetValue
) or Property (PropertyDescriptor.GetValue
), sorting by either a single field or property, or three fields or properties.
As expected, sorting on a single field or property is faster than sorting by multiple fields or properties. The above graph also shows that retrieving property values via Marc's HyperPropertyDescriptor
is significantly faster than retrieving field values via FieldInfo.GetValue()
; sorting a list of 65,536 items using three criteria, retrieving values via FieldInfo.GetValue()
took a little over 2000ms. Sorting the same 65,536 items using the same three criteria, retrieving values via PropertyDescriptor.GetValue()
took less than 280ms.
The attached code also contains a simple demo project. The demo project generates from 64 to 100,000 fictional "Work Items" with dummy property values. To retrieve sort timing information, sort by clicking on the column headers or by using the multi-sort combo boxes on the right. Number of milliseconds elapsed to sort the item list is displayed in the lower right corner. You can also choose (via radio button) to sort by retrieving Field values or Property values - useful if you're interested in relative performance.
Points of Interest
This solution was fairly easy to design and implement, but it took some time with the profiler before I realized just how slow property access was when retrieving values via PropertyInfo.GetValue()
. Again, many thanks to Marc Gravell for his implementation of HyperTypeDescriptor
. If you want to see just how slow PropertyInfo.GetValue()
is, simply remove the TypeDescriptionProvider
attribute from the WorkItem
class definition.
History
08/28/07 - Per request, some additions to the demo project:
- A custom attribute has been defined and added to each
WorkItem
property to specify the field name associated with the property. Example:
[PropertyFieldName("sAssignedTo")]
public string AssignedTo
{
get
{
...
}
set
{
...
}
}
- Properties of class
WorkItem
are no longer hardcoded and are now discovered using reflection:
PropertyInfo[] rgProperties = typeof(WorkItem).GetProperties();
for (int i = 0; i < rgProperties.Length; i++)
{
string sPropertyName = rgProperties[i].Name;
AllPropertyNames.Add(new WorkItemPropertyName(sPropertyName));
}
- A faster method to compare
Enum
string representations (around 10x faster than comparing enum
s using their ToString()
method) has been added. Sorting by enum
property TaskBoredom
is now much faster. The code is below:
private static Dictionary<int, string> FastEnumLookup;
public int FastCompareEnumsAsc(T x, T y)
{
int oX = (int)property.GetValue(x);
int oY = (int)property.GetValue(y);
string s1, s2;
if (!FastEnumLookup.TryGetValue(oX, out s1))
{
Enum eX = (Enum)property.GetValue(x);
s1 = eX.ToString();
FastEnumLookup.Add(oX, s1);
}
if (!FastEnumLookup.TryGetValue(oY, out s2))
{
Enum eY = (Enum)property.GetValue(y);
s2 = eY.ToString();
FastEnumLookup.Add(oY, s2);
}
return s1.CompareTo(s2);
}
Any comments/suggestions are welcome - I'd be happy to enhance and publish future versions to CodeProject.
-- Owen Emlen (owen@binarynorthwest.com), Member BrainTechLLC, Partner BinaryNorthwest