Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / Languages / C#

Generic Multi-Field/Property Sorting for Lists of Business Objects

4.89/5 (13 votes)
13 Feb 2008CPOL8 min read 3   497  
This article presents a simple and flexible way to sort strongly-typed lists of business objects using multiple properties or fields.
Screenshot - MultiSort.jpg

Introduction

The following article presents a generic solution that mimics SQL's ORDER BY clause functionally, but operates on strongly typed lists, in memory. The presented solution can be used to sort strongly-typed lists of business objects using any number of properties or fields, ascending or descending. The solution is generic in that no significant change to your class or business-object code is required to sort lists of your own custom type. The sort routine uses Marc Gravell's HyperPropertyDescriptor for fast property value retrieval.

Background

I am currently working on a Web project designed to help a large organization keep track of membership, publish fuzzy-searchable online directories, print mailing labels, manage online music clips for student auditions, and plan and schedule events.

The back-end database used to store our data consists of approximately 40 tables. Since we will eventually be handing this project over to a possibly less-experienced programmer when we are finished, one of our goals is to keep data access simple and minimize needless programmer fiddling at the SQL/table level. With this in mind, we decided to take a multi-tiered approach to data access. Skip ahead several months, and a fairly elegant data abstraction layer (DAL) has been built and is working nicely, serving as the go-between for database tables and in-memory business objects.

One of the many bonuses of having a data abstraction layer is the ability to cache query results in-memory, cutting down significantly on the number of database round-trips and general database load. However, for obvious reasons, lists (of people, locations, etc.) must be returned in different sort orders on different Web pages and for different users. For example, one user may want to sort by Last Name, First Name, while another user wants to sort by ID. These sort requirements could easily counteract any benefit of using cached data if the programmer must re-issue new SQL queries with changed ORDER BY clauses each time we would like to sort by a different set of fields. An alternate that would avoid increased database traffic and allow for in-memory sorting would be to add property comparison routines to each business object type.

I wasn't thrilled with either of these options, so (neither wanting to rewrite the business objects nor to fiddle with existing class definitions), I decided to implement a generic solution that would mimic SQL's ORDER BY clause on in-memory lists. Specifically, I needed a solution that could sort any list of strongly-typed business objects by any number of properties or fields, in any order.

Before I continue, I would like to give due credit to Marc Gravell for his article on HyperPropertyDescriptors. Without his code, retrieval of Property values would be excruciatingly slow, and the reflection used in this article would not be a reasonable means to quickly sort large lists. So, thank you Marc!

Using the Code

The solution is wrapped in our company's namespace: BinaryNorthwest. Within the namespace, the static class named Sorting allows easy and quick access to sorting functionality. Here are a few usage examples:

Example 1: In-Place Sorting by Property Name

C#
// Sort a list of People by last name 
List<People> rgPeople = RetrieveListOfAllPeople();

// Sort the list by the values in the "LastName" property of the "People" class 
SortInPlace<People>(rgPeople, new SortPropOrFieldAndDirection("LastName"));

Example 2: Sort Using the Values of Multiple Properties

C#
// Sort a list of people by Birth date (descending), then SSN (ascending) 
// Note: the sort will automatically compare dates by examining 
// the "BirthDate" property type 
List<SortPropOrFieldAndDirection> sortCriteria =
    new List<SortPropOrFieldAndDirection>();

// Set up the sort criteria
sortCriteria.Add(new SortPropOrFieldAndDirection("BirthDate", true));
sortCriteria.Add(new SortPropOrFieldAndDirection("SSN", false));

// Perform the multi-property sort 
List<People> sortedPeopleList = Sorting.MultiSort<People>(rgPeople, sortCriteria);

Example 3: Force a Numeric Sort on a String Property (Manual Sort Type Override)

C#
// Force a string field that contains numbers (AuditionRanking) 
// to sort using number-comparison criteria 
// (Example: 10 comes after 9 using numeric comparison, 
// but "10" comes before "9" using string comparison) 
SortInPlace<People>(rgPeople, new SortPropOrFieldAndDirection
    ("AuditionRanking", SortType.eInteger));

How Does it Work?

One easy way to sort a list of objects of type T (List<T>, where T is a class with multiple fields and/or properties) is to use the Sort(IComparer<T> comparer) method. This method takes one parameter of type IComparer<T>. This delegate's task is to perform comparisons on the field or property of interest for any two items in the list and return results of the comparison. The delegate function (int IComparer<T>.Compare(T x, T y)) is passed two parameters, both instances of class T (T x, T y) and is responsible for performing a comparison and returning -1 (iff x < y), 0 (iff x == y), or 1 (iff x > y).

The SortPropOrFieldAndDirection class is responsible for taking a property (or field) name and a sort direction and for constructing a class that appropriately implements IComparer<T>. The generated IComparer<T>-derived class retrieves and compares property values for the specified property name (or field name).

To obtain an appropriate IComparer<T>, MultiSort uses public class SortPropOrFieldAndDirection to specify sort order and direction, and to supply an appropriate CompareProperties or CompareFields class instance (both CompareProperties and CompareFields implement IComparer<T>).

C#
/// <summary>        
/// Retrieves an IComparer of type T, depending on whether the instance 
/// of this class (or a derived class) references a property or field 
/// </summary>

public IComparer<T> GetComparer<T>() where T : class
{
    if (NameIsPropertyName)
    {
        CompareProperties<T> comp = new CompareProperties<T>
                (sPropertyOrFieldName, fSortDescending, sortType);
        comp.pi = pi;
        comp.Initialize();
        return comp;
    }
    else
    {
        CompareFields<T> comp = new CompareFields<T>
                (sPropertyOrFieldName, fSortDescending, sortType);
        comp.fi = fi;
        comp.Initialize();
        return comp;
    }
}

Note: SortPropertyAndDirection and SortFieldAndDirection derive from SortPropOrFieldAndDirection and can be used for coding clarity (to disambiguate between field-value-based versus property-value-based sorting criteria).

CompareFields and CompareProperties both implement IComparer<T>, and their respective Initialize() methods are responsible for setting up the property and/or field comparison logic. Here is the implementation for CompareProperties<T>.Initialize():

C#
/// <summary>
/// Sets up cached PropertyInfo and determines the best delegate to use 
/// to compare values retrieved from that property. 
/// </summary>

public void Initialize()
{
    if (fFoundProperty == false)
    {
        // Only look up the Property TypeDescriptor once (the first time) 
        fFoundProperty = true;
        if (pi == null)
        {
            // Find the TypeDescriptor for the property (and confirm it exists) 
            // Use this format to take advantage of HyperPropertyDescriptors 
            // (Marc Gravell) 
            PropertyDescriptorCollection props = TypeDescriptor.GetProperties(typeof(T));
            property = props[sPropertyName];
            pi = typeof(T).GetProperty(sPropertyName);

            if (pi == null)
            {
                throw new Exception("Property name " + sPropertyName +
                " not found while trying to compare objects of type " +
                typeof(T).Name);
            }
        }

        typ = pi.PropertyType;

        // Set up the property comparison delegate to use based on the type of 
        // values we will be comparing
        
        if (sortType == SortType.eUsePropertyOrFieldType)
        {
            // If we are using the reflected property type, we know that 
            // no type conversion is needed prior to comparison. 
            // Use fast direct-cast comparison routines 
            sortType = Sorting.GetSortTypeEnumForType(typ);
            if (typ == typeof(string))
            {
                if (stringComparisonToUse == StringComparison.Ordinal)
                   DoCompare = StringCompareOrdinal;
                else 
           DoCompare = StringCompare;
            }
            else if (typ == typeof(int) && !fSortDescending) DoCompare = CompareInt;
            else if (typ == typeof(int)) DoCompare = CompareIntDesc;
            else if (typ == typeof(DateTime)) DoCompare = CompareDates;
            else if (typ == typeof(long)) DoCompare = CompareTypeSensitive<long>;
            else if (typ == typeof(double)) DoCompare = CompareTypeSensitive<double>;
            else if (typ == typeof(float)) DoCompare = CompareTypeSensitive<float>;
            else if (typ == typeof(short)) DoCompare = CompareTypeSensitive<short>;
            else if (typ == typeof(byte)) DoCompare = CompareTypeSensitive<byte>>;
            else if (typ == typeof(bool)) DoCompare = CompareTypeSensitive<bool>>;
            else DoCompare = CompareUsingToString;
        }
        else
        {
            // We are comparing using a user-specified type. 
            // A type conversion may be necessary 
            // and we may run into invalid cast exceptions - 
            // use more robust comparison routines 
            if (sortType == SortType.eString) DoCompare = CompareUsingToString;
            else if (sortType == SortType.eByte) DoCompare = CompareUsingToByte;
            else if (sortType == SortType.eDateTime) DoCompare = CompareUsingToDate;
            else if (sortType == SortType.eInteger) DoCompare = CompareUsingToInt;
            else if (sortType == SortType.eLong) DoCompare = CompareUsingToInt64;
            else if (sortType == 
                SortType.eDoubleOrFloat) DoCompare = CompareUsingToDouble;
            else DoCompare = CompareUsingToString;
        }
    }
}

Each property comparison delegate is fairly compact and contains code similar to the following (the below code compares two integers):

C#
public int CompareInt(T x, T y)
{
    int oX = (int)property.GetValue(x);
    int oY = (int)property.GetValue(y);
    return (oX < oY) ? -1 : ((oX == oY) ? 0 : 1);
}

Summary: Sorting Using a Single Field/Property

Step #1: Creation of SortPropOrFieldAndDirection:

C#
SortPropertyAndDirection sortBy = new SortPropertyAndDirection("AssignedTo");

Step #2: After we have specified criteria by which to sort and wish to do an in-place, single-property sort, simply call the Sorting.SortInPlace method:

C#
Sorting.SortInPlace<WorkItem>(AllWorkItems, sortBy);

Under the covers, the SortInPlace method performs the following actions:

  1. Retrieves an appropriate CompareProperties<T> (implements IComparer<T>) for the Property named AssignedTo
  2. Calls CompareProperties<T>.Initialize() to look up and cache the TypeDescriptor that will be used to retrieve property values from the AssignedTo property
  3. Calls the .NET Sort(IComparer<T>) method to perform the sort using our own criteria

These steps are executed by the following code:

C#
// Retrieve an IComparer that contains logic for sorting 
// this specific business object 
// type by the specified criteria 
IComparer<T> compare = sortBy.GetComparer<T>();

// lock the list for thread safety and then call 
// the .NET Sort(IComparer<T>) method 
lock (ListToBeSorted)
{
    ListToBeSorted.Sort(compare);
}

"ORDER-BY"-like Sorting (Multi-Property Sorting)

In brief, multi-property sorting first sorts a list using the primary sort criterion, and then sorts each sorted sub-list using the secondary criterion, and then sorts each sorted sub-sub-list using the ternary criterion, etc, etc, etc. Consider the following code:

C#
// Sort a list of people by Birth date (descending), then SSN (ascending) 
// Note: the sort will automatically compare dates by examining the 
// "BirthDate" property type 
List<SortPropOrFieldAndDirection> sortCriteria = new List<SortPropOrFieldAndDirection>();

// Set up the sort criteria
sortCriteria.Add(new SortPropOrFieldAndDirection("BirthDate", true));
sortCriteria.Add(new SortPropOrFieldAndDirection("SSN", false));

// Perform the multi-property sort 
List<People> sortedPeopleList = Sorting.MultiSort<People>(rgPeople, sortCriteria);

The above code is similar to adding an ORDER BY BirthDate DESC, SSN to the end of a T-SQL statement; the resulting (sorted) list contains all entries sorted descending by BirthDate. If any two (or more) people share the same birthday, those two (or more) people will be sorted by SSN.

We've already gone over the SortPropOrFieldAndDirection<T> class; it has the same purpose in MultiSort: to return an applicable implementation of IComparer<T>. The MultiSort method itself executes as follows:

  1. Retrieve an appropriate IComparer<T> for the first criteria specified and sort the entire list (by BirthDate, descending).
  2. Break the sorted list into sublists so that objects in each sublist have the same property value for the property we just sorted. (In the above example, each sublist would contain all people who share the same BirthDate).
  3. Sort each sublist using the next sort criteria (SSN).
  4. If more sort criteria exist, return to step #2, creating smaller sublists for each sorted sublist.
  5. Combine all sublists into a final, (now) fully sorted list.

This is not the most memory-efficient solution, nor is the process of breaking a list into sublists as optimized as it could be. However, adding these optimizations significantly increases the complexity of the code while offering only minimal performance gain under normal use conditions (sorting lists of 20,000-50,000 objects on 2-3 properties). Simply put, the code is pretty darn speedy as it is.

Without further ado, the code for multi-sort is as follows:

C#
/// <Summary>
/// Sorts a given list using multiple sort (comparison) criteria, similar
/// to an SQL "ORDER BY" clause
/// Example: Specifying the sort/comparison criteria
/// (1) "State" and (2) "Zipcode" using a list of
/// address entries will result in a sorted list containing
/// address entries FIRST sorted by state
/// (starting with the A* states - Alabama, Alaska, etc).
/// Then, the address entries
/// associated with each respective state will be further sorted
/// according to Zipcode.
/// </Summary>
/// <Typeparam name="T">The type of elements in the list to sort<Typeparam>
/// <param name="ListToBeSorted">The original list. 
/// A new, sorted list will be returned.</param>
/// <param name="rgSortBy">A list of ordered criteria specifying 
/// how the list should be sorted.</param>
/// <Returns>A new list containing all elements of ListToBeSorted,
/// sorted by the criteria specified in rgSortBy.
/// </Returns>
public static List<T> MultiSort<T>(List<T> ListToBeSorted,
    List<SortPropOrFieldAndDirection> rgSortBy) where T : class
{
    List<T> results;
    // For thread safety, make a copy of the list up front.
    // Note that you still need to
    // lock the same instance of the list when/if it is modified
    // by other threads.
    lock (ListToBeSorted)
    {
        results = new List<T>(ListToBeSorted);
    }

    if (rgSortBy.Count == 1)
    {
        // if only sorting a single time, 
        // just call the basic in-place sorting on our copied "results" list 
        SortInPlace<T>(results, rgSortBy[0]);
        return results;
    }

    try
    {
        List<List<T>> rgCopies = new List<List<T>>(1);
        rgCopies.Add(results);
        int sortByCount = rgSortBy.Count;

        // For each criterion in the list of comparison criteria, 
        // one or more lists must be sorted. 
        // Each time a list is sorted, one or more sublists may be created. 
        // Each sublist contains items that were deemed 
        // to be "equivalent" according to the comparison criterion. 
        // Example: After sorting addresses entries by state 
        // you may have multiple sublists, each containing all of 
        // the address entries associated with a given state. 
        // Note: this is not the most efficient method 
        // (especially in terms of memory!), but it 
        // is sufficient in most scenarios and is easier to understand 
        // than many other methods of sorting a list using multiple criteria. 
        for (int i = 0; i < sortByCount; i++)
        {
            SortPropOrFieldAndDirection sortBy = rgSortBy[i];
            if (string.IsNullOrEmpty(sortBy.sPropertyOrFieldName))
                throw new Exception("MultiSort parameter rgSortBy was passed 
                     an empty field name in rgSortBy [" + i.ToString() + "]");

            // Retrieve an IComparer that contains logic for sorting 
            // this specific business object 
            // type by the specified criteria 
            IComparer<T> compare = sortBy.GetComparer<T>();

            // Sort each sublist using the created IComparer<T> 
            foreach (List<T> lst in rgCopies) { lst.Sort(compare); }

            if (i < sortByCount - 1)
            {
                // Create new sublists by searching for the 
                // sorted-by value boundaries/changes                
                // Our "best guess" (i.e. shot in the dark) 
                // is that we will create at least 8 sublists 
                // from the original list. NOT terribly efficient, 
                // but often sufficient.
                
                // Some advanced methods involve tracking 
                // duplicate values DURING the sort itself 
                List<List<T>> rgNewCopies = new List<List<T>>(rgCopies.Count * 8);

                for (int n = 0; n < rgCopies.Count; n++)
                {
                   List<T> rgList = rgCopies[n];
                   // Be conservative and set the initial sublist capacity 
                   // to a small number, but still honor the original 
                   // list's item count. (Example: If you are sorting a list 
                   // of "Address information" by Zipcode and the 
                   // list has 32,000 entries, then initialize 
                   // each sublist (each of which store all Address 
                   // information entries with the same Zipcode) 
                   // with a capacity of 1000. 32,000 / 32 = 1000 
                   List<T> rgSublist = new List<T>(rgList.Count / 32);

                   // Compare items to the item that preceeded it 
                   // to determine where the "value boundaries" 
                   // are located. If you will be sorting frequently and 
                   // do not have CPU cycles to burn :), 
                   // a smarter boundary-finding algorithm should be used. 
                   // (e.g. determine boundary locations 
                   // when comparing elements during the sort routine). 
                   // Another alternative is to take advantage of the fact 
                   // that the list is sorted and to 
                   // use a O(LogN) binary search rather than the 
                   // (currently) linear O(N) search. 
                   for (int j = 0; j < rgList.Count; j++)
                   {
                       T item = rgList[j];
                       if (j > 0)
                       {
                            // Compare the item to the preceding item 
                            // using the same comparison criterion 
                            // used during the sort                            
                            T itemprev = rgList[j - 1];

                            if (compare.Compare(item, itemprev) == 0)
                            {
                                 // The item had the same property or 
                                 // field value as the preceding item. 
                                 // Add it on to the same sublist. 
                                 rgSublist.Add(item);
                            }
                            else
                            {
                                 // The item did NOT have the same property 
                                 // or field value as the preceding item. 
                                 // "Close up" the previous sublist and 
                                 // start a new one. 
                                 rgNewCopies.Add(rgSublist);
                                 rgSublist = new List<T>(rgList.Count / 32);
                                 rgSublist.Add(item);
                            }
                        }
                        else
                        {
                            // The first item has no predecessor - 
                            // just add the item to the first sublist 
                             rgSublist.Add(item);
                        }
                    } // END: for (int j = 0; j < rgList.Count; j++) ... 

                    // Add the last created sublist to our 
                    // "master list of sublists" :P 
                    // It may be that this list has 0 elements in some cases, 
                    // but this is not a problem
                    
                    rgNewCopies.Add(rgSublist);

                } // END: for (int n = 0; n < rgCopies.Count; n++) ...                   

                  // Move to the next "level" of sublists in 
                  // preparation for further sorting using the next 
                  // sort/comparison criterion 
                  rgCopies = rgNewCopies;
             }

        } // END: for (int i = 0; i < sortByCount; i++) ...           

        // reconstruct all resorted sub-sub-sub-sub-sublists into a single, 
        // final (flat) results list 
        results.Clear();
        foreach (List<T> rgList in rgCopies) { results.AddRange(rgList); }

        return results;
    }
    catch (Exception ex)
    {
        throw new Exception
            ("Exception in MultiSort while sorting a list of " + typeof(T).Name, ex);
    }
}

Isn't it Slow Using Reflection?

Using regular reflection (PropertyInfo.GetValue()) is generally fairly slow and therefore is not appropriate for comparing hundreds of thousands of values. Consider the following code:

C#
public int CompareInt(T x, T y)
{
    int oX = (int)property.GetValue(x);
    int oY = (int)property.GetValue(y);
    return (oX < oY) ? -1 : ((oX == oY) ? 0 : 1);
}

This code may be called thousands or hundreds of thousands of times to compare property values of two objects. To alleviate the performance bottleneck, I am using Marc Gravell's HyperPropertyDescriptor projected, posted here on CodeProject. In summary, it enables extremely lightweight and fast property retrieval by generating code to access the class' properties on-the-fly. Since MultiSortLib uses TypeDescriptors and not PropertyInfo for retrieving property values, simply attach [TypeDescriptionProvider(typeof(Hyper.ComponentModel.HyperTypeDescriptionProvider))] onto your class definition for incredibly fast property value lookups via reflection.

The graph below shows time taken to sort lists of various length using either value-retrieval by Field (FieldInfo.GetValue) or Property (PropertyDescriptor.GetValue), sorting by either a single field or property, or three fields or properties.

Sort time by # of items and value retrieval method

As expected, sorting on a single field or property is faster than sorting by multiple fields or properties. The above graph also shows that retrieving property values via Marc's HyperPropertyDescriptor is significantly faster than retrieving field values via FieldInfo.GetValue(); sorting a list of 65,536 items using three criteria, retrieving values via FieldInfo.GetValue() took a little over 2000ms. Sorting the same 65,536 items using the same three criteria, retrieving values via PropertyDescriptor.GetValue() took less than 280ms.

The attached code also contains a simple demo project. The demo project generates from 64 to 100,000 fictional "Work Items" with dummy property values. To retrieve sort timing information, sort by clicking on the column headers or by using the multi-sort combo boxes on the right. Number of milliseconds elapsed to sort the item list is displayed in the lower right corner. You can also choose (via radio button) to sort by retrieving Field values or Property values - useful if you're interested in relative performance.

Points of Interest

This solution was fairly easy to design and implement, but it took some time with the profiler before I realized just how slow property access was when retrieving values via PropertyInfo.GetValue(). Again, many thanks to Marc Gravell for his implementation of HyperTypeDescriptor. If you want to see just how slow PropertyInfo.GetValue() is, simply remove the TypeDescriptionProvider attribute from the WorkItem class definition.

History

08/28/07 - Per request, some additions to the demo project:

  1. A custom attribute has been defined and added to each WorkItem property to specify the field name associated with the property. Example:

    C#
    [PropertyFieldName("sAssignedTo")]
    public string AssignedTo
    {
        get
        {
            ...
        }
        set
        {
            ...
        }
    }
  2. Properties of class WorkItem are no longer hardcoded and are now discovered using reflection:

    C#
    // Add property names via reflection 
    PropertyInfo[] rgProperties = typeof(WorkItem).GetProperties();
    for (int i = 0; i < rgProperties.Length; i++)
    {
        string sPropertyName = rgProperties[i].Name;
        AllPropertyNames.Add(new WorkItemPropertyName(sPropertyName));
    } 
  3. A faster method to compare Enum string representations (around 10x faster than comparing enums using their ToString() method) has been added. Sorting by enum property TaskBoredom is now much faster. The code is below:

    C#
    // Added to class CompareProperties<T> 
    private static Dictionary<int, string> FastEnumLookup;
    
    /// Faster than comparing enums using .ToString() 
    /// (the Enum.ToString() method appears to be fairly expensive) 
    public int FastCompareEnumsAsc(T x, T y)
    {
        int oX = (int)property.GetValue(x);
        int oY = (int)property.GetValue(y);
        string s1, s2;
    
        // Look in our dictionary/hashtable for the enum value's string 
        // representation 
        if (!FastEnumLookup.TryGetValue(oX, out s1))
        {
            // If this is the first time we have encountered the value, 
            // add it to the dictionary/hashtable        
            Enum eX = (Enum)property.GetValue(x);
            s1 = eX.ToString();
            FastEnumLookup.Add(oX, s1);
        }
        if (!FastEnumLookup.TryGetValue(oY, out s2))
        {
            Enum eY = (Enum)property.GetValue(y);
            s2 = eY.ToString();
            FastEnumLookup.Add(oY, s2);
        }
        // Perform regular string comparison 
        return s1.CompareTo(s2);
    }

Any comments/suggestions are welcome - I'd be happy to enhance and publish future versions to CodeProject.
-- Owen Emlen (owen@binarynorthwest.com), Member BrainTechLLC, Partner BinaryNorthwest

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)