Click here to Skip to main content
15,878,748 members
Articles / Desktop Programming / System
Tip/Trick

Sorting List using DataTable vs. Linq vs. Tuple

Rate me:
Please Sign up or sign in to vote.
4.83/5 (3 votes)
9 Feb 2015CPOL4 min read 29.7K   138   7   6
Various ways to sort list and compare results

Introduction

The goal of this tip is to present three different ways of sorting data list extracted from a text file. In addition, it will calculate time it takes to sort data using these different methods and determine the fastest method.

Background

When I first got this requirement, I quickly wrote a method to sort using DataTable. We have an application that lets user run SQL query similar to Query Designer without ever having to write SQL. It exposes the result list within the application to modify/reformat the list and save it as an extract as specified by the user. But the method I wrote wasn't working within the application because it was written some time back and DataTable namespace wasn't included. I had to use some basic C# features that would allow my code module to compile/execute within this application. It aroused my interest in figuring out different sorting methods this application allowed. There are various other ways to do this but my focus is DataTable, Linq, and Tuple and determine which one was the fastest.

Using the Code

Please download the Checks.txt file to this directory below or change the location of the file within the code.

C#
const string f = @"C:\Users\Public\Documents\Checks.txt";

All the sorted files will be saved to this same location including Test Results file.

There are 7 methods including main.

C#
public void ProcessExtractUsingDataTable()

This uses data table to sort list from text file and calculates completion time.

C#
public void ProcessExtractUsingLinq()

This uses linq to sort list from text file and calculates completion time.

C#
public void ProcessExtractUsingTuple()

This uses tuple to sort list from text file and calculates completion time.

C#
public static void ShowResults()

This displays results in console, saves result to text file and calls method to save sorted lists.

C#
public static void SaveSortedExtract()

This saves sorted lists to different text files.

C#
public static int CompareBankids()

This compares tuple list to sort by bank ids.

C#
static void Main(string[] args)
{
    Program p = new Program();
    const string f = @"C:\Users\Public\Documents\Checks.txt";

    string newFileName = Path.ChangeExtension(f, null);

    string[] contents = File.ReadAllLines(f);
    if (contents.Length <= 1)
        throw new Exception(String.Format("File {0} doesn't have enough rows", f));
    string HeaderRow = contents[0];

    List<string> arRetVal = contents.Skip<string>(1).ToList();

Here in our main method, I’m reading the text file and adding them to List<string>. I’m separating the header row before adding just the data to our list for sorting.

C#
public void ProcessExtractUsingDataTable(List<string> FileRows, string HeaderRow, string newFileName)

This method takes in List<string> passed from our main method call.

C#
funcName = System.Reflection.MethodBase.GetCurrentMethod().Name;

I’m using System.Reflection namespace to get the Base Method Name so I can save each sorted list to separate text file.

This is where I’m starting my timer to calculate total time it takes to create table and sort.

C#
//Timer Start
var timer = System.Diagnostics.Stopwatch.StartNew();

DataTable table = new DataTable();
table.Columns.Add("BankID", typeof(long));
table.Columns.Add("DDA", typeof(long));
table.Columns.Add("DepositDate", typeof(string));
table.Columns.Add("TotalAmount", typeof(double));
table.Columns.Add("TotalChecks", typeof(double));

// Loop through each list row, add to string array to split columns
// then add it to Data Table
foreach (string rowText in FileRows)
{
    string[] strArray = rowText.Split(',');
    table.Rows.Add(strArray);
}

DataView dv = table.DefaultView;
dv.Sort = "BankID";
DataTable sortedDT = dv.ToTable();

//Timer End
timer.Stop();

Once DataTable is sorted, I stopped the timer to calculate elapsed time.

Apparently, it requires bit more lines of code to put it back to List<string>.

C#
    StringBuilder RowAppend = new StringBuilder();
    List<string> arRetVal = new List<string>();
    foreach (DataRow row in sortedDT.Rows)
    {
        foreach (DataColumn Col in sortedDT.Columns)
        {
            RowAppend.Append(row[Col].ToString() + ",").ToString();
        }
        string test = RowAppend.ToString();
        test = test.Remove(test.Length - 1);

        arRetVal.Add(test);
        RowAppend.Clear();
    }
    ShowResults(arRetVal, HeaderRow, newFileName,
funcName, timer);
}

Towards the end, I’m passing the sorted list to ShowResults that will be saved to a text file.

C#
public void ProcessExtractUsingLinq(List<string> FileRows, string HeaderRow, string newFileName)
{
    funcName = System.Reflection.MethodBase.GetCurrentMethod().Name;
    var timer = System.Diagnostics.Stopwatch.StartNew();

    List<string> arRetVal = FileRows.ToDictionary
    (
        itm => itm.Substring(0, itm.IndexOf(','))
    )
    .OrderBy(kvp => Convert.ToInt32(kvp.Key))
    .Select(kvp => kvp.Value).ToList();

    timer.Stop();
    ShowResults(arRetVal, HeaderRow, newFileName, funcName, timer);

In this ProcessExtractUsingLinq method, I’m using extension method from an IEnumerable to ToDictionary, sort the BankId in ascending order and convert it back to list. Overall, I like the Linq process better, because the code is clean and concise. But, it was bit confusing to me at first when I was learning to understand Linq and I’m sure others feel the same way.

Now, this method of sorting using Tuple is interesting. I didn’t even know about tuple before my colleague Charlie mentioned it while we were talking. Tuple is used specifically for sorting operations. It is an ordered list of elements and you’ll have to specify the type of each item in the type parameter list.

If you want to learn more about Tuple, here are some links:

C#
public void ProcessExtractUsingTuple(List<string> FileRows, string HeaderRow, string newFileName)
        {
            funcName = System.Reflection.MethodBase.GetCurrentMethod().Name;
            //Timer Start
            var timer = System.Diagnostics.Stopwatch.StartNew();
            List<string> arRetVal = new List<string>();
            List<Tuple<long, long, string, double, int>> rows = 
            new List<Tuple<long, long, string, double, int>>();

Here, I’m creating a list of type Tuple that has 5 items.

C#
   // Loop through each list row and add to tuple list
    foreach (string rowText in FileRows)
    {
        string[] strArray = rowText.Split(',');
        rows.Add(new Tuple<long, long, string, double, int>
(Convert.ToInt64(strArray[0]), Convert.ToInt64(strArray[1]),
strArray[2].ToString(), Convert.ToDouble(strArray[3]), Convert.ToInt32(strArray[4])));
    }

Once all the rows from the list are added to the tuple list, Comparision(T) overloaded sorting method.

C#
    rows.Sort(CompareBankids);
    timer.Stop();

    foreach (Tuple<long, long, string, double, int> row in rows)
    {
        arRetVal.Add(string.Concat(row.Item1, ",",
row.Item2.ToString(), ",", row.Item3.ToString(), ",", row.Item4, ",", row.Item5));
    }
    ShowResults(arRetVal, HeaderRow, newFileName, funcName,
        timer);
}
public static int CompareBankids(Tuple<long, long, string, double, int> x,
Tuple<long, long, string, double, int> y)
{
    return x.Item1 < y.Item1 ? -1 : (x.Item1 > y.Item1 ? 1 : 0);
}

Conclusion

Although this application may not be completely accurate for calculating actual sorting time, I just wanted to present something quick and dirty way to calculate time it takes for each of these 3 methods to complete sorting. Furthermore, .NET uses Quicksort algorithm using Pivot to do its sorting so calculation for actual sorting will probably take the same amount of time. Within the method, the calculation accounts for memory allocation, resources, execution time, etc, which is different for each method. It is obvious that DataTable is the slowest of the 3 because it is using the most resources. Tuple method was the fastest of all three.

Method: ProcessExtractUsingLinq took 00:00:0047033 to complete sort.
---------------------------------------------------------------------------
Method: ProcessExtractUsingDataTable took 00:00:0223600 to complete sort.
---------------------------------------------------------------------------
Method: ProcessExtractUsingTuple took 00:00:0020698 to complete sort.
--------------------------------------------------------------------------- 

For further reading, here is a great article from dotnetperls website on Tuple vs. KeyValuePair performance. It also concludes that Tuple is faster.

History

  • 14th February, 2015: Suggestions from @xs2008 applied
  • 9th February, 2015: Initial version

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
United States United States
This member has not yet provided a Biography. Assume it's interesting and varied, and probably something to do with programming.

Comments and Discussions

 
QuestionBig Help! Pin
MaryOB770621-Mar-18 18:41
MaryOB770621-Mar-18 18:41 
QuestionStop the timer Pin
titolr103-Mar-15 23:34
titolr103-Mar-15 23:34 
AnswerRe: Stop the timer Pin
Bhis4-Mar-15 7:28
Bhis4-Mar-15 7:28 
GeneralRe: Stop the timer Pin
titolr104-Mar-15 10:43
titolr104-Mar-15 10:43 
Suggestionsome improvements Pin
sx200813-Feb-15 11:32
sx200813-Feb-15 11:32 
GeneralRe: some improvements Pin
Bhis14-Feb-15 8:52
Bhis14-Feb-15 8:52 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.