15,037,662 members
Articles / General Programming / Sorting
Article
Posted 16 Dec 2013

24.1K views
20 bookmarked

# QuickSortByRange, QuickSelectByRange, PartitionByRange

Rate me:
A performance improvement for the classic sort / select algorithms.

## Introduction

The classic QuickSort and QuickSelect algorithms have poor worst-case performance. If bad pivots are consistently chosen, such as decreasing by only a single element each time, then worst-case performance is quadratic: O(n2). Duplicate values in the array are problematic for QuickSort.

This article introduces the QuickSortByRange, QuickSelectByRange, and PartitionByRange algorithms which provide a remedy to the problem with duplicates by returning a pivot range from the PartitionByRange algorithm. This approach works well when the array contains several duplicate values.

For example, if every element of the array contains the same value, then QuickSort delivers O(n2) whereas QuickSortByRange delivers O(n). Considering that the average case performance of QuickSort is O(n log n), the all-duplicates scenario becomes a better than average case scenario for QuickSortByRange as compared to the worse than average case scenario we see with QuickSort.

## Background

QuickSort and QuickSelect both rely on a partition algorithm. The partition algorithm looks at an initial (pivot) value within a range of the array and then reorganizes the values within that range so that all the values less than or equal to the pivot value are placed to the left of the pivot and the values greater than the pivot value are moved to the right of the pivot. The algorithm will move the pivot element, if necessary, to satisfy this quasi-sort and then it returns the final position of the pivot element.

For example, let's consider the following range within a larger array:

[ < , > , == , x , < , > , < , == ]

In this example, 'x' marks the pivot value, '<' indicates values less than the pivot value, '>' indicates values greater than the pivot value, and '==' indicates values equal to the pivot value.

After a single pass through the quickSort partition algorithm, these elements will have been reorganized to:

[ < , == , < , < , == , x , > , > ]

PartitionByRange takes the quasi-sort a step further. It further reorganizes the array so that elements equal to the pivot element are placed to the immediate left of the pivot element. Values less than the pivot value are placed to the left of this range of equal-value elements.

After a single pass through the PartitionByRange algorithm, these elements will have been reorganized to:

[ < , < , < , == , == , x , > , > ]

Instead of returning a single value indicating the final position of the pivot value, the PartitionByRange algorithm returns two values indicating the range of positions which contain the same value as the pivot value. This enables the QuickSortByRange and QuickSelectByRange algorithms to move the boundaries of the next pass through the PartitionByRange further to the right or left than QuickSort and QuickSelect, respectively.

## Using the Code

The code is written in C# for arrays of integers. It's fairly straightforward to port the code to another language or value type.

The attached file is an HttpHandler which is designed to run on the localhost of a machine with IIS. It includes benchmarking code to replicate the results posted at the end of this article.

One important part of the PartitionByRange code is the Stack used to store the locations of values equal to the pivot value. When porting the code to another language, ensure that a LIFO structure is used to store these indexes. An array may be used in place of a stack as long as LIFO order is used.

### PartitionByRange

C#
```public Tuple<int, int> partitionByRange(int[] a,int left,int right,int pivot) {

int pivotValue = a[pivot];
Stack<int> equalValuePositions = null;
swap(a,pivot,right);
for(int i=left;i<right;i++)
if(a[i] <= pivotValue) {
swap(a,left++,i);
if(a[i] == pivotValue) {
if(equalValuePositions == null) equalValuePositions = new Stack<int>();
equalValuePositions.Push(i);
}
}
swap(a,left,right);
var rightPivotIndex = left;
if(equalValuePositions != null) foreach(int i in equalValuePositions) swap(a,--left,i);
return new Tuple<int, int>(left,rightPivotIndex);

}

public void swap(int[] a,int i,int j){ if(i!=j){ int k=a[i]; a[i]=a[j]; a[j]=k; } }```

### QuickSortByRange

C#
```public void quickSortByRange(int[] a) {

qSortByRange(a,0,a.Length - 1,new Random());

}

public void qSortByRange(int[] a,int left,int right,Random r) {

if(left < right) {
int pivot = left + r.Next(right - left + 1);
Tuple<int, int> pivotRange = partitionByRange(a,left,right,pivot);
qSortByRange(a,left,pivotRange.Item1 - 1,r);
qSortByRange(a,pivotRange.Item2 + 1,right,r);
}

}```

### QuickSelectByRange

C#
```public int quickSelectByRange(int[] a,int n) {

return qSelectByRange(a,0,a.Length - 1,n,new Random());

}

public int qSelectByRange(int[] a,int left,int right,int n,Random r) {

if(left == right) return a[left];

int pivot = n;

while(true) {

Tuple<int, int> pivotRange = partitionByRange(a,left,right,pivot);
if(n >= pivotRange.Item1 && n <= pivotRange.Item2) return a[n];
if(n < pivotRange.Item1)
right = pivotRange.Item1 - 1;
else
left = pivotRange.Item2 + 1;
pivot = left + r.Next(right - left + 1);

}

}```

The PartitionByRange algorithm requires a Stack (or similar LIFO structure), extra comparisons, and it returns a tuple of two integers instead of a single integer.

These overhead requirements result in a performance hit which is not overcome until the array contains at least several duplicates.

## Performance

• In arrays with no duplicate entries, the QuickSort algorithm is about 8% faster than the QuickSortByRange algorithm.
• In an array with just a few duplicate entries, the QuickSort algorithm is about 9% faster than the QuickSortByRange algorithm.
• In an array with several duplicates, the QuickSortByRange algorithm is about 14% faster than the QuickSort algorithm.
• In an array with quite a few duplicates, the QuickSortByRange algorithm is about 76% faster than the QuickSort algorithm.
• In an array with lots of duplicates, the QuickSortByRange algorithm is about 98% faster than the QuickSort algorithm.
• In an array with all duplicates, the QuickSortByRange algorithm is about 99.9% faster than the QuickSort algorithm.

## History

In the original version of this article, I incorrectly stated that QuickSortByRange delivers O(1) performance on an array where every element contains the same value.

## Share

 Engineer Comprehend Systems United States
I've been fiddling around with computers since my parents bought me a Commodore VIC-20 for Christmas in 1981.

I received a B.Sc. in Mechanical Engineering from the University of Waterloo, but have spent my career in software development.

I focused on FoxPro during my early career before switching over to .NET and JavaScript.

I created Jooshe and wrote a Code Project article on it.

I wrote the front-end (Jooshe & JavaScript) and middleware (.NET) of dbiScript.

 First Prev Next
 My vote of 2 Qwertie24-Dec-13 6:34 Qwertie 24-Dec-13 6:34
 Um.... Qwertie23-Dec-13 13:15 Qwertie 23-Dec-13 13:15
 Message Closed 23-Dec-13 14:19 Brien Givens 23-Dec-13 14:19
 Re: Um.... Qwertie24-Dec-13 5:04 Qwertie 24-Dec-13 5:04
 Message Closed 24-Dec-13 5:35 Brien Givens 24-Dec-13 5:35
 Re: Um.... Qwertie24-Dec-13 6:32 Qwertie 24-Dec-13 6:32
 Re: Um.... Brien Givens26-Dec-13 7:15 Brien Givens 26-Dec-13 7:15
 Re: Um.... KP Lee4-Jan-14 20:44 KP Lee 4-Jan-14 20:44
 Re: Um.... KP Lee4-Jan-14 20:50 KP Lee 4-Jan-14 20:50
 Re: Um.... KP Lee4-Jan-14 21:09 KP Lee 4-Jan-14 21:09
 Re: Um.... KP Lee5-Jan-14 0:21 KP Lee 5-Jan-14 0:21
 OK, looked at the code I gave earlier and main was totally screwed up. I modified tests and the last percentage is totally screwed up because I didn't factor the minutes my binary sort routine would take: Time to generate and copy two arrays: 00:00:00.0140008 of size: 200000 Time to run Array.Sort: 00:00:00.0230013 Time to run BinSort: 00:00:00.1410081, percent Array vs BinSort: 16.3120567375887 Time to run Array.Sort: 00:00:14.0108014 All same values, length: 180000000 Time to run BinSort: 00:00:07.6644384, percent Array vs BinSort: 182.802713987474 Time to run Array.Sort: 00:00:30.8567649 random values Time to run BinSort: 00:03:15.7481945, percent Array vs BinSort: 195.935991871984 This is running the debug version of the code. I fixed the code and ran the non-debug version and it turns out I didn't need to add minutes to the calculation: Time to generate and copy two arrays: 00:00:00.0300017 of size: 200000 Time to run Array.Sort: 00:00:00.0240014 Time to run BinSort: 00:00:00.0420024, percent Array vs BinSort: 57.1428571428571 Time to run Array.Sort: 00:00:14.0588041 All same values, length: 180000000 Time to run BinSort: 00:00:01.8361050, percent Array vs BinSort: 765.686274509804 Time to run Array.Sort: 00:00:30.7257574 random values Time to run BinSort: 00:00:56.1542118, percent Array vs BinSort: 54.7156035188945 In my earlier tests, 150M took 49+ seconds while the Array did it in 29+ seconds. (Remember, never compared performance of sorting the same value when I first did this.) Anyway, this is "final version" of the test code I ran: Copy Code ```static void Main(string[] args) { DateTime start = DateTime.Now; int arraySize = 200000; int[] testArray1 = GenRandVals(arraySize), testArray2 = CopyArray(testArray1), testArray3 = CopyArray(testArray1); TimeSpan span = new TimeSpan(DateTime.Now.Ticks - start.Ticks), span2; Console.WriteLine(string.Format("Time to generate and copy two arrays: {0} of size: {1}", span.ToString(), arraySize)); start = DateTime.Now; Array.Sort(testArray1); span2 = new TimeSpan(DateTime.Now.Ticks - start.Ticks); Console.WriteLine(string.Format("Time to run Array.Sort: {0}", span2.ToString())); start = DateTime.Now; BinSort(testArray2, 0, arraySize - 1); span = new TimeSpan(DateTime.Now.Ticks - start.Ticks); Console.WriteLine(string.Format("Time to run BinSort: {0}, percent Array vs BinSort: {1}", span.ToString(),span2.Milliseconds*100.0/span.Milliseconds)); arraySize = 180000000; testArray3 = new int[1]; testArray3 = new int[arraySize]; start = DateTime.Now; Array.Sort(testArray3); span2 = new TimeSpan(DateTime.Now.Ticks - start.Ticks); Console.WriteLine(string.Format("Time to run Array.Sort: {0} All same values, length: {1}", span2.ToString(),testArray3.Length)); start = DateTime.Now; BinSort(testArray3, 0, arraySize - 1); span = new TimeSpan(DateTime.Now.Ticks - start.Ticks); Console.WriteLine(string.Format("Time to run BinSort: {0}, percent Array vs BinSort: {1}", span.ToString(), (span2.Seconds * 1000 + span2.Milliseconds) * 100.0 / (span.Seconds * 1000 + span.Milliseconds))); testArray3 = new int[1]; testArray3 = GenRandVals(arraySize); start = DateTime.Now; Array.Sort(testArray3); span2 = new TimeSpan(DateTime.Now.Ticks - start.Ticks); Console.WriteLine(string.Format("Time to run Array.Sort: {0} random values", span2.ToString())); testArray3 = new int[1]; testArray3 = GenRandVals(arraySize); start = DateTime.Now; BinSort(testArray3, 0, arraySize - 1); span = new TimeSpan(DateTime.Now.Ticks - start.Ticks); Console.WriteLine(string.Format("Time to run BinSort: {0}, percent Array vs BinSort: {1}", span.ToString(), (span2.Minutes * 60000 + span2.Seconds * 1000 + span2.Milliseconds) * 100.0 / (span.Minutes * 60000 + span.Seconds * 1000 + span.Milliseconds))); Console.Read(); } ``` So, the built-in sort goes nowhere near O(n^2) when the same values are sorted, in fact the time taken is faster on the built-in sort, but no-where near the speed-up of my code when the same value is sorted. I won't even bother loading your sort process, with the same value my sort will be faster than yours. You can actually load my sort routine and prove me wrong.
 Suggestion lafffer_11923-Dec-13 10:31 lafffer_119 23-Dec-13 10:31
 My vote of 5 John Brett17-Dec-13 23:43 John Brett 17-Dec-13 23:43
 Re: My vote of 5 Brien Givens17-Dec-13 23:48 Brien Givens 17-Dec-13 23:48
 Very nice article about QuickSortByRange, QuickSelectByRange, and PartitionByRange algorithms Volynsky Alex17-Dec-13 10:24 Volynsky Alex 17-Dec-13 10:24
 Re: Very nice article about QuickSortByRange, QuickSelectByRange, and PartitionByRange algorithms Brien Givens17-Dec-13 10:29 Brien Givens 17-Dec-13 10:29
 Re: Very nice article about QuickSortByRange, QuickSelectByRange, and PartitionByRange algorithms Volynsky Alex17-Dec-13 10:37 Volynsky Alex 17-Dec-13 10:37
 Last Visit: 31-Dec-99 18:00     Last Update: 24-Sep-21 2:48 Refresh 1