Download qsbr.zip - 1.8 KB

Introduction

The classic QuickSort and QuickSelect algorithms have poor worst-case performance. If bad pivots are consistently chosen, such as decreasing by only a single element each time, then worst-case performance is quadratic: O(n²). Duplicate values in the array are problematic for QuickSort.

This article introduces the QuickSortByRange, QuickSelectByRange, and PartitionByRange algorithms which provide a remedy to the problem with duplicates by returning a pivot range from the PartitionByRange algorithm. This approach works well when the array contains several duplicate values.

For example, if every element of the array contains the same value, then QuickSort delivers O(n²) whereas QuickSortByRange delivers O(n). Considering that the average case performance of QuickSort is O(n log n), the all-duplicates scenario becomes a better than average case scenario for QuickSortByRange as compared to the worse than average case scenario we see with QuickSort.

Background

QuickSort and QuickSelect both rely on a partition algorithm. The partition algorithm looks at an initial (pivot) value within a range of the array and then reorganizes the values within that range so that all the values less than or equal to the pivot value are placed to the left of the pivot and the values greater than the pivot value are moved to the right of the pivot. The algorithm will move the pivot element, if necessary, to satisfy this quasi-sort and then it returns the final position of the pivot element.

For example, let's consider the following range within a larger array:

[ < , > , == , x , < , > , < , == ]

In this example, 'x' marks the pivot value, '<' indicates values less than the pivot value, '>' indicates values greater than the pivot value, and '==' indicates values equal to the pivot value.

After a single pass through the quickSort partition algorithm, these elements will have been reorganized to:

[ < , == , < , < , == , x , > , > ]

PartitionByRange takes the quasi-sort a step further. It further reorganizes the array so that elements equal to the pivot element are placed to the immediate left of the pivot element. Values less than the pivot value are placed to the left of this range of equal-value elements.

After a single pass through the PartitionByRange algorithm, these elements will have been reorganized to:

[ < , < , < , == , == , x , > , > ]

Instead of returning a single value indicating the final position of the pivot value, the PartitionByRange algorithm returns two values indicating the range of positions which contain the same value as the pivot value. This enables the QuickSortByRange and QuickSelectByRange algorithms to move the boundaries of the next pass through the PartitionByRange further to the right or left than QuickSort and QuickSelect, respectively.

Using the Code

The code is written in C# for arrays of integers. It's fairly straightforward to port the code to another language or value type.

The attached file is an HttpHandler which is designed to run on the localhost of a machine with IIS. It includes benchmarking code to replicate the results posted at the end of this article.

One important part of the PartitionByRange code is the Stack used to store the locations of values equal to the pivot value. When porting the code to another language, ensure that a LIFO structure is used to store these indexes. An array may be used in place of a stack as long as LIFO order is used.

PartitionByRange

public Tuple<int, int> partitionByRange(int[] a,int left,int right,int pivot) {

  int pivotValue = a[pivot];
  Stack<int> equalValuePositions = null;
  swap(a,pivot,right);
  for(int i=left;i<right;i++)
    if(a[i] <= pivotValue) {
      swap(a,left++,i);
      if(a[i] == pivotValue) {
        if(equalValuePositions == null) equalValuePositions = new Stack<int>();
        equalValuePositions.Push(i);
      }
    }
  swap(a,left,right);
  var rightPivotIndex = left;
  if(equalValuePositions != null) foreach(int i in equalValuePositions) swap(a,--left,i);
  return new Tuple<int, int>(left,rightPivotIndex);

}

public void swap(int[] a,int i,int j){ if(i!=j){ int k=a[i]; a[i]=a[j]; a[j]=k; } }

QuickSortByRange

public void quickSortByRange(int[] a) {

  qSortByRange(a,0,a.Length - 1,new Random());

}

public void qSortByRange(int[] a,int left,int right,Random r) {

  if(left < right) {
    int pivot = left + r.Next(right - left + 1);
    Tuple<int, int> pivotRange = partitionByRange(a,left,right,pivot);
    qSortByRange(a,left,pivotRange.Item1 - 1,r);
    qSortByRange(a,pivotRange.Item2 + 1,right,r);
  }

}

QuickSelectByRange

public int quickSelectByRange(int[] a,int n) {

  return qSelectByRange(a,0,a.Length - 1,n,new Random());

}

public int qSelectByRange(int[] a,int left,int right,int n,Random r) {

  if(left == right) return a[left];

  int pivot = n;

  while(true) {

    Tuple<int, int> pivotRange = partitionByRange(a,left,right,pivot);
    if(n >= pivotRange.Item1 && n <= pivotRange.Item2) return a[n];
    if(n < pivotRange.Item1)
      right = pivotRange.Item1 - 1;
    else
      left = pivotRange.Item2 + 1;
    pivot = left + r.Next(right - left + 1);

  }

}

Overhead

The PartitionByRange algorithm requires a Stack (or similar LIFO structure), extra comparisons, and it returns a tuple of two integers instead of a single integer.

These overhead requirements result in a performance hit which is not overcome until the array contains at least several duplicates.

Performance

In arrays with no duplicate entries, the QuickSort algorithm is about 8% faster than the QuickSortByRange algorithm.
In an array with just a few duplicate entries, the QuickSort algorithm is about 9% faster than the QuickSortByRange algorithm.
In an array with several duplicates, the QuickSortByRange algorithm is about 14% faster than the QuickSort algorithm.
In an array with quite a few duplicates, the QuickSortByRange algorithm is about 76% faster than the QuickSort algorithm.
In an array with lots of duplicates, the QuickSortByRange algorithm is about 98% faster than the QuickSort algorithm.
In an array with all duplicates, the QuickSortByRange algorithm is about 99.9% faster than the QuickSort algorithm.

History

In the original version of this article, I incorrectly stated that QuickSortByRange delivers O(1) performance on an array where every element contains the same value.