QuickSort. It’s widely accepted as the fastest generic sorting algorithm around. It’s also very simple. A comment in the last article showed me this Haskell beauty:

qsort [] = []
qsort (x:xs) = qsort (filter (< x) xs) ++ [x] ++ qsort (filter (>= x) xs)

While my C implementations aren’t anywhere near as beautiful, I’m quite satisfied with their efficiency. I’ll keep the code to a minimum and let the graphs do the talking.

Introduction

The quicksort is kinda like the heapsort, only without heapsort’s worst-case guarantee (thus not as reliable). It is generally recursive, and basically consists of 2 steps (Divide and Conquer):

Divide: Randomly choose an element x in the sorting set X. Separate the set into two segments, X.1 and X.2, consisting of the elements less than x and those greater than or equal to x.

X = [4,7,2,6,89,3,5,7,12], x = 7

X.1 = [4,2,6,3,5]
X.2 = [7,89,12]

Conquer: Do the same for X.1 and X.2.

The recursion terminates when X.1 and X.2 each contain fewer than 2 elements.

A reasonable implementation can save space in comparison with the mergesort (which needs O(n) auxiliary space): the partitioning can be done in place, with O(1) overhead. Interestingly, this algorithm does not guarantee O(nlog(n)) time; it has a worst-case running time of O(n^2), no better than Insertion Sort. It just happens that the *average* running time is O(nlog(n)), with much better coefficients than any other comparison sort (Heap, Merge, etc.).

A note: all these implementations were run under a normal GNOME desktop, with just XChat and mocp running in the background. Let the graphs begin.

**Naive Vanilla Implementation** (using a scratch buffer):

Key Lines

void sort (int len, int *list) {
    ...
    pivot = rand() % len;
    ...
    if (list[i] < list[pivot]) buf[idx1++] = list[i];
    else buf[idx2--] = list[i];
    ...
}

Interestingly, this is not half as jumpy as merge sort, even though merge sort is tightly asymptotically bound to n*log(n) (i.e. it stays at that value as you go higher and higher) while this implementation uses random numbers to choose the pivot. The performance is much worse than MergeSort (which had a worst time of about 0.9 ms).

**Single Array Implementation**:

Key Lines

SWAP (list[pivot], list[len-1]);
...
if (list[i] < list[len-1]) {
    SWAP (list[idx], list[i]);
    idx++;
}
...
SWAP (list[idx], list[len-1]);

Much better, though still a bit off MergeSort’s 0.9 ms. The method swaps the pivot element to the end of the list and keeps track of the position just past the last number smaller than the pivot; at the end of the pass, it swaps the pivot into that location.

**Median Pivoting Implementation**:

Key Lines

pivot = (len >> 1);

Aha, *just* better than MergeSort. And all that just by replacing the rather heavy rand() function with a simple middle-element pivot (not a true median: just take the middle element. len>>1 is equivalent to len/2; I don’t know why I used the shift here, compiler optimizations and all, perhaps just used to it after the nearest_pow exercise. I would never put something like that in proper code). Interesting to note the optimization difference.

(*Thanks to Ivan for pointing out an error: earlier I had written len>>2*)

**Grouping Equal Elements Implementation**:

Key Lines

pivot = (len >> 1);

Whoa! *Huge* improvements. This was with grouping the elements equal to the pivot and excluding them from the Divide step. Notably, this makes much less difference on purely random data.

**[BEST] Insertion Sort Implementation**:

median/group-equals/insertion/quick.c

Key Lines

/* Insertion Sort (strictly speaking, the code below is a selection
   sort: it repeatedly finds the smallest remaining element and swaps
   it into place) */
if (len < cutoff)
{
    for (idx = 0; idx < len; idx++)
    {
        i = idx;
        for (tmp = idx; tmp < len; tmp++)
            i = (list[tmp] < list[i]) ? tmp : i;
        if (i == idx) continue;
        tmp = list[idx];
        list[idx] = list[i];
        list[i] = tmp;
    }
    return;
}

A minor incremental improvement. It just grazes 0.8 ms. It seems that QuickSort is more amenable to the insertion sort optimization than MergeSort.

I also have a graph for the single array implementation using the insertion sort.

Looks a bit chaotic, but that might be due to some freak processor usage.

Another set of implementations can be found at Jeffrey Stedfast’s place. One thing he tried out was to ‘unrecurse’ it by using a stack containing the extents of each step (essentially, where the sub-array you are sorting in one of the Divide steps begins and ends). I don’t see how this is very different from what C is doing anyway, i.e. pushing the variables of a function call onto the stack and popping them later; the only extra cost is the call itself, which takes a handful of processor cycles (on my 3.04 GHz machine, on the order of a nanosecond). On trying to implement it, I found my implementation was probably very sucky, and was much slower than the unmodified program. However, for what it’s worth, the previous implementations (using the middle-element pivot) are all quite a bit faster than Stedfast’s implementation (~1.04 s on an array of size 5e6). According to his post, his function is a bit faster than the glibc qsort() function, and that also means mine is too :-).

Well, that’s all I have. You can of course find more graphs and the source code for all of these (and some boring modifications) at /quick/

M Welinder(12:58:57) :That Haskell implementation is beautiful, but useless. Quicksort is not suited for languages without random-access arrays. Taking the first element is outright bad, as the worst case for quicksort is then the already-sorted case. Quicksort is only fast if you can convince yourself that the worst case is rare.

Jos(13:57:14) :The Haskell ‘quicksort’ is not quicksort at all. You are performing many more steps than you do in a normal quicksort. The implementation is recursive with the number of recursions being approximately N/2. In total you create N new arrays with subsets of the original array.

This implementation is probably at least an order of magnitude slower than the c implementation.

Jeffrey Stedfast(15:07:29) :Hey! Very cool and thanks for the props 😉

Your post on QuickSort is much better than mine (it’s good that you experimented with different pivot points, something I never got around to) and I like your graphs (I saw your MergeSort post too, also very nice).

Unrolling the recursion doesn’t help a whole lot on x86, but some architectures have a much larger penalty for making function calls (e.g. SPARC iirc), but even on x86 it can make a difference (generally not huge, but if you’re sorting a lot it can add up).

I look forward to reading more of your posts 🙂

Nate(15:11:06) :http://www.cs.princeton.edu/~rs/talks/QuicksortIsOptimal.pdf

Havoc(15:21:21) :You can’t use plain quicksort in real code because of that worst case, which iirc applies to an array that’s already in order – as soon as you sort a large array that’s already sorted, your program will hang for a long time. n doesn’t have to get very big before O(n^2) is not just slow, but practically infinite.

glibc’s qsort() switches to insertion sort partway through, for that reason:

http://cvs.savannah.gnu.org/viewvc/libc/stdlib/qsort.c?root=libc&view=markup

Ivan Krstić(16:13:25) :len>>2 is *not* equivalent to len/2. It’s equivalent to dividing by four, and is an unnecessary way of spelling out that division. Trust your compiler to do its job.

Dafydd Harries(08:32:52) :I recommend Jon Bentley and Doug McIlroy’s paper “Engineering a Sort Function”, where they describe their work improving over the qsort() in the libc that they had at the time.

http://www.enseignement.polytechnique.fr/informatique/profs/Luc.Maranget/421/09/bentley93engineering.pdf

@M Welinder:

Indeed, the sort included with GHC is a mergesort, though interestingly it seems it used to be a quicksort:

http://haskell.org/ghc/docs/latest/html/libraries/base/src/Data-List.html#sort

Some quick comparisons suggest that the qsort given above is not so hideously impractical though: it seems similar to Data.List.sort on lists of 100,000 and 1,000,000 strings. At 100,000, both are around 6 times slower than glibc qsort(). At 1,000,000, they seem about 10x slower and use significantly more memory.

Code I used for comparison is at http://rhydd.org/darcs/sort-compare/.

Cale Gibbard(09:07:36) :The Haskell quicksort *is* quicksort, but it is not an in-place implementation of the algorithm. It has the same asymptotic time-complexity as the quicksort in C, but uses more memory. As was pointed out, it doesn’t do a good job of selecting a pivot, but of course, there are a large variety of ways to do that which would hardly affect the presentation of the code.

The essence of quicksort is really exactly what the Haskell program expresses: pick a pivot, partition the list into elements respectively less and greater than the pivot, sort the two ‘halves’ and concatenate the results. In-place implementations of quicksort do the partitioning step by making appropriate swaps to put all the elements of the ‘less than’ list at the beginning, destroying the original list as they go. The recursive calls are then informed which section of this already-combined list corresponds to the part they’re sorting.

Mergesort tends to be more effective with Haskell’s lists though. An added bonus of many sorting algorithms (including the mergesort in the standard List library) in Haskell is that due to lazy evaluation, the complexity is O(n log k) where k is the number of elements of the sorted list which you actually use. So, if you only ever use the top 5, say, then the asymptotic complexity drops to O(n) with no extra work.

arunchaganty(09:36:23) :Wow, thanks for the great response.

@M. Welinder, Jos, Dafydd, Cale: I agree. The Haskell page itself admits that the Haskell quicksort implementation is quite useless. However, I (like Cale), find that it expresses the essence of the quick sort rather beautifully.

@Jeffrey: Thanks!

@ Nate: Thanks for the link, it was a very nice analysis.

@ Havoc: Yes. I noticed the difference in using the insertion sort was about 0.4 ms on a list of about 5,000,000 random numbers. I’m sure that difference would scale much more on larger sets, especially those with less quick-sort friendly data.

I was surprised to see that the threshold for the glibc implementation is just 4! It also looks like the implementation is similar to the unrolled, 3-way partition, and using a median (though a much neater way of doing the same). So the implementations are pretty similar :-).

@Ivan: Thank you for the correction, and I admit replacing those len/2 with len >> 1 was unnecessary.

@Dafydd: A very interesting paper. Thanks a lot for the link.

A WTH video (related to QuickSort): http://www.youtube.com/watch?v=2HjspVV0jK4

Blog::Quibb » Sort Optimization(18:31:39) :[…] affected by that. I searched around the Internet often during this, and came across this page: QuickSort. It had some interesting ideas like grouping the same element together when running the partition […]