Here’s some sample C code for a stable sorting network that just sorts an array of ints.

It uses a sorting network that imitates a bubble sort that compare_swaps through the whole array (ensuring the last element is correctly sorted), then goes through length-1 of the array, etc. This allows us to reuse the code for smaller length arrays using a "Duffs device" style switch statement.

Code: Select all

```
#define SWAP(x,y) { int px = p[x]; int py = p[y]; int tmp = p[x] = px <= py ? px : py; p[y] ^= px ^ tmp; }
static inline void stable_sorting_network(int *begin, int *end)
{
int *p, *q, tmp;
p=begin;
switch(end - begin) {
default:
case 10: SWAP(0,1); SWAP(1,2); SWAP(2,3); SWAP(3,4);
SWAP(4,5); SWAP(5,6); SWAP(6,7); SWAP(7,8); SWAP(8,9);
case 9: SWAP(0,1); SWAP(1,2); SWAP(2,3); SWAP(3,4);
SWAP(4,5); SWAP(5,6); SWAP(6,7); SWAP(7,8);
case 8: SWAP(0,1); SWAP(1,2); SWAP(2,3); SWAP(3,4);
SWAP(4,5); SWAP(5,6); SWAP(6,7);
case 7: SWAP(0,1); SWAP(1,2); SWAP(2,3); SWAP(3,4);
SWAP(4,5); SWAP(5,6);
case 6: SWAP(0,1); SWAP(1,2); SWAP(2,3); SWAP(3,4);
SWAP(4,5);
case 5: SWAP(0,1); SWAP(1,2); SWAP(2,3); SWAP(3,4);
case 4: SWAP(0,1); SWAP(1,2); SWAP(2,3);
case 3: SWAP(0,1); SWAP(1,2);
case 2: SWAP(0,1);
case 1:
case 0:
break;
}
// Insert remaining elements into sorted list.
for (p = begin + 10; p < end; ++p) {
tmp = *p;
for (q = p; q != begin && *(q-1) > tmp; --q)
*q = *(q-1);
*q = tmp;
}
}
```

You can paste the code into

https://godbolt.org/ to see that it compiles down to cmovle and xor instructions when compiled with, for example, x86-64 gcc 5.3 –O2

And here is the time used to sort arrays of random ints of various lengths (The time is based on counting cycles with the rdtsc instruction):

Code: Select all

```
Array length
Sort method 4 6 8 10 12 14 16 18 20 22 24 Total cycles
std::sort 69 112 174 244 338 371 426 626 754 822 968 4905
Insertion 49 94 147 201 258 310 380 457 520 584 706 3706
stable sort network 13 28 50 77 134 206 269 339 408 448 518 2490
```

So for this simple case, the sorting network is over twice as fast for small array sizes. Of course, the gains aren’t quite as big when you’re sorting structs based on a field in the struct, but it is still faster.