## Sorting algorithms

Discussion of chess software programming and technical issues.

Moderators: bob, hgm, Harvey Williamson

Forum rules
This textbox is used to restore diagrams posted with the [d] tag before the upgrade.
tpetzke
Posts: 686
Joined: Thu Mar 03, 2011 3:57 pm
Location: Germany
Contact:

### Re: Sorting algorithms

Code: Select all

``````case 12&#58;compare_swap&#40;1, 2&#41;; compare_swap&#40;0, 2&#41;; compare_swap&#40;0, 1&#41;; compare_swap&#40;4, 5&#41;; compare_swap&#40;3, 5&#41;;
compare_swap&#40;3, 4&#41;; compare_swap&#40;0, 3&#41;; compare_swap&#40;1, 4&#41;; compare_swap&#40;2, 5&#41;; compare_swap&#40;2, 4&#41;;
compare_swap&#40;1, 3&#41;; compare_swap&#40;2, 3&#41;; compare_swap&#40;7, 8&#41;; compare_swap&#40;6, 8&#41;; compare_swap&#40;6, 7&#41;;
compare_swap&#40;10, 11&#41;;compare_swap&#40;9, 11&#41;;compare_swap&#40;9, 10&#41;;compare_swap&#40;6, 9&#41;;compare_swap&#40;7, 10&#41;;
compare_swap&#40;8, 11&#41;;compare_swap&#40;8, 10&#41;;compare_swap&#40;7, 9&#41;; compare_swap&#40;8, 9&#41;; compare_swap&#40;0, 6&#41;;
compare_swap&#40;1, 7&#41;; compare_swap&#40;2, 8&#41;; compare_swap&#40;2, 7&#41;; compare_swap&#40;1, 6&#41;; compare_swap&#40;2, 6&#41;;
compare_swap&#40;3, 9&#41;; compare_swap&#40;4, 10&#41;;compare_swap&#40;5, 11&#41;;compare_swap&#40;5, 10&#41;;compare_swap&#40;4, 9&#41;;
compare_swap&#40;5, 9&#41;; compare_swap&#40;3, 6&#41;; compare_swap&#40;4, 7&#41;; compare_swap&#40;5, 8&#41;; compare_swap&#40;5, 7&#41;;
compare_swap&#40;4, 6&#41;; compare_swap&#40;5, 6&#41;;
return;
``````
Thomas...

=======
http://macechess.blogspot.com - iCE Chess Engine

Dann Corbit
Posts: 10182
Joined: Wed Mar 08, 2006 7:57 pm
Location: Redmond, WA USA
Contact:

### Re: Sorting algorithms

Do you have a program that writes these or did you do it by hand?

Did you use Knuth's merge insertion algorithm, or something else?
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.

Thanar
Posts: 6
Joined: Wed Jul 09, 2014 3:45 am

### Re: Sorting algorithms

I'm not sure what Thomas used, but here is a generator for most of the standard sorting networks: http://pages.ripco.net/~jgamble/nw.html

tpetzke
Posts: 686
Joined: Thu Mar 03, 2011 3:57 pm
Location: Germany
Contact:

### Re: Sorting algorithms

I used an online generator to generate the required pair operations and then transformed it into source code.

Then I did some unit testing to ensure that the nets perform the way they are supposed to.
Thomas...

=======
http://macechess.blogspot.com - iCE Chess Engine

mar
Posts: 2007
Joined: Fri Nov 26, 2010 1:00 pm
Location: Czech Republic
Full name: Martin Sedlak

### Re: Sorting algorithms

That's very interesting. Are sorting networks stable? I think they might be.

Thanar
Posts: 6
Joined: Wed Jul 09, 2014 3:45 am

### Re: Sorting algorithms

Are sorting networks stable?
The standard ones are not stable in general.

Sorting networks that mimic an insertion sort or bubble sort are stable, and work fine for small arrays.

From my testing, the main speed benefit of using sorting networks is the ability to eliminate almost all branches (and in particular difficult-to-predict branches) from the sorting code. To achieve this, you need to make sure that your "compare_swap" routine compiles down to cmov assembly instructions (or something similar), instead of branching code. This is possible with the right C code using gcc, or with inline assembler macros.

he extra "compare_swaps" used in the stable sorting networks compared to the optimal non-stable ones is not that significant for small arrays.

Thanar
Posts: 6
Joined: Wed Jul 09, 2014 3:45 am

### Re: Sorting algorithms

Here’s some sample C code for a stable sorting network that just sorts an array of ints.

It uses a sorting network that imitates a bubble sort that compare_swaps through the whole array (ensuring the last element is correctly sorted), then goes through length-1 of the array, etc. This allows us to reuse the code for smaller length arrays using a "Duffs device" style switch statement.

Code: Select all

``````#define SWAP&#40;x,y&#41;  &#123; int px = p&#91;x&#93;; int py = p&#91;y&#93;; int tmp = p&#91;x&#93; = px <= py ? px &#58; py; p&#91;y&#93; ^= px ^ tmp; &#125;

static inline void stable_sorting_network&#40;int *begin, int *end&#41;
&#123;
int *p, *q, tmp;

p=begin;
switch&#40;end - begin&#41; &#123;
default&#58;
case 10&#58; SWAP&#40;0,1&#41;; SWAP&#40;1,2&#41;; SWAP&#40;2,3&#41;; SWAP&#40;3,4&#41;;
SWAP&#40;4,5&#41;; SWAP&#40;5,6&#41;; SWAP&#40;6,7&#41;; SWAP&#40;7,8&#41;; SWAP&#40;8,9&#41;;
case 9&#58; SWAP&#40;0,1&#41;; SWAP&#40;1,2&#41;; SWAP&#40;2,3&#41;; SWAP&#40;3,4&#41;;
SWAP&#40;4,5&#41;; SWAP&#40;5,6&#41;; SWAP&#40;6,7&#41;; SWAP&#40;7,8&#41;;
case 8&#58; SWAP&#40;0,1&#41;; SWAP&#40;1,2&#41;; SWAP&#40;2,3&#41;; SWAP&#40;3,4&#41;;
SWAP&#40;4,5&#41;; SWAP&#40;5,6&#41;; SWAP&#40;6,7&#41;;
case 7&#58; SWAP&#40;0,1&#41;; SWAP&#40;1,2&#41;; SWAP&#40;2,3&#41;; SWAP&#40;3,4&#41;;
SWAP&#40;4,5&#41;; SWAP&#40;5,6&#41;;
case 6&#58; SWAP&#40;0,1&#41;; SWAP&#40;1,2&#41;; SWAP&#40;2,3&#41;; SWAP&#40;3,4&#41;;
SWAP&#40;4,5&#41;;
case 5&#58; SWAP&#40;0,1&#41;; SWAP&#40;1,2&#41;; SWAP&#40;2,3&#41;; SWAP&#40;3,4&#41;;
case 4&#58; SWAP&#40;0,1&#41;; SWAP&#40;1,2&#41;; SWAP&#40;2,3&#41;;
case 3&#58; SWAP&#40;0,1&#41;; SWAP&#40;1,2&#41;;
case 2&#58; SWAP&#40;0,1&#41;;
case 1&#58;
case 0&#58;
break;
&#125;

// Insert remaining elements into sorted list.
for &#40;p = begin + 10; p < end; ++p&#41; &#123;
tmp = *p;
for &#40;q = p; q != begin && *&#40;q-1&#41; > tmp; --q&#41;
*q = *&#40;q-1&#41;;
*q = tmp;
&#125;
&#125;
``````
You can paste the code into https://godbolt.org/ to see that it compiles down to cmovle and xor instructions when compiled with, for example, x86-64 gcc 5.3 –O2

And here is the time used to sort arrays of random ints of various lengths (The time is based on counting cycles with the rdtsc instruction):

Code: Select all

``````                               Array length
Sort method           4   6   8  10  12  14  16  18  20  22  24  Total cycles
std&#58;&#58;sort            69 112 174 244 338 371 426 626 754 822 968  4905
Insertion            49  94 147 201 258 310 380 457 520 584 706  3706
stable sort network  13  28  50  77 134 206 269 339 408 448 518  2490``````
So for this simple case, the sorting network is over twice as fast for small array sizes. Of course, the gains aren’t quite as big when you’re sorting structs based on a field in the struct, but it is still faster.

hgm
Posts: 23772
Joined: Fri Mar 10, 2006 9:06 am
Location: Amsterdam
Full name: H G Muller
Contact:

### Re: Sorting algorithms

This seems all very expensive compared to a simple disperse & collect. How many different values of the sort key can you expect for these small sets of moves anyway. Is this just for sorting captures MVV/LVA-wise? In Chess that is just 20 possible key values. So dispersing all moves over 20 linked lists, and then running trough the lists in order would already do it.

tpetzke
Posts: 686
Joined: Thu Mar 03, 2011 3:57 pm
Location: Germany
Contact:

### Re: Sorting algorithms

How many different values of the sort key can you expect for these small sets of moves anyway.
I sort moves based on different criteria e.g. based on the history score to name one. So I encounter a bit more than 20 different key values. The value is fitted into the upper bits of the unsigned integer I use for move representation so for sorting I just sort integers.
Thomas...

=======
http://macechess.blogspot.com - iCE Chess Engine

hgm
Posts: 23772
Joined: Fri Mar 10, 2006 9:06 am
Location: Amsterdam
Full name: H G Muller
Contact:

### Re: Sorting algorithms

OK, so you have 256 key values. Can still be done without too much overhead through 256 linked lists. As you can hook the moves in their respective lists as soon as you score them, there would be no need to pack the key into the move, earning some of the overhead back. Of course if you are really tight on space you could use the high byte for the link. With 256 key values and maximally 24 moves to sort, the lists would hardly ever contain more than one move, so the end-branch of the loop to run through the lists becomes highly predictably (as 'not taken').