Page 5 of 7

Re: Sorting algorithms

Posted: Sat Apr 29, 2017 9:50 am
by hgm
Well, even in engines that do not do any real staged move geeration, I usually don't bother generating moves when there is a hash move. Only after null move and hash move I start generating moves. And usually the gnerated moves naturally produces separate sections of captures and non-captures, which I want to sort by different criterea (MVV/LVA or history). Even if I generate them together with the captures, I don't bother to sort the non-captures before I actually get to them. For the captures I then use on-demand selection from the capture section (which usually is quite small).

Re: Sorting algorithms

Posted: Sat Apr 29, 2017 10:00 am
by petero2
hgm wrote:Very nice graph. In my experience it is worthwile to take such statistics as a function of remaining depth. Otherwise it is completely dominated by the d=1 nodes, while the statistics in deeper nodes might be completely different. (E.g. because the availability of a hash move is much more common there.)
I agree in general, but for this particular case the only effect of using a different sort algorithm is speed, and what matters is the total time spent on sorting accumulated for all depths. Basically how you sort for high depths does not matter in terms of speed because it happens so rarely compared to the d=1 case.

A reservation though is that different sort algorithms can produce different sort order in case of duplicate sort keys. While this will certainly cause a particular search tree to be different (because of LMR/LMP), I have no reason to believe than one particular ordering would be better than another on average.
hgm wrote:It seems that when the first move does not cut, it is most likely you eventually will pick all moves. This makes it a bit surprising that the 'on-demand' sorting has any benefits, and that more efficient sorting algorithms (incompatible with on-demand) would not perform better.
I think there are several factors that cause on-demand sorting to be fast.

* The most common case is that you only need to pick one move (happens in 71% of the cases for the data I posted), and no sorting algorithm can beat a single linear scan over the data.

* For small N, selection sort is often faster than more advanced algorithms. Sedgewick (Algorithms in C) claimed the cut-off where selection sort beats quicksort was around N=20 (from my memory of reading this book in the nineties). If you remove all data points where N<=20 and where number of picked moves<=1, there is not much data left where a more advanced algorithm has a chance of beating the simple selection algorithm.

* The selectBest function compiles to really efficient assembly code. The source code looks like this:

Code: Select all

inline void
Search&#58;&#58;selectBest&#40;MoveList& moves, int startIdx&#41; &#123;
    int bestIdx = startIdx;
    int bestScore = moves&#91;bestIdx&#93;.score&#40;);
    for &#40;int i = startIdx + 1; i < moves.size; i++) &#123;
        int sc = moves&#91;i&#93;.score&#40;);
        if &#40;sc > bestScore&#41; &#123;
            bestIdx = i;
            bestScore = sc;
        &#125;
    &#125;
    std&#58;&#58;swap&#40;moves&#91;bestIdx&#93;, moves&#91;startIdx&#93;);
&#125;
The assembly code for the loop looks like this:

Code: Select all

.L243&#58;
        movl    16&#40;%rsi&#41;, %ecx
        cmpl    %ecx, %edx
        cmovl   %eax, %r8d
        cmovl   %ecx, %edx
        addl    $1, %eax
        addq    $16, %rsi
        cmpl    %edi, %eax
        jne     .L243
The "if" inside the loop is implemented using conditional move instructions.

It might also be worth mentioning that I generate the move list even if I have a hash move. The way I check if the hash move is valid is to see if it is included in the generated move list. This would be the first iteration of the on-demand selection sort, implemented in a separate function called selectHashMove.

Re: Sorting algorithms

Posted: Sat Apr 29, 2017 2:15 pm
by AlvaroBegue
hgm wrote:[...] This makes it a bit surprising that the 'on-demand' sorting has any benefits, and that more efficient sorting algorithms (incompatible with on-demand) would not perform better.
Well, we might be able to have our cake and eat it too. First let me show you a clever trick I learned in college and then I'll point you to an on-demand algorithm that takes linear time to get the first move, and logarithmic time for all the other moves.

Getting the largest out of N integers requires N-1 comparisons, but you can do them in a way that facilitates finding the second largest integer later. The clever way to do this is to compute the maximum by running a single-elimination tournament between the N numbers, which will get you the maximum after N-1 "games" (i.e., comparisons). Then the second largest must be a number that lost to the winner, so that allows you to find the second largest is about log2(N) comparisons. Unfortunately, turning this idea into a full-blown algorithm seems messy.

A related algorithm, that could even be practical for chess, is heapsort. It requires an O(N) pass to turn the list of integers into a heap, then extracting each element is O(log(N)). So there you have it: An efficient on-demand algorithm.

C++ gives you all the tools you need: std::make_heap for the linear pass and std::pop_heap to get that next largest element.

A quick code sample, in case you don't want to read the documentation:

Code: Select all

struct MoveSorter &#123;
  unsigned moves&#91;256&#93;;
  unsigned n_moves;

  MoveSorter&#40;Board const &board&#41; &#123;
    // Generate moves                                                                                                   
    n_moves = board.generate_moves&#40;moves&#41;; // I assume this places sorting info in the higher 16 bits                   

    // Add a dummy move                                                                                                 
    moves&#91;n_moves++&#93; = 0u;

    std&#58;&#58;make_heap&#40;moves, moves + n_moves&#41;;
  &#125;

  unsigned next&#40;) &#123;
    std&#58;&#58;pop_heap&#40;moves, moves + n_moves&#41;;
    return moves&#91;--n_moves&#93;;
  &#125;
&#125;;

// Then you use it like this&#58;
while &#40;unsigned move = move_sorter.next&#40;)) &#123;
  // ...
&#125;

Re: Sorting algorithms

Posted: Sat Apr 29, 2017 7:38 pm
by Dann Corbit
Quick select, implemented in the standard library as std::nth_element() will partition the top n elements. Then sort those with an efficient routine. The last ones can be sorted or not. They are a lot less important than the top ones.

You could pick a strategy of always picking the top N for some fixed N, or go as a percentage of the possible moves. Or all hashed moves + K next best or something like that.

Usually, there seem to be about 3 or 4 good moves in a position, and the rest are not so good. That's why things like null move and LMR are so important. Now, you will not always have the top 4 ordered correctly no matter how good your selection criteria are (unless those criteria are perfect).

The Texel author made a really interesting post on move ordering in this thread. Looks like a fun experiment to me.

Re: Sorting algorithms

Posted: Sat Apr 29, 2017 11:07 pm
by hgm
petero2 wrote:* The most common case is that you only need to pick one move (happens in 71% of the cases for the data I posted), and no sorting algorithm can beat a single linear scan over the data.
True, but a lot depends on how you treat the hash move. It would be very interesting to know how much of this 71% is hash move, as opposed to best capture. My guess is that it should be nearly all hash move, because when the best capture would be cut move, it should already have been promoted to hash move on the previous iteration, in QS.

And getting the hash move doesn't require a single linear scan over the data. It doesn't require anything, because you already have it even before move generation. So a more relevant statistic than how often the first move cuts, is how often the first non-hash move cuts.
It might also be worth mentioning that I generate the move list even if I have a hash move. The way I check if the hash move is valid is to see if it is included in the generated move list. This would be the first iteration of the on-demand selection sort, implemented in a separate function called selectHashMove.
But that is an awfully inefficient way to validate a hash move. If your hash key is long enough hash moves don't need verification at all, and testing if a move is pseudo-legal is pretty cheap too. A more advanced search algorithm might not save you anything compared to the linear scan, but not doing anything at all certainly will. If the concern is to remove the hash move from the move list to prevent a duplicat, this could be done 'on demand' as well, i.e. test each later move for being hash move only when it comes up for searching, and skip it if it is.

For the captures after the hash move I usually use on-demand selection as well, though; the number of captures is never very large, and the chances they will cut are appreciable. So it pays to postpone any work you can postpone, even if in the end it means you are doing somewhat more work in the rare cases where you get to the end. That is different for the non-captures.

Note that qsort can also be implemented in a more 'on demand' way, where you initially only sort the top section of the data after every partitioning. This would (assuming 50-50 partitioning in every iteration) give you the best move after an amount of work equivalent to 2 linear scans through all the moves. The lower partitions (or actually their upper halves) then only get sorted by the time you need moves from those.

Re: Sorting algorithms

Posted: Sun Apr 30, 2017 7:27 pm
by petero2
hgm wrote:
petero2 wrote:* The most common case is that you only need to pick one move (happens in 71% of the cases for the data I posted), and no sorting algorithm can beat a single linear scan over the data.
True, but a lot depends on how you treat the hash move. It would be very interesting to know how much of this 71% is hash move, as opposed to best capture.
I measured something similar, how often I have a hash move that did not cause an immediate cut-off, but searching the hash move does cause a cut-off. In this situation I currently generate a move list but it could be avoided by verifying the hash move in some other way. The result was that between 12% and 26% of the generated move lists could have been optimized away.

More background data: My measurements ignore q-search nodes because I have a capture move generator so those move lists are typically quite short. Therefore using more advanced sorting algorithms for them seems unnecessary. Also I don't do hashing in q-search.
hgm wrote:
It might also be worth mentioning that I generate the move list even if I have a hash move. The way I check if the hash move is valid is to see if it is included in the generated move list. This would be the first iteration of the on-demand selection sort, implemented in a separate function called selectHashMove.
But that is an awfully inefficient way to validate a hash move.
I agree, but it does not mean that fixing it would be a big win for the engine, since move generation is quite a small part of the engine to begin with. It is certainly worth testing though.

Re: Sorting algorithms

Posted: Sun Apr 30, 2017 11:15 pm
by Rein Halbersma
Dann Corbit wrote:Quick select, implemented in the standard library as std::nth_element() will partition the top n elements. Then sort those with an efficient routine.
If you go the STL route, then std::partial_sort is typically faster, unless you intend to sort almost the entire input array, in which case std::nth_element + std::sort win.

Re: Sorting algorithms

Posted: Thu May 04, 2017 11:49 am
by Rebel
petero2 wrote:* The most common case is that you only need to pick one move (happens in 71% of the cases for the data I posted), and no sorting algorithm can beat a single linear scan over the data.
Agree.
petero2 wrote:The assembly code for the loop looks like this:

Code: Select all

.L243&#58;
        movl    16&#40;%rsi&#41;, %ecx
        cmpl    %ecx, %edx
        cmovl   %eax, %r8d
        cmovl   %ecx, %edx
        addl    $1, %eax
        addq    $16, %rsi
        cmpl    %edi, %eax
        jne     .L243
Yep, I have something similar in ASM since the early 80's. During the years spent weeks to find something faster but never managed, in best cases quicksort (in ASM also) became close.

Re: Measurement data

Posted: Fri May 05, 2017 8:13 pm
by Ras
I have run profiling benchmarks with my engine. Three scenarios have been benchmarked:

- highly optimised Shellsort right after generating the pseudo-legal move lists
- Selection sort right before using a pseudo-legal move in the node loop
- A hybrid: right before the node loop, a selection sort swaps the best pseudo-legal MVA/LVV move to the top of the list. If i==1 in the node loop, this top move did not cause a cut-off, and a highly optimised Shellsort is performed on the remaining move list.

The first ten moves after 1. e2-e4 (book off) were profiled, the engine taking white and black in turn. Time was set to 10s per move, reaching a typical depth of 9 plies on my machine.

I tried all of the scenarios with GCC 6.3.0 (32bit, x86) in -O00 and -O02. The results had to be corrected because GPROF's sampling function took between 30% and 45% of the computing time, so the sorting percentages were corrected to refer to the actual program time without the sampling function.

For the hybrid scenario, the time for the Selection sort and the Shellsort was added up. Shellsort took 81% of the sorting time, Selection sort the remaining 19%.

O0:
Shell: 16.5% (factor 2.64)
Select: 6.26% (factor 1.00)
Hybrid: 8.78% (factor 1.40)

O2:
Shell: 9.66% (factor 1.99)
Select: 4.85% (factor 1.00)
Hybrid: 5.37% (factor 1.11)

Although the Shellsort is highly optimised, there are a lot of cases where the first moves cause a cut-off so that sorting the rest of the list was pointless.

Even considering only the first pseudo-legal move (hybrid scenario) is quite an improvement. Note that this move might be illegal and therefore not causing a cut-off in this case.

Since the hybrid version isn't much slower on average, but safeguards against occasional worst case O(N^2) performance of pure Selection sort, I think I'll go with that one.

Re: Measurement data

Posted: Fri May 05, 2017 11:04 pm
by hgm
Is this all after you searched the hash move?