more STL code for Stockfish

bob · Post by **bob** » Sat Jul 16, 2011 5:31 pm

Rein Halbersma wrote:
wgarvin wrote:A couple further thoughts...

(1) Rein's version swaps two array elements, while the original version swaps a temporary (T bestMove) with an array element. So the original swap is cheaper (bestMove is in registers). That's also why the original trashes the first element, which oddly enough, seems to have a bit of performance benefit? You could fix the trashing by just storing bestMove back to the front of the list on exiting the function, and it might not slow things down much if at all.

(2) When I said the load and store instructions for the original's swap "might cost nothing", what I meant is that they are happening inside the loop and the loop does only one load and no stores. So if we assume that move(s) requiring a swap are found "somewhere in the middle" of the loop, the 2 loads and 2 stores to carry out the swap are going to be overlapped with the later executions of the loop. In comparison, Rein's version always does the swap at the end, so if anything, those loads and stores are going to overlap with other work done by the calling function after this one returns. The caller probably does more loading and storing, so the swap in Rein's version is probably competing for those resources where the original version did not.

All in all, it seems like a case where the "straightforward" hand-written code turned out to be easier for the compiler to optimize [edit: or intrinsically better suited to the 32-bit x86 platform with its few registers and great OOE]. Even if it has some quirks, it also has great performance which the "cleaner" STL solution is unlikely to be able to match. The compiler might do a better job on Rein's version if it was targeting x86-64 and had more registers available, I don't know.
Hi Wylie,

Impressed as I am about your reasoning about this assembly code, I'm afraid it defeats the whole purpose of using the STL. So what if you save a few cycles here or there. The whole point was to reduce 13 lines of code that you have to a) first implement and check for correctness and b) maintain, to only 3 lines of code that much more clearly document what is going on.

If the SF code has some particular assumptions that are slightly more specific than the general assumptions that the STL implementers have to code for, there will always be a slight performance gain by coding an algorithm yourself. But the time saved on doing it the STL way can be spent on real chess things (such as reducing the number of nodes searched, rather than a few cycles per node searched).

Rein

To a chess programmer, it is not about lines of code. It is about number of cycles burned.

Onno Garms · Post by **Onno Garms** » Sat Jul 23, 2011 10:44 am

Rein Halbersma wrote:The sort_moves function from Stockfish header move.h
should be functionally equivalent to the following piece of STL code

I didn't read through the whole discussion but I'd like to note that in Onno, I started with std::sort (or std::stable_sort?) from the STL (fastest to implement) and later replaced it by my own implementation (faster to execute) because the profiler advised me to do so. std::sort performed so badly that it had considerable impact on the overall performance of Onno!

Don't get me wrong: I have written a lot of programs and in most cases using the STL is a good idea. Using std::sort is a good idea too in most cases. But definitely not for a chess engine.

Also, of course std::sort is well implemented in most (if not all) implementations of the STL. Years ago I tried to outperform std::sort with my own implementation of quicksort and failed badly.

But look: std::sort has to work for any size of array. What is the fastest depends on the array size. Have you peeked in the implementation of std::sort? On small arrays (i.e. smaller than something like 40) it does an O(n^2) insert sort! The n log n algorithms would not pay off. This means that std::sort has to test for the array size. When sorting moves, we don't have to test. We know already that the array is small because there aren't positions with hundreds of moves.

Also, insert sort performs well if the input is roughly sorted. It performs in linear time if the input is sorted.

Summa summarum: Using std::sort is good for a quick implementation. It is also good when you don't know anything about the input, because most likely you won't be able to outperform it. However, it is possible (and actually quite easy) to write something better than std::sort if you know your input.

mcostalba · Post by **mcostalba** » Sat Jul 23, 2011 11:13 am

Onno Garms wrote: Don't get me wrong: I have written a lot of programs and in most cases using the STL is a good idea. Using std::sort is a good idea too in most cases. But definitely not for a chess engine.

The biggest drawback of std::sort() is that is platform dependent: you get different results when you compile with MSVC and Intel or when you compile for Windows or Linux (due to different libraries implementation). This simply is a showstopper for an engine with multiplatform support.

Onno Garms · Post by **Onno Garms** » Sat Jul 23, 2011 11:22 am

mcostalba wrote: The biggest drawback of std::sort() is that is platform dependent:

That can be resolved (on the cost of runtime) by using std::stable_sort().

more STL code for Stockfish

Re: more STL code for Stockfish

Re: more STL code for Stockfish

Re: more STL code for Stockfish

Re: more STL code for Stockfish