Causes for inconsistent benchmark signatures

Evert · Post by **Evert** » Thu Mar 28, 2013 11:13 am

hgm wrote:Well, if you really want to know, I guess you should look for the lowes search depth where you see a difference, and then do a search that reports the 'split node counts' for every move separately at 1, 2, 3... ply, and zoom in on the difference using that, to see which node makes the difference.

Yup, that's what I ended up doing (more or less): identify the search depth where the problem first appeared (which turned out to be 2) and then identify the exact position (the benchmark searches a couple) that causes the deviation, then print out the movelist that is searched (which isn't hard for a depth of 2).

This revealed the presence of a spurious en-passant capture in the movelist in 32-bit Linux that wasn't there in 64 bits or on OS X, which was generated because of a spurious en-passant target square being set from loading the FEN position (I don't mean an en-passant square being set while the capture is illegal, I mean a random square on the board was marked as "en passant" square). This turned out to be a harmless square in 64 bit mode, but generated a spurious pawn capture in 32 bit mode.

Tracking where the en-passant square came from, it turned out that the FEN position was incorrect and was missing the en-passant field (or the castling field, it's hard to tell the difference if one of them is missing and the other one is -).

So I corrected the FEN string and made my FEN parser more robust when dealing with incorrect FENs.

That corrected the problem and I now get consistent node-counts across different platforms, compilers and compiler options.

Phew.

hgm · Post by **hgm** » Thu Mar 28, 2013 1:03 pm

Life is so much easier at 2 ply!

In the new Fairy-Max derivative I am making (introducing move sorting in a real move list) I fixed a problem this morning that made random pieces disappear from the board, in a rather irreproducible way. Saving and restoring the piece on the to-square of e.p. captures seems to have made this problem go away. I guess that is what you get when you transfer the info whether a move is an e.p. capture through the move encoding in the hash table, in stead of basing it on the move generator's knowledge that it was a Pawn that was moving, and it went to the e.p. square.

Rein Halbersma · Post by **Rein Halbersma** » Thu Mar 28, 2013 3:24 pm

Evert wrote: What are some common-causes for inconsistencies like these? I've eliminated a random component to root move ordering (so Jazz doesn't always play the same moves early on; disabled for this) and calling libc's qsort() function (which is an unstable sort and will produce slightly different ordering on different platforms because of differences in the implementation).

I would try using std::stable_sort if you can use C++/STL, or write your own insertion sort to rule out algorithm related inconsistencies. E.g. in Stockfish they do

Code: Select all

// Our insertion sort, guaranteed to be stable, as is needed
  void insertion_sort(MoveStack* begin, MoveStack* end)
  {
    MoveStack tmp, *p, *q;

    for (p = begin + 1; p < end; ++p)
    {
        tmp = *p;
        for (q = p; q != begin && *(q-1) < tmp; --q)
            *q = *(q-1);
        *q = tmp;
    }
  }

bob · Post by **bob** » Thu Mar 28, 2013 10:23 pm

Evert wrote:I added a "benchmark" command to Jazz (searching a given set of positions to a given depth and reporting total nodes as well as time and NPS), which will hopefully be very useful. At the moment it's a bit of a pain, however.

The problem is I get different total node-counts in different environments.

OS X, 32 and 64 bit (GCC and Clang):
signature 3356838

Linux 64 bit
signature 3356838

Linux 32 bit
signature 3358082

Windows 64 bit (cross compiled and running under Wine)
signature 3340172

Windows 32 bit (cross compiled and running under Wine)
signature 3357543

What are some common-causes for inconsistencies like these? I've eliminated a random component to root move ordering (so Jazz doesn't always play the same moves early on; disabled for this) and calling libc's qsort() function (which is an unstable sort and will produce slightly different ordering on different platforms because of differences in the implementation).

That is almost certainly a bug that needs fixing. NO reason for the node counts to vary, assuming NOTHING changes but the 32/64 bit compiler differences. IE same hash size, etc... Anything else indicates a bug.

Evert · Post by **Evert** » Fri Mar 29, 2013 1:05 pm

Of course the fun started up again when I re-ran the benchmark test one ply deeper than before and I again ended up with slightly different node counts.

This time I knew the difference occurred at depth=9, because all was fine at depth=8. Fortunately I wrote some rather extensive logging functions a long time ago that print out the search tree along with search bounds, check state, position being searched, pruning decisions, almost everything, in short. So all I had to do was switch the logging on (through a pre-processor symbol), tweak it a bit (so it would only kick in after the 8-ply search) and run the benchmark on both systems. Diff (actually vimdiff) did the rest.

Turns out I had a typo in the code that verifies mate-scores in the quiescence search, which was accessing an out-of-bounds array index. Maybe that code is a bit too clever to keep anyway, but I fixed the bug rather than delete the code. All seems well again now.

Sadly this does mean I missed my chance to release a new version of Jazz at revision 720, which would have been neat.

Michel · Post by **Michel** » Fri Mar 29, 2013 1:25 pm

which was accessing an out-of-bounds array index.

This wasn't caught by Valgrind? That's possible since Valgrind does not
catch all memory errors.

Was this a global array or an array on the stack? If it was a global
array then the error might have been caught by compiling with libefence.
A buffer overrun on the stack might have been caught by enabling the gcc stack protection options.

Evert · Post by **Evert** » Fri Mar 29, 2013 4:40 pm

Michel wrote: This wasn't caught by Valgrind? That's possible since Valgrind does not
catch all memory errors.

No. But it's possible that I didn't run the benchmark to the same depth, or, come to think of it, it may be that it wasn't actually accessing an out-of-bounds element but just the wrong (almost random) element. The calculation for the index was wrong and I fixed it without checking what incorrect element it was probing.

Was this a global array or an array on the stack?

Well, neither, I guess - it's a dynamically allocated array that resides in a struct that is passed to the function.

Evert · Post by **Evert** » Fri Mar 29, 2013 4:43 pm

Rein Halbersma wrote: I would try using std::stable_sort if you can use C++/STL, or write your own insertion sort to rule out algorithm related inconsistencies.

Yes, that's what I did (use my own sort; the code is C, so no STL). In fact I had the code in there already, but disabled. Possibly because it tested as slower on selected test positions back when I wrote it; tests as faster now though, so it's not only a bug fix but actually an improvement.

Causes for inconsistent benchmark signatures

Re: Causes for inconsistent benchmark signatures

Re: Causes for inconsistent benchmark signatures

Re: Causes for inconsistent benchmark signatures

Re: Causes for inconsistent benchmark signatures

Re: Causes for inconsistent benchmark signatures

Re: Causes for inconsistent benchmark signatures

Re: Causes for inconsistent benchmark signatures

Re: Causes for inconsistent benchmark signatures