Question to syzygy author

mcostalba · Post by **mcostalba** » Tue May 03, 2016 7:29 am

syzygy wrote:Line by line explanation of the "111" case.

Thanks for the detailed explanation of piece encoding: I have added your explanations in the code and fully commented the piece encoding.

Could you please do the same walk through for pawn encoding? I have seen there are some differences, but I am not able to figure them out.

syzygy · Post by **syzygy** » Tue May 03, 2016 10:36 pm

mcostalba wrote:Could you please do the same walk through for pawn encoding? I have seen there are some differences, but I am not able to figure them out.

One tricky aspect of pawn encoding is that TB files store separate tables for files a, b, c and d. Determining this file is easy if the position for one color has a single pawn, as that pawn will (possibly after mirroring) be in one of the files a, b, c and d, which will determine the table within the TB file that must be probed. Things are a bit more complicated with at least 2 pawns of the same color, as one pawn might be in a and the other in b. Or even worse, one might be in b and the other in h which after mirroring means that one is in a and the other in g. So basically the generator and the probing code have to agree on how to assign pawn combinations to the files a-d.

Things are complicated further by the fact that ordering of pawns and pieces (for the purpose of improving compression) is done separately for each file-table. So arrays like norm[] and factor[] may be different for each file, and the order in which piece positions are extracted from the current board representation may also be different.

This is why probe_wdl_table() and probe_dtz_table() first extracts the pawn positions for the "leading color" (the same for all files, int k in probe_wdl_table()) and then calls pawn_file() to determine the correct file. This function orders the pawns according to which comes first on a2-a6,b2-b6,c2-c6,d2-d6 after mirroring the pawn to the left-hand side of the board and returns the file (from a-d) of the "first" pawn:

Code: Select all

static const ubyte flap&#91;&#93; = &#123;
  0, 0, 0, 0, 0, 0, 0, 0,
  0, 6, 12, 18, 18, 12, 6, 0,
  1, 7, 13, 19, 19, 13, 7, 1,
  2, 8, 14, 20, 20, 14, 8, 2,
  3, 9, 15, 21, 21, 15, 9, 3,
  4, 10, 16, 22, 22, 16, 10, 4,
  5, 11, 17, 23, 23, 17, 11, 5,
  0, 0, 0, 0, 0, 0, 0, 0
&#125;;

...

static int pawn_file&#40;struct TBEntry_pawn *ptr, int *pos&#41;
&#123;
  int i;

  for &#40;i = 1; i < ptr->pawns&#91;0&#93;; i++)
    if &#40;flap&#91;pos&#91;0&#93;&#93; > flap&#91;pos&#91;i&#93;&#93;)
      Swap&#40;pos&#91;0&#93;, pos&#91;i&#93;);

  return file_to_file&#91;pos&#91;0&#93; & 0x07&#93;;
&#125;

When this is done, the probing code extracts the remaining pawn and piece positions and calculates the index for the position by calling encode_pawn() with the norm[] and factor[] arrays for that file.

The function encode_pawn() first mirrors the leading piece to one of the a-d files, if necessary:

Code: Select all

  if &#40;pos&#91;0&#93; & 0x04&#41;
    for &#40;i = 0; i < n; i++)
      pos&#91;i&#93; ^= 0x07;

(This cannot yet be done in pawn_file(), because that function is called before all piece positions are extracted from the board representation.)

Now the non-leading pawns of the leading color are sorted according to the ptwist[] array and an index is calculated for the pawns of the leading color:

Code: Select all

static const ubyte ptwist&#91;&#93; = &#123;
  0, 0, 0, 0, 0, 0, 0, 0,
  47, 35, 23, 11, 10, 22, 34, 46,
  45, 33, 21, 9, 8, 20, 32, 44,
  43, 31, 19, 7, 6, 18, 30, 42,
  41, 29, 17, 5, 4, 16, 28, 40,
  39, 27, 15, 3, 2, 14, 26, 38,
  37, 25, 13, 1, 0, 12, 24, 36,
  0, 0, 0, 0, 0, 0, 0, 0
&#125;;

...

  for &#40;i = 1; i < ptr->pawns&#91;0&#93;; i++)
    for &#40;j = i + 1; j < ptr->pawns&#91;0&#93;; j++)
      if &#40;ptwist&#91;pos&#91;i&#93;&#93; < ptwist&#91;pos&#91;j&#93;&#93;)
        Swap&#40;pos&#91;i&#93;, pos&#91;j&#93;);

  t = ptr->pawns&#91;0&#93; - 1;
  idx = pawnidx&#91;t&#93;&#91;flap&#91;pos&#91;0&#93;&#93;&#93;;
  for &#40;i = t; i > 0; i--)
    idx += binomial&#91;t - i&#93;&#91;ptwist&#91;pos&#91;i&#93;&#93;&#93;;
  idx *= factor&#91;0&#93;;

This is basically the same binomial formula for placing a number of like pieces as used in encode_piece(), but with a twist.

Next, an index is calculated for the pawns of the non-leading color (if there are such pawns). This is the same as placing a number of like pieces, except that there can be no pawns in a1-h1, so at one place you can see a "- 8".

Finally, an index is calculated for each remaining piece or group of remaining like pieces. No "- 8" here.

The arrays norm[] and factor[] do the same as in encode_piece().

syzygy · Post by **syzygy** » Tue May 03, 2016 10:45 pm

I just noticed the memory barrier in probe_wdl_table(). It is necessary for double-checked locking.

In C++11 this can be implemented much more cleanly:

Code: Select all

struct TBEntry &#123;
...
  std&#58;&#58;atomic uchar ready;
...
&#125;;

...

  if (!ptr->ready.load&#40;std&#58;&#58;memory_order_acquire&#41;)
  &#123;
      TB_mutex.lock&#40;);
      if (!ptr->ready.load&#40;std&#58;&#58;memory_order_relaxed&#41;)
      &#123;
          char str&#91;16&#93;;
          prt_str&#40;pos, str, ptr->key != key&#41;;
          if (!init_table_wdl&#40;ptr, str&#41;)
          &#123;
              ptr2&#91;i&#93;.key = 0ULL;
              success = 0;
              TB_mutex.unlock&#40;);
              return 0;
          &#125;
          ptr->ready.store&#40;1, std&#58;&#58;memory_order_release&#41;;
      &#125;
      TB_mutex.unlock&#40;);
  &#125;

(And of course you might want to make the string manipulations look like C++

))

mcostalba · Post by **mcostalba** » Thu May 05, 2016 9:31 am

syzygy wrote:I just noticed the memory barrier in probe_wdl_table(). It is necessary for double-checked locking.

Thanks for your info on pawns and for this note, although there is no need of memory barrier inside a protected (locked section). The lock acts already as a memory barrier by itself.

This is the current implementation in the version of syzygy I am working on to port to SF coding style:

https://github.com/official-stockfish/S ... .cpp#L1225

syzygy · Post by **syzygy** » Thu May 05, 2016 1:40 pm

mcostalba wrote:
syzygy wrote:I just noticed the memory barrier in probe_wdl_table(). It is necessary for double-checked locking.
Thanks for your info on pawns and for this note, although there is no need of memory barrier inside a protected (locked section). The lock acts already as a memory barrier by itself.

There certainly is a need for it (or something equivalent, like synchronising on the ready flag). Without it, double-checked locking has undefined behavior and can (and will at some point) be miscompiled.

This is the current implementation in the version of syzygy I am working on to port to SF coding style:

Code: Select all

    // Init table at first access attempt
    if (!entry->ready&#41; &#123;
        std&#58;&#58;unique_lock<Mutex> lk&#40;TB_mutex&#41;;
        if (!entry->ready&#41; &#123;
            std&#58;&#58;string fname = file_name&#40;pos, entry->key != key&#41; + ".rtbw";
            if (!entry->init&#40;fname&#41;) &#123;
                // Was ptr2->key = 0ULL;  Just leave !ptr->ready condition
                *success = 0;
                return WDLDraw;
            &#125;
            entry->ready = 1;
        &#125;
    &#125;

Reordering of this code by the compiler or the processor may now cause "entry->ready = 1" to be executed before the call to entry->init is completely finished. (Such reordering by the compiler would not be surprising if the compiler decides to inline the function.)

The solution is easy: insert a memory barrier or, better, use C++11 atomics:

Code: Select all

    // Init table at first access attempt
    if (!entry->ready.load&#40;std&#58;&#58;memory_order_acquire&#41; &#123;
        std&#58;&#58;unique_lock<Mutex> lk&#40;TB_mutex&#41;;
        if (!entry->ready.load&#40;std&#58;&#58;memory_order_relaxed&#41; &#123;
            std&#58;&#58;string fname = file_name&#40;pos, entry->key != key&#41; + ".rtbw";
            if (!entry->init&#40;fname&#41;) &#123;
                // Was ptr2->key = 0ULL;  Just leave !ptr->ready condition
                *success = 0;
                return WDLDraw;
            &#125;
            entry->ready.store&#40;1, std&#58;&#58;memory_order_release&#41;;
        &#125;
    &#125;

mcostalba · Post by **mcostalba** » Thu May 05, 2016 5:46 pm

syzygy wrote:

Code: Select all

    // Init table at first access attempt
    if (!entry->ready&#41; &#123;
        std&#58;&#58;unique_lock<Mutex> lk&#40;TB_mutex&#41;;
        if (!entry->ready&#41; &#123;
            std&#58;&#58;string fname = file_name&#40;pos, entry->key != key&#41; + ".rtbw";
            if (!entry->init&#40;fname&#41;) &#123;
                // Was ptr2->key = 0ULL;  Just leave !ptr->ready condition
                *success = 0;
                return WDLDraw;
            &#125;
            entry->ready = 1;
        &#125;
    &#125;

Reordering of this code by the compiler or the processor may now cause "entry->ready = 1" to be executed before the call to entry->init is completely finished.

This would be totally harmless because even if another thread reads entry->ready = 1 while init is still pending, it will block upon taking the lock. Once lock is released, the following 'if (!entry->ready)' will return 'false' because the compiler _never_ reorders around a lock, this is ensured by lock semantic:

http://stackoverflow.com/questions/1094 ... inter-proc

Michel · Post by **Michel** » Thu May 05, 2016 5:56 pm

It seems to me that the issue is that if entry->ready = 1 is set prematurely another thread will read uninitialized data. The first branch in (!entry->ready)' (double checked locking) will return 'false'.

syzygy · Post by **syzygy** » Thu May 05, 2016 9:04 pm

mcostalba wrote:
syzygy wrote:
Code: Select all
    // Init table at first access attempt
    if (!entry->ready&#41; &#123;
        std&#58;&#58;unique_lock<Mutex> lk&#40;TB_mutex&#41;;
        if (!entry->ready&#41; &#123;
            std&#58;&#58;string fname = file_name&#40;pos, entry->key != key&#41; + ".rtbw";
            if (!entry->init&#40;fname&#41;) &#123;
                // Was ptr2->key = 0ULL;  Just leave !ptr->ready condition
                *success = 0;
                return WDLDraw;
            &#125;
            entry->ready = 1;
        &#125;
    &#125;
Reordering of this code by the compiler or the processor may now cause "entry->ready = 1" to be executed before the call to entry->init is completely finished.
This would be totally harmless because even if another thread reads entry->ready = 1 while init is still pending, it will block upon taking the lock.

If another thread reads entry->ready == 1, then !entry->ready will evaluate to false and that other thread will proceed to probe the table even though it has not been completely initialised. (So Michel is right.)

It is therefore necessary to "synchronise" the first load of entry->ready with the store to entry->ready and that is exactly what the acquire and release stuff does: it ensures that everything logically coming before the release will have been executed before the acquire takes place.

That there is no reordering through the lock does not help. As I wrote, the problem is that "entry->ready = 1" may be executed before "entry->init()" has completely finished. There is no locking between the call to entry->init() and setting of entry->ready. So this is why that memory_barrier() was added between the call and the store.

This is a very well-known problem of double-checked locking as Google will show you. And the solution in C++11 is simple.

mcostalba · Post by **mcostalba** » Fri May 06, 2016 8:55 am

syzygy wrote: This is a very well-known problem of double-checked locking as Google will show you. And the solution in C++11 is simple.

Thanks, now I understand.

Unfortunately I cannot just define 'ready' as std::atomic_bool because this disables the WDLEntry default copy c'tor so there is some issue adding 'ready' to WDLEntry struct, OTH defining a custom copy c'tor is very dumb because you have to manually assign all the fields....I will need to find some workaround.

Onegin · Post by **Onegin** » Fri May 06, 2016 9:01 am

I am a member of TalkChess.com and I accidentally stumbled into your discussion. I am an average computer user but can anybody answer my simple question about Syzygy tablebases. Why wdl TBs do NOT return DTM metric but simply a number -2, -1, 0, 1 or 2. Why not use both DTM and DTZ metrics instead of using engine search?

Question to syzygy author

Re: Question to syzygy author

Re: Question to syzygy author

Re: Question to syzygy author

Re: Question to syzygy author

Re: Question to syzygy author

Re: Question to syzygy author

Re: Question to syzygy author

Re: Question to syzygy author

Re: Question to syzygy author

Re: Question to syzygy author