Question to syzygy author

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

User avatar
hgm
Posts: 27794
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Question to syzygy author

Post by hgm »

syzygy wrote:..., but does not help so much in finding an optimal swindling strategy (whatever that might be... if the opponent uses the same tables nothing will help anyway).
Indeed, optimizing swindling chances requires opponent modelling. But basically all of Chess is swindling, because the initial position is (believed to be) a draw. So the heuristic evaluation is pretty good at it.

Usually the chances for a swindle are better in complex positions than in simple ones. This argues for taking as much of the pain as you can in the current phase, rather than deferring it to a later phase. If the opponent uses the same tables as you, nothing will matter. But if he doesn't, it is much more likely that he will lack the 6-men tables than that he will lack the 4-men tables, etc.
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: Question to syzygy author

Post by mcostalba »

syzygy wrote: You're welcome :)
Thanks again for your clear and detailed answers!

In probe_dtz_table I have found this code (reformatted by me, but same functionality than original):

Code: Select all

    if (DTZTable.front().key != key && DTZTable.front().key2 != key) {
        
        // Enforce "Most Recently Used" (MRU) order for DTZ_list
        for (auto it = DTZTable.begin(); it != DTZTable.end(); ++it)
            if (it->key == key) {
                // Move to front without deleting the element
                DTZTable.splice(DTZTable.begin(), DTZTable, it);
                break;
            }

        // If still not found, add a new one
        if (DTZTable.front().key != key) {

            WDLEntry* wdlEntry = WDLHash[key];
            if (!wdlEntry) {
                *success = 0;
                return 0;
            }

            .....   add new entry ....
In particular we check if current first DTZ entry in the list matches current position, and we test again both keys in this case, but if comparison fails, we continue searching across the DTZ list but in this case testing only first key. There is a reason for this? I have measured about 2% of cases where first key is wrong, but second key matches the position's key.

Also a side question: what is the rationale to limit the DTZ list to 64 entries?
syzygy
Posts: 5563
Joined: Tue Feb 28, 2012 11:56 pm

Re: Question to syzygy author

Post by syzygy »

mcostalba wrote:In particular we check if current first DTZ entry in the list matches current position, and we test again both keys in this case, but if comparison fails, we continue searching across the DTZ list but in this case testing only first key. There is a reason for this? I have measured about 2% of cases where first key is wrong, but second key matches the position's key.
I can't think of a reason, so this seems to be a bug. Not critical, but it should be fixed. Thanks for finding it :)
Also a side question: what is the rationale to limit the DTZ list to 64 entries?
If the engine just plays a single game, the limit will not be reached.
If it plays many games, or if the engine is used to analyse many endgames, many DTZ files could be opened and mmap()ed to memory that are not useful for the current position. Because opening them is not time critical, it seemed sensible to place a limit on the number of DTZ files open at the same time.

64 is just a round number. It could probably be lowered to 16 or even 8. If there is no memory pressure, any cached data will remain cached in RAM anyway. If there is memory pressure, then data in tables that are no longer mmap()ed is probably released more quickly, which is good. On the other hand, I don't expect large parts of the DTZ files to be cached in RAM. Only relatively few positions are probed, so the OS should only cache a few disk pages. So difficult to say if changing it will have any good or bad impact at all...
User avatar
yurikvelo
Posts: 710
Joined: Sat Dec 06, 2014 1:53 pm

Re: Question to syzygy author

Post by yurikvelo »

Onegin wrote:No, I'm not a developer, I'm a user. ... I couldn't find an explanation for such a behavior and started thinking about EGTBs
ask your questions in topic "Syzygy vs Nalimov" etc and I will explain
User avatar
velmarin
Posts: 1600
Joined: Mon Feb 21, 2011 9:48 am

Re: Question to syzygy author

Post by velmarin »

yurikvelo wrote:
Onegin wrote:No, I'm not a developer, I'm a user. ... I couldn't find an explanation for such a behavior and started thinking about EGTBs
ask your questions in topic "Syzygy vs Nalimov" etc and I will explain
New moderator. :evil: :evil:
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: Question to syzygy author

Post by mcostalba »

syzygy wrote: 64 is just a round number. It could probably be lowered to 16 or even 8. If there is no memory pressure, any cached data will remain cached in RAM anyway. If there is memory pressure, then data in tables that are no longer mmap()ed is probably released more quickly, which is good. .
I don't think you can map more files than available virtual address space, so maybe the limit becomes sensible with 32 bit architectures and many big DTZ files open.

Coming back to source code :-) I am trying to understand decompress_pairs() but it seems another very complicated part of syzygy. Care to please walk through that code enlightening me? Thanks.
syzygy
Posts: 5563
Joined: Tue Feb 28, 2012 11:56 pm

Re: Question to syzygy author

Post by syzygy »

mcostalba wrote:
syzygy wrote: 64 is just a round number. It could probably be lowered to 16 or even 8. If there is no memory pressure, any cached data will remain cached in RAM anyway. If there is memory pressure, then data in tables that are no longer mmap()ed is probably released more quickly, which is good. .
I don't think you can map more files than available virtual address space, so maybe the limit becomes sensible with 32 bit architectures and many big DTZ files open.
Yes, but on a 32-bit architecture the 6-piece WDL files will also quickly run out of address space. (In theory you could also unmmap() WDL files, but that is a lot of hassle and you would need to add a lot of locking. As you see, things are already quite complicated, so it is nice that we can leave memory management to the OS.)
Coming back to source code :-) I am trying to understand decompress_pairs() but it seems another very complicated part of syzygy. Care to please walk through that code enlightening me? Thanks.
The first two lines deal with the special case that all table positions store the same value (this happens in a few cases because of the don't cares, but one could wonder whether it was a good idea to make this a special case; anyway, that's what I did). Reusing d->idxbits and d->min_len for this is rather ugly.

To understand the rest, you first have to know that the compressed data (to which d->data points) is divided into blocks of size 1 << d->blocksize.

Each block stores a number of symbols compressed using Huffman coding. Each symbol represents either a WDL or (remapped, see above) DTZ value, or a pair of other symbols (recursively). If you keep expanding the symbols in a block, you end up with up to 65536 WDL or DTZ values.

Each symbol represents up to 256 values and will correspond after Huffman coding to at least 1 bit. So a block of 32 bytes corresponds to at most 32 x 8 x 256 = 65536 values. This maximum is only reached for tables that consist mostly of draws or mostly of wins, but such tables are actually quite common.

In principle, the blocks in WDL tables are 64 bytes long (and will be aligned on cache lines). But for mostly-draw or mostly-win tables this can leave many 64-byte blocks only half-filled, so in such cases blocks are 32 bytes long.

The blocks of DTZ tables are up to 1024 bytes long. The generator picks the size that leads to the smallest table (too big blocks means too many are not full, too small blocks means too much overhead is lost on indexing etc.).

The "book" of symbols and Huffman codes is the same for all blocks in the table (a non-symmetric pawnless TB file will have one table for wtm and one for btm, a TB file with pawns will have tables per file a,b,c,d). Everything is in the PairsData struct to which "d" points.

So now the code. First we need to locate the right block that stores the value at index "idx".

Since each block stores a variable number of values (up to 65536), we have an array that stores the number of values for each block: d->sizetable[]. Each entry is 2 bytes and you have to add 1 to the value stored in it to get the real size (so a stored value of 65535 means 65536 symbols).

We could start at block 0 and keep skipping blocks one by one, each time subtracting d->sizetable[current_block] from idx, until we are in the block that contains the value we want. But that is too much work. Instead, we have an index structure d->indextable with entries that point into d->sizetable.

This is slightly complicated. We have about one such entry per 1 << d->idxbits index values. Each entry consists of a block number (of 4 bytes) and a (sub)index into that block (of 2 bytes). The block number and (sub)index into the block for entry k corresponds to the location of the following index (let's call it N(k)):

Code: Select all

  N&#40;k&#41; = k * &#40;1 << d->idxbits&#41; + &#40;1 << &#40;d->idxbits - 1&#41;)
Given our index "idx" we calculate this value "k":

Code: Select all

  uint32 mainidx = idx >> d->idxbits;
We also calculate how far our "idx" is from N(k):

Code: Select all

  int litidx = &#40;idx & (&#40;1 << d->idxbits&#41; - 1&#41;) - &#40;1 << &#40;d->idxbits - 1&#41;);
This simply calculates litidx = idx - N(k).

We read out the number of the block that stores N(k):

Code: Select all

  uint32 block = *&#40;uint32 *)&#40;d->indextable + 6 * mainidx&#41;;
We add the (sub)index into that block:

Code: Select all

  litidx += *&#40;ushort *)&#40;d->indextable + 6 * mainidx + 4&#41;;
If idx happened to be equal to N(k), then litidx is now equal to the (sub)index into the block that corresponds to N(k), which is what we want.

If idx was smaller or bigger than N(k), then litidx will now be as much smaller or bigger than the (sub)index into the block that corresponds to N(k), so IF litidx points into the block, then we know precisely which value we need.

However, usually we need one of the preceding blocks and we have to roll back a bit:

Code: Select all

  if &#40;litidx < 0&#41; &#123;
    do &#123;
      litidx += d->sizetable&#91;--block&#93; + 1;
    &#125; while &#40;litidx < 0&#41;;
  &#125;
or we need one of the succeeding block and we have to go forward a bit:

Code: Select all

  else &#123;
    while &#40;litidx > d->sizetable&#91;block&#93;)
      litidx -= d->sizetable&#91;block++&#93; + 1;
  &#125;
(The reason for letting the entries point into the "middle" of a series of blocks instead of to the beginning is that this reduces the average number of iterations of these two loops by half compared to a single loop that always goes forward. I think I aim for about one "mainidx" per about 16 blocks, so max about 8 blocks backward or forward.)

Finally, we find the address of our block:

Code: Select all

  uint32 *ptr = &#40;uint32 *)&#40;d->data + &#40;block << d->blocksize&#41;);
The variable "litidx" is the (sub)index into this block of the value we need.

(I just noticed I have been copying and pasting from my own code instead of that of Stockfish, which also deals with endianness.)

Now comes Huffman decoding. We decode symbol by symbol until we are at the symbol we need. After each symbol we subtract from litidx the number of values the symbol represents, which is d->symlen[sym] + 1 (so at least 1 and at most 256). We use a canonical Huffman code. The first part of the loop decodes the next symbol:

Code: Select all

    int l = m;
    while &#40;code < base&#91;l&#93;) l++;
    sym = offset&#91;l&#93;;
    if (!LittleEndian&#41;
      sym = (&#40;sym & 0xff&#41; << 8&#41; | &#40;sym >> 8&#41;;
    sym += static_cast<int>(&#40;code - base&#91;l&#93;) >> &#40;64 - l&#41;);
The variable l is the number of bits of the symbol. The canonical code is ordered such that longer symbols (in terms of the number of bits of their Huffman code) come first. So if base[l] <= code < base[l - 1] where "code" contains the next 64 bits of the compressed data, then the first l (most significant) bits of "code" encode the next symbol.

Then the length (in values) of the symbol is subtracted and, if necessary, the next l bits are fetched. This could run into the next block, but that is OK. We cannot run beyond the end of the mmap()ed file, because there is still a 16-byte checksum at the end.

Code: Select all

    if &#40;litidx < &#40;int&#41;symlen&#91;sym&#93; + 1&#41; break;
    litidx -= &#40;int&#41;symlen&#91;sym&#93; + 1;
    code <<= l;
    bitcnt += l;
    if &#40;bitcnt >= 32&#41; &#123;
      bitcnt -= 32;
      uint32 tmp = *ptr++;
      if &#40;LittleEndian&#41;
        tmp = BSWAP32&#40;tmp&#41;;
      code |= (&#40;uint64&#41;tmp&#41; << bitcnt;
     &#125;
When we have the right symbol, we expand it into two symbols, choose the left or right symbol of the pair as appropriate (the one that contains the value we are looking for) and keep expanding until we are at a symbol of length 1 (symlen[sym] +1 == 1), which is the value we need:

Code: Select all

  ubyte *sympat = d->sympat;
  while &#40;symlen&#91;sym&#93; != 0&#41; &#123;
    ubyte* w = sympat + &#40;3 * sym&#41;;
    int s1 = (&#40;w&#91;1&#93; & 0xf&#41; << 8&#41; | w&#91;0&#93;;
    if &#40;litidx < &#40;int&#41;symlen&#91;s1&#93; + 1&#41;
      sym = s1;
    else &#123;
      litidx -= &#40;int&#41;symlen&#91;s1&#93; + 1;
      sym = &#40;w&#91;2&#93; << 4&#41; | &#40;w&#91;1&#93; >> 4&#41;;
    &#125;
  &#125;
Well, it's not yet the value we need. There is one last mapping step:

Code: Select all

 return sympat&#91;3 * sym&#93;;
Each sympat[] entry has 3 bytes or rather 2 x 12 bits. The first 12 bits is the left-hand symbol of the pair, the second 12 bits is the right-hand symbol. If the symbol has length 1, then it corresponds to a WDL or DTZ value which is held in the first byte of sympat.
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: Question to syzygy author

Post by mcostalba »

syzygy wrote: So now the code.
That was hard :-)

This one is the most complex so far (and I think in general because I have more or less looked at all the code base now, apart the high level probing).

I have fully documented and also reformatted a bit (well, not just a bit :-) ) all this part . I hope it is more clear now, at least it is for me.

Thanks for your support!

Full code is here: https://github.com/official-stockfish/S ... bprobe.cpp
User avatar
velmarin
Posts: 1600
Joined: Mon Feb 21, 2011 9:48 am

Re: Question to syzygy author

Post by velmarin »

mcostalba wrote:
That was hard :-)

This one is the most complex so far (and I think in general because I have more or less looked at all the code base now, apart the high level probing).

I have fully documented and also reformatted a bit (well, not just a bit :-) ) all this part . I hope it is more clear now, at least it is for me.

Thanks for your support!

Full code is here: https://github.com/official-stockfish/S ... bprobe.cpp

I really like your work, very interesting.
I have compiled and I have been an exception.
Perhaps mine is a problem, but I don't think so.
[pgn]
[Event "DESKTOP-QFSOBQ2, Blitz 1m"]
[Site "DESKTOP-QFSOBQ2"]
[Date "2016.05.14"]
[Round "2"]
[White "Stockfish MC 64"]
[Black "Gull.3.0_JVx32"]
[Result "*"]
[ECO "A37"]
[Annotator "0.00;-0.05"]
[PlyCount "118"]
[TimeControl "60"]

{AMD Phenom(tm) II X4 B50 Processor 3215 MHz W=16.5 plies; 795kN/s; 39.627
TBAs B=12.5 plies; 726kN/s; 64.597 TBAs} 1. Nf3 g6 2. c4 Bg7 3. Nc3 c5 4. g3
Nc6 5. Bg2 e6 6. O-O Nge7 7. d3 O-O 8. Bd2 d6 {Ambos última jugada del libro}
9. Qc1 {0.00/17 3} d5 {-0.05/11 1 (Cf5)} 10. Bh6 {0.14/18 1} Bxh6 {-0.04/12 1}
11. Qxh6 {0.04/18 1} f6 {0.00/14 2} 12. Rad1 {0.18/18 2} Nf5 {-0.08/15 3 (Tf7)}
13. Qh3 {0.07/17 1} h5 {-0.02/14 4 (d4)} 14. g4 {0.00/17 2} hxg4 {-0.18/14 2}
15. Qxg4 {0.03/18 3} Kg7 {-0.34/14 4} 16. e4 {0.10/17 3} d4 {-0.29/13 2} 17.
Na4 {0.12/17 1 (Ce2)} Nh6 {-0.59/12 1} 18. Qg3 {0.09/16 0} e5 {-0.62/13 3 (b6)}
19. Nxc5 {-0.35/17 2 (h4)} Qa5 {-0.54/13 0} 20. b4 {-0.34/19 4 (Cb3)} Qxb4 {-0.
63/12 1} 21. Nb3 {-0.43/18 1} Qe7 {-0.71/13 2 (Th8)} 22. h4 {-0.36/17 4 (h3)}
Bg4 {-0.63/12 2 (Rh7)} 23. Rd2 {-0.31/16 1} Kh7 {-0.64/13 1 (Ah5)} 24. Nh2 {-0.
31/15 1 (Tb1)} Be6 {-0.57/12 1 (Ah5)} 25. f4 {-0.13/14 1} exf4 {-0.51/12 1
(Tad8)} 26. Rxf4 {0.00/17 2} Rad8 {-0.58/12 1 (Cf7)} 27. Bh3 {0.00/14 1 (Cf3)}
Bxh3 {-0.59/10 1} 28. Qxh3 {-0.14/11 0} a5 {-0.65/11 3 (Rh8)} 29. Nf3 {0.00/14
2 (Cc1)} Rg8 {-0.97/9 0 (a4)} 30. Rg2 {0.00/15 4} a4 {-0.54/9 1 (Rh8)} 31. Nc1
{0.20/16 1} Kh8 {-0.42/12 1 (Td6)} 32. Ne2 {0.41/13 0} b6 {-0.30/12 2 (Cf7)}
33. Nd2 {0.53/16 2 (Rh1)} f5 {0.22/13 4} 34. Rf1 {0.85/15 2} Ne5 {0.22/12 0
(Dd6)} 35. Nf4 {1.16/13 0} Nhg4 {0.22/12 0 (Dd6)} 36. Nf3 {0.63/14 1 (h5)}
Nxf3+ {0.36/12 2} 37. Rxf3 {0.91/13 0} fxe4 {0.36/11 0 (Dh7)} 38. Qxg4 {1.10/
15 1} exf3 {0.28/12 0} 39. Nxg6+ {0.52/13 0} Rxg6 {0.28/12 0} 40. Qh5+ {0.39/
19 1} Qh7 {0.28/11 0} 41. Qxg6 {0.39/16 0} Qxg6 {0.43/12 0 (Tf8)} 42. Rxg6 {0.
54/16 0} Re8 {0.43/12 0} 43. Rxb6 {0.35/18 1} Re2 {0.27/13 0 (Te3)} 44. a3 {0.
67/18 0} Rg2+ {0.15/13 0 (Te3)} 45. Kf1 {0.10/18 1} Ra2 {0.15/12 0} 46. c5 {0.
51/18 1} Rxa3 {0.15/11 0} 47. c6 {0.37/19 1} Kg7 {0.18/13 1} 48. Rb7+ {0.31/16
0 (c7)} Kf6 {0.24/14 0} 49. c7 {0.13/17 0} Rc3 {0.24/13 0} 50. Kf2 {0.11/16 0}
Ke5 {0.24/12 0} 51. Kg3 {0.13/18 0} a3 {0.24/14 1} 52. Ra7 {0.14/15 0} Kd6 {0.
24/13 0 (f2)} 53. h5 {0.29/15 0} Rxc7 {0.24/11 0} 54. Rxa3 {0.22/13 0} Rc3 {0.
16/12 0} 55. Ra6+ {0.00/17 1 (Ta1)} Ke5 {0.14/10 0 (Re7)} 56. h6 {0.00/18 0
(Ta1)} Rxd3 {0.00/9 0} 57. h7 {0.00/19 0 (Ta1)} f2+ {0.00/10 0} 58. Kxf2 {0.00/
20 0 (Rg2)} Rh3 {0.00/16 0} 59. h8=Q+ {0.00/26 0 (Ta5+)} Rxh8 {0.00/27 0
Stockfish MC 64: no move Stockfish MC 64 caused an exception, game stopped.} *

[/pgn]
[D]8/7P/R7/4k3/3p4/7r/5K2/8 w - - 0 59

Code: Select all

1935&#58; Stockfish MC 64 - Gull.3.0_JVx32, DESKTOP-QFSOBQ2, Blitz 1m
8/7P/R7/4k3/3p4/7r/5K2/8 w - - 0 1

Analysis by Stockfish MC 64&#58;

59.h8D+ Txh8 
  =  &#40;0.00&#41;   Profundidad&#58; 7/3   00&#58;00&#58;00  0kN, tb=5
59.h8D+ Txh8 
  =  &#40;0.00&#41;   Profundidad&#58; 8/3   00&#58;00&#58;00  0kN, tb=6
59.h8D+ Txh8 
ect,ect
  =  &#40;0.00&#41;   Profundidad&#58; 72/3   00&#58;02&#58;08  119mN, tb=2854604

Next move, no move Stockfish MC 64 caused an exception,
syzygy
Posts: 5563
Joined: Tue Feb 28, 2012 11:56 pm

Re: Question to syzygy author

Post by syzygy »

That's looking very good!

If you like, feel free to add a copyright notice to the top of the file and replace the "without restrictions" license with the GPL. (The point of "without restrictions" is that you already have permission for that.)