mcostalba wrote:syzygy wrote: 64 is just a round number. It could probably be lowered to 16 or even 8. If there is no memory pressure, any cached data will remain cached in RAM anyway. If there is memory pressure, then data in tables that are no longer mmap()ed is probably released more quickly, which is good. .
I don't think you can map more files than available virtual address space, so maybe the limit becomes sensible with 32 bit architectures and many big DTZ files open.
Yes, but on a 32-bit architecture the 6-piece WDL files will also quickly run out of address space. (In theory you could also unmmap() WDL files, but that is a lot of hassle and you would need to add a lot of locking. As you see, things are already quite complicated, so it is nice that we can leave memory management to the OS.)
Coming back to source code

I am trying to understand decompress_pairs() but it seems another very complicated part of syzygy. Care to please walk through that code enlightening me? Thanks.
The first two lines deal with the special case that all table positions store the same value (this happens in a few cases because of the don't cares, but one could wonder whether it was a good idea to make this a special case; anyway, that's what I did). Reusing d->idxbits and d->min_len for this is rather ugly.
To understand the rest, you first have to know that the compressed data (to which d->data points) is divided into blocks of size 1 << d->blocksize.
Each block stores a number of symbols compressed using Huffman coding. Each symbol represents either a WDL or (remapped, see above) DTZ value, or a pair of other symbols (recursively). If you keep expanding the symbols in a block, you end up with up to 65536 WDL or DTZ values.
Each symbol represents up to 256 values and will correspond after Huffman coding to at least 1 bit. So a block of 32 bytes corresponds to at most 32 x 8 x 256 = 65536 values. This maximum is only reached for tables that consist mostly of draws or mostly of wins, but such tables are actually quite common.
In principle, the blocks in WDL tables are 64 bytes long (and will be aligned on cache lines). But for mostly-draw or mostly-win tables this can leave many 64-byte blocks only half-filled, so in such cases blocks are 32 bytes long.
The blocks of DTZ tables are up to 1024 bytes long. The generator picks the size that leads to the smallest table (too big blocks means too many are not full, too small blocks means too much overhead is lost on indexing etc.).
The "book" of symbols and Huffman codes is the same for all blocks in the table (a non-symmetric pawnless TB file will have one table for wtm and one for btm, a TB file with pawns will have tables per file a,b,c,d). Everything is in the PairsData struct to which "d" points.
So now the code. First we need to locate the right block that stores the value at index "idx".
Since each block stores a variable number of values (up to 65536), we have an array that stores the number of values for each block: d->sizetable[]. Each entry is 2 bytes and you have to add 1 to the value stored in it to get the real size (so a stored value of 65535 means 65536 symbols).
We could start at block 0 and keep skipping blocks one by one, each time subtracting d->sizetable[current_block] from idx, until we are in the block that contains the value we want. But that is too much work. Instead, we have an index structure d->indextable with entries that point into d->sizetable.
This is slightly complicated. We have about one such entry per 1 << d->idxbits index values. Each entry consists of a block number (of 4 bytes) and a (sub)index into that block (of 2 bytes). The block number and (sub)index into the block for entry k corresponds to the location of the following index (let's call it N(k)):
Code: Select all
N(k) = k * (1 << d->idxbits) + (1 << (d->idxbits - 1))
Given our index "idx" we calculate this value "k":
Code: Select all
uint32 mainidx = idx >> d->idxbits;
We also calculate how far our "idx" is from N(k):
Code: Select all
int litidx = (idx & ((1 << d->idxbits) - 1)) - (1 << (d->idxbits - 1));
This simply calculates litidx = idx - N(k).
We read out the number of the block that stores N(k):
Code: Select all
uint32 block = *(uint32 *)(d->indextable + 6 * mainidx);
We add the (sub)index into that block:
Code: Select all
litidx += *(ushort *)(d->indextable + 6 * mainidx + 4);
If idx happened to be equal to N(k), then litidx is now equal to the (sub)index into the block that corresponds to N(k), which is what we want.
If idx was smaller or bigger than N(k), then litidx will now be as much smaller or bigger than the (sub)index into the block that corresponds to N(k), so IF litidx points into the block, then we know precisely which value we need.
However, usually we need one of the preceding blocks and we have to roll back a bit:
Code: Select all
if (litidx < 0) {
do {
litidx += d->sizetable[--block] + 1;
} while (litidx < 0);
}
or we need one of the succeeding block and we have to go forward a bit:
Code: Select all
else {
while (litidx > d->sizetable[block])
litidx -= d->sizetable[block++] + 1;
}
(The reason for letting the entries point into the "middle" of a series of blocks instead of to the beginning is that this reduces the average number of iterations of these two loops by half compared to a single loop that always goes forward. I think I aim for about one "mainidx" per about 16 blocks, so max about 8 blocks backward or forward.)
Finally, we find the address of our block:
Code: Select all
uint32 *ptr = (uint32 *)(d->data + (block << d->blocksize));
The variable "litidx" is the (sub)index into this block of the value we need.
(I just noticed I have been copying and pasting from my own code instead of that of Stockfish, which also deals with endianness.)
Now comes Huffman decoding. We decode symbol by symbol until we are at the symbol we need. After each symbol we subtract from litidx the number of values the symbol represents, which is d->symlen[sym] + 1 (so at least 1 and at most 256). We use a
canonical Huffman code. The first part of the loop decodes the next symbol:
Code: Select all
int l = m;
while (code < base[l]) l++;
sym = offset[l];
if (!LittleEndian)
sym = ((sym & 0xff) << 8) | (sym >> 8);
sym += static_cast<int>((code - base[l]) >> (64 - l));
The variable l is the number of bits of the symbol. The canonical code is ordered such that longer symbols (in terms of the number of bits of their Huffman code) come first. So if base[l] <= code < base[l - 1] where "code" contains the next 64 bits of the compressed data, then the first l (most significant) bits of "code" encode the next symbol.
Then the length (in values) of the symbol is subtracted and, if necessary, the next l bits are fetched. This could run into the next block, but that is OK. We cannot run beyond the end of the mmap()ed file, because there is still a 16-byte checksum at the end.
Code: Select all
if (litidx < (int)symlen[sym] + 1) break;
litidx -= (int)symlen[sym] + 1;
code <<= l;
bitcnt += l;
if (bitcnt >= 32) {
bitcnt -= 32;
uint32 tmp = *ptr++;
if (LittleEndian)
tmp = BSWAP32(tmp);
code |= ((uint64)tmp) << bitcnt;
}
When we have the right symbol, we expand it into two symbols, choose the left or right symbol of the pair as appropriate (the one that contains the value we are looking for) and keep expanding until we are at a symbol of length 1 (symlen[sym] +1 == 1), which is the value we need:
Code: Select all
ubyte *sympat = d->sympat;
while (symlen[sym] != 0) {
ubyte* w = sympat + (3 * sym);
int s1 = ((w[1] & 0xf) << 8) | w[0];
if (litidx < (int)symlen[s1] + 1)
sym = s1;
else {
litidx -= (int)symlen[s1] + 1;
sym = (w[2] << 4) | (w[1] >> 4);
}
}
Well, it's not yet the value we need. There is one last mapping step:
Each sympat[] entry has 3 bytes or rather 2 x 12 bits. The first 12 bits is the left-hand symbol of the pair, the second 12 bits is the right-hand symbol. If the symbol has length 1, then it corresponds to a WDL or DTZ value which is held in the first byte of sympat.