A different tablebase encoding format

sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

A different tablebase encoding format

Post by sje »

Recently I dug out my tablebase generator program written back in 1994; it was my last C language coding effort, and it reminded me why I was wise to stop using C.

It also exposed a couple of less-than-optimal design decisions which I'm considering correcting in an all-new version.

The first of these decisions was the choice of indexing scheme. I picked what may have been the simplest one, but for only a modest increase in complexity, a significant space savings (13+%) can be had. Back then I did not know about Ken Thompson's scheme (indexing by king squares); had I known, I would have used it instead.

The second decision was to delay the inclusion of en passant captures, thinking that I'd just add them later. That particular "later" never came, and that's why I never got correct tablebase files for all the KxPKP classes.

The third bad decision was to force all score values to be mapped into a single byte. With this, it's impossible to handle many of the six man classes, since some six man mate distances run past what a byte can represent. Likewise, forcing the index into a four byte integer precluded generating any six man class; as the counts below show, the six man index ranges run well past 2^32.

Bad idea number four was to require that the intermediate generation file format be the same as the final result format. This led to even more space inefficiency.

The last poor design decision was to use flat files for output; a much better idea is to prefix the score data with metadata describing the indexing, the score ranges, and the bit lengths of the stored values.
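Something like the following would do; this is only a first sketch, with all names and field widths being placeholders:

Code:

#include <stdint.h>

/* First sketch of a self-describing file header; all names and
   field widths are placeholders, nothing final. */
typedef struct {
    uint32_t magic;         /* file format identifier/version         */
    uint8_t  manCount;      /* total men on the board, kings included */
    uint8_t  pawnCount;     /* pawn count; selects triangle vs. flank */
    uint8_t  indexScheme;   /* which indexing/reflection scheme       */
    uint8_t  bitsPerScore;  /* bit length of one stored score         */
    int32_t  scoreMin;      /* least score value actually present     */
    int32_t  scoreMax;      /* greatest score value actually present  */
    uint64_t entryCount;    /* number of score entries that follow    */
} TBHeader;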

So now I'm thinking about a re-write. Back in the Old Days I was mostly stuck with a Mac Plus with 4 MB RAM and a slow 20 MB drive; things have certainly changed since then.

--------

The king squares indexing uses a 64x64 look-up table, indexed by the two king squares, to select a segment of contiguous score data along with a reflection selector bit vector.

For an endgame without pawns, the primary king is mapped into the a1-d1-d4 triangle using from zero to three different reflections to select one of 564 segments.

For an endgame with at least one pawn, the primary king is mapped into the a1-d1-d8-a8 rectangle using at most one reflection to select one of 1,806 segments.

Within each segment, the score entry of interest is selected using the locations of the non-king men. For N non-pawn men, there are 64^N entries. If pawns are included, then things are a little different, as a pawn is restricted to one of 48 (not 64) squares; a pawn encoded on its first rank is considered to actually be on its fourth rank after a double advance with at least one legal en passant reply.
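
The probe side might look something like this sketch in C; table construction is omitted and every name is a placeholder:

Code:

#include <stdint.h>

/* Reflection selector bits (placeholders). */
enum { ReflX = 1, ReflY = 2, ReflDiag = 4 };

typedef struct {
    uint16_t segment;     /* 0..563 pawnless, 0..1805 with pawns */
    uint8_t  reflection;  /* combination of the Refl* bits       */
} KingPairEntry;

static KingPairEntry kingPairTable[64][64];   /* [primary][secondary] */

/* Apply the segment's canonical reflection to a square (a1 = 0). */
static int ReflectSquare(int sq, int refl)
{
    int file = sq & 7, rank = sq >> 3;
    if (refl & ReflX)    file = 7 - file;
    if (refl & ReflY)    rank = 7 - rank;
    if (refl & ReflDiag) { int t = file; file = rank; rank = t; }
    return (rank << 3) | file;
}

/* Score index: segment base plus one base-64 digit per non-king man. */
static uint64_t ScoreIndex(int primaryKing, int secondaryKing,
                           const int men[], int n)
{
    const KingPairEntry *e = &kingPairTable[primaryKing][secondaryKing];
    uint64_t digits = 0, power = 1;
    for (int i = 0; i < n; i++) {
        digits = digits * 64 + (uint64_t) ReflectSquare(men[i], e->reflection);
        power *= 64;
    }
    return (uint64_t) e->segment * power + digits;
}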

Code:

Triangle map (no pawns):
      1 * 60 =  60
      3 * 58 = 174
      6 * 55 = 330
         sum = 564

      564 segments
man count / score count
        3 / 36,096
        4 / 2,310,144
        5 / 147,849,216
        6 / 9,462,349,824


Flank map (has at least one pawn):
  Permitted:
     2 * 60 =  120
    12 * 58 =  696
    18 * 55 =  990
        sum = 1806

    1806 segments
man count / score count
        3 / 115,584
        4 / 7,397,376
        5 / 473,432,064
        6 / 30,299,652,096
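
And a few lines of throwaway C to check the sums above (64 placements per non-king man):

Code:

#include <stdio.h>
#include <stdint.h>

/* Throwaway check of the segment and score counts quoted above. */
int main(void)
{
    const uint64_t triangle = 1*60 + 3*58 + 6*55;    /*  564 */
    const uint64_t flank    = 2*60 + 12*58 + 18*55;  /* 1806 */

    for (int men = 3; men <= 6; men++) {
        uint64_t pow = 1;
        for (int i = 2; i < men; i++)
            pow *= 64;                               /* 64^(men - 2) */
        printf("%d men: triangle %llu  flank %llu\n", men,
               (unsigned long long)(triangle * pow),
               (unsigned long long)(flank * pow));
    }
    return 0;
}
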
syzygy
Posts: 5562
Joined: Tue Feb 28, 2012 11:56 pm

Re: A different tablebase encoding format

Post by syzygy »

sje wrote:The first of these decisions was the choice of indexing scheme. I picked what may have been the simplest one, but for only a modest increase in complexity, a significant space savings (13+%) can be had. Back then I did not know about Ken Thompson's scheme (indexing by king squares); had I known, I would have used it instead.
Even more savings are possible by not including positions with pawns on the first or eighth rank in the index range.

On the other hand, most of this overhead disappears with good compression. The change from 10x64 to 462 king positions leads to an improvement even with compression, as it gets rid of most duplicate positions (with the leading king on a1-b2-c3-d4 and the other king either above or below the diagonal).
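
A quick brute force enumeration (just a throwaway sketch) confirms the 462 figure:

Code:

#include <stdio.h>
#include <stdlib.h>

/* Count the canonical king pairs: the leading king is confined to the
   a1-d1-d4 triangle, and when it stands on the a1-d4 diagonal the other
   king is kept on or below the diagonal. Kings must not touch. */
static int TooClose(int a, int b)
{
    int df = abs((a & 7) - (b & 7)), dr = abs((a >> 3) - (b >> 3));
    return df <= 1 && dr <= 1;  /* same or adjacent square */
}

int main(void)
{
    int count = 0;
    for (int wk = 0; wk < 64; wk++) {
        int wf = wk & 7, wr = wk >> 3;
        if (wf > 3 || wr > wf) continue;                 /* a1-d1-d4 triangle */
        for (int bk = 0; bk < 64; bk++) {
            if (TooClose(wk, bk)) continue;
            if (wf == wr && (bk >> 3) > (bk & 7)) continue;  /* diagonal case */
            count++;
        }
    }
    printf("%d\n", count);  /* prints 462 */
    return 0;
}
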
The second decision was to delay the inclusion of en passant captures, thinking that I'd just add them later. That particular "later" never came, and that's why I never got correct tablebase files for all the KxPKP classes.
Imho not including en passant positions into the index range is a good (or at least not a bad) decision. But the generator should take ep captures into account.
For an endgame with at least one pawn, the primary king is mapped into the a1-d1-d8-a8 rectangle using at most one reflection to select one of 1,806 segments.
Focusing on the kings seems to make it difficult to generate tables pawn file by pawn file (or even pawn slice by pawn slice).
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Re: A different tablebase encoding format

Post by sje »

What compression gives in storage efficiency, it takes away in access time and complexity.

For en passant, there has to be some way of incorporating the information into the score selector index. Mapping the target pawn to its first rank is a simple way of doing this.
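
A sketch of the idea; the decoder applies the inverse mapping:

Code:

/* Sketch: encode a pawn that has just made a double advance (with at
   least one legal en passant reply) on its otherwise unused first
   rank; the decoder maps the square back to the fourth rank. */
int EncodePawnSquare(int sq, int isWhite, int epCapturable)
{
    if (!epCapturable)
        return sq;
    /* White: rank 4 (squares 24..31) -> rank 1 (0..7);
       Black: rank 5 (squares 32..39) -> rank 8 (56..63).
       Only the file is kept. */
    return isWhite ? (sq & 7) : (56 | (sq & 7));
}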

As I recall, Thompson used king pair indexing even with pawn endgames and these were constructed using partial tables with the leading pawn fixed to a given rank.
Angrim
Posts: 97
Joined: Mon Jun 25, 2012 10:16 pm
Location: Forks, WA
Full name: Ben Nye

Re: A different tablebase encoding format

Post by Angrim »

Compression is really not optional though, so it just makes sense to take it into account when designing the other parts. Also, since it allows more of the tables to fit into memory at once, it actually reduces average access times.
syzygy
Posts: 5562
Joined: Tue Feb 28, 2012 11:56 pm

Re: A different tablebase encoding format

Post by syzygy »

sje wrote:What compression gives in storage efficiency, it takes away in access time and complexity.
I fully agree with Ben on this point.
For en passant, there has to be some way of incorporating the information into the score selector index.
Of course there is, but I don't see any real gain from doing it.
Mapping the target pawn to its first rank is a simple way of doing this.
Yes, but it will also increase the index range by 1/6 (7 ranks instead of 6 for pawns) or 1/3 (8 ranks). If you forgo compression, that seems rather significant. Even with compression it has a cost.
As I recall, Thompson used king pair indexing even with pawn endgames and these were constructed using partial tables with the leading pawn fixed to a given rank.
I thought Thompson only did 5-piece endings with at most a single pawn and no 6-piece endings with pawns. The 5-piece endings were generated rank by rank. The resulting 6 data files were then rearranged, by file, into 4 data files. His paper states "four rank-oriented files" but that seems to be a mistake. It's not clear to me what index function he used during generation, as 83,886,080 = 5 x 64^4. The "117 megabyte" number he mentions for each of the four final files is likely 7 x 64^4, which I also find a bit strange. I would expect those numbers to be 4 x 64^4 and 6 x 64^4 (and 24 x 64^4 bytes in total per side-to-move).

The 1806 number seems to come from Nalimov and/or Heinz. See e.g. this paper. Table 1 suggests that Thompson did in fact generate 5-piece endings with 2 and more pawns, but whether he really did is still not clear to me.

Of course it is possible to generate file by file (or rank by rank or slice by slice) and convert into an 1806-scheme. But splitting up by file or rank seems to be more advantageous from the point of view of compression and access locality.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: A different tablebase encoding format

Post by bob »

sje wrote:What compression gives in storage efficiency, it takes away in access time and complexity.

For en passant, there has to be some way of incorporating the information into the score selector index. Mapping the target pawn to its first rank is a simple way of doing this.

As I recall, Thompson used king pair indexing even with pawn endgames and these were constructed using partial tables with the leading pawn fixed to a given rank.
Note that compression is not always harmful on accesses. When Eugene and I did all the testing with his code way back, we discovered that storing them compressed has some useful effects. First is reduced I/O since there is less to read or write, maybe 10x less. Eugene used "blocks" and compressed each block of a file independently so that you didn't have to start decompression at byte 0 to access byte N. And then you can cache the compressed data giving you an effectively larger cache. Note that we tested on very fast SCSI drives, but not on SSDs which were not available back then in any form that was affordable.
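
From memory, the access pattern was roughly like the following sketch (illustrative only, using zlib; not Eugene's actual code):

Code:

#include <stdio.h>
#include <stdint.h>
#include <zlib.h>

/* Block-wise access sketch: each BLOCK_SIZE-byte run of scores is
   compressed independently, so probing byte n touches exactly one
   block. Assumes a compressed block never exceeds twice raw size. */
#define BLOCK_SIZE 8192

/* blockOffset[] comes from the file header: entry i is the file
   position of compressed block i, entry i+1 marks its end. */
static int ReadScore(FILE *f, const uint64_t *blockOffset, uint64_t n)
{
    unsigned char cbuf[2 * BLOCK_SIZE];   /* compressed block   */
    unsigned char dbuf[BLOCK_SIZE];       /* decompressed block */
    uint64_t block = n / BLOCK_SIZE;
    uLong csize = (uLong)(blockOffset[block + 1] - blockOffset[block]);
    uLongf dsize = BLOCK_SIZE;

    if (fseek(f, (long) blockOffset[block], SEEK_SET) != 0) return -1;
    if (fread(cbuf, 1, csize, f) != csize) return -1;
    if (uncompress(dbuf, &dsize, cbuf, csize) != Z_OK) return -1;
    return dbuf[n % BLOCK_SIZE];          /* the probed score byte */
}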
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Re: A different tablebase encoding format

Post by sje »

bob wrote:Note that compression is not always harmful on accesses. When Eugene and I did all the testing with his code way back, we discovered that storing them compressed has some useful effects. First is reduced I/O since there is less to read or write, maybe 10x less. Eugene used "blocks" and compressed each block of a file independently so that you didn't have to start decompression at byte 0 to access byte N. And then you can cache the compressed data giving you an effectively larger cache. Note that we tested on very fast SCSI drives, but not on SSDs which were not available back then in any form that was affordable.
An issue here is that neither the target storage hardware nor the expected access frequency is known ahead of time. So either there have to be some very good guesses, or some kind of end user configuration capability must be present.

If no SSDs are present and the probe frequency is less than 10 Hz, then compression is almost always desirable.

But what if a big SSD is in use?

What if the probe rate goes to 10 KHz or higher?

What if the endgame class disallows good compression because of score distribution?

What if not all of the tablebase files reside on the same class of storage device? What if some have high latency (e.g., over the net), but others don't?

What if the decompress can be done by a separate machine on the LAN?

What if different compression algorithms should be used depending on the target environment?

You can see that this is a multi-dimensional optimization problem, and there's no one-size-fits-all solution.

--------

The 1,806 segment count is for endgames with at least one pawn, which allow no diagonal reflection.

My 564 segment count does not account for the diagonal reflection in non-pawn endgames which places the secondary king inside the a1-h1-h8 triangle. Hey, I'm still in the very early design stage.

--------

Embedding en passant into the index can be done in more than one way. Shoving the pawn to its first rank is maybe the simplest, although it does increase storage requirements (for the passive side only) by 1/6 per passive pawn. I can think of other ways, but they are kind of kludgy for my tastes.

--------

I see that Apple is now using on-the-fly RAM compression/decompression to help make up for their stingy standard RAM installations and their bloated software. This appears unattractive to me for any time-critical application.
syzygy
Posts: 5562
Joined: Tue Feb 28, 2012 11:56 pm

Re: A different tablebase encoding format

Post by syzygy »

sje wrote:But what if a big SSD is in use?
If your SSD is big enough to store the 6-piece tables uncompressed, why not use it to store the 7-piece tables compressed?
What if the probe rate goes to 10 KHz or higher?
Unless everything fits in RAM uncompressed, a good compression scheme is faster than no compression, even if the uncompressed tables fit on SSD.
You can see that this is a multiple dimension optimization problem, and there's no one-size-fits-all.
There is no single solution that is optimal for all hardware combinations, but using no compression at all seems to be about the worst one could do.
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Re: A different tablebase encoding format

Post by sje »

syzygy wrote:There is no single solution that is optimal for all hardware combinations, but using no compression at all seems to be about the worst one could do.
I disagree. If the request frequency is high enough, at some point no compression will beat any compression with respect to total time usage.
mvk
Posts: 589
Joined: Tue Jun 04, 2013 10:15 pm

Re: A different tablebase encoding format

Post by mvk »

sje wrote:
syzygy wrote:There is no single solution that is optimal for all hardware combinations, but using no compression at all seems to be about the worst one could do.
I disagree. If the request frequency is high enough, at some point no compression will beat any compression with respect to total time usage.
What is your starting point? Have you looked at Ronald's recent achievements in this area? Are you implying that your scheme will offer practical advantages over Ronald's?