New 6-piece tablebases

syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: New 6-piece tablebases

Post by syzygy »

stevemulligan wrote:Nice work Ronald! How can they be so much smaller than the compressed Gaviota tables?
There are several reasons:

- Distance to zeroing move (DTZ) instead of distance to mate (DTM).

- DTZ and WDL (win/draw/loss) information is stored in separate tables without duplicating the WDL information in the DTZ table. This allows me to store for DTZ only the "smaller half" of white-to-move / black-to-move information, because DTZ is only probed at the root.

- Tricks in encoding WDL information. Positions with a winning capture are stored as "don't care". Something similar (but more complicated) for positions with a drawing capture. Of course illegal positions (side to move can capture the king) are set to don't care as well. The compression algorithm sets these values to what compresses best. The cost is a round of captures in the probing code, which I consider acceptable because these probe into much smaller tables which may be assumed to be cached in RAM.

- Tricks in encoding DTZ information. As already mentioned WDL information is removed, so draws are "don't care" and only the distance to win or loss is stored. Positions with a winning capture or pawn move are set to don't care, as this can be tested easily when probing.

- My compression algorithm seems to do quite well. It's not a general purpose compressor, so it is hard to compare to e.g. LZMA used by Gaviota, but LZMA and other compressors that compress "online" (i.e. read input and produce output in a single pass) suffer from small block sizes. My compressor constructs a (small) dictionary on the basis of the whole table which results in a sort of "random access" compression/decompression with very small blocks (at most 64 bytes for WDL, at most 1024 bytes for DTZ). The only efficiency lost by making blocks smaller is a few bits lost at the end of each block (because the next symbol does not fit anymore).

- The order of piece types in the indexing function can have a large impact on compression ratio. My indexing function therefore allows for many permutations of piece types. Before compressing a table, a search is done for a good permutation (by testing on a subset of the data); a small sketch of this search follows below.
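As a rough illustration only (not the actual generator code), that search could look something like this; compressed_sample_size() and NUM_ORDERINGS are made-up placeholders:

Code:

/* Sketch: pick the piece-type ordering for the indexing function by
 * compressing a sample of the table under each candidate permutation
 * and keeping the ordering that compresses best.
 * compressed_sample_size() and NUM_ORDERINGS are hypothetical. */
#include <stddef.h>
#include <stdint.h>

#define NUM_ORDERINGS 24   /* e.g. permutations of four non-king pieces */

size_t compressed_sample_size(const uint8_t *table, size_t size, int ordering);

int choose_piece_ordering(const uint8_t *table, size_t size)
{
    int best_ordering = 0;
    size_t best_size = (size_t)-1;

    for (int ordering = 0; ordering < NUM_ORDERINGS; ordering++) {
        size_t s = compressed_sample_size(table, size, ordering);
        if (s < best_size) {
            best_size = s;
            best_ordering = ordering;
        }
    }
    return best_ordering;   /* this ordering is then used for the real compression pass */
}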

The size of my tables is slightly increased by the inclusion of information on the 50-move rule (without losing the ability to play out wins affected by the 50-move rule), but my compression deals with this efficiently and it adds something that I think was not available before.

Size is one thing, probing overhead another. Throwing away the "bigger half" of DTZ of course increases probing overhead of DTZ, but this is only done at the root so I find this completely acceptable. The trick with winning/drawing captures has a cost when probing WDL, but it should not lead to more I/O and it reduces the size quite a bit, which means that more compressed data can be cached and I/O is reduced. Overall I think it is a win. I don't cache decompressed data but instead make use of the small block size and the fact that it is faster to extract a single value from a block than to decompress the whole block.
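To make the cost of that trick concrete: a WDL probe can reconstruct the true value by first resolving captures and then combining the result with the stored value. The sketch below only illustrates that idea and is not the actual probing code; Position, Move and all helper functions are hypothetical placeholders (en passant subtleties are omitted):

Code:

/* Sketch of resolving "don't care" values during a WDL probe: take the
 * best value over captures, then combine it with the stored table value.
 * WDL values: -2 loss, -1 blessed loss, 0 draw, 1 cursed win, 2 win.
 * All types and helpers below are hypothetical placeholders. */
typedef struct Position Position;
typedef int Move;
#define MAX_MOVES 64

int  generate_captures(Position *pos, Move *list);  /* legal captures only */
void do_move(Position *pos, Move m);
void undo_move(Position *pos, Move m);
int  probe_wdl_table(Position *pos);                /* raw value from the WDL file */

int probe_wdl(Position *pos)
{
    Move captures[MAX_MOVES];
    int n = generate_captures(pos, captures);
    int best = -2;

    /* Resolve captures first: the table may store "don't care" for
     * positions whose result is already decided by a capture. */
    for (int i = 0; i < n; i++) {
        do_move(pos, captures[i]);
        int v = -probe_wdl(pos);        /* recurses into a smaller table */
        undo_move(pos, captures[i]);
        if (v > best)
            best = v;
    }
    if (best == 2)
        return 2;                        /* winning capture: stored value is don't care */

    int v = probe_wdl_table(pos);
    return best > v ? best : v;          /* captures bound the result from below */
}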
stevemulligan
Posts: 117
Joined: Wed Jul 20, 2011 2:54 pm
Location: Ottawa, Canada

Re: New 6-piece tablebases

Post by stevemulligan »

I'm not sure I understand DTZ? What is a zeroing move?

Don't you need the DTM to determine which is the best? Like if there are 2 moves where you can get checkmate but one of them is in 35 moves and the other is in 10? Is DTM derived from DTZ somehow?
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: New 6-piece tablebases

Post by michiguel »

syzygy wrote:
stevemulligan wrote:Nice work Ronald! How can they be so much smaller than the compressed Gaviota tables?
There are several reasons:

- Distance to zeroing move (DTZ) instead of distance to mate (DTM).

- DTZ and WDL (win/draw/loss) information is stored in separate tables without duplicating the WDL information in the DTZ table. This allows me to store for DTZ only the "smaller half" of white-to-move / black-to-move information, because DTZ is only probed at the root.

- Tricks in encoding WDL information. Positions with a winning capture are stored as "don't care". Something similar (but more complicated) for positions with a drawing capture. Of course illegal positions (side to move can capture the king) are set to don't care as well. The compression algorithm sets these values to what compresses best. The cost is a round of captures in the probing code, which I consider acceptable because these probe into much smaller tables which may be assumed to be cached in RAM.

- Tricks in encoding DTZ information. As already mentioned WDL information is removed, so draws are "don't care" and only the distance to win or loss is stored. Positions with a winning capture or pawn move are set to don't care, as this can be tested easily when probing.

- My compression algorithm seems to do quite well. It's not a general purpose compressor, so it is hard to compare to e.g. LZMA used by Gaviota, but LZMA and other compressors that compress "online" (i.e. read input and produce output in a single pass) suffer from small block sizes. My compressor constructs a (small) dictionary on the basis of the whole table which results in a sort of "random access" compression/decompression with very small blocks (at most 64 bytes for WDL, at most 1024 bytes for DTZ). The only efficiency lost by making blocks smaller is a few bits lost at the end of each block (because the next symbol does not fit anymore).

- The order of piece types in the indexing function can have a large impact on compression ratio. My indexing function therefore allows for many permutations of piece types. Before compressing a table a search is done for a good permutation (by testing on a subset of the data).

The size of my tables is slightly increased by the inclusion of information on the 50-move rule (without losing the ability to play out wins affected by the 50-move rule), but my compression deals with this efficiently and it adds something that I think was not available before.

Size is one thing, probing overhead another. Throwing away the "bigger half" of DTZ of course increases probing overhead of DTZ, but this is only done at the root so I find this completely acceptable. The trick with winning/drawing captures has a cost when probing WDL, but it should not lead to more I/O and it reduces the size quite a bit, which means that more compressed data can be cached and I/O is reduced. Overall I think it is a win. I don't cache decompressed data but instead make use of the small block size and the fact that it is faster to extract a single value from a block than to decompress the whole block.
Congratulations!

I am too busy right now to take a close look at this, but it sounds like a very interesting approach. I know you must have invested a lot of effort in this.

Miguel
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: New 6-piece tablebases

Post by michiguel »

stevemulligan wrote:I'm not sure I understand DTZ? What is a zeroing move?

Don't you need the DTM to determine which is the best? Like if there are 2 moves where you can get checkmate but one of them is in 35 moves and the other is in 10? Is DTM derived from DTZ somehow?
It is the distance to a move that resets the "50-move clock". That is, the distance to the next capture, promotion, or pawn move that wins. You have to choose the lowest DTZ with the highest 50-move clock (or lowest material, more advanced pawns) and you will always make progress.

Miguel
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: New 6-piece tablebases

Post by syzygy »

stevemulligan wrote:I'm not sure I understand DTZ? What is a zeroing move?
Captures and pawn moves (including promotions) are zeroing moves, because they reset ("zero") the 50-move counter.
Don't you need the DTM to determine which is the best? Like if there are 2 moves where you can get checkmate but one of them is in 35 moves and the other is in 10? Is DTM derived from DTZ somehow?
If you want the fastest checkmate, then you need DTM. But you don't need DTM to make sure that you can always play out a win. All you need is the guarantee that you make progress towards the win. DTZ gives this guarantee: with every non-zeroing move it decreases, and zeroing moves either leave fewer pieces on the board or more advanced pawns.
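As an illustration of such a progress rule (not actual engine code; all helpers are hypothetical, and probe_dtz_loss() is assumed to return, for a lost position, the number of moves until the winner's next zeroing move), a root move selection could look like this:

Code:

/* Sketch of choosing a root move by DTZ: keep the win, and among the
 * winning moves prefer a zeroing move, otherwise the lowest DTZ.
 * All types and helpers below are hypothetical placeholders. */
#include <limits.h>

typedef struct Position Position;
typedef int Move;
#define MAX_MOVES 256

int  generate_legal(Position *pos, Move *list);
int  is_zeroing(const Position *pos, Move m);   /* capture or pawn move */
void do_move(Position *pos, Move m);
void undo_move(Position *pos, Move m);
int  probe_wdl(Position *pos);                  /* 2 = win for the side to move */
int  probe_dtz_loss(Position *pos);             /* moves to the winner's next zeroing move */

Move pick_root_move(Position *pos)
{
    Move moves[MAX_MOVES], best_move = 0;
    int n = generate_legal(pos, moves);
    int best_dtz = INT_MAX;

    for (int i = 0; i < n; i++) {
        int zeroing = is_zeroing(pos, moves[i]);
        do_move(pos, moves[i]);
        if (-probe_wdl(pos) == 2) {             /* this move preserves the win */
            /* A winning zeroing move resets the 50-move counter (DTZ 0);
             * otherwise we are one move further from the next zeroing move. */
            int dtz = zeroing ? 0 : 1 + probe_dtz_loss(pos);
            if (dtz < best_dtz) {
                best_dtz = dtz;
                best_move = moves[i];
            }
        }
        undo_move(pos, moves[i]);
    }
    return best_move;   /* 0 if the position is not won */
}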

Actual play along DTZ-optimal lines can be quite ugly, but this can be handled by letting the engine, once the game has reached a tablebase position, do short searches on the subset of winning moves and switch to DTZ-optimal play once the engine stops making progress (e.g. a position is repeated or the number of moves left before the 50-move rule kicks in starts to approach the DTZ value for the position).

One problem with DTM is that a winning position might be drawn due to the 50-move rule. My tables guarantee that this cannot happen. Using my tables, an engine will win positions that would be drawn with DTM-optimal play.
hgm
Posts: 27788
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: New 6-piece tablebases

Post by hgm »

The main ugliness of DTZ play is that it starts with sacrificing all material you don't need, to zero the 50-move counter as fast as possible. E.g. if you have KQRK, it will sacrifice the Q (because it is usually easier to force the bare King to take the Q than the R: you just drive it towards the edge with contact checks until it has no choice but to take it). Then it starts to play KRK optimally (because there are no ways left to zero the 50-move counter without drawing).

This could be prevented by only considering captures by the winning side as zeroing, and not captures by the losing side (which usually do not represent 'progress').
stevemulligan
Posts: 117
Joined: Wed Jul 20, 2011 2:54 pm
Location: Ottawa, Canada

Re: New 6-piece tablebases

Post by stevemulligan »

syzygy wrote:If you want the fastest checkmate, then you need DTM. But you don't need DTM to make sure that you can always play out a win.
Very interesting. The size certainly makes them very attractive.
Angrim
Posts: 97
Joined: Mon Jun 25, 2012 10:16 pm
Location: Forks, WA
Full name: Ben Nye

Re: New 6-piece tablebases

Post by Angrim »

syzygy wrote: - My compression algorithm seems to do quite well. It's not a general purpose compressor, so it is hard to compare to e.g. LZMA used by Gaviota, but LZMA and other compressors that compress "online" (i.e. read input and produce output in a single pass) suffer from small block sizes. My compressor constructs a (small) dictionary on the basis of the whole table which results in a sort of "random access" compression/decompression with very small blocks (at most 64 bytes for WDL, at most 1024 bytes for DTZ). The only efficiency lost by making blocks smaller is a few bits lost at the end of each block (because the next symbol does not fit anymore).
This is interesting; how do you find the compressed blocks in the file without having a huge index? This problem is the main reason that I use such large blocks in my compression method: each block requires 4 bytes of index to tell where in the file the compressed block starts, and I keep the index in RAM, so every time I halve the block size I double my RAM requirements. With a 16:1 compression ratio, 64-byte blocks would result in the index being as large as the compressed file using my method.
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: New 6-piece tablebases

Post by syzygy »

Angrim wrote:This is interesting; how do you find the compressed blocks in the file without having a huge index? This problem is the main reason that I use such large blocks in my compression method: each block requires 4 bytes of index to tell where in the file the compressed block starts, and I keep the index in RAM, so every time I halve the block size I double my RAM requirements. With a 16:1 compression ratio, 64-byte blocks would result in the index being as large as the compressed file using my method.
The 64 bytes are compressed bytes, so with a 16:1 compression ratio they would represent 1024 positions. I think the 8k blocks of Nalimov are 8k positions (but I could be wrong here), so my 64 bytes may contain more positions than Nalimov's blocks if the compression ratio is high.

I use 2 bytes per block to count the number of positions per block (1-65536). For a particular m, I store for all position indices 2^(m-1) + k * 2^m (k = 0, 1, 2, ...) the index of the corresponding block (4 bytes, so max 2^32 blocks) and the position within that block (2 bytes) corresponding to the position index. I choose m so that 2^m covers about 32 blocks, so I have to add/subtract on average 8 "block counts" to find the block and position within the block for each position index.
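A rough sketch of that lookup (the struct layout is illustrative only, not the actual file format):

Code:

/* Sketch of the sparse index lookup: one sparse entry per 2^m position
 * indices (anchored at 2^(m-1) + k*2^m), giving a block number and the
 * offset of the anchored position within that block; per-block counts
 * are then added or subtracted to reach the queried index. */
#include <stdint.h>

struct SparseEntry { uint32_t block; uint16_t offset; };

struct TableIndex {
    int m;                             /* 2^m position indices per sparse entry */
    const struct SparseEntry *sparse;
    const uint32_t *blk_count;         /* positions stored in each block
                                          (2 bytes in the file; widened here) */
};

static void locate(const struct TableIndex *t, uint64_t idx,
                   uint32_t *block, uint32_t *offset_in_block)
{
    uint64_t k = idx >> t->m;                      /* sparse entry of idx's interval */
    uint64_t anchor = (k << t->m) + ((uint64_t)1 << (t->m - 1));
    uint32_t b = t->sparse[k].block;
    int64_t off = (int64_t)t->sparse[k].offset + ((int64_t)idx - (int64_t)anchor);

    /* Walk whole blocks backward or forward using the per-block counts;
     * with 2^m covering about 32 blocks this is roughly 8 blocks on average. */
    while (off < 0)
        off += t->blk_count[--b];
    while (off >= (int64_t)t->blk_count[b])
        off -= t->blk_count[b++];

    *block = b;
    *offset_in_block = (uint32_t)off;
}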

I don't preload the indices. The pages that have the index will quickly be swapped into RAM (at least that's the idea ;-)).

The decompression routine does not decompress all position values, but only decodes the Huffman symbols. It knows how many positions are represented by each symbol, so it can skip symbols until it reaches the one it needs.
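A sketch of that skipping step (BitReader and the helpers are hypothetical placeholders; a symbol here is a dictionary entry expanding to one or more position values):

Code:

/* Sketch of extracting one value from a block without expanding it:
 * decode symbols, skip whole symbols that end before the target offset,
 * and only expand the symbol that contains it. */
#include <stdint.h>

struct BitReader;                                   /* reads the block's bit stream */
int      decode_next_symbol(struct BitReader *br);  /* Huffman-decode one symbol id */
uint32_t symbol_length(int sym);                    /* positions the symbol expands to */
int      symbol_value_at(int sym, uint32_t i);      /* i-th value of the expansion */

int extract_value(struct BitReader *br, uint32_t offset_in_block)
{
    for (;;) {
        int sym = decode_next_symbol(br);
        uint32_t n = symbol_length(sym);
        if (offset_in_block < n)
            return symbol_value_at(sym, offset_in_block);  /* expand only this symbol */
        offset_in_block -= n;                              /* skip without expanding */
    }
}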
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: New 6-piece tablebases

Post by syzygy »

The generator now 'only' requires a system with 16 GB of RAM to generate all 6-piece tables. With the --disk / -d option, the generator does not allocate extra memory after generation for the compression phase. With 24 GB of RAM this option is not needed.

With some minor changes it should be possible to generate 7-piece tables on a system with 1 TB of shared RAM. If anyone wants to give that a try let me know. ;-)