disk-based 8-piece TB generator

Discussion of chess software programming and technical issues.

Moderator: Ras

syzygy
Posts: 5999
Joined: Tue Feb 28, 2012 11:56 pm

disk-based 8-piece TB generator

Post by syzygy »

viewtopic.php?t=85908&start=60
Nordlandia wrote:How feasible is undertaking 8-man tables or partly about let's say 2030 around. Is the equipment up to snuff, or do we need something like Lomonosov II SuperComputer contraption or deemed obsolete by 2030 to commence 8-man EGTBs generation.
I don't see this happening before 2030 is over, but at least I have now written a generator that in principle can do this on "normal" hardware (and today I have added a README):
https://github.com/syzygy1/tb8gen

The generator not only generates the tablebase but also compresses it, essentially using the same method as used for the regular Syzygy TBs. However, the 7- and 8-piece tablebase files are internally split into (many) more separately compressed tables, which means the existing Syzygy probing code cannot read those files. (On the plus side, this new internal layout gives an improved compression ratio.) New probing code is available:
https://github.com/syzygy1/probetool/tree/new_format

Hardware requirements are (1) lots of disk space, and (2) 128 GB of RAM for the largest pawnless 8-piece tablebases (those without any identical pieces of the same color; there are not so many of them among the 8-piece tablebases) and 64 GB or less once there are identical pieces of the same color. For tables with one pawn, 32 GB RAM should be enough.

The generator is Linux-only for now and will need an x86-64 CPU with fast pext/pdep instructions (so Zen3+ or Intel).

The amount of disk space needed during generation is much less than I had originally feared (thanks to regular data compression), but for 8-piece tables it will probably still be in the many terabytes.

The very largest compressed 8-piece TB files will probably be around 5 TB in size, the complete set (WDL+DTZ) somewhere between 1.5 and 2 PB. Generation of a single pawnless table will probably take multiple weeks if not months, and pawnful tables are much worse. So realistically this is not going to happen anytime soon, if ever.

While the generator is running, it can be interrupted at any time using CTRL-C (or simply killing it). Restarting the generator will then pick up where it left off.

Things still to be improved:
- The threading code should be tuned. It now splits the K-slices in RAM into too few work blocks, and it splits the data written/read from/to disk in blocks that are too small.
- Some performance improvement should be possible by overlapping disk I/O with computation, in particular when calculating predecessors and verifying successors.
- For very sparse K-slices written to disk, the generator should compress the indices of the 1-bits instead of the bits themselves (with various speed benefits).
- I still have to write a RAM-based generator for 8-piece TBs with 2+ pawns.
syzygy
Posts: 5999
Joined: Tue Feb 28, 2012 11:56 pm

Re: disk-based 8-piece TB generator

Post by syzygy »

I have now generated two 8-piece tablebases.

The first covers the somewhat unlikely case of KNNNNNNvK.
Longest win for white according to the DTZ metric is 21 ply:
[pgn][Event ""]
[FEN "N1N5/1N6/N7/8/N7/8/N7/1K1k4 b - - 0 1"]
1...Ke1 2.Na8b6 Kd1 3.Nc4 Ke1 4.Nd2 Ke2 5.Nf3 Kd1 6.Ne1 Kd2 7.Nbc5 Ke2 8.N4c3+
Kf1 9.Ne2 Kf2 10.Ka1 Kf1 11.Ncd3 Kxe2 12.Nc3+ Kf1 13.Ne2 Kxe2 14.Kb1 Kf1 15.Nf3
Ke2 16.Nd2 Kd1 17.Nb6 Ke2 18.Nd5 Kd1 19.N5f4 Kxd2 20.Nac5 Ke3 21.Nfe6 Kd2
22.Ne5 Kd1 23.Ne4 Ke2 24.Kc1 Ke3 25.Ng3 Kf2 26.Nf5 Ke1 27.Nf4 Kf1 28.Ned3 Kg1
29.Ng3 Kh2 30.Nge2 Kh1 31.Ne1 Kh2 32.Nf3+ Kh1 33.Ng3#[/pgn]
So in this sequence of moves, black's aim is to maximize (and white's to minimize) the number of moves to either mate or a capture.

The WDL file is 47,975,056 bytes = 45.75 MB. The DTZ file is 1,761,580,304 bytes = 1.64 GB.
This includes a 6!=720-fold reduction in size thanks to the 6 white knights (but with 8 pieces, any pawnless KX..vK table will have at least 2 pairs of identical white pieces (4x reduction) or a triple of of identical white pieces (6x reduction)).

WDL is much smaller than DTZ in this case because the material is very unbalanced, which means win/draw/loss is very predictable (-> WDL compresses better) and almost all positions have a DTZ value, only the draws being "don't care" (-> DTZ compresses worse).

The second is the more interesting (even though unlikely to ever happen in a real game) KNNNNvKRR.
Longest win for white is 48 ply:
[pgn][Event ""]
[FEN "r5N1/8/4N2k/5N2/4N3/7r/8/2K5 b - - 0 1"]
1...Kh7 2.Nge7 Kh8 3.Ng6+ Kh7 4.Ne5 Kh8 5.Nf7+ Kg8 6.Nfg5 Rh1+ 7.Kb2 Kh8 8.Nf7+
Kg8 9.Ne5 Kh8 10.Ng6+ Kg8 11.Nge7+ Kf7 12.N6g5+ Kf8 13.Ng6+ Ke8 14.Nf6+ Kd8
15.Ne6+ Kc8 16.Nge7+ Kb8 17.Nd7+ Ka7 18.Nd6 Rh2+ 19.Kc1 Rh1+ 20.Kd2 Rh2+ 21.Ke3
Rc2 22.Kd3 Rc1 23.Kd2 Ra1 24.Nc6+ Ka6 25.Nec5#[/pgn]
Longest win for black is 63 ply:
[pgn][Event ""]
[FEN "N7/8/3N4/8/6r1/NN1r4/3k4/1K6 b - - 0 1"]
1...Kc3 2.Na1 Rb4+ 3.Kc1 Rh3 4.Kd1 Rd4+ 5.Ke1 Re3+ 6.Kf2 Kd3 7.Nb3 Rf4+ 8.Kg2
Ke2 9.Nc1+ Kd1 10.Ndc4 Re7 11.Nb2+ Kd2 12.Nb1+ Kc2 13.Na3+ Kc3 14.Nb1+ Kd4
15.Nb3+ Ke3 16.Nd1+ Ke2 17.Ndc3+ Ke1 18.Nc5 Rf2+ 19.Kh3 Rh7+ 20.Kg3 Rg7+ 21.Kh3
Rfg2 22.N3e4 Rg1 23.Kh2 R7g2+ 24.Kh3 Rg8 25.Kh2 R1g2+ 26.Kh3 R2g6 27.Nd3+ Kd1
28.Nbc3+ Kc2 29.Ne1+ Kb2 30.Nd3+ Ka1 31.Nf4 Rh6+ 32.Nh5 Rhxh5#[/pgn]
The statistics of the table are identical to those generated by Marc Bourzutschky (except that I keep statistics per ply and he per move).

For KNNNNvKRR, the WDL file is 20,363,170,448 bytes = 18.96 GB and the DTZ file is 5,118,495,312 = 4.77 GB.
This includes a 24x2 = 48-fold size reduction resulting from the NNNN and RR symmetries.

DTZ is relatively small because there are many draws:

Code: Select all

wtm:
59408961325 positions are wins.
221896147707 (97316801998) positions are draws.
13372865143 positions are losses.

btm:
74472858997 positions are wins.
193049671544 (142464119987) positions are draws.
4397957478 positions are losses.
The "(97316801998)" and "(142464119987)" are draws by an immediate capture. These can be encoded by the compression algorithm as a loss in the WDL table. In general that is not a win here because draws including capture draws are more dominant over losses (43.9x), than capture-draws plus losses over non-capture-draws (2.9x). (Still the compression algorithm may have encoded some capture-draws as losses where this happened to create a long run of losses.)

Considering the size of KNNNNvKRR.rtbw, a somewhat balanced pawnless table without any identical pieces of the same color could be expected to take up about 48 x 20 = 960 GB. This is only a very rough estimate because all tables are different.

Total generation time for KNNNNvKRR was about 5 hours 40 minutes, where I used a RAM disk to store all intermediary files. This fit in 256 GB, though with very little room to spare. Larger tables will clearly need disk I/O, which will slow things down, but probably not to a crawl even if a HDD is used. And an SSD will probably survive a fair number of tablebase generations (contrary to my initial expectations).

While the generation was running, I realised that the temporary files holding the "potential losses" can be dropped more quickly if the pass that calculates btm losses-in-(n+1) from wtm wins-in-n alternates between finding potential losses and verifying potential losses for every of the 10 positions of the white king. This reduces the maximum number of potential loss files on disk from 462 to 58. With just 58 it should be possible to hold them all (compressed) in RAM, at least for the majority of iterations.

Anothing thing that became clear is that there should be quite a bit to gain from tuning the compression of intermediate files (and adding a new compression method for very sparse bitmaps), in particular during later iterations. And earlier iterations could perhaps benefit from a forward pass instead of a retro pass.

Roughly half the generation time went into the final compression phase. I expect this phase to scale well for larger tables (and this phase only involves very little I/O). Still it would be nice if I could speed this up.

If I can speed up the later iterations by working on the compression of intermediate files, then 6 hours * 50 = 12.5 days per largest pawnless table would seem reasonable (on a system like mine, 9950x3d, 256 GB). Perhaps some days have to be added if an HDD is used. And perhaps improvements can still shave quite a bit of this time.

So this is already looking quite doable, if we ignore the major problem of storing (and distributing) all the generated files.

For the largest pawnful tables (with 1 pawn), the generation time might be about a month.

Tables with 2+ pawns can be generated in RAM (per PP-slice or PvP-slice), which will be faster. But the compression phase will take about as long as for tables with 1 pawn. (Of course tables with two pawns of the same color start are about half the size, and pawns in general have fewer squares to stand on, which also helps.)