Tablebase suggestion

Rein Halbersma · Post by **Rein Halbersma** » Sat Feb 21, 2026 11:56 pm

syzygy wrote: ↑Fri Jan 30, 2026 4:13 am
syzygy wrote: ↑Fri Jan 30, 2026 2:59 am
Kirill Kryukov wrote: ↑Fri Jan 30, 2026 1:43 am
syzygy wrote: ↑Fri Jan 23, 2026 10:42 amMaybe you could also try to calculate the number of positions in a single 8-piece tablebase (say KRBPvKRNP) and calculate the time you need to evaluate each position for say 10ms. Let's have 1000 CPUs working on them in parallel. How long does it take?
Hi. This caught my attention. The KRBPvKRNP endgame has precisely 64,513,180,757,427 unique legal positions, which makes it only 59-th largest 8-piece endgame. The largest one is KRBNPvKBN with 87,076,702,767,652 unique legal positions. Of course this is the lower bound, the number of positions that will be indexed with a particular indexing method will be still larger. Hope it helps a little.
Thanks!
So searching each position single-core for 10ms and running 1000 cpu cores in parallel will take 20.44 years just for this table. Not very practical
Turns out Urban Koistinen already suggested how, at least in theory, this could be done somewhat efficiently (I'm linking to my explanation of his suggestion):
viewtopic.php?p=952933#p952933

So it's not totally infeasible. But for probing you still need to do the same expensive search on the position being probed. The point of doing a tablebase probe, at least of WDL tables, is to avoid searching the position.
Also, an evaluation score returned by a 10-ply search or so isn't the most ideal type of information for improving compression.

Btw, doing N forward iterations over an entire database in order to efficiently do an N-ply search over each position, is the fundamental idea of dynamic programming. It’s called value iteration.

Rein Halbersma · Post by **Rein Halbersma** » Sat Feb 28, 2026 2:35 pm

syzygy wrote: ↑Wed Feb 04, 2026 5:45 pm When we are in g2, we have loaded 19 slices (a1-h1,a2-h2,f3,g3,h3).
[d]8/8/8/8/8/5kkk/kkkkkkkk/kkkkkkkk b - - 0 1

[...]

19 K-slices take up 97.9 GB, so 128 GB of RAM would be comfortable.

I'm trying to understand this thread. So far I understand the 462 wK/bK slicing, and also that unmoving a bK requires 9 slices in RAM at the minimum. But where does the number 19 come from? For the g2 diagram, what benefits bring the 10 slices in the a1-e1-a2-e2 rectangle being in RAM?

Rein Halbersma · Post by **Rein Halbersma** » Sat Feb 28, 2026 2:49 pm

syzygy wrote: ↑Mon Feb 16, 2026 4:02 am Generation on my PC took 15m9s using 16 cores/32 threads and with the files being saved on RAM disk. Multithreading should be more effective with 7-piece and 8-piece tables because the work units will be much bigger (one K-slice of KQRvKQR is just 1.6MB). But the amount of sequential disk I/O will be very substantial...

So this is a Wu-Beal 1-bit per position algorithm adopted to your "staggered 2-ply iteration" scheme? How fast does your 1-byte per position scheme generate the same KQRvKQR table?

syzygy · Post by **syzygy** » Sat Feb 28, 2026 5:42 pm

Rein Halbersma wrote: ↑Sat Feb 28, 2026 2:49 pm
syzygy wrote: ↑Mon Feb 16, 2026 4:02 am Generation on my PC took 15m9s using 16 cores/32 threads and with the files being saved on RAM disk. Multithreading should be more effective with 7-piece and 8-piece tables because the work units will be much bigger (one K-slice of KQRvKQR is just 1.6MB). But the amount of sequential disk I/O will be very substantial...
So this is a Wu-Beal 1-bit per position algorithm adopted to your "staggered 2-ply iteration" scheme? How fast does your 1-byte per position scheme generate the same KQRvKQR table?

I'm just doing it one ply at a time now, with 1 bit per position.

My 1-byte per position generator (or rather 2-bytes per position, both 1 byte for wtm and 1 byte for btm) takes 1m15s.
In the meantime I have sped up the 1-bit per position generator to 8m30s, still with all files on RAM disk, mainly by speeding up the many "sparse" iterations.

For sparse iterations the generator seems to spend most of its time on compressing the "wins" bitmaps that keep track of all (illegal and) winning positions already found. It would probably make sense to not update those tables every iteration but to use the win-in-n bitmaps generated since the last update of "wins" as deltas.

Currently I am compressing everything with zstd. For sparse bitmaps it is probably better to use an RLE-based scheme that could be loaded without having to write a full table (i.e. a full 1-bit-per-position K-slice).

syzygy · Post by **syzygy** » Sat Feb 28, 2026 6:11 pm

Rein Halbersma wrote: ↑Sat Feb 28, 2026 2:35 pm
syzygy wrote: ↑Wed Feb 04, 2026 5:45 pm When we are in g2, we have loaded 19 slices (a1-h1,a2-h2,f3,g3,h3).
[d]8/8/8/8/8/5kkk/kkkkkkkk/kkkkkkkk b - - 0 1

[...]

19 K-slices take up 97.9 GB, so 128 GB of RAM would be comfortable.
I'm trying to understand this thread. So far I understand the 462 wK/bK slicing, and also that unmoving a bK requires 9 slices in RAM at the minimum. But where does the number 19 come from? For the g2 diagram, what benefits bring the 10 slices in the a1-e1-a2-e2 rectangle being in RAM?

The 19 comes from the maximum number of slices that have to be kept in memory before you can save slices to disk (or release them from memory if they were loaded from disk) because they are "finished". A slice is finished if all its "neighbouring" slices have been processed.

For example, consider computing the wtm predecessors of the btm loss-in-n positions.
In this case we are looking at moves and unmoves by white, so the bK stays on the same square.
So we first place the bK on b1 and let the wK move through the squares in normal order a1-h1, a2-h2,....,a8-h8.
The wK cannot be on a1,b1,c1, so we start with wK on d1.
If we unmove btm loss-in-n positions with bK,wK on b1,d1, we reach wtm positions with bK on b1 and wK on one of d1,d2,e1,e2. So we need 4 slices in RAM corresponding to (b1,d1), (b1,d2), (b1,e1), (b1,e2).
Now we move wK to e1 and we need to add RAM slices for (b1,f1) and (b1,f2).
And so on: as we move wK to f1,g1,h1,d2,e2, we create RAM slices for bK on b1 and wK on g1,g2,h1,h2,c3,d3,e3,f3.
Now notice that once we have finished unmoving from (b1,e2), we are done with the slice (b1,d1): it is not possible to reach this wtm slice by unmoving from a btm slice which has not already been processed. So we save the wtm slice corresponding to (b1,d1) to disk, and we can use the freed up RAM for another slice (like (b1,g3) since we will now place the wK on f2).

The "worst" case scenario is when the bK is not near the wK and the wK is not on ranks 1 and 8. In this worst case, 2*8+3=19 slices are enough:
8/8/KKKKK3/KKKKKKKK/2KKKKKK/8/8/1k6 w - - 0 1
Here we are processing (b1,d5). Once finished, (b1,c4) can be released, the wK moves to e5, and (b1,f6) is added to RAM.
Once (b1,h8) is done, every slice still in RAM can be saved to disk, the bK moves to c1, and wK traverses the board again.
The bK can be restricted to a1-d1-d4. If the bK is on the diagonal a1-d4, the wK only visits the a1-h1-h8 triangle.

The verification step (to verify a potential loss-in-n found by unmoving from a win-in-(n-1)) works in exactly the same way, except now we load K-slices into RAM from disk (with "wins" bitmaps) before processing instead of saving them to disk after processing.

With colours reversed, it is the same. Just fix the wK in one of the a1-d1-d4 squares and let the bK traverse the board.

Now it's just a matter of mapping all these slices to the same naming scheme.

syzygy · Post by **syzygy** » Sat Feb 28, 2026 6:41 pm

My work in progress:
https://github.com/syzygy1/tb8gen

The README is out of date. It still generates tens of thousands of files, but they are now organized in a hierarchical directory structure. If you try it out, I would suggest doing it in a RAM disk (and probably not for 7- or 8-piece endgames, yet, although I tried a few highly symmetric ones like KNNNvKBB).

The RTBPATH environment variable has to contain the path to .rtbw files.
The number of threads can be set with "-t 8".
The number of compression threads can be changed in the Makefile (currently 16).

I am currently working on merging the W/L bitmaps for a K-slice into one K-slice with 1 or 2 bytes per position and figuring out how I want to compress them into a final TB file.

For tables with 1 pawn a similar king-slice scheme can be used per pawn slice.
I am thinking of one K-slice per 2x2 square. So 16x16 (or rather 16x15) K-slices per pawn slice (perhaps just as a unit of to write to/load from disk, with the generation code still seeing 1x1 K-slices).

For tables with >=2 pawns, things get a bit annoying with this approach because DTZ forces per-pawn-slice generation.
However, now each pawn slice fits in roughly the same amount of RAM as we need for pawnless tables, so I can use a RAM-based generator.

syzygy · Post by **syzygy** » Sat Feb 28, 2026 6:42 pm

syzygy wrote: ↑Sat Feb 28, 2026 6:11 pmThe "worst" case scenario is when the bK is not near the wK and the wK is not on ranks 1 and 8. In this worst case, 2*8+3=19 slices are enough:
8/8/KKKKK3/KKKKKKKK/2KKKKKK/8/8/1k6 w - - 0 1
Here we are processing (b1,d5). Once finished, (b1,c4) can be released, the wK moves to e5, and (b1,f6) is added to RAM.
Once (b1,h8) is done, every slice still in RAM can be saved to disk, the bK moves to c1, and wK traverses the board again.

I forgot to add tags:
[d]8/8/KKKKK3/KKKKKKKK/2KKKKKK/8/8/1k6 w - - 0 1

syzygy · Post by **syzygy** » Sat Feb 28, 2026 8:46 pm

Rein Halbersma wrote: ↑Sat Feb 28, 2026 2:35 pmFor the g2 diagram, what benefits bring the 10 slices in the a1-e1-a2-e2 rectangle being in RAM?

So the benefit of keeping the a1-e1-a2-e2 rectangle in RAM is that you then don't need to save and re-load them when the bK gets to f2-a2 and a3-f3. (So later I realised there is no benefit in zig-zagging and you can as well let the bK go a1-h1, a2-h2, a3-h3, etc.)

So it is possible to have just 9 K-slice bitmaps simultaneously in RAM, but it would mean saving and reloading these bitmaps multiple times, which significantly increases I/O. Once you can keep 19 K-slice bitmaps in RAM, the amount of I/O needed seems to be identical to when you keep a bitmap for the full table (i.e. 462 K-slice) in RAM at once. That is a factor of 462/19=24.3 in RAM saved with no real downside.

There will be some more disk seeks because you need to load/save 462 smaller files instead of 1 big file, but this is negligible especially once you get to 7-piece and 8-piece tables. And an advantage of many smaller files is that stopping and restarting the generation process should be possible with much finer granulariy. The current code does not yet try to resume from where it left off, but this should be relatively easy to implement.

Rein Halbersma · Post by **Rein Halbersma** » Sat Feb 28, 2026 11:42 pm

syzygy wrote: ↑Sat Feb 28, 2026 6:42 pm
syzygy wrote: ↑Sat Feb 28, 2026 6:11 pmThe "worst" case scenario is when the bK is not near the wK and the wK is not on ranks 1 and 8. In this worst case, 2*8+3=19 slices are enough:
8/8/KKKKK3/KKKKKKKK/2KKKKKK/8/8/1k6 w - - 0 1
Here we are processing (b1,d5). Once finished, (b1,c4) can be released, the wK moves to e5, and (b1,f6) is added to RAM.
Once (b1,h8) is done, every slice still in RAM can be saved to disk, the bK moves to c1, and wK traverses the board again.
I forgot to add tags:
[d]8/8/KKKKK3/KKKKKKKK/2KKKKKK/8/8/1k6 w - - 0 1

Ah got it! You are essentially moving the 8-square king moves ring through a1-h8, dropping a completed SW slice as you add a new NE slice. Very nice. So the total loop is bK through the 10 square triangle times the wK loop over all legal squares. So 462 slices in total, with 19 in RAM, all sequentially loaded/saved from disk.

Rein Halbersma · Post by **Rein Halbersma** » Sun Mar 01, 2026 12:59 am

syzygy wrote: ↑Sat Feb 28, 2026 8:46 pm
So it is possible to have just 9 K-slice bitmaps simultaneously in RAM, but it would mean saving and reloading these bitmaps multiple times, which significantly increases I/O. Once you can keep 19 K-slice bitmaps in RAM, the amount of I/O needed seems to be identical to when you keep a bitmap for the full table (i.e. 462 K-slice) in RAM at once. That is a factor of 462/19=24.3 in RAM saved with no real downside.

There will be some more disk seeks because you need to load/save 462 smaller files instead of 1 big file, but this is negligible especially once you get to 7-piece and 8-piece tables. And an advantage of many smaller files is that stopping and restarting the generation process should be possible with much finer granulariy. The current code does not yet try to resume from where it left off, but this should be relatively easy to implement.

For larger boards, the savings are even better. I am looking at the perfect information subset of Stratego endgames. The board is 10x10 with two 2x2 lakes, so 92 squares. I’m looking at flagless positions (to see which piece combinations can capture each other). The first interesting endgames start at 2v2, but the count explodes because repetition counters are part of the state, giving 44 billion states for say Captain-Lieutenant vs Lieutenant-Sergeant. 5-piece endgames seemed out of reach but using 4-fold symmetry there are 23x91 (highest ranking piece for each side) “canonical” slices and only 2x10+1 have to be cached at any time (since pieces only move orthogonally). This saves a factor of 99.7. So I think even 5-piece endgames might be feasible.

I’m also thinking of 10x10 draughts with e.g. 5v5 king endgames. The question is how to pick canonical 2-king configs to slice over. Not sure how to do that.

Tablebase suggestion

Re: Tablebase suggestion

Re: Tablebase suggestion

Re: Tablebase suggestion

Re: Tablebase suggestion

Re: Tablebase suggestion

Re: Tablebase suggestion

Re: Tablebase suggestion

Re: Tablebase suggestion

Re: Tablebase suggestion

Re: Tablebase suggestion