Yes, I now use sparse form tables. And it's (N+1)^8, not 8^(N+1) as the pessimistic bound.syzygy wrote: ↑Sat May 23, 2026 4:14 pm Your approach looks at orbits first. Filling/coloring an orbit takes out that orbit completely and leaves the other orbits untouched. So no need to track partially-filled orbits. But you have more numbers to store per state: one for each coloring, and with (suicide) chess you need up to N+1 colors for N pieces (of course then you won't need to cover all 8^(N+1) possible ways to color the 8 squares of an orbit).
Tablebase suggestion
Moderator: Ras
-
Rein Halbersma
- Posts: 780
- Joined: Tue May 22, 2007 11:13 am
Re: Tablebase suggestion
-
syzygy
- Posts: 6010
- Joined: Tue Feb 28, 2012 11:56 pm
Re: Tablebase suggestion
I'll blame ChatGPT for lying to me that NMM has D16 as symmetry group.Rein Halbersma wrote: ↑Sat May 23, 2026 8:47 pmI use the notation D_2n for the regular n-gon which has order 2n. So chess has D8 and draughts has its subgroup D4. For Nine Men's Morris the group is D8 x C2 for regular endgames (35 subgroups) and D8 x S3 for the special case of 3-3 endgames (120 subgroups). The 3-3 case is special because it has full 3-ring permutation symmetry, whereas normal endgames have inner-outer ring symmetry (but only for edge-edge and corner-corner squares, the orbit size are 8, 8, 4 and 4). The 19 subgroups corresponds to the D16 group, but that's not the symmetry group of NMM.D_4 has 9 subgroups, so 10 possible groups of remaining symmetries. Each of them can occur as the stabilizer of a subset of an orbit.
Nine Men's Morris seems to have 19 possible groups of remaining symmetries.
Ralph Gasser's paper does not seem to say which group it is, but Figure 6 indeed makes clear that it is D8 x C2, with C2 turning the board inside out. (I actually played that game once, apparently it is a minigame in Assassin's Creed Black Flag
D8xC2 apparently has 35 subgroups, 27 conjugacy classes.
https://groupprops.subwiki.org/wiki/Sub ... _D8_and_Z2
-
Rein Halbersma
- Posts: 780
- Joined: Tue May 22, 2007 11:13 am
Re: Tablebase suggestion
ChatGPT happily ploughed through all this. I let it generate prefix tables for 22 classes: 1 with the trivial stabilizer (both kings uniquely fix the position) and 21 different tables (1 for each king pair, with white king on a1, b2, c3 or d4). Remaining pieces are still placed orbit by orbit, but the kings positions are remembered, i.e. effectively, the state for the recursion is now (kings positions, #orbits, #material, stabilizer)syzygy wrote: ↑Sat May 23, 2026 4:14 pm For regular chess we know that we have two kings with multiplicity 1. We can start by placing them in the usual way. This also avoids a lot of illegal positions.
Then in 441 out of 462 cases we're already done: no symmetries left, and the remaining pieces can be placed group by group using combinadics.
In the remaining 21 cases, both kings are on the main diagonal. There is now exactly one diagonal symmetry left.
There are 28 orbits of size 2, 6 orbits of size 1.
-
syzygy
- Posts: 6010
- Joined: Tue Feb 28, 2012 11:56 pm
Re: Tablebase suggestion
I don't think more is needed to prove the correctness of the combinadics approach?Rein Halbersma wrote: ↑Sat May 23, 2026 8:47 pmActually, this is a coincidence. For the symmetry-reduced indexing, you need the cumulative sums over the tables. Happily, for binomials this is again a binomial thanks to the hockey-stick identity. https://en.wikipedia.org/wiki/Hockey-st ... ity#Proofssyzygy wrote: ↑Sat May 23, 2026 1:16 am So if you start by looking at square 63, and you let "empty" choose the range 0..binom(n-1,k)-1, then filling square 63 with the kth piece means you add binom(63,k) to the index. Same for square 62,61,...,0, and you recover the combinadics approach to ranking/unranking.
Encoding the placement of k (identical) pieces on n squares is equivalent to encoding for each square whether it receives a piece or not (with "yes" being selected k times).
If you label the squares 0...n-1 and encode the selections starting from square n-1, you can use the lower part of the index range to encode "empty" and the higher part of the index range to encode "full". The size of the lower part of the index range must be exactly the number of ways to place k pieces on n-1 squares. The size of the higher part of the index range must be exactly the number of ways to place k-1 pieces on n-1 squares. So if you place the "last" k-th piece on square n-1, you need to add Bin(n-1, k) to the index, decrease k by 1, and continue.
If you leave square n-1 empty, you don't add anything to the index, and you move on to square n-2 (k remaining unchanged).
I guess this is very close to https://en.wikipedia.org/wiki/Hockey-st ... ty#Proof_2
-
syzygy
- Posts: 6010
- Joined: Tue Feb 28, 2012 11:56 pm
Re: Tablebase suggestion
Different tables for the 21 king placements on the diagonal should not be needed. Perhaps you need to renumber the orbits on the diagonal so that the empty ones are 0-5 and the king ones are 6,7, for example. This reordering commutes with the reflection.Rein Halbersma wrote: ↑Sat May 23, 2026 11:40 pmChatGPT happily ploughed through all this. I let it generate prefix tables for 22 classes: 1 with the trivial stabilizer (both kings uniquely fix the position) and 21 different tables (1 for each king pair, with white king on a1, b2, c3 or d4). Remaining pieces are still placed orbit by orbit, but the kings positions are remembered, i.e. effectively, the state for the recursion is now (kings positions, #orbits, #material, stabilizer)syzygy wrote: ↑Sat May 23, 2026 4:14 pm For regular chess we know that we have two kings with multiplicity 1. We can start by placing them in the usual way. This also avoids a lot of illegal positions.
Then in 441 out of 462 cases we're already done: no symmetries left, and the remaining pieces can be placed group by group using combinadics.
In the remaining 21 cases, both kings are on the main diagonal. There is now exactly one diagonal symmetry left.
There are 28 orbits of size 2, 6 orbits of size 1.
-
Ender8pieces
- Posts: 23
- Joined: Wed Jan 21, 2026 9:16 am
- Full name: JS
Re: Tablebase suggestion
I would like to propose an optimized approach for retrograde analysis of massive 8-piece endgame tablebases (like KQQRkqqr) using a 5D coordinate slicing format: (White King Square, Black King Square, Turn, Remaining Pieces Positions).
The Problem
Generating an 8-piece tablebase like KQQRkqqr involves managing around 6.24 Trillion unique states when exploiting full board and color symmetry. Even at an ultra-compressed 1 bit per state, the table requires over 780 GB of RAM for standard in-memory retrograde iterations. This puts 8-piece tablebase generation completely out of reach for standard consumer hardware and home servers.
The Claim
We can execute a full retrograde iteration in less than 10 minutes of pure disk I/O time, using only 14 GB of RAM and a standard 1 TB High-Speed Flash NVMe SSD (capable of 6 GB/s sequential reads).
The Solution: Smart 5D Slicing & Caching Tour
Instead of loading the entire state space into memory, we break the tablebase down into 1,128 legal canonical slice-steps based on the positions of the two kings and the turn:
White King: Restricted to the 10 canonical squares of the standard endgame octant (a1-d1, b2-d2, c3-d3, d4).
Black King: ~64 squares (excluding squares occupied by or adjacent to the White King, leaving 564 legal king pairs).
Turn: 2 states (White to move / Black to move, mapped via color symmetry).
Each individual slice contains only the placement combinations of the remaining 6 pieces (2 Queens and 1 Rook per side), weighing uniformly around 691 MB.
The Algorithmic Workflow:
Ultra-Low RAM Footprint: We dedicate a cache in RAM that holds up to 20 slices at any given time (~14 GB RAM), making it perfectly lightweight for standard desktop setups.
Sequential Graph Tour: We treat the tablebase generation as a sequence of localized block writes. To write and update a specific slice, we only need its direct lookahead neighbors in memory.
Graph Optimization: By utilizing a graph solver (CP-SAT) to optimize the sequencing of the write-pipeline, we can achieve a highly efficient caching tour. The solver successfully optimized this tour down to only 4,986 total slice reads per full iteration.
Leveraging Sequential Flash Speed: This represents a massive optimization—only a factor of x4.4 over the absolute theoretical lower bound (where every slice is read exactly once). Because each slice is large (~691 MB), the NVMe drive operates entirely in its optimal sequential read mode. Moving ~3.45 TB of data at 6 GB/s means the entire iteration's I/O overhead takes just under 10 minutes.
The Global Tour Verification File
I have resulting schedule file, global_tour.json, which contains the exact step-by-step execution path generated by the CP-SAT solver. It maps out all 4,986 disk reads and 1,128 slice writes sequentially, explicitly tracing which slices enter the cache, which ones are evicted, and the exact state of the RAM at any given step. - I'm not sure how to downlad so below I will print the first 10 steps.
This pipeline has been fully validated by an independent verification script, confirming that dependency requirements are met and the 20-slice memory ceiling is never violated.
I would love to hear your thoughts on this architecture, specifically regarding parallelizing the compute kernel while maintaining this strict sequential disk-caching pipeline
global_tour (10 steps)
[
{
"global_step": 0,
"write_slice": "('a1', 'a7', 'W')",
"reads_this_step_count": 5,
"accumulated_reads_so_far": 5,
"read_slices_this_step": [
"('a1', 'a7', 'W')",
"('a1', 'a7', 'B')",
"('b1', 'g1', 'B')",
"('b1', 'a7', 'B')",
"('b2', 'a7', 'B')"
],
"evicted_slices_this_step": [],
"cache": [
"('a1', 'a7', 'W')",
"('a1', 'a7', 'B')",
"('b1', 'g1', 'B')",
"('b1', 'a7', 'B')",
"('b2', 'a7', 'B')"
]
},
{
"global_step": 1,
"write_slice": "('a1', 'h6', 'W')",
"reads_this_step_count": 5,
"accumulated_reads_so_far": 10,
"read_slices_this_step": [
"('a1', 'h6', 'W')",
"('a1', 'h6', 'B')",
"('b1', 'h6', 'B')",
"('b1', 'f8', 'B')",
"('b2', 'h6', 'B')"
],
"evicted_slices_this_step": [],
"cache": [
"('a1', 'h6', 'W')",
"('a1', 'h6', 'B')",
"('a1', 'a7', 'W')",
"('a1', 'a7', 'B')",
"('b1', 'g1', 'B')",
"('b1', 'h6', 'B')",
"('b1', 'a7', 'B')",
"('b1', 'f8', 'B')",
"('b2', 'h6', 'B')",
"('b2', 'a7', 'B')"
]
},
{
"global_step": 2,
"write_slice": "('a1', 'g6', 'W')",
"reads_this_step_count": 5,
"accumulated_reads_so_far": 15,
"read_slices_this_step": [
"('a1', 'g6', 'W')",
"('a1', 'g6', 'B')",
"('b1', 'g6', 'B')",
"('b1', 'f7', 'B')",
"('b2', 'g6', 'B')"
],
"evicted_slices_this_step": [
"('a1', 'h6', 'W')",
"('a1', 'h6', 'B')",
"('a1', 'a7', 'W')",
"('a1', 'a7', 'B')",
"('b1', 'g1', 'B')",
"('b1', 'f8', 'B')",
"('b2', 'h6', 'B')",
"('b2', 'a7', 'B')"
],
"cache": [
"('a1', 'g6', 'W')",
"('a1', 'g6', 'B')",
"('b1', 'g6', 'B')",
"('b1', 'h6', 'B')",
"('b1', 'a7', 'B')",
"('b1', 'f7', 'B')",
"('b2', 'g6', 'B')"
]
},
{
"global_step": 3,
"write_slice": "('a1', 'c8', 'W')",
"reads_this_step_count": 5,
"accumulated_reads_so_far": 20,
"read_slices_this_step": [
"('a1', 'c8', 'W')",
"('a1', 'c8', 'B')",
"('b1', 'h3', 'B')",
"('b1', 'c8', 'B')",
"('b2', 'c8', 'B')"
],
"evicted_slices_this_step": [
"('a1', 'g6', 'W')",
"('a1', 'g6', 'B')",
"('b1', 'g6', 'B')",
"('b1', 'f7', 'B')",
"('b2', 'g6', 'B')"
],
"cache": [
"('a1', 'c8', 'W')",
"('a1', 'c8', 'B')",
"('b1', 'h3', 'B')",
"('b1', 'h6', 'B')",
"('b1', 'a7', 'B')",
"('b1', 'c8', 'B')",
"('b2', 'c8', 'B')"
]
},
{
"global_step": 4,
"write_slice": "('a1', 'f6', 'W')",
"reads_this_step_count": 5,
"accumulated_reads_so_far": 25,
"read_slices_this_step": [
"('a1', 'f6', 'W')",
"('a1', 'f6', 'B')",
"('b1', 'f6', 'B')",
"('b2', 'e6', 'B')",
"('b2', 'f6', 'B')"
],
"evicted_slices_this_step": [
"('a1', 'c8', 'B')",
"('b1', 'h3', 'B')",
"('b1', 'c8', 'B')",
"('b2', 'c8', 'B')"
],
"cache": [
"('a1', 'f6', 'W')",
"('a1', 'f6', 'B')",
"('a1', 'c8', 'W')",
"('b1', 'f6', 'B')",
"('b1', 'h6', 'B')",
"('b1', 'a7', 'B')",
"('b2', 'e6', 'B')",
"('b2', 'f6', 'B')"
]
},
{
"global_step": 5,
"write_slice": "('a1', 'e6', 'W')",
"reads_this_step_count": 4,
"accumulated_reads_so_far": 29,
"read_slices_this_step": [
"('a1', 'e6', 'W')",
"('a1', 'e6', 'B')",
"('b1', 'f5', 'B')",
"('b1', 'e6', 'B')"
],
"evicted_slices_this_step": [
"('a1', 'f6', 'W')",
"('a1', 'f6', 'B')",
"('b1', 'f6', 'B')",
"('b2', 'f6', 'B')"
],
"cache": [
"('a1', 'e6', 'W')",
"('a1', 'e6', 'B')",
"('a1', 'c8', 'W')",
"('b1', 'f5', 'B')",
"('b1', 'e6', 'B')",
"('b1', 'h6', 'B')",
"('b1', 'a7', 'B')",
"('b2', 'e6', 'B')"
]
},
{
"global_step": 6,
"write_slice": "('a1', 'h1', 'W')",
"reads_this_step_count": 5,
"accumulated_reads_so_far": 34,
"read_slices_this_step": [
"('a1', 'h1', 'W')",
"('a1', 'h1', 'B')",
"('b1', 'h1', 'B')",
"('b1', 'a8', 'B')",
"('b2', 'h1', 'B')"
],
"evicted_slices_this_step": [
"('a1', 'e6', 'W')",
"('a1', 'e6', 'B')",
"('b2', 'e6', 'B')"
],
"cache": [
"('a1', 'h1', 'W')",
"('a1', 'h1', 'B')",
"('a1', 'c8', 'W')",
"('b1', 'h1', 'B')",
"('b1', 'f5', 'B')",
"('b1', 'e6', 'B')",
"('b1', 'h6', 'B')",
"('b1', 'a7', 'B')",
"('b1', 'a8', 'B')",
"('b2', 'h1', 'B')"
]
},
{
"global_step": 7,
"write_slice": "('a1', 'd6', 'W')",
"reads_this_step_count": 5,
"accumulated_reads_so_far": 39,
"read_slices_this_step": [
"('a1', 'd6', 'W')",
"('a1', 'd6', 'B')",
"('b1', 'f4', 'B')",
"('b1', 'd6', 'B')",
"('b2', 'd6', 'B')"
],
"evicted_slices_this_step": [
"('a1', 'h1', 'W')",
"('a1', 'h1', 'B')",
"('b2', 'h1', 'B')"
],
"cache": [
"('a1', 'd6', 'W')",
"('a1', 'd6', 'B')",
"('a1', 'c8', 'W')",
"('b1', 'h1', 'B')",
"('b1', 'f4', 'B')",
"('b1', 'f5', 'B')",
"('b1', 'd6', 'B')",
"('b1', 'e6', 'B')",
"('b1', 'h6', 'B')",
"('b1', 'a7', 'B')",
"('b1', 'a8', 'B')",
"('b2', 'd6', 'B')"
]
},
{
"global_step": 8,
"write_slice": "('a1', 'c6', 'W')",
"reads_this_step_count": 5,
"accumulated_reads_so_far": 44,
"read_slices_this_step": [
"('a1', 'c6', 'W')",
"('a1', 'c6', 'B')",
"('b1', 'f3', 'B')",
"('b1', 'c6', 'B')",
"('b2', 'c6', 'B')"
],
"evicted_slices_this_step": [
"('a1', 'd6', 'W')",
"('a1', 'd6', 'B')",
"('b1', 'f4', 'B')",
"('b1', 'd6', 'B')"
],
"cache": [
"('a1', 'c6', 'W')",
"('a1', 'c6', 'B')",
"('a1', 'c8', 'W')",
"('b1', 'h1', 'B')",
"('b1', 'f3', 'B')",
"('b1', 'f5', 'B')",
"('b1', 'c6', 'B')",
"('b1', 'e6', 'B')",
"('b1', 'h6', 'B')",
"('b1', 'a7', 'B')",
"('b1', 'a8', 'B')",
"('b2', 'c6', 'B')",
"('b2', 'd6', 'B')"
]
},
{
"global_step": 9,
"write_slice": "('a1', 'b6', 'W')",
"reads_this_step_count": 5,
"accumulated_reads_so_far": 49,
"read_slices_this_step": [
"('a1', 'b6', 'W')",
"('a1', 'b6', 'B')",
"('b1', 'f2', 'B')",
"('b1', 'b6', 'B')",
"('b2', 'b6', 'B')"
],
"evicted_slices_this_step": [
"('a1', 'c6', 'W')",
"('a1', 'c6', 'B')",
"('b1', 'c6', 'B')",
"('b2', 'c6', 'B')",
"('b2', 'd6', 'B')"
],
"cache": [
"('a1', 'b6', 'W')",
"('a1', 'b6', 'B')",
"('a1', 'c8', 'W')",
"('b1', 'h1', 'B')",
"('b1', 'f2', 'B')",
"('b1', 'f3', 'B')",
"('b1', 'f5', 'B')",
"('b1', 'b6', 'B')",
"('b1', 'e6', 'B')",
"('b1', 'h6', 'B')",
"('b1', 'a7', 'B')",
"('b1', 'a8', 'B')",
"('b2', 'b6', 'B')"
]
},
{
"global_step": 10,
"write_slice": "('a1', 'a6', 'W')",
"reads_this_step_count": 5,
"accumulated_reads_so_far": 54,
"read_slices_this_step": [
"('a1', 'a6', 'W')",
"('a1', 'a6', 'B')",
"('b1', 'f1', 'B')",
"('b1', 'a6', 'B')",
"('b2', 'a6', 'B')"
],
"evicted_slices_this_step": [
"('a1', 'b6', 'W')",
"('a1', 'b6', 'B')",
"('b1', 'b6', 'B')",
"('b2', 'b6', 'B')"
],
"cache": [
"('a1', 'a6', 'W')",
"('a1', 'a6', 'B')",
"('a1', 'c8', 'W')",
"('b1', 'f1', 'B')",
"('b1', 'h1', 'B')",
"('b1', 'f2', 'B')",
"('b1', 'f3', 'B')",
"('b1', 'f5', 'B')",
"('b1', 'a6', 'B')",
"('b1', 'e6', 'B')",
"('b1', 'h6', 'B')",
"('b1', 'a7', 'B')",
"('b1', 'a8', 'B')",
"('b2', 'a6', 'B')"
]
},
The Problem
Generating an 8-piece tablebase like KQQRkqqr involves managing around 6.24 Trillion unique states when exploiting full board and color symmetry. Even at an ultra-compressed 1 bit per state, the table requires over 780 GB of RAM for standard in-memory retrograde iterations. This puts 8-piece tablebase generation completely out of reach for standard consumer hardware and home servers.
The Claim
We can execute a full retrograde iteration in less than 10 minutes of pure disk I/O time, using only 14 GB of RAM and a standard 1 TB High-Speed Flash NVMe SSD (capable of 6 GB/s sequential reads).
The Solution: Smart 5D Slicing & Caching Tour
Instead of loading the entire state space into memory, we break the tablebase down into 1,128 legal canonical slice-steps based on the positions of the two kings and the turn:
White King: Restricted to the 10 canonical squares of the standard endgame octant (a1-d1, b2-d2, c3-d3, d4).
Black King: ~64 squares (excluding squares occupied by or adjacent to the White King, leaving 564 legal king pairs).
Turn: 2 states (White to move / Black to move, mapped via color symmetry).
Each individual slice contains only the placement combinations of the remaining 6 pieces (2 Queens and 1 Rook per side), weighing uniformly around 691 MB.
The Algorithmic Workflow:
Ultra-Low RAM Footprint: We dedicate a cache in RAM that holds up to 20 slices at any given time (~14 GB RAM), making it perfectly lightweight for standard desktop setups.
Sequential Graph Tour: We treat the tablebase generation as a sequence of localized block writes. To write and update a specific slice, we only need its direct lookahead neighbors in memory.
Graph Optimization: By utilizing a graph solver (CP-SAT) to optimize the sequencing of the write-pipeline, we can achieve a highly efficient caching tour. The solver successfully optimized this tour down to only 4,986 total slice reads per full iteration.
Leveraging Sequential Flash Speed: This represents a massive optimization—only a factor of x4.4 over the absolute theoretical lower bound (where every slice is read exactly once). Because each slice is large (~691 MB), the NVMe drive operates entirely in its optimal sequential read mode. Moving ~3.45 TB of data at 6 GB/s means the entire iteration's I/O overhead takes just under 10 minutes.
The Global Tour Verification File
I have resulting schedule file, global_tour.json, which contains the exact step-by-step execution path generated by the CP-SAT solver. It maps out all 4,986 disk reads and 1,128 slice writes sequentially, explicitly tracing which slices enter the cache, which ones are evicted, and the exact state of the RAM at any given step. - I'm not sure how to downlad so below I will print the first 10 steps.
This pipeline has been fully validated by an independent verification script, confirming that dependency requirements are met and the 20-slice memory ceiling is never violated.
I would love to hear your thoughts on this architecture, specifically regarding parallelizing the compute kernel while maintaining this strict sequential disk-caching pipeline
global_tour (10 steps)
[
{
"global_step": 0,
"write_slice": "('a1', 'a7', 'W')",
"reads_this_step_count": 5,
"accumulated_reads_so_far": 5,
"read_slices_this_step": [
"('a1', 'a7', 'W')",
"('a1', 'a7', 'B')",
"('b1', 'g1', 'B')",
"('b1', 'a7', 'B')",
"('b2', 'a7', 'B')"
],
"evicted_slices_this_step": [],
"cache": [
"('a1', 'a7', 'W')",
"('a1', 'a7', 'B')",
"('b1', 'g1', 'B')",
"('b1', 'a7', 'B')",
"('b2', 'a7', 'B')"
]
},
{
"global_step": 1,
"write_slice": "('a1', 'h6', 'W')",
"reads_this_step_count": 5,
"accumulated_reads_so_far": 10,
"read_slices_this_step": [
"('a1', 'h6', 'W')",
"('a1', 'h6', 'B')",
"('b1', 'h6', 'B')",
"('b1', 'f8', 'B')",
"('b2', 'h6', 'B')"
],
"evicted_slices_this_step": [],
"cache": [
"('a1', 'h6', 'W')",
"('a1', 'h6', 'B')",
"('a1', 'a7', 'W')",
"('a1', 'a7', 'B')",
"('b1', 'g1', 'B')",
"('b1', 'h6', 'B')",
"('b1', 'a7', 'B')",
"('b1', 'f8', 'B')",
"('b2', 'h6', 'B')",
"('b2', 'a7', 'B')"
]
},
{
"global_step": 2,
"write_slice": "('a1', 'g6', 'W')",
"reads_this_step_count": 5,
"accumulated_reads_so_far": 15,
"read_slices_this_step": [
"('a1', 'g6', 'W')",
"('a1', 'g6', 'B')",
"('b1', 'g6', 'B')",
"('b1', 'f7', 'B')",
"('b2', 'g6', 'B')"
],
"evicted_slices_this_step": [
"('a1', 'h6', 'W')",
"('a1', 'h6', 'B')",
"('a1', 'a7', 'W')",
"('a1', 'a7', 'B')",
"('b1', 'g1', 'B')",
"('b1', 'f8', 'B')",
"('b2', 'h6', 'B')",
"('b2', 'a7', 'B')"
],
"cache": [
"('a1', 'g6', 'W')",
"('a1', 'g6', 'B')",
"('b1', 'g6', 'B')",
"('b1', 'h6', 'B')",
"('b1', 'a7', 'B')",
"('b1', 'f7', 'B')",
"('b2', 'g6', 'B')"
]
},
{
"global_step": 3,
"write_slice": "('a1', 'c8', 'W')",
"reads_this_step_count": 5,
"accumulated_reads_so_far": 20,
"read_slices_this_step": [
"('a1', 'c8', 'W')",
"('a1', 'c8', 'B')",
"('b1', 'h3', 'B')",
"('b1', 'c8', 'B')",
"('b2', 'c8', 'B')"
],
"evicted_slices_this_step": [
"('a1', 'g6', 'W')",
"('a1', 'g6', 'B')",
"('b1', 'g6', 'B')",
"('b1', 'f7', 'B')",
"('b2', 'g6', 'B')"
],
"cache": [
"('a1', 'c8', 'W')",
"('a1', 'c8', 'B')",
"('b1', 'h3', 'B')",
"('b1', 'h6', 'B')",
"('b1', 'a7', 'B')",
"('b1', 'c8', 'B')",
"('b2', 'c8', 'B')"
]
},
{
"global_step": 4,
"write_slice": "('a1', 'f6', 'W')",
"reads_this_step_count": 5,
"accumulated_reads_so_far": 25,
"read_slices_this_step": [
"('a1', 'f6', 'W')",
"('a1', 'f6', 'B')",
"('b1', 'f6', 'B')",
"('b2', 'e6', 'B')",
"('b2', 'f6', 'B')"
],
"evicted_slices_this_step": [
"('a1', 'c8', 'B')",
"('b1', 'h3', 'B')",
"('b1', 'c8', 'B')",
"('b2', 'c8', 'B')"
],
"cache": [
"('a1', 'f6', 'W')",
"('a1', 'f6', 'B')",
"('a1', 'c8', 'W')",
"('b1', 'f6', 'B')",
"('b1', 'h6', 'B')",
"('b1', 'a7', 'B')",
"('b2', 'e6', 'B')",
"('b2', 'f6', 'B')"
]
},
{
"global_step": 5,
"write_slice": "('a1', 'e6', 'W')",
"reads_this_step_count": 4,
"accumulated_reads_so_far": 29,
"read_slices_this_step": [
"('a1', 'e6', 'W')",
"('a1', 'e6', 'B')",
"('b1', 'f5', 'B')",
"('b1', 'e6', 'B')"
],
"evicted_slices_this_step": [
"('a1', 'f6', 'W')",
"('a1', 'f6', 'B')",
"('b1', 'f6', 'B')",
"('b2', 'f6', 'B')"
],
"cache": [
"('a1', 'e6', 'W')",
"('a1', 'e6', 'B')",
"('a1', 'c8', 'W')",
"('b1', 'f5', 'B')",
"('b1', 'e6', 'B')",
"('b1', 'h6', 'B')",
"('b1', 'a7', 'B')",
"('b2', 'e6', 'B')"
]
},
{
"global_step": 6,
"write_slice": "('a1', 'h1', 'W')",
"reads_this_step_count": 5,
"accumulated_reads_so_far": 34,
"read_slices_this_step": [
"('a1', 'h1', 'W')",
"('a1', 'h1', 'B')",
"('b1', 'h1', 'B')",
"('b1', 'a8', 'B')",
"('b2', 'h1', 'B')"
],
"evicted_slices_this_step": [
"('a1', 'e6', 'W')",
"('a1', 'e6', 'B')",
"('b2', 'e6', 'B')"
],
"cache": [
"('a1', 'h1', 'W')",
"('a1', 'h1', 'B')",
"('a1', 'c8', 'W')",
"('b1', 'h1', 'B')",
"('b1', 'f5', 'B')",
"('b1', 'e6', 'B')",
"('b1', 'h6', 'B')",
"('b1', 'a7', 'B')",
"('b1', 'a8', 'B')",
"('b2', 'h1', 'B')"
]
},
{
"global_step": 7,
"write_slice": "('a1', 'd6', 'W')",
"reads_this_step_count": 5,
"accumulated_reads_so_far": 39,
"read_slices_this_step": [
"('a1', 'd6', 'W')",
"('a1', 'd6', 'B')",
"('b1', 'f4', 'B')",
"('b1', 'd6', 'B')",
"('b2', 'd6', 'B')"
],
"evicted_slices_this_step": [
"('a1', 'h1', 'W')",
"('a1', 'h1', 'B')",
"('b2', 'h1', 'B')"
],
"cache": [
"('a1', 'd6', 'W')",
"('a1', 'd6', 'B')",
"('a1', 'c8', 'W')",
"('b1', 'h1', 'B')",
"('b1', 'f4', 'B')",
"('b1', 'f5', 'B')",
"('b1', 'd6', 'B')",
"('b1', 'e6', 'B')",
"('b1', 'h6', 'B')",
"('b1', 'a7', 'B')",
"('b1', 'a8', 'B')",
"('b2', 'd6', 'B')"
]
},
{
"global_step": 8,
"write_slice": "('a1', 'c6', 'W')",
"reads_this_step_count": 5,
"accumulated_reads_so_far": 44,
"read_slices_this_step": [
"('a1', 'c6', 'W')",
"('a1', 'c6', 'B')",
"('b1', 'f3', 'B')",
"('b1', 'c6', 'B')",
"('b2', 'c6', 'B')"
],
"evicted_slices_this_step": [
"('a1', 'd6', 'W')",
"('a1', 'd6', 'B')",
"('b1', 'f4', 'B')",
"('b1', 'd6', 'B')"
],
"cache": [
"('a1', 'c6', 'W')",
"('a1', 'c6', 'B')",
"('a1', 'c8', 'W')",
"('b1', 'h1', 'B')",
"('b1', 'f3', 'B')",
"('b1', 'f5', 'B')",
"('b1', 'c6', 'B')",
"('b1', 'e6', 'B')",
"('b1', 'h6', 'B')",
"('b1', 'a7', 'B')",
"('b1', 'a8', 'B')",
"('b2', 'c6', 'B')",
"('b2', 'd6', 'B')"
]
},
{
"global_step": 9,
"write_slice": "('a1', 'b6', 'W')",
"reads_this_step_count": 5,
"accumulated_reads_so_far": 49,
"read_slices_this_step": [
"('a1', 'b6', 'W')",
"('a1', 'b6', 'B')",
"('b1', 'f2', 'B')",
"('b1', 'b6', 'B')",
"('b2', 'b6', 'B')"
],
"evicted_slices_this_step": [
"('a1', 'c6', 'W')",
"('a1', 'c6', 'B')",
"('b1', 'c6', 'B')",
"('b2', 'c6', 'B')",
"('b2', 'd6', 'B')"
],
"cache": [
"('a1', 'b6', 'W')",
"('a1', 'b6', 'B')",
"('a1', 'c8', 'W')",
"('b1', 'h1', 'B')",
"('b1', 'f2', 'B')",
"('b1', 'f3', 'B')",
"('b1', 'f5', 'B')",
"('b1', 'b6', 'B')",
"('b1', 'e6', 'B')",
"('b1', 'h6', 'B')",
"('b1', 'a7', 'B')",
"('b1', 'a8', 'B')",
"('b2', 'b6', 'B')"
]
},
{
"global_step": 10,
"write_slice": "('a1', 'a6', 'W')",
"reads_this_step_count": 5,
"accumulated_reads_so_far": 54,
"read_slices_this_step": [
"('a1', 'a6', 'W')",
"('a1', 'a6', 'B')",
"('b1', 'f1', 'B')",
"('b1', 'a6', 'B')",
"('b2', 'a6', 'B')"
],
"evicted_slices_this_step": [
"('a1', 'b6', 'W')",
"('a1', 'b6', 'B')",
"('b1', 'b6', 'B')",
"('b2', 'b6', 'B')"
],
"cache": [
"('a1', 'a6', 'W')",
"('a1', 'a6', 'B')",
"('a1', 'c8', 'W')",
"('b1', 'f1', 'B')",
"('b1', 'h1', 'B')",
"('b1', 'f2', 'B')",
"('b1', 'f3', 'B')",
"('b1', 'f5', 'B')",
"('b1', 'a6', 'B')",
"('b1', 'e6', 'B')",
"('b1', 'h6', 'B')",
"('b1', 'a7', 'B')",
"('b1', 'a8', 'B')",
"('b2', 'a6', 'B')"
]
},
-
syzygy
- Posts: 6010
- Joined: Tue Feb 28, 2012 11:56 pm
Re: Tablebase suggestion
You can tell your LLM that the number is 462...Ender8pieces wrote: ↑Wed May 27, 2026 11:33 amBlack King: ~64 squares (excluding squares occupied by or adjacent to the White King, leaving 564 legal king pairs).
That's 1319 MB per slice at 1 bit per position for one side to move. You can't divide this by 2 (except when both Ks are on the diagonal and you use really complicated indexing for those slices).Turn: 2 states (White to move / Black to move, mapped via color symmetry).
Each individual slice contains only the placement combinations of the remaining 6 pieces (2 Queens and 1 Rook per side), weighing uniformly around 691 MB.
I have already shown how to do it earlier in this thread. You can't beat 19 (20 if you need a scratch slice).Graph Optimization: By utilizing a graph solver (CP-SAT) to optimize the sequencing of the write-pipeline, we can achieve a highly efficient caching tour. The solver successfully optimized this tour down to only 4,986 total slice reads per full iteration.
(And it turns out Urban Koistinen posted the same idea in 2010, but did not implement it.)
No, you only need to reach each slice once if you do it right. (You do need to read further slices with the wins <= n etc, but that cannot be avoided even if you can hold all 462 slices at one.)Leveraging Sequential Flash Speed: This represents a massive optimization—only a factor of x4.4 over the absolute theoretical lower bound (where every slice is read exactly once). Because each slice is large (~691 MB), the NVMe drive operates entirely in its optimal sequential read mode. Moving ~3.45 TB of data at 6 GB/s means the entire iteration's I/O overhead takes just under 10 minutes.
I have already written the generator and generated KNNNNvKRR. It should work for KQQRvKQQR too.I would love to hear your thoughts on this architecture, specifically regarding parallelizing the compute kernel while maintaining this strict sequential disk-caching pipeline
The generator generates both sides of KQQRvKQQR, which indeed should in theory be avoidable to get an up to 2x speed up, but there are not so many symmetrical tables, so I don't think it is worth the trouble. The generator does make use of the symmetric material to get the compressed size down.
-
Ender8pieces
- Posts: 23
- Joined: Wed Jan 21, 2026 9:16 am
- Full name: JS
Re: Tablebase suggestion
Thank you for the informative response. I was clearly out of the loop since I was not aware of the new generator. I'm very glad to hear the news.
I really hope someone with the necessary hardware resources takes this new generator and uses it to build the full 8-piece tablebases!
I informed my LLM about the 462 slices... The numbers slightly updated, but I would like to understand more about your claim since it's stronger (at least in running time).
I actually do remember your suggestion (and apparently also Urban Koistinen's suggestion from 2010) for the 19/462, but my understanding is that it does not implement symmetry, though maybe I'm wrong.
Let's assume you save in the cache all slices with the black king in rows 1, 2, and part of row 3.
Now you would like to apply an iteration ("write") on the slice (wkd4, bka1) on White's turn.
If the White king's move is d4-d5, the Black king replicates to a8, which is not in the cache, right? Am I missing something?
Maybe I don't understand the "one side to move" correctly?
I really hope someone with the necessary hardware resources takes this new generator and uses it to build the full 8-piece tablebases!
I informed my LLM about the 462 slices... The numbers slightly updated, but I would like to understand more about your claim since it's stronger (at least in running time).
I actually do remember your suggestion (and apparently also Urban Koistinen's suggestion from 2010) for the 19/462, but my understanding is that it does not implement symmetry, though maybe I'm wrong.
Let's assume you save in the cache all slices with the black king in rows 1, 2, and part of row 3.
Now you would like to apply an iteration ("write") on the slice (wkd4, bka1) on White's turn.
If the White king's move is d4-d5, the Black king replicates to a8, which is not in the cache, right? Am I missing something?
Maybe I don't understand the "one side to move" correctly?
-
syzygy
- Posts: 6010
- Joined: Tue Feb 28, 2012 11:56 pm
Re: Tablebase suggestion
The canonical king placements have the wK in a1-d1-d4 and the bK either anywhere or, if the wK is on a1-d4, on or below the a1h8 diagonal.Ender8pieces wrote: ↑Thu May 28, 2026 8:44 am Thank you for the informative response. I was clearly out of the loop since I was not aware of the new generator. I'm very glad to hear the news.
I really hope someone with the necessary hardware resources takes this new generator and uses it to build the full 8-piece tablebases!
I informed my LLM about the 462 slices... The numbers slightly updated, but I would like to understand more about your claim since it's stronger (at least in running time).
I actually do remember your suggestion (and apparently also Urban Koistinen's suggestion from 2010) for the 19/462, but my understanding is that it does not implement symmetry, though maybe I'm wrong.
Let's assume you save in the cache all slices with the black king in rows 1, 2, and part of row 3.
Now you would like to apply an iteration ("write") on the slice (wkd4, bka1) on White's turn.
If the White king's move is d4-d5, the Black king replicates to a8, which is not in the cache, right? Am I missing something?
Maybe I don't understand the "one side to move" correctly?
In an iteration where black (including the bK) moves or unmoves, the wK loops over a1-d1-d4 as the outer loop, and the bK loops over a1-h1-a2-h2-a3-h3...-a8-h8 (skipping the squares above the diagonal if the wK is on a1-d4) as the inner loop.
In an iteration where white (including the wK) moves or unmoves, you do the same: bK loops over a1-d1-d4 as the outer loop, and the wK loops over a1-h1-a2-h2-...-a8-h8 (skipping the squares above the diagonal) as the inner loop. But now you have to load the slice that corresponds to the canonical position of the wK and the bK (and its neighbouring slices). So if the bK is on d1 and the wK on h2, you load (b1, a5).
Note that canonicalization is already doing its work when the bK moves/unmoves and the wK is fixed. If the wK is on a1 and the bK is on d4, you would need to load (a1,c4) [which is (a1,d3)], but you can skip that because you are already loading (a1,d3).
My generator explicitly masks out c4 if wK is on a1, but I think it wouldn't need to do that because it would notice anyway that the slice (a1,c4), after canonicalization, has already been loaded.
The masking out is even a bit risky, except that it can never go wrong for king moves. If the wK is on the diagonal and the bK is too, then bK to a square above is just the mirror of the bK to a square below the diagonal (e.g. (a1,d4)->(a1,d5)=(a1,e4)), which is already loaded. If the wK is on the diagonal and the bK is below, then the bK can reach a square above the diagonal only by moving to its own mirror square (e.g. (a1,e4)->(a1,d5)=(a1,e4)), which is already loaded. If the kings moves like knight, this would seem to go wrong. So conceptually safer is to try to load all canonical slices around the moving/unmoving bK (or around the moving/unmoving wK).
Note that in one iteration every slice is just loaded and saved once. It is the same amount of I/O as when you keep all slices in memory. You just split all sequential reads/writes of huge bitmaps into 462 sequential reads/writes of smaller bitmaps with a seek in between.
-
Ender8pieces
- Posts: 23
- Joined: Wed Jan 21, 2026 9:16 am
- Full name: JS
Re: Tablebase suggestion
I think I understand it now - thats realy clever.
So the claim is indeed 19/924, i.e. We need only 2% RAM and we lose the ratio time of RAM speed/disk speed in sequential.
It might be still intersting to try to redcue further the RAM consumption in the cost of time (more reads from disk). However, It seems caches under 9 (num of neighbours) slices will suffer high degradation.
I saw the next thread viewtopic.php?t=86313, and I will be asking questions on the global plan (it seems more appropriate). I will be glad to learn your analysis.
So the claim is indeed 19/924, i.e. We need only 2% RAM and we lose the ratio time of RAM speed/disk speed in sequential.
It might be still intersting to try to redcue further the RAM consumption in the cost of time (more reads from disk). However, It seems caches under 9 (num of neighbours) slices will suffer high degradation.
I saw the next thread viewtopic.php?t=86313, and I will be asking questions on the global plan (it seems more appropriate). I will be glad to learn your analysis.