Ender8pieces wrote: ↑Thu Feb 05, 2026 8:20 pm
I’ve been running a quantitative check on the 19-slice flow, and I’d like to get your thoughts regarding the "Side to Move" (STM) dependency.
When we are processing a slice to be written for White to Move (WTM), we essentially iterate through all legal Black responses. However, when the iteration reaches the Black to Move (BTM) stage, we need to read those same source slices again to calculate the values for the other side.
So each slice is read twice per iteration, once for WTM and once for BTM, or we could hold 38 slices (19 for each side) out of the 924 total slices in RAM to avoid the re-read.
The first half of the iteration is:
Code: Select all
S1 = Load(W, WIN_IN_N(n));      // wtm positions that are win-in-n
R1 = Predecessor(S1);           // btm positions with a move into S1
S2 = Load(B, UNKNOWN);          // btm positions not yet resolved
S2 = S2 & R1;                   // unresolved candidates for loss-in-n
----
R2 = Load(W, WIN_IN<=N(n));     // wtm positions that are win-in-k, k <= n
S3 = ProveSuccessor(S2, R2);    // keep candidates whose moves all land in R2
B = Add(S3, LOSE_IN_N(n));      // record the proven btm losses
----
R3 = Predecessor(S3);           // wtm positions with a move into S3
S4 = Load(W, UNKNOWN);          // wtm positions not yet resolved
S4 = S4 & R3;                   // unresolved positions that win by moving into S3
W = Add(S4, WIN_IN_N(n+1));     // record the new wtm wins
(and then we do the same steps with W and B reversed)
R1 is the set of btm positions that lead, by a black move, to a wtm position that is a win-in-n. They are potentially loss-in-n. (This counts in moves; when generating DTZ it is better to count in plies, at least if you care about the 50-move rule.)
S2 is the subset of R1 of not-yet-resolved positions.
R2 is the set of wtm positions that are win-in-k with k<=n. R2 is used to check which positions in S2 are really loss-in-n.
S3 is that set of btm loss-in-n positions. (So S3 is a subset of S2, which is a subset of R1.)
R3 is the set of wtm positions that lead, by a white move, to a btm position that is a loss-in-n. Those that are not yet resolved (R3 & S4) are WIN_IN_(N+1).
[So R1 and R3 are constructed in RAM and then written to disk when they are dropped from memory. R2 is loaded from disk and then just dropped.]
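To make the streaming concrete, this is roughly what the S2 = Load(B, UNKNOWN) & R1 step looks like in C, assuming the slices are streamed as flat arrays of 64-bit words (the buffer layout and I/O here are only an illustration, not how any particular generator does it):
Code: Select all
#include <stdint.h>
#include <stdio.h>

/* AND the R1 candidate bitmap (built in RAM by Predecessor) with the
   UNKNOWN bitmap streamed in from B, producing S2 in place; then write
   the finished S2 slice to disk before the buffer is reused. */
static int make_s2_slice(uint64_t *r1, size_t nwords,
                         FILE *b_unknown, FILE *s2_out)
{
    uint64_t buf[4096];
    for (size_t done = 0; done < nwords; ) {
        size_t want = nwords - done < 4096 ? nwords - done : 4096;
        if (fread(buf, sizeof buf[0], want, b_unknown) != want)
            return -1;
        for (size_t i = 0; i < want; i++)
            r1[done + i] &= buf[i];   /* keep only not-yet-resolved candidates */
        done += want;
    }
    return fwrite(r1, sizeof r1[0], nwords, s2_out) == nwords ? 0 : -1;
}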
What you mean is that W is read in both the first and the second part of the iteration: first to create R1 (and from it S2), then again as R2.
However, we cannot use the R2 K-slices to calculate S3 from S2 until enough of S2 has been created.
For example, we would normally already have dropped the a1 K-slices before we calculate the c3 K-slice, but both a1 and c3 are needed to calculate the b2 K-slice of S3.
So you would need to keep the R2 K-slices quite a bit longer in memory. Judging from the a1-b2-c3 example, we need a1 up to d4, so 3*8+4 = 28 K-slices instead of 19 K-slices. So in total 19+28 = 47 K-slices simultaneously in RAM, which means 242.2 GB, so still doable if you have 256 GB.
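Just to make that count explicit, with the K-slices in row-major order by king square (a1, b1, ..., h1, a2, ...), a1 through d4 is indeed 28 slices:
Code: Select all
#include <stdio.h>

/* K-slice index of a king square in row-major order: a1=0, b1=1, ..., h8=63. */
static int slice_index(char file, char rank)
{
    return (file - 'a') + 8 * (rank - '1');
}

int main(void)
{
    /* R2 K-slices must stay resident from a1 until d4 has been handled. */
    int r2_window = slice_index('d', '4') - slice_index('a', '1') + 1;
    printf("R2 window: %d K-slices, resident total: %d\n",
           r2_window, 19 + r2_window);   /* prints 28 and 47 */
    return 0;
}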
But I think there is a better way of using more memory (2 bits per position):
Wu-Beal treats two plies at once by counting moves instead. This is problematic for DTZ50. Bourzutschky's generator does the same (see the readme.txt).
My generator counts in plies, but it works on two plies at once.
To translate this to Wu-Beal, it loads WIN_IN_(N-1) and WIN_IN_N into S1 (without distinguishing, so 1 bit per position).
The ProveSuccessor(S2, R2) step that produces S3 then checks whether a position marked as a potential loss is LOSS_IN_N or LOSS_IN_(N+1).
For the first part of the algorithm, this is free. (You just have a bigger subset in S1.)
The middle part is more tricky. Loading R2 from W is still the same amount of streaming (all of W) but you need two bits per position (or one 3-valued "trit" with 5 trits per byte). ProveSuccessor(S2,R2) is a bit more work but not that much. S3 will again be a 3-valued tritmap (or more simply 2 bits per position). Same for R3 and S4.
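On the trit option: 3^5 = 243 <= 256, so five trits do indeed fit in a byte. A minimal sketch of such packing (helper names made up):
Code: Select all
#include <stdint.h>
#include <assert.h>

static const int POW3[5] = { 1, 3, 9, 27, 81 };

/* Read trit i (0..4) from a packed byte. */
static int trit_get(uint8_t b, int i)
{
    return (b / POW3[i]) % 3;
}

/* Return the byte with trit i (0..4) replaced by v (0..2). */
static uint8_t trit_set(uint8_t b, int i, int v)
{
    return (uint8_t)(b + (v - trit_get(b, i)) * POW3[i]);
}

int main(void)
{
    uint8_t b = 0;
    b = trit_set(b, 0, 2);
    b = trit_set(b, 4, 1);
    assert(trit_get(b, 0) == 2 && trit_get(b, 4) == 1);
    return 0;
}
The division and modulo make lookups slower than plain 2-bit shifts, so the trits only pay off if the 20% space saving (5 instead of 4 positions per byte) really matters.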
Can we do more?
If we also load LOSS_IN_(N-1) and LOSS_IN_N as part of S1, the Predecessor(S1) operation also finds WIN_IN_N and WIN_IN_(N+1).
Suppose we have 2 bits per position in S1, so values 0-3:
0: neither LOSS_IN_(N-1)/N nor WIN_IN_(N-1)/N
1: one of the two wins
2: LOSS_IN_(N-1)
3: LOSS_IN_N
Now Predecessor(S1) constructs K-slices with values 0-3:
0: no predecessor
1: potential loss
2: WIN_IN_N
3: WIN_IN_(N+1)
All this provided the position is still unknown, so we still must AND with the UNKNOWN bitmap from B to obtain the S2 that goes to disk.
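Note that the encoding is chosen so that the value simply carries over from S1 to the predecessor: 1 -> potential loss, 2 -> WIN_IN_N, 3 -> WIN_IN_(N+1). The only subtlety is merging when one position is reached from several S1 positions; my reading (not necessarily how any existing generator does it) is that a proven win must beat a potential loss, and WIN_IN_N must beat WIN_IN_(N+1):
Code: Select all
/* Marks in the Predecessor(S1) output, as listed above:
   0 = no predecessor, 1 = potential loss, 2 = WIN_IN_N, 3 = WIN_IN_(N+1). */
static unsigned merge_mark(unsigned cur, unsigned add)
{
    /* Priority: WIN_IN_N > WIN_IN_(N+1) > potential loss > nothing. */
    static const unsigned prio[4] = { 0, 1, 3, 2 };
    return prio[add] > prio[cur] ? add : cur;
}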
Then, in the 2nd part of the iteration, we stream through S2 (while loading 2-bit R2 K-slices). Each win can be added directly to B. Each potential loss still has to be verified and its precise dtz determined; it is then added to B.
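In C that second part could look roughly like this, over a 2-bit packed S2 K-slice (add_win_to_B, add_loss_to_B and verify_loss are hypothetical hooks into the rest of the generator; verify_loss is the per-position ProveSuccessor against the resident 2-bit R2 K-slices):
Code: Select all
#include <stdint.h>
#include <stddef.h>

/* Hypothetical hooks into the rest of the generator. */
void add_win_to_B(uint64_t pos, int dtz);
void add_loss_to_B(uint64_t pos, int dtz);
int  verify_loss(uint64_t pos, int n);  /* returns n, n+1, or -1 if not a loss */

/* Resolve one S2 K-slice stored as 2-bit values: 0 = nothing,
   1 = potential loss, 2 = WIN_IN_N, 3 = WIN_IN_(N+1). */
void resolve_s2_slice(const uint8_t *s2, uint64_t base, size_t npos, int n)
{
    for (size_t i = 0; i < npos; i++) {
        unsigned v = (s2[i >> 2] >> ((i & 3) * 2)) & 3;
        if (v == 2 || v == 3)
            add_win_to_B(base + i, v == 2 ? n : n + 1); /* dtz already known */
        else if (v == 1) {
            int dtz = verify_loss(base + i, n);  /* all moves must land in R2 */
            if (dtz >= 0)
                add_loss_to_B(base + i, dtz);    /* LOSS_IN_N or LOSS_IN_(N+1) */
        }
    }
}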
Now the 3rd part of the iteration is the 1st part of the half iteration for the other side to move.
To make this work we have to stagger the dtz: n-1/n for wtm, then n/n+1 for btm, then n+1/n+2 for wtm. (As my generator does.)
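As a sketch of the resulting driver loop (do_half_iteration is a hypothetical stand-in for the three parts above):
Code: Select all
/* Hypothetical half-iteration: the three parts above for one side to move,
   working on the staggered dtz pair (n-1, n). */
void do_half_iteration(int wtm, int n_lo, int n_hi);

/* Staggered driver: wtm handles n-1/n, btm then handles n/n+1, and so on. */
void iterate(int max_n)
{
    int wtm = 1;
    for (int n = 1; n <= max_n; n++, wtm = !wtm)
        do_half_iteration(wtm, n - 1, n);
}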
This might be the best use of having sufficient memory for 2 bits per position in each of the 19 K-slices in RAM.
It shouldn't be too difficult to implement both strategies in the same generator.