disk-based 8-piece TB generator

Discussion of chess software programming and technical issues.

Moderator: Ras

Koistinen
Posts: 32
Joined: Sun May 23, 2021 10:05 pm
Location: Stockholm, Sweden
Full name: Urban Koistinen

Re: disk-based 8-piece TB generator

Post by Koistinen »

syzygy wrote: Fri May 29, 2026 11:43 pm
Koistinen wrote: Fri May 29, 2026 4:15 pm Nice this is going again. I did some estimation for how long it would take using 150 * 24TB hdds for just white wins and estimate 5 hours per 8-man.

Still takes about 3 years just to compute them all! (5*4795h is about 3 years)
5 hours per 8-man on average is definitely way too optimistic. KNNNNvKRR didn't take too long, but it is 48x smaller than some others.

But 4795 seems to be overcounting it.

The total number of tables for 3-8 pieces:

Code: Select all

3-piece		5
4-piece		30
5-piece		110
6-piece		365
7-piece		1001
8-piece		2520
The 8-piece tables can be nicely split into:
- 868 pawnless tables
- 792 tables with exactly one pawn
- 860 tables with 2+ pawns.

The 868 pawnless tables are now the "easy" ones.
The 792 tables with one pawn are the hard ones (although they need the least RAM to generate). They will take up the most space.
The 860 tables with 2+ pawns can be generated in reasonable RAM, which should mean that they can be generated relatively quickly.

All the 8-men pawnless tables can be generated independently and thus in parallel (you just need to download the right 7-men WDL tables, which isn't a big deal).
When that is done, all 8-men tables with 1 pawn can be generated independently (but you need to download the right 8-men WDL tables, which starts to be an issue).
What that is done, all 8-men tables with 2 pawns, then 3 pawns, etc.

The number of 8-men subtables you need to generate a specific pawnful table isn't too large, but if each table is multiple TBs, it is a lot of data. Still, this seems within reach if you don't live in a backward country like Germany.

The real problem is: who wants to host 2 PB of data.
Don't know what the retarded germans got to do with it.

I think my counting method is better when generating one side's wins at a time.
For storing the results, your approach is better.

I can afford to host 2-3PB but am looking into if perhaps some university would care to. Potentially less tax that way.
syzygy
Posts: 6010
Joined: Tue Feb 28, 2012 11:56 pm

Re: disk-based 8-piece TB generator

Post by syzygy »

Koistinen wrote: Sat May 30, 2026 10:55 am
syzygy wrote: Fri May 29, 2026 11:43 pm
Koistinen wrote: Fri May 29, 2026 4:15 pm Nice this is going again. I did some estimation for how long it would take using 150 * 24TB hdds for just white wins and estimate 5 hours per 8-man.

Still takes about 3 years just to compute them all! (5*4795h is about 3 years)
5 hours per 8-man on average is definitely way too optimistic. KNNNNvKRR didn't take too long, but it is 48x smaller than some others.

But 4795 seems to be overcounting it.

The total number of tables for 3-8 pieces:

Code: Select all

3-piece		5
4-piece		30
5-piece		110
6-piece		365
7-piece		1001
8-piece		2520
The 8-piece tables can be nicely split into:
- 868 pawnless tables
- 792 tables with exactly one pawn
- 860 tables with 2+ pawns.

The 868 pawnless tables are now the "easy" ones.
The 792 tables with one pawn are the hard ones (although they need the least RAM to generate). They will take up the most space.
The 860 tables with 2+ pawns can be generated in reasonable RAM, which should mean that they can be generated relatively quickly.

All the 8-men pawnless tables can be generated independently and thus in parallel (you just need to download the right 7-men WDL tables, which isn't a big deal).
When that is done, all 8-men tables with 1 pawn can be generated independently (but you need to download the right 8-men WDL tables, which starts to be an issue).
What that is done, all 8-men tables with 2 pawns, then 3 pawns, etc.

The number of 8-men subtables you need to generate a specific pawnful table isn't too large, but if each table is multiple TBs, it is a lot of data. Still, this seems within reach if you don't live in a backward country like Germany.

The real problem is: who wants to host 2 PB of data.
Don't know what the retarded germans got to do with it.
Internet download/upload speeds are 10 years behind.
Or, more positively: in many countries internet speeds are getting to the point where downloading or uploading a few terabytes of data becomes feasible.
I think my counting method is better when generating one side's wins at a time.
For storing the results, your approach is better.
My method is very close to one that you proposed in 2010:
https://kirill-kryukov.com/chess/discus ... php?t=5163
But I use 1 bit per position to calculate just DTZ=N instead of 2 bits to calculate DTZ=N and N+1 (easier to implement, less RAM and probably not significantly slower if at all).
I can afford to host 2-3PB but am looking into if perhaps some university would care to. Potentially less tax that way.
So maybe this can be done :)
User avatar
Ajedrecista
Posts: 2248
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Re: Disk-based 8-piece TB generator.

Post by Ajedrecista »

Hello Ronald and Urban:

Thanks to both for your answers. I think you are counting different things: Ronald counts different types of endgames (as I would count) and Urban is counting computations that he needs in order to get the tablebases. Let me show you what I understand from the easy example of 4-man EGTB:

Code: Select all

4-man EGTB.

Ronald's    Urban's
typical ——  needed
endgames    computations

 1 KQQk ——  1 QQ
 2 KQRk ——  2 QR
 3 KQBk ——  3 QB
 4 KQNk ——  4 QN
 5 KQPk ——  5 QP
 6 KRRk ——  6 RR
 7 KRBk ——  7 RB
 8 KRNk ——  8 RN
 9 KRPk ——  9 RP
10 KBBk —— 10 BB
11 KBNk —— 11 BN
12 KBPk —— 12 BP
13 KNNk —— 13 NN
14 KNPk —— 14 NP
15 KPPk —— 15 PP

16 KQkq —— 16 Qq

17 KQkr —— 17 Qr
        —— 18 Rq

18 KQkb —— 19 Qb
        —— 20 Bq

19 KQkn —— 21 Qn
        —— 22 Nq

20 KQkp —— 23 Qp
        —— 24 Pq

21 KRkr —— 25 Rr

22 KRkb —— 26 Rb
        —— 27 Br

23 KRkn —— 28 Rn
        —— 29 Nr

24 KRkp —— 30 Rp
        —— 31 Pr

25 KBkb —— 32 Bb

26 KBkn —— 33 Bn
        —— 34 Nb

27 KBkp —— 35 Bp
        —— 36 Pb

28 KNkn —— 37 Nn

29 KNkp —— 38 Np
        —— 39 Pn

30 KPkp —— 40 Pp
The table was written by hand, but I hope no typos. The tablebase metrics are also different: Ronald's typical WDL differenciates between {win, draw, lose} in a given position; while Urban's W/DL is a {win, no win} output, without distinguishing between drawn and lost positions. I do not know if there are differences in how each of you treat cursed wins (wins longer than the fifty-move rule).

------------

Taking advantage of NULP's results for 8-man, I copied them to Excel and toyed with filters. I confirm Ronald's results of 868 pawnless tables (I filtered p and got 1652 results, so 2520 - 1652 = 868), with circa 3.81763e+16 - 3.39903e+16 ~ 4.186e+15 positions or around 10.965% of the positions of 8-man EGTB using NULP's results.

I can also confirm 792 tables with exactly one pawn, using =LEN(A1)-LEN(SUBSTITUTE(A1,"p","")) formula (thanks, Google!) to count how many 'p' are in each cell and filter to 1. This gave circa 2.03920e+16 positions or around 53.415% of the positions of 8-man EGTB using NULP's results. Expanding, I get around 27.119%, 7.308%, 1.102%, 0.088% and 0.003% of the positions for endgames with 2, 3, 4, 5 and 6 pawns, without taking into account how many pawns are per side. Logically: 2520 - 868 - 792 = 860 tables with two or more pawns.

I agree that the table of dependencies is not easy-peasy. I guess that for each of the 792 tables with exactly one pawn, there are pawnless tables that must be pre-computed (replacing 'p' by either 'q', 'r', 'b' and 'n'?) and do something similar recursively with more pawns. It is easier to tell than to make it work, but both of you are very smart.

Regards from Spain.

Ajedrecista.
Ender8pieces
Posts: 23
Joined: Wed Jan 21, 2026 9:16 am
Full name: JS

Re: disk-based 8-piece TB generator

Post by Ender8pieces »

I observed that the K4Nk2r table took around 6 hours for 111 iterations (actually 2 tables with 48 and 63 iterations respectively).

This seems quite surprising since it does not include disk reads. If it did include disk reads, it should take ~1,000 seconds for 111 iterations on a disk with 10 GB/s throughput, assuming the database is 100 GB on disk. That is roughly 20 minutes, which is much less than 6 hours.

I assume this ratio would be preserved in larger databases, so it seems the bottleneck is not the disk (hence more RAM won't help), but the retrograde analysis itself—which I assume is mostly limited by the RAM reads of all the neighboring positions.

I wonder if a GPU could help in such cases and if that implementation is possible.

Anyway, right now it seems very feasible, and maybe a community project can be adapted if a central server becomes available.
syzygy
Posts: 6010
Joined: Tue Feb 28, 2012 11:56 pm

Re: disk-based 8-piece TB generator

Post by syzygy »

Ender8pieces wrote: Sun May 31, 2026 10:15 am I observed that the K4Nk2r table took around 6 hours for 111 iterations (actually 2 tables with 48 and 63 iterations respectively).
Or even 222 iterations if you count win-in-N and loss-in-(N+1) as two iterations.
The main loop (until the 50-move rule):

Code: Select all

  int n;
  for (n = 1; n <= DRAW_RULE; n++) {
    more_wb_next = more_lw && calc_L(WHITE, n, more_lw);
    more_ww_next = more_lb && calc_L(BLACK, n, more_lb);

    more_lb = more_ww && calc_W(WHITE, n, more_ww);
    more_lw = more_wb && calc_W(BLACK, n, more_wb);

    more_wb = more_wb_next;
    more_ww = more_ww_next;
  }
calc_L(stm, n) calculates stm losses with dtz=N, calc_W(stm,n) calculates stm wins with dtz=N.
This seems quite surprising since it does not include disk reads. If it did include disk reads, it should take ~1,000 seconds for 111 iterations on a disk with 10 GB/s throughput, assuming the database is 100 GB on disk. That is roughly 20 minutes, which is much less than 6 hours.
Compressed it is a lot less I/O than that. Uncompressed it is a lot more.

One bit per position for one side to move is 49.6 GB uncompressed, so I can see how you arrive at 111 x 100 GB.
But calc_W(wtm, N):
- reads all btm losses for dtz = N-1
- reads all cumulative wtm wins for dtz <= N-1
- writes all wtm wins for dtz = N
- updates all cumulative wtm wins for dtz <= N (*)

and calc_L(wtm, N):
- reads all btm wins for dtz = N-1
- writes all candidate wtm losses for dtz = N
- reads all candidate wtm losses for dtz = N
- reads all cumulative btm wins for dtz <= N-1 to check whether all successor positions of candidate wtm losses lose.
- must somehow also check that all captures from candidate wtm losses lose (there are several options for this). (**)
- writes all wtm losses for dtz = N.

(*) The straightforward way is to write back the updated cumulative wins at each calc_W() iteration. Unlike the dtz=N wins/losses slices, these cumulative wins slices do not typically get sparser for higher iterations, so compressing does not get faster. Since decompressing is faster than compressing, I do not write them back at every iteration but instead use the smaller (if compressed) dtz=N win slices as deltas and update the cumulative slices only when reading the deltas starts to get too much. (I learned that deciding when to update is an example of the "ski-rental-problem".)

(**) Checking captures can be done in multiple ways:
- by probing subtables over and over again. Very slow (but perhaps optimal once there are very few positions left to be resolved).
- by generating a separate bitmap with positions with a drawing capture (or one that leads to a cursed loss) and reading this bitmap in every calc_L() iteration (another 50 GB uncompressed). As far as calc_L() is concerned, this bitmap could be merged into the "cumulative wins" bitmap, but this does not work for calc_W().
- by generating smaller sub-K-slice bitmaps for the positions after the capture, basically an uncompressed form of tablebase once loaded into memory, and testing captures against those. This is what my pawnless generator does. (Adds a bit to RAM usage but not much, less than a 21st full K-slice.)

For pawnful tables, you also need to check pawn moves (including promotions). Now generating sub-K-slice bitmaps does not work because pushing a pawn does not remove a piece from the board and therefore does not drastically reduce the number of positions like a capture does. So the pawnful generator generates K-slice bitmaps for positions with a non-losing pawn move. Since these have to be read anyway during each calc_L() pass, I also add the positions with a non-losing capture to them, so now there is no need for the sub-K-slices.

So you easily get to 10 x 50 GB I/O per iteration, if uncompressed, of which 3 or 4 are write steps. If you choose 3, then you need to do many reads of full-sized uncompressed dtz=N win slices, so you should want to write back the cumulative wins at each iteration, i.e. 4 write steps.

So per iteration 200 GB written. With 111 iteration this is 22.2 TB. For a full-size TB, you can multiple the size of each slice by 48 and the number of iterations maybe by 5. That is 5328 TB. It seems the better consumer SSDs are nowadays rated for 600 TB write endurance. So you'll need 9 SSDs in the process and figure out how not to lose any data while replacing SSDs, for a single large 8-men TB. This is getting expensive.

So let's be clear: it is nonsensical to consider not using compression here. Let's just forget about that, I think we've discussed this enough. If you really want to burn SSDs, then take my code and remove the compression yourself.
I assume this ratio would be preserved in larger databases, so it seems the bottleneck is not the disk (hence more RAM won't help), but the retrograde analysis itself—which I assume is mostly limited by the RAM reads of all the neighboring positions.
With good compression, I/O may indeed not be a major bottleneck even with slow HDDs (but with slow HDDs it would help to do more scheduling of reads and writes so computation does not have to wait for I/O to finish).

The retrograde analysis does a lot of index calculations. The indexing scheme for generation that I use now is a lot more expensive than what I used in my old generator, but from the 7-piece generation effort I learned that memory bandwidth is the main limiting factor, especially as the number of available cores goes up.

Of course compression/decompression also takes time. This can still be tuned (different compression levels, different compression methods).

With a modest amount of extra RAM, it should be possible to hold all or at least most of the "candidate losses" compressed in RAM (if I rearrange calc_L() a bit to consume those slices whenever the stm king has done a sweep of the board). Keeping all the cumulative wins slices compressed in RAM would be helpful too, but that will usually be too much (perhaps for pawnful tables this might often fit). So I should allow the generator to keep part of its working tree in a RAM-based filesystem and the rest on disk.

On the computation side there are:
- compression/decompression of temporary files
- index calculations. My previous calculator used a very simple and fast indexing scheme during generation. My new generator uses a more expensive scheme (I think iterating through positions is now pretty fast; most of the cycles probably go into calculating the position index after a move/unmove). But I learned from the 7-piece generation effort that memory bandwidth is more likely to be the bottleneck than calculation, especially with more and more cores being available.
- so memory bandwidth.

What also takes a lot of time (and is limited by memory bandwidth) is the final compression phase. Here I/O basically plays no role. I think something can be gained algorithmically if more memory is available.
Ender8pieces
Posts: 23
Joined: Wed Jan 21, 2026 9:16 am
Full name: JS

Re: disk-based 8-piece TB generator

Post by Ender8pieces »

OK, I hear you. We discussed this in the past, but now I fully understand your point.

Let me see if I have this right: the compressed files are roughly 10% the size of the uncompressed ones.
In the first iterations, both the compressed and uncompressed approaches will write a lot of data (though uncompressed obviously writes significantly more). In the later iterations, the compressed approach writes very little because of your "delta update" trick (which actually resolves my main concern).

Your application of the ski-rental problem here is very interesting; I used to work on similar optimization instances.

Do you have any rough estimate for the total write volume (in TB) for the full project?
Also, do you have a rough estimate of the total time required to generate the full tablebase, assuming a single machine similar to the one you are running on (and enough SSD capacity)?

Regarding memory bandwidth: could the index function be designed to support reading in "blocks of bits" instead of random bits? Perhaps the pieces with the highest mobility should be the least in the index hierarchy.

As I say, a GPU might be useful for this , but I it wouldn't be a simple implementation.
syzygy
Posts: 6010
Joined: Tue Feb 28, 2012 11:56 pm

Re: disk-based 8-piece TB generator

Post by syzygy »

Ender8pieces wrote: Mon Jun 01, 2026 9:51 amLet me see if I have this right: the compressed files are roughly 10% the size of the uncompressed ones.
For KNNNNNvKQ, which is about 40 GB at one bit per position for both sides to move, the number of iterations is about 150.
Total amount of data written and read, including the merging and final compression phases:

Code: Select all

total bytes written = 603622779632
total bytes read = 2265607540387
So that is 562 GB written, 2110 GB read (where 1k=1024, storage specifications usually use 1k=1000, but let's work with 1024).

To estimate the amount of data written without compression of intermediate files, I think it is easier to switch to 20 GB per wtm, 20 GB per btm, and 300 iterations, 150 calc_L() passes and 150 calc_W() passes. Each calc_L() pass writes 2x20 GB = 40 Gb. Each calc_W() pass does either one or two writes, depending on whether the cumulative wins are updated, but if we don't use compression there is nothing to gain from not updating (except reducing ssd wear). So the fair comparison is also 2x20 GB = 40 GB per calc_W() pass. (We can view the delta-approach to the cumulative wins as part of the compression.)

So that means 300 x 40 GB = 12000 GB during the regular generation.
This ignores extra passes to calculate positions with winning captures, cursed-win captures, drawing captures, blessed-loss captures: another 4 full writes of 40 GB = 160 GB.
There is some more writing for sub-slices, but let's ignore that.
Then we have the merging phase, which read and combines all the slices and writes "merged slices", 1 per side for wdl, 1 for dtz. These are 1 byte per position, so 16 * 20 + 8 * 20 = 480 GB.
These merge slices are then read again for the final (custom) compression. This is still included in the above numbers, which means we should deduct 36 GB (final compressed WDL+DTZ tables) for a fair comparison.

So with compression: 562 - 36 GB = 526 GB written (during generation and merging).
Without compression: 12000 + 160 + 480 = 12640 GB
That gives an overall compression factor of about 24x for this table.

I was now using Zstd level 6. This could be lowered for faster but less efficient compression. Level 6 is probably on the high side (certainly when all data is written to a RAM disk).

It seems a modern HDD can achieve up to about 250 MB/s of sustained write and read speed.
During the final compression this is not an issue at all, so let's look at the generation and merging phase, with about 526 + (2110 - 36) = 2600 GB written+read (a bit more than 36 should be deducted because the merged compressed slices are bigger than the final compressed slices, but let's ignore). That would need 2600x4s = 2h53m. So let's say 3 hours.
The compression and merging phase with RAM disk lasted about 2h15m.

So sequential disk throughput would be the bottleneck. And this assumes that disk throughput capacity would be used optimally, which will not be the case with the current generator. (I would at least need to implement some form of prefetching of slices into the page cache before they are actually needed, but I think this is doable.)

On the other hand, one can hope that some of the data being read is cached and does not actually have to be read from disk. With full-sized pawnless tables this is perhaps not so likely for the heavier iterations, but after that things might look better. And the generator could also help here by saving the compressed "candidate loss" slices to RAM disk and reusing them as soon as possible, and probably the same should be done for the small subslices for dealing with captures, etc. For pawnful tables things might look even better. (And with 2+ pawns, hdd throughput is not going to be an issue. You really want the tables being probed from promotions on SSD, though.)

So with a careful approach, and perhaps two HDDs in a RAID configuration, this should be doable.
In the first iterations, both the compressed and uncompressed approaches will write a lot of data (though uncompressed obviously writes significantly more). In the later iterations, the compressed approach writes very little because of your "delta update" trick (which actually resolves my main concern).

Your application of the ski-rental problem here is very interesting; I used to work on similar optimization instances.
The decision criterion I use is roughly: update if the the total amount of delta data read since the last update exceeds the size of the last full cumulative slice, where I keep track of this per K-slice.

This criterion fails to take into account that reading even a very small delta slice with almost no bits set still results in a full pass through memory to OR almost every byte with 0. I could add this into the cost of deltas, but it seems better to just solve the real problem and skip all-zero cache lines and, if necesaary in a second step, use a sparse compression scheme for very sparse slices, which just stores the indices of the bits that need to be set (maybe with Zstd or similar on top).

So currently these last very sparse iterations are totally memory bound, which results in a very quiet PC.
Or in other words, it is possible to do better than 2h15m.
Do you have any rough estimate for the total write volume (in TB) for the full project?
Not really.
If KNNNNNvKQ is representative, then one should expect 120x500 GB = 60TB for the largest 8-men pawnless tables (with no identical pair of pieces). But there is a lot of variation between tables.
Also, do you have a rough estimate of the total time required to generate the full tablebase, assuming a single machine similar to the one you are running on (and enough SSD capacity)?
Many decades.
If it is on average a month per table with 1 pawn, then than part alone will take over 60 years. 792 months is a long time.
Regarding memory bandwidth: could the index function be designed to support reading in "blocks of bits" instead of random bits? Perhaps the pieces with the highest mobility should be the least in the index hierarchy.
I do that since 2011.
As I say, a GPU might be useful for this , but I it wouldn't be a simple implementation.
I don't think so. Prove me wrong.
syzygy
Posts: 6010
Joined: Tue Feb 28, 2012 11:56 pm

Re: disk-based 8-piece TB generator

Post by syzygy »

KNNNNNvKQ

White wins (black loses) with dtz=92:
[pgn][Event ""]
[FEN "2N5/N7/8/8/8/q4Nk1/3K4/5NN1 b - - 0 1"]
1...Kf2 2.N1h2 Qe3+ 3.Kc2 Kg2 4.Kb2 Qe4 5.Kb3 Qd5+ 6.Kc3 Qc5+ 7.Kd3 Qf5+ 8.Kd4
Qf4+ 9.Kd5 Qf7+ 10.Kd6 Qf6+ 11.Kd7 Qf5+ 12.Ke7 Qe4+ 13.Kf6 Qa4 14.Ke6 Qa6+
15.Kd5 Qb7+ 16.Kd6 Qb4+ 17.Kc7 Qa5+ 18.Kc6 Qa2 19.Kc5 Qe6 20.Kb4 Qe4+ 21.Ka5
Qf5+ 22.Ka6 Qc5 23.Nb6 Qa3+ 24.Kb7 Qe7+ 25.Ka8 Qe4+ 26.Kb8 Qf4+ 27.Kb7 Qf7+
28.Ka8 Qe6 29.Nac8 Qc6+ 30.Ka7 Qc7+ 31.Ka6 Qc3 32.Ng4 Qa1+ 33.Kb5 Qb1+ 34.Kc4
Qa2+ 35.Kd4 Qa1+ 36.Ke3 Qa3+ 37.Ke4 Qb4+ 38.Nd4 Kh1 39.Nd5 Qb1+ 40.Kf3 Qd1+
41.Nge2 Qf1+ 42.Nf2+ Kh2 43.Ke3 Qg2 44.Ndf4 Qc6 45.Ne4 Qa4 46.Nf3+ Kh1 47.Nf2#[/pgn]

Black wins (white loses) with dtz=52:
[pgn][Event ""]
[FEN "1N6/7N/7N/8/4k2N/8/8/1K2q1N1 w - - 0 1"]
1.Kc2 Qf2+ 2.Kb3 Qb6+ 3.Kc2 Qc5+ 4.Kd2 Qe3+ 5.Kc2 Qd3+ 6.Kb2 Qd2+ 7.Kb3 Kd3
8.N6f5 Qf4 9.Nf6 Qc4+ 10.Ka3 Qc5+ 11.Kb3 Qb6+ 12.Ka3 Kc4 13.Ka2 Qb3+ 14.Ka1
Qd1+ 15.Ka2 Qd2+ 16.Ka3 Qa5+ 17.Kb2 Qe5+ 18.Ka2 Kc3 19.Nc6 Qe6+ 20.Ka1 Qe1+
21.Ka2 Qf2+ 22.Ka3 Qc5+ 23.Ka2 Qc4+ 24.Ka1 Qa4+ 25.Kb1 Qc2+ 26.Ka1 Qb2#[/pgn]

Cursed win for white (49 ply until a conversion into a 50-move draw):
[pgn][Event ""]
[FEN "6N1/6N1/7N/8/8/6k1/4q3/K1N3N1 b - - 0 1"]
1...Qe5+ 2.Ka2 Qd5+ 3.Ka3 Qd6+ 4.Kb3 Qd1+ 5.Kb2 Qd2+ 6.Kb1 Qb4+ 7.Kc2 Qc5+
8.Kd2 Qa5+ 9.Kd1 Qa4+ 10.Ke1 Qb4+ 11.Ke2 Qc4+ 12.Kd2 Qb4+ 13.Kd3 Qb5+ 14.Kd4
Qd7+ 15.Ke3 Qa7+ 16.Kd3 Qa6+ 17.Kc3 Qa5+ 18.Kc4 Qa4+ 19.Kc5 Qa7+ 20.Kc6 Qa4+
21.Kc7 Kf2 22.Nh3+ Kf1 23.Ne6 Qa5+ 24.Kd6 Qa3+ 25.Kd5 Qxh3 26.Nd4 Qd7+ 27.Ke4
Qe8+ 28.Kd3 Qg6+ 29.Kd2 Qg5+ 30.Kd1 Qh5+ 31.Nce2 Qh3 32.Nf4 Qc3 33.Nfe6 Qd3+
34.Kc1 Qc3+ 35.Nc2 Ke2 36.Nf5 Kd3 37.Nfd4 Qd2+ 38.Kb2 Qc3+ 39.Kb1 Qc8 40.Nf6
Qh8 41.Ng4 Qb8+ 42.Kc1 Qg3 43.Nf6 Qe5 44.Nh7 Qh2 45.Nhg5 Qd2+ 46.Kb1 Qd1+
47.Kb2 Qd2 48.Ngf3 Qa5 49.Kc1 Qa8 50.Kd1 Ke4 51.Ke2 Qa6+ 52.Kf2 Qa2 53.Neg5+
Kd5 54.Kg3 Kc5 55.Kf4 Qc4 56.Ne4+ Kb6 57.Ne3 Qf7+ 58.Ndf5 Ka5 59.Ke5 Qe8+
60.Kd4 Qa4+ 61.Nc4+ Kb4 62.Nfd2 Qd7+ 63.Ned6 Qa4 64.Kd3 Qa1 65.N6e4 Qh1 66.Nd4
Qh3+ 67.Ne3 Ka5 68.Nc4+ Kb4 69.Nb6 Ka5 70.Nd5 Qh7 71.Nc4+ Ka6 72.Nd6 Ka5 73.Kc4
Qh3 74.Nc6+ Ka4 75.Nc5+ Ka3 76.Ncb4 Kb2 77.Ncd3+ Ka1 78.Kb3 Qe6 79.Nc4 Kb1
80.Nc3+ Ka1 81.Nc2#[/pgn]

Cursed win for black (blessed loss for white, 20 ply until a conversion):
[pgn][Event ""]
[FEN "N6N/8/7N/8/8/4k3/7N/1K2q2N w - - 0 1"]
1.Kc2 Qd2+ 2.Kb3 Qd3+ 3.Kb4 Kd4 4.Nb6 Qd2+ 5.Kb3 Qe3+ 6.Kb4 Qe7+ 7.Ka4 Kc5
8.Nf2 Qe8+ 9.Ka3 Qe3+ 10.Ka4 Qxf2 11.Nd7+ Kc4 12.Ne5+ Kc3 13.Kb5 Qe2+ 14.Kc6
Qxe5 15.N8f7 Qxh2 16.Nf5 Qg2+ 17.Kd7 Qg4 18.Nd6 Kb4 19.Kc7 Qe2 20.Kd7 Qg2
21.Ne7 Kc5 22.Ndf5 Qe4 23.Ke8 Qe6 24.Ng7 Qf6 25.Ngf5 Kc4 26.Kd7 Qb6 27.Ke8 Kd3
28.Kd7 Ke4 29.Nd6+ Kd4 30.Nc6+ Kc5 31.Ne4+ Kd5 32.Nf6+ Kc4 33.Ne5+ Kc5 34.Ke7
Qa7+ 35.Ke6 Qa6+ 36.Kf5 Qc8+ 37.Kf4 Kd6 38.Ne4+ Kd5 39.Nf3 Qc7+ 40.Ke3 Qe7
41.Nd2 Qa3+ 42.Kf4 Qc1 43.Ke3 Ke5 44.Kd3 Qa3+ 45.Nc3 Qd6+ 46.Ke2 Qe6 47.Ke3
Qh6+ 48.Ke2 Qh3 49.Nd1 Qh5+ 50.Ke1 Kd4 51.Nf1 Qe8+ 52.Kd2 Ke4 53.Kc2 Qa4+
54.Kd2 Kf3 55.Nh2+ Kg2 56.Ng4 Qd4+ 57.Kc2 Kf3 58.Nde3 Ke2 59.Kb3 Kd3 60.Nd1
Qc4+ 61.Ka3 Kd2 62.Ndf2 Qb5 63.Ka2 Kc2 64.Ne3+ Kc1 65.Ka3 Qb2+ 66.Ka4 Qxf2
67.Nc4 Qc5 68.Kb3 Kd1 69.Kc3 Ke2 70.Kb3 Kd3 71.Nb2+ Kd4 72.Nd1 Qc4+ 73.Kb2 Kd3
74.Ka3 Qc1+ 75.Nb2+ Kc2 76.Ka2 Qxb2#[/pgn]

Code: Select all

wtm:
99637325351 positions are wins.
14900664 positions are cursed wins.
7964033335 positions are draws.
180724 positions are blessed losses.
419675705 positions are losses.

btm:
4347229972 positions are wins.
3491300 positions are cursed wins.
46296004406 positions are draws.
47901678 positions are blessed losses.
64542822127 positions are losses.
WDL: 7.1 GB
DTZ: 29.0 GB
syzygy
Posts: 6010
Joined: Tue Feb 28, 2012 11:56 pm

Re: disk-based 8-piece TB generator

Post by syzygy »

syzygy wrote: Mon Jun 01, 2026 10:38 pm

Code: Select all

wtm:
99637325351 positions are wins.
14900664 positions are cursed wins.
7964033335 positions are draws.
180724 positions are blessed losses.
419675705 positions are losses.

btm:
4347229972 positions are wins.
3491300 positions are cursed wins.
46296004406 positions are draws.
47901678 positions are blessed losses.
64542822127 positions are losses.
Now there is a small discrepancy with Bourzutschky's results:

Code: Select all

Ending knnnnnkq
WTM max win:  46
BTM max loss: 46
WTM wins:    99,652,226,474 (92.2%)
BTM loses:   64,590,726,084 (56.1%)
White wins: 164,242,952,558 (73.6%)
WTM legal:  108,036,115,779
BTM legal:  115,237,449,483

BTM stalemated: 0
WTM winning captures: 44,855,044,362
WTM win percent without captures: 86.73
BTM saving captures: 41,354,699,417
BTM loss percent without captures: 87.42
I counted 99637325351 + 14900664 = 99,652,226,015 wtm wins for white (ignoring the 50-move rule).
Bourzutschky counted 99,652,226,474, so 459 more...

I counted 64,590,723,805 black btm losses.
Bourzutschky counted 64,590,726,084, so 2279 more.

Our wtm and btm legal position counts are identical.
Our wtm winning captures counts are identical.

I'm not sure what is his "BTM saving captures" count.
I count 3143702730 + 2908606 + 37626971425 = 40,773,582,761 btm positions with a (cursed) winning or drawing capture. This is not close to 41,354,669,417.
Hmm, his number will count the positions with a capture that wins or draws, whereas my number does not include the positions with a capture that draws but with another non-capture move that wins. So it is normal that his number is bigger.

All numbers per ply/move are different, but that is because of the 50-move rule.
To get comparable numbers I would have to disable the 50-move rule (which I could do fairly easily).

It seems he did not generate KQvKNNNNN, or at least I can't find his statistics for this (white wins only) table.
https://drive.google.com/drive/folders/ ... PuGCkwXDTI

Very annoying, such tiny discrepancies...
Ender8pieces
Posts: 23
Joined: Wed Jan 21, 2026 9:16 am
Full name: JS

Re: disk-based 8-piece TB generator

Post by Ender8pieces »

syzygy wrote: Mon Jun 01, 2026 10:20 pm
Do you have any rough estimate for the total write volume (in TB) for the full project?
Not really.
If KNNNNNvKQ is representative, then one should expect 120x500 GB = 60TB for the largest 8-men pawnless tables (with no identical pair of pieces). But there is a lot of variation between tables.
Also, do you have a rough estimate of the total time required to generate the full tablebase, assuming a single machine similar to the one you are running on (and enough SSD capacity)?
Many decades.
If it is on average a month per table with 1 pawn, then than part alone will take over 60 years. 792 months is a long time.
Thanks for the response. As a takeaway, if a sponsor wanted to wrap this up within a year, a very rough back-of-the-napkin calculation suggests it would cost hundreds of thousands of dollars.

We would be looking at a cluster of, say, 100 nodes and 100 SSDs of around 20TB each. Even if the SSD specs don't officially guarantee that volume of total writes on paper, there is a probability it would hold up in practice(~1000 writes), especially given the hours-long intervals between write bursts. It would definitely require a solid error-detection layer, though.

It’s certainly a massive investment, even compared to past precedents.