mbabigian wrote: ↑Sun Sep 04, 2022 5:04 am
The construction of the above test shows a fundamental lack of understanding of where tablebases provide benefit.
I run 7-piece tablebases and have run SF vs SF matches where I played a SF with a net not trained on 7-piece-or-smaller endgames (a weaker net than master) against the master net, which is trained on all stages of the game. The master SF is stronger than the test net when neither uses TBs (for obvious reasons). If I give the net trained without 7-piece-or-smaller positions access to 7-piece endgame TBs, and no TBs to the master SF, the former beats the latter by a double-digit Elo difference. I believe the last time I ran such a test it was 15000 games at 10+0.1 TC, and if I recall correctly there was a greater-than-20 Elo gap. If I get time, I can run a SF vs SF+TBs match and post results.
The benefit of tablebases comes from steering the middlegame during search. Testing the ability of SF to play 6-piece endgames is pointless.
As I understand Jouni, he is drawing his positions from a 157,846-entry EPD file published by the Stockfish team. In effect he is using an opening book, just a very deep one. The collection's man-count distribution is listed in the table below.
Code: Select all
Num Count Perc
23 9 0.01
22 522 0.33
21 1107 0.70
20 4002 2.54
19 8045 5.10
18 15489 9.81
17 20398 12.92
16 23788 15.07
15 24060 15.24
14 20932 13.26
13 16578 10.50
12 10758 6.82
11 6541 4.14
10 3320 2.10
9 1580 1.00
8 540 0.34
7 141 0.09
6 36 0.02
Tot 157846 100.00
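As a sanity check, the percentage column and the mode can be recomputed from the raw counts. A minimal Python sketch (counts transcribed from the table above; the variable names are mine):

```python
# Recompute the percentage column and locate the mode of the
# man-count distribution (counts transcribed from the table above).
counts = {
    23: 9, 22: 522, 21: 1107, 20: 4002, 19: 8045, 18: 15489,
    17: 20398, 16: 23788, 15: 24060, 14: 20932, 13: 16578,
    12: 10758, 11: 6541, 10: 3320, 9: 1580, 8: 540, 7: 141, 6: 36,
}
total = sum(counts.values())        # 157846, matching the "Tot" row
mode = max(counts, key=counts.get)  # 15 men -- the late-middlegame peak

for n, c in sorted(counts.items(), reverse=True):
    print(f"{n:3d} {c:6d} {100 * c / total:6.2f}")
```

The counts do sum to 157,846 and the mode falls at 15 men, consistent with the "mode in late middlegame" reading below.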
This ranges from middlegame to endgame, with the mode in late middlegame. The positions are randomly ordered so it is fair to assume that Jouni's samples of 500/600 follow the same distribution.
The tests you report, in which a short (6- or 8-move) opening book is used, do indeed offer a real advantage: they measure bottom-line game-performance deltas. That's good stuff as far as it goes. But by itself it is insufficient to determine where in the game phase the tablebases provide maximum benefit, which is the nub of your criticism. Merely toggling tablebases on/off does not vary the conditions needed to assess that aspect. Instead one must partition the opening book into phases -- say, late opening, middlegame, late middlegame, early endgame -- and measure the contrastive effects. Jouni's experiment provides some useful information in that direction, and I can't see dismissing it as of no value.
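Splitting the book by phase is mechanical once you settle on man-count boundaries. A sketch, assuming EPD input and illustrative phase thresholds of my own choosing (the bucket names and cutoffs are assumptions, not from the Stockfish file):

```python
# Sketch: bucket EPD book positions into game phases by man count.
# The phase boundaries below are illustrative assumptions only.

def man_count(epd_line: str) -> int:
    """Men on the board: count piece letters in the EPD board field
    (digits are empty squares, '/' separates ranks, so letters = men)."""
    board = epd_line.split()[0]
    return sum(1 for ch in board if ch.isalpha())

# (name, min men, max men) -- hypothetical cutoffs for the four phases
PHASES = [
    ("late opening",    29, 32),
    ("middlegame",      21, 28),
    ("late middlegame", 15, 20),
    ("early endgame",    2, 14),
]

def phase_of(epd_line: str) -> str:
    n = man_count(epd_line)
    for name, lo, hi in PHASES:
        if lo <= n <= hi:
            return name
    return "other"
```

Running each phase's sub-book as its own match, with TBs toggled per side, would isolate where in the game the probing pays off.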
The empirical question is no huge secret, but I'll state it for the sake of being explicit. Probing tablebases early in the game provides greater opportunity to steer the game, but the amount of information fed up the search tree is slight. By the time the endgame is reached (with still more than 6 or 7 men on the board), the number of probe hits can run into the tens of millions, but by that point the game result is often already sealed. Somewhere in between lies a game phase where tablebases have maximum impact.
Gaining insight into this question ties into another that often comes up: what about 8-man tables, and the benefit they may bring? No one can say for sure. Three schools of thought seem to be held:
1. There are diminishing returns, and 8-man tables will provide little benefit over what we already have;
2. they'll provide oracle knowledge earlier in the game, and in greater quantity, and so should provide a significant assist; or
3. by the time they are generated, converted into a Syzygy-like format, and distributed, the neural-net-based evaluations will be so good that the lookup tables are rendered irrelevant.
It sounds like you have the full 7-man Syzygy set, with at least the .rtbw files on fast SSD storage. Perhaps with your help some light can be shed on these questions.