mbabigian wrote: ↑Sun Sep 04, 2022 5:04 am
The construction of the above test shows a fundamental lack of understanding of where tablebases provide benefit.
I run 7-piece tablebases and have run SF vs SF matches where I played a SF with a net not trained on 7-piece-or-smaller endgames (a weaker net than master) against the master net, which is trained on all stages of the game. The master SF is stronger than the test net when neither uses TBs (for obvious reasons). If I give the net trained without 7-piece-or-smaller positions access to 7-piece endgame TBs, and no TBs to the master SF, the former beats the latter by a double-digit Elo difference. I believe the last time I ran such a test it was 15000 games at 10+0.1 TC, and if I recall correctly there was a greater-than-20 Elo gap. If I get time, I can run a SF vs SF+TBs match and post results.
The benefit of tablebases comes from steering the middlegame during search. Testing the ability of SF to play 6-piece endgames is pointless.
As I understand Jouni, he is drawing his positions from a 157,846-entry EPD file published by the Stockfish team. In effect he is using an opening book, just a very deep one. The collection's man-count distribution is listed in the table below.
Code: Select all
Num Count Perc
23 9 0.01
22 522 0.33
21 1107 0.70
20 4002 2.54
19 8045 5.10
18 15489 9.81
17 20398 12.92
16 23788 15.07
15 24060 15.24
14 20932 13.26
13 16578 10.50
12 10758 6.82
11 6541 4.14
10 3320 2.10
9 1580 1.00
8 540 0.34
7 141 0.09
6 36 0.02
Tot 157846 100.00
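As a sanity check, the percentage column and the mode can be recomputed from the raw counts. A minimal Python sketch (counts transcribed from the table above; the variable names are mine):

```python
# Recompute the percentage column and locate the mode of the
# man-count distribution (counts transcribed from the table above).
counts = {
    23: 9, 22: 522, 21: 1107, 20: 4002, 19: 8045, 18: 15489,
    17: 20398, 16: 23788, 15: 24060, 14: 20932, 13: 16578,
    12: 10758, 11: 6541, 10: 3320, 9: 1580, 8: 540, 7: 141, 6: 36,
}
total = sum(counts.values())        # 157846, matching the "Tot" row
mode = max(counts, key=counts.get)  # 15 men -- the late-middlegame peak

for n, c in sorted(counts.items(), reverse=True):
    print(f"{n:3d} {c:6d} {100 * c / total:6.2f}")
```

The counts do sum to 157,846 and the mode falls at 15 men, consistent with the "mode in late middlegame" reading below.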
This ranges from middlegame to endgame, with the mode in late middlegame. The positions are randomly ordered so it is fair to assume that Jouni's samples of 500/600 follow the same distribution.
The tests you report, in which a short (6- or 8-move) opening book is used, do indeed offer a real advantage: they measure bottom-line game-performance deltas. That's good stuff as far as it goes. But by itself it is insufficient to determine where in the game phase the tablebases provide maximum benefit, which is the nub of your criticism. Merely toggling tablebases on/off does not vary the conditions needed to assess that aspect. Instead one must partition the opening book into phases -- say, late opening, middlegame, late middlegame, early endgame -- and measure the contrastive effects. Jouni's experiment provides some useful information in that direction, and I can't see dismissing it as of no value.
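Splitting the book by phase is mechanical once you settle on man-count boundaries. A sketch, assuming EPD input and illustrative phase thresholds of my own choosing (the bucket names and cutoffs are assumptions, not from the Stockfish file):

```python
# Sketch: bucket EPD book positions into game phases by man count.
# The phase boundaries below are illustrative assumptions only.

def man_count(epd_line: str) -> int:
    """Men on the board: count piece letters in the EPD board field
    (digits are empty squares, '/' separates ranks, so letters = men)."""
    board = epd_line.split()[0]
    return sum(1 for ch in board if ch.isalpha())

# (name, min men, max men) -- hypothetical cutoffs for the four phases
PHASES = [
    ("late opening",    29, 32),
    ("middlegame",      21, 28),
    ("late middlegame", 15, 20),
    ("early endgame",    2, 14),
]

def phase_of(epd_line: str) -> str:
    n = man_count(epd_line)
    for name, lo, hi in PHASES:
        if lo <= n <= hi:
            return name
    return "other"
```

Running each phase's sub-book as its own match, with TBs toggled per side, would isolate where in the game the probing pays off.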
The empirical question is no huge secret, but I'll state it for the sake of being explicit. Probing tablebases early in the game provides greater opportunity to steer the game, but the amount of information fed up the search tree is slight. By the time the endgame is reached (with still more than 6 or 7 men on the board), the number of probe hits can run into the tens of millions, but by that point the game result is often already sealed. Somewhere in between lies a game phase where tablebases have maximum impact.
Gaining insight into this question ties into another that often comes up: what about 8-man tables, and the benefit they may bring? No one can say for sure. Three schools of thought seem to be held:
1. There are diminishing returns, and 8-man tables will provide little benefit over what we already have;
2. they'll provide oracle knowledge earlier in the game, and in greater quantity, and so should provide a significant assist; or
3. by the time they are generated, converted into a Syzygy-like format, and distributed, the neural-net-based evaluations will be so good that the lookup tables are rendered irrelevant.
It sounds like you have the full 7-man Syzygy set, with at least the .rtbw files on fast SSD storage. Perhaps with your help some light can be shed on these questions.