syzygy wrote:
Laskos wrote:
Would be fun if it happens at TCEC.
If I have understood correctly, TCEC will use 6-piece TB adjudication (respecting the 50-move rule this time). So SF may win games or hold draws it would not actually be able to win or hold.
That's understandable from the spectators' point of view, but a bit of a pity in this case.
I woke up very early again and, more quickly than expected, built two new suites: 6-men-easy-wins and 6-men-easy-draws. The positions come from regular games: for some years now I have kept a huge EPD file built, if I am not mistaken, from thousands of Gaviota engine self-games played without adjudication. I extracted the 6-men positions from this EPD and classified them as Wins and Draws using the Syzygy 6-men bases (on SSD). With these regular 6-men positions I repeated what I did with the regular 5-men positions in the earlier post, to see how often Stockfish Final Natural TB fails to convert from 6 men.
Time control is now 0.25s per move, roughly equivalent to the 10''+0.1'' of Fishtest. I used Cutechess-Cli because I was suspicious of SF FNTB's behaviour on TB Draws at root under LittleBlitzer (see the time used by SF FNTB with 5-men Draws at root in the earlier post).
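For anyone wanting to repeat the extraction step, here is a minimal sketch of pulling the 6-men positions out of an EPD file. It only counts pieces in the board field. The Win/Draw classification would still need a Syzygy probe, which is not shown here. The function names and the sample positions are made up for illustration and are not from my actual script.

```python
def piece_count(epd_line: str) -> int:
    """Count the pieces in the board field (the first field) of an EPD line."""
    placement = epd_line.split()[0]
    # Letters are pieces; digits and '/' are empty squares and rank separators.
    return sum(ch.isalpha() for ch in placement)


def select_n_men(epd_lines, n=6):
    """Keep only the non-empty EPD lines that have exactly n pieces on the board."""
    return [line.strip() for line in epd_lines
            if line.strip() and piece_count(line) == n]


# Hypothetical sample lines (not taken from the real EPD file).
sample = [
    "8/5k2/8/3R4/2P5/3K4/5r2/7b w - -",   # 6 pieces
    "8/8/4k3/8/2P5/3K4/8/8 w - -",        # 3 pieces
    "4k3/pp6/8/8/8/8/PP6/4K3 b - -",      # 6 pieces
]
six_men = select_n_men(sample, n=6)
print(len(six_men), "six-men positions kept")
```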
Wins:
1000 games
Suite: Regular 6-men Wins
TC: 0.25s per move
Score of SF Master vs SF Final NTB: 500 - 471 - 29 [0.514] 1000
ELO difference: 10.08 +/- 21.22
Finished match
SF Final NTB fails to convert 29 out of 500 regular 6-men wins.
A rate of about 6% (29/500 = 5.8%). The correct pentanomial error margins (the games were played side-and-reversed) are much smaller:
ELO difference: 10.08 +/- 3.54
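A rough sketch of where these numbers come from, for those who want to check them. It uses the usual logistic ELO formula and a 95% normal-approximation margin. The exact formula Cutechess uses is an assumption on my part, as is the pairing: I assume SF Master won all 500 games from the winning side, so each side-and-reversed pair scored either 1-1 (471 pairs) or 1.5-0.5 (29 pairs).

```python
import math


def elo(score: float) -> float:
    """Logistic ELO difference corresponding to an expected score in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_margin(outcomes, z=1.96):
    """outcomes: list of (per-game score, count) for independent units.

    Returns (elo, margin), where margin is half the width of the ELO
    interval obtained from mean score +/- z standard errors.
    """
    n = sum(count for _, count in outcomes)
    mean = sum(value * count for value, count in outcomes) / n
    var = sum(count * (value - mean) ** 2 for value, count in outcomes) / n
    half = z * math.sqrt(var / n)
    margin = (elo(mean + half) - elo(mean - half)) / 2.0
    return elo(mean), margin


# Trinomial: every game is an independent unit (500 wins, 471 losses, 29 draws).
tri_elo, tri_margin = elo_with_margin([(1.0, 500), (0.0, 471), (0.5, 29)])

# Pentanomial: every game PAIR is the unit, scored as the average of its two games.
# ASSUMED pairing: 471 pairs at 0.5 and 29 pairs at 0.75 from SF Master's side.
penta_elo, penta_margin = elo_with_margin([(0.5, 471), (0.75, 29)])

print(f"trinomial:   {tri_elo:.2f} +/- {tri_margin:.2f}")
print(f"pentanomial: {penta_elo:.2f} +/- {penta_margin:.2f}")
```

Under these assumptions the results land close to the figures above. The pair-based margin is so much smaller because almost all the variance between single games comes from who has the winning side, and that cancels within a pair.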
Draws:
2000 games
Suite: Regular 6-men Draws
TC: 0.25s per move
Score of SF Master vs SF Final NTB: 2 - 0 - 1998 [0.500] 2000
ELO difference: 0.35 +/- 0.48
Finished match
Only two failures of SF Final NTB in 2000 games. It very rarely fails to hold TB Draws. The pentanomial error margins here are almost identical to the normal trinomial ones used by Cutechess.
As Stockfish self-games are pretty drawish, 6-men Draws will probably arise in games more often than 6-men Wins. That is, if Fishtest does not use adjudication in the regression test to pass Natural to the main branch. If they use adjudication, the regression is 0 ELO points. If not, assuming that some 10% (probably less) of regular games from, say, 2moves_v1.epd are decided by 5-6 men positions, the regression is in any case probably no more than 0.5 ELO points, and Natural has excellent chances of passing the regression test. What is their window? [-4,0] or something like that.
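The back-of-envelope estimate can be sketched like this. The score losses per game (0.0145 on Wins, 0.0005 on Draws) are just the suite scores above minus 0.5. The 10% fraction is my guess from the text, and the share of Wins among the TB-decided games (`win_share`) is a purely hypothetical parameter. It also assumes the suite positions are representative of what arises in Fishtest games, which is not verified.

```python
import math


def elo(score: float) -> float:
    """Logistic ELO difference corresponding to an expected score in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def expected_regression(frac_tb_games, win_share,
                        loss_on_wins=0.0145, loss_on_draws=0.0005):
    """Estimated ELO regression of Natural TB without adjudication.

    frac_tb_games: fraction of games decided by 5-6 men positions (guess).
    win_share:     fraction of those positions that are TB Wins (hypothetical).
    loss_on_*:     score lost per game, taken from the two suites above.
    """
    loss = frac_tb_games * (win_share * loss_on_wins
                            + (1.0 - win_share) * loss_on_draws)
    return elo(0.5 + loss)


# Half Wins, half Draws among the TB-decided games.
even_split = expected_regression(0.10, 0.5)
# More Draws than Wins, as expected for drawish self-games.
drawish = expected_regression(0.10, 0.3)

print(f"50% Wins: {even_split:.2f} ELO, 30% Wins: {drawish:.2f} ELO")
```

With Draws outnumbering Wins, the estimate stays below roughly half an ELO point, in line with the figure quoted above.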
I hope the current master Syzygy, easy-mate or not, with its perfect play on root TB positions, will be developed in parallel; in any case, there are Komodo and other excellent engines that use the Syzygy bases as designed. Even for building these endgame suites I needed a perfect player, not an approximation of one. Also, I sometimes toy with endgame chess as a model for full chess; it is often interesting to see the transition from perfect chess to full chess.