CMCanavessi wrote: Thu Aug 30, 2018 10:32 pm
There's an interesting ongoing match between lc0 on 1 P100 and SFdev on 57 threads here: http://tcecbeta.chessdb.cn/bonusbeta/live.html
So far after 11 games, SF is +1.

Not counting the disconnect at the start of the 15th game, which was counted as a loss for SF, the score after 14 games is +2 -0 =12 for SF dev. Still, a very good result for Leela, as the P100 is maybe only 2 times faster than a 1080ti due to fp16 (or less than 2 times?). So, maybe Leela still scales well at big time x hardware?
As the 1xxxx run seems to have stopped right when I returned from vacation, I did some tests. My results seem to indicate that not all is smooth there. First, real games against SF8 (no adjudications, as I saw Leela drawing completely won endgames and losing completely drawn endgames). Lc0 on a GTX 1060, SF8 on one 3.8GHz i7 core.
Time control: 2 minutes + 2 seconds increment.
ID10774
Score of lc0_v16 10774 (GTX 1060) vs SF 8 (1 core): 10 - 5 - 25 [0.563]
Elo difference: 43.66 +/- 66.31
40 of 40 games finished.
ID11199
Score of lc0_v17 11199 (GTX 1060) vs SF 8 (1 core): 5 - 8 - 27 [0.463]
Elo difference: -26.11 +/- 61.76
40 of 40 games finished.
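For readers who want to check the numbers, here is a minimal sketch of how an Elo difference and an error margin can be derived from a win/loss/draw record. It assumes the usual logistic Elo model and a normal approximation for a 95% interval; the function names `elo_from_score` and `elo_with_margin` are my own, and the tool that produced the output above may compute its interval somewhat differently.

```python
import math
from statistics import NormalDist


def elo_from_score(score):
    """Elo difference implied by a score fraction under the logistic model."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_margin(wins, losses, draws, confidence=0.95):
    """Return (score fraction, Elo difference, approximate error margin)."""
    n = wins + losses + draws
    w, l, d = wins / n, losses / n, draws / n
    mu = w + d / 2.0
    # Per-game variance of the result (1, 0.5 or 0 points per game).
    var = w * (1.0 - mu) ** 2 + l * (0.0 - mu) ** 2 + d * (0.5 - mu) ** 2
    stdev = math.sqrt(var / n)
    # Two-sided normal quantile (about 1.96 for 95%); an assumption of this sketch.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    lo, hi = mu - z * stdev, mu + z * stdev
    margin = (elo_from_score(hi) - elo_from_score(lo)) / 2.0
    return mu, elo_from_score(mu), margin


for label, record in (("ID10774", (10, 5, 25)), ("ID11199", (5, 8, 27))):
    mu, elo, margin = elo_with_margin(*record)
    print(f"{label}: score={mu:.4f} elo={elo:+.2f} +/- {margin:.1f}")
```

The Elo values should match the 43.66 and -26.11 reported above; the margins come out close to the reported ones but need not be identical. Either way, with 40 games the interval is wider than the difference between the two nets.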
==================
Scaling:
I have neither big hardware nor the ability to play LTC games. But test suites, although indirectly, can show some scaling results at long time control in a shorter total amount of time.
I first tried the STS1500 test suite. Leela performs pretty miserably on it, weaker than Fruit 2.1 even at 20s/position, probably because, although it is called "Strategical", it still contains too many tactics. The result on STS shows Lc0 performing about 800 Elo points weaker than in real games. So I left that suite aside for checking the scaling and took my own very positional suite of 200 positions, Openings200.epd. On it, Leela performs close to the strength shown in real games.
I am confident that from 0.1s/move to 1s/move, Leela scales much better than any regular AB engine.
Openings200.epd results comparing Leela ID11199 on GTX 1060 to SF dev (one i7 core):
Lc0 ID11199:
0.1s
score=51/200 [averages on correct positions: depth=1.9 time=0.06 nodes=45]
1.0s
score=105/200 [averages on correct positions: depth=3.8 time=0.30 nodes=392]
+54
SF dev 1 core:
0.1s
score=92/200 [averages on correct positions: depth=8.4 time=0.02 nodes=41226]
1.0s
score=121/200 [averages on correct positions: depth=10.4 time=0.17 nodes=307632]
+29
The scaling at STC does indeed seem to be much better for Leela compared to Stockfish.
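In case someone wants to process such result lines automatically, here is a small sketch that parses one line of the suite output into its fields. The regular expression is only written against the line format pasted above, and `parse_suite_line` is a hypothetical helper name, not part of any testing tool.

```python
import re

# Matches lines such as:
# score=51/200 [averages on correct positions: depth=1.9 time=0.06 nodes=45]
LINE_RE = re.compile(
    r"score=(?P<solved>\d+)/(?P<total>\d+)\s+"
    r"\[averages on correct positions:\s+"
    r"depth=(?P<depth>[\d.]+)\s+time=(?P<time>[\d.]+)\s+nodes=(?P<nodes>\d+)\]"
)


def parse_suite_line(line):
    """Parse one result line into a dict of numbers; raise if it does not match."""
    match = LINE_RE.search(line)
    if match is None:
        raise ValueError(f"unrecognized result line: {line!r}")
    return {
        "solved": int(match["solved"]),
        "total": int(match["total"]),
        "depth": float(match["depth"]),
        "time": float(match["time"]),
        "nodes": int(match["nodes"]),
    }


fast = parse_suite_line(
    "score=51/200 [averages on correct positions: depth=1.9 time=0.06 nodes=45]"
)
slow = parse_suite_line(
    "score=105/200 [averages on correct positions: depth=3.8 time=0.30 nodes=392]"
)
# Gain in solved positions for Lc0 going from 0.1s to 1.0s per position.
print(slow["solved"] - fast["solved"])
```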
My issues in the past were with the scaling from 4s/move to 20s/move. I seem to get pretty bad results for Leela in real games, but from too few games (I cannot play many LTC games). Let's see the scaling on this test suite.
Openings200.epd results comparing Leela ID11199 on GTX 1060 to SF dev (one i7 core):
Lc0 ID11199:
4.0s
score=125/200 [averages on correct positions: depth=4.9 time=1.30 nodes=2839]
20.0s
score=137/200 [averages on correct positions: depth=5.5 time=3.23 nodes=8414]
+12
SF dev 1 core:
4.0s
score=131/200 [averages on correct positions: depth=13.2 time=0.67 nodes=1125191]
20.0s
score=153/200 [averages on correct positions: depth=14.4 time=1.93 nodes=2993124]
+22
So, from 4s/position to 20s/position (scaling at longer time control), scaling seems to be better for Stockfish than Leela.
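To put the two ranges side by side, here is a short sketch that recomputes the gains from the scores above. Since 0.1s to 1.0s is a 10x time factor while 4s to 20s is only 5x, it also divides each gain by the number of time doublings. That normalization is my own addition for comparison, not something the test tool reports, and the names `gain` and `gain_per_doubling` are just for illustration.

```python
import math

# Solved positions out of 200 on Openings200.epd, keyed by seconds per position.
scores = {
    "Lc0 ID11199": {0.1: 51, 1.0: 105, 4.0: 125, 20.0: 137},
    "SF dev 1 core": {0.1: 92, 1.0: 121, 4.0: 131, 20.0: 153},
}


def gain(engine, t_from, t_to):
    """Extra positions solved when going from t_from to t_to seconds."""
    return scores[engine][t_to] - scores[engine][t_from]


def gain_per_doubling(engine, t_from, t_to):
    """Gain divided by the number of time doublings between the two settings."""
    return gain(engine, t_from, t_to) / math.log2(t_to / t_from)


for engine in scores:
    for t_from, t_to in ((0.1, 1.0), (4.0, 20.0)):
        print(
            f"{engine}: {t_from}s -> {t_to}s "
            f"gain={gain(engine, t_from, t_to):+d} "
            f"per doubling={gain_per_doubling(engine, t_from, t_to):.1f}"
        )
```

Keep in mind that a suite score is capped at 200, so gains at the high end are harder to come by for both engines.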
I am not sure if this result is in any way conclusive. After all, this is just a test suite, not real games.