Policy determining quiet early opening preferences of Leela

Laskos · Post by **Laskos** » Mon Oct 05, 2020 8:57 am

MMarco wrote: ↑Mon Oct 05, 2020 2:06 am
Laskos wrote: ↑Sat Oct 03, 2020 8:00 pm
Code:
Endgames:

Score of lc0_LS15 vs Fruit_21: 17 - 17 - 66  [0.500] 100
...      lc0_LS15 playing White: 8 - 10 - 32  [0.480] 50
...      lc0_LS15 playing Black: 9 - 7 - 34  [0.520] 50
...      White vs Black: 15 - 19 - 66  [0.480] 100
Elo difference: 0.0 +/- 39.8, LOS: 50.0 %, DrawRatio: 66.0 %
Finished match
Interesting. I guess that was at a fast time control. I ran Lc0 tcec-19 (on an RTX 2060) vs Stockfish tcec-19 (1 core of a Ryzen 9 4900H) at 100s + 1s with 5-man Syzygy, on your test set. Under these conditions Lc0 and Stockfish are usually about on par (see my tournaments here: http://talkchess.com/forum3/viewtopic.p ... 56#p860072 ). My test conditions are such that the engines calculate about 1000-1500 times fewer nodes per move than at TCEC.
Code:
   # PLAYER               :  RATING  ERROR  PLAYED    (%)   CFS    W    D    L   D(%)
   1 Lc0 tcec-19          :     0.0   23.1     100  50.00    50   32   36   32  36.00
   2 Stockfish tcec-19    :     0.0   23.1     100  50.00   ---   32   36   32  36.00

White advantage = -21.05 +/- 29.75
Draw rate (equal opponents) = 36.10 % +/- 4.51
Games: https://gofile.io/d/xz9EBb

I would guess that the bad result against Fruit is due to Leela missing tactics at low depth. Given a reasonable time control, Leela is on par with Stockfish in this endgame test.

Do not use TBs and do not use any adjudications; that is important in this particular set-up. I am testing at a longer TC (100 + 1) myself, from these openings: Lc0 J92-160 on an RTX 2070 against SF12 on 4 i7 cores, a set-up fairly favorable to Leela (old Leela ratio of about 2.5). Let's see.

jp · Post by jp » Mon Oct 05, 2020 9:00 am

MMarco wrote: ↑Mon Oct 05, 2020 2:06 am ... with 5-men syzygy ...

Do you mean both engines got 5-men TBs?

Neither engine should get TBs. It's supposed to be a test of their endgame play, not whether they can look up TBs.

Laskos · Post by **Laskos** » Mon Oct 05, 2020 10:52 am

Laskos wrote: ↑Mon Oct 05, 2020 8:57 am
...

I stopped after 24 games (12 pairs, side and reversed):

Code:

Score of SF_12 vs Lc0_J92-190_CUDA: 4 - 2 - 18  [0.542] 24
...      SF_12 playing White: 2 - 2 - 8  [0.500] 12
...      SF_12 playing Black: 2 - 0 - 10  [0.583] 12
...      White vs Black: 2 - 4 - 18  [0.458] 24
Elo difference: 29.0 +/- 69.9, LOS: 79.3 %, DrawRatio: 75.0 %
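The summary line above follows the usual logistic Elo model. As a minimal sketch (hypothetical helper names; standard formulas, not code taken from the match tool), the Elo difference comes from the score fraction, the 95% margin from the per-game variance of the 1/0.5/0 results, and the LOS from wins and losses only. For the matches quoted in this thread it gives the same Elo and LOS as printed; the margin may differ from the tool's output in the last digit.

```python
import math
from statistics import NormalDist


def elo_from_score(p: float) -> float:
    """Logistic Elo difference for a score fraction 0 < p < 1."""
    return 400.0 * math.log10(p / (1.0 - p))


def match_summary(wins: int, losses: int, draws: int):
    """Return (elo, margin95, los_percent) for a W-L-D result.

    Undefined for a 0% or 100% score, or when wins + losses == 0.
    """
    n = wins + losses + draws
    w, l, d = wins / n, losses / n, draws / n
    score = w + d / 2.0
    # Variance of a single game's result (1, 0 or 0.5) around the mean score.
    var = w * (1.0 - score) ** 2 + l * score ** 2 + d * (0.5 - score) ** 2
    stdev = math.sqrt(var / n)
    z = NormalDist().inv_cdf(0.975)  # about 1.96 for a two-sided 95% interval
    margin = (elo_from_score(score + z * stdev) - elo_from_score(score - z * stdev)) / 2.0
    # Likelihood of superiority uses wins and losses only; draws drop out.
    los = 50.0 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))
    return elo_from_score(score), margin, los


elo, margin, los = match_summary(4, 2, 18)  # SF_12 vs Lc0, 24 games above
print(f"Elo difference: {elo:.1f} +/- {margin:.1f}, LOS: {los:.1f} %")
```

The draw ratio is simply draws / games; with 18 draws in 24 games the score stays close to 0.5, which is why the margin is so wide.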

Over the 12 pairs the score is +2 -0 =10 for Stockfish, but looking at the PGN I realized that at longer TC these openings are too drawish pair-wise and have less resolving power than at faster TC. This is usual behavior for opening test suites as the time control grows, but I can cure it with a more judiciously unbalanced opening suite. None of this changes the fact that Stockfish is much stronger in endgames; the point is only to show that fact with high confidence in 100 games, using pre-set late endgames as openings.
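Pair-wise scoring treats each opening as one unit of two games with colours reversed. A minimal sketch, assuming a pair counts as won when one side scores more than 1 of its 2 points (the thread does not spell out the exact rule; the helper name is hypothetical). Under that rule a win plus a loss counts the same as two draws, so openings that are drawish pair-wise add little resolution.

```python
from typing import Iterable, Tuple


def pair_score(pairs: Iterable[Tuple[float, float]]) -> Tuple[int, int, int]:
    """Count (won, lost, drawn) side-and-reversed pairs for one engine.

    Each pair holds the engine's score in the two games from the same
    opening (1 = win, 0.5 = draw, 0 = loss). More than 1 point in the
    pair is a pair win, less than 1 a pair loss, exactly 1 a pair draw.
    """
    won = lost = drawn = 0
    for first, second in pairs:
        total = first + second
        if total > 1.0:
            won += 1
        elif total < 1.0:
            lost += 1
        else:
            drawn += 1
    return won, lost, drawn


# A win with one colour and a loss with the other is a drawn pair.
print(pair_score([(1, 0.5), (0.5, 0.5), (1, 0), (0, 0.5)]))  # (1, 1, 2)
```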

The 24 longer-TC games are here:

https://gofile.io/d/Ydjt7N

I will share the new opening suite and new LTC results.

Guenther · Post by **Guenther** » Mon Oct 05, 2020 12:16 pm

Laskos wrote: ↑Sat Oct 03, 2020 10:04 pm
...

In fact you are more correct than I thought. In endgames Lc0 LS15 (one of the best nets out there) on an RTX 2070 is similar in strength to... Fruit 2.1 on one core, underperforming by at least 1000 Elo points compared to openings, where it is the strongest engine on my PC (and I have no upper limit on its strength in the openings). So you are basically right: if Lc0 is not winning by move 30-40, it will hardly win later.
Code:
Endgames:

Score of lc0_LS15 vs Fruit_21: 17 - 17 - 66  [0.500] 100
...      lc0_LS15 playing White: 8 - 10 - 32  [0.480] 50
...      lc0_LS15 playing Black: 9 - 7 - 34  [0.520] 50
...      White vs Black: 15 - 19 - 66  [0.480] 100
Elo difference: 0.0 +/- 39.8, LOS: 50.0 %, DrawRatio: 66.0 %
Finished match
PGN:
https://gofile.io/d/U8aHbN

...

In many of these endgames, probably nobody knows whether they are won or drawn. You have the PGN file. I can play SF12 against Fruit 2.1 from these endgames, but keep in mind that endgames contribute less than 15% of an engine's Elo. The bottom line is: Lc0 with a good net and a strong GPU is as weak in endgames as Fruit 2.1 on one core, a tremendous underperformance whether you agree or not.

I have now checked the games from the LC0 vs. Fruit match, and after that it was no longer necessary to check the SF12 vs. Fruit games.
The reason is that something seems inherently weak in those LC0 games. It often reaches a totally won endgame, but in 95% of those games
it comes down to LC0 being unable to convert K+R vs. K 'endgames', which it lets slide into fifty-move draws.
(Those make up most of the 'wrong result' games; there are a very few others too, such as K+Q vs. K+N. It is also possible that early adjudication hides
this problem from most people.)

Just one of many examples:
[pgn]
[Event "?"]
[Site "?"]
[Date "2020.10.03"]
[Round "7"]
[White "Fruit_21"]
[Black "lc0_LS15"]
[Result "1/2-1/2"]
[TimeControl "15+0.25"]
[GameDuration "00:00:58"]
[GameEndTime "2020-10-03T11:28:16.905 GTB Daylight Time"]
[GameStartTime "2020-10-03T11:27:18.850 GTB Daylight Time"]
[PlyCount "133"]
[FEN "Q7/8/5p2/4p1k1/5p2/5P1K/3q4/8 b - - 0 1"]
[SetUp "1"]

{--------------
Q . . . . . . .
. . . . . . . .
. . . . . p . .
. . . . p . k .
. . . . . p . .
. . . . . P . K
. . . q . . . .
. . . . . . . .
black to play
--------------}
1... Qd7+ {+7.70/7 4.00} 2. Kg2 {-2.12/13 7.00} Qe6 {+8.02/6 5.50} 3. Qb7
{-2.12/12 7.70} Kh6 {+9.46/6 6.00} 4. Qb2 {-2.11/12 4.10} Qg8+
{+9.27/5 6.00} 5. Kf1 {-2.13/14 6.50} Qc4+ {+10.89/6 4.30} 6. Kf2
{-2.13/14 4.40} Qd4+ {+14.47/6 4.80} 7. Qxd4 {-10.71/25 5.20} exd4
{+11.29/8 3.00} 8. Ke2 {-10.57/17 5.00} Kg5 {+11.40/9 8.70} 9. Kd3
{-10.60/18 5.80} Kh4 {+12.63/8 2.10} 10. Kxd4 {-10.71/21 6.10} Kg3
{+12.65/8 6.30} 11. Ke4 {-10.63/19 4.50} f5+ {+11.93/9 2.90} 12. Ke5
{-10.63/19 5.80} Kxf3 {+12.15/8 4.40} 13. Kxf5 {-10.76/16 5.70} Kg3
{+9.47/9 8.00} 14. Ke4 {-10.68/15 4.10} f3 {+8.88/8 4.20} 15. Ke3
{-10.75/14 5.50} f2 {+10.51/8 5.90} 16. Ke2 {-10.74/12 5.00} Kg2
{+11.03/8 3.30} 17. Ke3 {-10.74/10 4.70} f1=R {+21.64/7 1.10} 18. Ke4
{-5.70/15 3.80} Kg3 {+23.33/6 6.40} 19. Ke3 {-5.67/11 4.80} Kg4
{+20.95/5 5.50} 20. Kd4 {-5.70/14 5.30} Kf4 {+31.48/5 5.40} 21. Kd3
{-5.88/16 3.30} Ra1 {+24.00/5 5.40} 22. Kc4 {-5.88/12 4.20} Ke4
{+28.95/5 5.40} 23. Kc5 {-5.88/12 3.90} Kd3 {+25.38/5 5.40} 24. Kd5
{-5.88/14 3.40} Ra5+ {+25.60/5 5.50} 25. Ke6 {-5.84/10 3.70} Kd4
{+33.27/5 5.40} 26. Kf6 {-5.87/11 3.20} Ke4 {+39.94/5 5.30} 27. Ke6
{-5.84/10 3.50} Ra4 {+39.44/5 5.40} 28. Kd6 {-6.03/16 4.60} Kd4
{+34.76/5 5.50} 29. Ke6 {-5.87/10 2.90} Ra3 {+29.41/5 5.40} 30. Kf5
{-5.97/12 3.30} Ra1 {+29.58/5 5.40} 31. Ke6 {-6.02/14 3.30} Ke4
{+35.36/5 5.40} 32. Kd6 {-5.88/12 3.00} Re1 {+27.05/5 5.30} 33. Kc5
{-5.93/14 4.50} Ke5 {+28.31/5 5.30} 34. Kc4 {-6.22/17 3.50} Ke4
{+26.53/5 5.20} 35. Kc5 {+0.00/59 1.00} Kd3 {+28.08/5 5.00} 36. Kd5
{-5.84/10 4.40} Kc3 {+17.57/5 5.50} 37. Kc5 {-5.84/10 4.80} Rh1
{+26.82/5 5.20} 38. Kd5 {-5.84/10 4.60} Kb4 {+36.82/5 5.30} 39. Ke4
{-5.84/10 4.60} Kc4 {+38.80/5 5.10} 40. Kf4 {-5.88/10 2.80} Rb1
{+28.87/5 5.10} 41. Ke4 {-5.88/10 2.80} Kc3 {+28.93/5 4.90} 42. Ke5
{-5.84/10 4.40} Kd3 {+38.68/5 4.90} 43. Kd5 {-5.84/10 4.40} Rg1
{+25.00/5 4.90} 44. Ke5 {-5.84/10 4.20} Kc4 {+31.73/5 4.80} 45. Kf4
{-5.88/10 3.50} Re1 {+24.54/5 4.70} 46. Kf5 {-5.84/10 4.30} Kd5
{+30.53/5 4.60} 47. Kf4 {-5.88/11 4.30} Kd4 {+23.88/5 4.50} 48. Kg5
{-5.91/12 4.10} Ke4 {+32.04/5 4.20} 49. Kf6 {-5.84/10 4.20} Kf4
{+29.20/5 4.20} 50. Kg6
{-M14/18 0.28}
50... Rd1 {+27.89/5 4.10} 51. Kf6 {-6.02/10 3.20} Ke4 {+26.51/5 3.90} 52.
Ke6 {-5.84/10 4.10} Rh1 {+27.90/5 3.80} 53. Kd6 {-5.88/9 2.50} Kd4
{+34.18/5 3.60} 54. Ke6 {-5.88/10 3.70} Kc5 {+36.94/5 3.60} 55. Ke5
{-5.88/10 4.00} Rh4 {+47.81/5 3.40} 56. Kf5 {-5.90/11 3.80} Ra4
{+34.03/5 3.40} 57. Ke5 {-5.84/9 2.50} Rd4 {+21.92/5 3.30} 58. Kf5
{-6.25/12 2.70} Kd5 {+24.04/5 3.10} 59. Kg6 {-6.14/11 3.70} Ke5
{+34.20/5 3.00} 60. Kf7 {-6.03/10 4.30} Kf5 {+19.44/5 3.10} 61. Ke7
{-M14/19 0.24}
61... Rb4 {+17.12/5 3.00} 62. Kd6 {-5.84/10 3.90} Ra4 {+5.37/5 2.90} 63.
Kd5 {+0.00/10 3.50} Rh4 {+1.52/5 2.80} 64. Kc5 {+0.00/18 2.30} Ke5
{+0.74/5 2.80} 65. Kc6 {+0.00/28 2.50} Rh1 {+0.24/5 2.70} 66. Kc5
{+0.00/34 2.40} Ke4 {+0.00/2 2.70} 67. Kd6 {+0.00/56 7.30} Rh6+
{+0.00/2 2.60}
{Draw by fifty moves rule} 1/2-1/2[/pgn]
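The fifty-move draw in this game is exact: 17...f1=R is Black's last pawn move, no capture follows, and the game is drawn after 67...Rh6+, 50 full moves (100 plies) later. A minimal Python sketch of the halfmove clock over SAN moves (hypothetical helper), assuming well-formed SAN in which pawn moves begin with a lowercase file letter and captures contain 'x':

```python
def halfmove_clock(san_moves, start: int = 0) -> int:
    """Plies since the last capture or pawn move, after playing san_moves.

    Assumes well-formed SAN: pawn moves start with a file letter a-h
    (pieces and castling use uppercase letters) and captures contain 'x'.
    `start` is the halfmove field of the starting FEN.
    """
    clock = start
    for san in san_moves:
        if san[0] in "abcdefgh" or "x" in san:
            clock = 0  # a pawn move or capture resets the count
        else:
            clock += 1
    return clock


# 17...f1=R followed by placeholder quiet moves standing in for moves 18-67.
moves = ["f1=R"] + ["Ke4", "Kg3"] * 50
print(halfmove_clock(moves) >= 100)  # True: drawn under the fifty-move rule
```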

It is also completely incomprehensible why it promotes to a rook instead of a queen on move 17 of the game above:
[d]8/8/8/8/8/4K3/5pk1/8 b - - 3 17
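The [d] tag is the forum's FEN diagram, and the PGN above draws its starting position in a plain-text layout. A small Python sketch (hypothetical helper) that expands the piece-placement field of a FEN into that same layout: ranks 8 down to 1, each digit expanded into that many empty squares, side to move on the last line.

```python
def fen_diagram(fen: str) -> str:
    """Render a FEN's piece placement as the text diagram used in the PGN."""
    placement, side = fen.split()[:2]
    rows = []
    for rank in placement.split("/"):  # rank 8 first, rank 1 last
        squares = []
        for ch in rank:
            if ch.isdigit():
                squares.extend(["."] * int(ch))  # run of empty squares
            else:
                squares.append(ch)  # uppercase = White, lowercase = Black
        rows.append(" ".join(squares))
    rows.append("white to play" if side == "w" else "black to play")
    return "\n".join(rows)


print(fen_diagram("8/8/8/8/8/4K3/5pk1/8 b - - 3 17"))
```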

I am not sure, though, whether this is a problem of the LS15 net, of the very low depths, or of something else.
Nor am I sure the comparison is fair: Syzygy tablebase support was added to LC0 partly because of endgame weaknesses
(much more so than in traditional engines), and there are already nets trained for endgames. Could those be used for this test?

Moreover, I thought the problem of drifting toward fifty-move draws in won endgames was solved in LC0 long ago?
I haven't worked with LC0 for a long time and haven't followed the Discord channel in a while either.

MMarco · Post by **MMarco** » Mon Oct 05, 2020 12:34 pm

jp wrote: ↑Mon Oct 05, 2020 9:00 am
MMarco wrote: ↑Mon Oct 05, 2020 2:06 am ... with 5-men syzygy ...
Do you mean both engines got 5-men TBs?

Neither engine should get TBs. It's supposed to be a test of their endgame play, not whether they can look up TBs.

It depends on what you mean by endgame play. For me, endgame play is much broader than converting TB positions. It doesn't matter to me if the engine cannot mate with N and B against a bare king if it plays well before that, i.e. if it is able to suppress the opponent's counterplay and transform its positional or material advantage into a position the TBs show as absolutely won.

Laskos · Post by **Laskos** » Mon Oct 05, 2020 2:57 pm

Guenther wrote: ↑Mon Oct 05, 2020 12:16 pm
...

Thanks for this analysis. I am interested in regular nets, trained from the standard starting position, in order to show that Lc0 with such a net is extremely strong in the openings but in endgames no better than an engine that is weak by current standards. Indeed, misses like the ones you found are not very relevant and might be due to the net used or to the too-short time control. I am currently testing J92-190 against SF12 at 100s+1s from these 50 openings:

Code:

8/6k1/2p1p3/n1P3BP/1p1P4/8/2K5/8 w - - ce 109; acd 33; acs 5.000; c0 "Stockfish 12";
8/2q4k/6p1/7p/3Q4/7P/6P1/1r4BK b - - ce 142; acd 36; acs 5.000; c0 "Stockfish 12";
5r2/3k2p1/3p4/4p3/1P4QP/8/5r2/6K1 b - - ce 219; acd 23; acs 5.000; c0 "Stockfish 12";
8/7p/2P1b2P/4B3/p1kP4/P7/3K4/8 w - - ce 120; acd 50; acs 5.000; c0 "Stockfish 12";
8/6nk/5np1/4Q2p/8/1BK5/5q2/2B5 b - - ce 171; acd 23; acs 5.000; c0 "Stockfish 12";
8/4Qpk1/6p1/1p5r/5PKP/P3R3/7q/8 b - - ce 215; acd 24; acs 5.000; c0 "Stockfish 12";
5rk1/7p/3p2p1/1PnP4/2P5/4K2P/8/R4B2 w - - ce 179; acd 24; acs 5.000; c0 "Stockfish 12";
8/3B4/k4p1p/6p1/1PbpP1P1/5K1P/8/8 b - - ce 180; acd 44; acs 5.000; c0 "Stockfish 12";
8/1p3p1k/6q1/1Q6/p3p3/P6P/2r1B2K/4B3 b - - ce 144; acd 24; acs 5.000; c0 "Stockfish 12";
4b3/6p1/6p1/5p2/7P/3k2P1/2p5/2B2K2 b - - ce 223; acd 42; acs 5.000; c0 "Stockfish 12";
8/3r1k2/7p/5qpP/2Q5/5BP1/6K1/8 b - - ce 159; acd 26; acs 5.000; c0 "Stockfish 12";
8/pp4k1/4p2R/3b4/3Pp3/8/PP6/1K6 w - - ce 116; acd 26; acs 5.000; c0 "Stockfish 12";
8/2p5/8/1pkr1p2/p3r3/P1P2KP1/1P6/5Q2 b - - ce 183; acd 24; acs 5.000; c0 "Stockfish 12";
8/1B1b4/6p1/p2Kp1k1/1p6/1Pb2P2/P7/7R w - - ce 234; acd 28; acs 5.000; c0 "Stockfish 12";
8/P7/6p1/5p2/5P2/3R1kPP/q7/6K1 b - - ce 140; acd 33; acs 5.000; c0 "Stockfish 12";
2R5/pp5p/2p3k1/8/3N2p1/2P5/PPK5/5r2 w - - ce 179; acd 26; acs 5.000; c0 "Stockfish 12";
8/6pk/7p/2q1b2P/4P1P1/1Q1K1P2/8/8 b - - ce 250; acd 30; acs 5.000; c0 "Stockfish 12";
4R3/8/4n1pk/2r5/5PK1/5QP1/4q3/8 b - - ce 247; acd 26; acs 5.000; c0 "Stockfish 12";
8/6k1/8/1p2P2p/2n1BR1P/2r3P1/4K3/8 w - - ce 112; acd 27; acs 5.000; c0 "Stockfish 12";
8/2p4r/1p3k2/1P1p4/P2R1P1p/7K/8/8 b - - ce 250; acd 28; acs 5.000; c0 "Stockfish 12";
8/pp2kn1r/6R1/3P1R2/4r3/P6P/2P3P1/6K1 b - - ce 176; acd 23; acs 5.000; c0 "Stockfish 12";
8/1q4kp/1p3p2/p3p2P/P1P5/6P1/2P5/5QK1 b - - ce 107; acd 24; acs 5.000; c0 "Stockfish 12";
k2r4/7p/4P3/8/Pb6/1P1p1R2/N7/1K6 w - - ce 218; acd 25; acs 5.000; c0 "Stockfish 12";
r7/8/1n1N4/p7/2P1pk2/1P6/1K5P/R7 w - - ce 107; acd 23; acs 5.000; c0 "Stockfish 12";
1r6/4p3/1P1p1k2/7p/7r/8/2Q3P1/6K1 w - - ce 121; acd 25; acs 5.000; c0 "Stockfish 12";
8/1B4p1/3kbp1p/3p4/1P1K2P1/5P2/6P1/8 w - - ce 179; acd 35; acs 5.000; c0 "Stockfish 12";
8/7k/6pp/1Rbq3r/8/4p2P/2Q3PB/7K b - - ce 144; acd 25; acs 5.000; c0 "Stockfish 12";
1R6/5bk1/8/3p1R1p/3P3P/3B1PK1/8/2q5 w - - ce 191; acd 25; acs 5.000; c0 "Stockfish 12";
8/6p1/6Q1/1p4B1/p1n1k2P/P1P1p1PK/1q6/8 b - - ce 180; acd 25; acs 5.000; c0 "Stockfish 12";
4r1k1/1K2Bp1p/6p1/P2pP1P1/8/3b4/1R6/8 w - - ce 117; acd 26; acs 5.000; c0 "Stockfish 12";
4b3/3r2k1/Bb2p1p1/1P2P3/5PK1/4p1P1/2Q5/8 w - - ce 227; acd 24; acs 5.000; c0 "Stockfish 12";
5r2/1Bq1kp1R/1p2p3/2p5/8/8/1P6/1K3Q2 w - - ce 248; acd 24; acs 5.000; c0 "Stockfish 12";
8/8/7p/8/1bN1nk2/3p4/5PKP/5N2 b - - ce 177; acd 30; acs 5.000; c0 "Stockfish 12";
8/4k3/3q4/3b1P1p/4B1pP/2Q3P1/7K/8 w - - ce 118; acd 32; acs 5.000; c0 "Stockfish 12";
3r3k/2K1b2P/7P/4pB2/R3P3/p4P2/8/8 w - - ce 124; acd 44; acs 5.000; c0 "Stockfish 12";
1Q4k1/5p1p/4p3/8/1P5N/5PPK/r2q3P/8 b - - ce 173; acd 27; acs 5.000; c0 "Stockfish 12";
6k1/5p2/b5p1/1pnp4/8/5P2/6KP/1R3B2 w - - ce 244; acd 29; acs 5.000; c0 "Stockfish 12";
2b5/2P3k1/2KBr3/1P4P1/8/8/3Q4/6q1 b - - ce 237; acd 26; acs 5.000; c0 "Stockfish 12";
3b1k2/8/1p4p1/1P2q3/2Pp3P/3Q2P1/2B3K1/8 w - - ce 116; acd 27; acs 5.000; c0 "Stockfish 12";
8/2p5/1pP5/p5p1/P2k1b2/5P1p/2R5/5K2 b - - ce 197; acd 30; acs 5.000; c0 "Stockfish 12";
2R5/5pp1/4k3/2P1p2P/6P1/5P2/2r2K2/8 w - - ce 237; acd 27; acs 5.000; c0 "Stockfish 12";
8/p5r1/kp6/5K2/P1RN1B2/8/8/4b3 w - - ce 207; acd 22; acs 5.000; c0 "Stockfish 12";
1b4k1/1P5p/4N2P/6P1/3KB3/8/8/6r1 w - - ce 149; acd 25; acs 5.000; c0 "Stockfish 12";
7k/4qNbp/1p2p3/p3P3/5Q2/7P/6P1/7K b - - ce 240; acd 28; acs 5.000; c0 "Stockfish 12";
8/5pk1/3Q1pnp/8/3p4/7P/5PP1/3q1BK1 b - - ce 238; acd 30; acs 5.000; c0 "Stockfish 12";
1k6/2b2R2/p1p5/P1P5/1P2Kp2/6p1/8/8 w - - ce 128; acd 49; acs 5.000; c0 "Stockfish 12";
6k1/5r2/R6p/1p1qp3/3n2Q1/6P1/5N1K/8 b - - ce 175; acd 26; acs 5.000; c0 "Stockfish 12";
8/2N3pk/5b1p/3Q3P/6K1/5PP1/8/2q5 w - - ce 190; acd 32; acs 5.000; c0 "Stockfish 12";
8/3nk3/4q1p1/2bR3p/4P2P/p5P1/P1Q3K1/8 b - - ce 116; acd 25; acs 5.000; c0 "Stockfish 12";
Qn3rk1/3p2p1/p6p/7P/P7/6K1/6P1/8 w - - ce 227; acd 35; acs 5.000; c0 "Stockfish 12";

https://gofile.io/d/aDU3ZD
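Each line of the suite is an EPD record: the first four FEN fields followed by opcodes. By EPD convention, ce is the centipawn evaluation from the side to move's point of view, acd the analysis depth, acs the analysis time in seconds and c0 a comment. In this suite every ce lies between 107 and 250, so the side to move starts roughly 1 to 2.5 pawns up, which is what makes the suite unbalanced. A minimal Python parser (hypothetical helper; it does not handle semicolons inside quoted strings):

```python
def parse_epd(line: str):
    """Split an EPD record into its position fields and its opcodes.

    Returns (position, ops): position holds the first four FEN fields and
    ops maps each opcode to its operand as a string. Quoted operands (such
    as c0) lose their quotes; semicolons inside quotes are not handled.
    """
    fields = line.split(maxsplit=4)
    position = " ".join(fields[:4])  # placement, side to move, castling, e.p.
    rest = fields[4] if len(fields) > 4 else ""
    ops = {}
    for op in rest.split(";"):
        op = op.strip()
        if op:
            code, _, operand = op.partition(" ")
            ops[code] = operand.strip().strip('"')
    return position, ops


line = ('8/6k1/2p1p3/n1P3BP/1p1P4/8/2K5/8 w - - '
        'ce 109; acd 33; acs 5.000; c0 "Stockfish 12";')
position, ops = parse_epd(line)
print(position, int(ops["ce"]) / 100)  # side to move's edge in pawns
```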

It's too early to say anything, but after 10 games I don't see trivial misses from Lc0 J92-190. Let's see; I will post the result and the PGN.

Laskos · Post by **Laskos** » Mon Oct 05, 2020 11:45 pm

Here are 100 games at 100 + 1 from the new openings:

Code:

Score of SF_12 vs Lc0_J92-190_CUDA: 34 - 29 - 37  [0.525] 100
...      SF_12 playing White: 14 - 15 - 21  [0.490] 50
...      SF_12 playing Black: 20 - 14 - 16  [0.560] 50
...      White vs Black: 28 - 35 - 37  [0.465] 100
Elo difference: 17.4 +/- 54.4, LOS: 73.6 %, DrawRatio: 37.0 %
Finished match

Pair-wise (side and reversed), SF12 scores +8 -3 =39 over the 50 pairs.
Not a very conclusive result from these unbalanced openings either. Moreover, Lc0 manages to win 3 side-and-reversed pairs against SF12, which surprises me. I will run Lc0 vs Fruit from these unbalanced openings overnight.

PGN:
https://gofile.io/d/w2WZ8f
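The "White vs Black" line in these summaries is the same set of games regrouped by colour: White's wins are the engine's wins as White plus its losses as Black, and vice versa. A small Python sketch (hypothetical helper) that reproduces that line from the two per-colour lines:

```python
def white_vs_black(as_white, as_black):
    """Turn one engine's per-colour (wins, losses, draws) into a
    White-vs-Black record plus White's score fraction."""
    ww, wl, wd = as_white  # games where the engine had White
    bw, bl, bd = as_black  # games where the engine had Black
    white_wins = ww + bl   # engine won as White, or lost as Black
    black_wins = wl + bw   # engine lost as White, or won as Black
    draws = wd + bd
    games = white_wins + black_wins + draws
    return (white_wins, black_wins, draws), (white_wins + draws / 2) / games


print(white_vs_black((14, 15, 21), (20, 14, 16)))  # ((28, 35, 37), 0.465)
```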

jp · Post by jp » Tue Oct 06, 2020 9:24 am

MMarco wrote: ↑Mon Oct 05, 2020 12:34 pm
jp wrote: ↑Mon Oct 05, 2020 9:00 am Neither engine should get TBs. It's supposed to be a test of their endgame play, not whether they can look up TBs.
It depends on what you mean by endgame play. For me, endgame play is much broader than converting TB positions. It doesn't matter to me if the engine cannot mate with N and B against a bare king if it plays well before that, i.e. if it is able to suppress the opponent's counterplay and transform its positional or material advantage into a position the TBs show as absolutely won.

The TBs aren't just converting what the engine (possibly) cannot. They also guide its earlier play, through the TB hits in the engine's search.

If you really hold the position (I certainly do not) that conversion skills don't matter, you should just run engine matches without TBs and adjudicate when they get down to 5 pieces. (The argument is weak anyway: why should we believe that 5-piece endgames are just "conversion" and don't need the "ability to suppress opponent counterplay, transform its advantage", etc.?)
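The point about TB hits guiding earlier play can be illustrated with a toy negamax search. The tree and all numbers below are hypothetical, and this is not how Stockfish or Lc0 actually implement probing: one root move keeps 6 men and looks better statically, the other trades into a 5-man ending that looks only slightly better, but a TB hit inside the search reports it as a win, so the engine picks it before any TB position is on the board.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Toy game-tree node; all values are hypothetical."""
    pieces: int                       # number of men on the board
    heuristic: float                  # static eval, side to move's view
    tb_value: Optional[float] = None  # exact TB result, side to move's view
    children: List["Node"] = field(default_factory=list)


def negamax(node: Node, depth: int, tb_pieces: int) -> float:
    # A TB hit returns the exact value and cuts the search off here.
    if node.tb_value is not None and node.pieces <= tb_pieces:
        return node.tb_value
    if depth == 0 or not node.children:
        return node.heuristic
    return max(-negamax(c, depth - 1, tb_pieces) for c in node.children)


def best_move(root: Node, depth: int, tb_pieces: int) -> int:
    """Index of the root move with the highest negamax score."""
    scores = [-negamax(c, depth - 1, tb_pieces) for c in root.children]
    return scores.index(max(scores))


# Move 0 keeps 6 men and looks better statically (+1.5 for the mover).
# Move 1 trades into a 5-man ending that looks small (+0.2) but is a TB win.
root = Node(pieces=7, heuristic=0.0, children=[
    Node(pieces=6, heuristic=-1.5),
    Node(pieces=5, heuristic=-0.2, tb_value=-100.0),
])
print(best_move(root, depth=1, tb_pieces=5))  # prints 1 (with 5-man TBs)
print(best_move(root, depth=1, tb_pieces=0))  # prints 0 (without TBs)
```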

Guenther · Post by **Guenther** » Tue Oct 06, 2020 10:01 am

Laskos wrote: ↑Mon Oct 05, 2020 11:45 pm Here in 100 games at 100 + 1 from the new openings:
...

It seems you changed the LC0 net too? Those games are completely different now and I am sure it has nothing to do with the start positions.

Looking at the depths in the final stage, I would conclude that the difference from the previous match is due much more to the net than
to the time control. That would mean the LS15 net must be very weak at rudimentary endgames.
(The depth difference from the previous match is just one ply in the final stage: it was 4-5 in the first match and is 5-6 in this one.)
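The depths can be read straight from the move comments in the PGNs, which have the form {score/depth seconds}, with mate scores written like -M14. A minimal sketch, assuming every move carries such a comment and that moves alternate strictly between the two engines (the regex and helper name are hypothetical):

```python
import re

# Move comment as in the PGN above: "{+7.70/7 4.00}" = score/depth seconds;
# mate scores appear as e.g. "{-M14/18 0.28}".
COMMENT = re.compile(r"\{([+-]?)(M?)(\d+(?:\.\d+)?)/(\d+)\s+(\d+(?:\.\d+)?)\}")


def parse_comments(movetext: str):
    """Return (score, is_mate, depth, seconds) for every move comment."""
    records = []
    for sign, mate, value, depth, secs in COMMENT.findall(movetext):
        score = -float(value) if sign == "-" else float(value)
        records.append((score, mate == "M", int(depth), float(secs)))
    return records


# Excerpt from the LS15 game above; Black (Lc0) moves first in it.
excerpt = ("61... Rb4 {+17.12/5 3.00} 62. Kd6 {-5.84/10 3.90} "
           "Ra4 {+5.37/5 2.90} 63. Kd5 {+0.00/10 3.50}")
records = parse_comments(excerpt)
lc0_depths = [depth for _, _, depth, _ in records[0::2]]
fruit_depths = [depth for _, _, depth, _ in records[1::2]]
print(lc0_depths, fruit_depths)  # [5, 5] [10, 10]
```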

Laskos · Post by **Laskos** » Tue Oct 06, 2020 10:29 am

Guenther wrote: ↑Tue Oct 06, 2020 10:01 am
Laskos wrote: ↑Mon Oct 05, 2020 11:45 pm Here in 100 games at 100 + 1 from the new openings:
...
It seems you changed the LC0 net too? Those games are completely different now and I am sure it has nothing to do with the start positions.

Looking at the depths in the final stage, I would conclude that the difference from the previous match is due much more to the net than
to the time control. That would mean the LS15 net must be very weak at rudimentary endgames.
(The depth difference from the previous match is just one ply in the final stage: it was 4-5 in the first match and is 5-6 in this one.)

Yes, I changed the net to the one the devs used in TCEC. It is also trained on the latest T60 games, which is in line with the Lc0 framework. And yes, I would probably have gotten a similar result with the old opening positions; with these openings I just made sure they are not too drawish pair-wise (side and reversed). The depth is not much higher at this longer TC because this net is larger and 3-3.5 times slower in NPS.
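The NPS argument can be put in rough numbers. A back-of-the-envelope sketch, assuming games of about 40 moves (a hypothetical figure) with the base time spread evenly: 15+0.25 (the TC in the LS15 game header above) gives about 0.63 s per move and 100+1 about 3.5 s, i.e. 5.6 times more time, and a net that is 3-3.5 times slower turns that into only about 1.6-1.9 times more nodes per move, in line with the small depth difference noted above.

```python
def avg_seconds_per_move(base: float, inc: float, moves: int) -> float:
    """Average think time per move, assuming the base time is spread
    evenly over `moves` moves (real time managers are not this uniform)."""
    return base / moves + inc


def nodes_per_move_ratio(short_tc, long_tc, moves: int, nps_slowdown: float) -> float:
    """Nodes per move at long_tc relative to short_tc when the long-TC
    net searches `nps_slowdown` times fewer nodes per second."""
    time_ratio = avg_seconds_per_move(*long_tc, moves) / avg_seconds_per_move(*short_tc, moves)
    return time_ratio / nps_slowdown


for slowdown in (3.0, 3.5):
    ratio = nodes_per_move_ratio((15, 0.25), (100, 1), moves=40, nps_slowdown=slowdown)
    print(f"NPS {slowdown}x slower: {ratio:.2f}x the nodes per move")
# NPS 3.0x slower: 1.87x the nodes per move
# NPS 3.5x slower: 1.60x the nodes per move
```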

I ran Lc0 against Fruit overnight under the same match conditions, and now Fruit is destroyed (which also shows that the openings are good at discerning superiority):

Code:

Score of Fruit_21 vs Lc0_J92-190_CUDA: 4 - 46 - 50  [0.290] 100
...      Fruit_21 playing White: 0 - 24 - 26  [0.260] 50
...      Fruit_21 playing Black: 4 - 22 - 24  [0.320] 50
...      White vs Black: 22 - 28 - 50  [0.470] 100
Elo difference: -155.5 +/- 47.4, LOS: 0.0 %, DrawRatio: 50.0 %
Finished match

PGN:
https://gofile.io/d/qQFSN1

Pair-wise score is +40 -0 =10 for Lc0.

All in all, it seems that in endgames, at this time control and on this hardware, Lc0 with this net is only mildly weaker than SF12, say at the level of Komodo or Ethereal, but this has to be checked. That would mean it underperforms by a couple of hundred Elo points in endgames, not 1000 as I stated earlier. This surprises me; IIRC, experiments with Lc0 a year ago had a different outcome. Lc0 still seems to take a longer path to conversion than the traditional engines, but it usually doesn't miss much now.
