Lc0 with latest test30 nets is vastly superior positionally

Laskos · Post by **Laskos** » Wed Jan 16, 2019 12:09 am

Kanizsa wrote: ↑Wed Jan 09, 2019 2:32 pm [Attachment: b2 b3.JPG] Another position that I suggest adding is this one. It is very, very difficult for a program to find b3, with the idea of answering Nc5 with Rb1! and b4 (what does Leela play?)

Leela with one of the latest test30 nets doesn't solve this one; it sticks to f2-f3 for at least 20 minutes (40 million nodes).
The previous one Leela solves instantly with a2-a4 and sticks to it.

Thanks for these positions, I will include them.

Javier Ros · Post by **Javier Ros** » Wed Jan 16, 2019 9:06 am

Laskos wrote: ↑Wed Jan 16, 2019 12:09 am
Kanizsa wrote: ↑Wed Jan 09, 2019 2:32 pm [Attachment: b2 b3.JPG] Another position that I suggest adding is this one. It is very, very difficult for a program to find b3, with the idea of answering Nc5 with Rb1! and b4 (what does Leela play?)
Leela with one of the latest test30 nets doesn't solve this one; it sticks to f2-f3 for at least 20 minutes (40 million nodes).
The previous one Leela solves instantly with a2-a4 and sticks to it.

Thanks for these positions, I will include them.

This tendency of the latest Lc0 versions to lose positional strength is consistent with the results of my tests and experiments; see

viewtopic.php?f=2&t=69582&sid=0b6d895e6 ... 10#p786318

Perhaps the learning process of Lc0 is leading her to improve tactically and get worse positionally.

Henk · Post by **Henk** » Wed Jan 16, 2019 9:51 am

What I remember is that an engine scores best on Win at Chess if it limits its reductions and pruning to plain alpha-beta search.

That is, it applies only optimizations that cause no information loss. The problem is that it won't search deep.
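To illustrate the "no information loss" point, here is a toy sketch (not taken from any engine; the tree and function names are made up for the example). Plain alpha-beta cuts branches but still returns exactly the same value as full minimax, unlike the heuristic reductions and pruning used in real engines.

```python
import math

def minimax(node, maximizing):
    """Full-width search. Leaves are numbers, inner nodes are lists of children."""
    if not isinstance(node, list):
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf):
    """Alpha-beta search over the same toy tree representation."""
    if not isinstance(node, list):
        return node
    if maximizing:
        best = -math.inf
        for child in node:
            best = max(best, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # cutoff: the remaining siblings cannot change the result
        return best
    best = math.inf
    for child in node:
        best = min(best, alphabeta(child, True, alpha, beta))
        beta = min(beta, best)
        if alpha >= beta:
            break  # cutoff on the minimizing side
    return best

# Hypothetical toy tree, chosen only so that both kinds of cutoff occur.
TOY_TREE = [[3, 5], [6, [9, 1]], [1, 2]]

if __name__ == "__main__":
    print(minimax(TOY_TREE, True), alphabeta(TOY_TREE, True))
```

Both calls give the same root value; alpha-beta just visits fewer leaves on the way.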

Javier Ros · Post by **Javier Ros** » Wed Jan 16, 2019 10:40 am

Laskos wrote: ↑Tue Jan 08, 2019 9:49 am On my positional Openings200 test suite, largely based on databases of human games, I used Polyglot with particular settings, as engines like Lc0 and SF behave very differently in depth and similar output. I used a setting that checks whether the engine sticks to the correct solution from time/2 to time/1, as this seems the most representative of real moves played in games at roughly this time control per move. The usual test, which runs from a very short time up to the final time and requires the engine to stick to the solution for, say, 3 successive iterations, is unreliable, as a regular engine can stick to the correct solution for 3 plies at very short times, only to change its mind at longer times on this positional test suite.

Lc0 on RTX 2070 GPU
Regular engines on 4 i7 fast cores.
Stuck to the solution from 1s to 2s per position on 200 positions, top engines

Lc0 v20.1 ID32458: 143/200
Stockfish 10:      108/200
Komodo 12.3:        97/200
Ethereal 11.00:     89/200

Stuck to the solution from 10s to 20s per position on 200 positions, top engines

Lc0 v20.1 ID32458: 157/200
Stockfish 10:      128/200
Komodo 12.3:       117/200
Ethereal 11.00:    112/200

Lc0 performance is very strong, covering human opening knowledge, which is hard, in a matter of seconds per position. I suspect that 15-20 of the 200 solutions in the test suite I built are wrong, so Lc0 with test30 nets approaches the upper limit of this positional test suite at longer times per position. Test30 ID32458 performs much better than test10 ID11261 positionally, but worse tactically (on WAC200, for example). All in all, they are about the same strength in CCRL 40/4 conditions. I do not know why they didn't manage to improve test30 tactically, as it's the main weakness of the latest nets.

The link to this positional opening suite is here:
http://s000.tinyupload.com/?file_id=249 ... 2088614166

I think it would be very interesting to run your test on versions 32367 and 32409 (the two best, in my opinion) and compare them with the current versions.
In addition, I believe your positional test can be used to predict which versions are the best candidates for later testing, since there are so many versions that it is very difficult to choose the best one.
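For anyone wanting to reproduce the "sticks to the solution from time/2 to time/1" criterion described in the quoted post, here is a minimal sketch. It is only one possible reading of that description, not the actual Polyglot settings: the function names and the (elapsed_seconds, best_move) sample format are assumptions made for the example.

```python
def sticks_to_solution(samples, solution, final_time):
    """Check the time/2-to-time/1 criterion for one position.

    samples: chronological list of (elapsed_seconds, best_move) pairs,
             one per best-move report from the engine (assumed format).
    solution: the expected move, in the same notation as the samples.
    final_time: the full time per position (for example 2.0 for "1s to 2s").
    """
    half = final_time / 2
    # The move preferred at time/2 is taken to be the last one reported by then.
    move_at_half = None
    for elapsed, move in samples:
        if elapsed <= half:
            move_at_half = move
    if move_at_half != solution:
        return False
    # Every report between time/2 and time/1 must still be the solution.
    return all(
        move == solution
        for elapsed, move in samples
        if half < elapsed <= final_time
    )

def suite_score(positions, final_time):
    """positions: list of (samples, solution) pairs; returns the solved count."""
    return sum(
        sticks_to_solution(samples, solution, final_time)
        for samples, solution in positions
    )
```

With final_time=2.0 this would correspond to the "1s to 2s" rows, and with final_time=20.0 to the "10s to 20s" rows, under the assumptions above.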

Laskos · Post by **Laskos** » Wed Jan 16, 2019 11:01 am

Javier Ros wrote: ↑Wed Jan 16, 2019 10:40 am
I think it would be very interesting to run your test on versions 32367 and 32409 (the two best, in my opinion) and compare them with the current versions.
In addition, I believe your positional test can be used to predict which versions are the best candidates for later testing, since there are so many versions that it is very difficult to choose the best one.

I don't think this positional test is of the utmost relevance for strength; tactics are often involved in games, and unfortunately test30, although by now better positionally, is still weaker tactically than test10 and is not improving tactically. All in all, the latest test30 nets are just a bit stronger than the best test10 nets (in the region of 20 or so Elo points).

I saw some regression in positional play of test30, but then again it jumped to record levels.
Here are the results for top engines, stuck to solution in 1s to 2s time interval per position. Latest ID32644 is incredibly strong on this suite:

Lc0 v20.1 ID32644: 756/1000
Lc0 v20.1 ID32458: 712/1000
Houdini 6.03:      558/1000
Komodo 12.3:       556/1000
Stockfish 10:      524/1000
Booot 6.3.1:       494/1000
Andscacs 0.95:     484/1000
Ethereal 11.00:    457/1000
Fire 7.1:          431/1000
Texel 1.07:        419/1000

ID32644 surpasses any regular engine by a huge margin. I am pretty happy that my 2-year-old test suite can show the huge positional superiority of Lc0 with good nets. It is a suite that relies not on engine analysis (as STS does), but on databases of human games in the openings. I think Stockfish does not perform very well compared to Komodo and Houdini because Fishtest uses its stupid 2moves_v1 random openings rather than more regular openings.
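To put rough error bars on the table above, here is a small sketch using the normal approximation to the binomial. It treats the 1000 positions as independent trials, which is an assumption (all engines ran on the same positions, so a paired comparison would be sharper), and the function names are made up for the example.

```python
import math

# Solved counts copied from the 1s-to-2s table above (out of 1000).
SOLVED = {
    "Lc0 v20.1 ID32644": 756,
    "Lc0 v20.1 ID32458": 712,
    "Houdini 6.03": 558,
    "Komodo 12.3": 556,
    "Stockfish 10": 524,
}
TOTAL = 1000

def rate_interval(solved, total=TOTAL, z=1.96):
    """Return (rate, low, high): a roughly 95% interval for the solve rate."""
    rate = solved / total
    half_width = z * math.sqrt(rate * (1.0 - rate) / total)
    return rate, rate - half_width, rate + half_width

def clearly_ahead(a, b):
    """True if the interval for engine a lies entirely above that of engine b."""
    _, low_a, _ = rate_interval(SOLVED[a])
    _, _, high_b = rate_interval(SOLVED[b])
    return low_a > high_b

if __name__ == "__main__":
    for name, solved in SOLVED.items():
        rate, low, high = rate_interval(solved)
        print(f"{name:<20} {rate:.1%}  [{low:.1%}, {high:.1%}]")
```

Under these assumptions the gap between the Lc0 nets and the regular engines is far outside the noise, while the differences among Houdini, Komodo and Stockfish are much closer to it.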

Javier Ros · Post by **Javier Ros** » Wed Jan 16, 2019 12:28 pm

Laskos wrote: ↑Wed Jan 16, 2019 11:01 am
Lc0 v20.1 ID32644: 756/1000
Lc0 v20.1 ID32458: 712/1000
Houdini 6.03:      558/1000
Komodo 12.3:       556/1000
Stockfish 10:      524/1000
Booot 6.3.1:       494/1000
Andscacs 0.95:     484/1000
Ethereal 11.00:    457/1000
Fire 7.1:          431/1000
Texel 1.07:        419/1000
ID32644 surpasses any regular engine by a huge margin. I am pretty happy that my 2-year-old test suite can show the huge positional superiority of Lc0 with good nets. It is a suite that relies not on engine analysis (as STS does), but on databases of human games in the openings. I think Stockfish does not perform very well compared to Komodo and Houdini because Fishtest uses its stupid 2moves_v1 random openings rather than more regular openings.

You are right! Lc0 32644 is again playing 8...h5 in the 8th position of the Balsa_Top25 suite.
Lc0 is regaining positional strength! It hasn't played this move since 32409.

It finds the move after 11 seconds, at ply 12, on a 2070.

FEN: rn1qkb1r/1p3ppp/p2pbn2/4p3/4P3/1NN1BP2/PPP3PP/R2QKB1R b KQkq - 0 8

Lc0201_32644:

....
11/25 00:03 82.133 23.910 -0,53 8. ... Bf8-e7 9.Qd1-d2 Nb8-d7 10.g2-g4 O-O 11.g4-g5 Nf6-h5 12.O-O-O b7-b5 13.Nc3-d5 Be6xd5 14.e4xd5 f7-f6 15.g5xf6 Be7xf6 16.Kc1-b1 Bf6-h4 17.Nb3-a5 Qd8-f6 18.Bf1-h3 Nd7-c5
11/26 00:05 138.822 24.265 -0,50 8. ... Bf8-e7 9.Qd1-d2 h7-h5 10.Nc3-d5 Nf6xd5 11.e4xd5 Be6-f5 12.Bf1-e2 Nb8-d7 13.O-O h5-h4 14.Nb3-a5 Qd8-c7 15.c2-c4 O-O 16.b2-b4 h4-h3 17.g2-g4 Bf5-g6 18.Ra1-c1 f7-f5
12/26 00:06 151.106 24.574 -0,50 8. ... Bf8-e7 9.Qd1-d2 h7-h5 10.Nc3-d5 Nf6xd5 11.e4xd5 Be6-f5 12.Bf1-e2 Nb8-d7 13.O-O h5-h4 14.Nb3-a5 Qd8-c7 15.c2-c4 O-O 16.b2-b4 h4-h3 17.g2-g4 Bf5-g6 18.Ra1-c1 f7-f5
12/27 00:07 186.884 25.977 -0,50 8. ... Bf8-e7 9.Qd1-d2 h7-h5 10.Nc3-d5 Nf6xd5 11.e4xd5 Be6-f5 12.Bf1-e2 Nb8-d7 13.O-O h5-h4 14.Nb3-a5 Qd8-c7 15.c2-c4 O-O 16.b2-b4 h4-h3 17.g2-g4 Bf5-g6 18.Ra1-c1 f7-f5
12/28 00:07 198.869 26.229 -0,50 8. ... Bf8-e7 9.Qd1-d2 h7-h5 10.Nc3-d5 Nf6xd5 11.e4xd5 Be6-f5 12.Bf1-e2 a6-a5 13.a2-a4 O-O 14.O-O Nb8-d7 15.Be2-b5 Nd7-f6 16.c2-c4 h5-h4 17.Qd2-f2 h4-h3 18.g2-g4 Bf5-d7
12/28 00:11 364.973 30.966 -0,41 8. ... h7-h5 9.Qd1-d2 Nb8-d7 10.O-O-O Bf8-e7 11.Kc1-b1 Ra8-c8 12.Nc3-d5 Nf6xd5 13.e4xd5 Be6-f5 14.Bf1-d3 Bf5xd3 15.Qd2xd3 Be7-g5 16.Be3-f2 Qd8-c7 17.c2-c3 O-O 18.Qd3-f5 Bg5-h6 19.Qf5xh5 b7-b5
12/28 00:16 557.394 33.195 -0,38 8. ... h7-h5 9.Qd1-d2 Nb8-d7 10.Nc3-d5 Be6xd5 11.e4xd5 g7-g6 12.O-O-O Bf8-g7 13.Kc1-b1 Qd8-c7 14.Bf1-e2 O-O 15.g2-g4 Nd7-b6 16.Be3-g5 h5xg4 17.Bg5xf6 Bg7xf6 18.f3xg4 Bf6-h4 19.Qd2-h6 Qc7-e7 20.Nb3-d2 Bh4-g5 21.Qh6-h3 e5-e4
12/29 00:17 568.714 33.207 -0,38 8. ... h7-h5 9.Qd1-d2 Nb8-d7 10.Nc3-d5 Be6xd5 11.e4xd5 g7-g6 12.O-O-O Qd8-c7 13.Kc1-b1 Bf8-g7 14.Bf1-e2 O-O 15.g2-g4 Nd7-b6 16.Be3-g5 h5xg4 17.Bg5xf6 Bg7xf6 18.f3xg4 Bf6-h4 19.Qd2-h6 Qc7-e7 20.Nb3-d2 Bh4-g5 21.Qh6-h3 Bg5xd2 22.Rd1xd2
12/30 00:20 690.376 34.461 -0,39 8. ... h7-h5 9.Qd1-d2 Nb8-d7 10.Nc3-d5 Be6xd5 11.e4xd5 g7-g6 12.O-O-O Qd8-c7 13.Kc1-b1 Bf8-g7 14.Bf1-e2 O-O 15.g2-g4 Nd7-b6 16.Be3-g5 h5xg4 17.Bg5xf6 Bg7xf6 18.f3xg4 Bf6-h4 19.Qd2-h6 Qc7-e7 20.Nb3-d2 Bh4-g5 21.Qh6-h3 Bg5xd2 22.Rd1xd2
13/30 00:20 696.869 34.560 -0,38 8. ... h7-h5 9.Qd1-d2 Nb8-d7 10.Nc3-d5 Be6xd5 11.e4xd5 g7-g6 12.O-O-O Qd8-c7 13.Kc1-b1 Bf8-g7 14.Bf1-e2 O-O 15.g2-g4 Nd7-b6 16.Be3-g5 h5xg4 17.Bg5xf6 Bg7xf6 18.f3xg4 Bf6-h4 19.Qd2-h6 Qc7-e7 20.Nb3-d2 Bh4-g5 21.Qh6-h3 Bg5xd2 22.Rd1xd2
13/31 00:22 783.187 34.605 -0,36 8. ... h7-h5 9.Qd1-d2 Nb8-d7 10.Nc3-d5 Be6xd5 11.e4xd5 g7-g6 12.O-O-O Qd8-c7 13.Kc1-b1 Bf8-g7 14.Bf1-e2 O-O 15.g2-g4 Nd7-b6 16.Be3-g5 h5xg4 17.Bg5xf6 Bg7xf6 18.f3xg4 Bf6-h4 19.Qd2-h6 Qc7-e7 20.Qh6-e3 Nb6-a4 21.Nb3-d2 e5-e4 22.Qe3xe4 Qe7xe4
13/32 00:24 842.683 34.421 -0,35 8. ... h7-h5 9.Qd1-d2 Nb8-d7 10.Nc3-d5 Be6xd5 11.e4xd5 g7-g6 12.O-O-O Qd8-c7 13.Kc1-b1 Bf8-g7 14.Bf1-e2 O-O 15.g2-g4 Nd7-b6 16.Be3-g5 h5xg4 17.Bg5xf6 Bg7xf6 18.f3xg4 Bf6-h4 19.Qd2-h6 Qc7-e7 20.Qh6-e3 Nb6-a4 21.Nb3-d2 e5-e4 22.Qe3xe4 Qe7xe4
13/33 00:29 997.945 34.143 -0,34 8. ... h7-h5 9.Qd1-d2 Nb8-d7 10.Nc3-d5 Be6xd5 11.e4xd5 g7-g6 12.O-O-O Qd8-c7 13.Kc1-b1 Bf8-g7 14.Rh1-g1 O-O 15.g2-g4 h5xg4 16.f3xg4 Nd7-b6 17.Qd2-g2 e5-e4 18.Be3-d4 Ra8-e8 19.c2-c4 e4-e3 20.h2-h4 Nb6xc4
13/37 00:33 1.197.343 35.456 -0,33 8. ... h7-h5 9.Qd1-d2 Nb8-d7 10.Nc3-d5 Be6xd5 11.e4xd5 g7-g6 12.O-O-O Qd8-c7 13.Kc1-b1 Bf8-g7 14.Rh1-g1 O-O 15.g2-g4 h5xg4 16.f3xg4 Nd7-b6 17.Qd2-g2 e5-e4 18.Be3-d4 Ra8-e8 19.c2-c4 e4-e3 20.h2-h4 Nb6xc4
14/37 00:34 1.228.053 35.249 -0,33 8. ... h7-h5 9.Qd1-d2 Nb8-d7 10.Nc3-d5 Be6xd5 11.e4xd5 g7-g6 12.O-O-O Qd8-c7 13.Kc1-b1 Bf8-g7 14.Bf1-e2 O-O 15.g2-g4 Rf8-c8 16.Rd1-c1 a6-a5 17.g4-g5 Nf6-e8 18.a2-a4 Qc7-d8 19.Be2-b5 Ne8-c7 20.c2-c4 b7-b6 21.Bb5-c6 Ra8-b8 22.Bc6-b5 Rb8-a8 23.Rh1-d1 Nc7-a6
14/37 00:37 1.350.835 36.200 -0,34 8. ... h7-h5 9.Qd1-d2 Nb8-d7 10.Nc3-d5 Be6xd5 11.e4xd5 g7-g6 12.O-O-O Qd8-c7 13.Kc1-b1 Bf8-g7 14.Bf1-e2 O-O 15.g2-g4 Rf8-c8 16.Rd1-c1 a6-a5 17.g4-g5 Nf6-e8 18.a2-a4 Qc7-d8 19.Be2-b5 Ne8-c7 20.Qd2-d3 Nc7xb5 21.Qd3xb5 b7-b6 22.Nb3-d2 Nd7-c5 23.Nd2-e4 Ra8-b8 24.Rh1-f1 Rc8-c7
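The "11 seconds" figure can be read off the log mechanically. Below is a minimal parsing sketch. The column meanings (depth/seldepth, mm:ss, nodes, nodes per second, score) are my reading of the GUI output, not something documented in this thread; the numbers use a dot as thousands separator and a comma as decimal separator, and only the Black-to-move "8. ... move" form shown above is handled. The sample lines are taken from the log with the variations shortened.

```python
import re

# Assumed layout: depth/seldepth  mm:ss  nodes  nps  score  N. ... move ...
LINE_RE = re.compile(
    r"^(?P<depth>\d+)/(?P<seldepth>\d+)\s+"
    r"(?P<min>\d+):(?P<sec>\d+)\s+"
    r"(?P<nodes>[\d.]+)\s+(?P<nps>[\d.]+)\s+"
    r"(?P<score>[+-]?\d+,\d+)\s+"
    r"\d+\.\s+\.\.\.\s+(?P<move>\S+)"
)

def parse_line(line):
    """Parse one analysis line into a dict, or return None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return {
        "depth": int(m["depth"]),
        "seconds": int(m["min"]) * 60 + int(m["sec"]),
        "nodes": int(m["nodes"].replace(".", "")),   # dot = thousands separator
        "score": float(m["score"].replace(",", ".")),  # comma = decimal separator
        "move": m["move"],
    }

def settled_since(lines, move):
    """Time in seconds from which the first move stays `move`, or None."""
    start = None
    for row in filter(None, map(parse_line, lines)):
        if row["move"] != move:
            start = None
        elif start is None:
            start = row["seconds"]
    return start

SAMPLE = [
    "11/25 00:03 82.133 23.910 -0,53 8. ... Bf8-e7 9.Qd1-d2 Nb8-d7",
    "12/28 00:07 198.869 26.229 -0,50 8. ... Bf8-e7 9.Qd1-d2 h7-h5",
    "12/28 00:11 364.973 30.966 -0,41 8. ... h7-h5 9.Qd1-d2 Nb8-d7",
    "14/37 00:37 1.350.835 36.200 -0,34 8. ... h7-h5 9.Qd1-d2 Nb8-d7",
]

if __name__ == "__main__":
    print(settled_since(SAMPLE, "h7-h5"))
```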

Jouni · Post by **Jouni** » Wed Jan 16, 2019 1:46 pm

Lc0 score is no surprise with 40 MB of learning data. How about testing Brainfish?

Laskos · Post by **Laskos** » Wed Jan 16, 2019 2:14 pm

Jouni wrote: ↑Wed Jan 16, 2019 1:46 pm Lc0 score is no surprise with 40 MB of learning data. How about testing Brainfish?

Lc0 is a pattern learner. Cerebellum actually contains many of these openings. This positional strength of Lc0 probably extends to the middlegame too, where most positions are not covered by any book.
Aside from that, I had trouble testing BrainFish in Polyglot.

Jouni · Post by **Jouni** » Sat Feb 02, 2019 9:22 pm

Brainfish is difficult to test because it doesn't use its book in analysis mode. But I was curious and tested it manually. The score was 152/200 with a "0" sec limit.

Laskos · Post by **Laskos** » Sat Feb 02, 2019 10:28 pm

Jouni wrote: ↑Sat Feb 02, 2019 9:22 pm Brainfish is difficult to test because it doesn't use its book in analysis mode. But I was curious and tested it manually. The score was 152/200 with a "0" sec limit.

Thanks, that is very close to what the best test30 nets on this suite (32819, for example) get, about 153/200, at 1s to 2s per position. So at 1s to 2s per position, Leela is roughly as strong in the openings as the Cerebellum book, which Stockfish analyzed for dozens of minutes per position, and that is quite a feat. At longer TC (say Blitz or longer), Leela by itself plays stronger in the openings than the Cerebellum book.
