Lc0 with latest test30 nets is vastly superior positionally

Laskos · Post by **Laskos** » Tue Jan 08, 2019 9:49 am

On my positional Openings200 test suite, which is largely based on databases of human games, I used Polyglot with particular settings, because engines like Lc0 and SF differ greatly in how they report depth and similar output. The criterion I used is whether the engine sticks to the correct solution from time/2 to time, as this seems the most representative of the moves actually played in games at roughly this time per move. The usual approach, testing from a very short time up to the final time and requiring the engine to stick to the solution for, say, 3 successive iterations, is unreliable: on this positional test suite a regular engine can stick to the correct solution for 3 plies at very short times, only to change its mind at longer times.
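To make the time/2 to time criterion concrete, here is a minimal Python sketch. It is not what Polyglot does internally; the function name `stuck_to_solution` and the sample format (elapsed seconds, best move) are assumptions made up for illustration.

```python
from typing import List, Optional, Tuple


def stuck_to_solution(samples: List[Tuple[float, str]],
                      solution: str,
                      t_final: float) -> bool:
    """Return True if the engine holds `solution` over [t_final/2, t_final].

    `samples` is a hypothetical list of (elapsed seconds, best move) pairs,
    sorted by time, one per best-move update reported by the engine.
    """
    t_half = t_final / 2.0
    in_effect: Optional[str] = None  # best move in effect when the window opens
    for elapsed, move in samples:
        if elapsed > t_final:
            break  # updates after the final time do not count
        if elapsed <= t_half:
            in_effect = move  # still before the window: remember latest move
        elif move != solution:
            return False  # changed its mind inside the window
    # The solution must already be the best move when the window opens.
    return in_effect == solution


held = [(0.1, "e2e4"), (0.6, "d2d4"), (1.4, "d2d4")]
dropped = [(0.6, "d2d4"), (1.7, "c2c4")]
print(stuck_to_solution(held, "d2d4", 2.0))
print(stuck_to_solution(dropped, "d2d4", 2.0))
```

In this sketch a position only counts if the solution is already the best move at time/2 and stays the best move until the final time.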

Lc0 on RTX 2070 GPU
Regular engines on 4 i7 fast cores.

Stuck to the solution from 1s to 2s per position on 200 positions, top engines

Lc0 v20.1 ID32458: 143/200
Stockfish 10:      108/200
Komodo 12.3:        97/200
Ethereal 11.00:     89/200

Stuck to the solution from 10s to 20s per position on 200 positions, top engines
      
Lc0 v20.1 ID32458: 157/200
Stockfish 10:      128/200
Komodo 12.3:       117/200
Ethereal 11.00:    112/200

Lc0's performance is very strong: within a few seconds per position it covers human opening knowledge, which is hard to acquire. I suspect that 15-20 of the 200 solutions in the test suite I built are wrong, so Lc0 with test30 nets approaches the upper limit of this positional test suite at longer times per position. Test30 ID32458 performs much better positionally than test10 ID11261, but worse tactically (on WAC200, for example). All in all, they are about the same strength in CCRL 40/4 conditions. I do not know why they didn't manage to improve test30 tactically, as it is the main weakness of the latest nets.
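The "upper limit" remark can be checked with a little arithmetic. The sketch below uses the 10s to 20s scores from the table above and assumes, as a simplification of mine, that an engine never gets credit on a position whose listed solution is wrong; the function names are made up for illustration.

```python
TOTAL = 200

# Scores from the 10s-20s run reported above.
solved_10_20s = {
    "Lc0 v20.1 ID32458": 157,
    "Stockfish 10": 128,
    "Komodo 12.3": 117,
    "Ethereal 11.00": 112,
}


def solve_rate(solved: int, total: int) -> float:
    """Percentage of positions solved out of the whole suite."""
    return 100.0 * solved / total


def rate_against_ceiling(solved: int, total: int, wrong: int) -> float:
    """Percentage solved if `wrong` positions are unsolvable as listed."""
    return 100.0 * solved / (total - wrong)


for name, solved in solved_10_20s.items():
    print(f"{name:<20} {solve_rate(solved, TOTAL):5.1f}%")

# Suspected number of wrong solutions in the suite: 15 to 20.
for wrong in (15, 20):
    share = rate_against_ceiling(solved_10_20s["Lc0 v20.1 ID32458"], TOTAL, wrong)
    print(f"Lc0 vs. ceiling of {TOTAL - wrong}: {share:.1f}%")
```

Under that assumption, 157/200 corresponds to roughly 85-87% of the positions that can be solved at all.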

The link to this positional opening suite is here:
http://s000.tinyupload.com/?file_id=249 ... 2088614166

Lion · Post by **Lion** » Tue Jan 08, 2019 10:19 am

This doesn't surprise me, although it's good to have some data showing it.

As you said, it's tactically that Lc0 has some weakness, which gives it a slightly human-like behaviour: it can lose to SF dev in 25 moves to a tactical combination, and the following game shows a positional masterpiece against SF dev.

It's a welcome new era of computer chess...

rgds

Laskos · Post by **Laskos** » Tue Jan 08, 2019 12:13 pm

Lion wrote: ↑Tue Jan 08, 2019 10:19 am This doesn't surprise me, although it's good to have some data showing it.

As you said, it's tactically that Lc0 has some weakness, which gives it a slightly human-like behaviour: it can lose to SF dev in 25 moves to a tactical combination, and the following game shows a positional masterpiece against SF dev.

It's a welcome new era of computer chess...

rgds

I wanted to see some quantifiable result showing this positional dominance. We all see it, or perceive it, in many games won by Leela. But in my conditions Lc0 is at the level of SF10 in overall strength, and showing the vast positional dominance on a particular test suite of positions is not that easy. For example, the Strategic Test Suite (STS, 1500 positions) behaves pretty badly: it shows Leela below all these top regular engines, so even below its overall strength, which already includes the tactical mishaps.

STS 1500 test suite:

Stuck to the solution from 1s to 2s per position on 1500 positions

Stockfish 10:      1296/1500
Lc0 v20.1 ID32458: 1155/1500

That's a very bad indicator not only of positional strength, but of general strength.
I am pretty glad that my own positional test suite is probably the only one, or among the very few (I am unaware of others), that shows the vast positional superiority of Lc0 over top regular engines. I spent no more than a week building this suite, intermittently.

Kanizsa · Post by **Kanizsa** » Tue Jan 08, 2019 1:13 pm

Great Kai,
and congratulations on this work and its proof.

The idea of judging chess programs from the earliest opening moves is not new. Larry Kaufman came up with it back in the 90s, and it appeared in a first computer chess test suite in the Computer Chess Report review (if I remember correctly). Only a few positions appeared there, maybe twenty. I remember that one of them was chosen to get programs to play the move "a2-a4" in response to "a7-a6" in the Benoni opening.
At that time "a2-a4" was a very difficult move that no program was able to play, not Mchess, Genius, or ChessMachine.

I will check whether I can find similar themes in your suite about limiting pawn expansion on the queenside.

Thank you again for your studies.

Werner · Post by **Werner** » Tue Jan 08, 2019 1:52 pm

Hi,
are you able to start Lc0 with a setup position?

I have no graphics card, and when I set up a position with the blas version for CPU inside the Shredder GUI, Lc0 crashes here.
What GUI do you use?

Laskos · Post by **Laskos** » Tue Jan 08, 2019 6:10 pm

Werner wrote: ↑Tue Jan 08, 2019 1:52 pm Hi,
are you able to start Lc0 with a setup position?

I have no graphics card, and when I set up a position with the blas version for CPU inside the Shredder GUI, Lc0 crashes here.
What GUI do you use?

From the command line, shouldn't it be like any other UCI engine?
position fen r7/1b1r4/k1p1p1p1/1p1pPpPp/p1PP1P1P/PP1K4/8/4Q3 w - -
go

I haven't worked with the blas version or the Shredder GUI for a long time.
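If typing the commands by hand is inconvenient, the same check can be scripted. This is only a sketch under assumptions: the helper names `uci_commands` and `best_move`, the engine path, and the use of `go movetime` are illustrative choices of mine, and the engine is assumed to speak plain UCI on stdin/stdout.

```python
import subprocess
from typing import List, Optional

# The position from the commands above, copied as posted.
FEN = "r7/1b1r4/k1p1p1p1/1p1pPpPp/p1PP1P1P/PP1K4/8/4Q3 w - -"


def uci_commands(fen: str, movetime_ms: int) -> List[str]:
    """Build the UCI command sequence for searching one setup position."""
    return [
        "uci",
        "isready",
        f"position fen {fen}",
        f"go movetime {movetime_ms}",
    ]


def best_move(engine_path: str, fen: str, movetime_ms: int) -> Optional[str]:
    """Start the engine, search the position, return its reported best move."""
    proc = subprocess.Popen([engine_path], stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, text=True)
    try:
        for command in uci_commands(fen, movetime_ms):
            proc.stdin.write(command + "\n")
        proc.stdin.flush()
        for line in proc.stdout:
            if line.startswith("bestmove"):
                return line.split()[1]
        return None  # engine exited without reporting a move (e.g. a crash)
    finally:
        proc.kill()
        proc.wait()


print("\n".join(uci_commands(FEN, 2000)))
```

Calling `best_move("lc0", FEN, 2000)` (the path is a placeholder) would return `None` if the engine dies on the setup position, which is one way to reproduce the crash outside any GUI.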

I am using Polyglot 2.0.3 with its EPD test function.

My batch file looks like this:

polyglot.exe polyglot.ini epd-test -epd Openings200beta7.epd -min-time 1 -max-time 2 -depth-delta 1
pause

It says that 1.00s is the minimum time at which the correct solution must be present, and if the solution doesn't appear by 2.00s, the engine has failed to find it. If I set -min-time 0.01, it will skew the test: even at -depth-delta 3, engines on this positional test suite might stick to the correct solution at low depths for 3 successive depths, only to change their mind towards the upper limit of 2.00s, but Polyglot will accept the position as solved anyway and not explore further depths. Also, one has to be careful using a high -depth-delta with both regular engines and Lc0, as depth means different things for them; it is best to use 1 or 2 for -depth-delta and restrict the time window instead.
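The skew from a tiny -min-time can be illustrated with a toy model. This is my own simplification of the acceptance rule as described above, not Polyglot's source code; the function `epd_solved` and the iteration data are invented for the example.

```python
from typing import List, Optional, Tuple

# (depth, elapsed seconds, best move), one entry per completed iteration.
Iteration = Tuple[int, float, str]


def epd_solved(iterations: List[Iteration], solution: str,
               min_time: float, max_time: float, depth_delta: int) -> bool:
    """Toy acceptance rule: stop and accept as soon as the solution has been
    held for `depth_delta` further depths and `min_time` has elapsed."""
    first_depth: Optional[int] = None  # depth where the current run started
    for depth, elapsed, move in iterations:
        if elapsed > max_time:
            break  # out of time: not solved
        if move == solution:
            if first_depth is None:
                first_depth = depth
            if depth - first_depth >= depth_delta and elapsed >= min_time:
                return True  # accepted early, later iterations never examined
        else:
            first_depth = None  # engine changed its mind, restart the run
    return False


# Invented search trace: the engine likes d2d4 early, then switches to c2c4.
trace = [(1, 0.001, "e2e4"), (2, 0.003, "d2d4"), (3, 0.006, "d2d4"),
         (4, 0.012, "d2d4"), (5, 0.03, "d2d4"), (6, 0.08, "d2d4"),
         (7, 0.2, "d2d4"), (8, 0.5, "d2d4"), (9, 1.1, "c2c4"),
         (10, 1.9, "c2c4")]

loose = epd_solved(trace, "d2d4", min_time=0.01, max_time=2.0, depth_delta=3)
strict = epd_solved(trace, "d2d4", min_time=1.0, max_time=2.0, depth_delta=1)
print("min-time 0.01, depth-delta 3:", loose)
print("min-time 1.00, depth-delta 1:", strict)
```

In this invented trace the loose setting accepts the position after a few milliseconds, while the restricted time window rejects it because the engine no longer plays the solution between 1s and 2s.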

My polyglot.ini looks like this:

[PolyGlot]
EngineCommand=lc0_v201
EngineDir=.
Log=false
LogFile=polyglot.log
[Engine]
WeightsFile=F:\Users\Kai\Weights\weights_run2_32458.pb.gz
Backend=cudnn-fp16
MinibatchSize=512
NNCacheSize=2000000
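For anyone not using Polyglot: as far as I understand, the entries in the [Engine] section are passed to the engine as UCI options. A small Python sketch, with a made-up helper `engine_setoptions`, shows what the equivalent `setoption` lines would presumably look like when driving Lc0 directly.

```python
import configparser
from typing import List

# The polyglot.ini shown above, embedded as text for the example.
INI_TEXT = r"""
[PolyGlot]
EngineCommand=lc0_v201
EngineDir=.
Log=false
LogFile=polyglot.log
[Engine]
WeightsFile=F:\Users\Kai\Weights\weights_run2_32458.pb.gz
Backend=cudnn-fp16
MinibatchSize=512
NNCacheSize=2000000
"""


def engine_setoptions(ini_text: str) -> List[str]:
    """Turn the [Engine] section into UCI setoption command strings."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names case-sensitive
    parser.read_string(ini_text)
    return [f"setoption name {name} value {value}"
            for name, value in parser["Engine"].items()]


for line in engine_setoptions(INI_TEXT):
    print(line)
```

These lines would go to the engine after `uci` and before `position`, if one wants to reproduce the same Lc0 settings from the command line.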

Uri Blass · Post by **Uri Blass** » Tue Jan 08, 2019 7:08 pm

Lion wrote: ↑Tue Jan 08, 2019 10:19 am This doesn't surprise me, although it's good to have some data showing it.

As you said, it's tactically that Lc0 has some weakness, which gives it a slightly human-like behaviour: it can lose to SF dev in 25 moves to a tactical combination, and the following game shows a positional masterpiece against SF dev.

It's a welcome new era of computer chess...

rgds

Humans are weaker both tactically and positionally relative to chess engines.
If Lc0's weaknesses in tactics give it a slightly human-like behaviour, then Stockfish's weakness in positional play also gives it a slightly human-like behaviour.

Laskos · Post by **Laskos** » Tue Jan 08, 2019 7:12 pm

Kanizsa wrote: ↑Tue Jan 08, 2019 1:13 pm Great Kai,
and congratulations on this work and its proof.

The idea of judging chess programs from the earliest opening moves is not new. Larry Kaufman came up with it back in the 90s, and it appeared in a first computer chess test suite in the Computer Chess Report review (if I remember correctly). Only a few positions appeared there, maybe twenty. I remember that one of them was chosen to get programs to play the move "a2-a4" in response to "a7-a6" in the Benoni opening.
At that time "a2-a4" was a very difficult move that no program was able to play, not Mchess, Genius, or ChessMachine.

I will check whether I can find similar themes in your suite about limiting pawn expansion on the queenside.

Thank you again for your studies.

It's already a 2-year-old suite. I just wanted to see whether, when tested correctly, it really is that positional, as I had checked with the STS suite and the results were disappointing. STS is over-analyzed with engines, and it's basically adapted to obey the regular-engine paradigm. It might say something about regular engines, but it doesn't say anything about Leela. Two years ago I used engines with my suite just to check that there are no tactics involved; many options come out as almost equal for engines, although databases of human games show clear preferences and outcome statistics. And now I see that Leela dominates this suite by a wide margin, if tested properly (and in my CPU/GPU conditions). I was pretty glad to discover this, as we all have this "feeling" that Leela is very strong positionally, but I wanted to quantify this superiority.

I don't think you will find any deep meaning in my positions, like the one you gave as an example; they are just intricate positional options fairly far into the openings.

mbabigian · Post by **mbabigian** » Wed Jan 09, 2019 6:31 am

I don't believe more training will substantially improve tactical strength. It appears the search technique used is plain weak. I theorize that some improved search method will do more to add Elo than better training can at this point. Perhaps a hybrid MCTS/AB search like Mark and Larry have tried will work better, but I don't believe Lc0's weak tactics can be solved via smarter networks. New search methods should be tried.

My two cents.

M ANSARI · Post by **M ANSARI** » Wed Jan 09, 2019 8:23 am

Or simply more powerful hardware. I also believe that for now the best bet for Lc0 to improve is to have an AB backup for deep search when there are a lot of tactics in the position. This is actually pretty amazing and reminds me of the old days when computer engines were just getting strong and humans still had good success against them. You would be killing the engine positionally ... and then one tiny tactical oversight and you are lost. This is very similar, but multiplied many times over. There might be a breakthrough that fixes this weakness in Lc0 without using AB search, and it might simply be that hardware becomes so strong that the search is deep enough to overcome any tactical tricks. But for now a quick solution is needed, and that solution might simply be to add a good AB search hybrid. This would seem to make a lot of sense, as Lc0 uses the GPU and thus the CPU is relatively available for use ... at least that is my understanding of how Lc0 works.
