Lc0 with latest test30 nets is vastly superior positionally

Kanizsa · Post by **Kanizsa** » Wed Jan 09, 2019 2:19 pm

Larry Kaufman chose a position similar to this: it is a very difficult task for a program to find a2-a4, even by today's standards.
Only Komodo finds it; neither Fritz 11 nor Stockfish does.

Kanizsa · Post by **Kanizsa** » Wed Jan 09, 2019 2:32 pm

b2 b3.JPG

Another position that I suggest adding is this one. It is very, very difficult for a program to find b3, in order to reply to Nc5 with Rb1! and b4 (what does Leela play?)

mbabigian · Post by **mbabigian** » Wed Jan 09, 2019 5:40 pm

More powerful hardware will mask over some of the search problem; however, as anyone that has let lc0 think to millions of nodes will tell you, it is not enough.

AB may not be necessary but a vastly improved MCTS IS at a minimum. In the past decades nearly all elo was gained through breakthroughs in search and pruning. It is time to put that vast treasure trove of knowledge to work for NN's. When we do, the tactical weakness will disappear.

That's not to say they won't beat SF with a tactically weak version, but the weakness needs fixing regardless.

yanquis1972 · Post by **yanquis1972** » Wed Jan 09, 2019 7:09 pm

mbabigian wrote: ↑Wed Jan 09, 2019 6:31 am I don't believe more training will substantially improve tactical strength. It appears the search technique used is plain weak. I theorize that some improved search method will do more to add elo than better training can at this point. Perhaps a hybrid MCTS, AB search like Mark and Larry have tried will work better, but I don't believe LC0's weak tactics can be solved via smarter networks. New search methods should be tried.

My two cents.

absolutely, but that goes for everything. it's only now that the leela team is able to replicate deepmind's parameters; everything before was guesswork in a lot of places. and imo the current approach should be enough to equal or surpass SF10, but test40 should reveal whether or not that's the case.

as i understand it albert silver is doing a lot of experimentation re non-zero approaches for deusX, which is another vast area to explore.

one thing i haven't understood is why (& i may be wrong) tactics have peaked before the first LR drop, & iirc actually regressed after. if that's the case it seems like it should be a solvable & maybe reversible issue.

Jouni · Post by **Jouni** » Sat Jan 12, 2019 5:49 pm

I am sceptical of all "positional" test suites, and I removed STS from my test suites! Reason: Houdini 6 was still the best, and that suite can't detect any progress from SF8 -> SF9 -> SF10, so it is quite useless. BTW, in Kai's 200-position set with a 3 s limit I got: Houdini 6 109, Komodo 12.3 103 and SF10 98. Houdini is a real positional master. Maybe there is no such thing as positional play at all; only the score counts.

Laskos · Post by **Laskos** » Sun Jan 13, 2019 2:10 pm

Jouni wrote: ↑Sat Jan 12, 2019 5:49 pm I am sceptical of all "positional" test suites, and I removed STS from my test suites! Reason: Houdini 6 was still the best, and that suite can't detect any progress from SF8 -> SF9 -> SF10, so it is quite useless. BTW, in Kai's 200-position set with a 3 s limit I got: Houdini 6 109, Komodo 12.3 103 and SF10 98. Houdini is a real positional master. Maybe there is no such thing as positional play at all; only the score counts.

You have a pretty large variance in the test: engines, especially on many threads, often switch their best move on this positional test suite. Also, I recommended using the solution found between time/2 and time/1 (in Polyglot this is easily doable). I ran the suite five times to get a more reliable picture, and the results are (5 x 200 = 1000 positions):

Lc0 v20.1 ID32458: 712/1000
Houdini 6.03:      558/1000
Komodo 12.3:       556/1000
Stockfish 10:      524/1000
Ethereal 11.0:     457/1000

The standard deviation of the result seems to be about 15, so Houdini and Komodo do seem a bit stronger than SF, but all of them far behind Leela. Observe also that Ethereal is significantly lower than the top 3 regular engines. And again, Leela seems vastly superior to any regular engine on this suite, despite being only on par with top regular engines in games in my conditions (CPU/GPU).
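As a rough cross-check of the "about 15" figure, here is a minimal sketch that treats each of the 1000 results as an independent pass/fail trial (a binomial model). That is an assumption, not necessarily how the figure above was obtained: the same 200 positions were run five times, and all engines face the same positions, so the trials are not truly independent. The function names `binomial_sd` and `z_score` are made up for this illustration.

```python
import math

def binomial_sd(solved, total):
    """SD of the solved count, assuming independent pass/fail positions."""
    p = solved / total
    return math.sqrt(total * p * (1 - p))

def z_score(solved_a, solved_b, total):
    """Score difference in units of its SD, assuming the two runs are independent."""
    sd_a = binomial_sd(solved_a, total)
    sd_b = binomial_sd(solved_b, total)
    return (solved_a - solved_b) / math.sqrt(sd_a ** 2 + sd_b ** 2)

TOTAL = 1000
# Scores copied from the table above.
scores = {
    "Lc0 v20.1 ID32458": 712,
    "Houdini 6.03": 558,
    "Komodo 12.3": 556,
    "Stockfish 10": 524,
    "Ethereal 11.0": 457,
}

for name, solved in scores.items():
    print(f"{name:18s} sd = {binomial_sd(solved, TOTAL):.1f}")

print(f"Houdini vs SF:  z = {z_score(558, 524, TOTAL):.2f}")
print(f"Lc0 vs Houdini: z = {z_score(712, 558, TOTAL):.2f}")
```

Under this model the per-engine standard deviations come out at roughly 14 to 16, in line with the estimate above. The Houdini/Stockfish gap is on the order of one and a half combined standard deviations, while the Lc0/Houdini gap is several times larger.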

corres · Post by **corres** » Sun Jan 13, 2019 3:23 pm

Laskos wrote: ↑Sun Jan 13, 2019 2:10 pm ...
I used the suite five times, to have a more reliable picture, and the results are (5 x 200 = 1000 positions)
Lc0 v20.1 ID32458: 712/1000
Houdini 6.03:      558/1000
Komodo 12.3:       556/1000
Stockfish 10:      524/1000
Ethereal 11.0:     457/1000
The standard deviation of the result seems to be about 15, so Houdini and Komodo do seem a bit stronger than SF, but all of them far behind Leela. Observe also that Ethereal is significantly lower than the top 3 regular engines. And again, Leela seems vastly superior to any regular engine on this suite, despite being only on par with top regular engines in games in my conditions (CPU/GPU).

Your results confirm the earlier experience that Leela is stronger in positional play than the top AB engines, and
this is the behavior that compensates for its weakness in tactical/endgame play.
But the main question is how far Leela can go relative to AB engines with this unbalanced chess knowledge.

mbabigian · Post by **mbabigian** » Sun Jan 13, 2019 4:58 pm

Even though nodes are counted differently per program, it would be interesting to see a fixed node count test done on tactical test suites. I think it would be just as illuminating. As node counts double I'd expect AB engines to tactically improve faster than LC0.

If this is true, smarter networks will be held back by the weak search until the problem is taken seriously.

I'd also be curious to see which approach solves more problems at similar node counts (despite the difficulties of comparing node counts between programs).
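A minimal sketch of what such a fixed-node sweep could look like. Everything here is hypothetical: `best_move` stands in for whatever wrapper drives a real engine with a node limit, the suite is a list of (position, solution) pairs, and the toy engine at the bottom only exists to make the example runnable. It says nothing about how Lc0 or any AB engine actually behaves.

```python
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical interface: (position, node budget) -> move chosen by the engine.
BestMoveFn = Callable[[str, int], str]

def solved_by_budget(best_move: BestMoveFn,
                     suite: Iterable[Tuple[str, str]],
                     start_nodes: int,
                     doublings: int) -> Dict[int, int]:
    """Count solved positions at start_nodes, 2x, 4x, ... node budgets."""
    positions = list(suite)
    results: Dict[int, int] = {}
    nodes = start_nodes
    for _ in range(doublings + 1):
        results[nodes] = sum(
            1 for position, solution in positions
            if best_move(position, nodes) == solution
        )
        nodes *= 2
    return results

# Toy stand-in for an engine: each made-up position is "solved" once the
# budget reaches a made-up threshold. These are placeholders, not real FENs.
THRESHOLDS = {"pos1": 1_000, "pos2": 4_000, "pos3": 50_000}
SUITE = [("pos1", "a4"), ("pos2", "b3"), ("pos3", "Rb1")]

def toy_engine(position: str, nodes: int) -> str:
    solution = dict(SUITE)[position]
    return solution if nodes >= THRESHOLDS[position] else "other"

sweep = solved_by_budget(toy_engine, SUITE, start_nodes=1_000, doublings=3)
for nodes, solved in sweep.items():
    print(f"{nodes:>6} nodes: {solved}/{len(SUITE)} solved")
```

Running the same sweep for two engines and comparing how the solved count grows per doubling would give the comparison described above, with the caveat already noted that node counts are not directly comparable between programs.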
