
Re: Official: Lc0 is the strongest engine :)

Posted: Sun Oct 14, 2018 1:20 pm
by Damir
If you mean Komodo MCTS, then yes, it will soon overtake the normal Komodo version, as Mark and Larry are now concentrating solely on improving the MCTS version of Komodo. Regarding Lc0 overtaking Komodo, you are asking about the wrong engine… :) :) :D

Re: Official: Lc0 is the strongest engine :)

Posted: Sun Oct 14, 2018 2:20 pm
by Laskos
lkaufman wrote: Sun Oct 14, 2018 1:50 am You inspired me to run a test to see if Komodo MCTS also plays middlegames relatively stronger than endgames. Current Komodo MCTS (dev) on one thread now can beat normal Komodo 9.1 (by 31 elo in my test). I then reran the test starting with positions with the queens already exchanged. The result was almost the same (+27 elo). So the endgame dropoff is unique to Lc0, which suggests that the cause is the neural network, rather than the MCTS search. It seems to me that the strength of Lc0 is not because the neural network evals are all that good, but is because they are much faster than a similar quality short normal search if they are run on a good GPU. It's as if the fast GPU makes the NN act like a rather poor evaluator with a normal search running at super-speed. Maybe I'm wrong, but that's the way it looks to me. If someone finds a way to use a good GPU effectively with a more normal eval, that might be the "holy grail" of computer chess.
I don't think so. First, Leela's NN eval is very bulky and slow, and "black-box-like" with its many layered convolutions. We don't know what it is doing, and we never will with such a complex net. I had past experience with images in Matlab (10 years ago or so), and after just two 3x3 convolutions of a simple image with a filter, it becomes a scramble. De-convolution only works to some degree, and only when the precise form of the convolutions is known. "Blind de-convolution" works poorly even with a simple image and one filter. Imagine a chess board representation under such layers of convolutions. Yes, this thing is compute-hungry; among CPU, GPU, and FPGA, the question is what works better, and the CPU fares badly here. But I would say the opposite: this "black-box" slow eval is very strong, and can even account (strength-wise) for a regular strong eval plus 2-3 plies of AB search, which is amazing. It is used in the policy and value heads during its very slow MCTS search, VERY much slower than the Komodo MCTS search (hundreds of times, maybe one to two thousand on CPU). And the MCTS search itself in Komodo is AFAIK more involved than the Lc0 MCTS search.

The specific endgame problem IMO has two origins:
1/ Leela is indeed a bit weak here compared to the Leela opening and midgame, probably due to the lower performance importance, fewer visits, and the peculiarities of each discrete endgame, which often doesn't follow patterns that are easy for a NN.
2/ Strong traditional engines are VERY strong in endgames due to their hand-crafted knowledge (and very deep AB search), which is used in endgames to a much higher degree than in openings and midgames. They are "unnaturally" strong in endgames compared to openings and midgames from the point of view of a NN-based engine, and probably of a strong human.
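Laskos's point about convolutions scrambling an image can be seen in a few lines. The sketch below is my own illustration, not anything from Leela's actual network: the 8x8 "board", the random kernels, and the naive `conv2d` helper are all hypothetical. After just two 3x3 convolutions, the crisp structure of the input is smeared across the whole array.

```python
import random

def conv2d(x, k):
    """Naive 'valid' 2D convolution on nested lists, enough for the demo."""
    kh, kw = len(k), len(k[0])
    h, w = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[sum(x[i + a][j + b] * k[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(w)]
            for i in range(h)]

rng = random.Random(0)
# A simple 8x8 "image": a bright square on a dark background.
img = [[1.0 if 2 <= i < 6 and 2 <= j < 6 else 0.0 for j in range(8)]
       for i in range(8)]
# Two arbitrary 3x3 filters, standing in for learned conv kernels.
k1 = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(3)]
k2 = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(3)]

out = conv2d(conv2d(img, k1), k2)   # 8x8 -> 6x6 -> 4x4
for row in out:
    # the sharp square is gone; values are spread over the whole output
    print([round(v, 2) for v in row])
```

Recovering `img` from `out` without knowing `k1` and `k2` exactly is the "blind de-convolution" problem Laskos describes, and it only gets worse with the dozens of stacked filter layers in a real net.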

Re: Official: Lc0 is the strongest engine :)

Posted: Sun Oct 14, 2018 3:47 pm
by lkaufman
smatovic wrote: Sun Oct 14, 2018 10:19 am
lkaufman wrote: Sun Oct 14, 2018 1:50 am It seems to me that the strength of Lc0 is not because the neural network evals are all that good, but is because they are much faster than a similar quality short normal search if they are run on a good GPU. It's as if the fast GPU makes the NN act like a rather poor evaluator with a normal search running at super-speed. Maybe I'm wrong, but that's the way it looks to me.
Hm, afaik LC0 makes only about 10 knps to 40 knps on a top GPU with 2 threads on the host.

So it should be the other way around: because of the good NN evaluation, LC0 needs fewer nodes to search.
lkaufman wrote: Sun Oct 14, 2018 1:50 am If someone finds a way to use a good GPU effectively with a more normal eval, that might be the "holy grail" of computer chess.
I agree, there is a lot of horse power present in GPUs,
unfortunately there are some limitations like SIMT architecture and memory hierarchy.

--
Srdja
You cannot compare NPS with MCTS to alpha-beta. If we make Komodo MCTS do searches of similar quality to the NN, it will get a few kn/sec: more than Lc0 on the same CPU, but much less than Lc0 with a good GPU.
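The incomparability of raw NPS can be put as a back-of-envelope calculation. The numbers below are assumptions for illustration only: the 40 knps figure is smatovic's upper estimate from above, the ~1,500x factor sits inside Laskos's "hundreds, maybe one to two thousand" estimate of how much AB work one NN eval is worth, and the AB engine's NPS is a made-up round number.

```python
# Hedged sketch: why raw nodes/second cannot be compared across search
# types. One Lc0 "node" embeds a full NN evaluation, which (per the thread)
# is roughly worth a static eval plus 2-3 plies of AB search; an AB "node"
# is just one cheap static eval.
lc0_gpu_nps = 40_000            # smatovic's upper figure for a top GPU
ab_work_per_nn_eval = 1_500     # assumed AB-node equivalents per NN eval
ab_engine_nps = 50_000_000      # assumed NPS of a strong AB engine

lc0_effective = lc0_gpu_nps * ab_work_per_nn_eval
print(f"Lc0, in AB-node equivalents/s: {lc0_effective:,}")
print(f"AB engine, nodes/s:            {ab_engine_nps:,}")
# Despite a ~1000x raw-NPS gap, the effective throughputs land in the
# same ballpark, which is why the raw comparison is meaningless.
```

On a CPU, `lc0_gpu_nps` would drop by a couple of orders of magnitude while the AB engine's NPS would not, which matches the observation that Lc0 is only a threat with a good GPU.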

Re: Official: Lc0 is the strongest engine :)

Posted: Sun Oct 14, 2018 3:53 pm
by lkaufman
Damir wrote: Sun Oct 14, 2018 1:20 pm If you mean Komodo MCTS, then yes, it will soon overtake the normal Komodo version, as Mark and Larry are now concentrating solely on improving the MCTS version of Komodo. Regarding Lc0 overtaking Komodo, you are asking about the wrong engine… :) :) :D
That's a pretty good reply. We haven't given up on normal Komodo, but I do expect the MCTS version to pass it before too long. Lc0 is no threat to Komodo on cheap hardware, but with a good GPU it will probably become the number one engine, unless someone else (Komodo, I hope) finds a good use for the GPU.

Re: Official: Lc0 is the strongest engine :)

Posted: Sun Oct 14, 2018 4:01 pm
by lkaufman
jp wrote: Sun Oct 14, 2018 1:18 pm
duncan wrote: Sun Oct 14, 2018 12:45 pm
lkaufman wrote: Sun Oct 14, 2018 1:50 am
A bit off topic, but do you think there is any risk that Lc0 will overtake Komodo in the next year?
But you are in the best position to know.
On the same off topic, why is Komodo playing so badly on chess.com's CCCC right now? Is it a buggy version?
The problem seems to be that Komodo takes a few minutes to reach full speed on giant hardware with large hash tables. In TCEC this only costs a few elo, but in a blitz game it is fatal. So it's not a buggy version, it's just an unresolved problem on expensive hardware. Note that the MCTS version of Komodo, which doesn't seem to have this problem, is scoring almost as well as normal Komodo in this event, DESPITE USING ONLY 8 THREADS!! Once we solve the thread limit (and we're close), Komodo MCTS may actually be our strongest engine under these CCC conditions.

Re: Official: Lc0 is the strongest engine :)

Posted: Sun Oct 14, 2018 4:06 pm
by lkaufman
Laskos wrote: Sun Oct 14, 2018 2:20 pm
lkaufman wrote: Sun Oct 14, 2018 1:50 am You inspired me to run a test to see if Komodo MCTS also plays middlegames relatively stronger than endgames. Current Komodo MCTS (dev) on one thread now can beat normal Komodo 9.1 (by 31 elo in my test). I then reran the test starting with positions with the queens already exchanged. The result was almost the same (+27 elo). So the endgame dropoff is unique to Lc0, which suggests that the cause is the neural network, rather than the MCTS search. It seems to me that the strength of Lc0 is not because the neural network evals are all that good, but is because they are much faster than a similar quality short normal search if they are run on a good GPU. It's as if the fast GPU makes the NN act like a rather poor evaluator with a normal search running at super-speed. Maybe I'm wrong, but that's the way it looks to me. If someone finds a way to use a good GPU effectively with a more normal eval, that might be the "holy grail" of computer chess.
I don't think so. First, Leela's NN eval is very bulky and slow, and "black-box-like" with its many layered convolutions. We don't know what it is doing, and we never will with such a complex net. I had past experience with images in Matlab (10 years ago or so), and after just two 3x3 convolutions of a simple image with a filter, it becomes a scramble. De-convolution only works to some degree, and only when the precise form of the convolutions is known. "Blind de-convolution" works poorly even with a simple image and one filter. Imagine a chess board representation under such layers of convolutions. Yes, this thing is compute-hungry; among CPU, GPU, and FPGA, the question is what works better, and the CPU fares badly here. But I would say the opposite: this "black-box" slow eval is very strong, and can even account (strength-wise) for a regular strong eval plus 2-3 plies of AB search, which is amazing. It is used in the policy and value heads during its very slow MCTS search, VERY much slower than the Komodo MCTS search (hundreds of times, maybe one to two thousand on CPU). And the MCTS search itself in Komodo is AFAIK more involved than the Lc0 MCTS search.

The specific endgame problem IMO has two origins:
1/ Leela is indeed a bit weak here compared to the Leela opening and midgame, probably due to the lower performance importance, fewer visits, and the peculiarities of each discrete endgame, which often doesn't follow patterns that are easy for a NN.
2/ Strong traditional engines are VERY strong in endgames due to their hand-crafted knowledge (and very deep AB search), which is used in endgames to a much higher degree than in openings and midgames. They are "unnaturally" strong in endgames compared to openings and midgames from the point of view of a NN-based engine, and probably of a strong human.
I agree with your comments; we also came to the same conclusion, that NN = strong eval + 2-3 plies of AB search. I just look at it differently: if the GPU can do this much faster than a CPU can do a 3-ply search, that's why it is stronger. But there are many positions (especially, but not only, in the endgame) where the NN can look pretty dumb compared to a 3-ply search.
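The "strong eval + 2-3 plies of AB search" baseline that the NN is being compared to can be sketched generically. This is my own minimal negamax with alpha-beta pruning over a hypothetical game interface; the `moves`/`apply`/`evaluate` callables and the toy demo are assumptions for illustration, not any engine's actual code.

```python
def alphabeta(state, depth, alpha, beta, moves, apply, evaluate):
    """Fail-soft negamax with alpha-beta pruning; evaluate() scores the
    position from the side to move's point of view."""
    ms = list(moves(state))
    if depth == 0 or not ms:
        return evaluate(state)              # static eval at the horizon
    best = -float("inf")
    for m in ms:
        # negamax: the opponent's best score, negated
        score = -alphabeta(apply(state, m), depth - 1, -beta, -alpha,
                           moves, apply, evaluate)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:                   # cutoff: refutation found
            break
    return best

# Toy demo: the "position" is an integer, each side adds 1 or 3 per move,
# and the eval is simply the running total (from the mover's perspective).
val = alphabeta(0, 3, -float("inf"), float("inf"),
                moves=lambda s: (1, 3),
                apply=lambda s, m: s + m,
                evaluate=lambda s: s)
print(val)
```

A depth-3 call like this, wrapped around a hand-crafted evaluation, is the rough strength target the thread attributes to a single NN evaluation; the GPU's advantage is producing such evaluations far faster than a CPU can run the equivalent shallow search.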

Re: Official: Lc0 is the strongest engine :)

Posted: Sun Oct 14, 2018 10:35 pm
by Werewolf
lkaufman wrote: Sun Oct 14, 2018 4:06 pm
Laskos wrote: Sun Oct 14, 2018 2:20 pm
lkaufman wrote: Sun Oct 14, 2018 1:50 am You inspired me to run a test to see if Komodo MCTS also plays middlegames relatively stronger than endgames. Current Komodo MCTS (dev) on one thread now can beat normal Komodo 9.1 (by 31 elo in my test). I then reran the test starting with positions with the queens already exchanged. The result was almost the same (+27 elo). So the endgame dropoff is unique to Lc0, which suggests that the cause is the neural network, rather than the MCTS search. It seems to me that the strength of Lc0 is not because the neural network evals are all that good, but is because they are much faster than a similar quality short normal search if they are run on a good GPU. It's as if the fast GPU makes the NN act like a rather poor evaluator with a normal search running at super-speed. Maybe I'm wrong, but that's the way it looks to me. If someone finds a way to use a good GPU effectively with a more normal eval, that might be the "holy grail" of computer chess.
I don't think so. First, Leela's NN eval is very bulky and slow, and "black-box-like" with its many layered convolutions. We don't know what it is doing, and we never will with such a complex net. I had past experience with images in Matlab (10 years ago or so), and after just two 3x3 convolutions of a simple image with a filter, it becomes a scramble. De-convolution only works to some degree, and only when the precise form of the convolutions is known. "Blind de-convolution" works poorly even with a simple image and one filter. Imagine a chess board representation under such layers of convolutions. Yes, this thing is compute-hungry; among CPU, GPU, and FPGA, the question is what works better, and the CPU fares badly here. But I would say the opposite: this "black-box" slow eval is very strong, and can even account (strength-wise) for a regular strong eval plus 2-3 plies of AB search, which is amazing. It is used in the policy and value heads during its very slow MCTS search, VERY much slower than the Komodo MCTS search (hundreds of times, maybe one to two thousand on CPU). And the MCTS search itself in Komodo is AFAIK more involved than the Lc0 MCTS search.

The specific endgame problem IMO has two origins:
1/ Leela is indeed a bit weak here compared to the Leela opening and midgame, probably due to the lower performance importance, fewer visits, and the peculiarities of each discrete endgame, which often doesn't follow patterns that are easy for a NN.
2/ Strong traditional engines are VERY strong in endgames due to their hand-crafted knowledge (and very deep AB search), which is used in endgames to a much higher degree than in openings and midgames. They are "unnaturally" strong in endgames compared to openings and midgames from the point of view of a NN-based engine, and probably of a strong human.
I agree with your comments; we also came to the same conclusion, that NN = strong eval + 2-3 plies of AB search. I just look at it differently: if the GPU can do this much faster than a CPU can do a 3-ply search, that's why it is stronger. But there are many positions (especially, but not only, in the endgame) where the NN can look pretty dumb compared to a 3-ply search.
Another thing just to toss in there is the progress of hardware. Some people care a lot about performance per watt or performance per $. But for me, the big thing is just what's available and how fast it is.

GPUs have been improving much faster than CPUs. If this continues, it'll get better and better for the GPU engines...

Re: Official: Lc0 is the strongest engine :)

Posted: Sun Oct 14, 2018 10:55 pm
by Milos
Werewolf wrote: Sun Oct 14, 2018 10:35 pm GPUs have been improving much faster than CPUs. If this continues, it'll get better and better for the GPU engines...
Ofc they were improving much faster when the 9xx series was on 28nm, while Intel has already been on 14nm for 4 generations (10nm Cannon Lake is a total failure, since Intel can't make a reliable and commercially viable 10nm CPU process, and TSMC can't either).
Now that the 20xx series has reached 12nm, you are going to witness the same thing that happened to Intel 4 generations ago. Stagnation...

Re: Official: Lc0 is the strongest engine :)

Posted: Mon Oct 15, 2018 12:22 pm
by Werewolf
Milos wrote: Sun Oct 14, 2018 10:55 pm
Werewolf wrote: Sun Oct 14, 2018 10:35 pm GPUs have been improving much faster than CPUs. If this continues, it'll get better and better for the GPU engines...
Ofc they were improving much faster when the 9xx series was on 28nm, while Intel has already been on 14nm for 4 generations (10nm Cannon Lake is a total failure, since Intel can't make a reliable and commercially viable 10nm CPU process, and TSMC can't either).
Now that the 20xx series has reached 12nm, you are going to witness the same thing that happened to Intel 4 generations ago. Stagnation...
The performance improvement from Pascal (16nm) to Turing (12nm) was about 15% IIRC. Probably the next shrink will yield less.

But with cards they seem to be freer than with CPUs to try stuff if it's AI-related. So for Leela the change from Pascal to Turing is HUGE. If, for example, Nvidia produced a card which did away with CUDA cores and focused exclusively on things related to AI, we could see at least one more decent jump up, IMO.

Re: Official: Lc0 is the strongest engine :)

Posted: Mon Oct 15, 2018 10:32 pm
by Milos
Werewolf wrote: Mon Oct 15, 2018 12:22 pm
Milos wrote: Sun Oct 14, 2018 10:55 pm
Werewolf wrote: Sun Oct 14, 2018 10:35 pm GPUs have been improving much faster than CPUs. If this continues, it'll get better and better for the GPU engines...
Ofc they were improving much faster when the 9xx series was on 28nm, while Intel has already been on 14nm for 4 generations (10nm Cannon Lake is a total failure, since Intel can't make a reliable and commercially viable 10nm CPU process, and TSMC can't either).
Now that the 20xx series has reached 12nm, you are going to witness the same thing that happened to Intel 4 generations ago. Stagnation...
The performance improvement from Pascal (16nm) to Turing (12nm) was about 15% IIRC. Probably the next shrink will yield less.

But with cards they seem to be freer than with CPUs to try stuff if it's AI-related. So for Leela the change from Pascal to Turing is HUGE. If, for example, Nvidia produced a card which did away with CUDA cores and focused exclusively on things related to AI, we could see at least one more decent jump up, IMO.
There is even less than those 15% between the Pascal and Turing architectures, thanks to scaling. Tensor cores are not very useful, and the gain from them is minimal, at least for inference. The actual speed-up above those 15% comes only from enabling FP16 in Turing, which was intentionally crippled in Pascal and earlier architectures.
But this is a one-trick pony. There is almost nothing more that Nvidia can bring architecturally in hardware. And as things look now, we will wait at least 3 more years for a working 10nm node.
I know people are dreamers, but in this case this is totally unsound.