The next step for LC0?


chrisw
Posts: 4319
Joined: Tue Apr 03, 2012 4:28 pm

Re: The next step for LC0?

Post by chrisw »

smatovic wrote: Fri Aug 28, 2020 5:35 pm
chrisw wrote: Fri Aug 28, 2020 4:48 pm
smatovic wrote: Fri Aug 28, 2020 3:28 pm
chrisw wrote: Fri Aug 28, 2020 2:57 pm
smatovic wrote: Fri Aug 28, 2020 1:16 pm I know, LC0's primary goal was an open source adaptation of A0, and I am not
into the Discord development discussions and the like, anyway, my 2 cents on this:

- MCTS-PUCT search is a descendant of AlphaGo's search, generalized to be applied
to Go, Shogi and Chess; it can utilize a GPU via batches, but it has its known
weaknesses: tactics in the form of successive "shallow traps", and the endgame.

- A CPU AB search will not work with an NN on GPU via batches.

- NNUE makes no sense on GPU.

- LC0 has a GPU-cloud-cluster to play Reinforcement Learning games.

- LC0 already plays at ~2400(?) Elo with a depth 1 search alone.

- It is estimated that the NN eval is worth 4 plies of AB search.

Looking at the above points, it seems pretty obvious what the next step for LC0
could be: drop the weak part, the MCTS-PUCT search, ignore AB and NNUE, and focus
on what LC0 is good at. Increase the plies encoded in the NN, increase the Elo of
the depth 1 eval.

To take it to an extreme, drop the search part completely, increase the CNN size
1000-fold, decrease NPS from ~50K to ~50, and add multiple NNs of increasing size
to be queried stepwise depending on the time control.

Just thinking out loud...

--
Srdja
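For illustration of the "multiple NNs of increasing size queried stepwise for time control" idea, here is a minimal C++ sketch; all names (Net, EvaluateStepwise, ms_per_eval) and numbers are hypothetical, not LC0 code.

Code: Select all

// Minimal sketch (all names hypothetical, not LC0 code) of "multiple NNs of
// increasing size queried stepwise for time control": like iterative
// deepening, but the "depth" is the size of the net evaluated at the root.
#include <chrono>
#include <vector>

struct Net {
    double ms_per_eval;                                            // measured cost of one forward pass
    float Evaluate(/* const Position& */) const { return 0.0f; }   // stub forward pass
};

// Query the nets from smallest to largest; keep the latest eval that still
// fit into the per-move time budget. Assumes `nets` is sorted by size.
float EvaluateStepwise(const std::vector<Net>& nets, double budget_ms) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    float best_eval = 0.0f;
    for (const Net& net : nets) {
        const double elapsed_ms =
            std::chrono::duration<double, std::milli>(clock::now() - start).count();
        if (elapsed_ms + net.ms_per_eval > budget_ms) break;  // next net too slow for the budget
        best_eval = net.Evaluate();                            // bigger net, presumably better eval
    }
    return best_eval;
}

With a depth 1 "search", time management would reduce to deciding how big a net you can afford to query for the current move.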
Nice idea, but why still a CNN? Didn't NNUE show that fully connected works too? What would be useful would be an NN-only chess tournament, maybe with some sort of solid incentive for winning. Currently LC0 would presumably win that hands down, but ...
I have heard CNNs need fewer games for training, but I am not sure how this compares
in chess to MLP/NNUE. If we assume a depth 1 search only, there is no reason not
to use NNUE even on GPU, maybe extended to 30 distinct piece-count-indexed
NNs...

--
Srdja
Hehe, depth 1 NN chess would take them all back to the 1980s.
I didn't look into NNUE training, but I guess the UE bit doesn't refer to training? Also, it's been a while, but isn't training of even sizeable nets via GPU batching running at tens of thousands of positions per second? Maybe I'm being random and out of the loop here.
I mean you need fewer RL games played on the GPU-cloud-cluster to saturate the CNN compared to an MLP; not sure if this is a fact, maybe some NN creator in here can clarify how the LC0 CNN compares to NNUE in terms of games needed and horsepower to train these nets.
chrisw wrote: Fri Aug 28, 2020 4:48 pm If you do ply one matches only, then the incentive for NNUE disappears, surely? The NNUE speed-up is only really useful for search; other than the UE trick, it's just a normal fully connected net with some simple non-linearity built into the inputs, no?
Hmm, not quite sure where the diff between UE and NNUE is,
There isn't such a thing as a UE distinct from NNUE, I was just trying to stress the Efficiently Updatable aspect. NNUE gives the same result as the equivalent NN; the UE trick just allows for faster computation, same eval result, just quicker.

if you perform incremental updates for the NN in the first layer, where most of the weights are, then imo you get the first layer for "free", whether during AB search or during game play doesn't matter, or?

--
Srdja
Yup. Probably we have a crossed-wires argument. I was proposing a ply one (zero search) NN competition. In that case, being able to compute the network in ten microseconds as opposed to ten milliseconds is not important. In a searching, N-ply environment it is important.
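As a side note on the Efficiently Updatable point above, a minimal C++ sketch of the accumulator idea, assuming a plain 64x6x2 piece-square input; the sizes are illustrative and this is not the actual Stockfish/Shogi feature set.

Code: Select all

// Minimal sketch of the "efficiently updatable" first layer: the first-layer
// sums (the accumulator) are stored and patched when pieces move, so the
// result is identical to recomputing the whole layer, just much cheaper.
#include <cstdint>
#include <vector>

constexpr int kInputs = 64 * 6 * 2;   // square x piece type x colour (illustrative)
constexpr int kHidden = 256;          // first-layer width (illustrative)

int16_t first_layer_weights[kInputs][kHidden] = {};   // one column per input feature

struct Accumulator {
    int16_t h[kHidden];   // running sum of the columns of all active features
};

// Full refresh: sum the weight columns of every active feature (done rarely).
void Refresh(Accumulator& acc, const std::vector<int>& active_features) {
    for (int i = 0; i < kHidden; ++i) acc.h[i] = 0;
    for (int f : active_features)
        for (int i = 0; i < kHidden; ++i) acc.h[i] += first_layer_weights[f][i];
}

// Incremental update: a move toggles only a few input features, so the first
// layer costs a handful of column additions/subtractions instead of a full
// matrix-vector product.
void Update(Accumulator& acc, const std::vector<int>& removed,
            const std::vector<int>& added) {
    for (int f : removed)
        for (int i = 0; i < kHidden; ++i) acc.h[i] -= first_layer_weights[f][i];
    for (int f : added)
        for (int i = 0; i < kHidden; ++i) acc.h[i] += first_layer_weights[f][i];
}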
smatovic
Posts: 2658
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: The next step for LC0?

Post by smatovic »

chrisw wrote: Sat Aug 29, 2020 1:23 am
smatovic wrote: Fri Aug 28, 2020 5:35 pm
chrisw wrote: Fri Aug 28, 2020 4:48 pm
smatovic wrote: Fri Aug 28, 2020 3:28 pm
chrisw wrote: Fri Aug 28, 2020 2:57 pm
smatovic wrote: Fri Aug 28, 2020 1:16 pm I know, LC0's primary goal was an open source adaptation of A0, and I am not
into the Discord development discussions and the like, anyway, my 2 cents on this:

- MCTS-PUCT search is a descendant of AlphaGo's search, generalized to be applied
to Go, Shogi and Chess; it can utilize a GPU via batches, but it has its known
weaknesses: tactics in the form of successive "shallow traps", and the endgame.

- A CPU AB search will not work with an NN on GPU via batches.

- NNUE makes no sense on GPU.

- LC0 has a GPU-cloud-cluster to play Reinforcement Learning games.

- LC0 already plays at ~2400(?) Elo with a depth 1 search alone.

- It is estimated that the NN eval is worth 4 plies of AB search.

Looking at the above points, it seems pretty obvious what the next step for LC0
could be: drop the weak part, the MCTS-PUCT search, ignore AB and NNUE, and focus
on what LC0 is good at. Increase the plies encoded in the NN, increase the Elo of
the depth 1 eval.

To take it to an extreme, drop the search part completely, increase the CNN size
1000-fold, decrease NPS from ~50K to ~50, and add multiple NNs of increasing size
to be queried stepwise depending on the time control.

Just thinking out loud...

--
Srdja
Nice idea, but why still a CNN? Didn't NNUE show that fully connected works too? What would be useful would be an NN-only chess tournament, maybe with some sort of solid incentive for winning. Currently LC0 would presumably win that hands down, but ...
I have heard CNNs need fewer games for training, but I am not sure how this compares
in chess to MLP/NNUE. If we assume a depth 1 search only, there is no reason not
to use NNUE even on GPU, maybe extended to 30 distinct piece-count-indexed
NNs...

--
Srdja
Hehe, depth 1 NN chess would take them all back to the 1980s.
I didn't look into NNUE training, but I guess the UE bit doesn't refer to training? Also, it's been a while, but isn't training of even sizeable nets via GPU batching running at tens of thousands of positions per second? Maybe I'm being random and out of the loop here.
I mean you need fewer RL games played on the GPU-cloud-cluster to saturate the CNN compared to an MLP; not sure if this is a fact, maybe some NN creator in here can clarify how the LC0 CNN compares to NNUE in terms of games needed and horsepower to train these nets.
chrisw wrote: Fri Aug 28, 2020 4:48 pm If you do ply one matches only, then the incentive for NNUE disappears, surely? The NNUE speed-up is only really useful for search; other than the UE trick, it's just a normal fully connected net with some simple non-linearity built into the inputs, no?
Hmm, not quite sure where the diff between UE and NNUE is,
There isn't such a thing as a UE distinct from NNUE, I was just trying to stress the Efficiently Updatable aspect. NNUE gives the same result as the equivalent NN; the UE trick just allows for faster computation, same eval result, just quicker.

if you perform incremental updates for the NN in the first layer, where most of the weights are, then imo you get the first layer for "free", whether during AB search or during game play doesn't matter, or?

--
Srdja
Yup. Probably we have a crossed-wires argument. I was proposing a ply one (zero search) NN competition. In that case, being able to compute the network in ten microseconds as opposed to ten milliseconds is not important. In a searching, N-ply environment it is important.
...just fantasizing, maybe there are different kinds of input layers possible with
NNUE, with billions instead of millions of weights in the first layer, which
we get for free via incremental updates...

Like Laskos pointed out, even if new structured/layered NNs offer more Elo gain
per increase in net size, we still need games for training; I guess this is the
bottleneck of such a ply one NN behemoth, the GPU-cloud-cluster to play the RL
games.

--
Srdja
Dann Corbit
Posts: 12541
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: The next step for LC0?

Post by Dann Corbit »

I guess that nothing is optimal.
The NNUE net architecture we use with SF was originally developed for Shogi.

Why would that size be optimal for chess with CPU only?

The net size for LC0 was magically pulled out of a hat. Is it correct?

Soon there will be transparent memory access and the CPU will be able to read HBM.

We are in the trembling toddler stage of neural nets. In ten years, we will laugh at where we are now.
I am pretty sure, in this instance, I am a prophet.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
Raphexon
Posts: 476
Joined: Sun Mar 17, 2019 12:00 pm
Full name: Henk Drost

Re: The next step for LC0?

Post by Raphexon »

Dann Corbit wrote: Sun Aug 30, 2020 10:15 am I guess that nothing is optimal.
The NNUE net architecture we use with SF was originally developed for Shogi.

Why would that size be optimal for chess with CPU only?

The net size for LC0 was magically pulled out of a hat. Is it correct?

Soon there will be transparent memory access and the CPU will be able to read HBM.

We are in the trembling toddler stage of neural nets. In ten years, we will laugh at where we are now.
I am pretty sure, in this instance, I am a prophet.
LC0's architecture is optimized for GPUs.
chrisw
Posts: 4319
Joined: Tue Apr 03, 2012 4:28 pm

Re: The next step for LC0?

Post by chrisw »

smatovic wrote: Sun Aug 30, 2020 8:49 am
chrisw wrote: Sat Aug 29, 2020 1:23 am
smatovic wrote: Fri Aug 28, 2020 5:35 pm
chrisw wrote: Fri Aug 28, 2020 4:48 pm
smatovic wrote: Fri Aug 28, 2020 3:28 pm
chrisw wrote: Fri Aug 28, 2020 2:57 pm
smatovic wrote: Fri Aug 28, 2020 1:16 pm I know, LC0's primary goal was an open source adaptation of A0, and I am not
into the Discord development discussions and the like, anyway, my 2 cents on this:

- MCTS-PUCT search is a descendant of AlphaGo's search, generalized to be applied
to Go, Shogi and Chess; it can utilize a GPU via batches, but it has its known
weaknesses: tactics in the form of successive "shallow traps", and the endgame.

- A CPU AB search will not work with an NN on GPU via batches.

- NNUE makes no sense on GPU.

- LC0 has a GPU-cloud-cluster to play Reinforcement Learning games.

- LC0 already plays at ~2400(?) Elo with a depth 1 search alone.

- It is estimated that the NN eval is worth 4 plies of AB search.

Looking at the above points, it seems pretty obvious what the next step for LC0
could be: drop the weak part, the MCTS-PUCT search, ignore AB and NNUE, and focus
on what LC0 is good at. Increase the plies encoded in the NN, increase the Elo of
the depth 1 eval.

To take it to an extreme, drop the search part completely, increase the CNN size
1000-fold, decrease NPS from ~50K to ~50, and add multiple NNs of increasing size
to be queried stepwise depending on the time control.

Just thinking out loud...

--
Srdja
Nice idea, but why still a CNN? Didn't NNUE show that fully connected works too? What would be useful would be an NN-only chess tournament, maybe with some sort of solid incentive for winning. Currently LC0 would presumably win that hands down, but ...
I have heard CNNs need fewer games for training, but I am not sure how this compares
in chess to MLP/NNUE. If we assume a depth 1 search only, there is no reason not
to use NNUE even on GPU, maybe extended to 30 distinct piece-count-indexed
NNs...

--
Srdja
Hehe, depth 1 NN chess would take them all back to the 1980s.
I didn't look into NNUE training, but I guess the UE bit doesn't refer to training? Also, it's been a while, but isn't training of even sizeable nets via GPU batching running at tens of thousands of positions per second? Maybe I'm being random and out of the loop here.
I mean you need fewer RL games played on the GPU-cloud-cluster to saturate the CNN compared to an MLP; not sure if this is a fact, maybe some NN creator in here can clarify how the LC0 CNN compares to NNUE in terms of games needed and horsepower to train these nets.
chrisw wrote: Fri Aug 28, 2020 4:48 pm If you do ply one matches only, then the incentive for NNUE disappears, surely? The NNUE speed-up is only really useful for search; other than the UE trick, it's just a normal fully connected net with some simple non-linearity built into the inputs, no?
Hmm, not quite sure where the diff between UE and NNUE is,
There isn't such a thing as a UE distinct from NNUE, I was just trying to stress the Efficiently Updatable aspect. NNUE gives the same result as the equivalent NN; the UE trick just allows for faster computation, same eval result, just quicker.

if you perform incremental updates for the NN in the first layer, where most of the weights are, then imo you get the first layer for "free", whether during AB search or during game play doesn't matter, or?

--
Srdja
Yup. Probably we have a crossed-wires argument. I was proposing a ply one (zero search) NN competition. In that case, being able to compute the network in ten microseconds as opposed to ten milliseconds is not important. In a searching, N-ply environment it is important.
...just fantasizing, maybe there are different kinds of input layers possible with
NNUE, with billions instead of millions of weights in the first layer, which
we get for free via incremental updates...

Yes, we can create NNUE variants with way more inputs, both by preprocessing and/or by multiplying pairs of the 64x6x2 raw inputs, or by forming some other functions of them to use as the input layer. But the more inputs created this way, the more pathways into the first layer need to be incrementally updated, so the "free" can start to get expensive.
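A rough back-of-envelope in C++ of why the "free" gets expensive when pairwise input features are added; the numbers are purely illustrative, not any engine's actual feature set.

Code: Select all

// Rough illustration (hypothetical, not any engine's code): forming pairwise
// products of the 64x6x2 raw inputs as extra first-layer features, and why
// the incremental update stops being "free".
#include <cstdio>

int main() {
    const long long raw = 64LL * 6 * 2;            // 768 raw piece-square inputs
    const long long pairs = raw * (raw - 1) / 2;   // ~294k pairwise features
    // A quiet move toggles 2 raw features (from-square off, to-square on),
    // but each raw feature participates in (raw - 1) pairs, so the number of
    // first-layer columns to add/subtract per move grows roughly by that factor.
    const long long raw_updates_per_move = 2;
    const long long pair_updates_per_move = 2 * (raw - 1);
    std::printf("raw features: %lld, pairwise features: %lld\n", raw, pairs);
    std::printf("columns touched per quiet move: %lld raw vs ~%lld with pairs\n",
                raw_updates_per_move, pair_updates_per_move);
    return 0;
}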

Like Laskos pointed out, even if new structured/layered NNs offer more Elo gain
per increase in net size, we still need games for training; I guess this is the
bottleneck of such a ply one NN behemoth, the GPU-cloud-cluster to play the RL
games.

--
Srdja
Yes, it's conventionally a bottleneck because selecting a move in an RL game is done by an NN+search algorithm, i.e. each training position arises from evaluating many nodes (is it still 800 nodes for LC0? I am not in that loop).
But there are two aspects that ameliorate that problem, or even turn it on its head:

1. The RL games could be produced by the NN with no search, so that takes us one to one (one feed-forward per back-propagation).

2. NNUE feed-forward (the cost of RL) is way faster than NNUE back-propagation (the cost of training); well, I'm assuming here the NNUE trick isn't yet optimised for back-propagation.

In practice, one could possibly choose some low N nodes for producing the RL games such that throughput stays continuous. Well, wishful thinking maybe.
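A back-of-envelope C++ sketch of the throughput argument above, with made-up cost figures (50K evals/s per worker, 800 nodes per move) purely for illustration.

Code: Select all

// Back-of-envelope only, with assumed cost numbers, to illustrate the
// "one feed-forward per training position" point: how many self-play
// positions per second one worker could generate at 1 node vs ~800 nodes.
#include <cstdio>

int main() {
    const double evals_per_second = 50000.0;   // assumed NN forward passes/s on one worker
    const double nodes_per_move_mcts = 800.0;  // classic AlphaZero-style visit count
    const double nodes_per_move_ply1 = 1.0;    // the "zero search" proposal

    const double positions_per_s_mcts = evals_per_second / nodes_per_move_mcts;  // ~62/s
    const double positions_per_s_ply1 = evals_per_second / nodes_per_move_ply1;  // ~50000/s

    std::printf("training positions/s at 800 nodes: %.0f\n", positions_per_s_mcts);
    std::printf("training positions/s at 1 node:    %.0f\n", positions_per_s_ply1);
    // Whether the trainer can actually consume positions this fast depends on
    // how expensive back-propagation is relative to the forward pass, which is
    // exactly the open question in the post above.
    return 0;
}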