The next step for LC0?

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

smatovic
Posts: 2658
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

The next step for LC0?

Post by smatovic »

I know LC0's primary goal was an open-source adaptation of A0, and I am not
into the Discord development discussions and the like; anyway, my 2 cents on this:

- MCTS-PUCT search is a descendant of AlphaGo's search, generalized to be applied
to Go, Shogi and Chess; it can utilize a GPU via batches but has its known
weaknesses: tactics in the form of successive "shallow traps", and the endgame
(see the PUCT sketch after this list).

- A CPU AB search will not work with an NN evaluated on the GPU via batches.

- NNUE makes no sense on GPU.

- LC0 has a GPU-cloud-cluster to play Reinforcement Learning games.

- LC0 already plays at ~2400(?) Elo with a depth-1 search alone.

- It is estimated that the NN eval is worth about 4 plies of AB search.
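
For reference, the PUCT selection rule from the A0 paper (which LC0's search follows) picks the child maximizing Q plus a prior-weighted exploration bonus; a minimal Python sketch, with illustrative field names:

Code: Select all

import math
from dataclasses import dataclass, field

@dataclass
class Node:
    prior: float                                  # P(s,a) from the policy head
    visits: int = 0                               # N(s,a)
    value_sum: float = 0.0                        # sum of backed-up values
    children: dict = field(default_factory=dict)  # move -> Node

    def q(self):
        # Mean action value Q(s,a); 0 for unvisited children.
        return self.value_sum / self.visits if self.visits else 0.0

def select_child(node, c_puct=1.5):
    """Pick the move maximizing Q(s,a) + c * P(s,a) * sqrt(N(s)) / (1 + N(s,a))."""
    total_visits = sum(ch.visits for ch in node.children.values())
    return max(
        node.children.items(),
        key=lambda mc: mc[1].q()
        + c_puct * mc[1].prior * math.sqrt(total_visits) / (1 + mc[1].visits),
    )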

Looking at the above points, it seems pretty obvious what the next step for LC0
could be: drop the weak part, the MCTS-PUCT search, ignore AB and NNUE, and focus
on what LC0 is good at. Increase the plies encoded in the NN; increase the Elo of
the depth-1 eval.

To take it to an extreme: drop the search part completely, increase the CNN size
1000-fold, decrease NPS from ~50K to ~50, and add multiple NNs of increasing size,
to be queried stepwise according to the time control.
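
A toy sketch of that stepwise scheme; the net names and time thresholds are made up:

Code: Select all

# Hypothetical: query a small net when short on time, a huge net otherwise.
NETS = [
    ("net_small_128x10",  1.0),   # below 1 s per move
    ("net_medium_384x30", 10.0),  # below 10 s per move
    ("net_large_1024x60", None),  # otherwise
]

def pick_net(seconds_per_move):
    for name, limit in NETS:
        if limit is None or seconds_per_move < limit:
            return name

print(pick_net(0.5))   # net_small_128x10
print(pick_net(30.0))  # net_large_1024x60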

Just thinking out loud...

--
Srdja
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: The next step for LC0?

Post by Laskos »

smatovic wrote: Fri Aug 28, 2020 1:16 pm […]

Nah...


1 node per move, factor of 20 net size:

Score of SV_384x30_4585 vs 128x10_T70: 152 - 24 - 24 [0.820] 200
Elo difference: 263.4 +/- 57.0, LOS: 100.0 %, DrawRatio: 12.0 %

So one can roughly say that the best 128x10 nets at one node are some 2100 Elo and the best 384x30 nets at one node are some 2350 Elo, i.e. a factor of 10 in net size gives about 200 Elo improvement at 1 node. Even in the fantasy domain of 1000x larger nets, the Elo at 1 node would only reach about 2950, far below the 3500-3600 of top engines using search.
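
Spelled out, assuming the observed ~200 Elo per factor of 10 in net size keeps holding at 1 node:

Code: Select all

import math

def elo_at_one_node(size_factor, base_elo=2350, elo_per_decade=200):
    # Extrapolation from the 384x30 result above; purely back-of-the-envelope.
    return base_elo + elo_per_decade * math.log10(size_factor)

print(elo_at_one_node(1000))  # 2950.0 -- still far below 3500+ with search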

Maybe quite the opposite: for Lc0 the best hope might be to seriously improve the search, be it PUCT or something else. Training even a 10x larger net is very hard with current hardware.
chrisw
Posts: 4319
Joined: Tue Apr 03, 2012 4:28 pm

Re: The next step for LC0?

Post by chrisw »

smatovic wrote: Fri Aug 28, 2020 1:16 pm […]
Nice idea, but why still with a CNN? Didn't NNUE show that fully connected works too? What would be useful is an NN-only chess tournament, maybe with some sort of solid incentive for winning. Currently LC0 would presumably win that hands down, but ...
smatovic
Posts: 2658
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: The next step for LC0?

Post by smatovic »

Laskos wrote: Fri Aug 28, 2020 1:57 pm […]
Thx for the numbers. You are right if we assume that Elo gain keeps scaling like
that with net size, but I am not sure whether new or differently structured/layered
NNs might not add Elo by encoding new evaluation terms, rather than just improving
the quality of the terms already encoded, or the like; maybe you get what I mean.

I am not into these nets: did the SV_384x30_4585 net already reach saturation?
How many games were used to train SV_384x30 compared to 128x10_T70?

--
Srdja
smatovic
Posts: 2658
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: The next step for LC0?

Post by smatovic »

chrisw wrote: Fri Aug 28, 2020 2:57 pm
Nice idea, but why still with a CNN? Didn't NNUE show that fully connected works too? What would be useful is an NN-only chess tournament, maybe with some sort of solid incentive for winning. Currently LC0 would presumably win that hands down, but ...
I have heard CNNs need fewer games for training; not sure how this compares in
chess to MLP/NNUE. If we assume a depth-1 search only, there is no reason not
to use NNUE even on the GPU, maybe extended with 30 distinct piece-count-indexed
NNs...
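
A rough sketch of what such piece-count indexing could look like; everything here is hypothetical:

Code: Select all

# Hypothetical: one net per piece count; 3..32 pieces on the board = 30 nets.
def popcount(bb):
    return bin(bb).count("1")

class PieceCountEval:
    def __init__(self, nets):
        # nets[i] is a callable evaluating positions with i + 3 pieces.
        assert len(nets) == 30
        self.nets = nets

    def evaluate(self, occupied_bitboard, features):
        n = popcount(occupied_bitboard)  # 3..32 in legal positions
        return self.nets[n - 3](features)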

--
Srdja
chrisw
Posts: 4319
Joined: Tue Apr 03, 2012 4:28 pm

Re: The next step for LC0?

Post by chrisw »

smatovic wrote: Fri Aug 28, 2020 3:28 pm
I have heard CNNs need fewer games for training; not sure how this compares in
chess to MLP/NNUE. If we assume a depth-1 search only, there is no reason not
to use NNUE even on the GPU, maybe extended with 30 distinct piece-count-indexed
NNs...
Hehe, depth 1 NN chess would take them all back to the 1980s.
I didn’t look into NNUE training, but I guess the UE bit doesn’t refer to training? Also, it has been a while, but isn’t training of even sizeable nets via GPU batching running at tens of thousands of positions per second? Maybe I am being random and out of the loop here.

If you play ply-one matches only, then the incentive for NNUE disappears, surely? The NNUE speed-up is only really useful for search; apart from the UE trick, it’s just a normal fully connected net with some simple non-linearity built into the inputs, no?
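
That reading of the architecture as a minimal dense sketch; the layer sizes are illustrative, not the actual NNUE dimensions:

Code: Select all

import numpy as np

rng = np.random.default_rng(0)
N_FEATURES, H1, H2 = 40960, 256, 32   # made-up sizes

W1 = rng.standard_normal((N_FEATURES, H1)).astype(np.float32) * 0.01
W2 = rng.standard_normal((H1, H2)).astype(np.float32) * 0.1
W3 = rng.standard_normal(H2).astype(np.float32) * 0.1

def clipped_relu(x):
    return np.clip(x, 0.0, 1.0)

def evaluate(active_features):
    """active_features: indices of the sparse binary inputs that are 1."""
    acc = W1[active_features].sum(axis=0)     # first (biggest) layer
    h = clipped_relu(clipped_relu(acc) @ W2)  # small hidden layer
    return float(h @ W3)

print(evaluate([17, 1024, 39000]))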
smatovic
Posts: 2658
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: The next step for LC0?

Post by smatovic »

chrisw wrote: Fri Aug 28, 2020 4:48 pm
Hehe, depth 1 NN chess would take them all back to the 1980s.
I didn’t look into NNUE training, but I guess the UE bit doesn’t refer to training? Also, it has been a while, but isn’t training of even sizeable nets via GPU batching running at tens of thousands of positions per second? Maybe I am being random and out of the loop here.
I mean you need fewer RL games played on the GPU cloud cluster to saturate a CNN compared to an MLP; not sure if this is fact, maybe some NN creator in here can clarify how the LC0 CNN compares to NNUE in terms of games needed and the horsepower to train these nets.
chrisw wrote: Fri Aug 28, 2020 4:48 pm If you play ply-one matches only, then the incentive for NNUE disappears, surely? The NNUE speed-up is only really useful for search; apart from the UE trick, it’s just a normal fully connected net with some simple non-linearity built into the inputs, no?
Hmm, not quite sure where the difference between UE and NNUE lies. If you perform incremental updates of the NN's first layer, where most of the weights are, then imo you get the first layer for "free"; whether that happens during AB search or during game play does not matter, or?
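
A minimal sketch of that incremental first-layer update (the accumulator trick); sizes and feature indices are hypothetical:

Code: Select all

import numpy as np

rng = np.random.default_rng(1)
N_FEATURES, H1 = 40960, 256   # made-up sizes
W1 = rng.standard_normal((N_FEATURES, H1)) * 0.01

def full_accumulator(active):
    # The expensive way: sum one first-layer row per active feature.
    return W1[list(active)].sum(axis=0)

def update_accumulator(acc, removed, added):
    # The cheap "UE" way: a move only flips a few features.
    for f in removed:
        acc -= W1[f]
    for f in added:
        acc += W1[f]
    return acc

acc = full_accumulator({17, 1024, 39000})
acc = update_accumulator(acc, removed=[17], added=[23])  # e.g. a piece moved
assert np.allclose(acc, full_accumulator({23, 1024, 39000}))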

--
Srdja
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: The next step for LC0?

Post by Laskos »

smatovic wrote: Fri Aug 28, 2020 3:27 pm […]
I am not into these nets: did the SV_384x30_4585 net already reach saturation?
How many games were used to train SV_384x30 compared to 128x10_T70?

SV_384x30 nets use data from the T60 run. The T60 and T70 runs IIRC have a comparable number of nets, T60 probably somewhat more, but for the same number of games and nodes per move the hardware effort is 20 times larger for the T60 large nets compared to the T70 small nets.

SV_384x30 nets did not reach saturation, but I guess they are not that far off from reaching it, maybe 50 Elo points. T60 and T70 pretty much reached saturation; I am not even sure whether T60 is still being trained. T60 320x24 is not a particularly successful run: at most normal time controls it seems only some 50 Elo points stronger than the T40 256x20 nets, and weaker than the 256x20 LS15 net. Even at fixed nodes, T60 320x24 is level with LS15 256x20.

Yes, a "very large net" program would need a serious revision of the structure of the net.
Dann Corbit
Posts: 12541
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: The next step for LC0?

Post by Dann Corbit »

Before long, cooperative algorithms with the CPU and GPU using the same physical memory will be possible. I think RDNA 3 is needed for that, but it won't be far off.
I think what AMD is doing with Infinity Fabric is taking on a whole new kind of brilliance. It might be a real revolution in computation.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
Dann Corbit
Posts: 12541
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: The next step for LC0?

Post by Dann Corbit »

So my direction for LC0 would be to repeat the same kind of effort that went into native API-level operations on NVIDIA, but using AMD instead.
Once transparent memory access happens, the NVIDIA cards will only be for gamers.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.