The next step for LC0?

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

smatovic
Posts: 2658
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

The next step for LC0?

Post by smatovic »

I know LC0's primary goal was an open-source adaptation of A0, and I am not
into the Discord development discussions and the like; anyway, my 2 cents on this:

- MCTS-PUCT search is a descendant of AlphaGo's search, generalized to be applied
to Go, Shogi and Chess; it can utilize a GPU via batches but has its known
weaknesses: tactics in the form of successive "shallow traps", and the endgame
(see the PUCT sketch after this list).

- A CPU AB search will not work with an NN evaluated on the GPU via batches.

- NNUE makes no sense on GPU.

- LC0 has a GPU-cloud-cluster to play Reinforcement Learning games.

- LC0 already plays at ~2400(?) Elo with a depth-1 search alone.

- It is estimated that the NN eval is worth about 4 plies of AB search.
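
For reference, the PUCT selection rule from the A0 paper (which LC0's search follows) picks the child maximizing Q plus a prior-weighted exploration bonus; a minimal Python sketch, with illustrative field names:

Code: Select all

import math
from dataclasses import dataclass, field

@dataclass
class Node:
    prior: float                                  # P(s,a) from the policy head
    visits: int = 0                               # N(s,a)
    value_sum: float = 0.0                        # sum of backed-up values
    children: dict = field(default_factory=dict)  # move -> Node

    def q(self):
        # Mean action value Q(s,a); 0 for unvisited children.
        return self.value_sum / self.visits if self.visits else 0.0

def select_child(node, c_puct=1.5):
    """Pick the move maximizing Q(s,a) + c * P(s,a) * sqrt(N(s)) / (1 + N(s,a))."""
    total_visits = sum(ch.visits for ch in node.children.values())
    return max(
        node.children.items(),
        key=lambda mc: mc[1].q()
        + c_puct * mc[1].prior * math.sqrt(total_visits) / (1 + mc[1].visits),
    )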

Looking at the above points, it seems pretty obvious what the next step for LC0
could be: drop the weak part, the MCTS-PUCT search, ignore AB and NNUE, and focus
on what LC0 is good at. Increase the plies encoded in the NN; increase the Elo of
the depth-1 eval.

To take it to an extreme: drop the search part completely, increase the CNN size
1000-fold, decrease NPS from ~50K to ~50, and add multiple NNs of increasing size,
to be queried stepwise according to the time control.
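
A toy sketch of that stepwise scheme; the net names and time thresholds are made up:

Code: Select all

# Hypothetical: query a small net when short on time, a huge net otherwise.
NETS = [
    ("net_small_128x10",  1.0),   # below 1 s per move
    ("net_medium_384x30", 10.0),  # below 10 s per move
    ("net_large_1024x60", None),  # otherwise
]

def pick_net(seconds_per_move):
    for name, limit in NETS:
        if limit is None or seconds_per_move < limit:
            return name

print(pick_net(0.5))   # net_small_128x10
print(pick_net(30.0))  # net_large_1024x60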

Just thinking out loud...

--
Srdja
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: The next step for LC0?

Post by Laskos »

smatovic wrote: Fri Aug 28, 2020 1:16 pm […]

Nah...


1 node per move, factor of 20 net size:

Score of SV_384x30_4585 vs 128x10_T70: 152 - 24 - 24 [0.820] 200
Elo difference: 263.4 +/- 57.0, LOS: 100.0 %, DrawRatio: 12.0 %

So one can roughly say that the best 128x10 nets at one node are some 2100 Elo and the best 384x30 nets at one node are some 2350 Elo, i.e. a factor of 10 in net size gives about 200 Elo improvement at 1 node. Even in the fantasy domain of 1000x larger nets, the Elo at 1 node would only reach about 2950, far below the 3500-3600 of top engines using search.
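
Spelled out, assuming the observed ~200 Elo per factor of 10 in net size keeps holding at 1 node:

Code: Select all

import math

def elo_at_one_node(size_factor, base_elo=2350, elo_per_decade=200):
    # Extrapolation from the 384x30 result above; purely back-of-the-envelope.
    return base_elo + elo_per_decade * math.log10(size_factor)

print(elo_at_one_node(1000))  # 2950.0 -- still far below 3500+ with search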

Maybe quite the opposite: for Lc0 the best hope might be to seriously improve the search, be it PUCT or something else. Training even a 10x larger net is very hard with current hardware.
chrisw
Posts: 4319
Joined: Tue Apr 03, 2012 4:28 pm

Re: The next step for LC0?

Post by chrisw »

smatovic wrote: Fri Aug 28, 2020 1:16 pm […]
Nice idea, but why still with a CNN? Didn't NNUE show that fully connected works too? What would be useful is an NN-only chess tournament, maybe with some sort of solid incentive for winning. Currently LC0 would presumably win that hands down, but ...
smatovic
Posts: 2658
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: The next step for LC0?

Post by smatovic »

Laskos wrote: Fri Aug 28, 2020 1:57 pm […]
Thx for the numbers. You are right if we assume that Elo gain keeps scaling like
that with net size, but I am not sure whether new or differently structured/layered
NNs might not add Elo by encoding new evaluation terms, rather than just improving
the quality of the terms already encoded, or the like; maybe you get what I mean.

I am not into these nets: did the SV_384x30_4585 net already reach saturation?
How many games were used to train SV_384x30 compared to 128x10_T70?

--
Srdja
smatovic
Posts: 2658
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: The next step for LC0?

Post by smatovic »

chrisw wrote: Fri Aug 28, 2020 2:57 pm
Nice idea, but why still with a CNN? Didn't NNUE show that fully connected works too? What would be useful is an NN-only chess tournament, maybe with some sort of solid incentive for winning. Currently LC0 would presumably win that hands down, but ...
I have heard CNNs need fewer games for training; not sure how this compares in
chess to MLP/NNUE. If we assume a depth-1 search only, there is no reason not
to use NNUE even on the GPU, maybe extended with 30 distinct piece-count-indexed
NNs...
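
A rough sketch of what such piece-count indexing could look like; everything here is hypothetical:

Code: Select all

# Hypothetical: one net per piece count; 3..32 pieces on the board = 30 nets.
def popcount(bb):
    return bin(bb).count("1")

class PieceCountEval:
    def __init__(self, nets):
        # nets[i] is a callable evaluating positions with i + 3 pieces.
        assert len(nets) == 30
        self.nets = nets

    def evaluate(self, occupied_bitboard, features):
        n = popcount(occupied_bitboard)  # 3..32 in legal positions
        return self.nets[n - 3](features)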

--
Srdja
chrisw
Posts: 4319
Joined: Tue Apr 03, 2012 4:28 pm

Re: The next step for LC0?

Post by chrisw »

smatovic wrote: Fri Aug 28, 2020 3:28 pm
I have heard CNNs need fewer games for training; not sure how this compares in
chess to MLP/NNUE. If we assume a depth-1 search only, there is no reason not
to use NNUE even on the GPU, maybe extended with 30 distinct piece-count-indexed
NNs...
Hehe, depth 1 NN chess would take them all back to the 1980s.
I didn’t look into NNUE training, but I guess the UE bit doesn’t refer to training? Also, it has been a while, but isn’t training of even sizeable nets via GPU batching running at tens of thousands of positions per second? Maybe I am being random and out of the loop here.

If you play ply-one matches only, then the incentive for NNUE disappears, surely? The NNUE speed-up is only really useful for search; apart from the UE trick, it’s just a normal fully connected net with some simple non-linearity built into the inputs, no?
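
That reading of the architecture as a minimal dense sketch; the layer sizes are illustrative, not the actual NNUE dimensions:

Code: Select all

import numpy as np

rng = np.random.default_rng(0)
N_FEATURES, H1, H2 = 40960, 256, 32   # made-up sizes

W1 = rng.standard_normal((N_FEATURES, H1)).astype(np.float32) * 0.01
W2 = rng.standard_normal((H1, H2)).astype(np.float32) * 0.1
W3 = rng.standard_normal(H2).astype(np.float32) * 0.1

def clipped_relu(x):
    return np.clip(x, 0.0, 1.0)

def evaluate(active_features):
    """active_features: indices of the sparse binary inputs that are 1."""
    acc = W1[active_features].sum(axis=0)     # first (biggest) layer
    h = clipped_relu(clipped_relu(acc) @ W2)  # small hidden layer
    return float(h @ W3)

print(evaluate([17, 1024, 39000]))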
smatovic
Posts: 2658
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: The next step for LC0?

Post by smatovic »

chrisw wrote: Fri Aug 28, 2020 4:48 pm
Hehe, depth 1 NN chess would take them all back to the 1980s.
I didn’t look into NNUE training, but I guess the UE bit doesn’t refer to training? Also, it has been a while, but isn’t training of even sizeable nets via GPU batching running at tens of thousands of positions per second? Maybe I am being random and out of the loop here.
I mean you need fewer RL games played on the GPU cloud cluster to saturate a CNN compared to an MLP; not sure if this is fact, maybe some NN creator in here can clarify how the LC0 CNN compares to NNUE in terms of games needed and the horsepower to train these nets.
chrisw wrote: Fri Aug 28, 2020 4:48 pm If you play ply-one matches only, then the incentive for NNUE disappears, surely? The NNUE speed-up is only really useful for search; apart from the UE trick, it’s just a normal fully connected net with some simple non-linearity built into the inputs, no?
Hmm, not quite sure where the difference between UE and NNUE lies. If you perform incremental updates of the NN's first layer, where most of the weights are, then imo you get the first layer for "free"; whether that happens during AB search or during game play does not matter, or?
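
A minimal sketch of that incremental first-layer update (the accumulator trick); sizes and feature indices are hypothetical:

Code: Select all

import numpy as np

rng = np.random.default_rng(1)
N_FEATURES, H1 = 40960, 256   # made-up sizes
W1 = rng.standard_normal((N_FEATURES, H1)) * 0.01

def full_accumulator(active):
    # The expensive way: sum one first-layer row per active feature.
    return W1[list(active)].sum(axis=0)

def update_accumulator(acc, removed, added):
    # The cheap "UE" way: a move only flips a few features.
    for f in removed:
        acc -= W1[f]
    for f in added:
        acc += W1[f]
    return acc

acc = full_accumulator({17, 1024, 39000})
acc = update_accumulator(acc, removed=[17], added=[23])  # e.g. a piece moved
assert np.allclose(acc, full_accumulator({23, 1024, 39000}))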

--
Srdja
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: The next step for LC0?

Post by Laskos »

smatovic wrote: Fri Aug 28, 2020 3:27 pm […]
I am not into these nets: did the SV_384x30_4585 net already reach saturation?
How many games were used to train SV_384x30 compared to 128x10_T70?

SV_384x30 nets use data from the T60 run. The T60 and T70 runs IIRC have a comparable number of nets, T60 probably somewhat more, but for the same number of games and nodes per move the hardware effort is 20 times larger for the T60 large nets compared to the T70 small nets.

SV_384x30 nets did not reach saturation, but I guess they are not that far off from reaching it, maybe 50 Elo points. T60 and T70 pretty much reached saturation; I am not even sure whether T60 is still being trained. T60 320x24 is not a particularly successful run: at most normal time controls it seems only some 50 Elo points stronger than the T40 256x20 nets, and weaker than the 256x20 LS15 net. Even at fixed nodes, T60 320x24 is level with LS15 256x20.

Yes, a "very large net" program would need a serious revision of the structure of the net.
Dann Corbit
Posts: 12541
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: The next step for LC0?

Post by Dann Corbit »

Before long, cooperative algorithms with the CPU and GPU using the same physical memory will be possible. I think RDNA 3 is needed for that, but it won't be far off.
I think what AMD is doing with Infinity Fabric is taking on a whole new kind of brilliance. It might be a real revolution in computation.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
Dann Corbit
Posts: 12541
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: The next step for LC0?

Post by Dann Corbit »

So my direction for LC0 would be to repeat the same kind of effort that went into native API-level operations on NVIDIA, but using AMD instead.
Once transparent memory access happens, the NVIDIA cards will only be for gamers.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.