I know LC0's primary goal was an open-source adaptation of A0, and I am not
into the Discord development discussions and the like; anyway, my 2 cents on this:
- MCTS-PUCT search is a descendant of AlphaGo, generalized to be applied to
Go, Shogi and Chess; it can utilize a GPU via batches but has its known
weaknesses: tactics in the form of "shallow traps" in a row, and the endgame.
- A CPU AB search will not work with an NN on GPU via batches.
- NNUE makes no sense on GPU.
- LC0 has a GPU-cloud-cluster to play Reinforcement Learning games.
- LC0 already plays at ~2400(?) Elo with a depth-1 search alone.
- It is estimated that the NN eval is worth about 4 plies of AB search.
Looking at the above points, it seems pretty obvious what the next step for LC0
could be: drop the weak part, the MCTS-PUCT search, ignore AB and NNUE, and focus
on what LC0 is good at. Increase the plies encoded in the NN, increase the Elo
of the depth-1 eval.
To put it to an extreme: drop the search part completely, increase the CNN size
1000-fold, decrease NPS from ~50K to ~50, and add multiple NNs of increasing
size to be queried stepwise for time control.
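The stepwise-query idea could look something like this, a rough sketch (all net names, sizes and costs here are made up for illustration, not LC0 code):

```python
# Hypothetical sketch: pick the largest net the remaining clock can afford,
# assuming each net's depth-1 eval cost grows roughly with its size.

# (name, approximate seconds per depth-1 evaluation) -- made-up numbers
NETS = [
    ("small_128x10", 0.002),
    ("medium_384x30", 0.02),
    ("large_1024x60", 0.2),
]

def pick_net(remaining_seconds, moves_to_go=30):
    """Choose the biggest net whose per-move cost fits the time budget."""
    budget_per_move = remaining_seconds / moves_to_go
    chosen = NETS[0][0]  # always fall back to the smallest net
    for name, cost in NETS:
        if cost <= budget_per_move:
            chosen = name
    return chosen

print(pick_net(60.0))  # plenty of time: the largest net is affordable
print(pick_net(0.1))   # time trouble: only the smallest net fits
```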
Just thinking loud...
--
Srdja
The next step for LC0?
Moderators: hgm, Rebel, chrisw
-
- Posts: 2658
- Joined: Wed Mar 10, 2010 10:18 pm
- Location: Hamburg, Germany
- Full name: Srdja Matovic
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: The next step for LC0?
smatovic wrote: ↑Fri Aug 28, 2020 1:16 pm [...]
Nah...
1 node per move, factor of 20 net size:
Score of SV_384x30_4585 vs 128x10_T70: 152 - 24 - 24 [0.820] 200
Elo difference: 263.4 +/- 57.0, LOS: 100.0 %, DrawRatio: 12.0 %
So one can roughly say that the best 128x10 nets at one node are some 2100 Elo and the best 384x30 nets at one node are some 2350 Elo, and a factor of 10 in net size gives about 200 Elo improvement at 1 node. Even in the fantasy domain of 1000x larger nets, the Elo at 1 node would reach only about 2950, far below the 3500-3600 of top engines using search.
Maybe quite the opposite: for Lc0 the best hope might be to seriously improve the search, be it PUCT or something else. Training even a 10x larger net is very hard with current hardware.
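For reference, the quoted Elo difference follows directly from the match score; a quick sketch of the arithmetic under the standard logistic Elo model:

```python
import math

def elo_from_score(p):
    """Elo difference implied by an expected score p (logistic model)."""
    return -400.0 * math.log10(1.0 / p - 1.0)

# The match above: 152 wins, 24 losses, 24 draws out of 200 games
score = (152 + 0.5 * 24) / 200           # 0.82
print(round(elo_from_score(score), 1))   # ~263.4, matching the quoted result

# Rough extrapolation: ~200 Elo per 10x net size, starting from ~2350 at 384x30
print(2350 + 3 * 200)                    # 1000x larger -> ~2950
```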
-
- Posts: 4319
- Joined: Tue Apr 03, 2012 4:28 pm
Re: The next step for LC0?
smatovic wrote: ↑Fri Aug 28, 2020 1:16 pm [...]
Nice idea, but why still with a CNN? Didn't NNUE show that fully connected works also? What would be useful would be an NN-only chess tournament, maybe with some sort of solid incentive for winning. Currently LC0 would presumably win that hands down, but ...
-
- Posts: 2658
- Joined: Wed Mar 10, 2010 10:18 pm
- Location: Hamburg, Germany
- Full name: Srdja Matovic
Re: The next step for LC0?
Laskos wrote: ↑Fri Aug 28, 2020 1:57 pm [...]
Thx for the numbers. You are right if we assume that Elo gain scales linearly with net size, but I am not sure whether new/differently structured/layered NNs might not add Elo by way of "new evaluation terms encoded" rather than simply by the "quality of the evaluation terms encoded", or alike; maybe you get what I mean.
I am not into these nets; did the SV_384x30_4585 net already reach saturation?
How many games were used to train SV_384x30 compared to 128x10_T70?
--
Srdja
-
- Posts: 2658
- Joined: Wed Mar 10, 2010 10:18 pm
- Location: Hamburg, Germany
- Full name: Srdja Matovic
Re: The next step for LC0?
chrisw wrote: ↑Fri Aug 28, 2020 2:57 pm [...]
I have heard CNNs need fewer games for training; not sure how this compares in chess to MLP/NNUE. If we assume a depth-1 search only, there is no reason not to use NNUE even on GPU, maybe with an extension to 30 distinct piece-count-indexed NNs...
--
Srdja
-
- Posts: 4319
- Joined: Tue Apr 03, 2012 4:28 pm
Re: The next step for LC0?
smatovic wrote: ↑Fri Aug 28, 2020 3:28 pm [...]
Hehe, depth-1 NN chess would take them all back to the 1980s.
I didn’t look into NNUE training, but I guess the UE bit doesn’t refer to training? It has been a while, but isn’t training of even sizeable nets via GPU batching running at tens of thousands of positions per second? Maybe I’m being random and out of the loop here.
If you do ply-one matches only, then the incentive for NNUE disappears, surely? The NNUE speed-up is only really useful for search; other than the UE trick, it’s just a normal fully connected net with some simple non-linearity built into the inputs, no?
-
- Posts: 2658
- Joined: Wed Mar 10, 2010 10:18 pm
- Location: Hamburg, Germany
- Full name: Srdja Matovic
Re: The next step for LC0?
chrisw wrote: ↑Fri Aug 28, 2020 4:48 pm [...]
I mean you need fewer RL games played on the GPU-cloud-cluster to get the CNN saturated compared to an MLP. Not sure if this is fact; maybe some NN creator in here can clarify how the LC0 CNN compares to NNUE in terms of games needed and horsepower to train these nets.
Hmm, not quite sure where the difference between UE and NNUE is. If you perform incremental updates for the first NN layer, where most of the weights are, then imo you get the first layer for "free"; whether during AB search or during game play does not matter, or?
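A toy sketch of what I mean by incremental first-layer updates (tiny made-up dimensions and names, not actual NNUE code):

```python
import random

# Toy "UE" idea: the first layer's output (the "accumulator") is the sum of
# one weight column per active input feature. A chess move flips only a few
# features, so update the accumulator incrementally instead of recomputing.

N_FEATURES, HIDDEN = 768, 8  # toy sizes; real nets are much larger
random.seed(0)
W1 = [[random.random() for _ in range(HIDDEN)] for _ in range(N_FEATURES)]

def full_accumulator(active):
    """Recompute the first-layer sums from scratch: O(len(active) * HIDDEN)."""
    acc = [0.0] * HIDDEN
    for f in active:
        for j in range(HIDDEN):
            acc[j] += W1[f][j]
    return acc

def update_accumulator(acc, removed, added):
    """Apply a move's feature diff: O((len(removed) + len(added)) * HIDDEN)."""
    for f in removed:
        for j in range(HIDDEN):
            acc[j] -= W1[f][j]
    for f in added:
        for j in range(HIDDEN):
            acc[j] += W1[f][j]
    return acc

acc = full_accumulator([10, 200, 511])
acc = update_accumulator(acc, removed=[200], added=[300])  # a "move"
reference = full_accumulator([10, 300, 511])
assert all(abs(a - b) < 1e-9 for a, b in zip(acc, reference))
```

The incremental path touches only the changed feature columns, which is why the first layer comes almost for free no matter where the positions come from.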
--
Srdja
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: The next step for LC0?
smatovic wrote: ↑Fri Aug 28, 2020 3:27 pm [...]
SV_384x30 nets use data from the T60 run. T60 and T70 runs IIRC have a comparable number of nets, probably T60 some more, but the hardware effort is 20 times larger for the same number of games and nodes per move for the T60 large nets compared to the T70 small nets.
SV_384x30 nets did not reach saturation, but I guess they are not that far off from reaching it, maybe 50 Elo points. T60 and T70 pretty much reached saturation; I'm not even sure if T60 is still being trained. T60 320x24 is not a particularly successful run: at most normal time controls it seems only some 50 Elo points stronger than the T40 256x20 nets, and weaker than the 256x20 LS15 net. Even at fixed nodes, T60 320x24 is level with LS15 256x20.
Yes, a "very large net" program would need a serious revision of the structure of the net.
-
- Posts: 12541
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: The next step for LC0?
Before long, cooperative algorithms with the CPU and GPU using the same physical memory will be possible. I think RDNA 3 is needed, but that won't be far off.
I think that what AMD is doing with Infinity Fabric is taking on a whole new kind of brilliance. It might be a real revolution in computation.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
-
- Posts: 12541
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: The next step for LC0?
So my direction for LC0 would be to perform the same kind of effort that went into native API-level operations on NVIDIA, but targeting AMD instead.
Once transparent memory access happens, the NVIDIA cards will only be for gamers.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.