The additional PUCT and FPUR (first-play-urgency reduction) settings bring very little, if anything, unless you go to a really long TC.
lc0-win-20180512-cuda90-cudnn712-00
-
- Posts: 1339
- Joined: Fri Nov 02, 2012 9:43 am
- Location: New Delhi, India
Re: lc0-win-20180512-cuda90-cudnn712-00
How long, in your opinion?
i7 5960X @ 4.1 Ghz, 64 GB G.Skill RipJaws RAM, Twin Asus ROG Strix OC 11 GB Geforce 2080 Tis
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: lc0-win-20180512-cuda90-cudnn712-00
I had the impression that it gave me a lot when I first started experimenting with my GPU, at something like a 1'+1'' TC. I will have to check.
-
- Posts: 4190
- Joined: Wed Nov 25, 2009 1:47 am
Re: lc0-win-20180512-cuda90-cudnn712-00
I ran the newest Lc0-cudnn at default settings vs. Lc0-cudnn with PUCT=3, FPUR=0 at a 1'+0.6'' TC on a GTX 770, which is roughly half to a third the speed of a GTX 1060, and got only +14 Elo for the new settings after 500 games.
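For readers wondering what these two settings actually control, here is a minimal sketch of PUCT child selection with first-play-urgency reduction. The formula shape is standard for PUCT-style MCTS, but the function name, constants, and exact details are illustrative, not lc0's actual code.

```python
import math

def puct_score(parent_visits, child_visits, child_value, prior,
               cpuct=3.0, fpu_reduction=0.0, parent_value=0.0):
    """Score one child for PUCT selection (toy version).

    Unvisited children have no value estimate yet, so they get a
    "first play urgency": the parent's value minus fpu_reduction.
    A larger fpu_reduction makes the search narrower (unexplored
    moves look worse); fpu_reduction=0 spreads visits wider.
    """
    if child_visits == 0:
        q = parent_value - fpu_reduction  # first play urgency
    else:
        q = child_value
    # Exploration term: high priors and low visit counts raise the score.
    u = cpuct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q + u

# With FPUR=0 an unvisited move is judged as good as its parent,
# so the search explores plausible-looking moves more readily.
wide = puct_score(100, 0, 0.0, prior=0.2, cpuct=3.0,
                  fpu_reduction=0.0, parent_value=0.5)
narrow = puct_score(100, 0, 0.0, prior=0.2, cpuct=3.0,
                    fpu_reduction=0.2, parent_value=0.5)
```

The only difference between the two calls is the FPU reduction, which lowers the unvisited child's score by exactly that amount.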
-
- Posts: 1243
- Joined: Sat Dec 13, 2008 7:00 pm
Re: lc0-win-20180512-cuda90-cudnn712-00
Albert Silver wrote: ↑Mon May 21, 2018 4:39 am
It's a good theory, except that I tested all my PUCT values at 3+0 and 5+0, and then proposed them to GCP. He in turn tested them at very fast TCs, but stopped the test early due to disastrous results. The lower PUCT value was stronger at very short TCs, while the higher PUCT values only shone at longer TCs.

I don't remember the exact discussion, but note that I was likely talking about the regular version, not the cuDNN one.
Things get significantly more complex if you use the cuDNN version with batching. Essentially, batching forces you to evaluate a lot of positions at once. It's a bit like getting a 256-core machine. Now, the best parameters for your serial search might not be the best ones to get the most Elo improvement on the 256-core one.
In theory the cuDNN version should be weaker at the same NPS, because batching requires you to parallelize the search (regular Leela Chess Zero uses batch size=1), and MCTS or not, this costs efficiency. But in the case of chess + cuDNN the gain is large enough that you get a huge speedup. And because Leela Zero is still a bit blind to some things, it seems that going very wide plugs some of those holes and helps, especially as you win back the efficiency loss with the batching speedup.
I would suspect that fiddling with the virtual loss parameters can also help, but they're not exposed in either engine.
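To illustrate the mechanism being discussed: virtual loss temporarily penalizes a node while a descent through it is still waiting for its evaluation, which pushes subsequent descents toward different leaves and lets you fill a batch with distinct positions. This is a toy sketch, not lc0's or Leela Zero's code; the class names, penalty weight, and batch size are made up.

```python
# Toy sketch: gather a batch of distinct leaves using virtual loss.
class Node:
    def __init__(self, name, value, children=None):
        self.name = name
        self.value = value          # value estimate for this node
        self.virtual_loss = 0       # in-flight descents through this node
        self.children = children or []

def select_leaf(node):
    """Walk down, picking the child with the best adjusted value."""
    while node.children:
        # Virtual loss lowers a node's apparent value while an earlier
        # descent through it is still awaiting its NN evaluation.
        node = max(node.children,
                   key=lambda c: c.value - 0.2 * c.virtual_loss)
        node.virtual_loss += 1
    return node

def gather_batch(root, batch_size):
    """Collect batch_size leaves to evaluate in one NN call."""
    return [select_leaf(root) for _ in range(batch_size)]

leaf_a = Node("a", 0.6)
leaf_b = Node("b", 0.5)
root = Node("root", 0.0, [leaf_a, leaf_b])
batch = gather_batch(root, 4)
# Without virtual loss every descent would pick "a"; with it the
# descents alternate between the two leaves.
```

Tuning the virtual-loss weight trades batch diversity against search fidelity, which is presumably why exposing it would matter for this kind of experiment.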
The whole thing sounds very nasty to tune to me. You have a situation where you get huge NPS gains that should come at large losses in search efficiency, and probably does, but you're spending the inefficiency covering up holes in the engine.
For self-play, ideally you don't care at all and run everything serial, and then batch over concurrent games. This gives you the best of both worlds: full batching speedup and no loss of search efficiency. But I don't think Leela Chess Zero has this capability in the client or engines, and AFAIK no-one is developing it. (Someone made something like this for Leela Zero, FWIW, but its install dependencies were too annoying.)
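The batch-over-concurrent-games scheme can be sketched as follows: each game runs a strictly serial search, but the one-at-a-time evaluation requests from many games are collected and answered in a single forward pass, so per-game node ordering is untouched. Purely illustrative; the class and the stand-in evaluator are not the actual client.

```python
# Toy sketch: N serial searches, one shared evaluation batch per step.
def fake_net_eval(batch):
    """Stand-in for one batched NN forward pass over many positions."""
    return [hash(pos) % 100 / 100.0 for pos in batch]

class Game:
    def __init__(self, gid):
        self.gid = gid
        self.evaluated = []         # values this game has received

    def next_position(self, step):
        # A serial search asks for exactly one evaluation at a time,
        # so within a game the node order is identical to batch_size=1.
        return f"game{self.gid}-node{step}"

games = [Game(i) for i in range(8)]
for step in range(3):
    # Collect one request per game and answer them in a single batch:
    # full batching speedup, zero intra-game search parallelism.
    batch = [g.next_position(step) for g in games]
    values = fake_net_eval(batch)
    for g, v in zip(games, values):
        g.evaluated.append(v)
```

The GPU sees a batch of 8 every step, but each individual game's search remains exactly serial.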
I understand there's talk about using lc0 as the default engine for training games. There's an interesting consideration here that the level of the games will change, at least if the visits are kept fixed and the default parameters are used (which IIRC use huge batches).
-
- Posts: 4190
- Joined: Wed Nov 25, 2009 1:47 am
Re: lc0-win-20180512-cuda90-cudnn712-00
Gian-Carlo Pascutto wrote: ↑Wed May 23, 2018 6:23 pm
The whole thing sounds very nasty to tune to me. You have a situation where you get huge NPS gains that should come at large losses in search efficiency, and probably does,

This is just your feeling, which has nothing to do with reality.
LC0-cudnn is on par with LC0 when the same search parameters are used at a fixed-nodes TC.
It is also easily verified by looking at GPU usage. LC0 has very high GPU usage while processing essentially 10x fewer nodes than LC0-cudnn, which means the only thing that is really inefficient is your handwritten LC0 inference code, i.e. the outdated Winograd 3x3 convolution implementation.
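For reference, the Winograd convolution mentioned here is a minimal-filtering scheme: F(2,3), for example, produces two outputs of a 3-tap filter with 4 multiplications instead of 6. A small 1D sketch of the standard F(2,3) identity (the 2D 3x3 case the engines use applies the same transforms along both axes):

```python
def winograd_f23(d, g):
    """Winograd F(2,3): 2 outputs of a 3-tap filter, 4 multiplies."""
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * (g[0] + g[1] + g[2]) / 2
    m3 = (d[2] - d[1]) * (g[0] - g[1] + g[2]) / 2
    m4 = (d[1] - d[3]) * g[2]
    return [m1 + m2 + m3, m2 - m3 - m4]

def direct(d, g):
    """Naive 3-tap correlation producing the same two outputs."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]

d = [1.0, 2.0, 3.0, 4.0]   # 4 input values
g = [0.5, -1.0, 0.25]      # 3 filter taps
assert all(abs(a - b) < 1e-9
           for a, b in zip(winograd_f23(d, g), direct(d, g)))
```

The filter transforms (the terms involving only g) can be precomputed once per network weight, which is where implementations differ most in efficiency.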
-
- Posts: 3019
- Joined: Wed Mar 08, 2006 9:57 pm
- Location: Rio de Janeiro, Brazil
Re: lc0-win-20180512-cuda90-cudnn712-00
Gian-Carlo Pascutto wrote: ↑Wed May 23, 2018 6:23 pm
The whole thing sounds very nasty to tune to me. You have a situation where you get huge NPS gains that should come at large losses in search efficiency, and probably does, but you're spending the inefficiency covering up holes in the engine.

I managed to get this running in CLOP and am running my own tests, which I will share once I have anything. Right now the error margin from convergence is too large (+/-52 Elo). Also, maybe something is weird in my setup, since tc=1+1 is going a lot faster than my usual 1+1 games. I suspect I misunderstood this to be 1m+1s and instead it is 1s+1s...
"Tactics are the bricks and sticks that make up a game, but positional play is the architectural blueprint."
-
- Posts: 143
- Joined: Wed Jan 17, 2018 1:26 pm
Re: lc0-win-20180512-cuda90-cudnn712-00
Gian-Carlo Pascutto wrote: ↑Wed May 23, 2018 6:23 pm
For self-play, ideally you don't care at all and run everything serial, and then batch over concurrent games. This gives you the best of both worlds: full batching speedup and no loss of search efficiency. But I don't think Leela Chess Zero has this capability in the client or engines, and AFAIK no-one is developing it.

From how I understand the current plan, batching in self-play with lc0 will be exclusively over concurrent games, so the search efficiency in single games will not be affected at all.
-
- Posts: 4190
- Joined: Wed Nov 25, 2009 1:47 am
Re: lc0-win-20180512-cuda90-cudnn712-00
jkiliani wrote: ↑Wed May 23, 2018 7:12 pm
From how I understand the current plan, batching in self-play with lc0 will be exclusively over concurrent games, so the search efficiency in single games will not be affected at all.

So who is gonna write that concurrent self-play client, you? Do you have any idea how much effort is required to do this properly without introducing further bugs? I guess you don't...
-
- Posts: 162
- Joined: Thu Dec 17, 2009 10:46 am
Re: lc0-win-20180512-cuda90-cudnn712-00
So who is gonna write that concurrent self-play client, you? Do you have any idea how much effort is required to do this properly without introducing further bugs? I guess you don't...

That's done already - try:
lc0-cudnn selfplay --parallelism=8 --backend=multiplexing "--backend-opts=cudnn(threads=2)" --games=1000 --visits=800 --tempdecay-moves=10
You can also pass different arguments to the two players by adding "player1: --argument=x player2: --argument=y".
You should really tone down your language and insults, Milos - why don't you contribute instead of complaining? Let us all see your awesome skills.