Uri Blass wrote: ↑Wed Aug 15, 2018 10:19 am
I read that Lc0 changed its pruning, and I wonder if, in your tests, you used the same number that I read in the TCEC chat, 0.604.
Also, it turned out that a dev version of Lc0 was sent to TCEC instead of the release, so it also contained other changes.
The changes look fine if you look at the code, but it's possible they have a bug (e.g. in the visited_policy caching).
People on Discord reported that that version, with the same settings and a single thread, works differently from the release, which should not happen.
I think the scaling of an NN-based chess engine is determined decisively by two factors:
1. the scaling of MCTS,
2. how big and how well-trained the NN is.
Supposing MCTS scales well, only a small and/or not fully trained NN can cause the bad scaling.
I am afraid that in the case of Leela and its derivatives, their scaling is mainly determined by issues with the NN.
Poor score indicates something is broken, which is always likely with late-entered changes.
High draw rate indicates a higher-than-"normal" width-to-depth ratio. In general, depth finds interesting lines and possible wins; width defends against overlooking stuff.
More width and less depth = safe, defensive play, but dull. High draw rate. Simples.
It could be broken somewhere, the late changes could have conspired to alter the search profile, or scaling to more nodes could tend to show up as width rather than depth. It could be random, but the operating assumption has to be that there is a problem.
chrisw wrote: ↑Wed Aug 15, 2018 12:58 pm
Poor score indicates something is broken, which is always likely with late-entered changes.
High draw rate indicates a higher-than-"normal" width-to-depth ratio. In general, depth finds interesting lines and possible wins; width defends against overlooking stuff.
More width and less depth = safe, defensive play, but dull. High draw rate. Simples.
It could be broken somewhere, the late changes could have conspired to alter the search profile, or scaling to more nodes could tend to show up as width rather than depth. It could be random, but the operating assumption has to be that there is a problem.
In the context of Leela, width vs depth is determined by PUCT. After a lot of tactical blunders in the past, we're now using higher PUCT values than before, but maybe this was an overcompensation? Also, optimal values for PUCT are likely quite dependent on typical search depth; unfortunately, a CLOP run at low visit counts does not prove that the same value works well at higher visit counts. This kind of optimisation will take a long time, though; we may have to wait for TCEC 14 to see a well-optimised Leela.
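For readers unfamiliar with it, the PUCT selection rule scores each child move as its value estimate plus an exploration bonus proportional to the cpuct constant and the move's policy prior. A minimal sketch with made-up numbers (not Lc0's actual implementation) shows how raising cpuct shifts selection toward width:

```python
import math

def puct_score(q, prior, parent_visits, child_visits, cpuct):
    """AlphaZero-style PUCT: exploitation term (q) plus an exploration
    bonus scaled by cpuct and the move's policy prior."""
    u = cpuct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q + u

# Two hypothetical moves at a node with 1000 visits: a well-searched
# main line and a barely-visited sideline with a small policy prior.
parent = 1000
main = dict(q=0.10, prior=0.50, visits=900)
side = dict(q=0.05, prior=0.02, visits=10)

for cpuct in (1.0, 3.0):
    s_main = puct_score(main["q"], main["prior"], parent, main["visits"], cpuct)
    s_side = puct_score(side["q"], side["prior"], parent, side["visits"], cpuct)
    # At cpuct=1.0 the main line keeps the higher score (more depth);
    # at cpuct=3.0 the sideline overtakes it (more width).
    print(f"cpuct={cpuct}: main={s_main:.3f} side={s_side:.3f}")
```

With these invented numbers, the low cpuct keeps revisiting the main line while the high cpuct diverts visits to the sideline, which is the width-vs-depth trade-off the thread is discussing.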
chrisw wrote: ↑Wed Aug 15, 2018 12:58 pm
Poor score indicates something is broken, which is always likely with late-entered changes.
High draw rate indicates a higher-than-"normal" width-to-depth ratio. In general, depth finds interesting lines and possible wins; width defends against overlooking stuff.
More width and less depth = safe, defensive play, but dull. High draw rate. Simples.
It could be broken somewhere, the late changes could have conspired to alter the search profile, or scaling to more nodes could tend to show up as width rather than depth. It could be random, but the operating assumption has to be that there is a problem.
In the context of Leela, width vs depth is determined by PUCT. After a lot of tactical blunders in the past, we're now using higher PUCT values than before, but maybe this was an overcompensation? Also, optimal values for PUCT are likely quite dependent on typical search depth; unfortunately, a CLOP run at low visit counts does not prove that the same value works well at higher visit counts. This kind of optimisation will take a long time, though; we may have to wait for TCEC 14 to see a well-optimised Leela.
Yes, I wondered if that wasn't it. Did a PUCT change take place between Div3 and Div4?
It may be that PUCT needs a meta-parameter that knows about the total nodes searched per move, and then adjusts the search profile. Since Leela is competing with AB engines, you need to compensate at high node counts to take into account how the AB engines' profile changes with node count.
I would guess that Stockfish, for example, with lots of cores and lots of time, whizzing off beyond iteration 30 or whatever, is basically getting mostly more depth; it has likely maxed out on width. Well, not entirely, but I think its width-to-depth ratio is going to decrease at very high iteration counts. So, under this tenuous theory, based on not a lot of evidence and my intuition, Lc0 needs to match that profile and give slightly more to expansion and slightly less to exploration as the iteration count gets very high.
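A meta-parameter of the kind described above could, purely as an illustration, be a cpuct schedule that eases off exploration as the per-move node budget grows. This is a hypothetical sketch, not anything Lc0 implements, and the constants (c_init, c_scale, n_ref) are invented:

```python
import math

def cpuct_schedule(total_visits, c_init=2.5, c_scale=0.3, n_ref=10_000):
    """Hypothetical schedule: use c_init at around n_ref visits per move,
    explore a bit more below it, and ease off exploration (favouring
    depth over width) as the search grows far beyond it."""
    return max(1.0, c_init - c_scale * math.log10(max(1, total_visits) / n_ref))

for n in (1_000, 10_000, 1_000_000):
    print(n, round(cpuct_schedule(n), 3))  # 2.8, 2.5, 1.9 respectively
```

The floor of 1.0 keeps some exploration even at huge node counts; whether any such schedule actually helps would of course have to be tested at the relevant time controls.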
chrisw wrote: ↑Wed Aug 15, 2018 12:58 pm
Poor score indicates something is broken, which is always likely with late-entered changes.
High draw rate indicates a higher-than-"normal" width-to-depth ratio. In general, depth finds interesting lines and possible wins; width defends against overlooking stuff.
More width and less depth = safe, defensive play, but dull. High draw rate. Simples.
It could be broken somewhere, the late changes could have conspired to alter the search profile, or scaling to more nodes could tend to show up as width rather than depth. It could be random, but the operating assumption has to be that there is a problem.
In the context of Leela, width vs depth is determined by PUCT. After a lot of tactical blunders in the past, we're now using higher PUCT values than before, but maybe this was an overcompensation? Also, optimal values for PUCT are likely quite dependent on typical search depth; unfortunately, a CLOP run at low visit counts does not prove that the same value works well at higher visit counts. This kind of optimisation will take a long time, though; we may have to wait for TCEC 14 to see a well-optimised Leela.
Yes, trying to fix them manually and perturbatively (say, CPUCT and FPU) gave me uncontrollable variation with time control (or nodes, or depth). Fitting at short TC is completely irrelevant to long TC, and at long TC I cannot do much fitting, so I abandoned any fit and just use the defaults (v16).
So, in summary, it seems that Lc0 is experiencing the "growing pains" that all engines experience.
I also wonder if a lot of folks don't really understand how staggeringly strong a well-maintained, properly configured, traditional engine can be on top-end hardware at long time controls.
zullil wrote: ↑Wed Aug 15, 2018 2:24 pm
So, in summary, it seems that Lc0 is experiencing the "growing pains" that all engines experience.
I also wonder if a lot of folks don't really understand how staggeringly strong a well-maintained, properly configured, traditional engine can be on top-end hardware at long time controls.
No, they (Lc0) have it much, much worse in the growing-pains department. And yes, the "well-maintained, properly configured" part you mention makes an enormous difference.
chrisw wrote: ↑Wed Aug 15, 2018 12:58 pm
Poor score indicates something is broken, which is always likely with late-entered changes.
High draw rate indicates a higher-than-"normal" width-to-depth ratio. In general, depth finds interesting lines and possible wins; width defends against overlooking stuff.
More width and less depth = safe, defensive play, but dull. High draw rate. Simples.
It could be broken somewhere, the late changes could have conspired to alter the search profile, or scaling to more nodes could tend to show up as width rather than depth. It could be random, but the operating assumption has to be that there is a problem.
In the context of Leela, width vs depth is determined by PUCT. After a lot of tactical blunders in the past, we're now using higher PUCT values than before, but maybe this was an overcompensation? Also, optimal values for PUCT are likely quite dependent on typical search depth; unfortunately, a CLOP run at low visit counts does not prove that the same value works well at higher visit counts. This kind of optimisation will take a long time, though; we may have to wait for TCEC 14 to see a well-optimised Leela.
Yes, trying to fix them manually and perturbatively (say, CPUCT and FPU) gave me uncontrollable variation with time control (or nodes, or depth). Fitting at short TC is completely irrelevant to long TC, and at long TC I cannot do much fitting, so I abandoned any fit and just use the defaults (v16).
Are you assaulting PUCT via the external parameters, or modifying the source and recompiling? Changing and testing how the search profile varies with node count would effectively have to be the latter. It needs code to be written.
chrisw wrote: ↑Wed Aug 15, 2018 12:58 pm
Poor score indicates something is broken, which is always likely with late-entered changes.
High draw rate indicates a higher-than-"normal" width-to-depth ratio. In general, depth finds interesting lines and possible wins; width defends against overlooking stuff.
More width and less depth = safe, defensive play, but dull. High draw rate. Simples.
It could be broken somewhere, the late changes could have conspired to alter the search profile, or scaling to more nodes could tend to show up as width rather than depth. It could be random, but the operating assumption has to be that there is a problem.
In the context of Leela, width vs depth is determined by PUCT. After a lot of tactical blunders in the past, we're now using higher PUCT values than before, but maybe this was an overcompensation? Also, optimal values for PUCT are likely quite dependent on typical search depth; unfortunately, a CLOP run at low visit counts does not prove that the same value works well at higher visit counts. This kind of optimisation will take a long time, though; we may have to wait for TCEC 14 to see a well-optimised Leela.
Yes, trying to fix them manually and perturbatively (say, CPUCT and FPU) gave me uncontrollable variation with time control (or nodes, or depth). Fitting at short TC is completely irrelevant to long TC, and at long TC I cannot do much fitting, so I abandoned any fit and just use the defaults (v16).
Are you assaulting PUCT via the external parameters, or modifying the source and recompiling? Changing and testing how the search profile varies with node count would effectively have to be the latter. It needs code to be written.
No, just the external parameters. Even those are hard to optimize, and they depend heavily on time control (nodes, depth).
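For concreteness, "external parameters" here means the engine's UCI options, set on the command line or via setoption commands from the GUI or testing framework. A small helper like this can format such commands; the option names used (CPuct, FpuReduction) are illustrative only and vary between Lc0 versions, so check lc0 --help for the names your build actually supports:

```python
def uci_setoptions(params):
    """Format UCI 'setoption' commands from a dict of parameter values.
    The option names passed in are the caller's responsibility; those
    shown below are illustrative, not tied to any particular Lc0 version."""
    return [f"setoption name {name} value {value}" for name, value in params.items()]

for line in uci_setoptions({"CPuct": 3.0, "FpuReduction": 0.2}):
    print(line)
# setoption name CPuct value 3.0
# setoption name FpuReduction value 0.2
```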