Scaling of Asmfish with large thread count

corres · Post by **corres** » Sat Oct 08, 2016 11:16 am

Scaling of speed is one thing but scaling of Elo is an other thing.
What is the fact in the scaling of Elo - this is the main question, I think.

corres · Post by **corres** » Sat Oct 08, 2016 11:16 am

Scaling of speed is one thing but scaling of Elo is an other thing.
What is the fact in the scaling of Elo - this is the main question, I think.

matthewlai · Post by **matthewlai** » Sat Oct 08, 2016 4:07 pm

bob wrote: The term "scaling" generally applies to performance. IE time to depth as the most direct way of measuring performance in a chess engine.

I would say the most direct way of measuring performance is through playing strength.

TTD does not take into account widening of trees, and widening of trees is one (inefficient, if our tree shapes are optimal) way of increasing playing strength.

For the purpose of this discussion, let's define "depth" to be the length of the longest branch (thus taking into account reductions but not extensions).

It's clear that an engine that exhaustively searches every position to depth 10 will be stronger than an engine that searches every position to depth 10 with reductions. That's because every once in a while we have wrong reductions.

Now that doesn't mean we should stop reducing. If we reduce good moves rarely enough and reduce bad moves often enough, reductions are still a net gain in playing strength, because the engine can search deeper.

This is how lazy SMP works. With 2 cores, it may have 2x the NPS, but each NPS now contributes less to overall strength because the tree shape is not optimal. But not optimal doesn't mean no improvement. It will still play stronger at same depths due to tree widening.

So using NPS speedup over-estimates the effect of lazy SMP due to ignoring inefficient use of nodes, and using TTD under-estimates it by ignoring strength gained by tree widening. The ultimate measure of chess engine performance is playing strength, and that's what should be measured.

shrapnel · Post by **shrapnel** » Sat Oct 08, 2016 4:42 pm

matthewlai wrote:
bob wrote: The term "scaling" generally applies to performance. IE time to depth as the most direct way of measuring performance in a chess engine.
I would say the most direct way of measuring performance is through playing strength.

TTD does not take into account widening of trees, and widening of trees is one (inefficient, if our tree shapes are optimal) way of increasing playing strength.

For the purpose of this discussion, let's define "depth" to be the length of the longest branch (thus taking into account reductions but not extensions).

It's clear that an engine that exhaustively searches every position to depth 10 will be stronger than an engine that searches every position to depth 10 with reductions. That's because every once in a while we have wrong reductions.

Now that doesn't mean we should stop reducing. If we reduce good moves rarely enough and reduce bad moves often enough, reductions are still a net gain in playing strength, because the engine can search deeper.

This is how lazy SMP works. With 2 cores, it may have 2x the NPS, but each NPS now contributes less to overall strength because the tree shape is not optimal. But not optimal doesn't mean no improvement. It will still play stronger at same depths due to tree widening.

So using NPS speedup over-estimates the effect of lazy SMP due to ignoring inefficient use of nodes, and using TTD under-estimates it by ignoring strength gained by tree widening. The ultimate measure of chess engine performance is playing strength, and that's what should be measured.

+1. Good Post.
So, in the end, it boils down to finding the right BALANCE, as always.

bob · Post by **bob** » Sat Oct 08, 2016 5:27 pm

matthewlai wrote:
bob wrote: The term "scaling" generally applies to performance. IE time to depth as the most direct way of measuring performance in a chess engine.
I would say the most direct way of measuring performance is through playing strength.

TTD does not take into account widening of trees, and widening of trees is one (inefficient, if our tree shapes are optimal) way of increasing playing strength.

For the purpose of this discussion, let's define "depth" to be the length of the longest branch (thus taking into account reductions but not extensions).

It's clear that an engine that exhaustively searches every position to depth 10 will be stronger than an engine that searches every position to depth 10 with reductions. That's because every once in a while we have wrong reductions.

Now that doesn't mean we should stop reducing. If we reduce good moves rarely enough and reduce bad moves often enough, reductions are still a net gain in playing strength, because the engine can search deeper.

This is how lazy SMP works. With 2 cores, it may have 2x the NPS, but each NPS now contributes less to overall strength because the tree shape is not optimal. But not optimal doesn't mean no improvement. It will still play stronger at same depths due to tree widening.

So using NPS speedup over-estimates the effect of lazy SMP due to ignoring inefficient use of nodes, and using TTD under-estimates it by ignoring strength gained by tree widening. The ultimate measure of chess engine performance is playing strength, and that's what should be measured.

I still do not buy the "widening" idea. That is, in general, an unintended consequence of parallel search. If it were "good" you would be doing it in the serial search as well, since it simply exposes the fact that the tree is a bit too narrow / selective and it is missing things.

So yes, it happens, but it is NOT "intentional". Measuring playing strength is always the ideal way, but with parallel programs, the resources required become enormous and impractical. Leaving TTD/TTS (time to depth or time to solution) as the only practical way of measuring parallel search effectiveness. For example, take ANY position with one best move. How long does it take to find it? That's an optimal measurement to determine parallel efficiency.

For me, were I to see that "widening" is effective, I would be looking to see why and incorporate that into the serial search. Simple way to measure this is with fixed depth games. If the parallel search + widening wins more games than the serial search to the same depth, then widening is happening / helping, and I would want to know exactly why. In my tests with Crafty, I see ZERO improvement due to the widening effect, which is my intention. I want to get rid of 100% of the widening effect because it is wasted effort that is slowing the parallel version down.

Werewolf · Post by **Werewolf** » Sat Oct 08, 2016 6:01 pm

matthewlai wrote:
So using NPS speedup over-estimates the effect of lazy SMP due to ignoring inefficient use of nodes, and using TTD under-estimates it by ignoring strength gained by tree widening. The ultimate measure of chess engine performance is playing strength, and that's what should be measured.

Probably the most helpful post I've read on the mysterious problem of how to measure speedup in an engine which uses lazy smp.

Scaling of Asmfish with large thread count

Re: Scaling of Asmfish with large thread count

Re: Scaling of Asmfish with large thread count

Re: Scaling of Asmfish with large thread count

Re: Scaling of Asmfish with large thread count

Re: Scaling of Asmfish with large thread count

Re: Scaling of Asmfish with large thread count