Scaling of Asmfish with large thread count

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: Scaling of Asmfish with large thread count

Post by corres »

Scaling of speed is one thing but scaling of Elo is an other thing.
What is the fact in the scaling of Elo - this is the main question, I think.
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: Scaling of Asmfish with large thread count

Post by corres »

Scaling of speed is one thing but scaling of Elo is an other thing.
What is the fact in the scaling of Elo - this is the main question, I think.
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: Scaling of Asmfish with large thread count

Post by matthewlai »

bob wrote: The term "scaling" generally applies to performance. IE time to depth as the most direct way of measuring performance in a chess engine.
I would say the most direct way of measuring performance is through playing strength.

TTD does not take into account widening of trees, and widening of trees is one (inefficient, if our tree shapes are optimal) way of increasing playing strength.

For the purpose of this discussion, let's define "depth" to be the length of the longest branch (thus taking into account reductions but not extensions).

It's clear that an engine that exhaustively searches every position to depth 10 will be stronger than an engine that searches every position to depth 10 with reductions. That's because every once in a while we have wrong reductions.

Now that doesn't mean we should stop reducing. If we reduce good moves rarely enough and reduce bad moves often enough, reductions are still a net gain in playing strength, because the engine can search deeper.

This is how lazy SMP works. With 2 cores, it may have 2x the NPS, but each NPS now contributes less to overall strength because the tree shape is not optimal. But not optimal doesn't mean no improvement. It will still play stronger at same depths due to tree widening.

So using NPS speedup over-estimates the effect of lazy SMP due to ignoring inefficient use of nodes, and using TTD under-estimates it by ignoring strength gained by tree widening. The ultimate measure of chess engine performance is playing strength, and that's what should be measured.
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
shrapnel
Posts: 1339
Joined: Fri Nov 02, 2012 9:43 am
Location: New Delhi, India

Re: Scaling of Asmfish with large thread count

Post by shrapnel »

matthewlai wrote:
bob wrote: The term "scaling" generally applies to performance. IE time to depth as the most direct way of measuring performance in a chess engine.
I would say the most direct way of measuring performance is through playing strength.

TTD does not take into account widening of trees, and widening of trees is one (inefficient, if our tree shapes are optimal) way of increasing playing strength.

For the purpose of this discussion, let's define "depth" to be the length of the longest branch (thus taking into account reductions but not extensions).

It's clear that an engine that exhaustively searches every position to depth 10 will be stronger than an engine that searches every position to depth 10 with reductions. That's because every once in a while we have wrong reductions.

Now that doesn't mean we should stop reducing. If we reduce good moves rarely enough and reduce bad moves often enough, reductions are still a net gain in playing strength, because the engine can search deeper.

This is how lazy SMP works. With 2 cores, it may have 2x the NPS, but each NPS now contributes less to overall strength because the tree shape is not optimal. But not optimal doesn't mean no improvement. It will still play stronger at same depths due to tree widening.

So using NPS speedup over-estimates the effect of lazy SMP due to ignoring inefficient use of nodes, and using TTD under-estimates it by ignoring strength gained by tree widening. The ultimate measure of chess engine performance is playing strength, and that's what should be measured.
+1. Good Post.
So, in the end, it boils down to finding the right BALANCE, as always.
i7 5960X @ 4.1 Ghz, 64 GB G.Skill RipJaws RAM, Twin Asus ROG Strix OC 11 GB Geforce 2080 Tis
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Scaling of Asmfish with large thread count

Post by bob »

matthewlai wrote:
bob wrote: The term "scaling" generally applies to performance. IE time to depth as the most direct way of measuring performance in a chess engine.
I would say the most direct way of measuring performance is through playing strength.

TTD does not take into account widening of trees, and widening of trees is one (inefficient, if our tree shapes are optimal) way of increasing playing strength.

For the purpose of this discussion, let's define "depth" to be the length of the longest branch (thus taking into account reductions but not extensions).

It's clear that an engine that exhaustively searches every position to depth 10 will be stronger than an engine that searches every position to depth 10 with reductions. That's because every once in a while we have wrong reductions.

Now that doesn't mean we should stop reducing. If we reduce good moves rarely enough and reduce bad moves often enough, reductions are still a net gain in playing strength, because the engine can search deeper.

This is how lazy SMP works. With 2 cores, it may have 2x the NPS, but each NPS now contributes less to overall strength because the tree shape is not optimal. But not optimal doesn't mean no improvement. It will still play stronger at same depths due to tree widening.

So using NPS speedup over-estimates the effect of lazy SMP due to ignoring inefficient use of nodes, and using TTD under-estimates it by ignoring strength gained by tree widening. The ultimate measure of chess engine performance is playing strength, and that's what should be measured.
I still do not buy the "widening" idea. That is, in general, an unintended consequence of parallel search. If it were "good" you would be doing it in the serial search as well, since it simply exposes the fact that the tree is a bit too narrow / selective and it is missing things.

So yes, it happens, but it is NOT "intentional". Measuring playing strength is always the ideal way, but with parallel programs, the resources required become enormous and impractical. Leaving TTD/TTS (time to depth or time to solution) as the only practical way of measuring parallel search effectiveness. For example, take ANY position with one best move. How long does it take to find it? That's an optimal measurement to determine parallel efficiency.

For me, were I to see that "widening" is effective, I would be looking to see why and incorporate that into the serial search. Simple way to measure this is with fixed depth games. If the parallel search + widening wins more games than the serial search to the same depth, then widening is happening / helping, and I would want to know exactly why. In my tests with Crafty, I see ZERO improvement due to the widening effect, which is my intention. I want to get rid of 100% of the widening effect because it is wasted effort that is slowing the parallel version down.
Werewolf
Posts: 1797
Joined: Thu Sep 18, 2008 10:24 pm

Re: Scaling of Asmfish with large thread count

Post by Werewolf »

matthewlai wrote:
So using NPS speedup over-estimates the effect of lazy SMP due to ignoring inefficient use of nodes, and using TTD under-estimates it by ignoring strength gained by tree widening. The ultimate measure of chess engine performance is playing strength, and that's what should be measured.
Probably the most helpful post I've read on the mysterious problem of how to measure speedup in an engine which uses lazy smp.