It is absurdly better than any alternative.
http://www.ipmanchess.yolasite.com/test ... hreads.php
I wonder why.
Scaling of Asmfish with large thread count
-
- Posts: 12542
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Scaling of Asmfish with large thread count
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Scaling of Asmfish with large thread count
Dann Corbit wrote: It is absurdly better than any alternative. http://www.ipmanchess.yolasite.com/test ... hreads.php I wonder why.
I don't follow the question. Lazy SMP is SUPPOSED to scale well in terms of raw NPS, since there is so little interaction between threads. But NPS is only part of the question. I.e., one can get perfect NPS scaling by just running N copies of the same program, but it won't play any stronger.
The term "scaling" generally applies to performance, i.e. time to depth, which is the most direct way of measuring performance in a chess engine. I also notice that they even reported 103% NPS scaling, which rings alarm bells for me. It is tough to imagine how doubling cores would more than double NPS.
And more importantly, I notice that the trees are growing by a factor of 4+ from 8 to 72 cores, which certainly establishes an upper bound on useful speedup due to search overhead, which is what limits everyone regardless of raw NPS. This is much akin to trying to maximize your vehicle RPM by changing the final drive ratio. The tach reads higher, but you won't be going nearly as fast in reality, which is not what you really want.
Also, as an aside, asmFish is significantly faster than the C++/C versions as well, which would be expected. It might be that part of that asm optimizing is reducing memory/cache conflicts somewhere.
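The search-overhead argument above can be put into rough numbers. The following is a minimal sketch, assuming perfectly linear NPS scaling from 8 to 72 cores and taking the "factor of 4+" tree growth at its lower end of 4.0. The function name `effective_speedup` and all variable names are illustrative and do not come from the benchmark page.

```python
def effective_speedup(nps_ratio, tree_growth):
    """Time-to-depth speedup when NPS rises by nps_ratio but the tree
    searched to reach the same depth grows by tree_growth.

    time = nodes / nps, so
    speedup = (N / nps) / ((tree_growth * N) / (nps_ratio * nps))
            = nps_ratio / tree_growth
    """
    return nps_ratio / tree_growth


cores_base, cores_many = 8, 72

# Assumption: NPS scales perfectly linearly with core count.
ideal_nps_ratio = cores_many / cores_base

# Assumption: "factor of 4+" from the post, taken at its lower end.
tree_growth = 4.0

bound = effective_speedup(ideal_nps_ratio, tree_growth)
print(f"NPS ratio {ideal_nps_ratio:.2f}, tree growth {tree_growth:.1f}, "
      f"time-to-depth speedup at most {bound:.2f}x")

# Running N independent copies: NPS and total tree both grow by N,
# so nothing reaches a given depth any sooner.
print(effective_speedup(9.0, 9.0))
```

Under these assumptions, going from 8 to 72 cores buys a time-to-depth speedup of about 2.25x at best, even though NPS went up ninefold. Any tree growth beyond 4.0 pushes that figure lower.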
-
- Posts: 12542
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: Scaling of Asmfish with large thread count
bob wrote: I don't follow the question. Lazy-smp is SUPPOSED to scale well in terms of raw NPS since there is so little interaction between threads. But NPS is only part of the question. [...]
Every engine on that page is a version of SF, and all of them use lazy SMP.
But the C++ (Stockfish) and C (Cfish) versions have a 50% NPS loss at high core count compared to asmFish.
That is the thing I find both astounding and puzzling.
The conclusion of the page author is NUMA awareness.
-
- Posts: 6442
- Joined: Tue Jan 09, 2007 12:31 am
- Location: PA USA
- Full name: Louis Zulli
Re: Scaling of Asmfish with large thread count
Dann Corbit wrote: Every engine on that page is a version of SF which all use lazy SMP. But the C++ (Stockfish) and C (Cfish) versions have a 50% NPS loss at high core count compared to ASMFish. That is the thing I find both astounding and puzzling. The conclusion of the page author is Numa awareness.
Yes, it's NUMA-awareness, but only because his system is running Windows, which is total crap with more than 64 threads unless NUMA is used.
-
- Posts: 417
- Joined: Sat May 24, 2014 9:16 am
Re: Scaling of Asmfish with large thread count
Dann Corbit wrote: Every engine on that page is a version of SF which all use lazy SMP. But the C++ (Stockfish) and C (Cfish) versions have a 50% NPS loss at high core count compared to ASMFish. The conclusion of the page author is Numa awareness.
Cfish is also NUMA aware. asmFish is extremely fast though; it's definitely the only Stockfish that I use with my server rig, 36 cores and 72 threads. I also keep hyperthreading enabled and use all cores, since the NPS gain is close to 35 percent. People can say what they want, but speed matters. It's clear that asmFish is one heck of an engine.
-
- Posts: 2684
- Joined: Sat Jun 14, 2008 9:17 pm
Re: Scaling of Asmfish with large thread count
Dann Corbit wrote: It is absurdly better than any alternative. http://www.ipmanchess.yolasite.com/test ... hreads.php I wonder why.
Because of this:
Code:
setoption name threads value 72
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Scaling of Asmfish with large thread count
mcostalba wrote: Because of this
Code:
setoption name threads value 72
Doesn't answer his question. ALL versions had 72 threads. asmFish scales MUCH better than the other versions.
-
- Posts: 708
- Joined: Mon Jan 16, 2012 6:34 am
Re: Scaling of Asmfish with large thread count
Will ASM fish be competing in the TCEC superfinal?
-
- Posts: 2684
- Joined: Sat Jun 14, 2008 9:17 pm
Re: Scaling of Asmfish with large thread count
bob wrote: Doesn't answer his question. ALL versions had 72 threads. asmfish scales MUCH better than the other versions.
On Windows, a process cannot by default run on more than 64 'logical processors' (as Microsoft calls them).
asmFish works around this limitation by calling some OS-specific system calls that the official Windows docs call the 'NUMA library'. Note that this has nothing to do with NUMA; it is just the name Windows gives to the functions needed to work around this limitation.
In the case of asmFish, NUMA-awareness means using these Windows-specific system calls.
As for TCEC, if the hardware for the superfinal runs on more than 64 logical processors, then we will have to use this Windows library too, to avoid a noticeable slowdown.
Below 64 logical processors, we have measured and tested on fishtest that the difference is zero within an error margin of about 5-8 Elo, which is the resolution we achieved with our test.
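To illustrate the 64-logical-processor limit described above, here is a minimal sketch of the bookkeeping involved. It is not asmFish's code and makes no Windows calls. It assumes the 72 logical processors are exposed as two processor groups of 36 each (the actual split is decided by the OS), and `assign_threads` is a hypothetical helper. On a real system each thread would be placed with Win32 calls such as `SetThreadGroupAffinity`.

```python
from collections import Counter

# Windows processor groups hold at most 64 logical processors each.
MAX_GROUP_SIZE = 64


def assign_threads(group_sizes, thread_count):
    """Map each search thread to a (group, cpu_in_group) slot.

    Slots are filled group by group and wrap around if there are more
    threads than logical processors.
    """
    for size in group_sizes:
        if not 0 < size <= MAX_GROUP_SIZE:
            raise ValueError(f"invalid group size: {size}")
    slots = [(group, cpu)
             for group, size in enumerate(group_sizes)
             for cpu in range(size)]
    return [slots[i % len(slots)] for i in range(thread_count)]


# Assumption: 72 logical processors exposed as two groups of 36.
group_sizes = [36, 36]

# Group-aware placement: every thread gets its own logical processor.
placement = assign_threads(group_sizes, 72)
threads_per_group = Counter(group for group, _ in placement)
print(dict(threads_per_group))

# Without group awareness all 72 threads stay in the starting group,
# so two threads compete for each of its 36 logical processors.
confined = assign_threads(group_sizes[:1], 72)
busiest = max(Counter(confined).values())
print(busiest)
```

The second case shows why a group-unaware engine can lose a large share of its NPS at 72 threads. In this assumed layout half the machine sits idle while the other half is oversubscribed.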
-
- Posts: 1080
- Joined: Fri Sep 16, 2016 6:55 pm
- Location: USA/Minnesota
- Full name: Leo Anger
Re: Scaling of Asmfish with large thread count
A good and logical question. I hope so.
Advanced Micro Devices fan.