Threads-Test - SF, Zappa, Komodo - 1 vs. 2, 4, 8, 16 Threads

Uri Blass
Posts: 10267
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Threads-Test - SF, Zappa, Komodo - 1 vs. 2, 4, 8, 16 Thr

Post by Uri Blass »

Laskos wrote:
Laskos wrote:
Uri Blass wrote:
syzygy wrote:
Uri Blass wrote: I think the diminishing returns are about rating, not about time.
The weaker engine has a lower rating, so it has the potential to improve more when going from 1 thread to many threads.
Give SF very little time and it will have a very low rating.

Suppose you are correct and SF gained much less from time doubling / speed doubling (those are the same!!) than some other engine, independent of time control. Then either SF would have to be outrageously much stronger than the other engine at ultrashort time controls (which it is not), or SF would be much weaker than the other engine at long time controls (which it is not). So you are not correct.
An extreme example is when the stronger engine plays perfectly; in that case it is obvious that the stronger engine cannot improve with more threads.
Wrong example. SF is far, far, far, far, far, far, far from perfect at ultrashort time controls.
I believe that SF has a better search algorithm in the first place, and
that is a reason it can earn more Elo (or at least the same Elo) from doubling the speed with 1 core at super-fast time controls.

I am not sure it is fair to compare SMP implementations across different searches, because a simple search may make it easier to get a bigger speed improvement, so
I still think that the only fair comparison is one where you start from the same Elo.

Even if Zappa on 8 cores is equivalent to being 6 times faster than Zappa on 1 core, while Stockfish on 8 cores is equivalent to being 4 times faster than Stockfish on 1 core, that does not mean, from my point of view, that Zappa's implementation is better, because
Zappa's relatively stupid search may make it easier to get a speed improvement from more cores.

Edit: Note that I do not claim that Zappa's implementation is not superior; from my point of view, earning more Elo from more cores when you start from the same Elo is what counts as better.

My guess is that Zappa earns more Elo from more cores from the same Elo starting point, but that is not proven.
SMP effective speed-up (usually TTD, but I don't know for Zappa) grows with longer time controls. So, say we equalize Zappa's strength with SF's by giving Zappa 10 times more time. Its effective speed-up will then be even larger than what is presented here.
I tested SMP efficiency at the same _strength_ for SF and Zappa Mexico II, giving Zappa ~11 times more time in a time-to-depth test. The results are pretty conclusive.
100 positions, repeated 4 times; the average time to depth over the 100 positions:

Depth=20
SF 1  thread: 12:40
SF 4 threads:  4:22
Effective SF speed-up on 4 threads: 2.90

Depth=15
Zappa Mexico II 1  thread: 140:20
Zappa Mexico II 4 threads:  43:45
Effective Zappa speed-up on 4 threads: 3.21

So, at the same Elo, Zappa scales better on SMP than SF, and the idea that Zappa's larger improvement is merely an artifact of its lower Elo is not supported.
I am not sure that depth 20 with 1 thread is equivalent to depth 20 with 4 threads.
I know that there are programs that play better with more threads when they search to the same depth.
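
For reference, the effective speed-up figures in the time-to-depth tables above are the single-thread time divided by the 4-thread time. The sketch below reproduces them from the posted timings. It assumes the timings are base-60 clock values (minutes:seconds or hours:minutes; the ratio is the same either way). The helper names `to_seconds` and `ttd_speedup` are illustrative and not taken from any tool used in the thread.

```python
def to_seconds(clock: str) -> int:
    """Convert a base-60 clock string such as '12:40' into its smaller unit."""
    major, minor = clock.strip().split(":")
    return int(major) * 60 + int(minor)


def ttd_speedup(single_thread: str, multi_thread: str) -> float:
    """Time-to-depth speed-up: single-thread time divided by multi-thread time."""
    return to_seconds(single_thread) / to_seconds(multi_thread)


# Timings copied from the two tables in the post.
sf_speedup = ttd_speedup("12:40", "4:22")
zappa_speedup = ttd_speedup("140:20", "43:45")

print(f"SF, 4 threads:              {sf_speedup:.2f}")     # 2.90
print(f"Zappa Mexico II, 4 threads: {zappa_speedup:.2f}")  # 3.21
```

As the following posts point out, this ratio is only a fair measure of SMP efficiency for engines that do not widen their search when threads are added.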
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Threads-Test - SF, Zappa, Komodo - 1 vs. 2, 4, 8, 16 Thr

Post by Laskos »

Uri Blass wrote:
I am not sure that depth 20 with 1 thread is equivalent to depth 20 with 4 threads.
I know that there are programs that play better with more threads when they search to the same depth.
AFAIK neither SF nor Zappa widens its search with SMP (tested in some detail with SF, in less detail with Zappa). Some engines, like Komodo and Rybka, do widen, so TTD is not a valid measure of their SMP efficiency.
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Threads-Test - SF, Zappa, Komodo - 1 vs. 2, 4, 8, 16 Thr

Post by syzygy »

Laskos wrote: AFAIK neither SF nor Zappa widens its search with SMP (tested in some detail with SF, in less detail with Zappa). Some engines, like Komodo and Rybka, do widen, so TTD is not a valid measure of their SMP efficiency.
And if Zappa widens (it probably does not) that would only mean it gains even more from SMP.
lucasart
Posts: 3232
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: Threads-Test - SF, Zappa, Komodo - 1 vs. 2, 4, 8, 16 Thr

Post by lucasart »

fastgm wrote:Threads-Test

Test conditions:

CPU:          Dual AMD Opteron 6376 (2x 16 Cores) 
OS:           Windows 7 Professional 64-Bit 
Tool:         Cutechess-Cli 
Hash-Table:   128 MB    
Openings:     fq1500.pgn - 1500 different opening positions, changing colors (3000 games per match) 
Time control: 60 + 0.05 seconds per game
Zappa confirms the thesis that it has a very good SMP implementation. Up to 8 threads, the performance increase per thread at this time control is really phenomenal!

Komodo is completely convincing in this test, with a continuous increase up to 16 threads.
This shows that even at this short time control its outstanding SMP implementation still works. Very impressive!

Very interesting:
  • Zappa, Komodo and SF all scale well up to 4 CPUs. The difference in slope between Zappa and Komodo/SF can be explained, at least partially, by the effect mentioned by Uri: even without SMP, doubling NPS gives less Elo as the absolute level of the engines rises, so it is expected to give less for Komodo and SF than for Zappa.
  • Komodo is scaling flawlessly.
  • Zappa has a problem with 16 threads (Martin also mentioned crashes with 16 threads).
  • SF really starts to see diminishing returns beyond what is normally expected: a real flattening. Performance with 8 and 16 threads is bad.
Fortunately, Joona came up with a patch which seems to be neutral with few threads and a drastic improvement with many threads. It will be interesting to redo the SF analysis once Joona's patch is committed.

All this is very important because TCEC uses 16 threads...
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Threads-Test - SF, Zappa, Komodo - 1 vs. 2, 4, 8, 16 Thr

Post by lkaufman »

lucasart wrote: Fortunately, Joona came up with a patch which seems to be neutral with few threads and a drastic improvement with many threads. It will be interesting to redo the SF analysis once Joona's patch is committed.

All this is very important because TCEC uses 16 threads...
Is this patch in the version which is now playing in TCEC? If so at what stage of TCEC was it introduced?
Uri Blass
Posts: 10267
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Threads-Test - SF, Zappa, Komodo - 1 vs. 2, 4, 8, 16 Thr

Post by Uri Blass »

lkaufman wrote:
lucasart wrote: Fortunately, Joona came up with a patch which seems to be neutral with few threads and a drastic improvement with many threads. It will be interesting to redo the SF analysis once Joona's patch is committed.

All this is very important because TCEC uses 16 threads...
Is this patch in the version which is now playing in TCEC? If so at what stage of TCEC was it introduced?
No.

It is in the version that is currently being tested in the Stockfish framework.

lt2

15+0.05 7 threads
LLR: 2.97 (-2.94,2.94) [-1.50,4.50]
Total: 2885 W: 519 L: 410 D: 1956

25+0.05 7 threads
LLR: 2.95 (-2.94,2.94) [0.00,6.00]
Total: 4401 W: 684 L: 566 D: 3151

Now:
The 60+0.05, 3-thread test is not finished yet.

LLR: 1.41 (-2.94,2.94) [-4.00,0.00]
Total: 31088 W: 4129 L: 4141 D: 22818
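
To put the win/loss/draw counts above into more familiar terms, here is a minimal sketch that converts each result into a point estimate of the Elo difference, using the standard logistic Elo formula. It is not the SPRT/LLR calculation the testing framework runs, and it gives no error bars. The function names and labels are illustrative assumptions; only the W/L/D numbers come from the post.

```python
import math


def score_fraction(wins: int, losses: int, draws: int) -> float:
    """Fraction of points scored: a win counts 1, a draw counts 0.5."""
    games = wins + losses + draws
    return (wins + 0.5 * draws) / games


def elo_difference(wins: int, losses: int, draws: int) -> float:
    """Point estimate of the Elo difference under the logistic Elo model."""
    s = score_fraction(wins, losses, draws)
    return -400.0 * math.log10(1.0 / s - 1.0)


# (label, wins, losses, draws) copied from the results in the post.
results = [
    ("15+0.05, 7 threads", 519, 410, 1956),
    ("25+0.05, 7 threads", 684, 566, 3151),
    ("60+0.05, 3 threads (unfinished)", 4129, 4141, 22818),
]

for label, w, l, d in results:
    print(f"{label}: {elo_difference(w, l, d):+.1f} Elo over {w + l + d} games")
```

Under these assumptions the two 7-thread tests come out at roughly +13 and +9 Elo, and the unfinished 3-thread test is close to zero. That matches the description of the patch as neutral with few threads and a clear gain with many.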