Threads-Test - SF, Zappa, Komodo - 1 vs. 2, 4, 8, 16 Threads

fastgm · Post by **fastgm** » Sun May 04, 2014 9:27 am

Threads-Test

Test conditions:

CPU:          Dual AMD Opteron 6376 (2x 16 Cores)
OS:           Windows 7 Professional 64-Bit
Tool:         Cutechess-Cli
Hash-Table:   128 MB
Openings:     fq1500.pgn - 1500 different opening positions, changing colors (3000 games per match)
Time control: 60 + 0.05 seconds per game

Zappa confirms the thesis that it has a very good SMP implementation. Up to 8 threads, the performance increase per thread at this time control is really phenomenal!

Komodo is completely convincing in this test, with a continuous increase up to 16 threads.
This shows that even at this short time control an outstanding SMP implementation still works. Very impressive!

Hugo · Post by **Hugo** » Sun May 04, 2014 10:07 am

Andreas !!!

fantastic SMP test!!!
Thank you very much, this is awesome and deeply interesting!!

regards, Clemens Keck

Sedat Canbaz · Post by **Sedat Canbaz** » Sun May 04, 2014 10:53 am

Dear Andreas,

Very useful test and many thanks!

What about the SMP performance of the latest releases of Stockfish:
http://abrok.eu/stockfish/

And btw, I also wonder a lot what Stockfish's benchmark speed performance will be:
http://www.sedatcanbaz.com/chess/?page_id=864

Best wishes,
Sedat

Uri Blass · Post by **Uri Blass** » Sun May 04, 2014 10:58 am

I think that it is not a fair test because the weaker program has the advantage thanks to diminishing returns.

I guess that the same SMP implementation is going to give more Elo to weaker engines.

Zappa may have a better SMP implementation than Stockfish, but the test does not prove it.

Lyudmil Tsvetkov · Post by **Lyudmil Tsvetkov** » Sun May 04, 2014 11:04 am

fastgm wrote:
Threads-Test
Test conditions:

CPU:          Dual AMD Opteron 6376 (2x 16 Cores)
OS:           Windows 7 Professional 64-Bit
Tool:         Cutechess-Cli
Hash-Table:   128 MB
Openings:     fq1500.pgn - 1500 different opening positions, changing colors (3000 games per match)
Time control: 60 + 0.05 seconds per game

Zappa confirms the thesis that it has a very good SMP implementation. Up to 8 threads, the performance increase per thread at this time control is really phenomenal!

Komodo is completely convincing in this test, with a continuous increase up to 16 threads.
This shows that even at this short time control an outstanding SMP implementation still works. Very impressive!

Thanks Andreas.

It is knowledge that scales so well in Komodo, not so much the SMP implementation. Ask Larry about that.
I do not think that Komodo's SMP is much better than SF's, but I am certain Komodo has more knowledge, and more useful knowledge, than SF.

That was the point: adding knowledge gains strength and scales very well at longer TC. Actually, the longer the TC, the more knowledge helps. And this makes sense: why have so much computing power if there is nothing to compute? A simple eval performs well at STC, as it saves valuable resources, but at LTC the gains from speed and simplification gradually fall off.

I guess the difference would be even more compelling if you tested with 32, 64, etc. threads, or at a much longer TC than 1 minute per game.

PS. I possibly again made a remark I should not have made; not directed at anyone or anything, just a casual remark, looking at the raw facts.

But of course, it is interesting to start a discussion: do you think it is the SMP implementation that scales so well for Komodo, or knowledge?

I think Larry might be very helpful here.

syzygy · Post by **syzygy** » Sun May 04, 2014 12:16 pm

fastgm wrote:
Threads-Test
Test conditions:

CPU:          Dual AMD Opteron 6376 (2x 16 Cores)
OS:           Windows 7 Professional 64-Bit
Tool:         Cutechess-Cli
Hash-Table:   128 MB
Openings:     fq1500.pgn - 1500 different opening positions, changing colors (3000 games per match)
Time control: 60 + 0.05 seconds per game

Zappa confirms the thesis that it has a very good SMP implementation. Up to 8 threads, the performance increase per thread at this time control is really phenomenal!

Komodo is completely convincing in this test, with a continuous increase up to 16 threads.
This shows that even at this short time control an outstanding SMP implementation still works. Very impressive!

Very nice test and I think it shows exactly what you say it does. This is a short time control, so it has little to do with diminishing returns and/or the issue of scaling at long time control.

I expect SF to scale less badly at 8 and 16 cores at longer time control.

What you could do, if you have the time and interest, is try to compare the scaling of an engine with fixed time control as threads go from 1 to 16 with the scaling of an engine running with a single thread as time control goes up. Ideally, one would like to know that for a particular engine (at a particular time control), playing with 16 cores makes it as strong as playing single-threaded with (say) 9x the time.

The best time control for testing this might be fixed time per move. (Otherwise time management might interfere. I imagine that time management of some engines acts differently when behind or ahead in time.)
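The comparison suggested above can be sketched roughly as follows: measure a single-thread Elo-versus-time curve, then look up which time multiplier matches the Elo gain seen with N threads. This is only a minimal sketch. The function name `equivalent_time_factor`, the interpolation in log2(time), and all numbers in `time_curve` are assumptions for illustration, not measurements from this test.

```python
import math


def equivalent_time_factor(elo_gain, time_curve):
    """Return the single-thread time multiplier whose Elo gain matches elo_gain.

    time_curve is a list of (time_multiplier, elo_gain_vs_1x) pairs, sorted by
    multiplier. The lookup interpolates linearly in log2(time) between points.
    """
    for (t0, e0), (t1, e1) in zip(time_curve, time_curve[1:]):
        if e1 > e0 and e0 <= elo_gain <= e1:
            frac = (elo_gain - e0) / (e1 - e0)
            log_t = math.log2(t0) + frac * (math.log2(t1) - math.log2(t0))
            return 2.0 ** log_t
    raise ValueError("elo_gain lies outside the measured time curve")


# Made-up placeholder values (NOT results from this thread): Elo gained by a
# single-threaded engine when its time is multiplied by 1, 2, 4, 8 and 16.
time_curve = [(1, 0.0), (2, 100.0), (4, 180.0), (8, 240.0), (16, 280.0)]

# Hypothetical Elo gain of the same engine going from 1 to 16 threads.
gain_16_threads = 210.0
print(equivalent_time_factor(gain_16_threads, time_curve))
```

With real data, the returned factor would be the "effective speedup" of the 16-thread version, which can then be compared between engines independently of their absolute strength.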

Uri Blass · Post by **Uri Blass** » Sun May 04, 2014 1:54 pm

syzygy wrote:
fastgm wrote:
Threads-Test
Test conditions:

CPU:          Dual AMD Opteron 6376 (2x 16 Cores)
OS:           Windows 7 Professional 64-Bit
Tool:         Cutechess-Cli
Hash-Table:   128 MB
Openings:     fq1500.pgn - 1500 different opening positions, changing colors (3000 games per match)
Time control: 60 + 0.05 seconds per game

Zappa confirms the thesis that it has a very good SMP implementation. Up to 8 threads, the performance increase per thread at this time control is really phenomenal!

Komodo is completely convincing in this test, with a continuous increase up to 16 threads.
This shows that even at this short time control an outstanding SMP implementation still works. Very impressive!

Very nice test and I think it shows exactly what you say it does. This is a short time control, so it has little to do with diminishing returns and/or the issue of scaling at long time control.

I expect SF to scale less badly at 8 and 16 cores at longer time control.

What you could do, if you have the time and interest, is try to compare the scaling of an engine with fixed time control as threads go from 1 to 16 with the scaling of an engine running with a single thread as time control goes up. Ideally, one would like to know that for a particular engine (at a particular time control), playing with 16 cores makes it as strong as playing single-threaded with (say) 9x the time.

The best time control for testing this might be fixed time per move. (Otherwise time management might interfere. I imagine that time management of some engines acts differently when behind or ahead in time.)

The diminishing returns I am thinking of are about rating, not about time.
The weaker engine has a lower rating, so it has the potential to improve more from 1 thread to many threads.

An extreme example is when the stronger engine plays perfectly; in this case it is obvious that the stronger engine cannot improve with more threads.

A correct comparison should start from the same rating point.
It can be done by using unequal time controls.

If Zappa needs a 10:1 time advantage to score 50% against Stockfish, then you should use a time control that is 10 times slower for the tests of Zappa relative to the tests of Stockfish.
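As a rough illustration of the handicap idea, the sketch below converts a match score into an Elo difference with the usual logistic Elo model and scales a time control by a handicap factor. The helper names `elo_diff` and `scaled_time_control` are hypothetical, and the 10:1 factor is only the example figure from this post, not a measured value.

```python
import math


def elo_diff(score):
    """Elo difference implied by a match score in (0, 1), logistic Elo model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))


def scaled_time_control(base_s, inc_s, factor):
    """Scale both base time and increment (in seconds) by the handicap factor."""
    return base_s * factor, inc_s * factor


# A 50% score means the two sides are equally strong under this model, which
# is the starting point the handicap is meant to reach.
print(elo_diff(0.5))

# Example figure from the post: a 10:1 handicap applied to 60 + 0.05 seconds.
print(scaled_time_control(60.0, 0.05, 10.0))
```

In practice the handicap factor would be found by trial: adjust it until the single-thread match between the two engines scores close to 50%, then run each engine's thread test at its own time control.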

Astatos · Post by **Astatos** » Sun May 04, 2014 4:07 pm

Thanks for the thread Andreas. Exceptionally interesting!

Uri, let us not be too scholastic here. What you say can't be that much of a factor for Komodo vs Stockfish anyway.

Maybe someone from the Stockfish team can step in to explain to us what happened going to 8 and then 16 cores. A split depth thing? A NUMA thing? This was really surprising.

lkaufman · Post by **lkaufman** » Sun May 04, 2014 4:11 pm

Thank you for running this test. As has been pointed out, the superior scaling of Zappa may simply be due to its being a weaker program; you might try running it at four times the time limit to bring it closer to the other two.
As to why Komodo outscales SF with big hardware, yes, a part of this may be due to more chess knowledge. But the shape of the curves is quite different. This convinces me that our SMP is slightly inferior to SF on 2 or 4 cores but is superior on 8 and especially on 16. This makes perfect sense to me. It also seems to be consistent with results. I guess that when 8 core machines become common Komodo will look a lot better on the rating lists that test this.
Your test tells me that we probably won't find it easy to improve SMP scaling overall. That doesn't mean we won't try!

Ozymandias · Post by **Ozymandias** » Sun May 04, 2014 5:43 pm

Very neat. One question, did you leave ponder on?

I ask because 16 threads on a 32 core machine seems odd.
