Threads-Test - SF, Zappa, Komodo - 1 vs. 2, 4, 8, 16 Threads


fastgm
Posts: 818
Joined: Mon Aug 19, 2013 6:57 pm

Threads-Test - SF, Zappa, Komodo - 1 vs. 2, 4, 8, 16 Threads

Post by fastgm »

Threads-Test

Test conditions:

CPU:          Dual AMD Opteron 6376 (2x 16 Cores) 
OS:           Windows 7 Professional 64-Bit 
Tool:         Cutechess-Cli 
Hash-Table:   128 MB    
Openings:     fq1500.pgn - 1500 different opening positions, each played with both colors (3000 games per match)
Time control: 60 seconds per game + 0.05 seconds increment per move
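The test conditions above could be reproduced with one cutechess-cli match per thread count, along the lines of the Python sketch below. It is only an illustration, not the command actually used for this test: the engine path, the match names, `order=sequential`, the PGN output name and the UCI option names `Threads` and `Hash` (which may differ between engines) are all assumptions.

```python
import shlex


def cutechess_args(engine_cmd, threads_a, threads_b, tc="60+0.05",
                   hash_mb=128, book="fq1500.pgn", openings=1500):
    """Build a cutechess-cli argument list for one thread-scaling match.

    Hypothetical sketch: the engine plays itself, one side with threads_a
    threads and the other with threads_b. With -repeat and -games 2, each
    opening is played twice with colors reversed, so 1500 openings give
    3000 games.
    """
    return [
        "cutechess-cli",
        # Same binary twice, differing only in the (assumed) UCI "Threads" option
        "-engine", f"name=T{threads_a}", f"cmd={engine_cmd}",
        f"option.Threads={threads_a}",
        "-engine", f"name=T{threads_b}", f"cmd={engine_cmd}",
        f"option.Threads={threads_b}",
        # Shared settings; the default tc is 60 s base + 0.05 s increment per move
        "-each", "proto=uci", f"tc={tc}", f"option.Hash={hash_mb}",
        "-openings", f"file={book}", "format=pgn", "order=sequential",
        "-repeat",
        "-rounds", str(openings),
        "-games", "2",
        "-pgnout", f"threads_{threads_a}_vs_{threads_b}.pgn",
    ]


if __name__ == "__main__":
    # "./engine" is a placeholder path, not an engine from the test
    print(shlex.join(cutechess_args("./engine", 1, 16)))
```

`shlex.join` produces POSIX-style quoting; on Windows it is simpler to pass the list directly to `subprocess.run`.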

Zappa confirms the thesis that it has a very good SMP implementation. Up to 8 threads, the performance increase per thread at this time control is really phenomenal!

Komodo is completely convincing in this test, with a continuous increase all the way up to 16 threads.
This shows that even at this short time control an outstanding SMP implementation still pays off. Very impressive!
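Each 3000-game match boils down to wins, draws and losses for the multi-threaded side. One standard way to turn such a result into an Elo gain with an error margin is the logistic Elo formula plus a normal approximation for a 95% interval, sketched below. The W/D/L counts in the usage example are hypothetical placeholders, not results from this test.

```python
import math


def elo_from_score(score):
    """Logistic Elo difference implied by an expected score strictly in (0, 1)."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_margin(wins, draws, losses, z=1.96):
    """Return (elo, lower, upper) for a W/D/L result.

    Uses a normal approximation on the mean score; for very lopsided
    results the interval bounds can leave (0, 1) and raise ValueError.
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result x in {1, 0.5, 0}: E[x^2] - E[x]^2
    mean_sq = (wins + 0.25 * draws) / n
    variance = mean_sq - score * score
    stderr = math.sqrt(variance / n)
    lower = elo_from_score(score - z * stderr)
    upper = elo_from_score(score + z * stderr)
    return elo_from_score(score), lower, upper


if __name__ == "__main__":
    elo, lo, hi = elo_with_margin(1000, 1200, 800)  # hypothetical counts
    print(f"{elo:+.1f} Elo (95% interval {lo:+.1f} .. {hi:+.1f})")
```

With these made-up counts the gain is roughly +23 Elo with a margin of about ±10 Elo, which gives a feel for how much resolution 3000 games provide.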

Hugo
Posts: 782
Joined: Tue Dec 01, 2009 11:10 am

Re: Threads-Test - SF, Zappa, Komodo - 1 vs. 2, 4, 8, 16 Threads

Post by Hugo »

Andreas!!!

Fantastic SMP test!!!
Thank you very much, this is awesome and deeply interesting!!

Regards, Clemens Keck
Sedat Canbaz
Posts: 3018
Joined: Thu Mar 09, 2006 11:58 am
Location: Antalya/Turkey

Re: Threads-Test - SF, Zappa, Komodo - 1 vs. 2, 4, 8, 16 Threads

Post by Sedat Canbaz »

Dear Andreas,

Very useful test and many thanks!

What about the SMP performance of the latest Stockfish builds?
http://abrok.eu/stockfish/

And by the way, I am also very curious what Stockfish's benchmark speed would be:
http://www.sedatcanbaz.com/chess/?page_id=864


Best wishes,
Sedat
Uri Blass
Posts: 10269
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Threads-Test - SF, Zappa, Komodo - 1 vs. 2, 4, 8, 16 Threads

Post by Uri Blass »

I think it is not a fair test, because the weaker program has an advantage due to diminishing returns.

I guess that the same SMP implementation is going to give more Elo to weaker engines.

Zappa may have a better SMP implementation than Stockfish, but the test does not prove it.
Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: Threads-Test - SF, Zappa, Komodo - 1 vs. 2, 4, 8, 16 Threads

Post by Lyudmil Tsvetkov »

fastgm wrote:
Zappa confirms the thesis that it has a very good SMP implementation. Up to 8 threads, the performance increase per thread at this time control is really phenomenal!

Komodo is completely convincing in this test, with a continuous increase all the way up to 16 threads.
This shows that even at this short time control an outstanding SMP implementation still pays off. Very impressive!

Thanks, Andreas.

It is knowledge that scales so well in Komodo, not so much the SMP implementation. Ask Larry about that.
I do not think Komodo's SMP is much better than SF's, but I am certain that Komodo has more, and more useful, knowledge than SF.

That was the point: adding knowledge gains strength and scales very well at longer TC. In fact, the longer the TC, the more knowledge helps. And this makes sense: why have so much computing power if there is nothing to compute? A simple eval performs well at STC because it saves valuable resources, but at LTC the gains from speed and simplification gradually fall off.

I guess the difference would be even more compelling when testing with 32, 64, etc. threads, or at a much longer TC than 1 minute per game.

PS. Possibly I again made a remark I should not have made; it is not directed at anyone or anything, just a casual remark based on the raw facts.

But of course, it is interesting to start a discussion: do you think it is the SMP implementation or the knowledge that makes Komodo scale so well?

I think Larry might be very helpful here.
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Threads-Test - SF, Zappa, Komodo - 1 vs. 2, 4, 8, 16 Threads

Post by syzygy »

fastgm wrote:
Zappa confirms the thesis that it has a very good SMP implementation. Up to 8 threads, the performance increase per thread at this time control is really phenomenal!

Komodo is completely convincing in this test, with a continuous increase all the way up to 16 threads.
This shows that even at this short time control an outstanding SMP implementation still pays off. Very impressive!

Very nice test, and I think it shows exactly what you say it does. This is a short time control, so it does not have much to do with diminishing returns and/or the issue of scaling at long time controls.

I expect SF to scale less badly at 8 and 16 cores at longer time control.

What you could do, if you have the time and interest, is compare the scaling of an engine at a fixed time control as threads go from 1 to 16 with the scaling of the same engine running single-threaded as the time control goes up. Ideally, one would like to be able to say, for a particular engine (at a particular time control), that playing with 16 cores makes it as strong as playing single-threaded with (say) 9x the time.
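The comparison suggested above can be sketched as follows: measure the single-threaded engine's Elo gain at several time multipliers (1x, 2x, 4x, ...), then find the multiplier that matches the Elo gained by going from 1 to N threads. Interpolating linearly in log2(time) between measured points is an assumption of this sketch, and the curve values in the example are hypothetical.

```python
import math


def time_equivalent(thread_gain, time_curve):
    """Return the single-thread time multiplier that matches thread_gain Elo.

    time_curve: (multiplier, elo_gain_vs_1x) pairs for the single-threaded
    engine, sorted by multiplier and with gains increasing, e.g.
    [(1, 0.0), (2, g2), (4, g4), ...]. Between measured points the gain is
    assumed to be linear in log2(multiplier); no extrapolation is done.
    """
    for (m0, e0), (m1, e1) in zip(time_curve, time_curve[1:]):
        if e0 <= thread_gain <= e1:
            if e1 == e0:
                return float(m0)
            frac = (thread_gain - e0) / (e1 - e0)
            log_m = math.log2(m0) + frac * (math.log2(m1) - math.log2(m0))
            return 2.0 ** log_m
    raise ValueError("thread_gain lies outside the measured time curve")


if __name__ == "__main__":
    # Hypothetical single-thread curve: Elo gain vs. the 1x time control
    curve = [(1, 0.0), (2, 100.0), (4, 180.0), (8, 240.0), (16, 290.0)]
    print(f"{time_equivalent(140.0, curve):.2f}x time")
```

Refusing to extrapolate is deliberate: the Elo gained per doubling of time shrinks as time grows, so reading a multiplier beyond the last measured point would be guesswork.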

The best time control for testing this might be fixed time per move. (Otherwise time management might interfere. I imagine that time management of some engines acts differently when behind or ahead in time.)
Uri Blass
Posts: 10269
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Threads-Test - SF, Zappa, Komodo - 1 vs. 2, 4, 8, 16 Threads

Post by Uri Blass »

syzygy wrote:
fastgm wrote:
Zappa confirms the thesis that it has a very good SMP implementation. Up to 8 threads, the performance increase per thread at this time control is really phenomenal!

Komodo is completely convincing in this test, with a continuous increase all the way up to 16 threads.
This shows that even at this short time control an outstanding SMP implementation still pays off. Very impressive!

Very nice test, and I think it shows exactly what you say it does. This is a short time control, so it does not have much to do with diminishing returns and/or the issue of scaling at long time controls.

I expect SF to scale less badly at 8 and 16 cores at longer time control.

What you could do, if you have the time and interest, is compare the scaling of an engine at a fixed time control as threads go from 1 to 16 with the scaling of the same engine running single-threaded as the time control goes up. Ideally, one would like to be able to say, for a particular engine (at a particular time control), that playing with 16 cores makes it as strong as playing single-threaded with (say) 9x the time.

The best time control for testing this might be fixed time per move. (Otherwise time management might interfere. I imagine that time management of some engines acts differently when behind or ahead in time.)

The diminishing returns I have in mind are about rating, not about time.
The weaker engine has a lower rating, so it has more potential to improve when going from 1 thread to many threads.

An extreme example is when the stronger engine plays perfectly; in that case it obviously cannot improve with more threads.

A correct comparison should start from the same rating point.
This can be done by using unequal time controls.

If Zappa needs a 10:1 time advantage to score 50% against Stockfish, then you should use a time control 10 times slower for the Zappa tests than for the Stockfish tests.
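The proposal above amounts to multiplying the weaker engine's time control by its time-odds factor before running the thread tests. A small helper for "base+increment" strings in seconds (the form used in this test) is sketched below; the 10:1 factor is Uri's hypothetical example, and the helper is an assumption that does not handle the "moves/time" form.

```python
def scale_tc(tc, factor):
    """Scale a 'base+increment' time control (seconds) by a time-odds factor.

    Assumption: tc has the plain form 'base' or 'base+inc'; the
    'moves/time' form is not handled.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    base, _, inc = tc.partition("+")
    scaled_base = float(base) * factor
    scaled_inc = float(inc) * factor if inc else 0.0
    # :g drops trailing zeros, e.g. 600.0 -> "600"
    return f"{scaled_base:g}+{scaled_inc:g}"


if __name__ == "__main__":
    print(scale_tc("60+0.05", 10))  # Zappa's time control under 10:1 odds
```

Both the base time and the increment are scaled, so the ratio between them stays the same as in the original 60 + 0.05 setting.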
Astatos
Posts: 18
Joined: Thu Apr 10, 2014 5:20 pm

Re: Threads-Test - SF, Zappa, Komodo - 1 vs. 2, 4, 8, 16 Threads

Post by Astatos »

Thanks for the thread, Andreas. Exceptionally interesting!

Uri, let us not be too scholastic here. What you describe can't be that much of a factor for Komodo vs. Stockfish anyway.

Maybe someone from the Stockfish team can step in and explain to us what happened going to 8 and then 16 cores. A split-depth thing? A NUMA thing? This was really surprising.
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Threads-Test - SF, Zappa, Komodo - 1 vs. 2, 4, 8, 16 Threads

Post by lkaufman »

Thank you for running this test. As has been pointed out, Zappa's superior scaling may simply be due to its being a weaker program; you might try running it at four times the time limit to bring it closer to the other two.
As to why Komodo outscales SF on big hardware: yes, part of this may be due to more chess knowledge. But the shapes of the curves are quite different. This convinces me that our SMP is slightly inferior to SF's on 2 or 4 cores but superior on 8 and especially on 16. This makes perfect sense to me, and it also seems consistent with results. I guess that when 8-core machines become common, Komodo will look a lot better on the rating lists that test on them.
Your test tells me that we probably won't find it easy to improve SMP scaling overall. That doesn't mean we won't try!
Ozymandias
Posts: 1532
Joined: Sun Oct 25, 2009 2:30 am

Re: Threads-Test - SF, Zappa, Komodo - 1 vs. 2, 4, 8, 16 Threads

Post by Ozymandias »

Very neat. One question: did you leave ponder on?

I ask because 16 threads on a 32-core machine seems odd.