Threads test incl. Crafty 24.1


Post by fastgm »

Here is the result:

Image

Up to 8 threads, Crafty performs very well, the second best behind Zappa so far.
At 16 threads, however, there is an unexpected drop in performance. What causes this hit?
Is the time control too short? On the other hand, Komodo copes very well with these conditions.
Across all test runs, 12000 games in total per engine test, there was not a single loss on time.
So the communication between the Cutechess client and the engines, as well as the configuration, should be all right.
A short test of 1 vs. 16 threads with almost 400 games at double the time control, 120+0.05 sec., shows no different result either.

Some questions remain unanswered.

Does this behavior at 16 threads have something to do with NUMA?
My system is a 32-way dual 16-core AMD Opteron 6376 on an ASUS KGPE-D16 mainboard with 8 x 4 GB 1600 MHz DDR3.
The OS is Windows 7 Professional 64-bit.
Crafty reports: System is NUMA. 4 nodes reported by windows

I used the Crafty version "crafty-24.1-x64-sse3.exe" from http://www.kikrtech.com/ and ran two 1 vs. 16 thread games simultaneously, without pondering.

Image

Post by Adam Hair »

Thanks, Andreas!

Post by Laskos »

Thank you very much, Andreas. Now, would you do me a favour? Can you test Komodo 8 on 32 threads against Komodo 8 on 16 threads in 3,000 games at 60"+0.05"? We had some arguments on this issue that need to be clarified. It is very time-consuming, I know, maybe a 4-day test, but it would be enlightening for some. Maybe you can even post intermediate results. Your posts are of very high quality and importance.

Post by Sedat Canbaz »

Well done, Andreas! A professional test...

BTW,
I knew that the old Crafty 22.8 version does not work well on machines with many cores (e.g. 12 cores and higher...)
http://www.sedatcanbaz.com/chess/?page_id=874

But especially with the newest Crafty 24.1,
I expected to see better performance on 16 cores...
It is sad that even the latest Crafty version still suffers from this!

Actually, it is not only Crafty:
a lot of MP engines are not ready for the latest fast machines (e.g. 12 cores and higher), and it looks like those engines need optimization...


Best,
Sedat

Post by bob »

The first question is: what is the NPS for the 1, 2, 4, 8 and 16 core tests? I have a machine here that simply scales poorly beyond 8 cores. I've been looking at it off and on, but have found no real reason other than that it is a dual 8-core box, and there are a lot of conflicts in such an architecture. For example, all 8 cores on a single chip beating on the shared L3 cache, and all 16 cores beating on the pathways to memory (I assume your hardware is NUMA, since every Intel/AMD box I have seen in recent history has been NUMA).

I've been promising myself that I would write a self-tuner for the parallel search stuff, but I have not had/taken the time to do this yet. But it is REALLY needed, since each machine has some different choke-point that causes problems. I have a 12 core box that ought to hit 60M nodes per second, but I am stuck at 45-50M (this is 3-4 year old hardware). I have analyzed, tested, tuned, tweaked, cursed and just about everything else I could think of, but that last 10-15M NPS is simply not to be found, YET. That's what has led to the recent significant parallel search changes, cleaning up everything so that I can ultimately figure out what is wrong. I'm still afraid that I am going to have to rewrite the search in a non-recursive way, a la Cray Blitz, to close that gap further, since the classic YBW algorithm is way too restrictive in where splits can be done, namely "here" or nowhere, where DTS could do much better. I have recently been working on singular extensions, and in the process have cleaned up the search significantly (no duplicated code in Search() and SearchParallel() any longer, for example). So converting to an iterated search will not be as hard as it would have been a year ago. I hate to lose the clean recursive formulation, but I am getting ready to do so. This machine is going to give me 60M nodes per second or melt down.
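
To put the numbers above in perspective, here is a minimal Python sketch of the arithmetic. It treats the 60M nodes per second figure as the target rate for the 12-core box and expresses the measured 45-50M as a fraction of it. The function name `nps_efficiency` is hypothetical and only for illustration; the post does not give single-core speeds, so nothing here says how the 60M target itself was derived.

```python
def nps_efficiency(measured_nps: float, target_nps: float) -> float:
    """Return the fraction of the target node rate that is actually achieved."""
    if target_nps <= 0:
        raise ValueError("target_nps must be positive")
    return measured_nps / target_nps


TARGET_NPS = 60e6  # "ought to hit 60M nodes per second" (from the post)

for measured in (45e6, 50e6):  # "stuck at 45-50M" (from the post)
    efficiency = nps_efficiency(measured, TARGET_NPS)
    shortfall = TARGET_NPS - measured
    print(f"{measured / 1e6:.0f}M of {TARGET_NPS / 1e6:.0f}M nps: "
          f"{efficiency:.0%} of target, {shortfall / 1e6:.0f}M nps missing")
```

On these figures the box reaches roughly three quarters to five sixths of the target, which is the 10-15M NPS gap described above.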

Post by jdart »

Lots of people would be happy with 45-50M nps. Crafty is already exceptionally fast in terms of nps compared to most programs.

But I agree YBWC is a bottleneck. I did some instrumentation on my program a while back and found that a significant amount of thread idle time was due to having no suitable thread candidate that could fulfill the YBWC conditions.
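
The kind of idle time described above can be illustrated with a minimal Python sketch of the usual Young Brothers Wait Concept rule: a node may become a split point only after its first move has been searched. This is not code from Arasan or Crafty. `Node`, `can_split`, `idle_threads_left_waiting`, the `min_split_depth` threshold and the example values are all hypothetical, and the model assumes at most one helper thread per split point to keep things simple.

```python
from dataclasses import dataclass


@dataclass
class Node:
    depth: int            # remaining search depth at this node
    moves_searched: int   # moves already fully searched at this node
    moves_remaining: int  # moves still waiting to be searched


def can_split(node: Node, min_split_depth: int = 4) -> bool:
    """YBWC rule: the first move (eldest brother) must be finished before
    the remaining moves may be handed to idle threads. The depth threshold
    is an assumed tuning parameter, not part of the rule itself."""
    return (node.moves_searched >= 1
            and node.moves_remaining >= 1
            and node.depth >= min_split_depth)


def idle_threads_left_waiting(active_nodes, idle_threads: int) -> int:
    """Count idle threads that find no split point to join, assuming
    (as a simplification) one helper per split point."""
    candidates = sum(1 for node in active_nodes if can_split(node))
    return max(0, idle_threads - candidates)


nodes = [
    Node(depth=10, moves_searched=0, moves_remaining=30),  # first move still running
    Node(depth=8, moves_searched=3, moves_remaining=12),   # valid split point
    Node(depth=2, moves_searched=5, moves_remaining=4),    # too shallow to split
]
print(idle_threads_left_waiting(nodes, idle_threads=3))
```

In this toy position only one of the three nodes satisfies the conditions, so two of the three idle threads have nothing to do until another first move completes.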

NUMA is also a factor. Having a shared hashtable across all NUMA nodes is always going to be a performance hit.

I also have an item on my to-do list to use thread-local storage for various caches that are per-thread. Currently they are allocated locally as class variables at the start, but if a thread becomes idle and then active again, it may have been migrated to a different NUMA node, and then its cache is no longer local.

--Jon

Post by fastgm »

Currently another test is running, but afterwards I can start your suggested test with Komodo 8.

Andreas

Post by Laskos »

fastgm wrote: "Currently another test is running, but afterwards I can start your suggested test with Komodo 8."

Thanks, Andreas, much appreciated.

Post by Joerg Oster »

Laskos wrote: "Thanks, Andreas, much appreciated."

+1! :D
Jörg Oster

Post by Sedat Canbaz »

Dear Andreas,

One more thing:
you are one of my biggest Stockfish bench supporters.
No one has sent as many benchmarks as you, so the record belongs to you!
Thank you again!

Btw, I am very curious
what the Stockfish bench speed will be on your AMD Opteron 6376.

Could you please run a new Stockfish bench on your AMD Opteron 6376?


kN/s  Cores  EXE   Processors             Speed      Hardware Users
8750     8   x64   AMD FX-8350 Vishera    @4.50GHz   Andreas Strangmüller
6715     8   x64   2x Intel Xeon E5450     3.00GHz   Andreas Strangmüller
6687     4   x64   Intel Core i7-2600     @4.20GHz   Andreas Strangmüller
5715     4   x64   Intel Core i5-750      @3.50GHz   Andreas Strangmüller
4821     4   x64   Intel i7-4700MQ        @2.40GHz   Andreas Strangmüller
3728     4   x64   Intel Core 2 QX6700    @2.93GHz   Andreas Strangmüller

For the full list of Stockfish benchmarks:
http://www.sedatcanbaz.com/chess/?page_id=19


Best,
Sedat