Core behaviour

Discussion of chess software programming and technical issues.

Moderators: bob, hgm, Harvey Williamson

Rebel
Posts: 4700
Joined: Thu Aug 18, 2011 10:04 am

Core behaviour

Post by Rebel » Wed Jun 28, 2017 10:16 am

I became a bit concerned after noticing a (very) strange result during an eng-eng match and decided to dive into it. I wrote some statistics code and started a match between 2 equal ProDeo engines on my 8-core Intel Xeon, Windows 7 Pro, 16,000 games (8 x 2000) at 40m/15s. During the match (with cutechess) I can take snapshots at any time; here is one.

Code: Select all

c:\cc\240-1\param.txt - Time used            :  3:21:54 [MIDG depth = 10.72]
c:\cc\240-2\param.txt - Time used            :  3:21:56 [MIDG depth = 10.53]
c:\cc\240-3\param.txt - Time used            :  3:21:48 [MIDG depth = 10.71]
c:\cc\240-4\param.txt - Time used            :  3:22:01 [MIDG depth = 10.59]
c:\cc\240-5\param.txt - Time used            :  3:21:57 [MIDG depth = 10.64]
c:\cc\240-6\param.txt - Time used            :  3:21:53 [MIDG depth = 10.49]
c:\cc\240-7\param.txt - Time used            :  3:21:37 [MIDG depth = 10.71]
c:\cc\240-8\param.txt - Time used            :  3:22:04 [MIDG depth = 10.52]

c:\cc\220-1\param.txt - Time used            :  3:22:06 [MIDG depth = 10.75]
c:\cc\220-2\param.txt - Time used            :  3:21:32 [MIDG depth = 10.57]
c:\cc\220-3\param.txt - Time used            :  3:21:51 [MIDG depth = 10.74]
c:\cc\220-4\param.txt - Time used            :  3:21:33 [MIDG depth = 10.77]
c:\cc\220-5\param.txt - Time used            :  3:22:02 [MIDG depth = 10.70]
c:\cc\220-6\param.txt - Time used            :  3:21:39 [MIDG depth = 10.50]
c:\cc\220-7\param.txt - Time used            :  3:22:14 [MIDG depth = 10.55]
c:\cc\220-8\param.txt - Time used            :  3:21:19 [MIDG depth = 10.74]

240  26:55:10 (187.192M nodes) NPS = 1.932K
220  26:54:16 (191.253M nodes) NPS = 1.975K

Depth Stats      MIDG   END0   END1   END2
240             10.61  11.25  12.11  14.98
220             10.66  11.30  12.13  15.03
There are some strange things going on here. First of all, 3000 games have already been played and the depth stats have long settled, yet version 220 has an average middlegame depth of 10.66 while version 240 has only 10.61. A difference of 0.05 may not look like much, but from experience I know it is a big deal.

So where is this difference coming from? Measuring each process separately gives a hint. Let's consider the 8th entry of each match.

c:\cc\240-8\param.txt - Time used : 3:22:04 [MIDG depth = 10.52]
c:\cc\220-8\param.txt - Time used : 3:21:19 [MIDG depth = 10.74]

That's a difference of 0.22.

So how does Windows divide the 8 matches over the 8 cores when starting cutechess? Do some cores bite each other while others remain hardly used? The following screenshot hints at that; note the load percentages of the 8 cores.

Image

On this level the typical NPS is 1.9M, but I also had a case where the NPS of 240 was 2.1M, 200,000 per second more, without any reasonable explanation.

So, what's going on?

[A] quit computer chess dummy.

Joost Buijs
Posts: 986
Joined: Thu Jul 16, 2009 8:47 am
Location: Almere, The Netherlands

Re: Core behaviour

Post by Joost Buijs » Wed Jun 28, 2017 2:55 pm

I don't think this is something to be concerned about. In my experience Windows does a very good job assigning processes to the right processor/core, even with hyperthreading enabled.

Since your system is dual processor I assume it is a NUMA system, I don't know if cutechess-cli handles this in a special way but it could be an explanation for the things you are seeing.

And there are many other things that can cause behavior like this, I guess you just have to live with it.

Rebel
Posts: 4700
Joined: Thu Aug 18, 2011 10:04 am

Re: Core behaviour

Post by Rebel » Wed Jun 28, 2017 5:31 pm

Joost Buijs wrote: I don't think this is something to be concerned about. In my experience Windows does a very good job assigning processes to the right processor/core, even with hyperthreading enabled.
It means one cannot test accurately.
Joost Buijs wrote:Since your system is dual processor I assume it is a NUMA system, I don't know if cutechess-cli handles this in a special way but it could be an explanation for the things you are seeing.
I think you are right about NUMA; it needs some special attention. On my development PC, a normal quad, setting the affinity does a good job.


Image

Not so on the Xeon 8 core.

I will have to dig deeper.

Rebel
Posts: 4700
Joined: Thu Aug 18, 2011 10:04 am

Re: Core behaviour

Post by Rebel » Thu Jun 29, 2017 2:43 pm

Image

So I managed after all, by setting the right affinities on a 2-node NUMA system.

I would say that the Windows scheduler isn't so good after all.

I wish someone more knowledgeable would comment on this; it looks pretty serious.

Rebel
Posts: 4700
Joined: Thu Aug 18, 2011 10:04 am

Re: Core behaviour

Post by Rebel » Fri Jun 30, 2017 9:50 am

UPDATE

I have been fiddling with affinity. I have a match running against a 50 Elo stronger engine and yet I get a 62.5% score, representing an increase of 140 Elo (50+90).

Nice cheat!

I suppose you can do it with SetThreadAffinityMask in an engine and damage your opponent.
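For the curious, a minimal sketch of what that call looks like (Windows-only, untested here; the chosen core index is an arbitrary example, nothing from the post):

```c
#include <windows.h>
#include <stdio.h>

int main(void)
{
    /* Bit n of the mask selects logical processor n; here core 2. */
    DWORD_PTR mask = (DWORD_PTR)1 << 2;

    /* SetThreadAffinityMask returns the previous mask on success,
       0 on failure. */
    DWORD_PTR prev = SetThreadAffinityMask(GetCurrentThread(), mask);
    if (prev == 0)
        fprintf(stderr, "SetThreadAffinityMask failed: %lu\n",
                (unsigned long)GetLastError());
    else
        printf("pinned to core 2 (previous mask: 0x%llx)\n",
               (unsigned long long)prev);
    return 0;
}
```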

To be continued.

Joost Buijs
Posts: 986
Joined: Thu Jul 16, 2009 8:47 am
Location: Almere, The Netherlands

Re: Core behaviour

Post by Joost Buijs » Fri Jun 30, 2017 10:58 am

Strange that in your case changing the affinity-mask has such a big influence. I've been experimenting with SetThreadAffinityMask and SetProcessAffinityMask many times but I could never find a significant difference between setting the masks or not.

The two systems I currently use are SMP and not NUMA (5960X and 6950X), I guess that will make a difference.

Usually I configure the BIOS in such a way that the cores always run at the same clock frequency, whether under load or not; this gives me the least noise when testing. I guess this is only possible with K and X processors and not with Xeon, but I'm not sure about this because I haven't used Xeon processors for at least 8 years.

Besides changing the affinity masks you have to make sure that you allocate your memory on the right processor with VirtualAllocExNuma, this can certainly make a difference.
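A sketch of that allocation call (Windows Vista+, untested here; the node number and 64 MB size are arbitrary examples standing in for an engine's hash table):

```c
#include <windows.h>
#include <stdio.h>

int main(void)
{
    SIZE_T size = (SIZE_T)64 * 1024 * 1024; /* e.g. a 64 MB hash table */
    DWORD  node = 0;                        /* preferred NUMA node     */

    /* VirtualAllocExNuma commits the pages with a preferred node, so
       the memory ends up local to the cores the process is pinned to. */
    void *mem = VirtualAllocExNuma(GetCurrentProcess(), NULL, size,
                                   MEM_RESERVE | MEM_COMMIT,
                                   PAGE_READWRITE, node);
    if (mem == NULL) {
        fprintf(stderr, "VirtualAllocExNuma failed: %lu\n",
                (unsigned long)GetLastError());
        return 1;
    }
    printf("allocated %zu bytes preferring node %lu\n",
           (size_t)size, (unsigned long)node);
    VirtualFree(mem, 0, MEM_RELEASE);
    return 0;
}
```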

bob
Posts: 20550
Joined: Mon Feb 27, 2006 6:30 pm
Location: Birmingham, AL

Re: Core behaviour

Post by bob » Sun Jul 02, 2017 3:37 am

Rebel wrote:UPDATE

I have been fiddling with affinity. I have a match running against a 50 Elo stronger engine and yet I get a 62.5% score, representing an increase of 140 Elo (50+90).

Nice cheat!

I suppose you can do it with: SetThreadAffinityMask in an engine and damage your opponent.

To be continued.
Windows has NEVER been very good from a benchmarking perspective. Getting repeatable results is hard enough just due to all the hardware crap that goes on (hyper-threading, clock frequency scaling, memory issues like NUMA, cache and thread bouncing...).

All of my testing was done on a linux cluster running a lightweight kernel that didn't do anything tricky whatsoever. No paging / swapping, etc...

Rebel
Posts: 4700
Joined: Thu Aug 18, 2011 10:04 am

Re: Core behaviour

Post by Rebel » Mon Jul 03, 2017 7:06 am

Joost Buijs wrote:Strange that in your case changing the affinity-mask has such a big influence. I've been experimenting with SetThreadAffinityMask and SetProcessAffinityMask many times but I could never find a significant difference between setting the masks or not.

The two systems I currently use are SMP and not NUMA (5960X and 6950X), I guess that will make a difference.

Usually I configure the BIOS in such a way that the cores always run at the same clock frequency, whether under load or not; this gives me the least noise when testing. I guess this is only possible with K and X processors and not with Xeon, but I'm not sure about this because I haven't used Xeon processors for at least 8 years.

Besides changing the affinity masks you have to make sure that you allocate your memory on the right processor with VirtualAllocExNuma, this can certainly make a difference.
The Windows scheduler always assigns any program to all available cores, like this.
Image

I found out that on a normal quad running a match these are the best settings:

Image Image Image

So on a quad you run no more than 3 matches and pin engine_x1 and engine_y1 to core-1, pin engine_x2 and engine_y2 to core-2, pin engine_x3 and engine_y3 to core-3.

The 4th core is for everything else running in the background.

Now I only have to write a utility for that.
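A sketch of what such a pinning utility could look like (Windows-only, untested here; the name pin.exe and the command-line format are my illustration, not the actual tool): start the engine suspended, set its affinity mask, then let it run, so the affinity is in place before its first instruction.

```c
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>

/* Usage: pin.exe <hex-mask> <program>
   e.g. pin.exe 1 engine.exe  -> pin engine.exe to core 0 */
int main(int argc, char **argv)
{
    if (argc < 3) {
        fprintf(stderr, "usage: pin.exe <hex-mask> <program>\n");
        return 1;
    }
    DWORD_PTR mask = (DWORD_PTR)strtoull(argv[1], NULL, 16);

    STARTUPINFO si = { sizeof(si) };
    PROCESS_INFORMATION pi;

    /* CREATE_SUSPENDED: the process exists but has not executed yet,
       so the mask applies before the engine starts searching. */
    if (!CreateProcess(NULL, argv[2], NULL, NULL, FALSE,
                       CREATE_SUSPENDED, NULL, NULL, &si, &pi)) {
        fprintf(stderr, "CreateProcess failed: %lu\n",
                (unsigned long)GetLastError());
        return 1;
    }
    SetProcessAffinityMask(pi.hProcess, mask);
    ResumeThread(pi.hThread);
    CloseHandle(pi.hThread);
    CloseHandle(pi.hProcess);
    return 0;
}
```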

Rebel
Posts: 4700
Joined: Thu Aug 18, 2011 10:04 am

Re: Core behaviour

Post by Rebel » Mon Jul 03, 2017 7:30 am

bob wrote:
Rebel wrote:UPDATE

I have been fiddling with affinity. I have a match running against a 50 Elo stronger engine and yet I get a 62.5% score, representing an increase of 140 Elo (50+90).

Nice cheat!

I suppose you can do it with: SetThreadAffinityMask in an engine and damage your opponent.

To be continued.
Windows has NEVER been very good from a benchmarking perspective. Getting repeatable results is hard enough just due to all the hardware crap that goes on (hyper-threading, clock frequency scaling, memory issues like NUMA, cache and thread bouncing...).

All of my testing was done on a linux cluster running a lightweight kernel that didn't do anything tricky whatsoever. No paging / swapping, etc...
And it doesn't worry you that ill-intentioned programmers can manipulate matches on rating lists, with results skewed as badly as my example above shows?

Or, without any ill intention, someone who tunes the cores to optimize their own engine will likely harm the opponent engine to some extent as well. If one profits, the other loses. The latter needs research.

bob
Posts: 20550
Joined: Mon Feb 27, 2006 6:30 pm
Location: Birmingham, AL

Re: Core behaviour

Post by bob » Tue Jul 04, 2017 12:02 am

My take here is that there is little use in worrying about something you can't fix. A process can always spawn 99 cpu burners that are quiet when the engine is searching, burning cycles when the opponent is searching. You can do similar things to blow out cache when running on a single machine. At some point, you have to depend on the testers realizing what is going on.

Need I mention the old ChessBase nonsense with winboard engines (new game, stuff the entire move list to the program before each and every move, negating pondering, etc.)?

The way I mentioned testing is a lot safer. NO multitasking going on, and no way for your opponent to do any multitasking...

And still the GUI can interfere, unless you do as I did and write your own, so that the only variable is the opponent engines, and they sit in a sandbox with impossible-to-escape walls.
