Another Crafty-23.1 Nehalem scaling problem

zullil
Posts: 6442
Joined: Tue Jan 09, 2007 12:31 am
Location: PA USA
Full name: Louis Zulli

Another Crafty-23.1 Nehalem scaling problem

Post by zullil »

I recently upgraded from Mac OS X 10.5.8 (Leopard) to Mac OS X 10.6.2 (Snow Leopard) on my dual-quad Nehalem Mac Pro. Crafty-23.1 seems to scale poorly under 10.6.2. This immediately reminded me of the following thread:

http://www.talkchess.com/forum/viewtopic.php?t=30952

Since I retained the 10.5.8 system on the machine, I was able to do some comparison testing. I used a clean copy of Crafty-23.1 on each system, and compiled with "make darwin". All I added was -DCPUS=8.

The results are summarized below, as is basic information on my hardware. I am about to do a similar test using Stockfish-1.6.2, to see if this issue is specific to Crafty.

Code: Select all

Leopard System Results

file ./crafty: Mach-O 64-bit executable x86_64

gcc-4.2 -v
Using built-in specs.
Target: i686-apple-darwin9
Configured with: /var/tmp/gcc_42/gcc_42-5577~1/src/configure --disable-checking 
--enable-werror --prefix=/usr --mandir=/usr/share/man --enable-languages=c,objc,c++,obj-c++ 
--program-transform-name=/^[cg][^.-]*$/s/$/-4.2/ --with-slibdir=/usr/lib 
--build=i686-apple-darwin9 --with-gxx-include-dir=/usr/include/c++/4.0.0 
--host=i686-apple-darwin9 --target=i686-apple-darwin9
Thread model: posix
gcc version 4.2.1 (Apple Inc. build 5577)

./crafty 
unable to open book file [./book.bin].
book is disabled
unable to open book file [./books.bin].

Crafty v23.1 (1 cpus)

White(1): bench
Running benchmark. . .
......
Total nodes: 123805390
Raw nodes per second: 2840224
Total elapsed time: 43.59
White(1): quit


./crafty 
unable to open book file [./book.bin].
book is disabled
unable to open book file [./books.bin].

Crafty v23.1 (1 cpus)

White(1): mt=8
max threads set to 8.
White(1): bench
Running benchmark. . .
......
Total nodes: 164571163
Raw nodes per second: 13293308
Total elapsed time: 12.38
White(1): quit

---------------------------------------------------------------------------------------

Snow Leopard System Results 

file ./crafty: Mach-O 64-bit executable x86_64

gcc -v
Using built-in specs.
Target: i686-apple-darwin10
Configured with: /var/tmp/gcc/gcc-5646.1~2/src/configure --disable-checking 
--enable-werror --prefix=/usr --mandir=/share/man --enable-languages=c,objc,c++,obj-c++ 
--program-transform-name=/^[cg][^.-]*$/s/$/-4.2/ --with-slibdir=/usr/lib 
--build=i686-apple-darwin10 --with-gxx-include-dir=/include/c++/4.2.1
--program-prefix=i686-apple-darwin10- --host=x86_64-apple-darwin10 --target=i686-apple-darwin10
Thread model: posix
gcc version 4.2.1 (Apple Inc. build 5646) (dot 1)

./crafty 
unable to open book file [./book.bin].
book is disabled
unable to open book file [./books.bin].

Crafty v23.1 (1 cpus)

White(1): bench
Running benchmark. . .
......
Total nodes: 123805390
Raw nodes per second: 2859246
Total elapsed time: 43.30
White(1): quit

./crafty 
unable to open book file [./book.bin].
book is disabled
unable to open book file [./books.bin].

Crafty v23.1 (1 cpus)

White(1): mt=8
max threads set to 8.
White(1): bench
Running benchmark. . .
......
Total nodes: 186142575
Raw nodes per second: 9102326
Total elapsed time: 20.45
White(1): quit




My hardware:

  Model Name:   Mac Pro 
  Model Identifier:     MacPro4,1 
  Processor Name:       Quad-Core Intel Xeon 
  Processor Speed:      2.26 GHz 
  Number Of Processors: 2 
  Total Number Of Cores:        8 
  L2 Cache (per processor):     512 KB 
  L3 Cache:     8 MB 
  Memory:       6 GB 
  Processor Interconnect Speed: 5.86 GT/s 
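
The scaling gap in the bench output above can be put into numbers by dividing the 8-thread figures by the 1-thread figures. The Python sketch below is only an illustration and is not part of Crafty: the names `parse_bench`, `scaling` and `RUNS` are assumptions. It parses the three summary lines that `bench` prints and computes both the NPS ratio and the elapsed-time ratio from the numbers posted above.

```python
import re

# Summary lines copied from the four bench runs posted above.
RUNS = {
    ("Leopard", 1): "Total nodes: 123805390\nRaw nodes per second: 2840224\nTotal elapsed time: 43.59",
    ("Leopard", 8): "Total nodes: 164571163\nRaw nodes per second: 13293308\nTotal elapsed time: 12.38",
    ("Snow Leopard", 1): "Total nodes: 123805390\nRaw nodes per second: 2859246\nTotal elapsed time: 43.30",
    ("Snow Leopard", 8): "Total nodes: 186142575\nRaw nodes per second: 9102326\nTotal elapsed time: 20.45",
}


def parse_bench(output):
    """Extract nodes, nps and elapsed seconds from bench summary text."""
    fields = {
        "nodes": r"Total nodes:\s*(\d+)",
        "nps": r"Raw nodes per second:\s*(\d+)",
        "seconds": r"Total elapsed time:\s*([\d.]+)",
    }
    parsed = {}
    for name, pattern in fields.items():
        match = re.search(pattern, output)
        if match is None:
            raise ValueError(f"missing '{name}' line in bench output")
        parsed[name] = float(match.group(1))
    return parsed


def scaling(os_name):
    """Return (NPS ratio, elapsed-time ratio) for 8 threads versus 1 thread."""
    one = parse_bench(RUNS[(os_name, 1)])
    eight = parse_bench(RUNS[(os_name, 8)])
    return eight["nps"] / one["nps"], one["seconds"] / eight["seconds"]


for os_name in ("Leopard", "Snow Leopard"):
    nps_ratio, time_ratio = scaling(os_name)
    print(f"{os_name}: NPS x{nps_ratio:.2f}, elapsed time x{time_ratio:.2f}")
```

The two ratios differ because the 8-thread runs searched more nodes than the 1-thread runs (164571163 and 186142575 versus 123805390), so the NPS ratio overstates how much sooner the benchmark actually finished.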
zullil

Re: Another Crafty-23.1 Nehalem scaling problem

Post by zullil »

The issue is also present when I compile with icc. And the issue does not seem to affect Stockfish, though Stockfish already scales badly (as measured by nps) when compared to Crafty.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Another Crafty-23.1 Nehalem scaling problem

Post by bob »

Very first thing to try, as a sanity check: what happens if you use the default make option? (Do you have, or can you install, Intel C?) No one close by me has such a box from Apple, but I will post a query at the office to see if anything interesting is to be found. I have run on a couple of Nehalem boxes and had no problems, but every platform is different. I see no reason why it won't scale, but I will try to run some tests on my 8-core box at the office to see if anything looks suspicious. For reference, can you also test 22.9, 23.0 and 23.1 on the same hardware to make sure that we didn't somehow unknowingly break something along the way?

As a last resort, if you can set it up to allow a remote login at some point, I can play with it remotely to see if anything obvious jumps out. Scaling problems are almost always related to memory/cache issues, but I have seen some compilers send this haywire by attempting an oddball optimization that breaks something.
bob

Re: Another Crafty-23.1 Nehalem scaling problem

Post by bob »

zullil wrote:The issue is also present when I compile with icc. And the issue does not seem to affect Stockfish, though Stockfish already scales badly (as measured by nps) when compared to Crafty.
Your last comment, then, is not very helpful. If it already scales badly, then it is unlikely it would do much worse. But my first guess is still some issue with memory and/or cache.

I asked in the other reply, but again: can you also try 22.9 and 23.0 to see if they behave differently? If not, then we need to look at the hardware issues a bit more closely. If one scales fine and another doesn't, then that indicates a Crafty-related problem.
zullil

Re: Another Crafty-23.1 Nehalem scaling problem

Post by zullil »

Hi Bob,

I'll try out earlier versions of Crafty next.

I retested 23.1, this time compiling with icc. I used the following Makefile target:

Code: Select all

darwin:
               $(MAKE) target=FreeBSD \
               CC=icc CXX=icc \
               CFLAGS='$(CFLAGS) -Wall -O2 -m64' \
               CXFLAGS='$(CFLAGS) -Wall -O2 -m64' \
               LDFLAGS='$(LDFLAGS)' \
               LIBS='-lpthread -lstdc++' \
               opt='-DCPUS=8 -DINLINE64' \
               crafty-make
On both my systems, running bench with 1 cpu gave about 3.4 million nps. On the older system, bench with 8 cpus gave 15.4 million nps. On the newer system (Snow Leopard), bench with 8 cpus gave just 12.2 million nps.

I shall try to test 23.0 and 22.9 in the same manner.
zullil

Re: Another Crafty-23.1 Nehalem scaling problem

Post by zullil »

Please disregard my most recent post, which contains incorrect (transposed) data.

The situation is now very confusing. (I checked the following carefully; no mistakes in this post.)

On both my 10.5.8 and 10.6.2 systems, I compiled both 22.9 and 23.1.

For all four builds I used

Code: Select all

darwin:
               $(MAKE) target=FreeBSD \
               CC=icc CXX=icc \
               CFLAGS='$(CFLAGS) -Wall -O2 -m64' \
               CXFLAGS='$(CFLAGS) -Wall -O2 -m64' \
               LDFLAGS='$(LDFLAGS)' \
               LIBS='-lpthread -lstdc++' \
               opt='-DCPUS=8 -DINLINE64' \
               crafty-make
Here's the data, all in millions of nps:

Crafty 22.9 on OS X 10.5.8:
1 cpu: 2.9
8 cpu: 11.3

Crafty 22.9 on OS X 10.6.2:
1 cpu: 3.0
8 cpu: 9.4

(So for 22.9, scaling is worse on the newer OS.)

Crafty 23.1 on OS X 10.5.8:
1 cpu: 3.4
8 cpu: 12.2

Crafty 23.1 on OS X 10.6.2:
1 cpu: 3.4
8 cpu: 15.4

(So for 23.1, scaling is better on the newer OS. [But not this morning when I made pgo icc builds.])

I'm totally confused now.
zullil

Re: Another Crafty-23.1 Nehalem scaling problem

Post by zullil »

Finally had a chance to include 23.0. Sorry for the delay.


Here's the data, all in millions of nps:

Code: Select all

Crafty 22.9 on OS X 10.5.8: 
1 cpu: 2.9 
8 cpu: 11.3 

Crafty 22.9 on OS X 10.6.2: 
1 cpu: 3.0 
8 cpu: 9.4 

(So for 22.9, scaling is worse on the newer OS.)

Crafty 23.0 on OS X 10.5.8:
1 cpu: 3.1
8 cpu: 8.2

Crafty 23.0 on OS X 10.6.2:
1 cpu: 2.9
8 cpu: 15.3

(So for 23.0, scaling is better on the newer OS.)

Crafty 23.1 on OS X 10.5.8: 
1 cpu: 3.4 
8 cpu: 12.2 

Crafty 23.1 on OS X 10.6.2: 
1 cpu: 3.4 
8 cpu: 15.4 

(So for 23.1, scaling is better on the newer OS.)
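
To make the comparison across versions easier to read, the 8-cpu to 1-cpu ratio can be computed for each build. This is a small Python sketch using only the figures posted above; the `RESULTS` layout and the helpers `speedup` and `better_os` are illustrative names and have nothing to do with Crafty itself.

```python
# NPS in millions, copied from the figures above:
# (Crafty version, OS X version) -> (1 cpu, 8 cpu)
RESULTS = {
    ("22.9", "10.5.8"): (2.9, 11.3),
    ("22.9", "10.6.2"): (3.0, 9.4),
    ("23.0", "10.5.8"): (3.1, 8.2),
    ("23.0", "10.6.2"): (2.9, 15.3),
    ("23.1", "10.5.8"): (3.4, 12.2),
    ("23.1", "10.6.2"): (3.4, 15.4),
}


def speedup(one_cpu, eight_cpu):
    """Ratio of 8-cpu NPS to 1-cpu NPS; 8.0 would be perfectly linear."""
    return eight_cpu / one_cpu


def better_os(version):
    """Return the OS X version with the higher 8-cpu/1-cpu ratio."""
    return max(
        ("10.5.8", "10.6.2"),
        key=lambda os_ver: speedup(*RESULTS[(version, os_ver)]),
    )


for (version, os_ver), (one, eight) in RESULTS.items():
    print(f"Crafty {version} on OS X {os_ver}: x{speedup(one, eight):.1f}")

for version in ("22.9", "23.0", "23.1"):
    print(f"Crafty {version}: better ratio on OS X {better_os(version)}")
```

The inputs are rounded to one decimal place, so the ratios are approximate.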
bob

Re: Another Crafty-23.1 Nehalem scaling problem

Post by bob »

Here's what I would expect:

log.001: time=30.20 mat=0 n=97218373 fh=95% nps=3.2M
log.002: time=30.35 mat=0 n=198382851 fh=95% nps=6.5M
log.003: time=31.01 mat=0 n=399493603 fh=94% nps=12.9M
log.004: time=30.89 mat=0 n=690102470 fh=94% nps=22.3M

I ran the same position for 30 seconds, using 1, 2, 4 and 8 cpus. NPS scaling from 1 to 8 cpus is about 7x on this box with the current version (almost identical to 23.1). I had slightly better scaling numbers on a Nehalem, but we don't currently have one up and running. In any case, Nehalem ought to be somewhat better than this Core 2 Xeon box, since it has a better memory system.
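
The scaling figure can be reproduced from the four log lines. Since the nps field is rounded to one decimal, this Python sketch recomputes it as n divided by time. The regular expression and the names `LINE_RE` and `nodes_per_second` are assumptions made for illustration, and the thread counts (1, 2, 4, 8) come from the description above, not from the log lines themselves.

```python
import re

LOG_LINES = """\
log.001: time=30.20 mat=0 n=97218373 fh=95% nps=3.2M
log.002: time=30.35 mat=0 n=198382851 fh=95% nps=6.5M
log.003: time=31.01 mat=0 n=399493603 fh=94% nps=12.9M
log.004: time=30.89 mat=0 n=690102470 fh=94% nps=22.3M
"""
THREADS = [1, 2, 4, 8]  # one entry per log line, in order

# Captures the elapsed time and the node count from one summary line.
LINE_RE = re.compile(r"time=([\d.]+)\s+mat=\S+\s+n=(\d+)")


def nodes_per_second(line):
    """Recompute nps from the unrounded time and node count."""
    match = LINE_RE.search(line)
    if match is None:
        raise ValueError(f"unrecognized log line: {line!r}")
    seconds, nodes = float(match.group(1)), int(match.group(2))
    return nodes / seconds


rates = [nodes_per_second(line) for line in LOG_LINES.splitlines()]
speedups = [rate / rates[0] for rate in rates]

for threads, rate, s in zip(THREADS, rates, speedups):
    print(f"{threads} cpus: {rate / 1e6:.2f}M nps, "
          f"speedup x{s:.2f}, efficiency {s / threads:.0%}")
```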
zullil

Re: Another Crafty-23.1 Nehalem scaling problem

Post by zullil »

Something seems amiss with my icc, so I switched back to gcc. This is on the new Snow Leopard system. Scaling seems pretty good, I guess. Will repeat this exact experiment with my old Leopard system on the same hardware.

Code: Select all

darwin:
        $(MAKE) target=FreeBSD \
                CC=gcc CXX=g++ \
                CFLAGS='$(CFLAGS) -O3 -msse4.2' \
                CXFLAGS='$(CFLAGS) -O3 -msse4.2' \
                LDFLAGS=$(LDFLAGS) \
                LIBS='-lpthread -lstdc++' \
                opt='-DCPUS=8 -DINLINE64' \
                crafty-make

Code: Select all

max threads set to 1.
Crafty v23.1 (1 cpus)
White(1): setboard 1rbr2k1/1q2bpp1/2pppn2/6B1/p3P3/2N2P2/PPP4P/1K1RQBR1 w - - 1 19

time=30.59  mat=0  n=97395988  fh=91%  nps=3.2M
extensions=3.3M qchecks=2.9M reduced=7.8M pruned=38.2M
predicted=0  evals=44.0M  50move=0  EGTBprobes=0  hits=0
SMP->  splits=0  aborts=0  data=0/512  elap=30.59


max threads set to 2.
Crafty v23.1 (2 cpus)
White(1): setboard 1rbr2k1/1q2bpp1/2pppn2/6B1/p3P3/2N2P2/PPP4P/1K1RQBR1 w - - 1 19

time=31.11  mat=0  n=175166321  fh=91%  nps=5.6M
extensions=5.9M qchecks=5.1M reduced=13.6M pruned=66.9M
predicted=0  evals=81.0M  50move=0  EGTBprobes=0  hits=0
SMP->  splits=490  aborts=69  data=5/512  elap=31.11


max threads set to 4.
Crafty v23.1 (4 cpus)
White(1): setboard 1rbr2k1/1q2bpp1/2pppn2/6B1/p3P3/2N2P2/PPP4P/1K1RQBR1 w - - 1 19

time=30.93  mat=0  n=345958868  fh=91%  nps=11.2M
extensions=12.7M qchecks=11.2M reduced=27.4M pruned=137.6M
predicted=0  evals=153.1M  50move=0  EGTBprobes=0  hits=0
SMP->  splits=4036  aborts=648  data=14/512  elap=30.93


max threads set to 8.
Crafty v23.1 (8 cpus)
White(1): setboard 1rbr2k1/1q2bpp1/2pppn2/6B1/p3P3/2N2P2/PPP4P/1K1RQBR1 w - - 1 19

time=30.78  mat=0  n=614520779  fh=90%  nps=20.0M
extensions=24.6M qchecks=22.7M reduced=48.3M pruned=250.0M
predicted=0  evals=262.9M  50move=0  EGTBprobes=0  hits=0
SMP->  splits=72982  aborts=13450  data=41/512  elap=30.78

zullil

Re: Another Crafty-23.1 Nehalem scaling problem

Post by zullil »

Here are results from my 10.5.8 Leopard system on the same box. I'm coming to the conclusion that OS X is still learning to deal with Nehalem. I'm giving up on this now--too frustrating. Thanks for the help.

Code: Select all

max threads set to 1.
Crafty v23.1 (1 cpus)
White(1): setboard 1rbr2k1/1q2bpp1/2pppn2/6B1/p3P3/2N2P2/PPP4P/1K1RQBR1 w - - 1 19

time=30.79  mat=0  n=97395988  fh=91%  nps=3.2M
extensions=3.3M qchecks=2.9M reduced=7.8M pruned=38.2M
predicted=0  evals=44.0M  50move=0  EGTBprobes=0  hits=0
SMP->  splits=0  aborts=0  data=0/512  elap=30.79


max threads set to 2.

Crafty v23.1 (2 cpus)

White(1): setboard 1rbr2k1/1q2bpp1/2pppn2/6B1/p3P3/2N2P2/PPP4P/1K1RQBR1 w - - 1 19

time=30.44  mat=0  n=163726848  fh=91%  nps=5.4M
extensions=5.8M qchecks=5.1M reduced=12.9M pruned=64.3M
predicted=0  evals=73.3M  50move=0  EGTBprobes=0  hits=0
SMP->  splits=523  aborts=84  data=7/512  elap=30.44


max threads set to 4.
Crafty v23.1 (4 cpus)
White(1): setboard 1rbr2k1/1q2bpp1/2pppn2/6B1/p3P3/2N2P2/PPP4P/1K1RQBR1 w - - 1 19

time=30.32  mat=0  n=301277071  fh=91%  nps=9.9M
extensions=11.1M qchecks=9.8M reduced=24.3M pruned=120.4M
predicted=0  evals=132.3M  50move=0  EGTBprobes=0  hits=0
SMP->  splits=3574  aborts=643  data=15/512  elap=30.32


max threads set to 8.
Crafty v23.1 (8 cpus)
White(1): setboard 1rbr2k1/1q2bpp1/2pppn2/6B1/p3P3/2N2P2/PPP4P/1K1RQBR1 w - - 1 19

time=30.05  mat=0  n=468325941  fh=91%  nps=15.6M
extensions=17.6M qchecks=15.8M reduced=37.9M pruned=192.6M
predicted=0  evals=200.4M  50move=0  EGTBprobes=0  hits=0
SMP->  splits=53739  aborts=9453  data=38/512  elap=30.05
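
The two fixed-time runs can be put side by side. The Python sketch below takes the time and n values from the Snow Leopard and Leopard logs posted in this thread and recomputes nodes per second for each thread count. The data layout and the helper names `nps` and `speedup` are illustrative assumptions; nothing here comes from Crafty's own tooling.

```python
# threads -> (elapsed seconds, nodes searched), copied from the logs in this thread
SNOW_LEOPARD = {
    1: (30.59, 97395988),
    2: (31.11, 175166321),
    4: (30.93, 345958868),
    8: (30.78, 614520779),
}
LEOPARD = {
    1: (30.79, 97395988),
    2: (30.44, 163726848),
    4: (30.32, 301277071),
    8: (30.05, 468325941),
}


def nps(run, threads):
    """Nodes per second for one thread count, from unrounded log values."""
    seconds, nodes = run[threads]
    return nodes / seconds


def speedup(run, threads):
    """NPS at the given thread count relative to the 1-thread run."""
    return nps(run, threads) / nps(run, 1)


for threads in (1, 2, 4, 8):
    print(f"{threads} cpus: "
          f"10.6.2 {nps(SNOW_LEOPARD, threads) / 1e6:.2f}M "
          f"(x{speedup(SNOW_LEOPARD, threads):.2f}), "
          f"10.5.8 {nps(LEOPARD, threads) / 1e6:.2f}M "
          f"(x{speedup(LEOPARD, threads):.2f})")
```

On these particular runs the 10.6.2 system reaches a higher 8-thread ratio than the 10.5.8 system, the reverse of the bench results in the first post.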