I ran searches of the opening position using 1 and 2 cores, and the times to reach the various depths were:
depth   1 core   2 cores   (times in centiseconds)
    6       12        12
    7       23        20
    8       43        34
    9       90        62
   10      185       175
   11      514       370
   12     1714      1050
   13     3342      3003
   14     7704     11078
   15    23375     23536
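As a rough aid for reading the table, the time-to-depth speedup (1-core time divided by 2-core time) can be computed per depth. This is only a sketch: the dictionary and the function name `speedup` are illustrative, and the times are copied from the table and assumed to be in centiseconds, as is usual for this kind of engine output.

```python
# Times copied from the table above: depth -> (1 core, 2 cores), in centiseconds.
TIMES = {
    6: (12, 12), 7: (23, 20), 8: (43, 34), 9: (90, 62), 10: (185, 175),
    11: (514, 370), 12: (1714, 1050), 13: (3342, 3003),
    14: (7704, 11078), 15: (23375, 23536),
}

def speedup(depth):
    """Time-to-depth speedup: 1-core time divided by 2-core time."""
    one_core, two_cores = TIMES[depth]
    return one_core / two_cores

for depth in sorted(TIMES):
    print(f"{depth:2d}  {speedup(depth):.2f}")
```

A value above 1 means the 2-core search reached that depth sooner; a value below 1, as at depth 14, means it was slower.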
In the raw output below I print not only the usual depth / score / time / nodes, but after that also the number of hash probes, the number of hash hits on entries written by the probing process itself, and the number of hash hits on entries written by another process. (For testing purposes I stored the ID of the writing process in the hash entry.) All counts are for the reporting process only; they are not summed over processes.
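To make the column layout concrete, here is a small sketch that splits one output line into named fields. The field names and the `parse_post_line` helper are my own labels for illustration, not anything the engine defines; the `procN` column is assumed to be a tag identifying the printing process, and the sample line is the master's 12-ply line from the log.

```python
from typing import List, NamedTuple

class PostLine(NamedTuple):
    depth: int        # search depth in ply
    score: int        # score as printed by the engine
    time_cs: int      # elapsed time, assumed to be centiseconds
    nodes: int        # nodes searched by this process
    probes: int       # hash probes by this process
    own_hits: int     # hits on entries this process wrote itself
    other_hits: int   # hits on entries written by another process
    proc: str         # process tag, e.g. "proc2"
    pv: List[str]     # principal variation moves

def parse_post_line(line: str) -> PostLine:
    """Split one thinking-output line into the fields described above."""
    fields = line.split()
    numbers = [int(x) for x in fields[:7]]
    return PostLine(*numbers, fields[7], fields[8:])

sample = ("12 20 1050 17164202 10200040 1170140 807093 proc2 "
          "f0e1 f9e8 g3g4 g9e7 h0g2 h9f8 g0e2 i9f9 i0f0 c6c5")
parsed = parse_post_line(sample)
print(parsed.nodes, parsed.probes, parsed.own_hits, parsed.other_hits)
```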
If I look at the 12-ply results, which is the largest depth for which both runs showed the same PV, the single-core run reached 1,997 knps. With two cores the master drops to 1,635 knps, but the 1,602 knps of the slave process has of course to be added to that. Even so, the speed per process is some 20% lower. I don't see such an NPS decrease when I run two independent engine processes, so this must be overhead due to accessing shared memory.
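The knps figures quoted above follow from the node counts and times on the 12-ply lines of the log. A minimal sketch of that arithmetic, assuming the time column is in centiseconds (the helper name `knps` is just for illustration):

```python
def knps(nodes, time_cs):
    """Thousands of nodes per second, given a time in centiseconds."""
    seconds = time_cs / 100.0
    return nodes / seconds / 1000.0

# 12-ply lines: (nodes, time in centiseconds) taken from the log.
single = knps(34236944, 1714)   # single-core run
master = knps(17164202, 1050)   # 2 cores, master process
slave = knps(16817093, 1050)    # 2 cores, slave process

print(f"single: {single:.0f} knps")
print(f"master: {master:.0f} knps ({master / single:.0%} of single)")
print(f"slave:  {slave:.0f} knps ({slave / single:.0%} of single)")
print(f"combined: {master + slave:.0f} knps")
```

The per-process ratios come out at roughly 0.80 to 0.82 of the single-core speed, which is where the "some 20% lower" comes from.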
It can also be seen that of the 17.2M nodes searched by the master process at 12 ply (16.8M for the slave) there were about 10M hash probes. (The rest must have been stand-pat cutoffs, which never get to probing the hash.) Of these probes about 2M were hits, 60% on entries written by the own process and 40% on entries written by the slave. The latter must be responsible for the speedup.
For the 10-ply case, where there was little speedup, there were 1.8M probes and 400k hits, about evenly distributed between the two sources. So not very different from the 12-ply result.
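The hit figures for the 10-ply and 12-ply cases can be recomputed from the master's output lines. The sketch below does this; `hit_stats` is a hypothetical helper, and the three inputs are the probes, own-hits and other-hits columns of the log.

```python
def hit_stats(probes, own_hits, other_hits):
    """Overall hit rate per probe, and how the hits split by writer."""
    hits = own_hits + other_hits
    return {
        "hits": hits,
        "hit_rate": hits / probes,
        "own_share": own_hits / hits,
        "other_share": other_hits / hits,
    }

# Master process (proc2) lines from the 2-core run.
ply10 = hit_stats(1799590, 203108, 202104)
ply12 = hit_stats(10200040, 1170140, 807093)

for name, s in (("10 ply", ply10), ("12 ply", ply12)):
    print(f"{name}: {s['hits']} hits, {s['hit_rate']:.1%} of probes, "
          f"{s['own_share']:.0%} own / {s['other_share']:.0%} other")
```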
Are these results typical of what others are seeing? The decrease in NPS per core seems a bit disappointing. Perhaps it is caused by a memory bottleneck in my laptop, so that I had better switch to testing on a bigger machine.
SINGLE CORE
tellics say HaQiKi D 1.7d SMP
tellics say by H.G. Muller
> memory 256
> new
> variant xiangqi
> level 40 5 0
> time 1000000
> post
> go
1 44 0 76 36 4 0 proc0 f0e1
2 12 0 727 569 71 0 proc0 f0e1 f9e8
3 16 1 3078 1577 283 0 proc0 f0e1 f9e8 g3g4
4 24 1 21175 12300 2153 0 proc0 h2h6 b9c7 f0e1 c6c5
5 24 3 44229 26648 4773 0 proc0 f0e1 h7h3 g3g4 f9e8 c3c4
6 16 12 195637 123979 20743 0 proc0 h2e2 b7b6 f0e1 h7e7 b2b3 f9e8
7 20 23 404724 254746 46304 0 proc0 f0e1 f9e8 g3g4 c6c5 h0g2 b9c7 g2f4
8 16 43 791151 490244 94660 0 proc0 f0e1 f9e8 g3g4 c6c5 h0g2 b7b3 i0i1 b9c7
9 16 90 1696417 1028450 191668 0 proc0 f0e1 f9e8 g3g4 c6c5 h2e2 b9c7 h0g2 h7h3 g2f4
10 20 185 3592476 2172270 421711 0 proc0 f0e1 f9e8 g3g4 g9e7 h0g2 h9f8 g0e2 i9f9 i0f0 c6c5
11 20 514 10113875 6111370 1147416 0 proc0 f0e1 f9e8 g3g4 g9e7 h0g2 h9f8 g0e2 i9f9 i0f0 h7h3 c3c4
12 20 1714 34236944 20516263 3684090 0 proc0 f0e1 f9e8 g3g4 g9e7 h0g2 h9f8 g0e2 i9f9 i0f0 c6c5 g2f4 b9c7
13 20 3342 67688750 40199511 7149910 0 proc0 f0e1 f9e8 g3g4 g9e7 h0g2 h9f8 g0e2 i9f9 i0f0 c6c5 f0f6 h7h3 c3c4 c5c4 e2c4
14 20 7704 156402542 92280255 15623050 0 proc0 f0e1 f9e8 g3g4 g9e7 h0g2 h9f8 g0e2 i9f9 i0f0 c6c5 b2b6 b9c7 g2f4 h7g7
15 20 23375 472004116 276810109 42459029 0 proc0 f0e1 f9e8 g3g4 b7g7 c3c4 g6g5 g4g5 g7g0 h0i2 h7g7 b2e2 b9c7 g5f5 g0g3 i0f0
# stop iterating, time=233766, tl=116279, id=15 md=63 reps=2228 perps=1318
move f0e1
quit
2 CORES, MASTER PROCESS
tellics say HaQiKi D 1.7d SMP
tellics say by H.G. Muller
> cores 2
> memory 256
> new
> variant xiangqi
> level 40 5 0
> time 1000000
> post
> go
1 44 0 54 15 0 9 proc2 f0e1
2 12 0 327 238 6 55 proc2 f0e1
3 16 1 2234 1026 48 248 proc2 f0e1 f9e8
4 24 1 17640 10265 631 1674 proc2 h2h6 b9c7 f0e1 c6c5
5 24 3 37057 22395 1500 3515 proc2 f0e1 h7h3 g3g4 f9e8 c3c4
6 16 12 153286 97046 7835 13485 proc2 h2e2 b7b6 f0e1 h7e7 b2b3 f9e8
7 20 20 284689 180726 16977 23395 proc2 f0e1 f9e8 g3g4 c6c5 h0g2 b9c7 g2f4
8 16 34 512585 319958 34258 42523 proc2 f0e1 f9e8 g3g4 c6c5 h0g2 b7b3 i0i1 b9c7
9 16 62 1017669 621496 65475 83776 proc2 f0e1 f9e8 g3g4 c6c5 h2e2 b9c7 h0g2 h7h3 g2f4
10 20 175 2926448 1799590 203108 202104 proc2 f0e1 f9e8 g3g4 g9e7 h0g2 h9f8 g0e2 i9f9 i0f0 c6c5
11 20 370 6109947 3730090 430238 397176 proc2 f0e1 f9e8 g3g4 g9e7 h0g2 h9f8 g0e2 i9f9
12 20 1050 17164202 10200040 1170140 807093 proc2 f0e1 f9e8 g3g4 g9e7 h0g2 h9f8 g0e2 i9f9 i0f0 c6c5
13 20 3003 50039843 29685875 3327902 2172989 proc2 f0e1 f9e8 g3g4 g9e7 h0g2 h9f8 g0e2 i9f9 i0f0 c6c5 f0f6 h7h3 c3c4 c5c4 e2c4
14 20 11078 185478140 109582190 12631656 5890829 proc2 f0e1 f9e8 g3g4 b7g7 b2e2 b9c7 h0g2 g7g4 g2f4 c6c5 g0i2 g4g3 i0g0 g3c3 g0g6
15 20 23536 398692002 234644820 29374260 9615268 proc2 f0e1 f9e8 g3g4 c6c5 h0g2 b7e7 g0e2 b9c7 i0f0 a9b9 b2c2 b9b5 c3c4 c5c4 e2c4
# stop iterating, time=235360, tl=116279, id=15 md=63 reps=1654 perps=559
move f0e1
2 CORES, SLAVE PROCESS
tellics say HaQiKi D 1.7d SMP
tellics say by H.G. Muller
> cores 1
> attach 4194303 HashMem120500296
> new
> force
> variant xiangqi
> analyze
1 -44 0 76 36 4 0 proc1 f0e1
2 -12 0 678 530 59 26 proc1 f0e1 f9e8
3 -16 1 3009 1522 242 59 proc1 f0e1 f9e8 g3g4
4 -24 1 18944 11070 1480 849 proc1 h2h6 b9c7 f0e1 c6c5
5 -24 3 39290 23813 3056 2074 proc1 f0e1 h7h3 g3g4 f9e8
6 -16 12 157117 99533 11485 10667 proc1 h2e2 b7b6 f0e1 h7e7 b2b3 f9e8
7 -20 20 293153 187715 24439 19189 proc1 f0e1 f9e8 g3g4 c6c5
8 -16 34 515031 325751 42077 35991 proc1 f0e1 f9e8 g3g4 c6c5 h0g2 b7b3 i0i1 b9c7
9 -16 62 1012518 625025 77449 72163 proc1 f0e1 f9e8 g3g4 c6c5 h2e2 b9c7 h0g2 h7h3 g2f4
10 -20 175 2901145 1794147 219394 188840 proc1 f0e1 f9e8 g3g4 g9e7 h0g2 h9f8 g0e2 i9f9 i0f0 c6c5
11 -20 370 6011465 3685086 454099 367813 proc1 f0e1 f9e8 g3g4 g9e7 h0g2 h9f8 g0e2 i9f9 i0f0 h7h3 c3c4
12 -20 1050 16817093 10168706 1259155 808277 proc1 f0e1 f9e8 g3g4 g9e7 h0g2 h9f8 g0e2 i9f9 i0f0 c6c5 g2f4 b9c7
13 -20 3003 49335980 29463640 3565563 2014712 proc1 f0e1 f9e8 g3g4 g9e7 h0g2 h9f8 g0e2 i9f9 i0f0 c6c5 f0f6 h7h3 c3c4 c5c4 e2c4
14 -20 11078 182801534 108499592 13229123 5099258 proc1 f0e1 f9e8 g3g4 c6c5 h0g2 b7e7 g0e2 b9c7 i0f0 a9b9 b2c2 b9b5 c3c4 c5c4 e2c4
15 -20 23536 393714646 231360218 28944753 8944958 proc1 f0e1 f9e8 g3g4 c6c5 h0g2 b7e7 g0e2 b9c7 i0f0 a9b9 b2c2
> exit
# stop iterating, time=235360, tl=0, id=15 md=63 reps=2984 perps=1189
f0e1