Compare kns, Six programs using 4,8,12 threads

kgburcham
Posts: 2016
Joined: Sun Feb 17, 2008 3:19 pm

Post by kgburcham » Fri Jul 08, 2011 6:43 pm

Notice that Critter scales poorly between 8 and 12 threads.
The HIARCS readme file says the maximum is 16 threads, but it will not use 12 threads.
I tried with NUMA enabled and with NUMA disabled in the BIOS; it made no difference.
I rebooted between each bench test.
Notice that scaling is good on Zappa. This is an old program, and I doubt Anthony had 12 threads available to optimize it for.
I am not sure what we can believe with programs anymore. Maybe the Zappa scaling is not real.
Notice that some programs solve quickly even with poor scaling.
Some programs stay at 100% in Task Manager but still scale poorly.
Most programs with poor scaling do not stay at 100% in Task Manager.
The programs that scale well stay at 100% in Task Manager.

Zappa Mexico II

12 threads

25/48 0:34 -2.13 47...Bh3 48.Kf2 Kf5 49.g3 Ke4 50.Bb2 Kd3 (618.225.267) 18165
26/55 1:32 -2.59 47...Bh3 48.Kf2 Kf5 49.g3 Ke4 50.Ke2 a3 (1.867.730.721) 20198
27/63 2:34 -2.64 47...Bh3 48.Kf2 Kf5 49.g3 Ke4 50.Ke2 a3 (3.261.566.331) 21112
28/63 4:41 -2.77 47...Bh3 48.Kf2 Kf5 49.g3 Ke4 50.Ke2 a3 (6.166.645.973) 21912

8 threads

24/60 2:28 -2.31 47...Bh3 48.Kh2 Bg4 49.Kg3 Kf5 50.Bb2 Ke4 (1.949.285.738) 13134
24/60 2:29 -2.31 47...Bh3 48.Kh2 Bg4 49.Kg3 Kf5 50.Bb2 Ke4 (1.955.105.542) 13106
25/61 4:55 -2.67 47...Bh3 48.Kf2 Kf5 49.Ke3 Bxg2 50.Bb2 Bh1 (4.190.393.257) 14168

4 threads

24/45 0:52 -1.65 47...a3 48.Kf2 Be4 49.g3 g5 50.hxg5 fxg5 (363.678.173) 6989
24/57 3:28 -2.41 47...Bh3 48.gxh3 Kf5 49.Kf2 Ke4 50.Bb2 f5 51.Ke1 Ke3 52.Ba3 d4 53.Bd6 f4 54.Ba3 Kd3 55.Kf2 (1.635.029.662) 7824
24/57 3:29 -2.41 47...Bh3 48.gxh3 Kf5 49.Kf2 Ke4 50.Bb2 f5 51.Ke1 Ke3 52.Ba3 d4 53.Bd6 f4 54.Ba3 Kd3 55.Kf2 (1.637.482.094) 7818

Critter 1.2 64-bit SSE4

12 threads

29/64 2:19 -2.06++ 47...Bh3 48.gxh3 Kf5 49.Kf2 Ke4 (2.646.814.377) 19016
29/64 2:36 -2.18++ 47...Bh3 48.gxh3 Kf5 49.Kf2 Ke4 (2.972.371.093) 18977
29/64 2:58 -2.36++ 47...Bh3 48.gxh3 Kf5 49.Kf2 Ke4 (3.386.695.229) 18982
29/64 3:32 -2.63++ 47...Bh3 48.gxh3 Kf5 49.Kf2 Ke4 (4.072.870.246) 19150
29/64 4:25 -3.03++ 47...Bh3 48.gxh3 Kf5 49.Kf2 Ke4 (5.199.174.678) 19565

8 threads

29/49 1:29 -1.94 47...Be4 48.Kf2 f5 49.g3 Kd7 (1.568.407.010) 17542
29/63 3:04 -2.06++ 47...Bh3 48.gxh3 Kf5 49.Kf2 Ke4 (3.299.742.415) 17903
29/63 3:27 -2.18++ 47...Bh3 48.gxh3 Kf5 49.Kf2 Ke4 (3.699.890.846) 17854
29/64 3:57 -2.36++ 47...Bh3 48.gxh3 Kf5 49.Kf2 Ke4 (4.261.475.894) 17914
29/70 5:04 -2.63++ 47...Bh3 48.gxh3 Kf5 49.Kf2 Ke4 (5.561.565.711) 18250

4 threads

28/53 1:33 -1.90 47...Be4 48.Kf2 f5 49.g3 Kd6 50.Bb4+ Kc6 (1.165.447.707) 12452
29/53 2:33 -1.90 47...Be4 48.Kf2 f5 49.g3 Kd6 50.Bb4+ Kc6 (1.903.715.198) 12435
30/53 4:07 -1.90 47...Be4 48.Kf2 f5 49.g3 Kd6 50.Bb4+ Kc6 (3.132.808.783) 12634

Deep Junior 12.5.0.3 UCI

12 threads

30.01 1:05 -1.13 47...Be4 48.g4 f5 49.h5 fxg4 50.h6 g5 51.h7 Bxh7 (2.165.707.280) 32883
31.01 1:16 -1.27 47...Be4 48.g4 f5 49.h5 fxg4 50.h6 g5 51.h7 Bxh7 52.Kf2 Kf5 (2.502.101.862) 32613
32.01 2:38 -1.31 47...Be4 48.g3 f5 49.Bd4 Kd6 50.Kf2 Kc6 (5.172.887.383) 32700
33.01 2:59 -1.31 47...Be4 48.g3 f5 49.Bd4 Kd6 50.Kf2 Kc6 51.Ke3 Kb5 (5.824.877.702) 32535
34.01 3:27 -1.31 47...Be4 48.g3 f5 49.Bd4 Kd6 50.Kf2 Kc6 51.Ke3 Kb5 (6.717.296.317) 32445
35.01 4:01 -1.31 47...Be4 48.g3 f5 49.Bd4 Kd6 50.Kf2 Kc6 51.Ke3 (7.848.601.512) 32442

8 threads

29.01 1:00 -1.02 47...Be4 48.Bd4 f5 49.Kf2 Kd6 50.Kg3 (1.323.530.398) 22052
30.01 1:23 -1.15 47...Be4 48.g4 f5 49.h5 fxg4 50.h6 g5 (1.837.998.654) 21995
31.01 1:35 -1.16 47...Be4 48.g4 f5 49.h5 fxg4 50.h6 g5 (2.091.636.914) 21858
32.01 2:06 -1.21 47...Be4 48.g4 f5 49.gxf5+ Bxf5 50.Kf2 Kd6 (2.742.151.157) 21733
33.01 2:43 -1.22 47...Be4 48.g4 f5 49.gxf5+ Kxf5 50.Kf2 Kf4 (3.492.002.863) 21417
34.01 3:45 -1.22 47...Be4 48.g4 f5 49.gxf5+ Kxf5 50.Kf2 Kf4 51.Ke2 a3 (4.680.163.551) 20710

4 threads

28.01 1:18 -0.98 47...Bc2 48.Kf2 f5 49.Ke3 Be4 (1.020.986.865) 13034
29.01 1:55 -0.97 47...Bc2 48.Kf2 Kf5 49.Ke3 a3 50.g4+ Kxg4 51.Bxf6 a2 (1.502.182.574) 12962
29.13 2:13 -1.13 47...Be4 48.g4 f5 49.h5 fxg4 50.h6 g5 51.h7 Bxh7 (1.716.417.252) 12825
30.01 2:23 -1.13 47...Be4 48.g4 f5 49.h5 fxg4 50.h6 g5 51.h7 Bxh7 (1.846.091.255) 12902
31.01 3:10 -1.32 47...Be4 48.Bd4 f5 49.Bb2 Kd6 50.Bd4 Kc6 (2.432.290.366) 12785
32.01 5:15 -1.32 47...Be4 48.Kf2 f5 49.g3 Kd6 50.Bd4 Kc6 (3.917.499.396) 12414

HIARCS 13.2 MP (2048 MB)

12 threads

28/44 0:21 -1.53 47...Be4 48.g3 f5 49.Kf2 Kd6 50.Bd4 Kc6 51.Ke2 Kb5 (207.252.551) 9643
29/44 0:28 -1.53 47...Be4 48.g3 f5 49.Kf2 Kd6 50.Bd4 Kc6 51.Ke2 Kb5 (278.884.527) 9716
30/47 0:46 -1.53 47...Be4 48.g3 f5 49.Kf2 Kd6 50.Bd4 Kc6 51.Ke2 Kb5 (461.132.718) 9983
31/50 3:27 -1.53 47...Be4 48.g3 f5 49.Kf2 Kd6 50.Bd4 Kc6 51.Ke2 Kb5 (2.417.857.560) 11665

8 threads

30/48 0:43 -1.50 47...Be4 48.g3 f5 49.Kf2 Kd6 50.Bd4 Kc6 (343.854.707) 7975
30/52 1:10 -1.50 47...Bh3 (563.702.929) 7970
30/52 1:45 -1.75++ 47...Bh3 (907.126.602) 8613
30/56 3:32 -2.74 47...Bh3 48.gxh3 Kf5 49.Kf2 Ke4 50.Bb2 f5 (2.026.871.273) 9545

4 threads

29/54 0:28 -1.57 47...Be4 48.Bb2 f5 49.g3 Kd6 50.Kf2 Kc5 (177.534.719) 6239
30/54 0:41 -1.57 47...Be4 48.Bb2 f5 49.g3 Kd6 50.Kf2 Kc5 (252.514.920) 6106
31/59 2:12 -1.57 47...Be4 48.Kf2 f5 49.g3 Kd6 50.Ba1 Kc5 (801.993.486) 6040
32/59 2:46 -1.57 47...Be4 48.Kf2 f5 49.g3 Kd6 50.Ba1 Kc5 (993.140.215) 5962
32/59 3:36 -1.58 47...Bh3 (1.274.626.698) 5899

Houdini 1.5ab-16 x64

12 threads

30/51 1:27 -1.47 47...Be4 48.g3 f5 49.Kf2 Kd6 50.Bb4+ Kc6 (2.243.711.372) 25716
30/68 2:38 -1.55++ 47...Bh3 (4.277.424.146) 27004
30/68 2:56 -1.71++ 47...Bh3 (4.781.677.819) 27120
30/70 3:42 -2.07++ 47...Bh3 (6.243.072.005) 28029
30/74 4:54 -2.07 47...Bh3 48.gxh3 Kf5 49.Kf2 Ke4 50.Bb2 f5 (9.172.729.975) 31112

8 threads

29/50 1:05 -1.49 47...Be4 48.g3 f5 49.Bb2 Kd6 50.Kf2 Kc5 (1.246.531.025) 19168
30/56 1:40 -1.49 47...Be4 48.g3 f5 49.Bb2 Kd6 50.Kf2 Kc5 (1.970.463.280) 19686
30/62 3:01 -1.57++ 47...Bh3 (3.758.007.868) 20760
30/62 3:25 -1.73++ 47...Bh3 (4.301.345.983) 20957
30/64 4:11 -2.09++ 47...Bh3 (5.439.449.512) 21604

4 threads

29/53 1:13 -1.50 47...Be4 48.g3 f5 49.Kf2 Kd6 50.Ke3 Kc5 (1.175.458.788) 15908
30/53 2:10 -1.50 47...Be4 48.g3 f5 49.Kf2 Kd6 50.Ke3 Kc5 (2.082.262.838) 15942
30/72 3:43 -1.58++ 47...Bh3 (3.609.318.783) 16166
30/72 4:07 -1.74++ 47...Bh3 (4.013.243.307) 16188

Stockfish 2.1.1 JA 64bit

12 threads

48/60 1:04 -2.90 47...a3 48.Kf2 Bc2 49.Ke3 Kf5 50.g3 a2 (1.333.228.237) 20700
49/60 1:14 -2.90 47...a3 48.Kf2 Bc2 49.Ke3 Kf5 50.g3 a2 (1.581.497.484) 21250
50/60 2:14 -2.90 47...a3 48.Kf2 Bc2 49.Ke3 Kf5 50.g3 a2 (3.084.444.326) 22954
51/60 4:51 -2.90 47...a3 48.Kf2 Bc2 49.Ke3 Kf5 50.g3 a2 (7.666.058.708) 26335
52/60 5:06 -2.90 47...a3 48.Kf2 Bc2 49.Ke3 Kf5 50.g3 a2 (8.078.547.948) 26331

8 threads

47/56 1:16 -2.90 47...Be4 48.Kf2 a3 49.g3 a2 50.Ke3 Kf5 (1.408.498.901) 18370
48/56 1:25 -2.90 47...Be4 48.Kf2 a3 49.g3 a2 50.Ke3 Kf5 (1.574.308.455) 18355
49/56 1:35 -2.90 47...Be4 48.Kf2 a3 49.g3 a2 50.Ke3 Kf5 (1.755.620.150) 18338
50/56 3:40 -2.90 47...Be4 48.Kf2 a3 49.g3 a2 50.Ke3 Kf5 (4.285.346.229) 19411
51/61 8:05 -2.90 47...Be4 48.Kf2 a3 49.g3 a2 50.Ke3 Kf5 (9.369.930.992) 19280

4 threads

45/53 1:41 -2.82 47...Be4 48.g3 a3 49.Kf2 Kf5 50.Ba1 Bh1 (1.115.203.804) 11022
46/54 1:55 -2.82 47...Be4 48.g3 a3 49.Kf2 Kf5 50.Ba1 Bh1 (1.267.913.621) 10979
47/54 2:07 -2.82 47...Be4 48.g3 a3 49.Kf2 Kf5 50.Ba1 Bh1 (1.394.745.765) 10948
48/54 2:41 -2.82 47...Be4 48.g3 a3 49.Kf2 Kf5 50.Ba1 Bh1 (1.769.133.725) 10933
49/54 3:35 -2.82 47...Be4 48.g3 a3 49.Kf2 Kf5 50.Ba1 Bh1 (2.353.594.489) 10905
50/62 4:12 -2.82 47...Be4 48.g3 a3 49.Kf2 Kf5 50.Ba1 Bh1 (2.769.527.004) 10954
51/62 4:40 -2.82 47...Be4 48.g3 a3 49.Kf2 Kf5 50.Ba1 Bh1 (3.073.588.559) 10977
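
For anyone who wants to check the figures, here is a minimal sketch that parses one analysis line and recomputes the node rate from the node count and the elapsed time. The field layout (depth, time as m:ss, eval, PV, node count in parentheses with dots as thousands separators, then kN/s) is only my reading of the listings above, not a documented format, and `parse_line` and `derived_kns` are names made up for this example.

```python
import re

# Assumed layout of one analysis line, inferred from the listings in this post:
#   depth  m:ss  eval[++]  PV ...  (nodes, '.' as thousands separator)  kN/s
LINE_RE = re.compile(
    r"^(?P<depth>\S+)\s+(?P<min>\d+):(?P<sec>\d{2})\s+"
    r"(?P<eval>[-+]?\d+\.\d+)(?:\+\+|--)?\s+"
    r".*\((?P<nodes>[\d.]+)\)\s+(?P<kns>\d+)$"
)


def parse_line(line):
    """Split one analysis line into its numeric fields (hypothetical helper)."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError("line does not match the assumed layout: " + line)
    return {
        "depth": m["depth"],
        "seconds": int(m["min"]) * 60 + int(m["sec"]),
        "eval": float(m["eval"]),
        "nodes": int(m["nodes"].replace(".", "")),
        "kns": int(m["kns"]),
    }


def derived_kns(rec):
    """Node rate in kN/s recomputed from total nodes and elapsed seconds."""
    return rec["nodes"] / rec["seconds"] / 1000.0


if __name__ == "__main__":
    sample = ("25/48 0:34 -2.13 47...Bh3 48.Kf2 Kf5 49.g3 Ke4 "
              "50.Bb2 Kd3 (618.225.267) 18165")
    rec = parse_line(sample)
    print(rec["kns"], round(derived_kns(rec)))
```

Because the time is only shown in whole seconds, the recomputed rate should be expected to match the reported kN/s roughly, not exactly.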

[Site "Linares"]
[Date "1998"]
[Round "?"]
[White "Topalov"]
[Black "Shirov"]
[Result "0-1"]

1. d4 Nf6 2. c4 g6 3. Nc3 d5 4. cxd5 Nxd5 5. e4 Nxc3
6. bxc3 Bg7 7. Bb5+ c6 8. Ba4 O-O 9. Ne2 Nd7 10. O-O e5
11. f3 Qe7 12. Be3 Rd8 13. Qc2 Nb6 14. Bb3 Be6 15. Rad1 Nc4
16. Bc1 b5 17. f4 exd4 18. Nxd4 Bg4 19. Rde1 Qc5 20. Kh1 a5
21. h3 Bd7 22. a4 bxa4 23. Ba2 Be8 24. e5 Nb6 25. f5 Nd5
26. Bd2 Nb4 27. Qxa4 Nxa2 28. Qxa2 Bxe5 29. fxg6 hxg6
30. Bg5 Rd5 31. Re3 Qd6 32. Qe2 Bd7 33. c4 Bxd4 34. cxd5
Bxe3 35. Qxe3 Re8 36. Qc3 Qxd5 37. Bh6 Re5 38. Rf3 Qc5
39. Qa1 Bf5 40. Re3 f6 41. Rxe5 Qxe5 42. Qa2+ Qd5 43. Qxd5+
cxd5 44. Bd2 a4 45. Bc3 Kf7 46. h4 Ke6 47. Kg1 Bh3 48. gxh3
Kf5 49. Kf2 Ke4 50. Bxf6 d4 51. Be7 Kd3 52. Bc5 Kc4 53. Be7
Kb3 0-1

ernest
Posts: 1945
Joined: Wed Mar 08, 2006 7:30 pm

Re: Compare kns, Six programs using 4,8,12 threads

Post by ernest » Fri Jul 08, 2011 6:57 pm

Also strange is the poor scaling of Houdini between 4 and 8 threads...

We are far from the x1.5 to x2.0 expected for doubling the number of threads!
(For 1 => 2 threads, there used to be talk of x1.7, but recently Bob was speaking of almost x2 !!! ...maybe I didn't understand him completely :) )
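
To put rough numbers on this, here is a small sketch that takes the last kN/s figure reported for each run in the first post and computes the node-rate ratio between thread counts. The choice of the last line of each run as the representative figure is my assumption, and `nps_ratio` and `efficiency` are names invented for this example. Note that this measures raw node rate only, which is not the same thing as time-to-depth speedup.

```python
# Last reported kN/s for each run, copied from the listings in the first post.
FINAL_KNS = {
    "Zappa Mexico II":  {4: 7818,  8: 14168, 12: 21912},
    "Critter 1.2":      {4: 12634, 8: 18250, 12: 19565},
    "Deep Junior 12.5": {4: 12414, 8: 20710, 12: 32442},
    "HIARCS 13.2":      {4: 5899,  8: 9545,  12: 11665},
    "Houdini 1.5":      {4: 16188, 8: 21604, 12: 31112},
    "Stockfish 2.1.1":  {4: 10977, 8: 19280, 12: 26331},
}


def nps_ratio(kns, lo, hi):
    """Node-rate ratio when going from `lo` threads to `hi` threads."""
    return kns[hi] / kns[lo]


def efficiency(kns, lo, hi):
    """Share of the ideal linear gain (hi / lo) that the node rate achieved."""
    return nps_ratio(kns, lo, hi) / (hi / lo)


if __name__ == "__main__":
    print(f"{'engine':<18} {'4->8':>6} {'8->12':>6} {'eff 4->12':>10}")
    for name, kns in FINAL_KNS.items():
        print(f"{name:<18} {nps_ratio(kns, 4, 8):>6.2f} "
              f"{nps_ratio(kns, 8, 12):>6.2f} {efficiency(kns, 4, 12):>10.2f}")
```

On these figures Houdini's node rate grows by about x1.33 from 4 to 8 threads, and Critter's by only about x1.07 from 8 to 12 threads, where a perfectly linear gain would be x2.0 and x1.5 respectively.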

zullil
Posts: 6442
Joined: Mon Jan 08, 2007 11:31 pm
Location: PA USA
Full name: Louis Zulli

Re: Compare kns, Six programs using 4,8,12 threads

Post by zullil » Fri Jul 08, 2011 8:13 pm

kgburcham wrote: Notice Critter has poor scaling between 8 and 12 threads.

What values did you use for the Minimum Split Depth parameter in each of the three cases? I suggest you try it at 10 for Threads=8. Were you able to test higher values for this parameter, using a special build from R. Vida?

kgburcham
Posts: 2016
Joined: Sun Feb 17, 2008 3:19 pm

Re: Compare kns, Six programs using 4,8,12 threads

Post by kgburcham » Fri Jul 08, 2011 8:37 pm

zullil wrote:
kgburcham wrote: Notice Critter has poor scaling between 8 and 12 threads.
What values did you use for the Minimum Split Depth parameter in each of the three cases? I suggest you try it at 10 for Threads=8. Were you able to test higher values for this parameter, using a special build from R. Vida?

Minimum split depth:
12 threads used 10
8 threads used 10
4 threads used 6
Richard sent me a 32-bit version, which I was not interested in testing. I have not replied since.
I will test the latest 64-bit version with the split depth set as high as is available, to at least 14.

thanks
kgburcham
