Significant modification of Komodo SMP implementation

Laskos · Post by **Laskos** » Tue Jun 23, 2015 9:07 pm

In short, Komodo 9.1 seems to not widen or to widen very little going to several cores. It instead deepens more than Komodo 9.02. The total efficiency of SMP implementation is comparable, maybe a bit better now. What one will see on many cores is that Komodo will go 1-3 plies deeper than previously due mostly to SMP implementation (it goes a bit deeper than previously on one core too due to lower EBF).

Summary of the results:

Widening 1 -> 4 threads at Fixed Depth (and speedup due to it assuming 70 ELO points for doubling at 40/4' CCRL):

Code: Select all

Komodo 9.1      +9 ELO points   2^&#40;9/70&#41;  = 1.093
Komodo 9.02    +49 ELO points   2^&#40;49/70&#41; = 1.624

Time to Depth Speedup 1 -> 4 threads:

Code: Select all

               1 t      4 t
Komodo 9.1    49&#58;37 -> 18&#58;22   2.701
Komodo 9.02   57&#58;17 -> 32&#58;14   1.777

Total Effective Speedup 1 -> 4 threads:

Code: Select all

Komodo 9.1  -------  1.093 &#40;widening&#41; * 2.701 &#40;deepening&#41; = 2.95
Komodo 9.02 -------  1.624 &#40;widening&#41; * 1.777 &#40;deepening&#41; = 2.89

-----------------------------------------------------------------------------------

Parts of raw data below, together with Nodes, NPS and EBF:

Widening was determined in 2,000 games matches at fixed depth. Time to Depth on 150 middlegame positions on time control similar to 40/4' on 4 cores.

Code: Select all

------------------------------------------
Komodo 9.1   4 threads&#58;

Time&#58; 18&#58;22

  n/s&#58; 6.165.346  
  TotTime&#58; 18&#58;22m    SolTime&#58; 18&#58;22m
  Ply&#58; 0   Positions&#58;150   Avg Nodes&#58;       0   Branching = 0.00
  Ply&#58; 1   Positions&#58;150   Avg Nodes&#58;    1214   Branching = 0.00
  Ply&#58; 2   Positions&#58;150   Avg Nodes&#58;    2675   Branching = 2.20
  Ply&#58; 3   Positions&#58;150   Avg Nodes&#58;    4503   Branching = 1.68
  Ply&#58; 4   Positions&#58;150   Avg Nodes&#58;    6933   Branching = 1.54
  Ply&#58; 5   Positions&#58;150   Avg Nodes&#58;   11296   Branching = 1.63
  Ply&#58; 6   Positions&#58;150   Avg Nodes&#58;   17099   Branching = 1.51
  Ply&#58; 7   Positions&#58;150   Avg Nodes&#58;   25306   Branching = 1.48
  Ply&#58; 8   Positions&#58;150   Avg Nodes&#58;   42047   Branching = 1.66
  Ply&#58; 9   Positions&#58;150   Avg Nodes&#58;   75395   Branching = 1.79
  Ply&#58;10   Positions&#58;150   Avg Nodes&#58;  125270   Branching = 1.66
  Ply&#58;11   Positions&#58;150   Avg Nodes&#58;  234496   Branching = 1.87
  Ply&#58;12   Positions&#58;150   Avg Nodes&#58;  405474   Branching = 1.73
  Ply&#58;13   Positions&#58;150   Avg Nodes&#58;  678893   Branching = 1.67
  Ply&#58;14   Positions&#58;150   Avg Nodes&#58; 1238680   Branching = 1.82
  Ply&#58;15   Positions&#58;150   Avg Nodes&#58; 2112559   Branching = 1.71
  Ply&#58;16   Positions&#58;150   Avg Nodes&#58; 3727152   Branching = 1.76
  Ply&#58;17   Positions&#58;150   Avg Nodes&#58; 6264680   Branching = 1.68
  Ply&#58;18   Positions&#58;150   Avg Nodes&#58;10879795   Branching = 1.74
  Ply&#58;19   Positions&#58;150   Avg Nodes&#58;17147370   Branching = 1.58
  Ply&#58;20   Positions&#58;150   Avg Nodes&#58;27612880   Branching = 1.61
----------------------------------------------------------------

Komodo 9.02  4 threads&#58;

Time&#58; 32&#58;14

  n/s&#58; 6.067.374  
  TotTime&#58; 32&#58;14m    SolTime&#58; 32&#58;14m
  Ply&#58; 0   Positions&#58;150   Avg Nodes&#58;       0   Branching = 0.00
  Ply&#58; 1   Positions&#58;150   Avg Nodes&#58;    1778   Branching = 0.00
  Ply&#58; 2   Positions&#58;150   Avg Nodes&#58;    3397   Branching = 1.91
  Ply&#58; 3   Positions&#58;150   Avg Nodes&#58;    5269   Branching = 1.55
  Ply&#58; 4   Positions&#58;150   Avg Nodes&#58;    7564   Branching = 1.44
  Ply&#58; 5   Positions&#58;150   Avg Nodes&#58;   11859   Branching = 1.57
  Ply&#58; 6   Positions&#58;150   Avg Nodes&#58;   18462   Branching = 1.56
  Ply&#58; 7   Positions&#58;150   Avg Nodes&#58;   28404   Branching = 1.54
  Ply&#58; 8   Positions&#58;150   Avg Nodes&#58;   46493   Branching = 1.64
  Ply&#58; 9   Positions&#58;150   Avg Nodes&#58;   85164   Branching = 1.83
  Ply&#58;10   Positions&#58;150   Avg Nodes&#58;  155019   Branching = 1.82
  Ply&#58;11   Positions&#58;150   Avg Nodes&#58;  296885   Branching = 1.92
  Ply&#58;12   Positions&#58;150   Avg Nodes&#58;  537726   Branching = 1.81
  Ply&#58;13   Positions&#58;150   Avg Nodes&#58; 1004409   Branching = 1.87
  Ply&#58;14   Positions&#58;150   Avg Nodes&#58; 1913996   Branching = 1.91
  Ply&#58;15   Positions&#58;150   Avg Nodes&#58; 3288097   Branching = 1.72
  Ply&#58;16   Positions&#58;150   Avg Nodes&#58; 5768725   Branching = 1.75
  Ply&#58;17   Positions&#58;150   Avg Nodes&#58;10265609   Branching = 1.78
  Ply&#58;18   Positions&#58;150   Avg Nodes&#58;17198300   Branching = 1.68
  Ply&#58;19   Positions&#58;150   Avg Nodes&#58;30424533   Branching = 1.77
  Ply&#58;20   Positions&#58;150   Avg Nodes&#58;46554891   Branching = 1.53
----------------------------------------------------------------

Komodo 9.1   1 thread&#58;

Time&#58; 49&#58;37

  n/s&#58; 1.585.761  
  TotTime&#58; 49&#58;37m    SolTime&#58; 49&#58;37m
  Ply&#58; 0   Positions&#58;150   Avg Nodes&#58;       0   Branching = 0.00
  Ply&#58; 1   Positions&#58;150   Avg Nodes&#58;      76   Branching = 0.00
  Ply&#58; 2   Positions&#58;150   Avg Nodes&#58;     178   Branching = 2.34
  Ply&#58; 3   Positions&#58;150   Avg Nodes&#58;     442   Branching = 2.48
  Ply&#58; 4   Positions&#58;150   Avg Nodes&#58;     934   Branching = 2.11
  Ply&#58; 5   Positions&#58;150   Avg Nodes&#58;    1838   Branching = 1.97
  Ply&#58; 6   Positions&#58;150   Avg Nodes&#58;    3409   Branching = 1.85
  Ply&#58; 7   Positions&#58;150   Avg Nodes&#58;    6223   Branching = 1.83
  Ply&#58; 8   Positions&#58;150   Avg Nodes&#58;   13262   Branching = 2.13
  Ply&#58; 9   Positions&#58;150   Avg Nodes&#58;   27767   Branching = 2.09
  Ply&#58;10   Positions&#58;150   Avg Nodes&#58;   55219   Branching = 1.99
  Ply&#58;11   Positions&#58;150   Avg Nodes&#58;  110635   Branching = 2.00
  Ply&#58;12   Positions&#58;150   Avg Nodes&#58;  211810   Branching = 1.91
  Ply&#58;13   Positions&#58;150   Avg Nodes&#58;  408081   Branching = 1.93
  Ply&#58;14   Positions&#58;150   Avg Nodes&#58;  753479   Branching = 1.85
  Ply&#58;15   Positions&#58;150   Avg Nodes&#58; 1360955   Branching = 1.81
  Ply&#58;16   Positions&#58;150   Avg Nodes&#58; 2493638   Branching = 1.83
  Ply&#58;17   Positions&#58;150   Avg Nodes&#58; 4245531   Branching = 1.70
  Ply&#58;18   Positions&#58;150   Avg Nodes&#58; 6911908   Branching = 1.63
  Ply&#58;19   Positions&#58;150   Avg Nodes&#58;11728024   Branching = 1.70
  Ply&#58;20   Positions&#58;150   Avg Nodes&#58;18617928   Branching = 1.59
----------------------------------------------------------------
EBF&#58; 1.69


Komodo 9.02  1 thread&#58;

Time&#58; 57&#58;17

  n/s&#58; 1.643.862  
  TotTime&#58; 57&#58;17m    SolTime&#58; 57&#58;17m
  Ply&#58; 0   Positions&#58;150   Avg Nodes&#58;       0   Branching = 0.00
  Ply&#58; 1   Positions&#58;150   Avg Nodes&#58;      85   Branching = 0.00
  Ply&#58; 2   Positions&#58;150   Avg Nodes&#58;     178   Branching = 2.09
  Ply&#58; 3   Positions&#58;150   Avg Nodes&#58;     452   Branching = 2.54
  Ply&#58; 4   Positions&#58;150   Avg Nodes&#58;     898   Branching = 1.99
  Ply&#58; 5   Positions&#58;150   Avg Nodes&#58;    1820   Branching = 2.03
  Ply&#58; 6   Positions&#58;150   Avg Nodes&#58;    3532   Branching = 1.94
  Ply&#58; 7   Positions&#58;150   Avg Nodes&#58;    6475   Branching = 1.83
  Ply&#58; 8   Positions&#58;150   Avg Nodes&#58;   14054   Branching = 2.17
  Ply&#58; 9   Positions&#58;150   Avg Nodes&#58;   29837   Branching = 2.12
  Ply&#58;10   Positions&#58;150   Avg Nodes&#58;   61505   Branching = 2.06
  Ply&#58;11   Positions&#58;150   Avg Nodes&#58;  123503   Branching = 2.01
  Ply&#58;12   Positions&#58;150   Avg Nodes&#58;  246705   Branching = 2.00
  Ply&#58;13   Positions&#58;150   Avg Nodes&#58;  485365   Branching = 1.97
  Ply&#58;14   Positions&#58;150   Avg Nodes&#58;  842846   Branching = 1.74
  Ply&#58;15   Positions&#58;150   Avg Nodes&#58; 1530691   Branching = 1.82
  Ply&#58;16   Positions&#58;150   Avg Nodes&#58; 2784091   Branching = 1.82
  Ply&#58;17   Positions&#58;150   Avg Nodes&#58; 4760768   Branching = 1.71
  Ply&#58;18   Positions&#58;150   Avg Nodes&#58; 8066516   Branching = 1.69
  Ply&#58;19   Positions&#58;150   Avg Nodes&#58;13606344   Branching = 1.69
  Ply&#58;20   Positions&#58;150   Avg Nodes&#58;22660875   Branching = 1.67
----------------------------------------------------------------
EBF&#58; 1.72

On one core NPS seems a bit lower in the newer Komodo, but on 4 cores it is a bit faster than previous Komodo. The newer Komodo has better NPS scaling to multicore.

Effective Branching Factor decreased a bit on newer Komodo from 1.72 to 1.69.

shrapnel · Post by **shrapnel** » Tue Jun 23, 2015 9:24 pm

Laskos wrote:In short, Komodo 9.1 seems to not widen or to widen very little going to several cores. It instead deepens more than Komodo 9.02.

You are quite right.
Anyway, I already reported that Komodo 9.1 is reaching greater Depths faster than previous versions in the Topic "Komodo 9.1 Release' started by Larry Kaufman....that too without a loss in Strength.
More nps too, like you say.

Werewolf · Post by **Werewolf** » Wed Jun 24, 2015 12:08 am

I wonder if this change will affect Komodo's tactical strength on many cores.

bob · Post by **bob** » Wed Jun 24, 2015 12:47 am

I had told you "this is a bug and will probably be fixed."

It was and it was...

Nobody wants to "widen" intentionally, it is just making the tree larger than necessary.

lkaufman · Post by **lkaufman** » Wed Jun 24, 2015 5:43 am

bob wrote:I had told you "this is a bug and will probably be fixed."

It was and it was...

Nobody wants to "widen" intentionally, it is just making the tree larger than necessary.

Although we certainly find and fix bugs from time to time, this was not one of them. The widening got us plenty of elo, it was not at all wasted, but we made a deliberate decision to trade widening for depth, which our testing showed was a small net gain, and Kai has confirmed this. Good detective work, Kai!

Dann Corbit · Post by **Dann Corbit** » Wed Jun 24, 2015 8:33 am

lkaufman wrote:
bob wrote:I had told you "this is a bug and will probably be fixed."

It was and it was...

Nobody wants to "widen" intentionally, it is just making the tree larger than necessary.
Although we certainly find and fix bugs from time to time, this was not one of them. The widening got us plenty of elo, it was not at all wasted, but we made a deliberate decision to trade widening for depth, which our testing showed was a small net gain, and Kai has confirmed this. Good detective work, Kai!

Consider an engine that searches only the first node as selected by move ordering. It might be pretty good most of the time, and the engine would get absurdly deep, since it has a branching factor of 1.

But widening this search will certainly help the strength {unless it has a perfect oracle for the first move in the list as move ordering}.

In echoing Einstein's "Make things as simple as possible but no simpler."
We can also say: "Make the search as narrow as possible, but no narrower."

Laskos · Post by **Laskos** » Wed Jun 24, 2015 10:49 am

Werewolf wrote:I wonder if this change will affect Komodo's tactical strength on many cores.

I used the full STS suite pack (tactical-strategical suite) on 4 cores.
1500 positions, 10s per position.
The new Komodo 9.1 seems a bit better than Komodo 9.02, and most importantly, scales better on many cores to longer time control (from 1s/position to 10s/position). The depths searched by the new Komodo are by 1-2 plies larger.

Code: Select all

Komodo 9.1  4 threads&#58;

1394/1500  
Time&#58; 29&#58;30

   1 sec -> 1221/1500
   5 sec -> 1362/1500
  10 sec -> 1394/1500

Code: Select all

Komodo 9.02  4 threads&#58;

1383/1500  
Time&#58; 29&#58;05


   1 sec -> 1243/1500
   5 sec -> 1359/1500
  10 sec -> 1383/1500

The new Komodo starts worse at 1 second per position, and ends better than the old Komodo at 10 seconds per position. So, it suggests that the new Komodo will scale even better on more cores and longer TC, and Larry's words in this sense are probably not gratuitous.

Werewolf · Post by **Werewolf** » Wed Jun 24, 2015 1:51 pm

Thanks for testing that. It seems counter-intuitive to me, but there we go. So many tactical test-suites require finding strange moves, I would have thought a broad tree would have been better, but perhaps not.

Laskos · Post by **Laskos** » Wed Jun 24, 2015 1:59 pm

Werewolf wrote:Thanks for testing that. It seems counter-intuitive to me, but there we go. So many tactical test-suites require finding strange moves, I would have thought a broad tree would have been better, but perhaps not.

STS is not a typical tactical test suite, more a strategical one with immediate goals, many solutions are fairly quiet moves. I tested one tactical, Arasan 18 (250 positions), and the old Komodo (broader on multicore) fares indeed a bit better, so your intuition seems correct in this case.

bob · Post by **bob** » Wed Jun 24, 2015 4:45 pm

lkaufman wrote:
bob wrote:I had told you "this is a bug and will probably be fixed."

It was and it was...

Nobody wants to "widen" intentionally, it is just making the tree larger than necessary.
Although we certainly find and fix bugs from time to time, this was not one of them. The widening got us plenty of elo, it was not at all wasted, but we made a deliberate decision to trade widening for depth, which our testing showed was a small net gain, and Kai has confirmed this. Good detective work, Kai!

I consider anything that is a detriment to performance "a bug". This seems to fit that perfectly. If searching extra nodes is good (so-called "widening") then it should be just as good on a single-core search. The only way it will work on multi-core searches is if the search is too selective and the "widening" compensates for that a bit. But you can compensate in the same way in a serial search also.

Searching the SAME tree, just faster, is the goal for parallel search. Always has been. Widening is only practical if the parallel search just doesn't work very well and doesn't search any deeper with more cores. A point I don't think we have come anywhere near reaching yet, based on the test results I posted here a couple of weeks back.

Using the new Intel compiler has improved those results significantly. In any case, glad you guys are making progress. I've got a PM to send to Mark for something he should try that will be another small gain, since he has worked on hashing recently. There's another step he can take that works well in my testing.

Significant modification of Komodo SMP implementation

Significant modification of Komodo SMP implementation

Re: Significant modification of Komodo SMP implementation

Re: Significant modification of Komodo SMP implementation

Re: Significant modification of Komodo SMP implementation

Re: Significant modification of Komodo SMP implementation

Re: Significant modification of Komodo SMP implementation

Re: Significant modification of Komodo SMP implementation

Re: Significant modification of Komodo SMP implementation

Re: Significant modification of Komodo SMP implementation

Re: Significant modification of Komodo SMP implementation