Re: Houdini with a six point lead near the halfway point of

Posted: Tue Nov 28, 2017 11:14 pm
by syzygy
syzygy wrote:So Komodo9 seems to have a few cache lines that are accessed relatively often by different threads.
Actually, it is a single cache line that accounts for 95% of the shared cache line hits.

This does not necessarily mean there is false sharing. It could be a single variable that is intended to be shared, such as a lock or a counter. (But if it's a counter, then it should be relatively easy to replace it with per-thread counters.)
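
To make the per-thread counter idea concrete, here is a minimal C++ sketch. It assumes a 64-byte cache line (typical for x86-64, but not guaranteed) and assumes the shared variable is something like a node counter. The names `NodeCounters`, `PaddedCounter` and `kCacheLine` are made up for the example; nothing here is taken from Komodo's source.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed cache-line size; 64 bytes is common on x86-64 but is an assumption.
constexpr std::size_t kCacheLine = 64;

// One counter per thread, padded and aligned so that each counter
// occupies a cache line of its own.
struct alignas(kCacheLine) PaddedCounter {
    std::atomic<std::uint64_t> value{0};
};

// Hypothetical replacement for a single shared counter.
class NodeCounters {
public:
    explicit NodeCounters(std::size_t threads) : slots_(threads) {}

    // Called by a search thread with its own index; it only ever
    // writes to its own slot, so threads do not contend on one line.
    void add(std::size_t thread_id, std::uint64_t n = 1) {
        assert(thread_id < slots_.size());
        slots_[thread_id].value.fetch_add(n, std::memory_order_relaxed);
    }

    // Sums all slots; meant to be called rarely (e.g. for "info nps" output).
    std::uint64_t total() const {
        std::uint64_t sum = 0;
        for (const PaddedCounter& slot : slots_)
            sum += slot.value.load(std::memory_order_relaxed);
        return sum;
    }

private:
    std::vector<PaddedCounter> slots_;
};
```

Each thread would call `add(my_index)` in its search loop, and only the reporting code would call `total()`. The relaxed atomics are there so that reading the total while the search is running is not a data race; the sum is then only approximately up to date, which is normally fine for statistics.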

It would be interesting to know if "perf c2c" reports a lot more sharing for Komodo compiled with LTO than for Komodo compiled without LTO.

It seems "perf c2c report -N" also reports node info. If there is sharing between threads on the same node but not between threads on different nodes, then cache-line sharing might not be the reason for the observed slowdown. (This could normally only be the case if Komodo does NUMA-specific things, which I don't know about.)

Re: Houdini with a six point lead near the halfway point of

Posted: Wed Nov 29, 2017 12:03 am
by royb
syzygy wrote:For komodo9:

# perf c2c record -u komodo9
Komodo 9.02 64-bit by Don Dailey, Larry Kaufman and Mark Lefler
using hardware POPCNT
info string Licensed to Komodochess.com
setoption name Hash value 128
setoption name Threads value 6
info string Threads now set to 6
go depth 24
...
quit
[ perf record: Woken up 426 times to write data ]
[ perf record: Captured and wrote 106.983 MB perf.data (1401799 samples) ]
[root@localhost Rustfish]# perf c2c report --stats
...
=================================================
    Global Shared Cache Line Event Information   
=================================================
  Total Shared Cache Lines          :         14
  Load HITs on shared lines         :       9138
  Fill Buffer Hits on shared lines  :       5033
  L1D hits on shared lines          :        528
  L2D hits on shared lines          :         54
  LLC hits on shared lines          :       1809
  Locked Access on shared lines     :          0
  Store HITs on shared lines        :        178
  Store L1D hits on shared lines    :         14
  Total Merged records              :        598
Doing the same with Stockfish (go depth 29 to get about the same search time):

=================================================
    Global Shared Cache Line Event Information   
=================================================
  Total Shared Cache Lines          :          1
  Load HITs on shared lines         :          1
  Fill Buffer Hits on shared lines  :          0
  L1D hits on shared lines          :          0
  L2D hits on shared lines          :          0
  LLC hits on shared lines          :          1
  Locked Access on shared lines     :          0
  Store HITs on shared lines        :          0
  Store L1D hits on shared lines    :          0
  Total Merged records              :          1
So Komodo9 seems to have a few cache lines that are accessed relatively often by different threads.
I've heard Larry Kaufman say (somewhere on the Internet) that there just seemed to be something holding back Komodo's search as compared to Stockfish's search. Could this be an indicator of where the problem might be?

Re: Houdini with a six point lead near the halfway point of

Posted: Wed Nov 29, 2017 12:58 am
by syzygy
royb wrote:I've heard Larry Kaufman say (somewhere on the Internet) that there just seemed to be something holding back Komodo's search as compared to Stockfish's search. Could this be an indicator of where the problem might be?
I cannot read his mind, but I don't think so. Komodo seems to have done quite well in the past on many cores, so my guess would be he was referring to SF's search as a whole including its single-threaded search. Reductions and other tricks that do work for SF but not for Komodo, perhaps... But I am only speculating here! (Possibly guided by snippets I have read here and there, but I don't have a link now. We may well have read the same thing ;).)

Whatever this shared cache line's function in Komodo 9 may be, that cache line is not the reason for the slowdown of Komodo 1970.00.

If false sharing is indeed the culprit, it is probably related to the compiler sort of accidentally placing two data structures close to each other in memory when LTO is enabled and not when disabled.
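
As a rough illustration of what "close to each other in memory" means here, the following C++ sketch checks whether two addresses fall on the same cache line and shows how `alignas` can force them apart. It assumes 64-byte cache lines, and it uses two struct members as a stand-in for two separate data structures that the linker happened to place next to each other. The names `same_cache_line`, `Adjacent`, `Separated`, `hot_a` and `hot_b` are hypothetical.

```cpp
#include <cstdint>

// Assumed cache-line size (an assumption; 64 bytes is common on x86-64).
constexpr std::uintptr_t kLineBytes = 64;

// True if both addresses lie within the same 64-byte aligned block.
inline bool same_cache_line(const void* a, const void* b) {
    return reinterpret_cast<std::uintptr_t>(a) / kLineBytes ==
           reinterpret_cast<std::uintptr_t>(b) / kLineBytes;
}

// Two variables written by different threads that ended up side by side:
// hot_b sits 8 bytes after hot_a, so both share one cache line.
struct Adjacent {
    alignas(64) std::uint64_t hot_a;
    std::uint64_t hot_b;
};

// Same data, but each variable is aligned to its own cache line.
struct Separated {
    alignas(64) std::uint64_t hot_a;
    alignas(64) std::uint64_t hot_b;
};
```

For an `Adjacent` object, `same_cache_line(&x.hot_a, &x.hot_b)` is true; for a `Separated` object it is false, at the cost of 128 bytes instead of 16. The same `alignas(64)` can be put on global objects, which would make their placement independent of whatever layout the compiler or linker picks with or without LTO.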

There are also other possibilities, such as LTO triggering excessive inlining that blows up the instruction cache, but I would expect that to show up with any number of threads, not just with 24 and getting worse with 43.

All of my speculations here may be completely off. But they made me find out about "perf c2c", which for sure is very useful. All potential TCEC authors should run it on their programs (if they compile on Linux) to prevent the type of problem that Laser and Nemorino had.

Re: Houdini with a six point lead near the halfway point of

Posted: Wed Nov 29, 2017 2:25 am
by mjlef
Ron,

this stuff looks great. I tried installing perf on a few linux boxes but it responds:

"perf: 'c2c' is not a perf-command. See 'perf --help'."

What did you do to get a suitable perf?

Mark

Re: Houdini with a six point lead near the halfway point of

Posted: Wed Nov 29, 2017 5:17 am
by Dirt
royb wrote:So, a 23% speed reduction would reduce the playing strength of Komodo 1970 by how much? 15 Elo?
Robert H. estimated 9 Elo, as per your link. I think that's close.
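
The two guesses can be compared with a bit of arithmetic, under the common rule-of-thumb assumption that Elo scales linearly with log2 of search speed. That model is an assumption, not something stated in this thread, and the function names below are made up. A 23% reduction leaves 77% of the speed, which is -log2(0.77), or about 0.38 doublings lost.

```cpp
#include <cmath>

// Speed doublings lost for a fractional nps reduction (0.23 means 23% slower).
inline double doublings_lost(double reduction) {
    return -std::log2(1.0 - reduction);
}

// Elo lost, ASSUMING strength is linear in log2(speed) with the given
// Elo-per-doubling value. The per-doubling value is an input, not a known fact.
inline double elo_loss(double reduction, double elo_per_doubling) {
    return doublings_lost(reduction) * elo_per_doubling;
}

// The Elo-per-doubling value that a given estimate implies under that model.
inline double implied_elo_per_doubling(double reduction, double elo) {
    return elo / doublings_lost(reduction);
}
```

With these, `implied_elo_per_doubling(0.23, 9.0)` comes out at about 24 Elo per doubling and `implied_elo_per_doubling(0.23, 15.0)` at about 40. So the two estimates differ mainly in how much a doubling of speed is assumed to be worth at this time control and thread count.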

Re: Houdini with a six point lead near the halfway point of

Posted: Wed Nov 29, 2017 8:50 am
by syzygy
mjlef wrote:What did you do to get a suitable perf?
My linux system already had one: perf version 4.13.12.200.fc26 (Fedora 26).

Re: Houdini with a six point lead near the halfway point of

Posted: Wed Nov 29, 2017 3:25 pm
by Modern Times
syzygy wrote:
velmarin wrote:Houdart is right to decline a change.
Yes, and in my view the request to replace Komodo should have been rejected without consulting Houdart.
I agree also, but I guess there was a slight chance that he would have agreed to it so they put the question.

The problem for Robert and Houdini is that some people may view it as a tainted or "false" victory if Houdini wins, because Komodo played with a version that had a bug. That would be grossly unfair of course and not right, but some people would have their doubts.

Robert is quite right - you keep updating your version during the tournament and you take the associated risks that you may introduce a bug, which is what happened here. Or you play it safe and use a proven version. You can't have your cake and eat it too.

It isn't relevant that it seems to be a compiler issue rather than a coding error. You could argue too of course that using an old version of the compiler is a risk in itself. I wonder if the same slowdown would have happened with a more recent version (while acknowledging that the newer versions in themselves are consistently slower according to posts in the Programming Section).

Re: Houdini with a six point lead near the halfway point of

Posted: Wed Nov 29, 2017 3:34 pm
by AdminX
Modern Times wrote:
syzygy wrote:
velmarin wrote:Houdart is right to decline a change.
Yes, and in my view the request to replace Komodo should have been rejected without consulting Houdart.
I agree also, but I guess there was a slight chance that he would have agreed to it so they put the question.

The problem for Robert and Houdini is that some people may view it as a tainted or "false" victory if Houdini wins, because Komodo played with a version that had a bug. That would be grossly unfair of course and not right, but some people would have their doubts.

Robert is quite right - you keep updating your version during the tournament and you take the associated risks that you may introduce a bug, which is what happened here. Or you play it safe and use a proven version. You can't have your cake and eat it too.

It isn't relevant that it seems to be a compiler issue rather than a coding error. You could argue too of course that using an old version of the compiler is a risk in itself. I wonder if the same slowdown would have happened with a more recent version (while acknowledging that the newer versions in themselves are consistently slower according to posts in the Programming Section).
Image

Re: Houdini with a six point lead near the halfway point of

Posted: Wed Nov 29, 2017 7:48 pm
by syzygy
Modern Times wrote:
syzygy wrote:
velmarin wrote:Houdart is right to decline a change.
Yes, and in my view the request to replace Komodo should have been rejected without consulting Houdart.
I agree also, but I guess there was a slight chance that he would have agreed to it so they put the question.
And exactly that is why, in my view, they should not have put the question to him.
The engine programmers can provide updates only before the Stage or Superfinal start, not during. However, there will be no extra testing between stages, meaning that this is a gamble if the engine could be unstable.
...
In the case of a serious, play-limiting bug (like crashing or interface communication problems, not including losses on time) not discovered during the pre-Season testing, the engine can be updated once per Stage to fix this/these bugs only.
I think all agree that 23% lower nps is not a play-limiting bug "like crashing".

It is not fair to shift the blame/responsibility for not replacing Komodo to Houdart. (But luckily it seems that most people understand his refusal.)

(Were the other developers asked whether they could agree to a replacement of Nemorino and Laser in stage 1?)

Re: Houdini with a six point lead near the halfway point of

Posted: Wed Nov 29, 2017 8:14 pm
by syzygy
syzygy wrote:
mjlef wrote:What did you do to get a suitable perf?
My linux system already had one: perf version 4.13.12.200.fc26 (Fedora 26).
The perf sources are part of the kernel tree:
https://github.com/torvalds/linux/tree/ ... tools/perf