Houdini with a six point lead near the halfway point of TCEC
-
- Posts: 1296
- Joined: Sun Mar 12, 2006 6:46 pm
- Location: Kelowna
- Full name: Tony Mokonen
Re: Houdini with a six point lead near the halfway point of
Interesting that the Komodo team are using an old version of MinGW. Not too surprising though, after reading the thread "gcc4.8 outperforming gcc5, gcc6, gcc7" in the programming forum.
-
- Posts: 5566
- Joined: Tue Feb 28, 2012 11:56 pm
Re: Houdini with a six point lead near the halfway point of
velmarin wrote: Houdart is right to decline a change.
Yes, and in my view the request to replace Komodo should have been rejected without consulting Houdart.
(Someone will bring up the Stockfish "precedent", but the Stockfish team back then never requested a replacement of the binary. They may have asked for an increase in MoveOverhead - I am not entirely sure - but what they got was the old binary they did not ask for, and which then promptly lost another game on time because the problem was unrelated to the switch to lazy smp in the first place.)
-
- Posts: 5566
- Joined: Tue Feb 28, 2012 11:56 pm
Re: Houdini with a six point lead near the halfway point of
AdminX wrote: It is important to point out that the approximately 8% speed reduction we noted on our best hardware (24 cores) is apparently as high as 23% on TCEC's 44-core machine based on Komodo's relative nodes per second vs. Houdini in Stage 2.
In my view, this has the signs of a case of false sharing, not of a compiler bug.
-
- Posts: 1494
- Joined: Thu Mar 30, 2006 2:08 pm
Re: Houdini with a six point lead near the halfway point of
AdminX wrote: It is important to point out that the approximately 8% speed reduction we noted on our best hardware (24 cores) is apparently as high as 23% on TCEC's 44-core machine based on Komodo's relative nodes per second vs. Houdini in Stage 2.
syzygy wrote: In my view, this has the signs of a case of false sharing, not of a compiler bug.
With LTO, we have seen the slowdown grow with the number of threads. Is there any way to avoid "false sharing"? I assume the C++ keywords "volatile" and "static" guide the compiler to know when it has to fetch from real memory versus cached memory. In Komodo I cannot think of two memory locations close to each other that are shared by the threads, but I will start looking for them. Each thread has several eval-related hash tables for its own exclusive use, and they should not share more than a few bytes, if even that, given the alignment when we allocate. Code is shared, but that is read and not written. Anyway, a lot to think about.
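To make the "alignment when we allocate" point concrete, here is a minimal sketch of how a per-thread table could be allocated so that it starts on a cache-line boundary and occupies whole cache lines. It assumes C++17 and 64-byte cache lines; `EvalEntry`, `EvalTable` and `make_eval_table` are hypothetical names for illustration and say nothing about how Komodo actually allocates its tables.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>

// Assumption: 64-byte cache lines, as on current x86-64 CPUs.
constexpr std::size_t kCacheLine = 64;

// Hypothetical per-thread eval hash entry; not Komodo's real layout.
struct EvalEntry {
    std::uint64_t key;
    std::int32_t  score;
};

// Deleter matching the aligned operator new used below.
struct AlignedDelete {
    void operator()(EvalEntry* p) const noexcept {
        ::operator delete(static_cast<void*>(p), std::align_val_t(kCacheLine));
    }
};
using EvalTable = std::unique_ptr<EvalEntry[], AlignedDelete>;

// Allocate a zeroed table that starts on a cache-line boundary. The byte
// count is rounded up to whole cache lines, so no other allocation can be
// handed the tail of the table's last line.
inline EvalTable make_eval_table(std::size_t entries) {
    std::size_t bytes = entries * sizeof(EvalEntry);
    bytes = (bytes + kCacheLine - 1) / kCacheLine * kCacheLine;
    void* raw = ::operator new(bytes, std::align_val_t(kCacheLine));
    auto* first = static_cast<EvalEntry*>(raw);
    std::uninitialized_value_construct_n(first, entries);  // zero the entries
    assert(reinterpret_cast<std::uintptr_t>(first) % kCacheLine == 0);
    return EvalTable(first);
}
```

Each search thread would call `make_eval_table` once for its own table. This only rules out sharing between separately allocated tables; small shared variables elsewhere (counters, flags, stop signals) can still end up on one line.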
-
- Posts: 2011
- Joined: Sun May 25, 2008 11:12 pm
- Location: Whitchurch. Shropshire, UK.
- Full name: Harvey Williamson
Re: Houdini with a six point lead near the halfway point of
velmarin wrote: The Komodo team looked for one more thing, and lost strength. Blaming the compiler is simply absurd.
Dirt wrote: No, the compiler shares the blame.
Was a change requested?
velmarin wrote: Houdart is right to decline a change.
Yes, I agree with that.
-
- Posts: 2488
- Joined: Tue Aug 30, 2016 8:19 pm
- Full name: Rasmus Althoff
Re: Houdini with a six point lead near the halfway point of
mjlef wrote: I assume the C++ key words "volatile" and "static" guide the compiler to know when it has to go fetch real memory versus cached memory.
No - because the compiler has no idea about the CPU cache.
"volatile" instructs the compiler not to optimise away accesses because the variable may have changed "from outside" the current control flow. Whether this access ends in CPU cache or in actual memory is not visible.
"static" is for scoping, thereby of course easing optimisation.
For false sharing, if you give variables A and B 64-byte alignment each, they cannot end up in the same 64-byte cache line. If that is enough to prevent the cache collision, I guess that should be fine:
Code: Select all
uint32_t __attribute__((aligned (64))) A;
uint32_t __attribute__((aligned (64))) B;
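A possible refinement, as a sketch only: aligning a variable fixes where it starts but does not change its size, so the linker is still free to place some other variable in the remaining bytes of the same line. Putting the alignment on a struct type pads `sizeof` up to a full line, so array elements cannot overlap. The example below uses the standard C++11 `alignas` instead of the GCC attribute; `ThreadStats`, `g_stats` and `cache_line_of` are hypothetical names, and the 64-byte line size is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines, as on current x86-64 CPUs.
constexpr std::size_t kCacheLine = 64;

// Hypothetical per-thread counters. alignas on the type raises both the
// alignment and (through padding) the size to one full cache line.
struct alignas(kCacheLine) ThreadStats {
    std::uint64_t nodes   = 0;
    std::uint64_t tb_hits = 0;
};

static_assert(alignof(ThreadStats) == kCacheLine, "starts on a line boundary");
static_assert(sizeof(ThreadStats) == kCacheLine, "padded to a whole line");

// Index of the cache line an address falls into.
inline std::uintptr_t cache_line_of(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) / kCacheLine;
}

// One slot per thread; neighbouring slots never share a line.
ThreadStats g_stats[8];
```

Thread `i` would then only ever write `g_stats[i]`, and a reader summing all slots touches each line read-only from its own side.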
-
- Posts: 5566
- Joined: Tue Feb 28, 2012 11:56 pm
Re: Houdini with a six point lead near the halfway point of
AdminX wrote: It is important to point out that the approximately 8% speed reduction we noted on our best hardware (24 cores) is apparently as high as 23% on TCEC's 44-core machine based on Komodo's relative nodes per second vs. Houdini in Stage 2.
syzygy wrote: In my view, this has the signs of a case of false sharing, not of a compiler bug.
mjlef wrote: We have seen an increase in slowdown due to the number of threads when LTO is used. Is there any way to avoid "false sharing"? I assume the C++ key words "volatile" and "static" guide the compiler to know when it has to go fetch real memory versus cached memory. In Komodo I cannot think of two memory fetches close to each other shared by the threads. But I will start looking for them. Threads have several eval related hashes they use solely. But they should not share more than a few bytes, if even that given alignment when we allocate. Code is shared, but that is read and not written. Anyway, a lot to think about.
Maybe "perf c2c" could help:
https://joemario.github.io/blog/2016/09/01/c2c-blog/
On a recent Linux system, perf should already have the "c2c" option.
Doing "perf c2c record -u stockfish bench 128 6 17" on my 6-core PC and then "perf c2c report --stats", I get:
Code: Select all
=================================================
Global Shared Cache Line Event Information
=================================================
Total Shared Cache Lines : 11
Load HITs on shared lines : 13
Fill Buffer Hits on shared lines : 2
L1D hits on shared lines : 0
L2D hits on shared lines : 0
LLC hits on shared lines : 11
Locked Access on shared lines : 0
Store HITs on shared lines : 0
Store L1D hits on shared lines : 0
Total Merged records : 11
Code: Select all
=================================================
Global Shared Cache Line Event Information
=================================================
Total Shared Cache Lines : 487
Load HITs on shared lines : 1692
Fill Buffer Hits on shared lines : 698
L1D hits on shared lines : 105
L2D hits on shared lines : 30
LLC hits on shared lines : 653
Locked Access on shared lines : 0
Store HITs on shared lines : 20
Store L1D hits on shared lines : 12
Total Merged records : 540
Without --stats you get an overview of the shared cachelines. I only get addresses instead of symbols even when compiled with -g, perhaps because the shared cache lines are on the heap. Or I may be missing an option.
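To check that a perf c2c setup reports what one expects, it can help to run it first on a tiny program with a known false-sharing problem. The following is a minimal sketch under the assumption of 64-byte cache lines; `Packed`, `Padded` and `hammer` are made-up names. Two threads each increment their own counter: in `Packed` the two counters sit on the same line, in `Padded` each gets a line of its own.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <thread>

// Assumption: 64-byte cache lines, as on current x86-64 CPUs.
constexpr std::size_t kCacheLine = 64;

// Two counters next to each other: logically independent, same cache line.
struct Packed {
    std::atomic<std::uint64_t> a{0};
    std::atomic<std::uint64_t> b{0};
};

// The same counters, each forced onto its own cache line.
struct Padded {
    alignas(kCacheLine) std::atomic<std::uint64_t> a{0};
    alignas(kCacheLine) std::atomic<std::uint64_t> b{0};
};

// Thread 1 only touches c.a, thread 2 only touches c.b.
// Returns the sum of both counters after both threads have finished.
template <class Counters>
std::uint64_t hammer(Counters& c, std::uint64_t iters) {
    std::thread t1([&c, iters] {
        for (std::uint64_t i = 0; i < iters; ++i)
            c.a.fetch_add(1, std::memory_order_relaxed);
    });
    std::thread t2([&c, iters] {
        for (std::uint64_t i = 0; i < iters; ++i)
            c.b.fetch_add(1, std::memory_order_relaxed);
    });
    t1.join();
    t2.join();
    return c.a.load() + c.b.load();
}
```

Calling `hammer` on a `Packed` object from `main` with a large iteration count, building with `-g -pthread`, and recording with "perf c2c record" should make the line holding `a` and `b` show up in the report; the `Padded` variant should not. The result is the same in both cases, only the cache traffic differs.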
-
- Posts: 5566
- Joined: Tue Feb 28, 2012 11:56 pm
Re: Houdini with a six point lead near the halfway point of
velmarin wrote: The Komodo team looked for one more thing, and lost strength. Blaming the compiler is simply absurd.
Dirt wrote: No, the compiler shares the blame.
Harvey Williamson wrote: Was a change requested?
velmarin wrote: Houdart is right to decline a change.
Harvey Williamson wrote: Yes, I agree with that.
That was not clear to me either from the thread's opening post, but indeed it was:
http://www.chessdom.com/houdini-with-a- ... t-of-tcec/
-
- Posts: 5566
- Joined: Tue Feb 28, 2012 11:56 pm
Re: Houdini with a six point lead near the halfway point of
For komodo9:
Code: Select all
# perf c2c record -u komodo9
Komodo 9.02 64-bit by Don Dailey, Larry Kaufman and Mark Lefler
using hardware POPCNT
info string Licensed to Komodochess.com
setoption name Hash value 128
setoption name Threads value 6
info string Threads now set to 6
go depth 24
...
quit
[ perf record: Woken up 426 times to write data ]
[ perf record: Captured and wrote 106.983 MB perf.data (1401799 samples) ]
[root@localhost Rustfish]# perf c2c report --stats
...
=================================================
Global Shared Cache Line Event Information
=================================================
Total Shared Cache Lines : 14
Load HITs on shared lines : 9138
Fill Buffer Hits on shared lines : 5033
L1D hits on shared lines : 528
L2D hits on shared lines : 54
LLC hits on shared lines : 1809
Locked Access on shared lines : 0
Store HITs on shared lines : 178
Store L1D hits on shared lines : 14
Total Merged records : 598
Doing the same with Stockfish (go depth 29 to get about the same search time):
Code: Select all
=================================================
Global Shared Cache Line Event Information
=================================================
Total Shared Cache Lines : 1
Load HITs on shared lines : 1
Fill Buffer Hits on shared lines : 0
L1D hits on shared lines : 0
L2D hits on shared lines : 0
LLC hits on shared lines : 1
Locked Access on shared lines : 0
Store HITs on shared lines : 0
Store L1D hits on shared lines : 0
Total Merged records : 1
So Komodo9 seems to have a few cache lines that are accessed relatively often by different threads.
-
- Posts: 3657
- Joined: Wed Nov 18, 2015 11:41 am
- Location: hungary
Re: Houdini with a six point lead near the halfway point of
AdminX wrote: Chessdom Write Up:
Statement by team Komodo
"....Although the bug has probably cost us some points it probably does not fully explain the current five point score deficit."
This is the essence.
Houdini is the better engine now.
I think TCEC would be a more proper competition if none of the initial competitors were replaced.