How could a compiler break the lockless hashing method?

hgm · Post by **hgm** » Mon Dec 09, 2013 11:46 am

rbarreira wrote:That would be a very strange decision ...

I am sure I would have nothing to gain from this -flto whatsoever. I wouldn't write code that could benefit from it. If global optimization tricks would exist that could make my program faster, they would already have been used in my code, and if they were not, there would be a reason why I didn't want them to be used, and I would not want to give the compiler a chance to wreck my code by these so-called 'optimizations'.

For me, a programming language is a way to instruct the hardware as to what it should do, not a blank cheque to the compiler writer to do whatever he dreams up in his crazy mind, and hope for the best.

Hiding behind a standard IMO is too feeble an excuse for abandoning common sense. Just doing something silly because the standard allows it, is malicious.

rbarreira · Post by **rbarreira** » Mon Dec 09, 2013 11:50 am

hgm wrote:
rbarreira wrote:That would be a very strange decision ...
I am sure I would have nothing to gain from this -flto whatsoever. I wouldn't write code that could benefit from it. If global optimization tricks would exist that could make my program faster, they would already have been used in my code, and if they were not, there would be a reason why I didn't want them to be used, and I would not want to give the compiler a chance to wreck my code by these so-called 'optimizations'.

For me, a programming language is a way to instruct the hardware as to what it should do, not a blank cheque to the compiler writer to do whatever he dreams up in his crazy mind, and hope for the best.

Then you should probably use a language other than C and/or stop upgrading your compiler.

As for your statement that whole-program optimization could never speed up your programs, you might be surprised if you actually tried it. Unless of course you're a superhuman who has analyzed everything related to your software.

hgm · Post by **hgm** » Mon Dec 09, 2013 11:52 am

rbarreira wrote:Then you should probably use a language other than C and/or stop upgrading your compiler.

Sure thing. That is one of the useful things I learned from this thread.

I have a very restrictive policy against upgrading in general; it is a common trend in software that things deteriorate more and more with increasing version number. I still lament the day I upgraded from Ubunto 8.04 to 10.04. Since then half my laptop is no longer working. The sound is flaky (with a stutter), and I can no longer use my USB connectors. Unfortunately they don't offer ways to easily downgrade, at Ubuntu... And look at Windows 8!

Rein Halbersma · Post by **Rein Halbersma** » Mon Dec 09, 2013 12:23 pm

hgm wrote: I still lament the day I upgraded from Ubunto 8.04 to 10.04. Since then half my laptop is no longer working. The sound is flaky (with a stutter), and I can no longer use my USB connectors. Unfortunately they don't offer ways to easily downgrade, at Ubuntu... And look at Windows 8!

Did you try to upgrade to 12.04? That's the oldest still supported version. You do realize that running unsupported versions is a security risk in general?

As far as compilers are concerned, modern versions are lightyears ahead in terms of optimization and correctness.

bob · Post by **bob** » Mon Dec 09, 2013 5:38 pm

flok wrote:
bob wrote:(1) acquire/release a lock on each probe or store. A performance killer because with a significant number of threads, and one hash probe and one hash store per node typically, the number of lock accesses is enormous. This kills performance.

Solution:
- create 256 locks
- address them by index: hash & 0xff

The chance that multiple threads are contending over the same lock is somewhat smaller.
I verified that this work. Well using Java that is.
And if you run with 16 or 32 cores? It is smaller, yes. But regardless, acquiring a lock is an inefficient process through cache. Look up the "read for ownership" description. The xor gets rid of all of that.

bob · Post by **bob** » Mon Dec 09, 2013 5:41 pm

syzygy wrote:
bob wrote:
syzygy wrote:
bob wrote:I compile search.c, hash.c and other files SEPARATELY. How can the compiler be certain there are no locks around HashStore() when it is called from search (which it can't see when compiling hash.c) and around HashProbe() when it is called?
Apart from this being a non-argument given that compilers nowadays can easily optimise across compilation units, you have apparently forgotten about crafty.c:
Code: Select all
/* last modified 01/18/09 */
/*
 *******************************************************************************
 *                                                                             *
 *   This module is designed for the most efficient compiling, as it includes  *
 *   all of the source files into one large wad so that the compiler can see   *
 *   all function calls and inline whatever is appropriate.  The includes are  *
 *   loosely ordered so that the most common functions occur first, to help    *
 *   with cache layout when the code is actually loaded.                       *
 *                                                                             *
 *******************************************************************************
 */
#include "search.c"
#include "thread.c"
#include "repeat.c"
#include "next.c"
#include "killer.c"
#include "quiesce.c"
#include "evaluate.c"
#include "movgen.c"
#include "make.c"
#include "unmake.c"
#include "hash.c"
...
Please grow up. Can you do that to Linux? Give it a whirl and report back. 99% of my crafty compiles are individual files for obvious speed reasons.
What on earth is the matter with you? Crafty evidently compiles as a single unit. I think it is time for you to consult your physician.

What is the matter with you? It compiles as a single unit "in final release mode". Not in development mode. I think it time for you to visit a psychiatrist...

There is no REQUIREMENT that any program be compiled as one single source. Therefore no assumptions can be made as to what the compiler can see at any instant in time. In addition, no compiler will look at the entire Crafty source to try to optimize across procedures. That's why we have "peephole optimizations". To limit the amount of code one looks at at an instant in time, otherwise the computational requirements become intractable.

bob · Post by **bob** » Mon Dec 09, 2013 5:42 pm

Rein Halbersma wrote:
bob wrote:Can you do that to Linux? Give it a whirl and report back. 99% of my crafty compiles are individual files for obvious speed reasons.
Interesting question as to whether the compiler could concatenate the entire Linux kernel and compile it in one go. You can google for "unity build" to find out more about this, but this is what gcc has to say:

http://gcc.gnu.org/onlinedocs/cpp/Imple ... imits.html
Maximum size of a source file.
The standard does not specify any lower limit on the maximum size of a source file. GNU cpp maps files into memory, so it is limited by the available address space. This is generally at least two gigabytes. Depending on the operating system, the size of physical memory may or may not be a limitation.
Whether that will succeed for Linux remains to be seen (maybe there is some order dependency between different source / header files) but at least the bottleneck will not lie with gcc.

Linux is not one large monolithic source. Kernel modules make this problematic since modules CAN interact, and can NOT be compiled together as one file...

bob · Post by **bob** » Mon Dec 09, 2013 5:48 pm

rbarreira wrote:
hgm wrote:Oh, then there isn't any real problem. Just never use the -flto, like you would avoid any -fauto-crash flag. If you are asking for trouble, you usually get it.
That would be a very strange decision in the context of this thread - the only reason why you would risk using undefined behavior for lockless hashing is better performance (avoiding spinlocks or mutexes). Since it's performance you're after, you're crippling yourself by not using flto / IPO (as icc calls it). What's the rationale for that? At the very least you'd owe it to yourself to compare safe hashing + flto vs unsafe hashing without flto.

This is not an academic question btw - many chess programs are being compiled with whole-program optimization (and in the case of Crafty it does a low-tech version of it by compiling the whole program as one .c file which #includes the other c files).

Did you bother to stop and ask yourself the question "Why is this option NOT the default?"

#1. to date, all it really does is allow cross-module inlining.

#2. it is slow.

syzygy · Post by **syzygy** » Mon Dec 09, 2013 5:55 pm

bob wrote:
syzygy wrote:
bob wrote:
syzygy wrote:
bob wrote:I compile search.c, hash.c and other files SEPARATELY. How can the compiler be certain there are no locks around HashStore() when it is called from search (which it can't see when compiling hash.c) and around HashProbe() when it is called?
Apart from this being a non-argument given that compilers nowadays can easily optimise across compilation units, you have apparently forgotten about crafty.c:
Code: Select all
/* last modified 01/18/09 */
/*
 *******************************************************************************
 *                                                                             *
 *   This module is designed for the most efficient compiling, as it includes  *
 *   all of the source files into one large wad so that the compiler can see   *
 *   all function calls and inline whatever is appropriate.  The includes are  *
 *   loosely ordered so that the most common functions occur first, to help    *
 *   with cache layout when the code is actually loaded.                       *
 *                                                                             *
 *******************************************************************************
 */
#include "search.c"
#include "thread.c"
#include "repeat.c"
#include "next.c"
#include "killer.c"
#include "quiesce.c"
#include "evaluate.c"
#include "movgen.c"
#include "make.c"
#include "unmake.c"
#include "hash.c"
...
Please grow up. Can you do that to Linux? Give it a whirl and report back. 99% of my crafty compiles are individual files for obvious speed reasons.
What on earth is the matter with you? Crafty evidently compiles as a single unit. I think it is time for you to consult your physician.
What is the matter with you? It compiles as a single unit "in final release mode". Not in development mode. I think it time for you to visit a psychiatrist...

bob wrote:I compile search.c, hash.c and other files SEPARATELY.

So during development you are safe and you don't care about the final release? Is that how I should understand it?

Me thinks you are really overstepping the boundaries of civil behavior. I see demons flying out of...

If you state "I do X" and somebody else indisputably shows that you do "not X", then you should admit that. You have no other choice, unless you are not a sane, reasonable adult. But no, you lack any such decency.

bob · Post by **bob** » Mon Dec 09, 2013 6:02 pm

Rein Halbersma wrote:
rbarreira wrote:And when it comes to C++ which has multithreading built-in (unlike C) the compiler can make more assumptions, helping it to optimize this stuff.
C11 and C++11 have essentially the same underlying memory model for multithreaded programs. C++11 adds a bunch of mutexes and other facilities on top of that, but the guarantees about instruction reordering are largely the same.

Still, I wouldn't count on any compiler breaking Robert's lockless hashing anytime soon.
I opened this thread with "could", not "would" in the subject line The likelihood of breakage maybe low, but it's a time bomb nevertheless, as with all forms of undefined behavior.

It's not a time-bomb. It is specifically designed so that no matter what is done to the two quadwords, nothing is broken. That hasn't changed.

As far as the compiler removing the xor's, that can't happen. Because the tt is accessed from many different places, and the compiler can't see them. Nor can it tell what is done with the tt in other parts of the code. It is possible to write out part of that data, and if it is not properly xor'ed, the reader will never understand the data. The compiler can remove xors and such for local data, because it can see where the variable comes into existence when it allocates space for it on the stack, and it can see to the end of the variable's life when the procedure returns. Any unnecessary operations within that scope can certainly be optimized away, but only as they apply to local data. Not global data.

How could a compiler break the lockless hashing method?

Re: How could a compiler break the lockless hashing method?

Re: How could a compiler break the lockless hashing method?

Re: How could a compiler break the lockless hashing method?

Re: How could a compiler break the lockless hashing method?

Re: How could a compiler break the lockless hashing method?

Re: How could a compiler break the lockless hashing method?

Re: How could a compiler break the lockless hashing method?

Re: How could a compiler break the lockless hashing method?

Re: How could a compiler break the lockless hashing method?

Re: How could a compiler break the lockless hashing method?