What on earth is the matter with you? Crafty evidently compiles as a single unit. I think it is time for you to consult your physician.bob wrote:Please grow up. Can you do that to Linux? Give it a whirl and report back. 99% of my crafty compiles are individual files for obvious speed reasons.syzygy wrote:Apart from this being a non-argument given that compilers nowadays can easily optimise across compilation units, you have apparently forgotten about crafty.c:bob wrote:I compile search.c, hash.c and other files SEPARATELY. How can the compiler be certain there are no locks around HashStore() when it is called from search (which it can't see when compiling hash.c) and around HashProbe() when it is called?Code: Select all
/* last modified 01/18/09 */ /* ******************************************************************************* * * * This module is designed for the most efficient compiling, as it includes * * all of the source files into one large wad so that the compiler can see * * all function calls and inline whatever is appropriate. The includes are * * loosely ordered so that the most common functions occur first, to help * * with cache layout when the code is actually loaded. * * * ******************************************************************************* */ #include "search.c" #include "thread.c" #include "repeat.c" #include "next.c" #include "killer.c" #include "quiesce.c" #include "evaluate.c" #include "movgen.c" #include "make.c" #include "unmake.c" #include "hash.c" ...
How could a compiler break the lockless hashing method?
Moderators: hgm, Rebel, chrisw
-
- Posts: 5566
- Joined: Tue Feb 28, 2012 11:56 pm
Re: How could a compiler break the lockless hashing method?
-
- Posts: 5566
- Joined: Tue Feb 28, 2012 11:56 pm
Re: How could a compiler break the lockless hashing method?
You don't see a problem, yet you need to argue that a good compiler won't do this anti-optimization.bob wrote:Don't see any problem at all. Doesn't change a thing. Next line after exiting the loop access word1, so it has to load word 1 anyway. This is a bad optimization, btw, it saves nothing whatsoever and adds the extra code to load word one after the loop breaks rather than within the loop. Don't need that kind of anti-optimization and I don't believe any good compiler would choose to do something so broken...syzygy wrote:I can only assume you did not read.bob wrote:Doesn't break a thing, however.syzygy wrote:Unless I'm mistaken, the C standard allows the compiler to load a value multiple times. Since the C99 standard does not know about threads, it may load the hash entry, do the xor-trick to verify that the entry can be used, and load the hash entry again. Between the two loads the hash entry may change its content due to other threads. There you go.
Think about it.
A compiler can legally rewrite:asCode: Select all
word1 = htable->word1; word2 = htable->word2 ^ word1; if (word2 == temp_hashkey) break;
See the problem?Code: Select all
word2 = htable->word2 ^ htable->word1; if (word2 == temp_hashkey) break; word1 = htable->word1;
I suppose this means that you DO see the problem?
So this is what I should have written:
Code: Select all
word2 = htable->word2 ^ htable->word1;
if (word2 == temp_hashkey)
break;
}
if (entry < 4) {
word1 = htable->word1;
...
This rewrite is legal. The compiler may take your code and effectively produce the above code.
This rewrite breaks lockless hashing.
On architectures with few registers, a compiler may actually do this to avoid having to spill word1 onto the stack. Why store word1 on the stack when it can just as easily be reloaded from htable->word1?
Whether we can find an existing compiler for x86 that actually does this is not so interesting. What matters is that your code is broken.
And you really do not realise that reloading defeats the purpose of the xor-trick?Again, won't change a thing. I could even FORCE it to re-read it later by referring to it as volatile. But I don't care...
-
- Posts: 5566
- Joined: Tue Feb 28, 2012 11:56 pm
Re: How could a compiler break the lockless hashing method?
You are right, and Ricardo was right too.Aaron Becker wrote:There is a sequence point between any two statements, by definition (see Annex C of the C11 standard). The presence of a sequence point doesn't mean that the compiler can't optimize away the first write, because the only guarantee about program behavior is that observable results agree with the abstract machine defined in the standard. Section 5.1.2.3 of the standard is the relevant part: http://www.open-std.org/jtc1/sc22/wg14/ ... /n1570.pdfbob wrote:No. But any functional call inserted between the two lines will establish one. Right now there is NOTHING between the two writes.rbarreira wrote:Are you sure? Seems to me there is, according to what I've read on sequence points.syzygy wrote:An object is an area of memory. It's somewhere in the standard.rbarreira wrote:What does "modifying an object" mean exactly?syzygy wrote: The C standard in fact agrees with you:So I don't think eliminating the xor-operations is allowed by the C standard.5.1.2.3 ad 2 wrote:Accessing a volatile object, modifying an object, modifying a file, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment. Evaluation of an expression may produce side effects. At certain specified points in the execution sequence called sequence points, all side effects of previous evaluations shall be complete and no side effects of subsequent evaluations shall have taken place. (A summary of the sequence points is given in annex C.)
There is no sequence point between the two stores.Look at the following transcript of a shell session I just did. Notice that even though I write two values to a, only one of them is actually written.
It seems the xor's can be optimised away after all, by a sufficiently smart compiler.
-
- Posts: 741
- Joined: Tue May 22, 2007 11:13 am
Re: How could a compiler break the lockless hashing method?
Interesting question as to whether the compiler could concatenate the entire Linux kernel and compile it in one go. You can google for "unity build" to find out more about this, but this is what gcc has to say:bob wrote:Can you do that to Linux? Give it a whirl and report back. 99% of my crafty compiles are individual files for obvious speed reasons.
http://gcc.gnu.org/onlinedocs/cpp/Imple ... imits.html
Whether that will succeed for Linux remains to be seen (maybe there is some order dependency between different source / header files) but at least the bottleneck will not lie with gcc.Maximum size of a source file.
The standard does not specify any lower limit on the maximum size of a source file. GNU cpp maps files into memory, so it is limited by the available address space. This is generally at least two gigabytes. Depending on the operating system, the size of physical memory may or may not be a limitation.
-
- Posts: 900
- Joined: Tue Apr 27, 2010 3:48 pm
Re: How could a compiler break the lockless hashing method?
Now I'm going to play devil's advocate (and keep in mind I am unsure about many things here as well).syzygy wrote: You are right, and Ricardo was right too.
It seems the xor's can be optimised away after all, by a sufficiently smart compiler.
I think one of the things that may prevent this kind of optimization (besides not smart enough compilers) are function calls to things like the pthreads library or other library functions. For example, the reason why you don't need volatile on shared variables protected by locks is that the compiler will see your pthread_mutex_lock call and think "I don't know that external function, anything can happen here".
This would also be the case with any inline assembly code containing "asm volatile" (in gcc that tells the optimizer to assume the asm code has unpredictable side effects), but that's assembly code so it's of course outside the C standard.
On the other hand, compilers have been making more and more assumptions even about library calls, for example gcc can convert a printf to a puts in some cases. And when it comes to C++ which has multithreading built-in (unlike C) the compiler can make more assumptions, helping it to optimize this stuff.
Still, I wouldn't count on any compiler breaking Robert's lockless hashing anytime soon.
-
- Posts: 741
- Joined: Tue May 22, 2007 11:13 am
Re: How could a compiler break the lockless hashing method?
C11 and C++11 have essentially the same underlying memory model for multithreaded programs. C++11 adds a bunch of mutexes and other facilities on top of that, but the guarantees about instruction reordering are largely the same.rbarreira wrote:And when it comes to C++ which has multithreading built-in (unlike C) the compiler can make more assumptions, helping it to optimize this stuff.
I opened this thread with "could", not "would" in the subject line The likelihood of breakage maybe low, but it's a time bomb nevertheless, as with all forms of undefined behavior.Still, I wouldn't count on any compiler breaking Robert's lockless hashing anytime soon.
-
- Posts: 900
- Joined: Tue Apr 27, 2010 3:48 pm
Re: How could a compiler break the lockless hashing method?
I was thinking about mutexes specifically when I wrote that - if you use C++11's mutexes, I presume the compiler can do whole-program analysis to determine the potential side effects (located in whatever code uses the same mutex), instead of a blanket "don't optimize anything as the mutex can have any and all side effects".Rein Halbersma wrote:C11 and C++11 have essentially the same underlying memory model for multithreaded programs. C++11 adds a bunch of mutexes and other facilities on top of that, but the guarantees about instruction reordering are largely the same.rbarreira wrote:And when it comes to C++ which has multithreading built-in (unlike C) the compiler can make more assumptions, helping it to optimize this stuff.
-
- Posts: 27808
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: How could a compiler break the lockless hashing method?
Oh, then there isn't any real problem. Just never use the -flto, like you would avoid any -fauto-crash flag. If you are asking for trouble, you usually get it.Rein Halbersma wrote:You can read about gcc -flto here: http://gcc.gnu.org/onlinedocs/gcc-4.8.1 ... tions.html IIUC, it will never to this with -O3 alone, you need to explicitly pass -flto to both compiler and linker.
-
- Posts: 900
- Joined: Tue Apr 27, 2010 3:48 pm
Re: How could a compiler break the lockless hashing method?
That would be a very strange decision in the context of this thread - the only reason why you would risk using undefined behavior for lockless hashing is better performance (avoiding spinlocks or mutexes). Since it's performance you're after, you're crippling yourself by not using flto / IPO (as icc calls it). What's the rationale for that? At the very least you'd owe it to yourself to compare safe hashing + flto vs unsafe hashing without flto.hgm wrote:Oh, then there isn't any real problem. Just never use the -flto, like you would avoid any -fauto-crash flag. If you are asking for trouble, you usually get it.
This is not an academic question btw - many chess programs are being compiled with whole-program optimization (and in the case of Crafty it does a low-tech version of it by compiling the whole program as one .c file which #includes the other c files).
Re: How could a compiler break the lockless hashing method?
bob wrote:(1) acquire/release a lock on each probe or store. A performance killer because with a significant number of threads, and one hash probe and one hash store per node typically, the number of lock accesses is enormous. This kills performance.
Solution:
- create 256 locks
- address them by index: hash & 0xff
The chance that multiple threads are contending over the same lock is somewhat smaller.
I verified that this work. Well using Java that is.