How could a compiler break the lockless hashing method?

syzygy · Post by **syzygy** » Mon Dec 09, 2013 10:44 am

bob wrote:
syzygy wrote:
bob wrote:I compile search.c, hash.c and other files SEPARATELY. How can the compiler be certain there are no locks around HashStore() when it is called from search (which it can't see when compiling hash.c) and around HashProbe() when it is called?
Apart from this being a non-argument given that compilers nowadays can easily optimise across compilation units, you have apparently forgotten about crafty.c:
Code: Select all
/* last modified 01/18/09 */
/*
 *******************************************************************************
 *                                                                             *
 *   This module is designed for the most efficient compiling, as it includes  *
 *   all of the source files into one large wad so that the compiler can see   *
 *   all function calls and inline whatever is appropriate.  The includes are  *
 *   loosely ordered so that the most common functions occur first, to help    *
 *   with cache layout when the code is actually loaded.                       *
 *                                                                             *
 *******************************************************************************
 */
#include "search.c"
#include "thread.c"
#include "repeat.c"
#include "next.c"
#include "killer.c"
#include "quiesce.c"
#include "evaluate.c"
#include "movgen.c"
#include "make.c"
#include "unmake.c"
#include "hash.c"
...
Please grow up. Can you do that to Linux? Give it a whirl and report back. 99% of my crafty compiles are individual files for obvious speed reasons.

What on earth is the matter with you? Crafty evidently compiles as a single unit. I think it is time for you to consult your physician.

syzygy · Post by **syzygy** » Mon Dec 09, 2013 10:58 am

bob wrote:
syzygy wrote:
bob wrote:
syzygy wrote:Unless I'm mistaken, the C standard allows the compiler to load a value multiple times. Since the C99 standard does not know about threads, it may load the hash entry, do the xor-trick to verify that the entry can be used, and load the hash entry again. Between the two loads the hash entry may change its content due to other threads. There you go.
Doesn't break a thing, however.

Think about it.
I can only assume you did not read.

A compiler can legally rewrite:
Code: Select all
    word1 = htable->word1;
    word2 = htable->word2 ^ word1;
    if &#40;word2 == temp_hashkey&#41;
      break;
as
Code: Select all
  word2 = htable->word2 ^ htable->word1;
  if &#40;word2 == temp_hashkey&#41;
    break;
  word1 = htable->word1;
See the problem?
Don't see any problem at all. Doesn't change a thing. Next line after exiting the loop access word1, so it has to load word 1 anyway. This is a bad optimization, btw, it saves nothing whatsoever and adds the extra code to load word one after the loop breaks rather than within the loop. Don't need that kind of anti-optimization and I don't believe any good compiler would choose to do something so broken...

You don't see a problem, yet you need to argue that a good compiler won't do this anti-optimization.

I suppose this means that you DO see the problem?

So this is what I should have written:

Code: Select all

    word2 = htable->word2 ^ htable->word1;
    if &#40;word2 == temp_hashkey&#41;
      break;
  &#125;
  if &#40;entry < 4&#41; &#123;
    word1 = htable->word1;
    ...

Compare to your hash.c.
This rewrite is legal. The compiler may take your code and effectively produce the above code.
This rewrite breaks lockless hashing.

On architectures with few registers, a compiler may actually do this to avoid having to spill word1 onto the stack. Why store word1 on the stack when it can just as easily be reloaded from htable->word1?

Whether we can find an existing compiler for x86 that actually does this is not so interesting. What matters is that your code is broken.

Again, won't change a thing. I could even FORCE it to re-read it later by referring to it as volatile. But I don't care...

And you really do not realise that reloading defeats the purpose of the xor-trick?

syzygy · Post by **syzygy** » Mon Dec 09, 2013 11:05 am

Aaron Becker wrote:
bob wrote:
rbarreira wrote:
syzygy wrote:
rbarreira wrote:
syzygy wrote: The C standard in fact agrees with you:
5.1.2.3 ad 2 wrote:Accessing a volatile object, modifying an object, modifying a file, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment. Evaluation of an expression may produce side effects. At certain specified points in the execution sequence called sequence points, all side effects of previous evaluations shall be complete and no side effects of subsequent evaluations shall have taken place. (A summary of the sequence points is given in annex C.)
So I don't think eliminating the xor-operations is allowed by the C standard.
What does "modifying an object" mean exactly?
An object is an area of memory. It's somewhere in the standard.

Look at the following transcript of a shell session I just did. Notice that even though I write two values to a, only one of them is actually written.
There is no sequence point between the two stores.
Are you sure? Seems to me there is, according to what I've read on sequence points.
No. But any functional call inserted between the two lines will establish one. Right now there is NOTHING between the two writes.
There is a sequence point between any two statements, by definition (see Annex C of the C11 standard). The presence of a sequence point doesn't mean that the compiler can't optimize away the first write, because the only guarantee about program behavior is that observable results agree with the abstract machine defined in the standard. Section 5.1.2.3 of the standard is the relevant part: http://www.open-std.org/jtc1/sc22/wg14/ ... /n1570.pdf

You are right, and Ricardo was right too.

It seems the xor's can be optimised away after all, by a sufficiently smart compiler.

Rein Halbersma · Post by **Rein Halbersma** » Mon Dec 09, 2013 11:11 am

bob wrote:Can you do that to Linux? Give it a whirl and report back. 99% of my crafty compiles are individual files for obvious speed reasons.

Interesting question as to whether the compiler could concatenate the entire Linux kernel and compile it in one go. You can google for "unity build" to find out more about this, but this is what gcc has to say:

http://gcc.gnu.org/onlinedocs/cpp/Imple ... imits.html

Maximum size of a source file.
The standard does not specify any lower limit on the maximum size of a source file. GNU cpp maps files into memory, so it is limited by the available address space. This is generally at least two gigabytes. Depending on the operating system, the size of physical memory may or may not be a limitation.

Whether that will succeed for Linux remains to be seen (maybe there is some order dependency between different source / header files) but at least the bottleneck will not lie with gcc.

rbarreira · Post by **rbarreira** » Mon Dec 09, 2013 11:14 am

syzygy wrote: You are right, and Ricardo was right too.

It seems the xor's can be optimised away after all, by a sufficiently smart compiler.

Now I'm going to play devil's advocate (and keep in mind I am unsure about many things here as well).

I think one of the things that may prevent this kind of optimization (besides not smart enough compilers) are function calls to things like the pthreads library or other library functions. For example, the reason why you don't need volatile on shared variables protected by locks is that the compiler will see your pthread_mutex_lock call and think "I don't know that external function, anything can happen here".

This would also be the case with any inline assembly code containing "asm volatile" (in gcc that tells the optimizer to assume the asm code has unpredictable side effects), but that's assembly code so it's of course outside the C standard.

On the other hand, compilers have been making more and more assumptions even about library calls, for example gcc can convert a printf to a puts in some cases. And when it comes to C++ which has multithreading built-in (unlike C) the compiler can make more assumptions, helping it to optimize this stuff.

Still, I wouldn't count on any compiler breaking Robert's lockless hashing anytime soon.

Rein Halbersma · Post by **Rein Halbersma** » Mon Dec 09, 2013 11:19 am

rbarreira wrote:And when it comes to C++ which has multithreading built-in (unlike C) the compiler can make more assumptions, helping it to optimize this stuff.

C11 and C++11 have essentially the same underlying memory model for multithreaded programs. C++11 adds a bunch of mutexes and other facilities on top of that, but the guarantees about instruction reordering are largely the same.

Still, I wouldn't count on any compiler breaking Robert's lockless hashing anytime soon.

I opened this thread with "could", not "would" in the subject line

The likelihood of breakage maybe low, but it's a time bomb nevertheless, as with all forms of undefined behavior.

rbarreira · Post by **rbarreira** » Mon Dec 09, 2013 11:22 am

Rein Halbersma wrote:
rbarreira wrote:And when it comes to C++ which has multithreading built-in (unlike C) the compiler can make more assumptions, helping it to optimize this stuff.
C11 and C++11 have essentially the same underlying memory model for multithreaded programs. C++11 adds a bunch of mutexes and other facilities on top of that, but the guarantees about instruction reordering are largely the same.

I was thinking about mutexes specifically when I wrote that - if you use C++11's mutexes, I presume the compiler can do whole-program analysis to determine the potential side effects (located in whatever code uses the same mutex), instead of a blanket "don't optimize anything as the mutex can have any and all side effects".

hgm · Post by **hgm** » Mon Dec 09, 2013 11:23 am

Rein Halbersma wrote:You can read about gcc -flto here: http://gcc.gnu.org/onlinedocs/gcc-4.8.1 ... tions.html IIUC, it will never to this with -O3 alone, you need to explicitly pass -flto to both compiler and linker.

Oh, then there isn't any real problem. Just never use the -flto, like you would avoid any -fauto-crash flag. If you are asking for trouble, you usually get it.

rbarreira · Post by **rbarreira** » Mon Dec 09, 2013 11:30 am

hgm wrote:Oh, then there isn't any real problem. Just never use the -flto, like you would avoid any -fauto-crash flag. If you are asking for trouble, you usually get it.

That would be a very strange decision in the context of this thread - the only reason why you would risk using undefined behavior for lockless hashing is better performance (avoiding spinlocks or mutexes). Since it's performance you're after, you're crippling yourself by not using flto / IPO (as icc calls it). What's the rationale for that? At the very least you'd owe it to yourself to compare safe hashing + flto vs unsafe hashing without flto.

This is not an academic question btw - many chess programs are being compiled with whole-program optimization (and in the case of Crafty it does a low-tech version of it by compiling the whole program as one .c file which #includes the other c files).

flok · Post by **flok** » Mon Dec 09, 2013 11:45 am

bob wrote:(1) acquire/release a lock on each probe or store. A performance killer because with a significant number of threads, and one hash probe and one hash store per node typically, the number of lock accesses is enormous. This kills performance.

Solution:
- create 256 locks
- address them by index: hash & 0xff

The chance that multiple threads are contending over the same lock is somewhat smaller.
I verified that this work. Well using Java that is.

How could a compiler break the lockless hashing method?

Re: How could a compiler break the lockless hashing method?

Re: How could a compiler break the lockless hashing method?

Re: How could a compiler break the lockless hashing method?

Re: How could a compiler break the lockless hashing method?

Re: How could a compiler break the lockless hashing method?

Re: How could a compiler break the lockless hashing method?

Re: How could a compiler break the lockless hashing method?

Re: How could a compiler break the lockless hashing method?

Re: How could a compiler break the lockless hashing method?

Re: How could a compiler break the lockless hashing method?