How could a compiler break the lockless hashing method?

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

rbarreira
Posts: 900
Joined: Tue Apr 27, 2010 3:48 pm

Re: How could a compiler break the lockless hashing method?

Post by rbarreira »

bob wrote:
rbarreira wrote:
bob wrote:
Rein Halbersma wrote:
bob wrote: Sorry you feel that way. I simply have my opinion, developed over several years of compiler development.
Objection, facts not in evidence. Please show us which conforming C compiler you actually developed? Or any other compiler that was used by anyone except yourself, your students or coworkers.
Didn't write a C compiler. Did work on a FORTRAN compiler for a long time that was used in production on TI supercomputers. Did work on a BASIC compiler (not interpreter) that was used in production for several years, both within academic labs and outside. I also wrote a (puke) Pascal compiler that was used by students to write assignments in for many years; Pascal was not available on the machine we were using (a Xerox box) at the time. That enough?

Pick up a good compiler book. One of the classic tests for optimizers is to produce the exact same results as the unoptimized code. C does not meet that standard. It did in the past.
When exactly did C meet that standard? Give me a compiler that has a working optimizer and I can easily make programs that give different results based on the optimization level.
Most older gcc's, for example. Feel free to try. But the program MUST work correctly with no optimization, and not be based on intentional bad programming such as not defining a value before it is used...
Which version of gcc?
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: How could a compiler break the lockless hashing method?

Post by bob »

wgarvin wrote:
bob wrote:
wgarvin wrote:
rbarreira wrote:Bob, I'm starting to wonder if your account has been taken over by someone. Now you're displaying ignorance about basic C and integer numbers in one single post.
I know. It's like talking to a wall. He just thinks he knows better than everyone else, even when he's repeating the same arguments that have been refuted multiple times already in the same thread.
Sorry you feel that way. I simply have my opinion, developed over several years of compiler development. Answer me this:

If you run an ASM program on an original Intel pentium, and then you run the same code on the pentium PRO which supported OOE, and you get DIFFERENT answers, would you consider that acceptable? Why or why not?
I'd be disappointed, because one of the selling points Intel used to get people to buy Pentiums was its backward-compatibility with 8086, 80286, 80386 and 80486 code. Nonetheless, there was the fdiv bug, leading to a limited recall of the affected parts. And there are minor differences and errata between processor generations, though Intel and AMD mostly did a great job keeping them backward-compatible.
bob wrote: If you compile the same source with -O0 and then with -O3 and you get different answers, would you consider that acceptable? Why or why not?
Well, if my source was a correct, standard-conforming program then it might indicate a compiler bug. But 99 times out of 100 when this happens, it's because my code has a bug in it, and in that situation it's completely acceptable. If I'm accessing out of bounds in an array, that's undefined behavior. If I've overflowed some pointer arithmetic, that's undefined behavior. If I've shifted an N-bit int by N or more bits, that's undefined behavior. I don't expect my compiler to magically translate my broken source code into working programs in such cases, so I'm not surprised when it fails to do so.
The assumption is a program that produces NO warnings, no undefined variables, and such, but possibly with things inside that might or might not cause UB. More specifically:

(1) program conforms to standard exactly, no chance of UB anywhere (I don't see how such a program can be written, but that is a different topic.)

(2) program conforms to the standard in general, but has a potential UB (let's stick with int overflow). Somewhere it does the usual a = b + c, but the sum is too big. Should you get different answers with or without optimization? Not on my watch.
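For concreteness, here is a minimal sketch (hypothetical names) of the case (2) scenario: the same `b + c` written two ways. The unsigned version spells out the wraparound, which is well defined; the plain signed version is UB on overflow, which is exactly where -O0 and -O3 are allowed to diverge. The final conversion back to `int` is implementation-defined but wraps on typical two's-complement targets.

```c
#include <limits.h>

/* Two's-complement wraparound for b + c, written with unsigned math,
   which is defined to wrap mod 2^32. */
int wrap_add(int b, int c)
{
    return (int)((unsigned)b + (unsigned)c);
}

/* The same expression written directly on ints: UB when the sum
   overflows, so an optimizer may assume it never does. */
int plain_add(int b, int c)
{
    return b + c;   /* the "a = b + c" from the text */
}
```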



bob wrote: Finally, do your two answers match? If not, why? If both are yes, we probably can't communicate at all. Semantics are semantics. I don't want the computer hardware changing them on the fly just so it can execute things faster. If both are "no" then we seem to be in perfect agreement.
So I answered Yes to the first q, and No to the second, as would any programmer who actually knows both x86 and C.
I actually know both and have been programming in C for over 30 years, and x86 since the first 8080 hit the street. I think that qualifies me as knowing both. I find it unacceptable that a compiler would take a C program and convert into something sub-optimal, compared to what I would do by hand. And I am talking about direct coding here, not clever optimizations. If, for some reason, a programmer uses x * 2 / 2 then I assume he has a reason. Don't assume anything, just do what he says. The compiler's job is NOT to "fix stupid". It is to convert the program it is given into something that executes and produces the results according to that program, not according to what it "thinks" the programmer meant.

bob wrote: Because today, OOE is guaranteed to produce the exact same results as if the program is executed in-order. But compilers provably do produce different results with -O0 and -O3 as already discussed. I consider that ridiculous. Even more, I consider it poor compiler development.
If your program gives different results when compiled at -O0 and -O3 then either it is not a correct program (which is extremely common) or it is depending on extensions to the language, extra semantics that the compiler vendor has provided, or an extra spec/set of library semantics such as pthreads (this is also extremely common). In any case, as I have said several times already, only the programmer has the power to produce correct programs. If you write an incorrect program there is no way your compiler can magically fix it for you. If you think you "know" the language better than the compiler does, and refuse to believe your program is broken, then neither I nor your compiler can help you.
bob wrote:
He gives examples from x86 code of "how compilers ought to do it" that haven't been current for almost 20 years, like his ridiculous imul / idiv example. Come on, bob! There's more than one form of imul instruction in x86 assembly, and there has been for a long time.
RTFM. Why do you think they give us the one-operand imul instruction? Because of the possibility of overflow; it gives us a solution to prevent it completely between a successive multiply/divide pair. No, you don't have to use 'em. And if you don't, you will get the expected 32-bit wrap whether you do imul or mul.
Right. And your C compiler doesn't have to use 'em, because of the way the * operator is specified in C. And when it evaluates "X * 2 / 2" and that "expected 32 bit wrap" occurs, the result is not the original value of X. But because of the way the language spec is written, compilers are allowed to optimize it back down to "X", and they do.

And I recall either you or hgm objected that nobody would write "X * 2 / 2" on purpose anyway... but that point was also addressed by Chris Lattner in those blog posts I have quoted and linked here multiple times. Stupid expressions like that often result from language abstractions (macro expansion, inlining, and, with C++, templates). Also, give some thought to loop induction variables. They might not overflow at the same points as the original variables would, so the expression might have different values in those cases. But since any overflow of the original variables makes it into undefined behavior, the compiler can safely use loop induction variables in those cases.
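One way such expressions arise from abstraction can be sketched with a hypothetical macro pair, where neither side looks stupid on its own but the expansion is literally `x * 2 / 2`:

```c
/* Hypothetical fixed-point-style helpers: after expansion,
   FROM_HALVES(TO_HALVES(x)) is literally ((x) * 2) / 2, which the
   compiler may fold to x because signed overflow is UB. */
#define TO_HALVES(x)   ((x) * 2)
#define FROM_HALVES(h) ((h) / 2)

int roundtrip(int x)
{
    return FROM_HALVES(TO_HALVES(x));
}
```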

I don't believe the compiler writers would bother to do this stuff if it didn't actually speed up some actual programs out there. Their job is to generate the best possible code while complying with the spec. Our job is to write programs, but if we want those programs to work reliably, we also have to comply with the spec.
bob wrote:
You can use the one you quoted, but it has severe restrictions on which registers it can use. The one compilers usually use is the 32 * 32 -> 32 bit form, because it can be used with any of the general-purpose registers. And also because they know they only need 32 bits of result, since thats how the C language has been specified, since forever. Compilers also sometimes implement the multiply using shifts and adds, and they definitely implement divide by a constant using other instructions besides idiv, because idiv can be slow. All of this has been true for decades.
You make my point. They don't do it the "right" way according to the hardware; they do it the way that is compatible with hardware that doesn't have that option, even though most existing hardware today is capable of it. On most non-x86 boxes that have real registers, if you multiply x into an even-numbered register, you get an even+odd register pair for the product. If you multiply x into an odd-numbered register, you get a single register value that will overflow much sooner than the pair of registers.

But ignoring that, let's take a simple example:

x = 0x40000000 on a 32-bit box, but it is passed into a function so that the compiler can't see the value.

x * 2 / 2 produces 0x40000000 as expected.

But if you use 0x40000000 * 2 / 2, broken constant folding turns that into 0xc0000000, which is not quite the same. Why does it do them differently? Shouldn't it at LEAST be consistent in mathematical operations? Does "*" imply some different operation when applied to constants vs variables? If so, why?
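The two behaviors bob contrasts can be written out explicitly. This is a sketch with hypothetical names; the wrapped variant spells the wrap out in unsigned arithmetic, and relies on the typical two's-complement conversion back to `int` (which is implementation-defined):

```c
/* Wrapped 32-bit multiply then divide: for x = 0x40000000 the product
   wraps to 0x80000000 and the divide yields 0xc0000000 as an int. */
int muldiv_wrapped(int x)
{
    unsigned ux = (unsigned)x * 2u;  /* wraps mod 2^32, well defined */
    return (int)ux / 2;              /* conversion back is impl-defined */
}

/* Keeping the full product (what a one-operand imul feeding idiv does):
   the round trip returns x whenever the quotient fits in 32 bits. */
int muldiv_wide(int x)
{
    return (int)((long long)x * 2 / 2);
}
```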


Bob's stubbornness in a debate is pretty amazing. I can only imagine the cognitive dissonance it must cause to argue, and believe, such a mix of contradictory and nonsensical positions.
Feel free to cite my "nonsensical position". If you think consistency is nonsensical, I can see why we don't agree. But that doesn't mean my belief that consistency is reasonable to expect is flawed.
I admit I was tempted to go through the whole thread and make a list of all the things that I considered nonsensical. I think it would be a long and tedious task, and of little value in the end.

I agree that consistency is a nice property where it's possible to have it, and that compiler writers should maybe do more than they have been doing to try to protect us from the dark corners of undefined behavior in C and C++. But you also have to acknowledge that locking down some of those undefined behaviors has costs: performance will suffer a bit. I think there is growing recognition lately that other things besides performance are pretty important too (safety, reliability, the ability to reason about the program's behavior by reading the source code, etc.), so hopefully over the next few years the overall situation will get better.
Consistency again. We both agree "performance is pretty important"? Then do we both agree that detecting overlapping addresses in strcpy() is not free?

Instead of the simple:

while ((*dest++ = *src++)); loop, you first have to search down src to find the NUL, and then ask:

is dest < src && dest + length < src
OR is src < dest && src + length < dest

AND of course

src != dest.

If we satisfy those, we again fall back to the simple loop. Or we might even use memcpy since we already know the length. Or we might ignore the src/dest tests and use memmove() instead.
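The steps above can be sketched as a hypothetical checked strcpy (this is my illustration of the idea, not Apple's actual libc code; note also that comparing pointers into different objects is itself only well defined within one object, which is part of why real libraries avoid this):

```c
#include <string.h>

/* Hypothetical overlap-checking strcpy as sketched in the text:
   one extra pass to find the NUL, a range test, then memcpy for the
   disjoint case or memmove for the overlapping one. */
char *checked_strcpy(char *dest, const char *src)
{
    size_t n = strlen(src) + 1;          /* include the terminating NUL */
    if (dest == src)
        return dest;                     /* nothing to copy */
    if ((dest < src && dest + n <= src) ||
        (src < dest && src + n <= dest))
        return memcpy(dest, src, n);     /* provably disjoint */
    return memmove(dest, src, n);        /* overlapping: copy safely */
}
```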

So did Apple help us here with performance? No, it is slower. Did they fix any bugs? No. Did they improve functionality (as in redirecting to memmove if they overlap)? No.

So exactly what did they accomplish other than to make a bunch of people do a bunch of debugging because the code is ONLY broken on Mavericks, nowhere else?

Sort of like this: a railroad track has some specifications: width of rail, thickness, spacing between rails. Suppose there is a note that says "it is not advisable to use soft wheels on the track, because if they disintegrate a crash will occur." Seems simple enough. Along come many companies that take their standard trucks, put offset wheels on them to make the tires line up with the rails, and then put small lipped steel wheels on the front and back to force the truck to follow the tracks. The various train companies then start to use these vehicles to service the tracks, much cheaper and faster than having an engine run down the tracks. And a track manufacturer says "Nothing says I can't make this track surface spiked rather than smooth, to help break ice that forms, and it won't hurt the trains since they have steel wheels." The truck driver gets a surprise, however. No crash, as he protected against that. But he gets stuck, because his rubber wheels are shredded, and those are how he applies power to the rails to move his truck. Acceptable? BTW, a study later shows the spikes have no influence on ice: the weight of an engine is so great that no ice forming on the track can withstand the pressure produced by the engine's wheels, so the ice causes no problems anyway.

But the truck drivers are not very happy; they were doing their job with something that had worked for 50+ years, and suddenly it no longer did because some wise-ass decided "anyone using rubber wheels on a track is outside the specs, and subject to undefined behavior. We'll teach those SOBs to follow the specs EXACTLY or else..."

Sounds like Apple?

All wheels rubber would represent a problem. Main wheels steel but applying power through a rubber wheel does NOT represent a problem. But they sure got taught a lesson, eh?

There is no reason to break something that is not broken, unless it is to make it better. Apple did not make it better, although they could have.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: How could a compiler break the lockless hashing method?

Post by bob »

rbarreira wrote:
bob wrote:
rbarreira wrote:
bob wrote: What I am asking of the compiler is "get out of the way and let the hardware do what it will do." Don't try to make assumptions.
That is not a reasonable request of you, because not only is there different hardware out there but even on x86 there are many different ways to calculate "x * 2":

- the compiler can use imul
- the compiler can use add
- the compiler can use shl
- the compiler can use lea
- the compiler could conceivably use SIMD (SSE) instructions, if it's multiplying several values in a loop.

Why do you assume only one of those is a valid response by the compiler?
If you notice * and / in the same expression, imul with one operand is a natural choice: "do the right thing." My students do this on class assignments without having to be told, because they want to make sure that my input won't break their code. Multiplying two numbers > 65535 is problematic, unless you then divide by something.

So, "do the right thing" as opposed to "do something." I mean, for an int multiply, it COULD use the floating point processor. Should it?
So you're asking compilers to detect the presence of divisions in an expression, and use a different instruction for multiplication if that is the case?

I should stop arguing with you but this is starting to become hilarious, so I probably won't.
Nope. If you "do the right thing" it will work as I want. Just do the math. Don't try to get cute with the optimizing and think I mean something I don't, just because there is a potential for integer overflow there. BTW don't you think it a little inconsistent to argue as you do yet still have code with potential undefined behavior in it? Or do you write in C and check EVERY add, subtract, multiply and divide operation for overflow before performing it? I'll bet if you look at gcc, it doesn't either, in its source code. Meanwhile, back at the ranch, the compiler optimizer guys are breaking things as quickly as they can for optimizations that are REALLY not very significant.

I realize there is a definite risk in fixed-word-length architectures. I accept that when I choose to use such an architecture. If it is going to overflow, let it overflow. If it is not, let it complete. Don't arbitrarily toss the code out in some cases and leave it alone in others. Trying to handle just a subset of overflows is a REAL problem, because it makes compiler output unpredictable. Don't assume that if someone writes if (a + 1 > a) they really mean if (1). If they meant if (1) they would likely have written if (1). Overflow can occur on any arithmetic operation. There is nothing the compiler can do, other than check the overflow flag after every instruction, and nobody would tolerate that slow-down. So leave it alone rather than assuming I meant something different from what I wrote. If I write it one way, compile it that way. If you want, produce a warning that says "Hey, you wrote if (a + 1 > a), which is always true unless an overflow occurs; you might consider writing if (1) instead. UNLESS you think an overflow might occur, in which case you wrote exactly the right thing."
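For reference, the `if (a + 1 > a)` test can be sketched both ways (hypothetical function names): as written, where the addition is UB at the limit and an optimizer may fold the whole test to true, and in the well-defined form that compares against the limit instead of performing the possibly-overflowing addition.

```c
#include <limits.h>

/* The test as bob describes it: when a == INT_MAX the addition is UB,
   so an optimizer is entitled to fold this to 1. */
int naive_no_overflow(int a)
{
    return a + 1 > a;        /* UB at a == INT_MAX */
}

/* The well-defined equivalent: ask the question without doing the
   addition at all. */
int safe_no_overflow(int a)
{
    return a < INT_MAX;
}
```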

Much better.

Why don't you do what I did? Compile a few big programs with and without -fwrapv, and see if you can find any speed difference. I tried several chess engines and found absolutely no difference in speed. For the record, I compiled (a) crafty, (b) stockfish, (c) gnu chess 6.0something (whatever is current on macports), (d) fruit, (e) Toga 2, and (f) an older version of Arasan. If this were a useful optimization, wouldn't you expect at least one of those integer-only programs to show an improvement? I mean, we KNOW it breaks some silly programs, like the infinite-loop example. Where is the benefit to justify taking such a risk? Surely there must be some actual programs that this helps, somewhere. I might even try building gcc itself with -fwrapv to see if it makes the compiler run slower. However, something tells me "no". I'll report back on this a bit later; it does take a while to compile.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: How could a compiler break the lockless hashing method?

Post by bob »

Rein Halbersma wrote:
bob wrote:
rbarreira wrote: 2 billion CPUs in 3 months is not enough for you?

http://news.softpedia.com/news/ARM-Reco ... 6252.shtml
OK, my mistake.
"Just as I thought I was out, they pull me back in."
Mods, please make that post sticky! That single utterance alone was worth the price of admission.
Your quote was ambiguous. Here is what "my mistake" was a response to: the shift-left vs. shift-right question, where I thought someone said shifting left on signed values was undefined. In reality it was shifting right, something most architectures do perfectly. It was not about the number of CPUs around, where x86 still has an enormous lead over ARM or anything else.
bob wrote: OK, my mistake. Then why does gcc do it using SAR to replace divide,
BTW, since you do seem to have some native-English issues: a fact is always correct, else it is not a fact. An opinion is never correct or incorrect; it is an opinion, a belief, etc. I've been wrong multiple times in the past and have said so here. You've been pretty far "out there" in this discussion yourself, but you are free to have any opinion you want about the topic.
User avatar
hgm
Posts: 27809
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: How could a compiler break the lockless hashing method?

Post by hgm »

bob wrote:All wheels rubber would represent a problem. Main wheels steel but applying power through a rubber wheel does NOT represent a problem. But they sure got taught a lesson, eh?
And all regular trains get a 20% higher energy bill because of the extra resistance they have to overcome, due to the spikes...

But we are really drifting very far now from the topic of this thread, which was not strcpy but the lockless hashing. The strcpy() is just a side track to refute arguments that compiler writers would not do intentionally malicious things, and that all things defined as UB would have an upside in terms of increased performance for at least some users. Not so! In the strcpy case everyone obviously suffers. No need to discuss it further. Compiler writers can be purely malicious if the standard allows them.

The question I worry about, however, is if similarly malicious future compiler writers could abuse the standard in a similar way to force people to use expensive locks on code like the lockless hashing that would work perfectly without them (i.e. making a one-time copy from a volatile hash table, and verifying and using the data only from that copy), by arguing absence of such locks falls under the standard's definition of UB, and use that as an excuse to summon nasal demons?
syzygy
Posts: 5566
Joined: Tue Feb 28, 2012 11:56 pm

Re: How could a compiler break the lockless hashing method?

Post by syzygy »

bob wrote:
syzygy wrote:Do you have any clue what you are asking from the compiler when you write x * 2 / 2?

If x is a signed int, and you want signed overflow to wrap, which so far you have always wanted, then what you are asking is to get -12768 when x = 20000 and ints are 16-bit.

If you do not understand this, you should not touch a compiler.
What I am asking of the compiler is "get out of the way and let the hardware do what it will do." Don't try to make assumptions.

x * 2 / 2 is done correctly. The compiler can replace it with x if it wants, or it can do the multiply/divide which STILL produces x. But the compiler comes up with a different answer when IT chooses what should happen. That I do not like.
Let's switch to 32-bit ints. We take x = 1500000000.

Code:

  int p = x * 2 / 2;
  int q = (x * 2) / 2;
  int r = x * 2;
  int s = r / 2;
What should be the values of p, q, r, s?
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: How could a compiler break the lockless hashing method?

Post by bob »

rbarreira wrote:
bob wrote: No. You just refuse to read. Use the best math available on the cpu, and let overflow happen when it must.
But that is exactly what the compilers try to do. They use the "best math" (in which best is usually dictated by performance and not some irrelevant notion of "best" which can still give different results depending on the hardware).
So when a compiler sees (a + 1 > a) the "best math" says that is (1), which means the entire thing is optimized away? ALL computers have to deal with overflow/underflow. It is a necessary condition of fixed length words. Making assumptions that are not valid within that domain seems both risky and foolish.

As far as how to do this, have you looked at the multiply and divide instructions? Particularly the divide? If so, you might note that the edx:eax target is the only choice. Which means the one-operand imul is the logical multiply, since it stores the product in both registers, ready for the idiv instruction. Otherwise you have to put the dividend in eax and do a cdq to fill in edx. So it is actually simpler to do it that way when a divide immediately follows a multiply. Yes, you will get two different results: if the multiply overflows, the code I gave gives the right answer; the other approach (single-register multiply) does not. Which is "the best thing to do"? I use imul regularly, and most of the time in the more traditional imul dest,src form, which leaves the product in the single register dest, because most of the time I don't follow up with a divide (the most common reason is array address calculations). If overflow happens, let it happen. If you want to be GOOD, you can use the code I gave to make the * followed by / produce a sane result, assuming the divisor is big enough to convert the > 32-bit product back to a 32-bit or smaller quotient. Otherwise it will STILL overflow.
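In C terms, the widened multiply/divide pair described here (a 64-bit product in edx:eax, then a 64/32 divide) can be mimicked like this. A sketch with a hypothetical name, assuming the caller guarantees the quotient fits in 32 bits:

```c
/* Mimics a one-operand imul followed by idiv: keep the full 64-bit
   product so an intermediate overflow of x * m cannot corrupt the
   quotient. The caller must ensure the result fits in an int. */
int muldiv64(int x, int m, int d)
{
    long long wide = (long long)x * m;   /* full product, no wrap */
    return (int)(wide / d);
}
```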

What I would like to see is that the compiler translates what I write into machine code such that what I wrote is what was done. If it can rearrange instructions, or hoist loop invariant stuff out of the loop, and such, that's all fine since it doesn't change the semantics of what I wrote one scintilla. But when it starts to actively change my code, repeating the mantra "if behavior is undefined, I can do whatever I want" then I have a problem with it. For integer overflow, undefined does NOT mean "unexpected". I can tell you exactly what the result will be for any two operands you supply. That is not exactly "undefined".
rbarreira
Posts: 900
Joined: Tue Apr 27, 2010 3:48 pm

Re: How could a compiler break the lockless hashing method?

Post by rbarreira »

bob wrote:
rbarreira wrote:
bob wrote:
rbarreira wrote:
bob wrote: What I am asking of the compiler is "get out of the way and let the hardware do what it will do." Don't try to make assumptions.
That is not a reasonable request of you, because not only is there different hardware out there but even on x86 there are many different ways to calculate "x * 2":

- the compiler can use imul
- the compiler can use add
- the compiler can use shl
- the compiler can use lea
- the compiler could conceivably use SIMD (SSE) instructions, if it's multiplying several values in a loop.

Why do you assume only one of those is a valid response by the compiler?
If you notice * and / in the same expression, imul with one operand is a natural choice: "do the right thing." My students do this on class assignments without having to be told, because they want to make sure that my input won't break their code. Multiplying two numbers > 65535 is problematic, unless you then divide by something.

So, "do the right thing" as opposed to "do something." I mean, for an int multiply, it COULD use the floating point processor. Should it?
So you're asking compilers to detect the presence of divisions in an expression, and use a different instruction for multiplication if that is the case?

I should stop arguing with you but this is starting to become hilarious, so I probably won't.
Nope. If you "do the right thing" it will work as I want. Just do the math. Don't try to get cute with the optimizing and think I mean something I don't, just because there is a potential for integer overflow there. BTW don't you think it a little inconsistent to argue as you do yet still have code with potential undefined behavior in it? Or do you write in C and check EVERY add, subtract, multiply and divide operation for overflow before performing it? I'll bet if you look at gcc, it doesn't either, in its source code. Meanwhile, back at the ranch, the compiler optimizer guys are breaking things as quickly as they can for optimizations that are REALLY not very significant.

I realize there is a definite risk in fixed-word-length architectures. I accept that when I choose to use such an architecture. If it is going to overflow, let it overflow. If it is not, let it complete. Don't arbitrarily toss the code out in some cases and leave it alone in others. Trying to handle just a subset of overflows is a REAL problem, because it makes compiler output unpredictable. Don't assume that if someone writes if (a + 1 > a) they really mean if (1). If they meant if (1) they would likely have written if (1). Overflow can occur on any arithmetic operation. There is nothing the compiler can do, other than check the overflow flag after every instruction, and nobody would tolerate that slow-down. So leave it alone rather than assuming I meant something different from what I wrote. If I write it one way, compile it that way. If you want, produce a warning that says "Hey, you wrote if (a + 1 > a), which is always true unless an overflow occurs; you might consider writing if (1) instead. UNLESS you think an overflow might occur, in which case you wrote exactly the right thing."

Much better.

Why don't you do what I did? Compile a few big programs with and without -fwrapv, and see if you can find any speed difference. I tried several chess engines and found absolutely no difference in speed. For the record, I compiled (a) crafty, (b) stockfish, (c) gnu chess 6.0something (whatever is current on macports), (d) fruit, (e) Toga 2, and (f) an older version of Arasan. If this were a useful optimization, wouldn't you expect at least one of those integer-only programs to show an improvement? I mean, we KNOW it breaks some silly programs, like the infinite-loop example. Where is the benefit to justify taking such a risk? Surely there must be some actual programs that this helps, somewhere. I might even try building gcc itself with -fwrapv to see if it makes the compiler run slower. However, something tells me "no". I'll report back on this a bit later; it does take a while to compile.
But what is the right thing? You just told me it depends on the operators present in the expression. By your criteria, any compiler that compiles "x * 2 / 2" to use the shl or add instructions is not doing "the right thing". So what other expressions shouldn't use shl or add?

Yes, my programs probably have bugs, some related to UB and some not. In neither case would I blame the compiler if those bugs caused my program to break. But note that you don't have to do checking in every expression. Only an actual int overflow is UB, not any addition.

I can't do performance experiments right now, nor do I want to prove that every single gcc optimization works well. I'm not a gcc developer.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: How could a compiler break the lockless hashing method?

Post by bob »

rbarreira wrote:
bob wrote:
rbarreira wrote:
bob wrote:
Rein Halbersma wrote:
bob wrote: Sorry you feel that way. I simply have my opinion, developed over several years of compiler development.
Objection, facts not in evidence. Please show us which conforming C compiler you actually developed? Or any other compiler that was used by anyone except yourself, your students or coworkers.
Didn't write a C compiler. Did work on a FORTRAN compiler for a long time that was used in production on TI supercomputers. Did work on a BASIC compiler (not interpreter) that was used in production for several years, both within academic labs and outside. I also wrote a (puke) Pascal compiler that was used by students to write assignments in for many years; Pascal was not available on the machine we were using (a Xerox box) at the time. That enough?

Pick up a good compiler book. One of the classic tests for optimizers is to produce the exact same results as the unoptimized code. C does not meet that standard. It did in the past.
When exactly did C meet that standard? Give me a compiler that has a working optimizer and I can easily make programs that give different results based on the optimization level.
Most older gcc's, for example. Feel free to try. But the program MUST work correctly with no optimization, and not be based on intentional bad programming such as not defining a value before it is used...
Which version of gcc?
Any that don't break the infinite-loop program posted here a couple of days ago, for starters... In the case of strcpy(), ANY version of gcc will compile Crafty 23.8 and you can build a book perfectly. Any version of ICC will compile Crafty 23.8 and you can build a book perfectly. Ditto for DEC's Alpha C compiler and Sun's SPARC C compiler. So far only one compiler on planet earth has a library that breaks 23.8 and earlier versions. Only one. Mavericks' libc is the culprit.

Take ANY compiler and compile Crafty with and without optimizations and see what you get. I do it regularly. And I see no changes whatsoever: node counts, scores, hash probes/hits, you name it. Exactly as expected. If I used any integer overflow within Crafty, I suspect that would not be the case, since the compilers prefer to do what they think I meant as opposed to what I actually wrote.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: How could a compiler break the lockless hashing method?

Post by bob »

rbarreira wrote:
wgarvin wrote: I don't believe the compiler writers would bother to do this stuff if it didn't actually speed up some actual programs out there. Their job is to generate the best possible code while complying with the spec. Our job is to write programs, but if we want those programs to work reliably, we also have to comply with the spec.
It's actually trivial to give examples where not using Bob's favorite instruction (one-operand imul) results in better performance. The easiest would be a situation where the compiler is using the edx register and doesn't want it clobbered. A slightly more complicated one would be auto-vectorization of code, where the pmulld SSE instruction would be used.
Read my post AGAIN. If you do a divide, you ARE going to use edx. Period. Therefore a single-operand imul makes perfect sense, since you need a quadword to do a 32-bit idiv anyway. If you don't use that form of imul, you get to move the number to eax, follow that with a cdq (another instruction, which extends eax into edx), then do the idiv.

So with a divide, you use edx or else don't do the divide. Unless you decide to go to SSE or whatever. Which is yet another can of worms.