Is querying the hash tables such a huge bottleneck?

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Carey
Posts: 313
Joined: Wed Mar 08, 2006 8:18 pm

Re: Is querying the hash tables such a huge bottleneck?

Post by Carey »

bob wrote:
Carey wrote:
bob wrote:
Carey wrote:
bob wrote:Did you _read_ my last comment? Windows vista 64 bits + 64 bit gcc gives the expected sizeof(long) = 8. So I _did_ test windows. I don't use the microsoft C++ compiler so I have no idea what they do. But anyone that uses sizeof(long)=anything_other_than_8 on a 64 bit architecture is certainly brain-dead. But MS has never let that stop them in the past...

BTW the crafty code you quoted has been there since 1995 or so. More recent versions use long on HAS_64BITS architectures with Linux. I've never changed the windows stuff since __int64 has been around since 1995 as well...
Bob,

You might want to read the C standard documents some time. Especially the C99 one, which actually talks about 64-bit integers.
I actually have looked at this. Many times. And yes, I am aware that the standard does not say what "long" means, exactly, just that it means >= int in all cases.

However, for as long as I can remember, "long" has always been "the longest integer data type the hardware supports," which is both logical and rational. Cray has always had int = long, since it is a 64-bit architecture, with the quirk that pointers were always 32 bits because the A-registers (used for address computations) were 32 bits (eventually; early on they were 24 bits).
In the old days of 32 bit systems, that was fairly true. But it was more convention than requirement because C didn't mandate anything back then.

But when you go beyond 32 bits, the traditions break down and that's why C99 made some rules.
But for "long" on true 64 bit systems, I have yet to find a single case where long is < 64 bits, except, apparently some version of the microsoft C compiler (I have not verified which versions, if any, or if all, follow this rather odd practice). I did, long ago, verify that at least for linux, on any 64 bit platform I have tried, long = 64 bits. For X86_64, both gcc and icc follow this. And on my home windows vista 64 box, gcc does the same.

Not much more to say on the subject. Yes, the "standard" is lousy in many ways, such as why is an int always signed by default, while a char may be signed or unsigned at the whim of the compiler writer? "long long" was a non-standard extension that I used in 1995 in the first versions of Crafty when running on a 64 bit sun Sparc as well as on 32 bit X86/linux boxes.
Again, tradition. When the C89 standard was developed, their mandate was to "codify existing practice". They had to take what was already being done and make it law. And they had to break as little existing code as possible. They didn't have much room to be creative or even fix flaws in K&R C.

Signed vs. unsigned char again goes back to the existing implementations they had to deal with back then.

(Also, as a side point, *ONLY* unsigned char is guaranteed to copy all the bits of a byte unmodified. Signed char isn't, because it may normalize +0 and -0 on hardware that has signed zeros (ones'-complement or sign-magnitude machines). Only unsigned char is guaranteed to actually copy all the bits, whereas signed char can ignore some. As I said... C89 had to deal with a lot of weird existing implementations and hardware, and trying to do that resulted in some weird wording in the standard.)
That is a new one on me. I can't think of a single platform I have worked on in the past 40 years that had issues with a -0, because that doesn't exist in 2's complement. In the days of packed decimal, yes. But not for an 8-bit char (and of course not all machines even had 8-bit chars; Univac/CDC used 6 bits, as an example).

There were machines that would "sign extend" when moving a character to a word, and some that could not do so without explicit programming steps, which was what I had assumed led to the great signed/unsigned debacle.
I can't remember all the details and rationale for it, it's been a long time.

It was something P. J. Plauger wrote about after C89 was standardized. He was heavily involved in the C89 standard throughout the whole process. (I used to love reading his columns in C Users Journal and Embedded Systems Programming. His essays in E.S.P. were the main reason I subscribed.)

He was talking about some of the odd wording in the standard and the reasoning behind it and the troubles they had trying to accommodate everybody and the old systems, etc. And then how, after everything was finalized, requests for interpretations started coming in and they started discovering that in some places the standard they had just finished didn't actually say what they thought it did.

It was just one of those things that stuck in my head, and so to this day I make sure I'm using unsigned chars for that kind of stuff, even though it'll never matter on any of the systems I'll ever use.
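
Just to illustrate the guarantee being described (a minimal sketch, nothing platform-specific): copying bytes through unsigned char is required to preserve every bit, which is not promised for signed char on hardware with signed zeros.

Code: Select all

#include <stddef.h>

/* Copy n bytes through unsigned char.  The standard guarantees that
   unsigned char has no padding bits and no trap representations, so
   every bit of the source arrives unmodified.  A signed char loop
   could, in principle, normalize +0/-0 on non-two's-complement
   hardware. */
void copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
}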




short is at least 16 bits.

int is defined as the natural word size. It is *not* required to be 32 or 64 bits. It can be 16 bits, even. It's up to whatever the compiler writer chooses. Implementation defined. On a 32-bit compiler, it should be 32 bits, of course. On a 64-bit system, 64-bit ints would be the more appropriate size.

long is defined as at least 32 bits. On a 64 bit system, it can be either 32 or 64 bits.

long long is defined as at least 64 bits.


So it is entirely possible for 'int' to be 16 bits and 'long' at 32 bits and 'long long' at 64 bits. Not likely, of course. But possible and standard conforming.


That's why the C99 standard set up the stdint.h header: to give the programmer a nice portable way to be guaranteed the right size integers when they need them.

C89 doesn't even know about 64-bit integers; the compiler writer is on his own, and the programmer should actually check to make sure. In this case, 'int' is often 32 and 'long' 64 bits. But that violates the C convention that 'int' is the machine word size.
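
As a sketch of what stdint.h buys you (assuming a C99 compiler, so <stdint.h> and <inttypes.h> are available):

Code: Select all

#include <stdint.h>
#include <inttypes.h>
#include <stdio.h>

int main(void)
{
    int32_t  a = 2147483647;        /* exactly 32 bits, guaranteed */
    uint64_t b = UINT64_C(1) << 63; /* exactly 64 bits, guaranteed */
    int_fast16_t i = 123;           /* "fastest type of at least 16 bits" */

    /* The PRI* macros expand to the right printf length modifiers
       for whatever underlying types the compiler picked. */
    printf("%" PRId32 " %" PRIu64 " %" PRIdFAST16 "\n", a, b, i);
    return 0;
}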


Of course, we've had this discussion before, and I'm well aware you don't like the C99 standard because not all compilers (<cough> Microsoft <cough>) support it.



But don't go saying that 'long' *must* be 64 bits because it doesn't.

I do not believe I said "it must be 64 bits". I rather said that it has been 64 bits on _every_ 64 bit architecture and compiler combination I have tried. You can look at the Crafty Makefile to see just how comprehensive that list of architectures and O/S choices is. Apparently there is / was / could-be some MS compiler that is/was brain-dead...
Maybe you didn't say 'must', but it sounded like it. It certainly sounded like 'always', though.

But I will admit that integer sizes etc. are something that really annoys me. C says what they are, and it seems like people are still going around assuming they are what they were 20 years ago on a 16-bit PC.

People are still casting pointers to 'int' or 'long' even though C99 provides a guaranteed data type to hold it. It's stupid, but people do it. Just habit. (I guess that really says it. Just like when people moved from 16 bit DOS with 'huge' pointers to 32 bits, things were a bit painful. Now that everybody is comfortable with 32 bits, they are starting to take those habits to 64 bits.)

It's just such a habit to use 'short', 'int', and 'long' that they just don't bother to use data types that are actually guaranteed to be the size they need, even if there is a chance that a too-small or too-big size could crash the program.
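
A small sketch of the guaranteed alternative (uintptr_t is technically optional in C99, but any implementation whose object pointers fit in an integer provides it):

Code: Select all

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    int x = 42;

    /* uintptr_t is defined so that this round trip is guaranteed to
       work; casting a pointer to int or long is not. */
    uintptr_t bits = (uintptr_t)&x;
    int *back = (int *)bits;

    printf("%d\n", *back);  /* prints 42 */
    return 0;
}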

(You've mentioned Cray... I remember reading a few articles from Cray's programmers complaining about the idiotic habits of 32-bit programmers. Such as a 32-bit int with all bits set being equated with -1, which of course failed when Cray tried to port those programs to the 64-bit Cray systems. Or shifting bits around, expecting them to fall off the end, and then comparing the result to a constant.)
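
Sketches of those two bug patterns, as they would bite on a machine where int is 64 bits (hypothetical reconstructions, not the actual Cray code):

Code: Select all

/* Bug #1: "all bits set" written as a 32-bit constant.  On a 64-bit
   int machine an all-ones value is 0xFFFFFFFFFFFFFFFF, so the
   comparison below is never true for it. */
int all_bits_set(unsigned x)
{
    return x == 0xFFFFFFFFu;   /* wrong once int grows past 32 bits */
}

/* Bug #2: shifting and expecting bits to fall off the end.  This
   keeps only the low 8 bits when int is exactly 32 bits wide, but
   keeps the low 40 bits when int is 64 bits wide. */
unsigned low_byte(unsigned x)
{
    return (x << 24) >> 24;
}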



And you won't get any argument from me about the arrogance of the MS compiler writers. They still refuse to support C99 even. Not can't, but just flatly refuse. I think that if they thought they could get away with it, they'd write a whole new proprietary language and then call it C and claim it was always that way.
Fortunately, I don't live in "that" world. :)
<laugh> There have been times over the years that I've intently wished I could move away from Microsoft stuff. But it's never worked out. Linux was just never suitable for me. I use MingW when I can, but as for the Microsoft OS WinDoze....

Back when XP came out, I bought XP Pro and told myself that I'd never buy another MS operating system.

Then when Vista came out, I actually bought a retail copy of Vista Ultimate, for the 64 bit version. Tried it, hated Vista, and stuck it on the shelf without even registering it. Then I bought three brand new computers that had Vista installed and couldn't go back to XP.

When Win 7 came out I bought the family pack and upgraded the Vista systems to Win 7.

When Win 8 comes out.....

(sigh) Somebody shoot me and put me out of my misery... (grin)

And that would even violate tradition and break a large number of programs that expect it at 32 bits. (Yes yes, I know programmers shouldn't write a program that expects 'int' or 'long' to be a certain size without first checking, but many programs have done so in the past 25 years. Very few people these days actually bother to check to make sure a var type is the right size. They just assume 'int' is 32 bits and 'long' is probably 32 bits too.)


I'm not saying you should change your code because it's wrong or ugly or doesn't work or anything else. I'm just saying that 'long' does *NOT* have to be 64 bits. The C standard is pretty clear about that.

If you want to say that 'long' is 64 bits on the compilers you use and so that's what you program for, then that's a perfectly fine statement.
Actually, I think that _is_ what I said. :)
Actually.... no. You didn't.
Check this out:
bob wrote: I am using both Intel and GCC on my 64 bit linux boxes, and I can _guarantee_ you that a "long" is 64 bits:

Same code on windows produces the same results for 64 bit Windows Vista running gcc (my home box).. I don't have any 32 bit machines but did in the past, and there "long" was 32 bits as was "int"...
The original quote I responded to was this:
long is 32-bit even with a 64-bit version of Windows.
I certainly gave an example where that is false, because I tested a 64-bit version of Windows Vista with gcc and long = 64 as expected.

The problem is that by saying 'long' is 64 bits, you leave a gap at 32 bits.

C has always had the concept that 'int' is the natural word size of the program. Whether it's a 16 bit, 32 bit, 36 bit, 60 bit, 64 bit, etc. processor. As long as it's at least 16 bits.

If you make 'int' 32 bits on a 64 bit system, then you just violate 30 years of C tradition. It is allowed, but it's not.... 'expected' behavior. It's like making a compiler where 'int' is 27 bits on a 32 bit system. Sure you can do it, but you shouldn't.
Unfortunately, this is a dead horse: x86-64 already violates this, since int is still 32 bits while long is 64 (excepting MSVC, apparently).

Just because the horse is dead doesn't mean you should let it lie there and stink up the stables.


'int' is supposed to be a floating size, determined by whatever is most appropriate for the hardware: the natural register word size that the CPU operates on most efficiently.
Which would be 64 bits on x86-64 of course, but we don't have that sane approach anywhere.
I know. I know. C99 just didn't come out nearly quick enough. C89 took much *much* longer to finalize than they expected, so they didn't get the chance to fix stuff until it was too late.

That's one of the reasons I don't like using int & long when there is any need for them to be more than 32 bits.

I'd much rather explicitly state the size I'm using and use that typedef'ed type as needed. That way there is never a misunderstanding later about what size it's supposed to be.
I have not tested Cray compilers on x86-64 architectures; it would be interesting to see what they did, since on the original Cray-1 architecture (and successors through the T90) int = long = 64.

That's why C99 developed stdint.h. So the programmer can depend on the sizes of integer types.

64 bit C compilers started showing up (from Cray etc.) before C99 was ratified. So many compiler writers just came up with their own solution. It would have been nice if C99 was C90 and avoided a lot of that, but of course that didn't happen.

The C99 standard provides these things to guarantee sizes to the programmer, so that no matter what compiler or what hardware is being used, they know for sure exactly what they are getting. Why not use them???

Sure, you can go by convention (i.e., I tried it on the systems I have, so it'll work everywhere!), or you can use the explicitly sized types provided by C99, guarantee it'll work, and at the same time make sure that readers understand that the var you just defined is supposed to be a certain size and not whatever the compiler writer chose.


As I've said before, this is just something that really irritates me. C gives you the tools to do it right, but most don't bother. Most don't even bother to explicitly indicate what bit size is needed for their vars. If a 'long' is good enough then they don't bother to document the bit size or do any checks to make sure it's not too small or too big.
One reason for not using them is compatibility. There are a zillion compilers in existence today, and most are not C99 compatible, which would mean some ugly #ifdef stuff (Crafty already has enough just to deal with the MS C++ flaws), not to mention different systems have different libraries with different #includes needed.
I know that many compilers don't.

But you can still fake it. Do a portability header as needed and typedef the appropriate things to int64_t etc. etc.

That way it's always clear to whoever is reading the code that you are explicitly expecting and requiring data of a certain size.
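
Something like this hypothetical header (the my_* names are made up for the example; the MSVC branch leans on __int64, which that compiler has had since the mid-90s):

Code: Select all

/* my_types.h -- sketch of a portability header */
#ifndef MY_TYPES_H
#define MY_TYPES_H

#if defined(_MSC_VER)
/* Microsoft's compiler: no C99 stdint.h, but __int64 works. */
typedef __int64          my_int64;
typedef unsigned __int64 my_uint64;
#else
/* Any C99 compiler: just borrow the standard exact-width types. */
#include <stdint.h>
typedef int64_t  my_int64;
typedef uint64_t my_uint64;
#endif

#endif /* MY_TYPES_H */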
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Is querying the hash tables such a huge bottleneck?

Post by bob »

CThinker wrote:
bob wrote: Did you _read_ my last comment? Windows vista 64 bits + 64 bit gcc gives the expected sizeof(long) = 8. So I _did_ test windows. I don't use the microsoft C++ compiler so I have no idea what they do. But anyone that uses sizeof(long)=anything_other_than_8 on a 64 bit architecture is certainly brain-dead. But MS has never let that stop them in the past...
This cannot be correct. No compiler for Windows will be useful if it treats long as 64-bit, because there are plenty of Win32 APIs that have long parameters and expect them to be 32-bit. Whatever test you did is definitely flawed.

Here is the output of the 64-bit gcc (mingw64) and 64-bit LCC running on 64-bit Windows.

Code: Select all

C:\Temp>type test.c
#include "stdio.h"
int main(){
#ifndef _WIN64
error: not 64-bit
#endif
printf("%i",sizeof(long));
}

C:\Temp>\etc\mingw64\bin\gcc.exe test.c

C:\Temp>a.exe
4
C:\Temp>\etc\lcc\bin\lcc64.exe -IC:\Etc\lcc\include64 test.c

C:\Temp>\etc\lcc\bin\lcclnk64.exe test.obj

C:\Temp>test.exe
4
C:\Temp>
And here is for the 64-bit Intel C:

Code: Select all

C:\Temp>type test.c
#include "stdio.h"
int main(){
#ifndef _WIN64
error: not 64-bit
#endif
printf("%i",sizeof(long));
}

C:\Temp>icl test.c
Intel(R) C++ Intel(R) 64 Compiler Professional for applications running on Intel
(R) 64, Version 11.1    Build 20100806 Package ID: w_cproc_p_11.1.067
Copyright (C) 1985-2010 Intel Corporation.  All rights reserved.

test.c
Microsoft (R) Incremental Linker Version 9.00.21022.08
Copyright (C) Microsoft Corporation.  All rights reserved.

-out:test.exe
test.obj

C:\Temp>test.exe
4
C:\Temp>
So there you go: 64-bit GCC, 64-bit LCC, and 64-bit Intel C all have a 32-bit long on Windows.

This should be very easy for anyone to validate. Intel-C is free for 30 days here:
http://software.intel.com/en-us/article ... valuation/

64-bit GCC for Windows is free:
http://mingw-w64-dgn.googlecode.com/fil ... 0101009.7z

64-bit LCC is free:
http://www.q-software-solutions.de/pub/lccwin64.exe
My gcc is a version that doesn't agree with that, as I posted. I won't begin to speculate on who does what to whom. I downloaded this pre-built version so I could compile and test on my wife's home box. And long is certainly 64 bits. This is not the mingw version of the compiler, however... When I get home I will try to see what it is. When I left, I had started the upgrade to Windows 7, so it might be a day or two until I get past that hurdle...
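
As an aside, the test program quoted above passes a size_t to "%i", which is technically undefined behavior; a version that is valid on any compiler casts the result (or, with a C99 library, uses "%zu"):

Code: Select all

#include <stdio.h>

int main(void)
{
    /* sizeof yields a size_t, so "%i" is the wrong conversion.
       Casting to int is safe here since the value is tiny; on a
       C99 library, printf("%zu", sizeof(long)) also works. */
    printf("sizeof(long) = %d\n", (int)sizeof(long));
    return 0;
}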
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: Is querying the hash tables such a huge bottleneck?

Post by wgarvin »

bob wrote: My gcc is a version that doesn't agree with that, as I posted. I won't begin to speculate on who does what to whom. I downloaded this pre-built version so I could compile and test on my wife's home box. And long is certainly 64 bits. This is not the mingw version of the compiler, however... When I get home I will try to see what it is. When I left, I had started the upgrade to Windows 7, so it might be a day or two until I get past that hurdle...
Maybe it's a Cygwin build or something? The MinGW ones probably have sizeof(long)==4, to match what Microsoft's compiler does. I expect the Intel compiler also matches Microsoft when compiling for win32. It would not be surprising if Linux builds of that compiler match what Linux usually does though, where sizeof(long)==8.

It turns out there are actually somewhat-standardized names for these size schemes: http://en.wikipedia.org/wiki/64-bit#64-bit_data_models

"LLP64" is what 64-bit Windows uses ("Long long and pointers are 64-bit").

Most Unix and Unix-like systems, including Linux, use "LP64" ("long and pointers are 64-bit").

Anyway, the 32-bit size of long is effectively part of the ABI on Windows and can't easily be changed. Even if Microsoft changed all their header files to use "unsigned int" instead of "unsigned long" in their typedefs for 32-bit types, there is plenty of object code and libraries being distributed out there which would not be able to link properly to new code if they changed the size of that type in the new code. So we're probably stuck with those sizes on Windows forever, or at least for a long time.
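
A compile-time sketch of the two schemes, assuming the usual predefined macros (_WIN64 on 64-bit Windows compilers, __LP64__ on gcc-style LP64 targets):

Code: Select all

#include <stdio.h>

#if defined(_WIN64)
#  define DATA_MODEL "LLP64 (int=32, long=32, long long=64, ptr=64)"
#elif defined(__LP64__)
#  define DATA_MODEL "LP64 (int=32, long=64, ptr=64)"
#else
#  define DATA_MODEL "ILP32 or something more exotic"
#endif

int main(void)
{
    printf("%s: sizeof(long)=%d, sizeof(void*)=%d\n",
           DATA_MODEL, (int)sizeof(long), (int)sizeof(void *));
    return 0;
}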

I've worked on several large C++ projects at various different companies which all had one thing in common: they defined their own complete set of known-sized types ("U8", "U16", "U32", "U64", "S8", "S16", "S32", "S64", "F32", "F64", "UPTRSIZE", "SPTRDIFF" etc.) and most or all of the code would use those types and never the built-in ones. On some projects, ALL builtin integer types except for character types and bool are completely verboten -- using the type "int" anywhere in some code was enough for it to fail a code review.

The rationale was that programmers are not great in general at picturing in their head the multiple conflicting sizes of a type and all of the consequences of mixing any of those possible sizes with the possible sizes of other types when they do math on them, pack them together into structures, perform bit-twiddling tricks on them, etc. Yes, some compiled code may be more efficient if you used "int" instead of some non-optimally-sized type. But we deemed it more important to be able to read the code from top to bottom and know what the actual behaviour was going to be, which was much easier when the primitive types involved had a known size and so overflow, arithmetic right shift, etc. would behave the way we expected across multiple target platforms (assuming 2's complement, etc.)

I adopted that same convention in my own little projects; the first thing I always do is make a header file with known-size typedefs in it. Even though there are fixed-size types now standardized in C99, I still think it's better to have your own typedefs. It's essential if you want to be able to support non-C99 compilers (which I do), because it's not legal/portable to define your own typedefs with the same names as the standard ones, and you want to be able to change or override your typedefs when you're faced with porting to a weird compiler. So it's always better to have your own, which you can configure to match whatever compiler(s) you're using.

The game engines at my current employer all work this way too: we have some native 32-bit targets (e.g. Win32) and some native 64-bit targets (e.g. Xbox360 or PS3, 64-bit Windows) and all the code uses typedefs like the ones I listed above, we almost never use a bare "int" or "unsigned" or "long".
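
A sketch of the kind of header I mean (illustrative names, not any particular project's; the negative-array-size trick is the usual pre-C11 stand-in for a static assertion):

Code: Select all

/* fixed_types.h -- illustrative known-size typedefs */
typedef signed char        S8;
typedef unsigned char      U8;
typedef short              S16;
typedef unsigned short     U16;
typedef int                S32;
typedef unsigned int       U32;
typedef long long          S64;   /* __int64 on old MSVC */
typedef unsigned long long U64;

/* Fail the build if a typedef is not the width its name claims:
   the array size goes negative when the sizeof test is false. */
#define CHECK_SIZE(t, n) typedef char check_##t[(sizeof(t) == (n)) ? 1 : -1]
CHECK_SIZE(U8, 1);
CHECK_SIZE(U16, 2);
CHECK_SIZE(U32, 4);
CHECK_SIZE(U64, 8);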

[Edit: also note one gotcha for people who are going to use the stdint.h types in C99, but also their own typedefs... it has to do with the strict aliasing rule. If you define your own type "U8" which is a typedef for "unsigned char", then your U8 type will technically be a "character type", so it will be allowed to alias with arbitrary other types (i.e. you can type-pun a "pointer to Foo" into a "pointer to U8" and access it through that pointer without breaking the strict aliasing rule; casting back the other way is not valid, though). Anyway, if you rely on this, watch out, because I don't think the stdint.h type "uint8_t" is guaranteed to be a character type. So you could easily end up with code that obeys the strict aliasing rule when not using the C99 standardized types, but breaks the rule when it is using them. Instead, it's much safer to always use an explicit "unsigned char*" or something when you want to do such type-punning. Either that, or make the actual data types into unions with an array of something else that you want to access it as... Or just disable strict aliasing in your compiler options (but that is not a good path going forward).]
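
A sketch of the two safe options from that edit:

Code: Select all

#include <stddef.h>
#include <stdio.h>

/* Option 1: an explicit unsigned char* -- character-type access is
   always allowed to alias any object type. */
static unsigned sum_bytes(const void *p, size_t n)
{
    const unsigned char *b = p;
    unsigned sum = 0;
    while (n--)
        sum += *b++;
    return sum;
}

/* Option 2: a union -- both views are members of the same object,
   so no pointer-based type punning is involved. */
union float_bytes {
    float         f;
    unsigned char bytes[sizeof(float)];
};

int main(void)
{
    float x = 1.0f;
    union float_bytes u;
    u.f = x;
    printf("%u %u\n", sum_bytes(&x, sizeof x), (unsigned)u.bytes[0]);
    return 0;
}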
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: Is querying the hash tables such a huge bottleneck?

Post by wgarvin »

Carey wrote:The problem is that by saying 'long' is 64 bits, you leave a gap at 32 bits.

C has always had the concept that 'int' is the natural word size of the program. Whether it's a 16 bit, 32 bit, 36 bit, 60 bit, 64 bit, etc. processor. As long as it's at least 16 bits.

If you make 'int' 32 bits on a 64 bit system, then you just violate 30 years of C tradition. It is allowed, but it's not.... 'expected' behavior. It's like making a compiler where 'int' is 27 bits on a 32 bit system. Sure you can do it, but you shouldn't.

'int' is supposed to be a floating size, determined by whatever is most appropriate for the hardware: the natural register word size that the CPU operates on most efficiently.

That's why C99 developed stdint.h. So the programmer can depend on the sizes of integer types.
Actually, for x86-64 you can make a good case that 32 bits IS the most appropriate size for the hardware. Sure, it can also use 64 bits with no penalty, but then instructions are longer, constants are larger, etc. On the other hand, the vast majority of code that's using "int" for a loop counter or something, does not need more than 32 bits. And when somebody writes "int myTable[4000] = { 5, 5, 10, ... }" they'll end up with a 16KB table instead of a 32KB one. [Edit: if pointers are 8 bytes, I'm a little more sympathetic to the argument that int should be the same size; but still, there are standardized types like ptrdiff_t and size_t that C or C++ programmers should know to use in those situations.]

If there was any significant penalty for doing 32-bit operations on these x86-64 chips, then it would be clear that 64 bits was more appropriate for int. I don't believe that's the case though.

You could look at it the other way: x86-64 chips have perhaps the *best* support ever implemented in a chip for 8-, 16-, 32- and 64-bit integer types. For example, some 32-bit RISC chips suck at loading and storing 8- or 16-bit quantities. Some 64-bit chips are not great at 32-bit math and vice versa (requiring extra instructions even for things like zero extension, which are "free" on x86-64). x86-64 lets you work with all sizes with ease, and with minimal degradation of performance. x86-64 also inherits the nice SIB addressing from x86, giving compilers nice flexibility when they need to address values that aren't in a register. x86-64 chips have excellent support for misaligned access, with small types even the penalties for crossing a cache line are reasonably small and getting smaller with each generation of chip. Along with their many other strengths (branch prediction, out-of-order execution, smaller code size than RISC ISAs) today's x86-64 chips are very programmer-friendly: it's easy to get good performance out of them without optimizing specifically for each chip.

Anyway, whether your x86-64 compiler uses 32-bits or 64-bits for int or long types, shouldn't matter. I think programmers should not use "int" and "long" directly, but should instead use a type that they *know* is either 32 bits or 64 bits, because they defined it themselves.
User avatar
hgm
Posts: 27787
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Is querying the hash tables such a huge bottleneck?

Post by hgm »

I don't think you can say that AMD64 architecture has a most natural size for integers. They support 32-bit and 64-bit on equal footing. You can indicate your preference by setting the data-length bit in the segment descriptors, not? In 64-bit mode ("long mode", so not "compatibility mode"!), the default address size is 64 bit, but the default data size is 32 bit.
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: Is querying the hash tables such a huge bottleneck?

Post by wgarvin »

hgm wrote:I don't think you can say that AMD64 architecture has a most natural size for integers. They support 32-bit and 64-bit on equal footing. You can indicate your preference by setting the data-length bit in the segment descriptors, not? In 64-bit mode ("long mode", so not "compatibility mode"!), the default address size is 64 bit, but the default data size is 32 bit.
Yes, but the operating system makes that choice for you. User-mode code doesn't usually mess with segment descriptors. It's a pretty fundamental choice they had to make, because they cannot easily change it going forward (unless they add a new "mode" for code segments everywhere in their loader and thunking and DLL imports and dozens of similar things, and then continue to support both the old kind and the new kind of code segment, forever).

FWIW I think Microsoft made the right decision, because the transition from 16 to 32 bit was an obvious case of "16 bits isn't big enough" but I don't think the same thing is true at all of the transition from 32 to 64. Yes there are plenty of cases where having 64 bits is useful, but I venture a guess that *most* data in *most* programs is easy to represent in 32 bits. Making 64 the default for everything would seem to me to be overkill.

It's easy to forget that most workloads are not like bitboards :lol: After all, when you're just writing some code to do some minor task, how often do you actually use 64-bit types in it? I haven't done that very often, because I find that 32 bits is usually enough. I would use 64 bits for file sizes or offsets, because files bigger than 4 GB are not uncommon nowadays. I wouldn't use 64 bits for a "number of files" counter, though -- when was the last time you batch-processed over four billion files at once?
User avatar
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Is querying the hash tables such a huge bottleneck?

Post by michiguel »

wgarvin wrote:
Carey wrote:The problem is that by saying 'long' is 64 bits, you leave a gap at 32 bits.

C has always had the concept that 'int' is the natural word size of the program. Whether it's a 16 bit, 32 bit, 36 bit, 60 bit, 64 bit, etc. processor. As long as it's at least 16 bits.

If you make 'int' 32 bits on a 64 bit system, then you just violate 30 years of C tradition. It is allowed, but it's not.... 'expected' behavior. It's like making a compiler where 'int' is 27 bits on a 32 bit system. Sure you can do it, but you shouldn't.

'int' is supposed to be a floating size, determined by whatever is most appropriate for the hardware: the natural register word size that the CPU operates on most efficiently.

That's why C99 developed stdint.h. So the programmer can depend on the sizes of integer types.
Actually, for x86-64 you can make a good case that 32 bits IS the most appropriate size for the hardware. Sure, it can also use 64 bits with no penalty, but then instructions are longer, constants are larger, etc. On the other hand, the vast majority of code that's using "int" for a loop counter or something, does not need more than 32 bits. And when somebody writes "int myTable[4000] = { 5, 5, 10, ... }" they'll end up with a 16KB table instead of a 32KB one. [Edit: if pointers are 8 bytes, I'm a little more sympathetic to the argument that int should be the same size; but still, there are standardized types like ptrdiff_t and size_t that C or C++ programmers should know to use in those situations.]

If there was any significant penalty for doing 32-bit operations on these x86-64 chips, then it would be clear that 64 bits was more appropriate for int. I don't believe that's the case though.

You could look at it the other way: x86-64 chips have perhaps the *best* support ever implemented in a chip for 8-, 16-, 32- and 64-bit integer types. For example, some 32-bit RISC chips suck at loading and storing 8- or 16-bit quantities. Some 64-bit chips are not great at 32-bit math and vice versa (requiring extra instructions even for things like zero extension, which are "free" on x86-64). x86-64 lets you work with all sizes with ease, and with minimal degradation of performance. x86-64 also inherits the nice SIB addressing from x86, giving compilers nice flexibility when they need to address values that aren't in a register. x86-64 chips have excellent support for misaligned access, with small types even the penalties for crossing a cache line are reasonably small and getting smaller with each generation of chip. Along with their many other strengths (branch prediction, out-of-order execution, smaller code size than RISC ISAs) today's x86-64 chips are very programmer-friendly: it's easy to get good performance out of them without optimizing specifically for each chip.

Anyway, whether your x86-64 compiler uses 32-bits or 64-bits for int or long types, shouldn't matter. I think programmers should not use "int" and "long" directly, but should instead use a type that they *know* is either 32 bits or 64 bits, because they defined it themselves.
Yes... but I think it is a bit dogmatic. I agree when the storage size matters or the programmer does bit manipulation with them, but there are situations in which none of that matters, and you may want to use int or unsigned because it will be the fastest type even if the code is ported. For instance, when it is used locally as a counter in a loop (as long as the minimum guaranteed size is big enough, of course). See the sketch below.

I think that other types like size_t and ptrdiff_t should be used more.
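
A small example of that split (assuming C99 only for the in-loop declarations):

Code: Select all

#include <stddef.h>

/* A bare int is fine, and fast, for a small bounded local counter. */
double sum_fixed(const double a[100])
{
    double s = 0.0;
    for (int i = 0; i < 100; i++)
        s += a[i];
    return s;
}

/* Anything indexing an object of arbitrary size should use size_t,
   which is guaranteed to span any object in any data model. */
double sum_all(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}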

Miguel
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Is querying the hash tables such a huge bottleneck?

Post by bob »

wgarvin wrote:
hgm wrote:I don't think you can say that AMD64 architecture has a most natural size for integers. They support 32-bit and 64-bit on equal footing. You can indicate your preference by setting the data-length bit in the segment descriptors, not? In 64-bit mode ("long mode", so not "compatibility mode"!), the default address size is 64 bit, but the default data size is 32 bit.
Yes, but the operating system makes that choice for you. User-mode code doesn't usually mess with segment descriptors. It's a pretty fundamental choice they had to make, because they cannot easily change it going forward (unless they add a new "mode" for code segments everywhere in their loader and thunking and DLL imports and dozens of similar things, and then continue to support both the old kind and the new kind of code segment, forever).

FWIW I think Microsoft made the right decision, because the transition from 16 to 32 bit was an obvious case of "16 bits isn't big enough" but I don't think the same thing is true at all of the transition from 32 to 64. Yes there are plenty of cases where having 64 bits is useful, but I venture a guess that *most* data in *most* programs is easy to represent in 32 bits. Making 64 the default for everything would seem to me to be overkill.

It's easy to forget that most workloads are not like bitboards :lol: After all, when you're just writing some code to do some minor task, how often do you actually use 64-bit types in it? I haven't done that very often, because I find that 32 bits is usually enough. I would use 64 bits for file sizes or offsets, because files bigger than 4 GB are not uncommon nowadays. I wouldn't use 64 bits for a "number of files" counter, though -- when was the last time you batch-processed over four billion files at once?
I can certainly say that I blow the 4 billion unsigned counter limit all the time...

But in any case, we have always had "ints" and "longs". Does it really make sense to treat them as the same thing, when the hardware actually has support for 64-bit instructions and has 64-bit registers?
rbarreira
Posts: 900
Joined: Tue Apr 27, 2010 3:48 pm

Re: Is querying the hash tables such a huge bottleneck?

Post by rbarreira »

bob wrote:
wgarvin wrote:
hgm wrote:I don't think you can say that AMD64 architecture has a most natural size for integers. They support 32-bit and 64-bit on equal footing. You can indicate your preference by setting the data-length bit in the segment descriptors, not? In 64-bit mode ("long mode", so not "compatibility mode"!), the default address size is 64 bit, but the default data size is 32 bit.
Yes, but the operating system makes that choice for you. User-mode code doesn't usually mess with segment descriptors. It's a pretty fundamental choice they had to make, because they cannot easily change it going forward (unless they add a new "mode" for code segments everywhere in their loader and thunking and DLL imports and dozens of similar things, and then continue to support both the old kind and the new kind of code segment, forever).

FWIW I think Microsoft made the right decision, because the transition from 16 to 32 bit was an obvious case of "16 bits isn't big enough" but I don't think the same thing is true at all of the transition from 32 to 64. Yes there are plenty of cases where having 64 bits is useful, but I venture a guess that *most* data in *most* programs is easy to represent in 32 bits. Making 64 the default for everything would seem to me to be overkill.

It's easy to forget that most workloads are not like bitboards :lol: After all, when you're just writing some code to do some minor task, how often do you actually use 64-bit types in it? I haven't done that very often, because I find that 32 bits is usually enough. I would use 64 bits for file sizes or offsets, because files bigger than 4 GB are not uncommon nowadays. I wouldn't use 64 bits for a "number of files" counter, though -- when was the last time you batch-processed over four billion files at once?
I can certainly say that I blow the 4 billion unsigned counter limit all the time...

But in any case, we have always had "ints" and "longs". Does it really make sense to treat them as the same thing, when the hardware actually has support for 64-bit instructions and has 64-bit registers?
It's more a case of maintaining portability with old Windows code than a case of trying to make sense.

Regardless, long long and uint64_t do the job just fine, and they're standard under C99, so it's not a big deal.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Is querying the hash tables such a huge bottleneck?

Post by bob »

rbarreira wrote:
bob wrote:
wgarvin wrote:
hgm wrote:I don't think you can say that AMD64 architecture has a most natural size for integers. They support 32-bit and 64-bit on equal footing. You can indicate your preference by setting the data-length bit in the segment descriptors, not? In 64-bit mode ("long mode", so not "compatibility mode"!), the default address size is 64 bit, but the default data size is 32 bit.
Yes, but the operating system makes that choice for you. User-mode code doesn't usually mess with segment descriptors. It's a pretty fundamental choice they had to make, because they cannot easily change it going forward (unless they add a new "mode" for code segments everywhere in their loader and thunking and DLL imports and dozens of similar things, and then continue to support both the old kind and the new kind of code segment, forever).

FWIW I think Microsoft made the right decision, because the transition from 16 to 32 bit was an obvious case of "16 bits isn't big enough" but I don't think the same thing is true at all of the transition from 32 to 64. Yes there are plenty of cases where having 64 bits is useful, but I venture a guess that *most* data in *most* programs is easy to represent in 32 bits. Making 64 the default for everything would seem to me to be overkill.

It's easy to forget that most workloads are not like bitboards :lol: After all, when you're just writing some code to do some minor task, how often do you actually use 64-bit types in it? I haven't done that very often, because I find that 32 bits is usually enough. I would use 64 bits for file sizes or offsets, because files bigger than 4 GB are not uncommon nowadays. I wouldn't use 64 bits for a "number of files" counter, though -- when was the last time you batch-processed over four billion files at once?
I can certainly say that I blow the 4 billion unsigned counter limit all the time...

But in any case, we have always had "ints" and "longs". Does it really make sense to treat them as the same thing, when the hardware actually has support for 64-bit instructions and has 64-bit registers?
It's more a case of maintaining portability with old Windows code than a case of trying to make sense.

Regardless, long long and uint64_t do the job just fine, and they're standard under C99, so it's not a big deal.
There are zillions of C compilers in use that don't include C99. That makes it a _really_ big deal.