bob wrote:
One quibble. When you say "much kinder to integer alignment", it depends on what kind of integer. I've been working on some ASM code of late: a small library to be used in my 330 class. I wanted to give them some easy code to do basic input and output, and for some things I wanted to use the C library (for example read, or _read depending on the system). Turns out that code in the library uses XMM registers, which are NOT forgiving of alignment errors, at least on my Mac. If the stack alignment is not exactly correct, _read crashes on an XMM load. Pain in the a$$. It was quite educational to figure out just how the stack had to be aligned for each library routine call.
Oops... yeah. I'm used to a couple of non-x86 platforms, where I think in terms of the three types of registers available: "integer", "float" and "vector" (SIMD).
When I said "integer", what I was actually thinking of are what x86/x64 call "general-purpose registers", i.e. eax/edx/ecx/ebx, or rax/rdx/rcx/rbx etc. Those are very forgiving of misaligned accesses. Floating-point is usually done with SIMD registers anyway nowadays, and I can't remember the exact alignment rules there, but it's probably a hassle. The regular 16-byte XMM move instruction (MOVAPS) requires 16-byte alignment. There's also an "unaligned" 16-byte move instruction (MOVUPS), but on many x86 and older x64 chips it's actually slower than doing two separate 8-byte reads using MOVLPS/MOVHPS or something like that. Since it wouldn't be atomic anyway, I guess there's not much difference. Anyway, for general-purpose C code meant for x86, avoid misaligned float* or double* because those can get you into trouble. [Edit: I believe that 4- and 8-byte loads and stores involving XMM registers are allowed to be misaligned and are approximately as efficient as misaligned loads and stores with general-purpose registers, but I might be wrong about that. I mean, a small penalty for crossing a cacheline and no penalty if it's entirely within one cacheline. But if you're writing code that actually depends on this, better look it up in the Intel and AMD docs to be sure.]
Another thing I forgot to mention that is pretty nice about x86: it lets you manipulate 8-, 16-, 32- and nowadays 64-bit quantities efficiently. For example, you can load 8- or 16-bit values into a 32- or 64-bit register, and the zero-extension or sign-extension is usually free (via MOVZX or MOVSX, which support the same addressing modes as a regular MOV). On most other platforms it would cost you an extra instruction in there somewhere to extend a value to fill a larger register.
... An interesting thing is happening in the game industry, because both of the "next-generation" consoles by Microsoft and Sony now have x64 CPU cores in them. Once developers have stopped making games for the older generation of consoles (360, PS3, and Wii/Wii U), all of their target platforms (Xbox One, PS4 and PC) are going to be x64! Finally, we can stop doing low-level nonsense to avoid Load-Hit-Stores. Finally, monomorphic virtual method calls will actually be as cheap as direct calls!