syzygy wrote:
diep wrote:
syzygy wrote:
diep wrote:
Note that an i7 (not Sandy Bridge) reports a 64-byte cacheline, yet its memory controller only stores 192 bytes at a time. Sandy Bridge stores 256 bytes at once.
You are forgetting that cache lines can be stolen by other cores between two instructions, and even halfway through a single (non-atomic) instruction.
This case is 100% atomically correct.
So are you denying that, in between two writes to the same cacheline, that cacheline can be stolen by another core?
If the cacheline is stolen, only the first write will have been committed to memory. The full cacheline is written, but the hash entry still has only half the data. It is then overwritten by the other core, after which the first core writes the other half. The result is a corrupted hash entry.
And of course Joona is right that if you allow for OS scheduling in between two instructions, it is even clearer that hash entries can get corrupted due to SMP. There is no way to prevent the OS from interrupting your thread in between the two writes. And unless you set affinities, the second write can be executed by a completely different core.
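The interleaving described above can be replayed as a toy, single-threaded simulation. This is only a sketch of the order in which stores land, assuming each core writes the entry in two halves; it does not model cache coherence, and the helper name `write_half` is hypothetical.

```python
def write_half(entry, values, half):
    """Store one half (0 = first, 1 = second) of a 4-word entry."""
    lo = 2 * half
    entry[lo:lo + 2] = values[lo:lo + 2]

core0 = ["A", "B", "C", "D"]  # what core 0 intends to store
core1 = ["a", "b", "c", "d"]  # what core 1 intends to store
entry = [None] * 4

write_half(entry, core0, 0)   # core 0: first write lands
write_half(entry, core1, 0)   # cacheline stolen: core 1 stores its whole entry
write_half(entry, core1, 1)
write_half(entry, core0, 1)   # core 0 (possibly rescheduled) writes its second half

print("".join(entry))         # abCD: a mix of both entries
```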
This is entirely correct for PC processors.
The exception is some HPC processors, where it can occur if your program receives a control-break or a control-C, for example.
However, in that case your program is terminating anyway...
So we can safely ignore those big exceptions for a chess program.
Joona is incorrect that it can happen at 64-bit granularity. Different CPUs cannot interleave like that, with one CPU writing 64 bits to a cacheline, then another writing 64 bits, then the first writing 64 bits again. Normally this scenario is not possible if you store to the hashtable in a normal sequential manner.
This 64-bit problem would only happen if you have f'd up code.
For example: you write 64 bits here, then go write to another hashtable entry in a different area of RAM (in short, telling the CPU to already flush this to the memory controller),
then write the next 64 bits, go write to another hashtable entry again,
and then write the third 64 bits.
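To see what word-granular interleaving would allow, the sketch below enumerates every way two writers could interleave if each stored its 4 words one at a time, in order, which is the pathological pattern just described. It is a model under that assumption, not a measurement; the helper names are hypothetical. Under it, all 16 mixes are reachable, including alternating ones like AbCd.

```python
from itertools import combinations

def interleavings(n=4):
    """Yield every merge of two in-order, word-by-word writers as a
    sequence of ('X', i) / ('Y', i) store events."""
    for x_slots in combinations(range(2 * n), n):
        xs = set(x_slots)
        xi = yi = 0
        events = []
        for t in range(2 * n):
            if t in xs:
                events.append(("X", xi))
                xi += 1
            else:
                events.append(("Y", yi))
                yi += 1
        yield events

def final_entry(events, n=4):
    """Replay the stores; the last writer of each word wins."""
    x_vals, y_vals = "ABCD"[:n], "abcd"[:n]
    entry = [None] * n
    for who, i in events:
        entry[i] = x_vals[i] if who == "X" else y_vals[i]
    return "".join(entry)

outcomes = {final_entry(ev) for ev in interleavings()}
print(len(outcomes), "AbCd" in outcomes)  # 16 True
```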
This is of course theoretic nonsense which we can safely ignore.
If you do a normal store of the entire entry, you will simply not run into trouble, except in 1 case.
It can only happen at a boundary between cachelines, since the CPU simply hasn't written the next cacheline yet while it is still busy with this one.
It's very easy to understand why what Joona describes cannot happen:
CPUs would simply not be able to deliver the bandwidth to the memory controller that you need them to have.
To make it clearer, suppose we have 2 CPUs and a cacheline boundary
that splits the entry.
Say our entry is 4 integers of 64 bits (32 bytes).
Normally CPU 0 writes ABCD and CPU 1 writes abcd.
Now suppose the cacheline boundary does not line up with these 32 bytes (which is of course possible, since things aren't always aligned);
then, depending on where the cacheline starts or stops, we can basically get ABCD, abcd, ABcd, abCD, Abcd, ABCd, etc.
We can NOT get: AbCD, aBcd, AbCd, etc.
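The prefix/suffix claim can be checked with a small model. The sketch assumes a 64-byte cacheline, 64-bit words, a word-aligned entry offset within its line, and (as argued above) that each writer's portion within one cacheline lands as a unit; `split_word` and `possible_entries` are hypothetical helpers, not Diep code.

```python
LINE = 64  # assumed cacheline size in bytes
WORD = 8   # 64-bit words
N = 4      # words per entry, so a 32-byte entry

def split_word(offset):
    """First word index that falls into the next cacheline, or None if the
    entry fits in one line. offset is the entry's byte offset in its line."""
    for i in range(N):
        if (offset + i * WORD) // LINE != offset // LINE:
            return i
    return None

def possible_entries(offset):
    """Entries that can remain after CPU 0 (ABCD) and CPU 1 (abcd) both store,
    if each cacheline is written as a unit but the two lines may come from
    different CPUs."""
    x, y = "ABCD", "abcd"
    k = split_word(offset)
    if k is None:
        return {x, y}
    return {p[:k] + s[k:] for p in (x, y) for s in (x, y)}

outcomes = set()
for off in range(0, LINE, WORD):
    outcomes |= possible_entries(off)
print(sorted(outcomes))
# ['ABCD', 'ABCd', 'ABcd', 'Abcd', 'aBCD', 'abCD', 'abcD', 'abcd']
```

In this model a 32-byte entry at a 32-byte-aligned offset never straddles a 64-byte line, so only misaligned placements produce mixed entries.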
In Diep's hashtable, with the Zobrist trick, I obviously only check whether the start of the entry matches the end of the entry. Checks in between are pretty useless.
So I'm currently storing about 24 bytes per entry (it could become 32 soon, the luxury of hardware progress). And for now I'm using 32-bit integers to store it.
So I have 6 32-bit words. I just XOR word 0 with word 5,
word 1 with word 4, and so on.
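The XOR pairing might look like the following sketch. The layout is an assumption for illustration only: word 0 holds a 32-bit key signature and words 1 to 5 hold payload, and `encode`/`probe` are hypothetical names. On store, word i is XORed with word 5 - i for i < 3; on probe the same XOR undoes it. Because any single split separates word 0 from word 5, a torn entry decodes to a wrong signature unless the two writers' last words happen to be equal, so this is a probabilistic check.

```python
MASK = 0xFFFFFFFF  # keep values in 32-bit range
N = 6              # six 32-bit words = 24 bytes

def encode(plain):
    """Return the words to store: word i ^= word N-1-i for the first half."""
    stored = [w & MASK for w in plain]
    for i in range(N // 2):
        stored[i] ^= stored[N - 1 - i]
    return stored

def probe(stored, key):
    """Decode a local copy of a slot; return the plain words if word 0
    matches the expected 32-bit key signature, else None."""
    plain = list(stored)
    for i in range(N // 2):
        plain[i] ^= stored[N - 1 - i]
    return plain if plain[0] == key else None

key_a, key_b = 0x1111, 0x2222
slot_a = encode([key_a, 1, 2, 3, 4, 5])
slot_b = encode([key_b, 10, 20, 30, 40, 50])

torn = slot_a[:3] + slot_b[3:]                 # cacheline split after word 2
print(probe(slot_a, key_a))                    # [4369, 1, 2, 3, 4, 5]
print(probe(torn, key_a), probe(torn, key_b))  # None None
```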
There is no need to check for cases like AbCD; that's nonsense.
Note that AbCD could occur on some HPC processors, which you do not have at home, when the program is given a control-break.