bob wrote:
diep wrote:
zamar wrote:
Multithreading. A typical hash table entry is 128 bits.
Thread1.write(64bits)
Thread2.write(64bits)
Thread2.write(64bits)
Thread1.write(64bits)
leads to corrupted entry
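To make the quoted interleaving concrete, here is a minimal single-threaded replay of it in Python. It is only a toy model: the names `entry`, `write_half`, `T1` and `T2` are made up for illustration, and it assumes each thread writes the first 64-bit half of its entry before the second half.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

# Toy model (hypothetical names): one 128-bit entry kept as two 64-bit halves.
entry = [0, 0]

def write_half(slot, value):
    """Model a single 64-bit store into the shared entry."""
    entry[slot] = value & MASK64

# Each thread wants to store its own consistent (first half, second half) pair.
T1 = (0x1111111111111111, 0xAAAAAAAAAAAAAAAA)
T2 = (0x2222222222222222, 0xBBBBBBBBBBBBBBBB)

# Replay the quoted order: Thread1, Thread2, Thread2, Thread1.
write_half(0, T1[0])  # Thread1 writes its first half
write_half(0, T2[0])  # Thread2 overwrites the first half
write_half(1, T2[1])  # Thread2 writes its second half
write_half(1, T1[1])  # Thread1 overwrites the second half

# The entry now mixes halves from both threads.
torn = tuple(entry) not in (T1, T2)
print(hex(entry[0]), hex(entry[1]), "torn" if torn else "consistent")
```

The replay says nothing about how often real hardware produces this order; it only shows that this order leaves the first half from Thread2 and the second half from Thread1.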
First of all, this can only happen in an engine that searches in parallel.
Did Rebel already get SMP?
Secondly, on PC hardware this happens even more seldom than hash collisions, as it can only happen on stores:
around 1 in 200 billion stores.
If Ed has his hash table aligned and a whole number of entries exactly fills a cacheline, you can prove it cannot happen on a PC.
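The alignment condition can be checked with a bit of arithmetic. The sketch below assumes 16-byte (128-bit) entries and a 64-byte cacheline; the helper `straddles` and the base addresses are hypothetical. It only checks the memory layout, i.e. whether an entry crosses a cacheline boundary, not whether two separate 64-bit stores inside one line become visible together.

```python
ENTRY_SIZE = 16  # bytes: the 128-bit entry from the thread
LINE_SIZE = 64   # bytes: assumed cacheline size

def straddles(base, index, entry_size=ENTRY_SIZE, line_size=LINE_SIZE):
    """True if entry `index` of a table starting at byte address `base`
    crosses a cacheline boundary."""
    first = base + index * entry_size
    last = first + entry_size - 1
    return first // line_size != last // line_size

# Table base on a 64-byte boundary: no entry crosses a line.
aligned = sum(straddles(0x10000, i) for i in range(1024))

# Table base shifted by 8 bytes: every fourth entry crosses a line.
misaligned = sum(straddles(0x10008, i) for i in range(1024))

print(aligned, misaligned)
```

With the aligned base none of the 1024 entries crosses a line; with the base shifted by 8 bytes, one entry in four does.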
Note that an i7 (not Sandy Bridge) presents itself as having a 64-byte cacheline, however the memory controller only stores 192 bytes at a time. Sandy Bridge stores 256 bytes at once;
it has 4 memory channels.
Note that a read can abort prematurely, having read fewer bytes, which is why latency to RAM is so fast for us.
It can abort after reading 32 bytes.
So the only way it can go wrong is when an entry sits at the end of a cacheline (in this case 256 bytes) and by accident one cacheline gets updated sooner than the other.
The odds of this happening are really tiny, especially if your hash table is some gigabytes in size.
So it cannot happen the way you represented it, as it doesn't write 64 bits. It writes 256 bytes at once.
It can happen far more frequently than that. This has been discussed in the past. With 8 threads (cores) writing pairs of 64-bit words, ordering gets mangled and you can store an entry where the first 8 bytes are from position P1 and the second 8 are from position P2, leading to this. I've seen dozens per minute in some positions. With the lockless hashing I use, it never happens, however.
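The post does not spell out the lockless scheme, so the following is only a sketch of one well-known variant, assumed here for illustration: the first word holds the key XORed with the data word, and a probe accepts the entry only if XORing the two stored words gives back the key. The names `store`, `probe`, `P1` and `P2` are hypothetical.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def store(table, index, key, data):
    """Store an entry as two 64-bit words: (key XOR data, data)."""
    table[index] = [(key ^ data) & MASK64, data & MASK64]

def probe(table, index, key):
    """Return the data word if the entry is intact for `key`, else None."""
    check, data = table[index]
    return data if (check ^ data) == (key & MASK64) else None

table = [[0, 0]]
P1 = (0x1234567890ABCDEF, 0x00000000000000AA)  # (key, data) for position P1
P2 = (0x0FEDCBA098765432, 0x00000000000000BB)  # (key, data) for position P2

# An intact entry is found again under its own key.
store(table, 0, *P1)
intact = probe(table, 0, P1[0])

# Simulate a torn entry: first word from P1, second word from P2.
table[0] = [(P1[0] ^ P1[1]) & MASK64, P2[1]]
torn_p1 = probe(table, 0, P1[0])
torn_p2 = probe(table, 0, P2[0])

print(intact, torn_p1, torn_p2)
```

In this sketch the torn entry matches neither key, so it is treated as a miss instead of being used. A mixed entry would only slip through if the XOR of its two words happened to equal the probing key.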
Bob, you're making this story up.
You did not see 'dozens of write errors a minute'.
I've tested this extensively and you need dozens of billions of stores to get one.
Of course I have ECC machines here. Maybe you had a bad DIMM during your tests, or your memory again serves you badly?
Also realize Alpha is an HPC processor. Today we only have x64 CPUs, which is a different architecture.
Between the Alpha and the memory controller there are several steps to connect to other memory. That step can sometimes cause the AbCD, which is not possible on x86/x64 hardware as they connect directly to the RAM.
Even then, this also happens VERY SELDOM on the Alpha.
At the time I was in contact with the technical design teams of memory controllers to verify how often all this could occur on different processors, among them Alpha and the R14000, and each time I got the same answers from the engineers.
Even then, on x86 I've tested this extensively, and 1 write error or 1 or 2 collisions in any single test was a lot.
This testing ran for months in 2003.
So of all the people here, it seems I'm the only one who really measured the number of write errors and the number of collisions.
P.S. Note that the Alphas were even used later on, up to 2007, for serving storage at petabyte level; that's where I worked at the time.