for (int t = 0; t < num; t++) {
    data = (uint8_t *)(((uintptr_t)data + 0x3f) & ~0x3f);
    ei[t].precomp->data = data;
    data += size[t][0][2];
    if (split) {
        data = (uint8_t *)(((uintptr_t)data + 0x3f) & ~0x3f);
        ei[num + t].precomp->data = data;
        data += size[t][1][2];
    }
}
The pointer arithmetic assumes that the file data in memory is aligned to 64 bytes, which is true with mmap but not with malloc or with a global array.
Now that it works, probing the 4-men TB in RAM is very fast.
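For anyone hitting the same thing, here is a minimal sketch of one way to get a 64-byte aligned copy of the file data out of plain malloc. The helper name copy_aligned and the TB_ALIGN constant are made up for illustration and are not part of the probing code; the idea is just to over-allocate and round the start address up, the same way the loop above rounds data.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TB_ALIGN 64 /* assumed alignment, matching the 0x3f mask in the loop */

/*
 * Hypothetical helper: copy len bytes from src into a freshly allocated
 * buffer whose start address is a multiple of TB_ALIGN.
 * The pointer to pass to free() is returned through raw_out, because the
 * aligned pointer is generally not the one malloc handed back.
 */
static uint8_t *copy_aligned(const uint8_t *src, size_t len, void **raw_out)
{
    /* Over-allocate so an aligned start always fits inside the block. */
    void *raw = malloc(len + TB_ALIGN - 1);
    if (raw == NULL)
        return NULL;

    /* Round the start address up to the next multiple of TB_ALIGN. */
    uintptr_t start = ((uintptr_t)raw + (TB_ALIGN - 1)) & ~(uintptr_t)(TB_ALIGN - 1);
    uint8_t *aligned = (uint8_t *)start;

    memcpy(aligned, src, len);
    *raw_out = raw;
    return aligned;
}
```

The aligned pointer would then be used as the base of the file data, and raw is what gets freed at the end. For a global array, an alignment specifier such as C11 _Alignas(64) on the array should serve the same purpose.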
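You're lucky that the mask ~0x3f is a signed int (value -64), which is sign-extended when it is converted to the (potentially) 64-bit uintptr_t, so the upper bits of the mask stay set.
Had you used ~0x3fu, the mask would be zero-extended instead and the program would break badly in 64-bit mode, because the upper 32 bits of the pointer would be cleared.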
Anyway, good that you found the problem. I was wondering, because the code you posted should have worked. I wouldn't have guessed an alignment issue, but it makes sense.
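To make the sign-extension point concrete, here is a small sketch comparing the three ways of writing the mask. It uses uint64_t in place of uintptr_t so the effect is visible on any platform, and it assumes int is 32 bits wide; the function names are invented for the example.

```c
#include <stdint.h>

/* Mask as written in the thread: ~0x3f is the int -64, which is
 * sign-extended to 0xFFFFFFFFFFFFFFC0 when converted to uint64_t. */
static uint64_t align_up_signed(uint64_t addr)
{
    return (addr + 0x3f) & ~0x3f;
}

/* Unsigned variant: ~0x3fu is the unsigned int 0xFFFFFFC0, which is
 * zero-extended to 0x00000000FFFFFFC0, wiping the upper 32 bits. */
static uint64_t align_up_unsigned(uint64_t addr)
{
    return (addr + 0x3f) & ~0x3fu;
}

/* Explicitly typed mask: does not depend on the width or signedness
 * of int at all. */
static uint64_t align_up_typed(uint64_t addr)
{
    return (addr + 0x3f) & ~(uint64_t)0x3f;
}
```

With an address above 4 GB such as 0x7f0012345678, the signed and typed versions both give 0x7f0012345680, while the unsigned one gives 0x12345680. Writing the mask as ~(uintptr_t)0x3f in the real code would remove the reliance on luck.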