stegemma wrote: Only for those who don't already know this kind of trick (so this is not for Muller), sometimes we can get a little speed-up in code like this:
Code:
typedef int16_t tsq;
struct clsSimpleMove
{
    union
    {
        uint64_t data;
        struct
        {
            int16_t src, dst;
            tsq taken, alfavalue;
        };
    };
};
bool clsEngineSimple::PushMove(int src, int dst)
{
    tsq taken = board[dst];
    if ((taken & color) == bit_empty)
    {
#if 1
        // Nodes: 3251309062, Time : 59988 ms, Nodes / s : 54198420
        uint64_t data = (uint16_t)taken;
        data = (data << 16) | (uint16_t)dst;
        data = (data << 16) | (uint16_t)src;
        pLastMove->data = data;
#else
        // Nodes: 3251309062, Time : 62965 ms, Nodes / s : 51635947
        pLastMove->src = src;
        pLastMove->dst = dst;
        pLastMove->taken = taken;
#endif
        ++pLastMove;
    }
    return taken != bit_empty;
}
This is just a test of Satana's new move generator, where I've removed all the 64-bit stuff and gone back to a simpler "old-way" approach (the numbers are from a perft 7 from the starting position; the generator is still wrong and incomplete). The speed-up is not big, only about 5% on my PC, but it is better than nothing. Of course, it is not very portable, because it depends on data alignment and endianness.
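To make the portability caveat concrete, here is a minimal sketch of the same packing written without the union. The names `SimpleMove`, `pack_move`, `store_move` and `host_is_little_endian` are hypothetical and not taken from Satana. It assumes four 16-bit fields with no padding, and the field order only matches the shifts on a little-endian host.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the move record in the post: four 16-bit fields.
struct SimpleMove {
    int16_t src, dst, taken, alfavalue;
};
static_assert(sizeof(SimpleMove) == sizeof(uint64_t),
              "expected four 16-bit fields with no padding");

// Pack the fields with the same shifts as in the post.
// The top 16 bits (the alfavalue slot) end up zero.
inline uint64_t pack_move(int src, int dst, int taken) {
    uint64_t data = static_cast<uint16_t>(taken);
    data = (data << 16) | static_cast<uint16_t>(dst);
    data = (data << 16) | static_cast<uint16_t>(src);
    return data;
}

// Write all four fields with one 8-byte copy instead of three 16-bit stores.
// The fields land in the right slots only on a little-endian host.
inline void store_move(SimpleMove* out, int src, int dst, int taken) {
    const uint64_t data = pack_move(src, dst, taken);
    std::memcpy(out, &data, sizeof data);
}

// True when the lowest-addressed byte of an integer holds its low-order bits.
inline bool host_is_little_endian() {
    const uint16_t probe = 1;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}
```

A build that wants to keep this trick could check `host_is_little_endian()` once at start-up (or use a compile-time equivalent) and fall back to the field-by-field stores otherwise.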
Which compiler is that with, and at what optimization level?
I have found that GCC, at least, sometimes does not introduce a temporary (which could be kept in a register) when the same element is accessed multiple times. This could be something along the same lines.
For example, if you add all the elements of an array into its first element without using a temporary, the code looks like this:
Code:
for (i = 1; i < N; ++i)
{
    a[0] += a[i];
}
It will generate code that does a memory store on every iteration, instead of keeping the sum in a register and doing a single memory store at the end. Of course, storing to memory every time makes it much slower.
This behaviour is correct (and required) if a is declared volatile, but GCC does this even when a is not.
This was true with GCC 4.5 or 4.6 (the last time I checked) at -O3.
So I always try to minimize memory accesses through pointers (which includes array accesses), and I work with explicit temporaries whenever possible. If the code is faster the other way, it is much easier for the compiler to optimize the temporary away, and I have no doubt GCC can do that. It just doesn't seem to want to introduce new temporaries.
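As a sketch of the explicit-temporary style described above, here are the two versions of the loop side by side. The function names are made up for illustration, and whether a given compiler produces different code for them depends on its version and flags.

```cpp
#include <cstddef>

// Accumulates directly into a[0]. Unless the compiler decides to keep the
// running sum in a register, each iteration loads and stores a[0].
inline void sum_into_first_in_place(int* a, std::size_t n) {
    for (std::size_t i = 1; i < n; ++i) {
        a[0] += a[i];
    }
}

// Same result with an explicit temporary: a[0] is read once before the loop
// and written once after it.
inline void sum_into_first_with_temp(int* a, std::size_t n) {
    if (n == 0) {
        return;  // nothing to sum, and a[0] does not exist
    }
    int sum = a[0];
    for (std::size_t i = 1; i < n; ++i) {
        sum += a[i];
    }
    a[0] = sum;
}
```

Comparing the generated assembly of the two functions (for example with `-S` at the optimization level in use) shows whether the compiler already hoists the store by itself.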
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.