bob wrote:
Terrible idea. When a company (Intel) says "operation is undefined" it is foolish to try it out, and then _depend_ on the specific behaviour you notice. You can find a perfectly safe PopCnt() in Crafty that returns 64 when no bits are set, and which does not depend on undefined behaviour that can change in the next processor generation without anyone knowing.
Here is a 32-bit version that does not depend on undefined behaviour and which has only two extra instructions.
Mike's "unsafe" version is a trailing zero count, while your MSB is not a leading zero count. I would prefer to return -1 for empty sets, since 64 somehow implies a leading zero count, which it isn't, because the xor 63 is missing for non-empty sets.
I'd vote for using leading/trailing zero counts for all sets, including the empty one, but assuming MSB/LSB (aka bitscan reverse/forward) only for non-empty bitboards, as in typical while (bits) loops. That saves one conditional jump, and Mike's version becomes safe and even shorter.
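To make the distinction concrete, here is a minimal portable C sketch. It is not Crafty's code and not Mike's assembly; the names tzcnt64, lzcnt64, msb_index and squares_of are made up for illustration, and plain loops stand in for bsf/bsr. The zero counts are defined for every input and return 64 for the empty set, while the MSB index is lzcnt xor 63 and is only meaningful for non-empty sets.

```c
#include <stdint.h>

/* Trailing zero count, defined for every input: 64 for the empty set. */
static int tzcnt64(uint64_t bb) {
    int n = 0;
    if (bb == 0) return 64;
    while ((bb & 1) == 0) { bb >>= 1; n++; }
    return n;
}

/* Leading zero count, defined for every input: 64 for the empty set. */
static int lzcnt64(uint64_t bb) {
    int n = 0;
    if (bb == 0) return 64;
    while ((bb >> 63) == 0) { bb <<= 1; n++; }
    return n;
}

/* Index of the most significant bit. The caller must guarantee bb != 0;
   this is the "xor 63" that turns a leading zero count into a bit index. */
static int msb_index(uint64_t bb) {
    return lzcnt64(bb) ^ 63;
}

/* Typical serialization loop: inside while (bits) the set is never empty,
   so the scan itself needs no extra empty-set branch. */
static int squares_of(uint64_t bits, int out[64]) {
    int n = 0;
    while (bits) {
        out[n++] = tzcnt64(bits);  /* index of the lowest set bit */
        bits &= bits - 1;          /* clear the lowest set bit */
    }
    return n;
}
```

With this split, the loop never pays for an empty-set check, and code that might pass an empty bitboard calls the zero-count functions instead.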
The only thing is that using the edx register somehow makes it faster on all the processors that I have, Q6600 and earlier. It must prevent a processor stall or something.
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
That's surprising since internally there is no "edx" register anyway, thanks to the renaming mechanism.
Applied chaos theory. Surprising, since the code is one or two bytes longer, which multiplies if heavily inlined. edx is used explicitly for no purpose other than to add a constant, and it "destroys" the register's content, which adds to register pressure. OTOH, mov edx, 32 can be executed en passant alongside the slower bsf, making the final add somehow slightly faster. It may also be related to how the MS inline assembly embeds in the surrounding code when inlined.
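For readers without the assembly in front of them, here is a rough C sketch of the kind of 32-bit routine under discussion: scan the low half, and if it is empty, scan the high half and add 32. This is an assumption about the general shape only, not Mike's actual code; bsf32 and first_bit64 are hypothetical names, and a loop stands in for the bsf instruction. The add of 32 on the high-half path is the constant that the mov edx, 32 / add eax, edx pair supplies.

```c
#include <stdint.h>

/* Stand-in for a 32-bit bsf. The caller must guarantee x != 0,
   mirroring the fact that bsf's result is undefined for a zero source. */
static int bsf32(uint32_t x) {
    int n = 0;
    while ((x & 1u) == 0) { x >>= 1; n++; }
    return n;
}

/* Lowest set bit of a 64-bit bitboard using only 32-bit scans.
   Returns 64 for the empty set, so no undefined result is ever used. */
static int first_bit64(uint64_t bb) {
    uint32_t lo = (uint32_t)bb;
    uint32_t hi = (uint32_t)(bb >> 32);
    if (lo) return bsf32(lo);
    if (hi) return bsf32(hi) + 32;  /* the constant add discussed above */
    return 64;
}
```

Whether the constant comes from an immediate or from a register makes no difference at this level; that choice only shows up in the generated instructions.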
using mov edx, 32 with add eax, edx:
2,576,068 nps (versus 2,516,099 nps without it)
difference: 59,969 nps
increase: 59,969 / 2,516,099 = 0.023834, or 2.38%
After e2e4:
2,497,449 nps without, 2,554,567 nps with
difference: 57,118 nps
increase: 57,118 / 2,497,449 = 2.29%
Since it speeds up the whole program by over 2%, it cannot be just a small increase for the function. There must be something else happening, such as the aforementioned prevention of a pipeline stall.
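The percentages above are just the relative difference between two nps readings. A tiny C helper (speedup_percent is a made-up name, not part of any engine) reproduces the arithmetic so that other runs can be compared the same way:

```c
/* Relative speedup in percent, given nodes per second before and after a change. */
static double speedup_percent(double nps_before, double nps_after) {
    return (nps_after - nps_before) * 100.0 / nps_before;
}
```

Feeding it 2,516,099 and 2,576,068 gives roughly 2.38, and 2,497,449 and 2,554,567 gives roughly 2.29, matching the figures quoted.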
This follows exactly from the observation that many make when they add or move a single line of code and things run faster or slower. The effect compounds over a long sequence of instructions, and changes cache misses/hits/fills/etc as well. Doesn't take much. I have removed code and seen speed go down slightly. I ignore such changes, since they are caused by something at a level where I have little if any control.
I also know about that anomaly. And it does not apply here, because I have revisited bsf many times over the years in many different versions of the program, and using the extra edx register (that does not really exist) has ALWAYS been faster.