Dann Corbit wrote:It's a bit under 9% faster, which I find rather surprising. Of course it's probably 1-2 Elo improvement tops.
Well, 9% faster is theoretically around +9 Elo...
Maybe that was true many years ago for any engine, and maybe it is still true for much weaker engines than SF. But nowadays for the top engines it's more like 3-4 Elo at most.
Mincho Georgiev wrote:I'm experimenting with an MMX version of popcnt right now. The difference between my implementation and the one from the AMD optimization guide is that mine can be inlined, while the AMD function manages its own stack frame. So far I couldn't notice any speed difference on a C2D CPU between these two and the integer implementation. Could anyone try it and say if there is a difference for an engine that uses popcnt and bitboards extensively?
I doubt MMX is going to be much help for popcount. There's also the extra overhead of getting it into and out of MMX registers, EMMS, etc. Population count is already amenable to SWAR techniques for 64-bit integer registers. (For 32-bit builds it takes extra instructions but they have good parallelism).
At some future time, the POPCNT instruction from the SSE4.2 era will be widely available, but today there are still a lot of chips around that don't support it.
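For readers who have not seen the SWAR approach mentioned above, here is a minimal sketch in C of a 64-bit population count using only integer registers. The function names are made up for illustration and this is not the code from Stockfish or from the AMD optimization guide. The second function is the common "clear the lowest set bit" loop, which can be cheaper when a bitboard has only a few bits set.

```c
#include <stdint.h>

/* SWAR population count: sums bits in parallel inside one 64-bit register.
   Hypothetical name, for illustration only. */
static inline int popcount64_swar(uint64_t x)
{
    /* Each 2-bit field now holds the count of its two bits. */
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    /* Each 4-bit field now holds the count of its four bits. */
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    /* Each byte now holds the count of its eight bits. */
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
    /* The multiply adds all eight byte counts into the top byte. */
    return (int)((x * 0x0101010101010101ULL) >> 56);
}

/* Loop variant: one iteration per set bit, so it suits sparse bitboards. */
static inline int popcount64_sparse(uint64_t x)
{
    int n = 0;
    while (x) {
        x &= x - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}
```

Both functions return the same count for any input. Which one is faster depends on how many bits are typically set and on the CPU, so it is worth measuring inside the engine rather than in isolation.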
Plus, with some compilers (icl for one) there is a high probability that the C version of popcnt gets translated into MMX code even without /Qax:SSE or any other MMX optimization option specified explicitly. I was interested in trying these functions on other architectures. Unfortunately I don't own any, apart from one Pentium and one C2D.
Dann Corbit wrote: It's a bit under 9% faster, which I find rather surprising.
It would have been _very_ surprising if it was worth 9%, given that the total time spent by SF in popping bits is much less than 9%.
The reality is that the speed up is around 1% on my QUAD with Windows 7 and compiled with MSVC 64.
Better than nothing but still far from 9%... Anyhow, I already knew you are not a testing guru... I remember you claimed something around a 150 Elo increase for a null reduction tweak.
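The point that the gain is capped by the share of run time spent counting bits is Amdahl's law. A rough sketch in C follows; the function names are hypothetical and any fraction you plug in is your own profiling assumption, not a measured figure for SF.

```c
/* Overall speedup when a fraction f of the run time (0 <= f < 1)
   is made s times faster (s > 0). Amdahl's law. */
static double overall_speedup(double f, double s)
{
    return 1.0 / ((1.0 - f) + f / s);
}

/* Upper bound on the overall speedup: the fraction f costs nothing at all. */
static double max_speedup(double f)
{
    return 1.0 / (1.0 - f);
}

/* Smallest run-time fraction that must disappear entirely to reach
   a given overall speedup (target > 1). */
static double required_fraction(double target)
{
    return 1.0 - 1.0 / target;
}
```

For example, required_fraction(1.09) is about 0.083, so a 9% overall speedup would need popcount to account for more than 8% of the run time and then become free. With a hypothetical 5% share and a popcount that gets twice as fast, overall_speedup(0.05, 2.0) is only about 1.026.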
JVMerlino wrote:Maybe that was true many years ago for any engine, and maybe it is still true for much weaker engines than SF. But nowadays for the top engines it's more like 3-4 Elo at most.
Hi Jim,
My +9% => +9 Elo is based on the "traditional" speed x2 => +70 Elo, which I am not sure has been proved false, even today.
True, some people prefer +60 Elo, even +50 Elo (diminishing returns?)
But your 3-4 Elo means speed x2 => +23 to +31 Elo, which seems really too low...
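To make the arithmetic above easy to replay, here is a small sketch in C. It assumes the rule of thumb under discussion, a fixed Elo gain per doubling of speed, so that gain = elo_per_doubling * log2(speedup). The function names are made up, and the logarithm is computed with a short series only so the sketch does not depend on the math library.

```c
/* Natural log for x > 0 via ln(x) = 2 * (y + y^3/3 + y^5/5 + ...),
   with y = (x - 1) / (x + 1). Converges quickly for x between 1 and 2. */
static double ln_series(double x)
{
    double y = (x - 1.0) / (x + 1.0);
    double y2 = y * y;
    double term = y;
    double sum = 0.0;
    for (int k = 1; k < 60; k += 2) {
        sum += term / k;
        term *= y2;
    }
    return 2.0 * sum;
}

/* Elo gain for a speedup factor, assuming a fixed gain per doubling. */
static double elo_from_speedup(double speedup, double elo_per_doubling)
{
    return elo_per_doubling * ln_series(speedup) / ln_series(2.0);
}

/* Inverse: the Elo per doubling implied by an observed gain at a speedup. */
static double elo_per_doubling_from(double speedup, double elo_gain)
{
    return elo_gain * ln_series(2.0) / ln_series(speedup);
}
```

With 70 Elo per doubling, elo_from_speedup(1.09, 70.0) comes out at about 8.7, which is the "+9 Elo" quoted above. Going the other way, 3 to 4 Elo at a 9% speedup implies roughly 24 to 32 Elo per doubling under this logarithmic model; simple proportional scaling (70 * 3/9 and 70 * 4/9) gives the 23 to 31 mentioned in the post.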
I totally agree, and there is also no evidence that weaker engines earn more Elo.
It may be the opposite, because one of the weaknesses of weaker engines is that they do not earn much from speed, and they need a bigger time handicap to beat stronger engines when you make the time control longer (I remember testing this with Movei and old Rybka: Movei could beat old Rybka with a 10:1 time handicap only at a fast time control).
I believe there are diminishing returns if you look at the same engine, but not when you compare stronger engines to weaker engines.