Page 1 of 3
Fastest pawn quiet move generation I was able to come with
Posted: Sat Jun 10, 2017 6:54 pm
by cdani
Sure someone can improve this, but I think is really fast.
Code: Select all
b = bitboard of pawns
increment = 8 or 16
uint8_t cas;
uint8_t casd;
switch (popcount(b)) {
case 8:
cas = lsb(b);
b &= b - 1;
casd = cas + increment;
ml->moviment = MakeMove(cas, casd);
ml++;
case 7:
cas = lsb(b);
b &= b - 1;
casd = cas + increment;
ml->moviment = MakeMove(cas, casd);
ml++;
case 6:
cas = lsb(b);
b &= b - 1;
casd = cas + increment;
ml->moviment = MakeMove(cas, casd);
ml++;
case 5:
cas = lsb(b);
b &= b - 1;
casd = cas + increment;
ml->moviment = MakeMove(cas, casd);
ml++;
case 4:
cas = lsb(b);
b &= b - 1;
casd = cas + increment;
ml->moviment = MakeMove(cas, casd);
ml++;
case 3:
cas = lsb(b);
b &= b - 1;
casd = cas + increment;
ml->moviment = MakeMove(cas, casd);
ml++;
case 2:
cas = lsb(b);
b &= b - 1;
casd = cas + increment;
ml->moviment = MakeMove(cas, casd);
ml++;
case 1:
cas = lsb(b);
casd = cas + increment;
ml->moviment = MakeMove(cas, casd);
ml++;
case 0:
return ml;
default: //> 8 pawns for strange games or positions
altremovpeo2 :
if (b == 0)
return ml;
cas = lsb(b);
casd = cas + increment;
ml->moviment = MakeMove(cas, casd);
ml++;
b &= b - 1;
goto altremovpeo2;
}
Re: Fastest pawn quiet move generation I was able to come wi
Posted: Sat Jun 10, 2017 7:03 pm
by mkchan
Could do for brevity:
Code: Select all
case 8:
case 7:
case 6:
...
case 2:
cas = lsb(b);
b &= b - 1;
casd = cas + increment;
ml->moviment = MakeMove(cas, casd);
ml++;
case 1:
...
Re: Fastest pawn quiet move generation I was able to come wi
Posted: Sat Jun 10, 2017 7:10 pm
by elcabesa
cdani wrote:Sure someone can improve this, but I think is really fast.
Code: Select all
b = bitboard of pawns
increment = 8 or 16
...
you unrolled the loop, how much faster it's respect to the original loop?
And the most important question, how much elo you gained agaisnt a more readable loop?
Re: Fastest pawn quiet move generation I was able to come wi
Posted: Sat Jun 10, 2017 7:33 pm
by Kotlov
"switch - case" is slow construction
Re: Fastest pawn quiet move generation I was able to come wi
Posted: Sat Jun 10, 2017 7:35 pm
by mar
cdani wrote:Sure someone can improve this, but I think is really fast.
Well, a Duff machine
But where do you check that the squares in front of the pawns are empty?
I assume you mask-shift vacated squares in advance, like
Code: Select all
pawns & shift_backward(~occupied)
What was the performance gain versus a loop? (and overall speed gain?)
Re: Fastest pawn quiet move generation I was able to come wi
Posted: Sat Jun 10, 2017 8:08 pm
by hgm
Rather than incrementing ml all the time, you could have stored the moves in the opposite order, and hard-code the offset w.r.t. the inintial ml:
Code: Select all
switch((p = popcount(b))) {
case 8:
cas = lsb(b);
b &= b - 1;
casd = cas + increment;
ml[7].moviment = MakeMove(cas, casd);
case 7:
cas = lsb(b);
b &= b - 1;
casd = cas + increment;
ml[6].moviment = MakeMove(cas, casd);
...
case 0:
return ml += p;
It is questionable whether an optimizer could do that from your code.
Re: Fastest pawn quiet move generation I was able to come wi
Posted: Sat Jun 10, 2017 8:13 pm
by syzygy
hgm wrote:It is questionable whether an optimizer could do that from your code.
The optimizer will certainly not generate code that stores moves in the opposite order.
Re: Fastest pawn quiet move generation I was able to come wi
Posted: Sat Jun 10, 2017 8:15 pm
by mar
hgm wrote:Rather than incrementing ml all the time, you could have stored the moves in the opposite order, and hard-code the offset w.r.t. the inintial ml:
Code: Select all
switch((p = popcount(b))) {
case 8:
cas = lsb(b);
b &= b - 1;
casd = cas + increment;
ml[7].moviment = MakeMove(cas, casd);
case 7:
cas = lsb(b);
b &= b - 1;
casd = cas + increment;
ml[6].moviment = MakeMove(cas, casd);
...
case 0:
return ml += p;
It is questionable whether an optimizer could do that from your code.
Hmm, I liked your original suggestion:
Code: Select all
ml += (p = popcount(b));
switch(p) {
case 8:
....
ml[-8].m = ...;
...
case 0:
return ml;
}
Re: Fastest pawn quiet move generation I was able to come wi
Posted: Sat Jun 10, 2017 8:23 pm
by cdani
mkchan wrote:Could do for brevity:
Code: Select all
case 8:
case 7:
case 6:
...
case 2:
cas = lsb(b);
b &= b - 1;
casd = cas + increment;
ml->moviment = MakeMove(cas, casd);
ml++;
case 1:
...
This does not work, as the idea is that case 8 executes the code in case 8 + code in case 7 + code in case 6..., and is not doing this with your code.
Kotlov wrote:"switch - case" is slow construction
Yes, but with jump tables is quite fast.
Code: Select all
switch (popcount(b)) {
000000013FC0826B movzx ecx,al
000000013FC0826E cmp ecx,8
000000013FC08271 ja peons_moviments_quiets_o_sense_peoalpas+13Bh (013FC0839Bh)
000000013FC08277 mov eax,ecx
000000013FC08279 lea rcx,[__rg_language (013FBF0000h)]
000000013FC08280 mov r9d,dword ptr [rcx+rax*4+183C8h]
000000013FC08288 add r9,rcx
000000013FC0828B jmp r9
mar wrote:
But where do you check that the squares in front of the pawns are empty?
I assume you mask-shift vacated squares in advance, like
Code: Select all
pawns & shift_backward(~occupied)
Yes, b variable has only pawns that can advance.
elcabesa wrote:
you unrolled the loop, how much faster it's respect to the original loop?
And the most important question, how much elo you gained agaisnt a more readable loop?
mar wrote:
What was the performance gain versus a loop? (and overall speed gain?)
Of course minimal, something like 1100000 n/s to 1100500 n/s to give an idea. Is more for fun than for anything other, and also is interesting to share and to have other ideas. Any elo gain is due to accumulating a few of those things.
hgm wrote:Rather than incrementing ml all the time, you could have stored the moves in the opposite order, and hard-code the offset w.r.t. the inintial ml:
mar wrote:
Hmm, I liked your original suggestion:
Nice one! I will try, sure is even better! Thanks.
Re: Fastest pawn quiet move generation I was able to come wi
Posted: Sat Jun 10, 2017 8:51 pm
by syzygy
mar wrote:Hmm, I liked your original suggestion:
Code: Select all
ml += (p = popcount(b));
switch(p) {
case 8:
....
ml[-8].m = ...;
...
case 0:
return ml;
}
OK, that is something the optimizer probably could do, in theory.