The mailbox trials

hgm · Post by **hgm** » Thu Mar 04, 2021 12:03 am

I opened this thread to conduct a sort of blog on comparing the speed of several experimental mailbox algorithms. The speed comparison of the various techniques will be done for a toy 'model engine', which uses a fixed-depth search plus capture-only quiescence (including delta pruning), with MVV/LVA sorting of the captures, and an evaluation of PST plus (optionally) mobility. No hash table, no killer or history heuristic, so basically no particuar move ordering of the non-captures; the time-to-depth is not really of interest here, we will be comparing nodes per second. To not lose ourselves in details, promotion, castling and e.p. capture will be omitted. The speed will be measured in a search of the KiwiPete position.

The first design to be tested will be the 16x12 mailbox + piece list. We should already have a rough idea how this works, because Qperft uses this design. It does a perft(6) of KiwiPete in 45 sec, with bulk counting. That means it has done move generation in perft(5) nodes (193M nodes), which translates to 4.26 Mnps. But because that is perft there are two effects that slow it down compared to a search: there is some legality testing (on King moves), which a search based on pseudo-legal moves would not do. But, more importantly, in perft all nodes are 'full width', while in a search most nodes are QS nodes. Refraining from putting the non-captures in a move list would speed up move generation appreciably. And QS nodes that experience a stand-pat cutoff will not have to generate moves at all. (At least, if the evaluation doesn't include mobility.) So the search should be significantly faster, perhaps even double.

We will see shortly...

Mike Sherwin · Post by **Mike Sherwin** » Thu Mar 04, 2021 2:24 am

hgm wrote: ↑Thu Mar 04, 2021 12:03 am I opened this thread to conduct a sort of blog on comparing the speed of several experimental mailbox algorithms. The speed comparison of the various techniques will be done for a toy 'model engine', which uses a fixed-depth search plus capture-only quiescence (including delta pruning), with MVV/LVA sorting of the captures, and an evaluation of PST plus (optionally) mobility. No hash table, no killer or history heuristic, so basically no particuar move ordering of the non-captures; the time-do-depth is not really of interest here, we will be comparing nodes per second. To not lose ourselves in details, promotion, castling and e.p. capture will be omitted. The speed will be measured in a search of the KiwiPete position.

The first design to be tested will be the 16x12 mailbox + piece list. We should already have a rough idea how this works, because Qperft uses this design. It does a perft(6) of KiwiPete in 45 sec, with bulk counting. That means it has done move generation in perft(5) nodes (193M nodes), which translates to 4.26 Mnps. But because that is perft there are two effects that slow it down compared to a search: there is some legality testing (on King moves), which a search based on pseudo-legal moves would not do. But, more importantly, in perft all nodes are 'full width', while in a search most nodes are QS nodes. Refraining from putting the non-captures in a move list would speed up move generation appreciably. And QS nodes that experience a stand-pat cutoff will not have to generate moves at all. (At least, if the evaluation doesn't include mobility.) So the search should be significantly faster, perhaps even double.

We will see shortly...

Looking forward to it!

phhnguyen · Post by **phhnguyen** » Thu Mar 04, 2021 9:35 am

hgm wrote: ↑Thu Mar 04, 2021 12:03 am
It does a perft(6) of KiwiPete in 45 sec, with bulk counting.

I am interested in.

I’m still wondering which mailboxes or bitboards is faster on Perft functions. I have read some your posts about that but there’s no data. I have made some comparisons between my mailbox and Stockfish and SF was significantly faster than mine - but perhaps the reason I didn’t try to optimise mine. Can you test and post the results of your new mailbox, Qperf (turn off the hash function please) vs Stockfish on the same hardware (your computer), do perft 6, 7 for the starting position. Thanks

hgm · Post by **hgm** » Thu Mar 04, 2021 10:23 am

Well, I have Stockfish 10 on my computer, so I tried to run a perft in it ("go perft N" appears to do that). But it doesn't print any times. So how could I know how long it took?

phhnguyen · Post by **phhnguyen** » Thu Mar 04, 2021 10:57 am

hgm wrote: ↑Thu Mar 04, 2021 10:23 am Well, I have Stockfish 10 on my computer, so I tried to run a perft in it ("go perft N" appears to do that). But it doesn't print any times. So how could I know how long it took?

Can you download SF's source code and compile it? To compile on Unix-like systems, just run some command lines such as "make build ARCH=x86-64-modern".

If you can compile, re-download SF's PR at https://github.com/nguyenpham/Stockfish ... twithspeed, the perft will show time consumption and speed.

hgm · Post by **hgm** » Thu Mar 04, 2021 11:25 am

I never managed to compile Stockfish, on Windows. I only have a 32-bit compiler there anyway (gcc 3.4.4), so even if I could it probably would not be very representative. But perhaps I should conduct this whole experiment on Linux anyway. Cannot connect to GitHub from my Linux VM, though. (It complains about no common encryption standard.)

lithander · Post by **lithander** » Thu Mar 04, 2021 12:14 pm

Maybe compare against asmFish?
https://github.com/lantonov/asmFish
https://www.chessprogramming.org/AsmFish

It's based on stockfish and it's perft command measures the time out of the box!

Code: Select all

perft 7
a2a3 :  106743106
b2b3 :  133233975
c2c3 :  144074944
[...]
b1c3 :  148527161
g1f3 :  147678554
g1h3 :  120669525
===========================
Total time (ms) : 26051
Nodes searched  : 3195901860
Nodes/second    : 122678663

According to the authors on most hardware, asmFish 10 is between 16-24% faster than SF 10. And 122M Nodes/second sure seems fast to me!

I'm really looking forward to read your "blog"! Thanks for making the effort!

hgm · Post by **hgm** » Thu Mar 04, 2021 12:35 pm

Qperft takes 18.42 sec on my 3.2GHz i7:

Code: Select all

$ ../perft/perft 7
 - - - - - - - - - - - -
 - - - - - - - - - - - -
 - - r n b q k b n r - -
 - - p p p p p p p p - -
 - - . . . . . . . . - -
 - - . . . . . . . . - -
 - - . . . . . . . . - -
 - - . . . . . . . . - -
 - - P P P P P P P P - -
 - - R N B Q K B N R - -
 - - - - - - - - - - - -
 - - - - - - - - - - - -

Quick Perft by H.G. Muller
Perft mode: No hashing, bulk counting in horizon nodes

perft( 1)=           20 ( 0.000 sec)
perft( 2)=          400 ( 0.000 sec)
perft( 3)=         8902 ( 0.000 sec)
perft( 4)=       197281 ( 0.000 sec)
perft( 5)=      4865609 ( 0.050 sec)
perft( 6)=    119060324 ( 0.690 sec)
perft( 7)=   3195901860 (18.420 sec)

But I do not plan this to be a perft exercise. There is no QS in perft, and QS nodes dominate search. The accelerating techniques I want to investigate thus mainly focus on speeding up capture generation and evaluation (mobility!), possibly at the expense of make/unmake speed for non-captures.

phhnguyen · Post by **phhnguyen** » Thu Mar 04, 2021 2:07 pm

OK, I have tried myself. Downloaded your QPerft, compiled with command: gcc -O2 perft.c -o qperft
Stockfish compiled by command: make ARCH=x86-64-bmi2 COMP=gcc profileclean

Run (tried several times, pickup average ones):

Code: Select all

qperft 7

 - - - - - - - - - - - -
 - - - - - - - - - - - -
 - - r n b q k b n r - -
 - - p p p p p p p p - -
 - - . . . . . . . . - -
 - - . . . . . . . . - -
 - - . . . . . . . . - -
 - - . . . . . . . . - -
 - - P P P P P P P P - -
 - - R N B Q K B N R - -
 - - - - - - - - - - - -
 - - - - - - - - - - - -

Quick Perft by H.G. Muller
Perft mode: No hashing, bulk counting in horizon nodes

perft( 1)=           20 ( 0.000 sec)
perft( 2)=          400 ( 0.000 sec)
perft( 3)=         8902 ( 0.000 sec)
perft( 4)=       197281 ( 0.001 sec)
perft( 5)=      4865609 ( 0.023 sec)
perft( 6)=    119060324 ( 0.563 sec)
perft( 7)=   3195901860 (15.409 sec)

And perft 8:

Code: Select all

qperft 8
 - - - - - - - - - - - -
 - - - - - - - - - - - -
 - - r n b q k b n r - -
 - - p p p p p p p p - -
 - - . . . . . . . . - -
 - - . . . . . . . . - -
 - - . . . . . . . . - -
 - - . . . . . . . . - -
 - - P P P P P P P P - -
 - - R N B Q K B N R - -
 - - - - - - - - - - - -
 - - - - - - - - - - - -

Quick Perft by H.G. Muller
Perft mode: No hashing, bulk counting in horizon nodes

perft( 1)=           20 ( 0.000 sec)
perft( 2)=          400 ( 0.000 sec)
perft( 3)=         8902 ( 0.000 sec)
perft( 4)=       197281 ( 0.001 sec)
perft( 5)=      4865609 ( 0.024 sec)
perft( 6)=    119060324 ( 0.568 sec)
perft( 7)=   3195901860 (15.189 sec)
perft( 8)=  84998978956 (413.194 sec)

Stockfish perft 7:

Code: Select all

Stockfish 040321 by the Stockfish developers (see AUTHORS file)
go perft 7
a2a3: 106743106
b2b3: 133233975
c2c3: 144074944
d2d3: 227598692
e2e3: 306138410
f2f3: 102021008
g2g3: 135987651
h2h3: 106678423
a2a4: 137077337
b2b4: 134087476
c2c4: 157756443
d2d4: 269605599
e2e4: 309478263
f2f4: 119614841
g2g4: 130293018
h2h4: 138495290
b1a3: 120142144
b1c3: 148527161
g1f3: 147678554
g1h3: 120669525

Nodes searched: 3195901860
Time (ms)     : 15079
Nodes/second  : 211929831

Perft 8:

Code: Select all


go perft 8
a2a3: 2863411653
b2b3: 3579299617
c2c3: 3806229124
d2d3: 6093248619
e2e3: 8039390919
f2f3: 2728615868
g2g3: 3641432923
h2h3: 2860408680
a2a4: 3676309619
b2b4: 3569067629
c2c4: 4199667616
d2d4: 7184581950
e2e4: 8102108221
f2f4: 3199039406
g2g4: 3466204702
h2h4: 3711123115
b1a3: 3193522577
b1c3: 3926684340
g1f3: 3937354096
g1h3: 3221278282

Nodes searched: 84998978956
Time (ms)     : 412095
Nodes/second  : 206260140

Results are quite closed between QPerft and Stockfish.

Quick conclusion: both mailboxes and bitboards may have similar speeds on generating, making/unmaking moves!

MikeB · Post by **MikeB** » Thu Mar 04, 2021 2:27 pm

A rudimentary timer for those that are lazy like me ;>)

Copy and paste in bash

stime=$(($(date +%s%N)/1000000))
stockfish go perft 7
etime=$(($(date +%s%N)/1000000))
elapse=$(($etime-$stime))
echo "Elapsed Time in milliseconds: $elapse"

Code: Select all

Output:
# stime=$(($(date +%s%N)/1000000))
stockfish go perft 7
etime=$(($(date +%s%N)/1000000))
elapse=$(($etime-$stime))
echo "Elapsed Time in millisceonds: $elapse"
Stockfish 13 by the Stockfish developers (see AUTHORS file)
a2a3: 106743106
b2b3: 133233975
c2c3: 144074944
d2d3: 227598692
e2e3: 306138410
f2f3: 102021008
g2g3: 135987651
h2h3: 106678423
a2a4: 137077337
b2b4: 134087476
c2c4: 157756443
d2d4: 269605599
e2e4: 309478263
f2f4: 119614841
g2g4: 130293018
h2h4: 138495290
b1a3: 120142144
b1c3: 148527161
g1f3: 147678554
g1h3: 120669525

Nodes searched: 3195901860

Elapsed Time in milliseconds: 26574

The mailbox trials

The mailbox trials

Re: The mailbox trials

Re: The mailbox trials

Re: The mailbox trials

Re: The mailbox trials

Re: The mailbox trials

Re: The mailbox trials

Re: The mailbox trials

Re: The mailbox trials

Re: The mailbox trials