The mailbox trials

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

User avatar
hgm
Posts: 27789
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: The mailbox trials

Post by hgm »

I improved my routine for printing an attack map as a visual aid in debugging, as just reading the hexadecimal representation was no longer as easy as when the bits were still spaced 4 apart (which made every hexadecimal digit correspond to one attack). The routine separates the black and white attackers (which are interleaved) for printing. E.g. the map for KiwiPete looks like this:
[d]r3k2r/p1ppqpb1/bn2pnp1/3PN3/1p2P3/2N2Q1p/PPPBBPPP/R3K2R w KQkq - 0 1

Code: Select all

              presence: @@@@@@@@@@@@@@@@ @@@@@@@@@@@@@@@@
                        PPPPPPPPNNBBRRQK ppppppppnnbbrrqk
16. 000e0204    P@d5    .@.......@:::::. ....@...@@:::::.
17. 100c0000    P@e4    .........@::::@. .........@:::::.
18. 01040000    P@a2    .........@::@::. ..........:::::.
19. 00000000    P@b2    ..........:::::. ..........:::::.
20. 00000000    P@c2    ..........:::::. ..........:::::.
21. 50000000    P@f2    ..........::::@@ ..........:::::.
22. 10008000    P@g2    ..........::::@. .......@..:::::.
23. 04000000    P@h2    ..........:::@:. ..........:::::.
24. 00000000    N@e5    ..........:::::. ..........:::::.
25. 10102040    N@c3    ...@......@:::@. ......@...:::::.
26. 40000000    B@d2    ..........:::::@ ..........:::::.
27. 50840000    B@e2    .........@::::@@ ..........:@:::.
28. 00000000    R@a1    ..........:::::. ..........:::::.
29. 00000000    R@h1    ..........:::::. ..........:::::.
30. 00411000    Q@f3    ......@.@.:@:::. ..........:::::.
31. 05100000    K@e1    ..........@:@@:. ..........:::::.

32. 02000000    p@a7    ..........:::::. ..........::@::.
33. 00000000    p@c7    ..........:::::. ..........:::::.
34. a00b0000    p@d7    ........@.:::::. ........@@::::@@
35. a0010000    p@f7    ........@.:::::. ..........::::@@
36. 200000a1    p@e6    @.........:::::. ..@@......::::@.
37. 00010080    p@g6    ........@.:::::. ...@......:::::.
38. 20000000    p@b4    ..........:::::. ..........::::@.
39. 18001000    p@h3    ......@...::::@. ..........:::@:.
40. 0000000a    n@b6    ..........:::::. @@........:::::.
41. 30200000    n@f6    ..........::::@. ..........@:::@.
42. 00000000    b@g7    ..........:::::. ..........:::::.
43. 00400000    b@a6    ..........:@:::. ..........:::::.
44. 00020000    r@a8    ..........:::::. ........@.:::::.
45. 00200000    r@h8    ..........:::::. ..........@::::.
46. 80000000    q@e7    ..........:::::. ..........:::::@
47. 2a080000    k@e8    ..........:::::. .........@::@@@.
Each @ represents an attack by the piece shown above on the piece at the left. A dot or colon indicates no attack. The colon is used to contrast the slider-attack part of the map with the leaper part.

In particular this illustrates that slider attacks on enemy pieces (upper-right and lower-left quadrants) are much rarer (3 or 1) than those on friendly pieces (upper-left and lower-right quadrants, 12 each). This is important for the cost of incremental update, because the discovered moves of the enemy sliders are needed immediately (as these moves could be made in the reply), while discovered moves of friendly sliders can only be made 2 ply later. And the moves in the next ply will decide whether there will be a '2 ply later' at all. If there isn't, that part of the update (as well as updating the moves of the moved piece) does not have to be done.
User avatar
hgm
Posts: 27789
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: The mailbox trials

Post by hgm »

OK, progress! (Finally.)

I got the new attack map running with incremental update. I implemented this in steps: first using from-scratch generation of a new map in every node, using the AddMoves(piece, sqr) routines for all pieces not marked as captured in the piece list, in a freshly zeroed memory area. This was mainly to test the new code for map-based capture generation. (Lots of bugs there; the worst being that to merge attacker sets for black I use right-shifts, and attackers[] was not defined as unsigned...) After debugging I got it to the point where it gave the same PVs and scores as the neighbor-table based version we tested before, which (for me) runs at 6 Mnps. As expected, the new version was much slower, as from-scratch map generation defeats the purpose. Especially since it generates captures for both players, rather than just for the side to move. Still, it was running at 3.8 Mnps.
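
In sketch form, the from-scratch generation amounts to something like this (AddMoves() is the routine mentioned above; the piece-list name, the CAPTURED marker and the map layout are just placeholders here, not the actual code):

Code: Select all

#include <string.h>

extern unsigned int attackMap[32];          /* one attackers word per piece (assumed layout) */
extern int pieceSqr[48];                    /* piece list: square of each piece (assumed name) */
extern void AddMoves(int piece, int sqr);   /* adds the attacks of one piece to the map */
#define CAPTURED (-1)                       /* assumed marker for captured pieces */

void GenerateMapFromScratch(void)
{
  memset(attackMap, 0, sizeof attackMap);   /* freshly zeroed area */
  for(int piece = 16; piece < 48; piece++)  /* all white and black pieces */
    if(pieceSqr[piece] != CAPTURED)         /* skip pieces marked as captured */
      AddMoves(piece, pieceSqr[piece]);     /* add this piece's attacks to the map */
}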

I changed the test case to a 10-ply search, to measure timing more accurately. The new reference (with the old code) is now:

Code: Select all

 1  -16       718 0 e2a6 b4c3 b2c3 e6d5
 2  -16      8338 0 e2a6 b4c3 b2c3 e6d5
 3  -27     11701 0 e2a6 e6d5 g2h3 b4c3 d2c3 d5e4
 4  -27     69990 0 e2a6 e6d5 g2h3 b4c3 d2c3 d5e4
 5  -37    147413 0 e2a6 e6d5 c3d5 b6d5 e4d5 e7e5
 6  -32    404904 0 e2a6 e6d5 c3d5 b6d5 a6b7 a8b8 b7d5 f6d5 e5f7 e7f7 e4d5 g7b2 f3f7 e8f7
 7  -32    850164 0 e2a6 e6d5 c3d5 b6d5 a6b7 a8b8 b7d5 f6d5 e5f7 e7f7 e4d5 g7b2 f3f7 e8f7
 8  -32   4723911 0 e2a6 e6d5 c3d5 b6d5 a6b7 a8b8 b7d5 f6d5 e5f7 e7f7 e4d5 g7b2 f3f7 e8f7
 9  -96  12878122 0 d5e6 e7e6 e2a6 e6e5 g2h3 b4c3 d2c3 e5h5 f3f4
10  -82  43774990 0 e2a6 e6d5 e5g6 f7g6 c3b5 e8d8 g2h3 d5e4 f3g3 c7c5 g3g6 h8h3
t =  7.230 sec
  43774990 nodes (6.1 Mnps)
  39674278 QS (90.6%)
   6400416 evals (16.1%)
   2071938 stand-pats (5.2%)
  12773244 leaves (32.2%)
  38515424 move gens
captures: 90.8%
Umm, this is a tougher case, as it is less dominated by QS and captures than the 9-ply search. I did not make any effort to further speed up the non-captures; on the contrary, these now all call the from-scratch attack-map generation in MakeMove(), i.e. the 3.8 Mnps method, in all subsequent tests. So they are significantly slower than before.

First I played a bit with the method for clearing the memory area where a new attack map would be generated. Initially I just used my own loop assigning 0 to all 32 attack-map elements. Then I tried memset(). This wasn't much faster: 3.9 Mnps. Then I tried memcpy(), copying the zeros from a zero-initialized fixed memory buffer, to test how much slower memcpy() was compared to memset(). Surprise: no measurable difference! In fact intentionally doubling or quadrupling the amount of copied memory had only a minute effect on speed. Copying is nearly free! That is good news: it means we don't have to worry about restoring the attack map on UnMake at all: just do copy-make.
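
The three clearing variants I compared, in sketch form (the size and the names are just illustrative):

Code: Select all

#include <string.h>

#define MAPSIZE 32
unsigned int map[MAPSIZE];
static const unsigned int zeroBuf[MAPSIZE];       /* stays all zero */

void ClearLoop(void)   { for(int i = 0; i < MAPSIZE; i++) map[i] = 0; }  /* my own loop   */
void ClearMemset(void) { memset(map, 0, sizeof map); }                   /* memset()      */
void ClearMemcpy(void) { memcpy(map, zeroBuf, sizeof map); }             /* copy of zeros */

The last variant is the one that generalizes to copy-make: replace zeroBuf by the parent's map and the child gets its own copy to modify, so UnMake never has to restore anything.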

First I did another test, however: only generating the half of the map we need, i.e. the attacks by the side to move. That brought us up to 5.1 Mnps. Problem is that in this case we have to generate a new map for null moves too. But generating a half-map on a copied memory area of zeros has to run AddMoves() for all 16 pieces of the stm, while incremental update only has to run it for two pieces. (Actually the same piece, in the old and new location.) So doing that on a copy of the previous map must be faster. The incremental update is complex and error-prone, so first I just added it to the version that did map generation from scratch, to compare the from-scratch map with the incrementally updated one. After much debugging, it could run the complete search without detecting any differences. But disastrously slow, of course: 2.6 Mnps.
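
For orientation, a very rough sketch of what the copy-make plus incremental update in MakeMove() boils down to. AddMoves() and Discover() are names that do occur in my code, but the other helpers (and the exact signature of Discover()) are only placeholders here:

Code: Select all

  memcpy(childMap, parentMap, 32*sizeof(int)); /* copy-make: the child works on its own copy */
  ClearAttacks(piece, fromSqr);                /* the mover no longer attacks from its old square */
  if(victim) ClearAttacks(victim, toSqr);      /* a captured piece loses all its attacks */
  Discover(fromSqr);                           /* sliders aimed at the vacated square now see further */
  Block(toSqr);                                /* sliders whose ray crosses the destination are now blocked */
  AddMoves(piece, toSqr);                      /* finally, the mover attacks from its new location */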

But then, throwing out the from-scratch map and the comparison, we finally struck paydirt: 6.5 Mnps. 10% faster than the previous best. Still a bit disappointing. But that was because I had forgotten something: the from-scratch generation was still being done for the null move. While the incremental update maintains both halves of the attack map, so that a null move doesn't have to update anything at all. After removing this redundant map calculation, the speed went up to 8 Mnps:

Code: Select all

 1  -16       532 0 e2a6 b4c3 b2c3 e6d5
 2  -16      1859 0 e2a6 b4c3 b2c3 e6d5
 3  -27      4421 0 e2a6 e6d5 g2h3 b4c3 d2c3 d5e4
 4  -27     59130 0 e2a6 e6d5 g2h3 b4c3 d2c3 d5e4
 5  -37    142018 0 e2a6 e6d5 c3d5 b6d5 e4d5 e7e5
 6  -32    427596 0 e2a6 e6d5 c3d5 b6d5 a6b7 a8b8 b7d5 f6d5 e5f7 e7f7 e4d5 g7b2 f3f7 e8f7
 7  -32    956057 0 e2a6 e6d5 c3d5 b6d5 a6b7 a8b8 b7d5 f6d5 e5f7 e7f7 e4d5 g7b2 f3f7 e8f7
 8  -32   4664824 0 e2a6 e6d5 c3d5 b6d5 a6b7 a8b8 b7d5 f6d5 e5f7 e7f7 e4d5 g7b2 f3f7 e8f7
 9  -96  12236440 0 d5e6 e7e6 e2a6 e6e5 g2h3 b4c3 d2c3 e5h5 f3f4
10  -82  41679702 0 e2a6 e6d5 e5g6 f7g6 c3b5 d5e4 f3g3 e8f8 g2h3 c7c5 g3g6 h8h3
t =  5.210 sec
  41679702 nodes (8.0 Mnps)
  37242403 QS (89.4%)
   5510077 evals (14.8%)
   1893393 stand-pats (5.1%)
  14445410 leaves (38.8%)
    134111 move gens
captures: 89.8%
We see that the number of nodes is not exactly the same as in the old version. This is no surprise, as the move order is not exactly the same. (Especially for HxL captures of minors and Pawns.) But in this case the node count is even smaller. I am still inclined to think this is mostly noise; a single position cannot be used to judge which move ordering is better.

Anyway, we went up from 6 to 8 Mnps, +33%. And it doesn't have to stop here, as there are still lots of inefficiencies. There is for instance no overkill pruning, and the Search() routine is called with the MVV indicator set to King. So it has to run through the opponent's entire piece list to find the MVV, and the map update has been done in vain when it doesn't find a non-futile one. As apparently happens in 38.8% of the nodes. Imagine how much faster we would be if we could prune these nodes through a cheap test! But that would just reduce execution time, as the nodes that are pruned would then also not be counted.
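
Roughly what that MVV scan amounts to (the names, and the exact futility criterion, are assumptions here):

Code: Select all

  /* walk the opponent's piece list and pick the most valuable victim that is
     actually attacked and whose capture would not be futile */
  int mvv = 0, best = -1;
  for(int p = oppFirst; p < oppFirst + 16; p++) {
    int sq = pieceSqr[p];
    if(sq == CAPTURED) continue;              /* piece is no longer on the board */
    if(!AttackedBy(sq, stm)) continue;        /* we have no capture on it at all */
    if(pstEval + value[p] <= alpha) continue; /* even capturing it cannot raise the score enough */
    if(value[p] > best) best = value[p], mvv = p;
  }
  /* mvv == 0 here means there is no useful capture target,
     and the attack-map update in MakeMove() was done in vain */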

There was a similar situation with futility pruning in the old version: switching that off nearly tripled the nps, by adding lots of nodes where virtually nothing was done, reached by a dirt-cheap MakeMove(), without adding much execution time. This effect has now largely disappeared, because MakeMove() updates the attack map, and thus is no longer cheap. Just for fun I reran the last test with futility pruning off:

Code: Select all

 1  -16      1282 0 e2a6 b4c3 b2c3 e6d5
 2  -16     15123 0 e2a6 b4c3 b2c3 e6d5
 3  -27     25412 0 e2a6 e6d5 g2h3 b4c3 d2c3 d5e4
 4  -27    179979 0 e2a6 e6d5 g2h3 b4c3 d2c3 d5e4
 5  -37    412220 0 e2a6 e6d5 c3d5 b6d5 e4d5 e7e5
 6  -32   1200942 0 e2a6 e6d5 c3d5 b6d5 a6b7 a8b8 b7d5 f6d5 e5f7 e7f7 e4d5 g7b2 f3f7 e8f7
 7  -32   2769202 0 e2a6 e6d5 c3d5 b6d5 a6b7 a8b8 b7d5 f6d5 e5f7 e7f7 e4d5 g7b2 f3f7 e8f7
 8  -32  12329886 0 e2a6 e6d5 c3d5 b6d5 a6b7 a8b8 b7d5 f6d5 e5f7 e7f7 e4d5 g7b2 f3f7 e8f7
 9  -96  34438204 0 d5e6 e7e6 e2a6 e6e5 g2h3 b4c3 d2c3 e5h5 f3f4
10  -82 110957182 0 e2a6 e6d5 e5g6 f7g6 c3b5 e8d8 g2h3 d5e4 f3g3 c7c5 g3g6 h8h3
t = 12.811 sec
 110957182 nodes (8.7 Mnps)
 106437647 QS (95.9%)
 106437646 evals (100.0%)
  66686879 stand-pats (62.7%)
   5183909 leaves (4.9%)
    275950 move gens
captures: 91.8%
We see the nps is still a bit higher (up another 9% to 8.7 Mnps), but the search time went up from 5.2 to 12.8 sec, more than a doubling!

[Edit] I uploaded (the still very messy) source code for this test to http://hgm.nubati.net/mailbox4.c . I am not sure whether it would compile on any compiler, because I had to embed some assembly code to be able to use the bit-scan forward instruction. That would probably have to be adapted to a 64-bit version when compiled for x64 architecture. And I am not sure the GNU format for assembly is understood by all compilers.
User avatar
lithander
Posts: 880
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: The mailbox trials

Post by lithander »

I really enjoy reading this thread even though I understand only half of it. It's a much more intimate, bare-metal approach than what I use (like creating actual List<Move> instances, calling their generic Sort() methods and putting the garbage collector to work), so of course it's much faster than my simple code.

But I have a hard time putting your results in perspective against other, more traditional implementations that are actually optimized for performance. When the dust settles and you have reached your goals with this, I'd appreciate some comparison of your method with alternative implementations, and some comments on its strengths and weaknesses in a broader context.
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
User avatar
hgm
Posts: 27789
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: The mailbox trials

Post by hgm »

Well, sometimes I only half understand it myself when I write it. :lol: But nothing helps me order my thoughts as well as writing them down.

As for comparison with more conventional techniques, I sort of started with that. The posting here is what I consider the conventional design: a piece list indexed by piece number, through which you loop to find the not-yet-captured pieces. Then for each piece you loop through the directions it moves in, and when it is a slider, for each direction you loop over distance until you hit something. Hits are detected on a board-sized array containing piece numbers, large enough to also hold 'off-board' squares, filled with uncapturable 'edge guards' to reject moves that would stray off the board. Valid moves are added to a move list, which is later looped through for searching the moves, making sure captures are searched before other moves, in MVV/LVA order, by digging out the next capture in line by the time you need one.
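
For concreteness, a minimal sketch of that generation loop (the array names and the exact board layout are assumptions, not the code I benchmarked; pawns need special-casing that is omitted here):

Code: Select all

#define EMPTY     0
#define GUARD    -1                          /* uncapturable edge guard */
#define CAPTURED -2

extern int board[12*16];                     /* board-sized array with guard band (assumed layout) */
extern int pieceSqr[48];                     /* piece list: square of each piece */
extern int steps[48][9];                     /* move vectors per piece, 0-terminated */
extern int isSlider[48];

void GenMoves(int first)                     /* first = first piece number of the side to move */
{
  for(int p = first; p < first + 16; p++) {  /* loop over the piece list */
    int from = pieceSqr[p];
    if(from == CAPTURED) continue;           /* skip captured pieces */
    for(int d = 0; steps[p][d]; d++) {       /* loop over the directions this piece moves in */
      int to = from;
      do {
        to += steps[p][d];
        int victim = board[to];
        if(victim == GUARD) break;           /* move would stray off board: rejected */
        if(victim != EMPTY) { /* AddCapture(p, to, victim); */ break; }
        /* AddNonCapture(p, to); */
      } while(isSlider[p]);                  /* sliders keep stepping until they hit something */
    }
  }
}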

This was doing 3.4 Mnps on my machine.
User avatar
hgm
Posts: 27789
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: The mailbox trials

Post by hgm »

For easier comparison I now also had that 'reference version' do the 10-ply search. With futility pruning disabled:

Code: Select all

 1  -16      1604 0 e2a6 b4c3 b2c3 e6d5
 2  -16     19699 0 e2a6 b4c3 b2c3 e6d5
 3  -27     29013 0 e2a6 e6d5 g2h3 b4c3 d2c3 d5e4
 4  -27    159704 0 e2a6 e6d5 g2h3 b4c3 d2c3 d5e4
 5  -37    350807 0 e2a6 e6d5 c3d5 b6d5 e4d5 e7e5
 6  -32    968309 0 e2a6 e6d5 c3d5 b6d5 a6b7 a8b8 b7d5 f6d5 e5f7 e7f7 e4d5 g7b2 f3f7 e8f7
 7  -32   2154022 0 e2a6 e6d5 c3d5 b6d5 a6b7 a8b8 b7d5 f6d5 e5f7 e7f7 e4d5 g7b2 f3f7 e8f7
 8  -32  10805568 0 e2a6 e6d5 c3d5 b6d5 a6b7 a8b8 b7d5 f6d5 e5f7 e7f7 e4d5 g7b2 f3f7 e8f7
 9  -96  32183957 0 d5e6 e7e6 e2a6 e6e5 g2h3 b4c3 d2c3 e5h5 f3f4
10  -82 105545026 0 e2a6 e6d5 e5g6 f7g6 c3b5 e8d8 g2h3 d5e4 f3g3 c7c5 g3g6 h8h3
t = 14.540 sec
 105545026 nodes (7.3 Mnps)
 101454688 QS (96.1%)
 101454683 evals (100.0%)
  62343773 stand-pats (61.4%)
  40077444 move gens
captures: 92.1%
Here the nps doesn't look so bad, but that is because the lack of futility pruning highly inflates the node count with 'empty' nodes. So node counts (and the nps derived from them) can be highly misleading, and we really have to compare the times! With futility pruning 58.5% of the nodes disappear (see below), but they took only about 17% of the time (14.54 - 12.09 = 2.45 sec out of 14.54 sec). If I calculated correctly that means that these 'nodes' (which were basically just pointless make/unmakes) were processed about 7 times faster than nodes where moves were generated.

Code: Select all

 1  -16       725 0 e2a6 b4c3 b2c3 e6d5
 2  -16      9497 0 e2a6 b4c3 b2c3 e6d5
 3  -27     12797 0 e2a6 e6d5 g2h3 b4c3 d2c3 d5e4
 4  -27     71479 0 e2a6 e6d5 g2h3 b4c3 d2c3 d5e4
 5  -37    149317 0 e2a6 e6d5 c3d5 b6d5 e4d5 e7e5
 6  -32    399081 0 e2a6 e6d5 c3d5 b6d5 a6b7 a8b8 b7d5 f6d5 e5f7 e7f7 e4d5 g7b2 f3f7 e8f7
 7  -32    838855 0 e2a6 e6d5 c3d5 b6d5 a6b7 a8b8 b7d5 f6d5 e5f7 e7f7 e4d5 g7b2 f3f7 e8f7
 8  -32   4710223 0 e2a6 e6d5 c3d5 b6d5 a6b7 a8b8 b7d5 f6d5 e5f7 e7f7 e4d5 g7b2 f3f7 e8f7
 9  -96  12897101 0 d5e6 e7e6 e2a6 e6e5 g2h3 b4c3 d2c3 e5h5 f3f4
10  -82  43791355 0 e2a6 e6d5 e5g6 f7g6 c3b5 e8d8 g2h3 d5e4 f3g3 c7c5 g3g6 h8h3
t = 12.090 sec
  43791355 nodes (3.6 Mnps)
  39696416 QS (90.6%)
   6469235 evals (16.3%)
   2162593 stand-pats (5.4%)
  38444912 move gens
captures: 90.9%
So we went from 14.54 sec to 5.21 sec, so far.
Dann Corbit
Posts: 12538
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: The mailbox trials

Post by Dann Corbit »

This is a remarkably interesting study.
Thank you sincerely for your efforts.
I think it is incredibly instructive.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
kiroje
Posts: 60
Joined: Wed Jul 25, 2012 10:12 am
Location: Copenhagen, Denmark
Full name: Kim Jensen

Re: The mailbox trials

Post by kiroje »

This is simply one of the most interesting threads (for me) in quite some time. I really enjoy reading all this (even though I'm not getting everything); the text written for each code piece is quite instructive.

I'm looking forward to the next posts :)
“Modern chess is too much concerned with things like pawn structure. Forget it, checkmate ends the game.” – Nigel Short
Joost Buijs
Posts: 1563
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: The mailbox trials

Post by Joost Buijs »

hgm wrote: Wed Apr 07, 2021 4:21 pm [Edit] I uploaded (the still very messy) source code for this test to http://hgm.nubati.net/mailbox4.c . I am not sure whether it would compile on any compiler, because I had to embed some assembly code to be able to use the bit-scan forward instruction. That would probably have to be adapted to a 64-bit version when compiled for x64 architecture. And I am not sure the GNU format for assembly is understood by all compilers.
The uploaded source contains a very nasty bug in the Discover routine. You use the local variable bit without initializing it. As far as I know locals are not initialized to 0 by default. Probably you meant bit = BSF(incoming); ???

[Edit] Sorry, I already see it: you use bit inside the BSF macro itself, typically something I would never write that way.

Code: Select all

                            55555555         aaaaaaaa
              presence: @@@@@@@@@@@@@@@@ @@@@@@@@@@@@@@@@
                        PPPPPPPPNNBBRRQK ppppppppnnbbrrqk
16. 000e0204    P@d5    .@.......@:::::. ....@...@@:::::.
17. 100c0000    P@e4    .........@::::@. .........@:::::.
18. 01040000    P@a2    .........@::@::. ..........:::::.
19. 00000000    P@b2    ..........:::::. ..........:::::.
20. 00000000    P@c2    ..........:::::. ..........:::::.
21. 50000000    P@f2    ..........::::@@ ..........:::::.
22. 10008000    P@g2    ..........::::@. .......@..:::::.
23. 04000000    P@h2    ..........:::@:. ..........:::::.
24. 00000000    N@e5    ..........:::::. ..........:::::.
25. 10102040    N@c3    ...@......@:::@. ......@...:::::.
26. 40000000    B@d2    ..........:::::@ ..........:::::.
27. 50840000    B@e2    .........@::::@@ ..........:@:::.
28. 00000000    R@a1    ..........:::::. ..........:::::.
29. 00000000    R@h1    ..........:::::. ..........:::::.
30. 00411000    Q@f3    ......@.@.:@:::. ..........:::::.
31. 05100000    K@e1    ..........@:@@:. ..........:::::.
32. 02000000    p@a7    ..........:::::. ..........::@::.
33. 00000000    p@c7    ..........:::::. ..........:::::.
34. a00b0000    p@d7    ........@.:::::. ........@@::::@@
35. a0010000    p@f7    ........@.:::::. ..........::::@@
36. 200000a1    p@e6    @.........:::::. ..@@......::::@.
37. 00010080    p@g6    ........@.:::::. ...@......:::::.
38. 20000000    p@b4    ..........:::::. ..........::::@.
39. 18001000    p@h3    ......@...::::@. ..........:::@:.
40. 0000000a    n@b6    ..........:::::. @@........:::::.
41. 30200000    n@f6    ..........::::@. ..........@:::@.
42. 00000000    b@g7    ..........:::::. ..........:::::.
43. 00400000    b@a6    ..........:@:::. ..........:::::.
44. 00020000    r@a8    ..........:::::. ........@.:::::.
45. 00200000    r@h8    ..........:::::. ..........@::::.
46. 80000000    q@e7    ..........:::::. ..........:::::@
47. 2a080000    k@e8    ..........:::::. .........@::@@@.
 1  -16       532 0 e2a6 b4c3 b2c3 e6d5
 2  -16      1859 0 e2a6 b4c3 b2c3 e6d5
 3  -27      4421 0 e2a6 e6d5 g2h3 b4c3 d2c3 d5e4
 4  -27     59130 0 e2a6 e6d5 g2h3 b4c3 d2c3 d5e4
 5  -37    142018 0 e2a6 e6d5 c3d5 b6d5 e4d5 e7e5
 6  -32    427596 0 e2a6 e6d5 c3d5 b6d5 a6b7 a8b8 b7d5 f6d5 e5f7 e7f7 e4d5 g7b2 f3f7 e8f7
 7  -32    956057 0 e2a6 e6d5 c3d5 b6d5 a6b7 a8b8 b7d5 f6d5 e5f7 e7f7 e4d5 g7b2 f3f7 e8f7
 8  -32   4664824 0 e2a6 e6d5 c3d5 b6d5 a6b7 a8b8 b7d5 f6d5 e5f7 e7f7 e4d5 g7b2 f3f7 e8f7
 9  -96  12236440 0 d5e6 e7e6 e2a6 e6e5 g2h3 b4c3 d2c3 e5h5 f3f4
10  -82  41679702 0 e2a6 e6d5 e5g6 f7g6 c3b5 d5e4 f3g3 e8f8 g2h3 c7c5 g3g6 h8h3
t =  3.089 sec
  41679702 nodes (13.5 Mnps)
  37242403 QS (89.4%)
   5510077 evals (14.8%)
   1893393 stand-pats (5.1%)
  14445410 leaves (38.8%)
    134111 move gens
captures: 89.8%

D:\Mailbox4\x64\Release\Mailbox4.exe (process 2752) exited with code 0.
Press any key to close this window . . .

This is what I get on my i9-10980XE, compiled with Clang 11.
User avatar
hgm
Posts: 27789
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: The mailbox trials

Post by hgm »

Thanks for compiling. 13.5Mnps is pretty good, 1.7 times faster than my machine & compile. Was this a 64-bit compile? (Not that this should matter much, as I don't use any 64-bit variables other than the hash key.)

Indeed I did mean "int bit = BSF(incoming);", and in the pseudo-code examples I posted before I wrote it that way. But when I tried to actually implement it, I didn't see how I could keep it that way. The GNU syntax for embedding assembler does not support returning the result as if it were a function; you just have to mention the C variable where the result must go. I suppose that to enhance readability I could have done

#define BSF(A, OP, B) asm(" bsf %1, %0\n" : "=r" (A) : "r" (B))

and then write in the code

BSF(bit, =, incoming);

but that seems hardly an improvement. As it was, I chose the typographically easiest solution: just replace the = before BSF everywhere in my code by a semicolon.
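
For what it's worth, on compilers that provide intrinsics one could avoid the embedded assembly altogether. A sketch (not the uploaded code) using GCC/Clang's __builtin_ctz and MSVC's _BitScanForward:

Code: Select all

#if defined(__GNUC__) || defined(__clang__)
#  define BSF(x) __builtin_ctz(x)            /* like bsf: undefined for x == 0 */
#elif defined(_MSC_VER)
#  include <intrin.h>
   static __inline int BSF(unsigned int x)
   { unsigned long i; _BitScanForward(&i, x); return (int)i; }
#endif

/* which would allow the original formulation: int bit = BSF(incoming); */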

BTW, I implemented a legality test now, and this gives a bit of a speedup (5.21 -> 4.70 sec, so about 10%). Before, exposure of the King would only be recognized in the child node, where the capture-generation code starts by testing for attacks on the opponent's King. And when there are any, it of course cuts off with a +INF score rather than searching on.

But since the attack map makes this such a trivial test, there was no reason not to do it already in MakeMove(), at the earliest opportunity during the update of the attack map. (Which is really the place that capture generation for the child has moved to.) So I put it after the preliminary update now (recapture by former protectors of the victim, and enemy slider discovery). This brings the attackers sets of the moving player's pieces up to date, so the exposure of his King can be tested. And if it is exposed, MakeMove() can be aborted with a -INF score (parent POV) before the more expensive updating of the attacks on the opponent's pieces is done.
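
In sketch form (the names of the sets and constants are assumptions), that early abort after the preliminary update is just:

Code: Select all

  /* the attacks on the mover's pieces are now up to date;
     ownKing = piece number of the stm's King, xstm = opponent */
  if(attackers[ownKing] & presence[xstm])     /* own King attacked by an opponent piece? */
    return -INF;                              /* illegal: abort MakeMove() before the expensive half */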

Failure to resolve a pre-existing check can be tested cheaply even without preliminary update of the map. At the start of a node I now save the attackers set of the stm's King, ANDed with the presence mask of the opponent's pieces, as an inCheck variable. After a capture, MakeMove() then tests whether that variable is non-zero. And if it is, it tests whether it still is non-zero after removing the bit for the captured piece from it (which is needed in the update anyway), or whether the moved piece was a King. If neither is the case, the capture is illegal. This has now become the first thing that is done for making and searching a capture, even before updating the pstEval and testing for futility, because it is the cheapest test of all:

Code: Select all

  u->victimMask = attacker2bit[u->victim-16];               // save for unmake
  if(u->inCheck && u->inCheck & ~u->victimMask              // we were in check, and did not capture checker,
                && u->piece != COLOR + 15 - stm) return 0;  // nor moved the King => move is illegal
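And the saved check status at the start of the node, in the same sketch notation:

Code: Select all

  /* attackers set of the stm's King, masked to the presence mask of the opponent's pieces */
  u->inCheck = attackers[ownKing] & presence[xstm];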
The u->inCheck variable is now also used to suppress null-move search. And because this, together with the test after the preliminary update (which detects moving into check or moving a pinned piece), prevents Search() from ever being called with an illegal move, the testing for the MVV there can now start at the Queen.

Of course this doesn't work miracles; there just aren't enough nodes where you are in check for that, or enough illegal moves. But it still cut 10% of the time. Of course nps went down because of it, because all the moves where the King could be captured are no longer inflating the count. And those nodes were pretty cheap to handle, as they were of course always cut-nodes.
Joost Buijs
Posts: 1563
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: The mailbox trials

Post by Joost Buijs »

hgm wrote: Thu Apr 08, 2021 2:41 pm Thanks for compiling. 13.5Mnps is pretty good, 1.7 times faster than my machine & compile. Was this a 64-bit compile? (Not that this should matter much, as I don't use any 64-bit variables other than the hash key.)
It is indeed a 64-bit compile. The speed difference is probably caused by a combination of the compiler and somewhat faster hardware. When using a single core the i9-10980XE is supposed to boost to 4.6 GHz; personally I think it is somewhat less.

I thought you were using a pretty old 32-bit compiler, but it seems very unlikely that it doesn't support intrinsics for BSF or TZCNT; if it doesn't, it must be really ancient.