Comparison of bitboard attack-getter variants

Sven · Post by **Sven** » Mon Jan 04, 2016 1:09 am

During the last two weeks I have created a new ad-hoc chess engine just for fun and for testing purposes. Motivated by some recent discussions, mainly within the "wrong way" thread, I have built the engine such that it implements several different bitboard techniques for sliding attack generation. Now I have made a very simple speed comparison (NPS) of five approaches, and would like to share my results. The engine itself is still private (so you won't find a download link), and I can't say whether it will be published some day in the future, since it is currently not meant for competition.

The following five approaches were tested:
- Magics
- Hyperbola Quintessence
- Dumb7Fill
- Obstruction Difference
- and an unnamed approach which I simply call "No Headache" (at least the author knows why).

A more realistic test would of course including a "real" static evaluation function, use SEE, improve the search (to reach much bigger search depths) and more. Although I doubt that this would change the results significantly, I plan to repeat the same test after improving the engine significantly. Also other attack-getter implementations are on my ToDo list, including Kindergarten bitboards and a few more.

Test result overview:

Code: Select all

Attack-getter variant   Perft NPS  -% vs. Magics  Search NPS  -% vs. Magics
===========================================================================
Magics                    1509538                    4689055
Obstruction Difference    1305917         -13,5%     4480245          -4,5%
"No Headache"             1290733         -14,5%     4411828          -5,9%
Hyperbola Quintessence    1327537         -12,1%     4380272          -6,6%
Dumb7fill                 1174150         -22,2%     4102345         -12,5%

My brief summary: Magics are still on rank 1. Several other techniques are not far away from Magics, and they might serve as a good starting point if program speed is not (yet) an issue.

Detailled explanations follow below. A typical question might be "why is 'perft' so slow here, slower than search?". Actually it is much faster than it looks like since the reported number of nodes are the visited nodes, not the counted leaves (due to bulk counting).

Any serious comments and thoughts are welcome, of course.

Test description:

System environment
- PC: Lenovo Laptop W530 running Windows 7 Pro SP1
- CPU: Intel Core i7-3610QM CPU @2.30 GHz

Chess program
- private experimental engine "Attack64", not publicly available
- 64 bit C++ program, compiled with MS Visual Studio 2015 Community, no special optimization flags
- single-core program so far
- board representation based on bitboards, several C++ classes but almost no "sophisticated" C++ language features were used
- implements various different techniques for sliding piece attack generation, switchable by preprocessor define in one single source file
- uses HW instructions for popcnt, bsr, bsf, byteswap (intrinsics)
- focus on correctness, fast development and infrastructure for this test, not on playing strength
- very basic alpha-beta searcher without hash table, any kind of pruning or any search improvements (even no killer moves, no history)
- move ordering based on MVV/LVA, best move of previous iteration is searched first, nothing else
- static evaluation: only material + PST, incremental update in makeMove, parameters not tuned
- effective branching factor still very high

Testing conditions
- goal: compare speed (NPS) of various bitboard attack-getter implementations
- for each attack-getter variant the same, very basic tests were performed:
a) run "Perft selftest" of the engine which includes 18 test positions and takes about 20 seconds (total NPS = total nodes visited / total time)
b) search from starting position of chess to fixed depth of 7 plies (takes about 15 seconds due to the high branching factor)
- both test parts were performed three times, each time after restarting the engine, and the average NPS values were calculated

Test output remarks
a) Perft selftest:

Code: Select all

l=   ==> number of leaves counted &#40;perft number needed to check correctness&#41;
n=   ==> number of nodes visited &#40;relevant for speed&#41;
t=   ==> time in centiseconds
lps= ==> leaves per second
nps= ==> nodes per second &#40;the test result&#41;

b) Search:

Code: Select all

nodes   ==> total number of nodes visited during full-width + quiescence search
qsNodes ==> nodes visited during quiescence search only &#40;including depth=0 nodes&#41;
nps     ==> nodes per second &#40;the test result&#41;

Details of test result:

Code: Select all

Magics &#40;L. Hansen/P. Kannan and others&#41;
=======================================

a&#41; Perft selftest&#58;

l= 755136586, n=  28555328, t=    1896, lps=39827879, nps= 1506082
l= 755136586, n=  28555328, t=    1887, lps=40017837, nps= 1513265
l= 755136586, n=  28555328, t=    1892, lps=39912081, nps= 1509266

b&#41; Search starting position to depth 7&#58;

# nodes=69398003 qsNodes=64228262 nps=4685888
# nodes=69398003 qsNodes=64228262 nps=4692224
# nodes=69398003 qsNodes=64228262 nps=4689054


Obstruction Difference &#40;M. Hoffmann/G. Isenberg&#41;
================================================

a&#41; Perft selftest.

l= 755136586, n=  28555328, t=    2194, lps=34418258, nps= 1301519
l= 755136586, n=  28555328, t=    2195, lps=34402577, nps= 1300926
l= 755136586, n=  28555328, t=    2171, lps=34782892, nps= 1315307

b&#41; Search starting position to depth 7&#58;

# nodes=69398003 qsNodes=64228262 nps=4462894
# nodes=69398003 qsNodes=64228262 nps=4503439
# nodes=69398003 qsNodes=64228262 nps=4474403


"No Headache" &#40;M. Hoffmann&#41;
===========================

a&#41; Perft selftest&#58;

l= 755136586, n=  28555328, t=    2213, lps=34122755, nps= 1290344
l= 755136586, n=  28555328, t=    2212, lps=34138182, nps= 1290928
l= 755136586, n=  28555328, t=    2212, lps=34138182, nps= 1290928

b&#41; Search starting position to depth 7&#58;

# nodes=69398003 qsNodes=64228262 nps=4417441
# nodes=69398003 qsNodes=64228262 nps=4409021
# nodes=69398003 qsNodes=64228262 nps=4409021


Hyberbola Quintessence &#40;G. Isenberg/Aleks Peshkov&#41;
==================================================

a&#41; Perft selftest&#58;

l= 755136586, n=  28555328, t=    2152, lps=35089990, nps= 1326920
l= 755136586, n=  28555328, t=    2151, lps=35106303, nps= 1327537
l= 755136586, n=  28555328, t=    2150, lps=35122631, nps= 1328154

b&#41; Search starting position to depth 7&#58;

# nodes=69398003 qsNodes=64228262 nps=4381187
# nodes=69398003 qsNodes=64228262 nps=4372905
# nodes=69398003 qsNodes=64228262 nps=4386725


Dumb7fill &#40;Author&#58; ?)
=====================

a&#41; Perft selftest&#58;

l= 755136586, n=  28555328, t=    2435, lps=31011769, nps= 1172703
l= 755136586, n=  28555328, t=    2430, lps=31075579, nps= 1175116
l= 755136586, n=  28555328, t=    2431, lps=31062796, nps= 1174632

b&#41; Search starting position to depth 7&#58;

# nodes=69398003 qsNodes=64228262 nps=4101536
# nodes=69398003 qsNodes=64228262 nps=4103962
# nodes=69398003 qsNodes=64228262 nps=4101536

tpetzke · Post by **tpetzke** » Mon Jan 04, 2016 8:43 am

Hi Sven,

interesting comparison. I use magic bitboards in iCE, I used rotated ones before and even before that I had an 12x12 array.

I just want to add two things

(1) I don't think magic bitboards are more difficult than other techniques. Rotated bb are certainly but magic is not so tough.

(2) BitBoards shine in eval, so even if in the move generation something else would be faster I still would stick to the bitboards, as they help a lot in eval.

Dann Corbit · Post by **Dann Corbit** » Mon Jan 04, 2016 9:11 am

I guess that the lesson from this (assuming good implementations of each technique) is that it is utterly irrelevant which technique you choose.

Henk · Post by **Henk** » Mon Jan 04, 2016 9:37 am

Send it to Bayer (aspirins).

hgm · Post by **hgm** » Mon Jan 04, 2016 9:52 am

My main reservation towards magics is the large cache pressure it causes. This makes it look good in a sterile test environment like perft where the cache isn't used for anything else. But when you later start adding features to the engine that are also memory intensive, the combined cache pressure of magics and the new feature will slow the engine down so much that it offsets the benefits of the feature. So you will conclude that the feature isn't worth it, while in fact it could have been the magics that were not worth it.

Btw, I remember that Harald Luessen once did a similar test (on WB forum?), and that what came out best there was 'rotated indices', which you don't test now (and of which I also don't know exactly what it means).

Sven · Post by **Sven** » Mon Jan 04, 2016 11:01 am

hgm wrote:My main reservation towards magics is the large cache pressure it causes. This makes it look good in a sterile test environment like perft where the cache isn't used for anything else. But when you later start adding features to the engine that are also memory intensive, the combined cache pressure of magics and the new feature will slow the engine down so much that it offsets the benefits of the feature. So you will conclude that the feature isn't worth it, while in fact it could have been the magics that were not worth it.

Btw, I remember that Harald Luessen once did a similar test (on WB forum?), and that what came out best there was 'rotated indices', which you don't test now (and of which I also don't know exactly what it means).

So according to you we are in severe trouble since many world-class engines use magic bitboards for attack generation although they better shouldn't ... Now what should they do? Use Kindergarten bitboards? Convert back to mailbox/0x88 or an attack-set based representation? What do you propose?

Regarding the "sterile test environment like perft", have you also read the other 50% of my post? It seems that tree search is not exactly "perft".

And if I would add static evaluation features then these would make use of the attack getters as well, to a certain degree, as in mobility, king safety (finding attacks to squares close to the king), and so on. Pawn structure evaluation will mostly be done via pawn hash. Which eval features are really memory intensive in your opinion?

Karlo Bala · Post by **Karlo Bala** » Mon Jan 04, 2016 11:02 am

hgm wrote:My main reservation towards magics is the large cache pressure it causes. This makes it look good in a sterile test environment like perft where the cache isn't used for anything else. But when you later start adding features to the engine that are also memory intensive, the combined cache pressure of magics and the new feature will slow the engine down so much that it offsets the benefits of the feature. So you will conclude that the feature isn't worth it, while in fact it could have been the magics that were not worth it.

Btw, I remember that Harald Luessen once did a similar test (on WB forum?), and that what came out best there was 'rotated indices', which you don't test now (and of which I also don't know exactly what it means).

http://talkchess.com/forum/viewtopic.ph ... 11&t=16002

hgm · Post by **hgm** » Mon Jan 04, 2016 11:35 am

Indeed, that is the posting I remembered. It was on TalkChess after all. No wonder I could not find it on WB forum!.

hgm · Post by **hgm** » Mon Jan 04, 2016 12:38 pm

Sven Schüle wrote:So according to you we are in severe trouble since many world-class engines use magic bitboards for attack generation although they better shouldn't ... Now what should they do? Use Kindergarten bitboards? Convert back to mailbox/0x88 or an attack-set based representation? What do you propose?

Currently I consider the attack-set-based representation the most promising.

Regarding the "sterile test environment like perft", have you also read the other 50% of my post? It seems that tree search is not exactly "perft".

With only PST and no hash table I would not really call it memory-intensive... Of course it consumes time, which is why you see the impact of the attack getter go down in the search.

In an engine with very complex evaluation the attack getter might again become more important. The entire idea of bitboards is to make evaluation faster. For just PST + material a plain old 0x88 mailbox would probably be a lot faster than magic bitboards. But you did not try that. When Harald did the similar test I asked him about how his results compared to mailbox, but he told that could not be measured easily because of the inherently different logic in mailbox algorithms, which prevented them to be implemented as just plugging another module into the program.

Which eval features are really memory intensive in your opinion?

Table-driven SEE. Mate at a glance (and even worse, mate in 2 at a glance). Eval cache.

hgm · Post by **hgm** » Mon Jan 04, 2016 2:04 pm

Perhaps it is helpful to also post Harald's results here:

Code: Select all

+----------------------------+-----------------------+--------------+-----------+
| Name                       | Inventor or Supporter | Table Size   | Test-Time |
+----------------------------+-----------------------+--------------+-----------+
| Rotated Bitboards &#40;naive&#41;  | Robert Hyatt          |   77.2 KByte |  87.78 s  |
| Rotated Bitboards &#40;aligned&#41;| ?                     |   35.0 KByte |  86.65 s  |
| Rotated Bitboards &#40;switch&#41; | &#40;Harald Lüßen?)       |   14.9 KByte |  88.91 s  |
| Rotated Indices            | Alessandro Damiani    |  128.3 KByte |  78.93 s  |
| Exploding Bitboards        | Harald Lüßen          |    3.5 KByte | 101.79 s  |
| Magic Mult. &#40;Kinderg.)     | Gerd Isenberg         |    9.7 KByte |  91.87 s  |
| Magic M. &#40;Kinderg.) 32 bit | Gerd Isenberg         |    9.7 KByte |  81.37 s  |
| Sherwin Index              | Michael Sherwin       | 1402.4 KByte |  90.03 s  |
| Pradu Minimize Magic       | Pradyumna Kannan      |  844.0 KByte |  81.42 s  |
| Pradu Perfect Magic Hash   |  and                  |  627.4 KByte |  82.09 s  |
| Pradu Big                  | Lasse Hansen          | 2306.0 KByte |  82.33 s  |
| Kogge-Stone                | Steffan Westcott      |    0.0 KByte |  99.32 s  |
| Simple Shift               | --                    |    0.0 KByte | 110.25 s  |
| Naive Shift                | --                    |    0.0 KByte | 111.69 s  |
+----------------------------+-----------------------+--------------+-----------+

Comparison of bitboard attack-getter variants

Comparison of bitboard attack-getter variants

Re: Comparison of bitboard attack-getter variants

Re: Comparison of bitboard attack-getter variants

Re: Comparison of bitboard attack-getter variants

Re: Comparison of bitboard attack-getter variants

Re: Comparison of bitboard attack-getter variants

Re: Comparison of bitboard attack-getter variants

Re: Comparison of bitboard attack-getter variants

Re: Comparison of bitboard attack-getter variants

Re: Comparison of bitboard attack-getter variants