Devlog of Leorik
Moderator: Ras
-
- Posts: 1062
- Joined: Tue Apr 28, 2020 10:03 pm
- Full name: Daniel Infuehr
Re: Devlog of Leorik
Pext and Dotnet 8 definitely increased strength in 1 minute time controls more than 30 ELO is my guess
That's the version I posted earlier and I think the ratio is 20/10.
But I think switching movegen is a very personal choice when switching away from ones own.
You can also keep branches with commits you don't merge yet. Sort of like a preloaded bullet that can increase ELO on demand when merged.
That's the version I posted earlier and I think the ratio is 20/10.
But I think switching movegen is a very personal choice when switching away from ones own.
You can also keep branches with commits you don't merge yet. Sort of like a preloaded bullet that can increase ELO on demand when merged.
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
Daniel Inführ - Software Developer
-
- Posts: 915
- Joined: Sun Dec 27, 2020 2:40 am
- Location: Bremen, Germany
- Full name: Thomas Jahn
Re: Devlog of Leorik
Why would you guess that?dangi12012 wrote: ↑Thu Mar 30, 2023 8:15 pm Pext and Dotnet 8 definitely increased strength in 1 minute time controls more than 30 ELO is my guess
Doubling the speed of an engine is worth 50-70 Elo. You made Perft faster by 40% but that doesn't carry over fully to the engine which does much more than just generating moves. If you compare my build (.Net6) with your built (.Net8 + PEXT) depth 20 on the starting position is reached in 10s instead of 12s... I would have expected less than 30 Elo to be honest.
But 30 is what I got at 5s + 500ms:
Code: Select all
Score of Leorik-2.4 vs Leorik-2.4-Dangi: 1300 - 1893 - 3823 [0.458] 7016
... Leorik-2.4 playing White: 755 - 768 - 1985 [0.498] 3508
... Leorik-2.4 playing Black: 545 - 1125 - 1838 [0.417] 3508
... White vs Black: 1880 - 1313 - 3823 [0.540] 7016
Elo difference: -29.4 +/- 5.5, LOS: 0.0 %, DrawRatio: 54.5 %
And it seems like I've reproduced your results with my own implementation by the way:
Score of Leorik-2.4-Pext-Net8 vs Leorik-2.4-Dangi: 156 - 129 - 322 [0.522] 607
...but needs more games to be sure!
I was thinking of letting the 3 code paths for slider generation exist side-by-side and use #defines to switch between them. PEXT can't exist without another slider-attacks generator anyway unless you want to compile >100k bitboards directly into the engine.dangi12012 wrote: ↑Thu Mar 30, 2023 8:15 pm But I think switching movegen is a very personal choice when switching away from ones own.
You can also keep branches with commits you don't merge yet. Sort of like a preloaded bullet that can increase ELO on demand when merged.
-
- Posts: 1062
- Joined: Tue Apr 28, 2020 10:03 pm
- Full name: Daniel Infuehr
Re: Devlog of Leorik
Nice! you are right - with more testing like you did you can trust your results fully.
Yes like you said this way you can keep ARM target -
So when targeting X64 you get pext otherwise you get the other codepath.
Yes like you said this way you can keep ARM target -
Code: Select all
<PropertyGroup Condition=" '$(Configuration)|$(Platform)' == 'Release|x64' ">
<DefineConstants>WIN64;$(DefineConstants)</DefineConstants>
</PropertyGroup>
Code: Select all
#if WIN64
// CODE
#endif
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
Daniel Inführ - Software Developer
-
- Posts: 915
- Joined: Sun Dec 27, 2020 2:40 am
- Location: Bremen, Germany
- Full name: Thomas Jahn
Re: Devlog of Leorik
I have a version of Leorik running on KiSS now. Here are some preliminary perft results (best of five):
KiSS is sitting right in the middle between my legacy implementation and PEXT. 
I'll see if I can optimize the KiSS implementation for C# before running any matches. For now it's just a straight port without any thought of my own put into it.
Code: Select all
Leorik Perft (Standard, .Net 8)
1 OK! 1356ms, 87791K NPS
2 OK! 2163ms, 89523K NPS
3 OK! 2026ms, 88162K NPS
4 OK! 8712ms, 81041K NPS
5 OK! 0ms, 59140K NPS
6 OK! 1800ms, 91115K NPS
Total: 1361558651 Nodes, 16059ms, 84780K NPS
Code: Select all
Leorik Perft (PEXT, .Net 8)
1 OK! 1106ms, 107565K NPS
2 OK! 1660ms, 116675K NPS
3 OK! 1616ms, 110505K NPS
4 OK! 6465ms, 109198K NPS
5 OK! 0ms, 109634K NPS
6 OK! 1383ms, 118591K NPS
Total: 1361558651 Nodes, 12233ms, 111300K NPS
Code: Select all
Leorik Perft (KiSS, .Net 8)
1 OK! 1161ms, 102493K NPS
2 OK! 1923ms, 100703K NPS
3 OK! 1654ms, 107960K NPS
4 OK! 7797ms, 90542K NPS
5 OK! 0ms, 108674K NPS
6 OK! 1611ms, 101815K NPS
Total: 1361558651 Nodes, 14149ms, 96226K NPS

I'll see if I can optimize the KiSS implementation for C# before running any matches. For now it's just a straight port without any thought of my own put into it.
-
- Posts: 965
- Joined: Fri Aug 21, 2020 1:25 am
- Location: Planet Earth, Sol system
- Full name: Michael J Sherwin
Re: Devlog of Leorik
I was going to write something about why Black Magic also needs to be tested but I decided against it. I'm just too worn out. Yesterday I almost went to the hospital because my blood pressure was 193/110. I took double meds and after a couple hours it was 171/100. It eventually came down to 148/96 after I took another round of meds. All day yesterday my diastolic pressure (bottom number) never was below 90. Today it's fine, 132/76. I would put Black Magic bitboards in myself if I could. The m42 bitboard library iirc initializes the magic data at run time. Regardless of all that I am happy that KiSS is running in a real world engine with decent results. Thanks! 

-
- Posts: 915
- Joined: Sun Dec 27, 2020 2:40 am
- Location: Bremen, Germany
- Full name: Thomas Jahn
Re: Devlog of Leorik
Oh, damn!Mike Sherwin wrote: ↑Fri Mar 31, 2023 5:40 pm I was going to write something about why Black Magic also needs to be tested but I decided against it. I'm just too worn out. Yesterday I almost went to the hospital because my blood pressure was 193/110. I took double meds and after a couple hours it was 171/100. It eventually came down to 148/96 after I took another round of meds. All day yesterday my diastolic pressure (bottom number) never was below 90. Today it's fine, 132/76. I would put Black Magic bitboards in myself if I could. The m42 bitboard library iirc initializes the magic data at run time. Regardless of all that I am happy that KiSS is running in a real world engine with decent results. Thanks!![]()


I don't share the same enthusiasm for move generators that you and Dangi have. PEXT was simple to add and is beautiful in it's simplicity. Yours is a favor to a friend and I don't understand it fully yet and embrace the challenge to figure out the details. Also I'm sure I can get more speed out of it (in C# every array is dynamically allocated and I'll have to change a few things to place the bitboards in a more cache friendly way) and knowing that you'd appreciate it I'll be making it my default generator for sliders once the performance is where it should be. But... then I'm done.
If someone cares enough about Black Magic or other algorithms they are invited to make a fork and use it to run their own benchmarks.
-
- Posts: 915
- Joined: Sun Dec 27, 2020 2:40 am
- Location: Bremen, Germany
- Full name: Thomas Jahn
Re: Devlog of Leorik
I hope you're doing okay today? Good news are not a risk for raising blood pressure I hope? Because I *do* have good news for you!Mike Sherwin wrote: ↑Fri Mar 31, 2023 5:40 pm Regardless of all that I am happy that KiSS is running in a real world engine with decent results. Thanks!![]()
Code: Select all
static ulong[] DiagonalMask = new ulong[64];
static ulong[] AntiDiagonalMask = new ulong[64];
static ulong[] RookSubset = new ulong[2 * 64 * 64];
static ulong[] BishopSubset = new ulong[2 * 64 * 64];
public static ulong BishopAttacks(ulong occupation, int square)
{
ulong offset = (ulong)(square << 7);
ulong dIndex = ((occupation & DiagonalMask[square]) * FILE_B2_B7) >> 58;
ulong aIndex = ((occupation & AntiDiagonalMask[square]) * FILE_B2_B7) >> 58;
return BishopSubset[offset + dIndex] | BishopSubset[offset + 64 + aIndex];
}
public static ulong RookAttacks(ulong occupation, int square)
{
ulong offset = (ulong)(square << 7);
ulong hIndex = (occupation >> ((square & 0b111000) | 1)) & 63;
ulong vIndex = (((occupation >> (square & 0b000111)) & FILE_A2_A7) * DIAGONAL_C2_H7) >> 58;
return RookSubset[offset + hIndex] | RookSubset[offset + 64 + vIndex];
}
The 64 diagonal subsets for a square are placed beside the 64 anti-diagonal subsets for the same square making it a bit more cache friendly.
This might be a C# thing but I got better speed that way than with multi-dimensional arrays or jagged arrays.
Code: Select all
Leorik Perft KiSS
1 OK! 1137ms, 104697K NPS
2 OK! 1784ms, 108551K NPS
3 OK! 1617ms, 110429K NPS
4 OK! 7277ms, 97022K NPS
5 OK! 0ms, 76734K NPS
6 OK! 1515ms, 108265K NPS
Total: 1361558651 Nodes, 13332ms, 102123K NPS
But what's really nice is that it seems to perform on par with PEXT in a selfplay match. My theory is that in a real engine the cache is under higher pressure than when you just compute perft. And then the smaller lookup tables of KiSS give it an edge over PEXT and it catches up.
Here are 10K games at 5s + 500ms time control against Dangi's binary.
Code: Select all
Score of Leorik-2.4-KiSS-Net8 vs Leorik-2.4-Dangi: 2664 - 2417 - 4919 [0.512] 10000
... Leorik-2.4-KiSS-Net8 playing White: 1300 - 995 - 2705 [0.530] 5000
... Leorik-2.4-KiSS-Net8 playing Black: 1364 - 1422 - 2214 [0.494] 5000
... White vs Black: 2722 - 2359 - 4919 [0.518] 10000
Elo difference: 8.6 +/- 4.8, LOS: 100.0 %, DrawRatio: 49.2 %
Code: Select all
Score of Leorik-2.4-Pext-Net8 vs Leorik-2.4-D: 478 - 441 - 1039 [0.509] 1958
... Leorik-2.4-Pext-Net8 playing White: 246 - 211 - 524 [0.518] 981
... Leorik-2.4-Pext-Net8 playing Black: 232 - 230 - 515 [0.501] 977
... White vs Black: 476 - 443 - 1039 [0.508] 1958
Elo difference: 6.6 +/- 10.5, LOS: 88.9 %, DrawRatio: 53.1 %
-
- Posts: 965
- Joined: Fri Aug 21, 2020 1:25 am
- Location: Planet Earth, Sol system
- Full name: Michael J Sherwin
Re: Devlog of Leorik
Thank you so much Thomas for seeing the potential in my new mg! And even more for making it part of Leorik!! And making it better!!! Maybe KiSS is the chess engine version of Leoric's undead crown.lithander wrote: ↑Sat Apr 01, 2023 3:28 pmI hope you're doing okay today? Good news are not a risk for raising blood pressure I hope? Because I *do* have good news for you!Mike Sherwin wrote: ↑Fri Mar 31, 2023 5:40 pm Regardless of all that I am happy that KiSS is running in a real world engine with decent results. Thanks!![]()
I have refactored the code to use only 2 large arrays for the subsets.Code: Select all
static ulong[] DiagonalMask = new ulong[64]; static ulong[] AntiDiagonalMask = new ulong[64]; static ulong[] RookSubset = new ulong[2 * 64 * 64]; static ulong[] BishopSubset = new ulong[2 * 64 * 64]; public static ulong BishopAttacks(ulong occupation, int square) { ulong offset = (ulong)(square << 7); ulong dIndex = ((occupation & DiagonalMask[square]) * FILE_B2_B7) >> 58; ulong aIndex = ((occupation & AntiDiagonalMask[square]) * FILE_B2_B7) >> 58; return BishopSubset[offset + dIndex] | BishopSubset[offset + 64 + aIndex]; } public static ulong RookAttacks(ulong occupation, int square) { ulong offset = (ulong)(square << 7); ulong hIndex = (occupation >> ((square & 0b111000) | 1)) & 63; ulong vIndex = (((occupation >> (square & 0b000111)) & FILE_A2_A7) * DIAGONAL_C2_H7) >> 58; return RookSubset[offset + hIndex] | RookSubset[offset + 64 + vIndex]; }
The 64 diagonal subsets for a square are placed beside the 64 anti-diagonal subsets for the same square making it a bit more cache friendly.
This might be a C# thing but I got better speed that way than with multi-dimensional arrays or jagged arrays.
Over 100M NPS is great. Still 10M less than PEXT but 18M more than Leorik's previous code. And without any kind of exotic intrinsics, mind you!Code: Select all
Leorik Perft KiSS 1 OK! 1137ms, 104697K NPS 2 OK! 1784ms, 108551K NPS 3 OK! 1617ms, 110429K NPS 4 OK! 7277ms, 97022K NPS 5 OK! 0ms, 76734K NPS 6 OK! 1515ms, 108265K NPS Total: 1361558651 Nodes, 13332ms, 102123K NPS
But what's really nice is that it seems to perform on par with PEXT in a selfplay match. My theory is that in a real engine the cache is under higher pressure than when you just compute perft. And then the smaller lookup tables of KiSS give it an edge over PEXT and it catches up.
Here are 10K games at 5s + 500ms time control against Dangi's binary.
And my own PEXT implementation against Dangi's with the same time control but fewer games:Code: Select all
Score of Leorik-2.4-KiSS-Net8 vs Leorik-2.4-Dangi: 2664 - 2417 - 4919 [0.512] 10000 ... Leorik-2.4-KiSS-Net8 playing White: 1300 - 995 - 2705 [0.530] 5000 ... Leorik-2.4-KiSS-Net8 playing Black: 1364 - 1422 - 2214 [0.494] 5000 ... White vs Black: 2722 - 2359 - 4919 [0.518] 10000 Elo difference: 8.6 +/- 4.8, LOS: 100.0 %, DrawRatio: 49.2 %
Congratz, Mike! I've become really impressed with your work! =)Code: Select all
Score of Leorik-2.4-Pext-Net8 vs Leorik-2.4-D: 478 - 441 - 1039 [0.509] 1958 ... Leorik-2.4-Pext-Net8 playing White: 246 - 211 - 524 [0.518] 981 ... Leorik-2.4-Pext-Net8 playing Black: 232 - 230 - 515 [0.501] 977 ... White vs Black: 476 - 443 - 1039 [0.508] 1958 Elo difference: 6.6 +/- 10.5, LOS: 88.9 %, DrawRatio: 53.1 %


-
- Posts: 20
- Joined: Fri Mar 10, 2023 9:33 am
- Full name: Martin Novák
Re: Devlog of Leorik
Hello,Mike Sherwin wrote: ↑Sat Apr 01, 2023 4:04 pmThank you so much Thomas for seeing the potential in my new mg! And even more for making it part of Leorik!! And making it better!!! Maybe KiSS is the chess engine version of Leoric's undead crown.lithander wrote: ↑Sat Apr 01, 2023 3:28 pmI hope you're doing okay today? Good news are not a risk for raising blood pressure I hope? Because I *do* have good news for you!Mike Sherwin wrote: ↑Fri Mar 31, 2023 5:40 pm Regardless of all that I am happy that KiSS is running in a real world engine with decent results. Thanks!![]()
I have refactored the code to use only 2 large arrays for the subsets.Code: Select all
static ulong[] DiagonalMask = new ulong[64]; static ulong[] AntiDiagonalMask = new ulong[64]; static ulong[] RookSubset = new ulong[2 * 64 * 64]; static ulong[] BishopSubset = new ulong[2 * 64 * 64]; public static ulong BishopAttacks(ulong occupation, int square) { ulong offset = (ulong)(square << 7); ulong dIndex = ((occupation & DiagonalMask[square]) * FILE_B2_B7) >> 58; ulong aIndex = ((occupation & AntiDiagonalMask[square]) * FILE_B2_B7) >> 58; return BishopSubset[offset + dIndex] | BishopSubset[offset + 64 + aIndex]; } public static ulong RookAttacks(ulong occupation, int square) { ulong offset = (ulong)(square << 7); ulong hIndex = (occupation >> ((square & 0b111000) | 1)) & 63; ulong vIndex = (((occupation >> (square & 0b000111)) & FILE_A2_A7) * DIAGONAL_C2_H7) >> 58; return RookSubset[offset + hIndex] | RookSubset[offset + 64 + vIndex]; }
The 64 diagonal subsets for a square are placed beside the 64 anti-diagonal subsets for the same square making it a bit more cache friendly.
This might be a C# thing but I got better speed that way than with multi-dimensional arrays or jagged arrays.
Over 100M NPS is great. Still 10M less than PEXT but 18M more than Leorik's previous code. And without any kind of exotic intrinsics, mind you!Code: Select all
Leorik Perft KiSS 1 OK! 1137ms, 104697K NPS 2 OK! 1784ms, 108551K NPS 3 OK! 1617ms, 110429K NPS 4 OK! 7277ms, 97022K NPS 5 OK! 0ms, 76734K NPS 6 OK! 1515ms, 108265K NPS Total: 1361558651 Nodes, 13332ms, 102123K NPS
But what's really nice is that it seems to perform on par with PEXT in a selfplay match. My theory is that in a real engine the cache is under higher pressure than when you just compute perft. And then the smaller lookup tables of KiSS give it an edge over PEXT and it catches up.
Here are 10K games at 5s + 500ms time control against Dangi's binary.
And my own PEXT implementation against Dangi's with the same time control but fewer games:Code: Select all
Score of Leorik-2.4-KiSS-Net8 vs Leorik-2.4-Dangi: 2664 - 2417 - 4919 [0.512] 10000 ... Leorik-2.4-KiSS-Net8 playing White: 1300 - 995 - 2705 [0.530] 5000 ... Leorik-2.4-KiSS-Net8 playing Black: 1364 - 1422 - 2214 [0.494] 5000 ... White vs Black: 2722 - 2359 - 4919 [0.518] 10000 Elo difference: 8.6 +/- 4.8, LOS: 100.0 %, DrawRatio: 49.2 %
Congratz, Mike! I've become really impressed with your work! =)Code: Select all
Score of Leorik-2.4-Pext-Net8 vs Leorik-2.4-D: 478 - 441 - 1039 [0.509] 1958 ... Leorik-2.4-Pext-Net8 playing White: 246 - 211 - 524 [0.518] 981 ... Leorik-2.4-Pext-Net8 playing Black: 232 - 230 - 515 [0.501] 977 ... White vs Black: 476 - 443 - 1039 [0.508] 1958 Elo difference: 6.6 +/- 10.5, LOS: 88.9 %, DrawRatio: 53.1 %
I had hope that KiSS would surpass Black Magic because of the smaller memory usage. I had no hope in a million years at all that it would come in slightly faster than PEXT in a real engine. But then you predicted that it would be at least on par.
![]()
I've been thinking for a while now about combining PEXT with KiSS. I have no idea if it is just stupid or not. On my computer pext commands are quite slow so i cannot test it very well.
For example the vertical index could be calculated like this:
Code: Select all
uint64_t vIndex = _pext_u64(occ, vMask[sq]);
for blockers |= (256ull << (i << 3));
Code: Select all
for (index = 0; index < 64; index++) {
vSubset[sq][index] = 0;
uint64_t blockers = 0;
for (int i = 0; i <= 5; i++) {
if (index & (1ull << i)) {
blockers |= (256ull << (i << 3));
}
}
if ((sq >> 3) != RANK8) {
for (ts = sq + 8; ; ts += 8) {
vSubset[sq][index] |= (1ull << ts);
if (blockers & (1ull << (ts - (ts & 7)))) break;
if ((ts >> 3) == RANK8) break;
}
}
if ((sq >> 3) != RANK1) {
for (ts = sq - 8; ; ts -= 8) {
vSubset[sq][index] |= (1ull << ts);
if (blockers & (1ull << (ts - (ts & 7)))) break;
if ((ts >> 3) == RANK1) break;
}
}
}
hIndex would be even simpler just pext with hMask[sq].
It could be like pext but with smaller lookup table. But maybe it's just stupid idea.
Also in C++ i got quite good speed up when I used shift_horizontal_table[sq] instead of (square & 0b111000) | 1). Wouldn't it also make it little difference in c#?
-
- Posts: 965
- Joined: Fri Aug 21, 2020 1:25 am
- Location: Planet Earth, Sol system
- Full name: Michael J Sherwin
Re: Devlog of Leorik
Hi Martin, That sounds interesting. Maybe even amazing. My cpu is also not PEXT friendly. So we will have to hope Thomas thinks the same as us. Just to think after your last update I thought no more could be done.martinn wrote: ↑Sat Apr 01, 2023 10:32 pm Hello,
I've been thinking for a while now about combining PEXT with KiSS. I have no idea if it is just stupid or not. On my computer pext commands are quite slow so i cannot test it very well.
For example the vertical index could be calculated like this:In Mike's initialization of vSubset It would be necessary to change line: blockers |= (1ull << (((5 - i) << 3) + 8));Code: Select all
uint64_t vIndex = _pext_u64(occ, vMask[sq]);
for blockers |= (256ull << (i << 3));
I don't know how Thomas changed the code so I cannot say where it would change there.Code: Select all
for (index = 0; index < 64; index++) { vSubset[sq][index] = 0; uint64_t blockers = 0; for (int i = 0; i <= 5; i++) { if (index & (1ull << i)) { blockers |= (256ull << (i << 3)); } } if ((sq >> 3) != RANK8) { for (ts = sq + 8; ; ts += 8) { vSubset[sq][index] |= (1ull << ts); if (blockers & (1ull << (ts - (ts & 7)))) break; if ((ts >> 3) == RANK8) break; } } if ((sq >> 3) != RANK1) { for (ts = sq - 8; ; ts -= 8) { vSubset[sq][index] |= (1ull << ts); if (blockers & (1ull << (ts - (ts & 7)))) break; if ((ts >> 3) == RANK1) break; } } }
hIndex would be even simpler just pext with hMask[sq].
It could be like pext but with smaller lookup table. But maybe it's just stupid idea.
Also in C++ i got quite good speed up when I used shift_horizontal_table[sq] instead of (square & 0b111000) | 1). Wouldn't it also make it little difference in c#?

