Devlog of Leorik

dangi12012 · Post by **dangi12012** » Thu Mar 30, 2023 8:15 pm

Pext and Dotnet 8 definitely increased strength in 1 minute time controls more than 30 ELO is my guess

That's the version I posted earlier and I think the ratio is 20/10.
But I think switching movegen is a very personal choice when switching away from ones own.

You can also keep branches with commits you don't merge yet. Sort of like a preloaded bullet that can increase ELO on demand when merged.

lithander · Post by **lithander** » Fri Mar 31, 2023 1:38 am

dangi12012 wrote: ↑Thu Mar 30, 2023 8:15 pm Pext and Dotnet 8 definitely increased strength in 1 minute time controls more than 30 ELO is my guess

Why would you guess that?

Doubling the speed of an engine is worth 50-70 Elo. You made Perft faster by 40% but that doesn't carry over fully to the engine which does much more than just generating moves. If you compare my build (.Net6) with your built (.Net8 + PEXT) depth 20 on the starting position is reached in 10s instead of 12s... I would have expected less than 30 Elo to be honest.

But 30 is what I got at 5s + 500ms:

Code: Select all

Score of Leorik-2.4 vs Leorik-2.4-Dangi: 1300 - 1893 - 3823  [0.458] 7016
...      Leorik-2.4 playing White: 755 - 768 - 1985  [0.498] 3508
...      Leorik-2.4 playing Black: 545 - 1125 - 1838  [0.417] 3508
...      White vs Black: 1880 - 1313 - 3823  [0.540] 7016
Elo difference: -29.4 +/- 5.5, LOS: 0.0 %, DrawRatio: 54.5 %

Considering that selfplay games at fast time control often suggest higher strength gains than what you get against a gauntlet at longer time-control everything is fine. It's not too much and not too little!

And it seems like I've reproduced your results with my own implementation by the way:
Score of Leorik-2.4-Pext-Net8 vs Leorik-2.4-Dangi: 156 - 129 - 322 [0.522] 607
...but needs more games to be sure!

dangi12012 wrote: ↑Thu Mar 30, 2023 8:15 pm But I think switching movegen is a very personal choice when switching away from ones own.

You can also keep branches with commits you don't merge yet. Sort of like a preloaded bullet that can increase ELO on demand when merged.

I was thinking of letting the 3 code paths for slider generation exist side-by-side and use #defines to switch between them. PEXT can't exist without another slider-attacks generator anyway unless you want to compile >100k bitboards directly into the engine.

dangi12012 · Post by **dangi12012** » Fri Mar 31, 2023 10:48 am

Nice! you are right - with more testing like you did you can trust your results fully.

Yes like you said this way you can keep ARM target -

Code: Select all

  <PropertyGroup Condition=" '$(Configuration)|$(Platform)' == 'Release|x64' ">
    <DefineConstants>WIN64;$(DefineConstants)</DefineConstants>
</PropertyGroup>

So when targeting X64 you get pext otherwise you get the other codepath.

Code: Select all

#if WIN64
// CODE
#endif

lithander · Post by **lithander** » Fri Mar 31, 2023 2:11 pm

I have a version of Leorik running on KiSS now. Here are some preliminary perft results (best of five):

Code: Select all

Leorik Perft (Standard, .Net 8)

1 OK! 1356ms, 87791K NPS
2 OK! 2163ms, 89523K NPS
3 OK! 2026ms, 88162K NPS
4 OK! 8712ms, 81041K NPS
5 OK! 0ms, 59140K NPS
6 OK! 1800ms, 91115K NPS

Total: 1361558651 Nodes, 16059ms, 84780K NPS

Code: Select all

Leorik Perft (PEXT, .Net 8)

1 OK! 1106ms, 107565K NPS
2 OK! 1660ms, 116675K NPS
3 OK! 1616ms, 110505K NPS
4 OK! 6465ms, 109198K NPS
5 OK! 0ms, 109634K NPS
6 OK! 1383ms, 118591K NPS

Total: 1361558651 Nodes, 12233ms, 111300K NPS

Code: Select all

Leorik Perft (KiSS, .Net 8)

1 OK! 1161ms, 102493K NPS
2 OK! 1923ms, 100703K NPS
3 OK! 1654ms, 107960K NPS
4 OK! 7797ms, 90542K NPS
5 OK! 0ms, 108674K NPS
6 OK! 1611ms, 101815K NPS

Total: 1361558651 Nodes, 14149ms, 96226K NPS

KiSS is sitting right in the middle between my legacy implementation and PEXT.

I'll see if I can optimize the KiSS implementation for C# before running any matches. For now it's just a straight port without any thought of my own put into it.

Mike Sherwin · Post by **Mike Sherwin** » Fri Mar 31, 2023 5:40 pm

I was going to write something about why Black Magic also needs to be tested but I decided against it. I'm just too worn out. Yesterday I almost went to the hospital because my blood pressure was 193/110. I took double meds and after a couple hours it was 171/100. It eventually came down to 148/96 after I took another round of meds. All day yesterday my diastolic pressure (bottom number) never was below 90. Today it's fine, 132/76. I would put Black Magic bitboards in myself if I could. The m42 bitboard library iirc initializes the magic data at run time. Regardless of all that I am happy that KiSS is running in a real world engine with decent results. Thanks!

lithander · Post by **lithander** » Fri Mar 31, 2023 5:59 pm

Mike Sherwin wrote: ↑Fri Mar 31, 2023 5:40 pm I was going to write something about why Black Magic also needs to be tested but I decided against it. I'm just too worn out. Yesterday I almost went to the hospital because my blood pressure was 193/110. I took double meds and after a couple hours it was 171/100. It eventually came down to 148/96 after I took another round of meds. All day yesterday my diastolic pressure (bottom number) never was below 90. Today it's fine, 132/76. I would put Black Magic bitboards in myself if I could. The m42 bitboard library iirc initializes the magic data at run time. Regardless of all that I am happy that KiSS is running in a real world engine with decent results. Thanks!

Oh, damn!

Take a step back and take care of your body and I'll take care of giving KiSS a permanent home!

I don't share the same enthusiasm for move generators that you and Dangi have. PEXT was simple to add and is beautiful in it's simplicity. Yours is a favor to a friend and I don't understand it fully yet and embrace the challenge to figure out the details. Also I'm sure I can get more speed out of it (in C# every array is dynamically allocated and I'll have to change a few things to place the bitboards in a more cache friendly way) and knowing that you'd appreciate it I'll be making it my default generator for sliders once the performance is where it should be. But... then I'm done.

If someone cares enough about Black Magic or other algorithms they are invited to make a fork and use it to run their own benchmarks.

lithander · Post by **lithander** » Sat Apr 01, 2023 3:28 pm

Mike Sherwin wrote: ↑Fri Mar 31, 2023 5:40 pm Regardless of all that I am happy that KiSS is running in a real world engine with decent results. Thanks!

I hope you're doing okay today? Good news are not a risk for raising blood pressure I hope? Because I *do* have good news for you!

Code: Select all

static ulong[] DiagonalMask = new ulong[64];
static ulong[] AntiDiagonalMask = new ulong[64];

static ulong[] RookSubset = new ulong[2 * 64 * 64];
static ulong[] BishopSubset = new ulong[2 * 64 * 64];

public static ulong BishopAttacks(ulong occupation, int square)
{
    ulong offset = (ulong)(square << 7);
    ulong dIndex = ((occupation & DiagonalMask[square]) * FILE_B2_B7) >> 58;
    ulong aIndex = ((occupation & AntiDiagonalMask[square]) * FILE_B2_B7) >> 58;
    return BishopSubset[offset + dIndex] | BishopSubset[offset + 64 + aIndex];
}

public static ulong RookAttacks(ulong occupation, int square)
{
    ulong offset = (ulong)(square << 7);
    ulong hIndex = (occupation >> ((square & 0b111000) | 1)) & 63;
    ulong vIndex = (((occupation >> (square & 0b000111)) & FILE_A2_A7) * DIAGONAL_C2_H7) >> 58;
    return RookSubset[offset + hIndex] | RookSubset[offset + 64 + vIndex];
}

I have refactored the code to use only 2 large arrays for the subsets.
The 64 diagonal subsets for a square are placed beside the 64 anti-diagonal subsets for the same square making it a bit more cache friendly.
This might be a C# thing but I got better speed that way than with multi-dimensional arrays or jagged arrays.

Code: Select all

Leorik Perft KiSS

1 OK! 1137ms, 104697K NPS
2 OK! 1784ms, 108551K NPS
3 OK! 1617ms, 110429K NPS
4 OK! 7277ms, 97022K NPS
5 OK! 0ms, 76734K NPS
6 OK! 1515ms, 108265K NPS

Total: 1361558651 Nodes, 13332ms, 102123K NPS

Over 100M NPS is great. Still 10M less than PEXT but 18M more than Leorik's previous code. And without any kind of exotic intrinsics, mind you!

But what's really nice is that it seems to perform on par with PEXT in a selfplay match. My theory is that in a real engine the cache is under higher pressure than when you just compute perft. And then the smaller lookup tables of KiSS give it an edge over PEXT and it catches up.

Here are 10K games at 5s + 500ms time control against Dangi's binary.

Code: Select all

Score of Leorik-2.4-KiSS-Net8 vs Leorik-2.4-Dangi: 2664 - 2417 - 4919  [0.512] 10000
...      Leorik-2.4-KiSS-Net8 playing White: 1300 - 995 - 2705  [0.530] 5000
...      Leorik-2.4-KiSS-Net8 playing Black: 1364 - 1422 - 2214  [0.494] 5000
...      White vs Black: 2722 - 2359 - 4919  [0.518] 10000
Elo difference: 8.6 +/- 4.8, LOS: 100.0 %, DrawRatio: 49.2 %

And my own PEXT implementation against Dangi's with the same time control but fewer games:

Code: Select all

Score of Leorik-2.4-Pext-Net8 vs Leorik-2.4-D: 478 - 441 - 1039  [0.509] 1958
...      Leorik-2.4-Pext-Net8 playing White: 246 - 211 - 524  [0.518] 981
...      Leorik-2.4-Pext-Net8 playing Black: 232 - 230 - 515  [0.501] 977
...      White vs Black: 476 - 443 - 1039  [0.508] 1958
Elo difference: 6.6 +/- 10.5, LOS: 88.9 %, DrawRatio: 53.1 %

Congratz, Mike! I've become really impressed with your work! =)

Mike Sherwin · Post by **Mike Sherwin** » Sat Apr 01, 2023 4:04 pm

lithander wrote: ↑Sat Apr 01, 2023 3:28 pm
Mike Sherwin wrote: ↑Fri Mar 31, 2023 5:40 pm Regardless of all that I am happy that KiSS is running in a real world engine with decent results. Thanks!
I hope you're doing okay today? Good news are not a risk for raising blood pressure I hope? Because I *do* have good news for you!
Code: Select all
static ulong[] DiagonalMask = new ulong[64];
static ulong[] AntiDiagonalMask = new ulong[64];

static ulong[] RookSubset = new ulong[2 * 64 * 64];
static ulong[] BishopSubset = new ulong[2 * 64 * 64];

public static ulong BishopAttacks(ulong occupation, int square)
{
    ulong offset = (ulong)(square << 7);
    ulong dIndex = ((occupation & DiagonalMask[square]) * FILE_B2_B7) >> 58;
    ulong aIndex = ((occupation & AntiDiagonalMask[square]) * FILE_B2_B7) >> 58;
    return BishopSubset[offset + dIndex] | BishopSubset[offset + 64 + aIndex];
}

public static ulong RookAttacks(ulong occupation, int square)
{
    ulong offset = (ulong)(square << 7);
    ulong hIndex = (occupation >> ((square & 0b111000) | 1)) & 63;
    ulong vIndex = (((occupation >> (square & 0b000111)) & FILE_A2_A7) * DIAGONAL_C2_H7) >> 58;
    return RookSubset[offset + hIndex] | RookSubset[offset + 64 + vIndex];
}
I have refactored the code to use only 2 large arrays for the subsets.
The 64 diagonal subsets for a square are placed beside the 64 anti-diagonal subsets for the same square making it a bit more cache friendly.
This might be a C# thing but I got better speed that way than with multi-dimensional arrays or jagged arrays.
Code: Select all
Leorik Perft KiSS

1 OK! 1137ms, 104697K NPS
2 OK! 1784ms, 108551K NPS
3 OK! 1617ms, 110429K NPS
4 OK! 7277ms, 97022K NPS
5 OK! 0ms, 76734K NPS
6 OK! 1515ms, 108265K NPS

Total: 1361558651 Nodes, 13332ms, 102123K NPS
Over 100M NPS is great. Still 10M less than PEXT but 18M more than Leorik's previous code. And without any kind of exotic intrinsics, mind you!

But what's really nice is that it seems to perform on par with PEXT in a selfplay match. My theory is that in a real engine the cache is under higher pressure than when you just compute perft. And then the smaller lookup tables of KiSS give it an edge over PEXT and it catches up.

Here are 10K games at 5s + 500ms time control against Dangi's binary.
Code: Select all
Score of Leorik-2.4-KiSS-Net8 vs Leorik-2.4-Dangi: 2664 - 2417 - 4919  [0.512] 10000
...      Leorik-2.4-KiSS-Net8 playing White: 1300 - 995 - 2705  [0.530] 5000
...      Leorik-2.4-KiSS-Net8 playing Black: 1364 - 1422 - 2214  [0.494] 5000
...      White vs Black: 2722 - 2359 - 4919  [0.518] 10000
Elo difference: 8.6 +/- 4.8, LOS: 100.0 %, DrawRatio: 49.2 %
And my own PEXT implementation against Dangi's with the same time control but fewer games:
Code: Select all
Score of Leorik-2.4-Pext-Net8 vs Leorik-2.4-D: 478 - 441 - 1039  [0.509] 1958
...      Leorik-2.4-Pext-Net8 playing White: 246 - 211 - 524  [0.518] 981
...      Leorik-2.4-Pext-Net8 playing Black: 232 - 230 - 515  [0.501] 977
...      White vs Black: 476 - 443 - 1039  [0.508] 1958
Elo difference: 6.6 +/- 10.5, LOS: 88.9 %, DrawRatio: 53.1 %
Congratz, Mike! I've become really impressed with your work! =)

Thank you so much Thomas for seeing the potential in my new mg! And even more for making it part of Leorik!! And making it better!!! Maybe KiSS is the chess engine version of Leoric's undead crown.

I had hope that KiSS would surpass Black Magic because of the smaller memory usage. I had no hope in a million years at all that it would come in slightly faster than PEXT in a real engine. But then you predicted that it would be at least on par.

martinn · Post by **martinn** » Sat Apr 01, 2023 10:32 pm

Mike Sherwin wrote: ↑Sat Apr 01, 2023 4:04 pm
lithander wrote: ↑Sat Apr 01, 2023 3:28 pm
Mike Sherwin wrote: ↑Fri Mar 31, 2023 5:40 pm Regardless of all that I am happy that KiSS is running in a real world engine with decent results. Thanks!
I hope you're doing okay today? Good news are not a risk for raising blood pressure I hope? Because I *do* have good news for you!
Code: Select all
static ulong[] DiagonalMask = new ulong[64];
static ulong[] AntiDiagonalMask = new ulong[64];

static ulong[] RookSubset = new ulong[2 * 64 * 64];
static ulong[] BishopSubset = new ulong[2 * 64 * 64];

public static ulong BishopAttacks(ulong occupation, int square)
{
    ulong offset = (ulong)(square << 7);
    ulong dIndex = ((occupation & DiagonalMask[square]) * FILE_B2_B7) >> 58;
    ulong aIndex = ((occupation & AntiDiagonalMask[square]) * FILE_B2_B7) >> 58;
    return BishopSubset[offset + dIndex] | BishopSubset[offset + 64 + aIndex];
}

public static ulong RookAttacks(ulong occupation, int square)
{
    ulong offset = (ulong)(square << 7);
    ulong hIndex = (occupation >> ((square & 0b111000) | 1)) & 63;
    ulong vIndex = (((occupation >> (square & 0b000111)) & FILE_A2_A7) * DIAGONAL_C2_H7) >> 58;
    return RookSubset[offset + hIndex] | RookSubset[offset + 64 + vIndex];
}
I have refactored the code to use only 2 large arrays for the subsets.
The 64 diagonal subsets for a square are placed beside the 64 anti-diagonal subsets for the same square making it a bit more cache friendly.
This might be a C# thing but I got better speed that way than with multi-dimensional arrays or jagged arrays.
Code: Select all
Leorik Perft KiSS

1 OK! 1137ms, 104697K NPS
2 OK! 1784ms, 108551K NPS
3 OK! 1617ms, 110429K NPS
4 OK! 7277ms, 97022K NPS
5 OK! 0ms, 76734K NPS
6 OK! 1515ms, 108265K NPS

Total: 1361558651 Nodes, 13332ms, 102123K NPS
Over 100M NPS is great. Still 10M less than PEXT but 18M more than Leorik's previous code. And without any kind of exotic intrinsics, mind you!

But what's really nice is that it seems to perform on par with PEXT in a selfplay match. My theory is that in a real engine the cache is under higher pressure than when you just compute perft. And then the smaller lookup tables of KiSS give it an edge over PEXT and it catches up.

Here are 10K games at 5s + 500ms time control against Dangi's binary.
Code: Select all
Score of Leorik-2.4-KiSS-Net8 vs Leorik-2.4-Dangi: 2664 - 2417 - 4919  [0.512] 10000
...      Leorik-2.4-KiSS-Net8 playing White: 1300 - 995 - 2705  [0.530] 5000
...      Leorik-2.4-KiSS-Net8 playing Black: 1364 - 1422 - 2214  [0.494] 5000
...      White vs Black: 2722 - 2359 - 4919  [0.518] 10000
Elo difference: 8.6 +/- 4.8, LOS: 100.0 %, DrawRatio: 49.2 %
And my own PEXT implementation against Dangi's with the same time control but fewer games:
Code: Select all
Score of Leorik-2.4-Pext-Net8 vs Leorik-2.4-D: 478 - 441 - 1039  [0.509] 1958
...      Leorik-2.4-Pext-Net8 playing White: 246 - 211 - 524  [0.518] 981
...      Leorik-2.4-Pext-Net8 playing Black: 232 - 230 - 515  [0.501] 977
...      White vs Black: 476 - 443 - 1039  [0.508] 1958
Elo difference: 6.6 +/- 10.5, LOS: 88.9 %, DrawRatio: 53.1 %
Congratz, Mike! I've become really impressed with your work! =)
Thank you so much Thomas for seeing the potential in my new mg! And even more for making it part of Leorik!! And making it better!!! Maybe KiSS is the chess engine version of Leoric's undead crown. I had hope that KiSS would surpass Black Magic because of the smaller memory usage. I had no hope in a million years at all that it would come in slightly faster than PEXT in a real engine. But then you predicted that it would be at least on par.

Hello,
I've been thinking for a while now about combining PEXT with KiSS. I have no idea if it is just stupid or not. On my computer pext commands are quite slow so i cannot test it very well.

For example the vertical index could be calculated like this:

Code: Select all

uint64_t vIndex = _pext_u64(occ, vMask[sq]);

In Mike's initialization of vSubset It would be necessary to change line: blockers |= (1ull << (((5 - i) << 3) + 8));
for blockers |= (256ull << (i << 3));

Code: Select all

for (index = 0; index < 64; index++) {
            vSubset[sq][index] = 0;
            uint64_t blockers = 0;
            for (int i = 0; i <= 5; i++) {
                if (index & (1ull << i)) {
                    blockers |= (256ull << (i << 3));
                }
            }
            if ((sq >> 3) != RANK8) {
                for (ts = sq + 8; ; ts += 8) {
                    vSubset[sq][index] |= (1ull << ts);
                    if (blockers & (1ull << (ts - (ts & 7)))) break;
                    if ((ts >> 3) == RANK8) break;
                }
            }
            if ((sq >> 3) != RANK1) {
                for (ts = sq - 8; ; ts -= 8) {
                    vSubset[sq][index] |= (1ull << ts);
                    if (blockers & (1ull << (ts - (ts & 7)))) break;
                    if ((ts >> 3) == RANK1) break;
                }
            }
        }

I don't know how Thomas changed the code so I cannot say where it would change there.

hIndex would be even simpler just pext with hMask[sq].
It could be like pext but with smaller lookup table. But maybe it's just stupid idea.

Also in C++ i got quite good speed up when I used shift_horizontal_table[sq] instead of (square & 0b111000) | 1). Wouldn't it also make it little difference in c#?

Mike Sherwin · Post by **Mike Sherwin** » Sat Apr 01, 2023 11:40 pm

martinn wrote: ↑Sat Apr 01, 2023 10:32 pm Hello,
I've been thinking for a while now about combining PEXT with KiSS. I have no idea if it is just stupid or not. On my computer pext commands are quite slow so i cannot test it very well.

For example the vertical index could be calculated like this:
Code: Select all
uint64_t vIndex = _pext_u64(occ, vMask[sq]);
In Mike's initialization of vSubset It would be necessary to change line: blockers |= (1ull << (((5 - i) << 3) + 8));
for blockers |= (256ull << (i << 3));
Code: Select all
for (index = 0; index < 64; index++) {
            vSubset[sq][index] = 0;
            uint64_t blockers = 0;
            for (int i = 0; i <= 5; i++) {
                if (index & (1ull << i)) {
                    blockers |= (256ull << (i << 3));
                }
            }
            if ((sq >> 3) != RANK8) {
                for (ts = sq + 8; ; ts += 8) {
                    vSubset[sq][index] |= (1ull << ts);
                    if (blockers & (1ull << (ts - (ts & 7)))) break;
                    if ((ts >> 3) == RANK8) break;
                }
            }
            if ((sq >> 3) != RANK1) {
                for (ts = sq - 8; ; ts -= 8) {
                    vSubset[sq][index] |= (1ull << ts);
                    if (blockers & (1ull << (ts - (ts & 7)))) break;
                    if ((ts >> 3) == RANK1) break;
                }
            }
        } 
I don't know how Thomas changed the code so I cannot say where it would change there.

hIndex would be even simpler just pext with hMask[sq].
It could be like pext but with smaller lookup table. But maybe it's just stupid idea.

Also in C++ i got quite good speed up when I used shift_horizontal_table[sq] instead of (square & 0b111000) | 1). Wouldn't it also make it little difference in c#?

Hi Martin, That sounds interesting. Maybe even amazing. My cpu is also not PEXT friendly. So we will have to hope Thomas thinks the same as us. Just to think after your last update I thought no more could be done.

And just to be clear, the multiplication and shift down by 58 can just be replaced with PEXT. Seems like it should be a win to me.

Devlog of Leorik

Re: Devlog of Leorik

Re: Devlog of Leorik

Re: Devlog of Leorik

Re: Devlog of Leorik

Re: Devlog of Leorik

Re: Devlog of Leorik

Re: Devlog of Leorik

Re: Devlog of Leorik

Re: Devlog of Leorik

Re: Devlog of Leorik