Devlog of Leorik

dangi12012 · Post by **dangi12012** » Tue Mar 28, 2023 6:02 pm

There are still some low hanging fruit in the code also:

Base test

Leorik Perft

1 OK! 1549ms, 76827K NPS
2 OK! 2400ms, 80691K NPS
3 OK! 2207ms, 80930K NPS
4 OK! 10342ms, 68268K NPS
5 OK! 0ms, 86311K NPS
6 OK! 2040ms, 80410K NPS

Total: 1361558651 Nodes, 18540ms, 73436K NPS

Optimize 1

Code: Select all

Leorik Perft

1 OK! 1129ms, 105456K NPS
2 OK! 2029ms, 95441K NPS
3 OK! 1920ms, 93009K NPS
4 OK! 7927ms, 89059K NPS
5 OK! 0ms, 74145K NPS
6 OK! 1647ms, 99614K NPS

Total: 1361558651 Nodes, 14654ms, 92909K NPS

Switch to dotnet 8 and some small touchups in code:

Code: Select all

Leorik Perft

1 OK! 1128ms, 105525K NPS
2 OK! 1731ms, 111853K NPS
3 OK! 1728ms, 103348K NPS
4 OK! 7039ms, 100295K NPS
5 OK! 0ms, 83791K NPS
6 OK! 1453ms, 112899K NPS

Total: 1361558651 Nodes, 13081ms, 104079K NPS

So in total 140% of the base speed.

Letting it run for 4 games:
Score of main vs perf: 0 - 2 - 2 [0.250]
... main playing White: 0 - 1 - 1 [0.250] 2
... main playing Black: 0 - 1 - 1 [0.250] 2
... White vs Black: 1 - 1 - 2 [0.500] 4

I dont have the time to really test it for longer so cant tell exact elo difference but after seeing it win as black I knew its a significant improvement - here is the binary: https://1drv.ms/u/s!AoDIyNlDG2QTg80f9EE ... A?e=C32ScR

lithander · Post by **lithander** » Tue Mar 28, 2023 6:48 pm

Thanks guys for the interest in my engine! That you care enough to modify the source and create your own builds makes me quite happy!

dangi12012 wrote: ↑Tue Mar 28, 2023 6:02 pm So in total 140% of the base speed.

If you did that by optimizing Leoriks movegen instead of replacing it with something else entirely like using PEXT, then consider me very impressed!

dangi12012 wrote: ↑Tue Mar 28, 2023 6:02 pm Letting it run for 4 games:
Score of main vs perf: 0 - 2 - 2 [0.250]
... main playing White: 0 - 1 - 1 [0.250] 2
... main playing Black: 0 - 1 - 1 [0.250] 2
... White vs Black: 1 - 1 - 2 [0.500] 4

I dont have the time to really test it for longer so cant tell exact elo difference but after seeing it win as black I knew its a significant improvement

4 games and a win for black? I never know whether you're trolling or not!

Mike Sherwin wrote: ↑Tue Mar 28, 2023 8:51 am Have you tried LMP yet?

I just removed Futility Pruning which was a little like what you're suggesting now: Based on the static eval of the position plus a considerable margin I was pruning nodes close to the leafs if they looked hopeless. But you suggest still playing the first couple of moves which probably makes it more reliable.

But the problem with my engine is that I don't sort *all* quiet moves based on history. I just sort the first few and play the remaining ones unsorted. So I fear that your suggested code prunes a move that could actually raise alpha is quite high - and higher in other engines with more sophisticated move-sorting. (Note that I also don't have counter-moves, or follow-up-moves or what else other engines might use.)

Could still be an overall strength gainer, though! I'll look into it

Mike Sherwin · Post by **Mike Sherwin** » Tue Mar 28, 2023 6:59 pm

Mike Sherwin wrote: ↑Tue Mar 28, 2023 4:58 pm I let it run for 80 more games. The final result was still good. There should be room for improvement.

Leorik.Engine - Leorik-2.4 : 53.5/100 25-18-57 (=01=1=====1==1=11===0010=1=1===01===01==0==110=01=11=0=1=1==000=1=1=1==1=0=0=0=======110=====1===0==) 54% +28

I did not change the LMP code. I did modify the Null Move code a bit.

Code: Select all

               if (EvaluateNext(ply, remaining - 3 - remaining / 4, beta-1, beta, moveGen) >= beta)
                   return beta;

Preview of the first 20 games with 80 more to follow.

Leorik.Engine - Leorik-2.4 : 13.5/20 8-1-11 (====11==1=0=111==11=) 68% +131

Mike Sherwin · Post by **Mike Sherwin** » Wed Mar 29, 2023 7:22 am

Mike Sherwin wrote: ↑Tue Mar 28, 2023 6:59 pm
Mike Sherwin wrote: ↑Tue Mar 28, 2023 4:58 pm I let it run for 80 more games. The final result was still good. There should be room for improvement.

Leorik.Engine - Leorik-2.4 : 53.5/100 25-18-57 (=01=1=====1==1=11===0010=1=1===01===01==0==110=01=11=0=1=1==000=1=1=1==1=0=0=0=======110=====1===0==) 54% +28
I did not change the LMP code. I did modify the Null Move code a bit.
Code: Select all
               if (EvaluateNext(ply, remaining - 3 - remaining / 4, beta-1, beta, moveGen) >= beta)
                   return beta;
           
Preview of the first 20 games with 80 more to follow.

Leorik.Engine - Leorik-2.4 : 13.5/20 8-1-11 (====11==1=0=111==11=) 68% +131

Something went wrong with the test and had to start over.

Leorik.Engine - Leorik-2.4 : 55.5/100 28-17-55 (=1==1======0111==11=101=11==11==========1==0001====01==1===10=0===111===0===10=0==001=101=1=001==01=) 56% +42

Mike Sherwin · Post by **Mike Sherwin** » Wed Mar 29, 2023 5:17 pm

lithander wrote: ↑Wed Mar 29, 2023 2:28 pm
Mike Sherwin wrote: ↑Wed Mar 29, 2023 1:15 am Despite all that I did manage to write a new move generator that is a little bit faster than Black Magic bitboards in Daniel's moveGen compare test. I combined ideas from Kindergarten and SISSY to make Kindergarten Super SISSY bitboards, lol. But there does not seem to be much interest.
I guess that for many of us move generation is just something we're happy to have solved in our own engines and we don't look back. Also, it's been hard to follow the progress on your new move generator because I feel the relevant information is scattered over many posts and in at least two threads. Not even the name is easy to diggest

But for what it's worth I think that if I wanted to improve the speed of my engine right now your algorithm would be my favorite. It's relatively cache-friendly and achieves great performance without relying on super advanced intrinsic that are only available on a small subset of the hardware of our target audience. (E.g. PEXT or the Galois Field transforms) Maybe I should actually try it in Leorik after having been teased about low hanging fruits...

It would mean a lot to me if you did give it a go. Even if you decided for some reason not to keep it. I'm interested in truth, not fame. And truth in this case would be how well it does in a real world engine. I will post the fastest running code here so you do not have to search too much.

Code: Select all

    static uint64_t hMask[64];
    static uint64_t dMask[64];
    static uint64_t aMask[64];
    static uint64_t vSubset[64][64];
    static uint64_t hSubset[64][64];
    static uint64_t dSubset[64][64];
    static uint64_t aSubset[64][64];
    static unsigned int shift_horizontal_table[64]; // new lookup table for shifts in calculation of hIndex

    static constexpr uint64_t Size = (64 * 64 * 4 + 64 * 4) * sizeof(uint64_t) + 64 * sizeof(unsigned int);

    enum { FILEa = 0, FILEh = 7};
    enum { RANK1 = 0, RANK8 = 7};

    static void InitSquare(int sq)
    {
        int ts, dx, dy;
        uint64_t occ, index;
        int x = sq % 8;
        int y = sq / 8;

        // Initialize Kindergarten Super SISSY Bitboards
        // diagonals
        for (ts = sq + 9, dx = x + 1, dy = y + 1; dx < FILEh && dy < RANK8; dMask[sq] |= 1ull << ts, ts += 9, dx++, dy++);
        for (ts = sq - 9, dx = x - 1, dy = y - 1; dx > FILEa && dy > RANK1; dMask[sq] |= 1ull << ts, ts -= 9, dx--, dy--);

        // anti diagonals
        for (ts = sq + 7, dx = x - 1, dy = y + 1; dx > FILEa && dy < RANK8; aMask[sq] |= 1ull << ts, ts += 7, dx--, dy++);
        for (ts = sq - 7, dx = x + 1, dy = y - 1; dx < FILEh && dy > RANK1; aMask[sq] |= 1ull << ts, ts -= 7, dx++, dy--);

        // diagonal indexes
        for (index = 0; index < 64; index++)
        {
            dSubset[sq][index] = 0;
            occ = index << 1;
            if ((sq & 7) != FILEh && (sq >> 3) != RANK8)
            {
                for (ts = sq + 9; ; ts += 9)
                {
                    dSubset[sq][index] |= (1ull << ts);
                    if (occ & (1ull << (ts & 7))) break;
                    if ((ts & 7) == FILEh || (ts >> 3) == RANK8) break;
                }
            }
            if ((sq & 7) != FILEa && (sq >> 3) != RANK1)
            {
                for (ts = sq - 9; ; ts -= 9)
                {
                    dSubset[sq][index] |= (1ull << ts);
                    if (occ & (1ull << (ts & 7))) break;
                    if ((ts & 7) == FILEa || (ts >> 3) == RANK1) break;
                }
            }
        }

        // anti diagonal indexes
        for (index = 0; index < 64; index++)
        {
            aSubset[sq][index] = 0;
            occ = index << 1;
            if ((sq & 7) != FILEa && (sq >> 3) != RANK8)
            {
                for (ts = sq + 7; ; ts += 7) {
                    aSubset[sq][index] |= (1ull << ts);
                    if (occ & (1ull << (ts & 7))) break;
                    if ((ts & 7) == FILEa || (ts >> 3) == RANK8) break;
                }
            }
            if ((sq & 7) != FILEh && (sq >> 3) != RANK1)
            {
                for (ts = sq - 7; ; ts -= 7)
                {
                    aSubset[sq][index] |= (1ull << ts);
                    if (occ & (1ull << (ts & 7))) break;
                    if ((ts & 7) == FILEh || (ts >> 3) == RANK1) break;
                }
            }
        }

        // horizontal lines
        for (ts = sq + 1, dx = x + 1; dx < FILEh; hMask[sq] |= 1ull << ts, ts += 1, dx++);
        for (ts = sq - 1, dx = x - 1; dx > FILEa; hMask[sq] |= 1ull << ts, ts -= 1, dx--);

        // vertical indexes
        for (index = 0; index < 64; index++) {
            vSubset[sq][index] = 0;
            uint64_t blockers = 0;
            for (int i = 0; i <= 5; i++) {
                if (index & (1ull << i)) {
                    blockers |= (1ull << (((5 - i) << 3) + 8));
                }
            }
            if ((sq >> 3) != RANK8) {
                for (ts = sq + 8; ; ts += 8) {
                    vSubset[sq][index] |= (1ull << ts);
                    if (blockers & (1ull << (ts - (ts & 7)))) break;
                    if ((ts >> 3) == RANK8) break;
                }
            }
            if ((sq >> 3) != RANK1) {
                for (ts = sq - 8; ; ts -= 8) {
                    vSubset[sq][index] |= (1ull << ts);
                    if (blockers & (1ull << (ts - (ts & 7)))) break;
                    if ((ts >> 3) == RANK1) break;
                }
            }
        }

        // horizontal indexes
        for (index = 0; index < 64; index++) {
            hSubset[sq][index] = 0;
            occ = index << 1;
            if ((sq & 7) != FILEh) {
                for (ts = sq + 1; ; ts += 1) {
                    hSubset[sq][index] |= (1ull << ts);
                    if (occ & (1ull << (ts & 7))) break;
                    if ((ts & 7) == FILEh) break;
                }
            }
            if ((sq & 7) != FILEa) {
                for (ts = sq - 1; ; ts -= 1) {
                    hSubset[sq][index] |= (1ull << ts);
                    if (occ & (1ull << (ts & 7))) break;
                    if ((ts & 7) == FILEa) break;
                }
            }
        }
    }

    static void Init()
    {
        for (int i = 0; i < 64; i++) {
            InitSquare(i);
        }
        
        for (int i = 0; i < 64; i++) {
            shift_horizontal_table[i] = (i & 56) + 1;
        }
    }

    static constexpr uint64_t file_b2_b7 = 0x0002020202020200ull;
    static constexpr uint64_t file_a2_a7 = 0x0001010101010100ull;
    static constexpr uint64_t diag_c2h7  = 0x0080402010080400ull;

    static uint64_t Bishop(int sq, uint64_t occ)
    {
        return  dSubset[sq][(((occ & dMask[sq]) * file_b2_b7) >> 58)] + // |
                aSubset[sq][(((occ & aMask[sq]) * file_b2_b7) >> 58)];
    }

    static uint64_t Rook(int sq, uint64_t occ)
    {
        return hSubset[sq][(occ >> shift_horizontal_table[sq]) & 63] + // |
               vSubset[sq][((((occ >> (sq & 7)) & file_a2_a7) * diag_c2h7) >> 58)];
    }

    static uint64_t Queen(int sq, uint64_t occ)
    {
        return Rook(sq, occ) | Bishop(sq, occ);
    }

Well you can shorten the name to KSS bitboards if you like. This is the code as modified by Martin Novak. It is complete including the initializations and data structures. The only thing to test is if using '|' instead of '+' results in the bishop and rook functions being a little faster.

algerbrex · Post by **algerbrex** » Wed Mar 29, 2023 6:32 pm

lithander wrote: ↑Sun Mar 26, 2023 12:44 pm
Modern Times wrote: ↑Sun Mar 26, 2023 9:01 am It is on the CCRL Blitz list already, just made the cut-off, however judging by the average opponent Elo it is much stronger than the tester anticipated! It will require more games against stronger opponents.
2950 Elo?! That's much much stronger than all my tests indicated. The opponents were around 2800 Elo which is what the results auf the gauntlet I posted with the announcement above indicated. Also in selfplay 2.4 was only 70 Elo stronger than 2.3... I have no explanation! Let's see were it settles in the end.

2950 would be very incredible

although I suspect it's closer to 2800, which is still an impressive threshold to cross!

Modern Times · Post by **Modern Times** » Wed Mar 29, 2023 6:53 pm

It is 2784 on CCRL 40/15. Don't know why it is higher on the blitz list.

algerbrex · Post by **algerbrex** » Wed Mar 29, 2023 8:25 pm

Modern Times wrote: ↑Wed Mar 29, 2023 6:53 pm It is 2784 on CCRL 40/15. Don't know why it is higher on the blitz list.

Ah, yea, I hadn't checked in a while. No clue why the huge gap. Normally I'd say it's because Leorik might be stronger at faster time controls. But I don't think that suffices to explain that big of a gap.

lithander · Post by **lithander** » Thu Mar 30, 2023 1:50 pm

Mike Sherwin wrote: ↑Wed Mar 29, 2023 5:17 pm It would mean a lot to me if you did give it a go.

Okay, didn't know I would make someone happy if I did that. If I get Elo and Karma at the same I can't say no to that!

Mike Sherwin wrote: ↑Wed Mar 29, 2023 5:17 pm Even if you decided for some reason not to keep it. I'm interested in truth, not fame. And truth in this case would be how well it does in a real world engine. I will post the fastest running code here so you do not have to search too much.

Leorik probably gets more Elo out of improved slider movegen performance than most engines because I explore quite a lot of nodes (high nps, especially for C#) without spending much effort on other costly things like move-sorting or evaluating.

My prediction is that at best it would be on par with Dangi's binary. Dangi probably implemented PEXT (which is supported on my hardware) but a big chunk of his speed-up is also based on switching to .Net framework. Dangi's build performs about 30 Elo better, than the build you find on Github. But I don't know how much is due to him migrating to .Net 8 and how much due to PEXT. I'd guess 15 Elo from each.

I have implemented PEXT yesterday evening in a version that computes the lookup tables at startup just like your's. I have to say I like the algorithm a lot more now that I understand it, a pitty it's not well supported in the wild. Also I'm going to download a new Visual Studio version and install the .Net 8 preview. So in the end we'll be able to compare your algorithm with Leorik's baseline and also PEXT Bitboards. But it'll take some time mostly because to make any credible statements in the <15 Elo range will take thousands of test games.

Mike Sherwin wrote: ↑Wed Mar 29, 2023 5:17 pm Well you can shorten the name to KSS bitboards if you like.

How about KiSS? I'm a strong proponent of the KISS principle in software development and also large parts of my teenager brain is still operational and giggles at the thought of Sissy kissing in Kindergarten!

Mike Sherwin · Post by **Mike Sherwin** » Thu Mar 30, 2023 3:52 pm

lithander wrote: ↑Thu Mar 30, 2023 1:50 pm
Mike Sherwin wrote: ↑Wed Mar 29, 2023 5:17 pm It would mean a lot to me if you did give it a go.
I have implemented PEXT yesterday evening in a version that computes the lookup tables at startup just like your's. I have to say I like the algorithm a lot more now that I understand it, a pitty it's not well supported in the wild. Also I'm going to download a new Visual Studio version and install the .Net 8 preview. So in the end we'll be able to compare your algorithm with Leorik's baseline and also PEXT Bitboards. But it'll take some time mostly because to make any credible statements in the <15 Elo range will take thousands of test games.

Mike Sherwin wrote: ↑Wed Mar 29, 2023 5:17 pm Well you can shorten the name to KSS bitboards if you like.
How about KiSS? I'm a strong proponent of the KISS principle in software development and also large parts of my teenager brain is still operational and giggles at the thought of Sissy kissing in Kindergarten!

I can make KISS work. Kindergarten Indexing Sub Set bitboards. Though Kindergarten Super Split Index Sub Set Yielding bitboards is more descriptive and does cause giggles.

Sounds good, thanks, and I can't wait to get some comparative results. But if PEXT smashes KISS I won't be too upset.
Afterall I did invent PEXT for chess in 2006.

Early PEXT/PDEP Proposal
In late 2006, Michael Sherwin already proposed a PEXTPDEP instruction, Parallel Bits Extract controlled by a source mask followed by Parallel Bits Deposit controlled by a destination mask [20] [21]. However, it is not known whether his proposal was recognized by Intel engineers and had any influence on the design of the BMI2 PEXT and PDEP instructions.

Sorry couldn't help myself.

Devlog of Leorik

Re: Devlog of Leorik

Re: Devlog of Leorik

Re: Devlog of Leorik

Re: Devlog of Leorik

Re: Devlog of Leorik

Re: Devlog of Leorik

Re: Devlog of Leorik

Re: Devlog of Leorik

Re: Devlog of Leorik

Re: Devlog of Leorik