Devlog of Leorik

Discussion of chess software programming and technical issues.

Moderator: Ras

algerbrex
Posts: 608
Joined: Sun May 30, 2021 5:03 am
Location: United States
Full name: Christian Dean

Re: *New* Version 2.2

Post by algerbrex »

lithander wrote: Sun Jul 17, 2022 11:32 pm I think that even without an explicit king safety evaluation the engine can keep the king reasonably safe through other means. E.g. the PSQTs reward castling and an intact pawn shield in the midgame. Or the mobility eval deducts a few CP for a too mobile king. But most importantly all short-term king-safety issues should be uncovered by the search. Or at least that's what I'm telling myself because I'm pretty sure I'm not going to touch that cursed topic again anytime soon! ;)
True, if Leorik has some of those features I can see it doing fine. And Blunder’s king safety isn’t great either. But for stronger engines I think it’s quite easy to tell the difference between little to no king safety and good king safety, and I think good king safety does start to make a big difference. But maybe at our level not so much.
Mike Sherwin
Posts: 965
Joined: Fri Aug 21, 2020 1:25 am
Location: Planet Earth, Sol system
Full name: Michael J Sherwin

Re: *New* Version 2.2

Post by Mike Sherwin »

lithander wrote: Sat Jul 16, 2022 12:37 pm I've just released a "minor" new version that adds a mobility term to the evaluation along with improved time-control logic and a new TT replacement scheme better suitable for long matches.
https://github.com/lithander/Leorik/releases/tag/2.2

A gauntlet I played on 40/20 timecontrols (~500ms per move, so quite fast) looks promising:


   # PLAYER           :  RATING  POINTS  PLAYED   (%)
   1 odonata-0.6.2    :  2737.0   184.5     360    51
   2 Leorik 2.2       :  2711.2  1121.5    2148    52
   3 Ceibo_0.8        :  2702.0   176.5     358    49
   4 dumb-1.9         :  2698.0   177.5     361    49
   5 Inanis-1.0.1     :  2690.0   174.0     360    48
   6 blunder-8.0.0    :  2690.0   161.0     361    45
   7 zevra-2.5        :  2655.0   153.0     348    44
2700 Elo in the CCRL lists would be a great result considering the set of changes I made. (2.1 is listed at 2583 and 2602) But I'm a bit disappointed about all the things I worked on that *didn't* make it into this release. I have spent a lot of time trying to get King Safety to work. I also had some Bishop-specific evaluation implemented that gave a bonus for having a bishop-pair and otherwise evaluated the placement of pieces with regard to the color of the remaining single bishop. It looked quite promising but after thorough testing I felt like it was not improving playing strength as much as it should based on the reduction of MSE I saw while tuning. So I feared it was causing problems in some situations while improving the eval on average only... this is a very dangerous thing to do long term and so I decided to release a simple but solid version 2.2 and shelve the rest. :cry: Well.. wouldn't be fun if things were always easy! :)
Mike Sherwin wrote: Wed Jun 01, 2022 10:27 pm Big improvement in playing style! Much more human like. Needs pawn storm code. This was a very interesting game!! :)
I'd be very happy if you'd try the new version. I'm very curious what your verdict will be and if you can still beat it! :)
I will tomorrow if my headache is better. I joined LiChess today and I played against SF levels 1 thru 5 at 5'+5". I won all the games but blundered an exchange in game 5 before managing to win. 5'+5" is a bit too fast for me because I have not practiced speed chess for 49 years. So against level 6 SF I wanted to play 15'+10" but after setting that time limit I found myself in a game against a human player. Oh no, I didn't want to play a human yet! I get nervous when I play humans and my level of play drops precipitously. I made 8 inaccuracies and two blunders. :( But I did not make any mistakes. :D This is my first rated game in ~40 years.
Modern Times
Posts: 3715
Joined: Thu Jun 07, 2012 11:02 pm

Re: Devlog of Leorik

Post by Modern Times »

It slipped a little bit, but this is still an impressive gain !! Maybe the blitz testers will get the few extra Elo.


CCRL 40/15 main list:
Leorik 2.2 64-bit is #155 with rating of 2695 Elo points (+26 -26),
based on 468 games: 155 wins, 149 losses and 164 draws
Score: 50.6%, Average opponent: −6.6, Draws: 35.0%

Pairwise results:
     Opponent                              Elo     Score                  LOS   Perf
 - Arminius 2018-12-23 64-bit              2735  22.5-29.5  (+15-22=15)    0.4    -4
 - Dumb 1.9 64-bit                         2724  23.0-29.0  (+12-18=22)    5.2    -6
 - Shield 2.1 64-bit                       2717  24.5-27.5  (+13-16=23)    7.3    +6
 - Blunder 8.0.0 64-bit                    2715  24.0-28.0  (+13-17=22)   24.5    -2
 - KnightX 3.2 64-bit                      2698  24.0-28.0  (+13-17=22)   43.4   -23
 - Daydreamer 1.75 64-bit                  2681  24.5-27.5  (+14-17=21)   82.1   -30
 - Supernova 2.4 64-bit                    2667  30.5-21.5  (+23-14=15)   95.7   +32
 - Movei 00.8.438                          2646  29.0-23.0  (+22-16=14)   99.9    -8
 - Pharaon 3.5.1                           2613  35.0-17.0  (+30-12=10)  100.0   +47
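
The summary line of this listing can be cross-checked against the pairwise records. The sketch below is only an illustration: it assumes the "(+W-L=D)" notation means wins, losses and draws from Leorik's point of view, and the regular expression and variable names are hypothetical helpers, not anything CCRL provides.

```python
import re

# Matches the "(+W-L=D)" record used in the pairwise listing above.
RECORD = re.compile(r"\(\+(\d+)-(\d+)=(\d+)\)")

# Opponent name and record, copied from the listing above.
pairwise = [
    "Arminius 2018-12-23 64-bit  (+15-22=15)",
    "Dumb 1.9 64-bit             (+12-18=22)",
    "Shield 2.1 64-bit           (+13-16=23)",
    "Blunder 8.0.0 64-bit        (+13-17=22)",
    "KnightX 3.2 64-bit          (+13-17=22)",
    "Daydreamer 1.75 64-bit      (+14-17=21)",
    "Supernova 2.4 64-bit        (+23-14=15)",
    "Movei 00.8.438              (+22-16=14)",
    "Pharaon 3.5.1               (+30-12=10)",
]

wins = losses = draws = 0
for line in pairwise:
    w, l, d = map(int, RECORD.search(line).groups())
    wins, losses, draws = wins + w, losses + l, draws + d

games = wins + losses + draws
score = (wins + 0.5 * draws) / games   # a draw counts as half a point
draw_ratio = draws / games

print(f"{games} games: {wins} wins, {losses} losses, {draws} draws")
print(f"Score: {100 * score:.1f}%, Draws: {100 * draw_ratio:.1f}%")
```

The totals agree with the "155 wins, 149 losses and 164 draws" line, so the individual records are at least internally consistent.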
algerbrex
Posts: 608
Joined: Sun May 30, 2021 5:03 am
Location: United States
Full name: Christian Dean

Re: *New* Version 2.2

Post by algerbrex »

algerbrex wrote: Mon Jul 18, 2022 12:04 am ...
First gauntlet I ran overnight between Leorik 2.2, and Blunder 7.6.0, 8.0.0, and 8.4.5, the latest dev version, with a time control of 5+0.5s:


Rank Name                          Elo     +/-   Games   Score    Draw
   0 Leorik 2.2                    110      12    2400   65.3%   27.8%
   1 Blunder 8.4.5                 -76      20     800   39.2%   29.9%
   2 Blunder 8.0.0                -115      21     800   34.0%   28.0%
   3 Blunder 7.6.0                -139      22     800   31.0%   25.5%

Finished match
This seems to indicate the issue is likely the short time control and not a regression, since though all three fared quite badly against Leorik, 8.4.5 still performed the best.
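
For readers who want to check the Elo column in the table above: the thread does not say how the testing tool derives it, so the following is only a sketch that assumes the standard logistic Elo model. The function name `elo_from_score` is made up for this example.

```python
import math

def elo_from_score(score: float) -> float:
    """Elo difference implied by a score fraction in (0, 1), logistic model."""
    return -400.0 * math.log10(1.0 / score - 1.0)

# Score fractions copied from the gauntlet table above.
gauntlet = {
    "Leorik 2.2": 0.653,
    "Blunder 8.4.5": 0.392,
    "Blunder 8.0.0": 0.340,
    "Blunder 7.6.0": 0.310,
}

for name, score in gauntlet.items():
    print(f"{name:15s} {elo_from_score(score):+7.1f}")
```

Rounded to whole numbers, these values match the Elo column, which suggests the table reports each engine's performance relative to the opposition it faced in this gauntlet rather than an absolute rating.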

I'm going to construct another gauntlet of engines, 4 engines that I think are weaker and 4 stronger than Blunder 8.0.0, and then re-run the same gauntlet against 8.4.5. This should more definitively answer the question.

From my preliminary testing, I think Blunder appears in general considerably weaker than other engines rated around 2700 on the CCRL, probably because it just can't compete in speed at certain time controls.
lithander
Posts: 915
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: *New* Version 2.2

Post by lithander »

algerbrex wrote: Mon Jul 18, 2022 12:33 pm This seems to indicate the issue is likely the short time control and not a regression, since though all three fared quite badly against Leorik, 8.4.5 still performed the best.
Good, that there's no regression! That's a relief I'm sure. But I think there's still something strange going on... I mean 5s + 0.5s increment is not *that* fast on a modern computer. In my own tests at comparable time controls Blunder did better. 360 games at 40/20 time control ended with a score of 45% and in 224 games at a 40/60 time control Blunder had a score of 47%.

Taking the CCRL rating list into account and assuming a ~2700 rating for Leorik my results show the expected difference between our engines despite a rather low number of matches played. What opening book are you using?


[tc=40/60 Hash=32]
   # PLAYER           :  RATING  POINTS  PLAYED   (%)
   1 odonata-0.6.2    :  2737.0   127.0     223    57
   2 Ceibo_0.8        :  2702.0   113.5     223    51
   3 dumb-1.9         :  2698.0   109.0     224    49
   4 Leorik 2.2       :  2695.6   671.0    1341    50
   5 Inanis-1.0.1     :  2690.0   114.0     224    51
   6 blunder-8.0.0    :  2690.0   105.5     224    47
   7 zevra-2.5        :  2655.0   101.0     223    45

[tc=40/20 Hash=32]
   # PLAYER           :  RATING  POINTS  PLAYED   (%)
   1 odonata-0.6.2    :  2737.0   184.5     360    51
   2 Leorik 2.2       :  2711.2  1121.5    2148    52
   3 Ceibo_0.8        :  2702.0   176.5     358    49
   4 dumb-1.9         :  2698.0   177.5     361    49
   5 Inanis-1.0.1     :  2690.0   174.0     360    48
   6 blunder-8.0.0    :  2690.0   161.0     361    45
   7 zevra-2.5        :  2655.0   153.0     348    44
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
algerbrex
Posts: 608
Joined: Sun May 30, 2021 5:03 am
Location: United States
Full name: Christian Dean

Re: *New* Version 2.2

Post by algerbrex »

lithander wrote: Mon Jul 18, 2022 6:12 pm Good, that there's no regression! That's a relief I'm sure. But I think there's still something strange going on... I mean 5s + 0.5s increment is not *that* fast on a modern computer. In my own tests at comparable time controls Blunder did better. 360 games at 40/20 time control ended with a score of 45% and in 224 games at a 40/60 time control Blunder had a score of 47%.
Right, I'm happy my work hasn't regressed Blunder's strength, but as you said, I still find it incredibly odd how much weaker Blunder is at 5s+0.5s than Leorik, even the dev versions. I suspected Blunder was weaker at shorter time controls, so I expected some gap between it and Leorik 2.2. But to be that much weaker at 5+0.5s is still very strange to me.

Even running it against other engines which should be around Blunder's rating shows Blunder 8.0.0 getting crushed:


Rank Name                          Elo     +/-   Games   Score    Draw
   0 Blunder 8.0.0                 -46      11    3200   43.4%   21.8%
   1 Leorik 2.2                    131      31     400   68.0%   23.0%
   2 Nebula 2.0                    103      32     400   64.4%   16.3%
   3 Cheese 1.8 64 bits            101      31     400   64.1%   23.3%
   4 dumb 1.9                       99      31     400   63.9%   19.8%
   5 Zahak 5.0                      69      28     400   59.8%   31.5%
   6 Velvet v1.1.0                  59      31     400   58.4%   19.3%
   7 Rodin v8.00                   -54      31     400   42.3%   19.5%
   8 admete 1.5.0                 -132      32     400   31.9%   21.8%

Finished match

   # PLAYER                : RATING    POINTS  PLAYED    (%)
   1 Leorik 2.2            : 2790.1     272.0     400   68.0%
   2 Nebula 2.0            : 2761.7     257.5     400   64.4%
   3 Cheese 1.8 64 bits    : 2759.8     256.5     400   64.1%
   4 dumb 1.9              : 2757.9     255.5     400   63.9%
   5 Zahak 5.0             : 2727.2     239.0     400   59.8%
   6 Velvet v1.1.0         : 2717.3     233.5     400   58.4%
   7 Blunder 8.0.0         : 2658.0    1389.5    3200   43.4%
   8 Rodin v8.00           : 2603.2     169.0     400   42.3%
   9 admete 1.5.0          : 2524.9     127.5     400   31.9%
Admittedly, this was run at 10s+0.1s, but I've never seen Blunder perform so badly at this time control against similarly rated opponents. Nebula 2.0 in particular is rated 2654 on the CCRL Blitz list and Blunder 2674, but in the above test Nebula is crushing Blunder.

For reference, I run all my testing on my laptop, which has a Ryzen 7 4700U, 8 cores, 8 GB of RAM.
lithander wrote: Mon Jul 18, 2022 6:12 pm Taking the CCRL rating list into account and assuming a ~2700 rating for Leorik my results show the expected difference between our engines despite a rather low number of matches played. What opening book are you using?
I actually haven't even thought to check the opening book. I've been using the 2moves_v2 opening book for months now: I'm embarrassed to admit it, but I forgot why I even started using this opening book and have never thought to change it. Which one are you using?
lithander
Posts: 915
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: *New* Version 2.2

Post by lithander »

algerbrex wrote: Mon Jul 18, 2022 7:04 pm I actually haven't even thought to check the opening book. I've been using the 2moves_v2 opening book for months now: I'm embarrassed to admit it, but I forgot why I even started using this opening book and have never thought to change it. Which one are you using?
I use the varied.bin that came with SCID and you can find it here: https://sourceforge.net/p/scid/code/ci/ ... ree/books/

I don't know much about opening books but I was wondering if your book was maybe too small or otherwise not good at representing what the engines would face under tournament/testing conditions. In the extreme case no book is used at all and two equal-strength engines basically replay the same game over and over. If one engine happens to win this game from the starting position then it would win all the time and it would look like this engine is much stronger. So I was just thinking maybe it's something to do with the openings...
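
The extreme case described above can be illustrated with a toy model. Everything in this sketch is hypothetical: `play` is a stand-in for a game between two fully deterministic engines, where the result depends only on the opening and the colour assignment. Real engines are rarely that deterministic under a clock, so this only shows the mechanism and says nothing about the actual gauntlets.

```python
import random

def play(opening: int, a_is_white: bool) -> float:
    """Hypothetical deterministic game: the same inputs always give the same result.

    Returns the points scored by engine A (1.0 win, 0.5 draw, 0.0 loss).
    """
    rng = random.Random(f"{opening}:{a_is_white}")  # a string seed is reproducible
    return rng.choice([1.0, 0.5, 0.0])

def run_match(book_size: int, games: int):
    """Play `games` games cycling through the book, each opening with both colours."""
    total = 0.0
    distinct = set()
    for g in range(games):
        opening = (g // 2) % book_size   # two consecutive games share an opening
        a_is_white = g % 2 == 0          # colours are swapped for the second game
        distinct.add((opening, a_is_white))
        total += play(opening, a_is_white)
    return total / games, len(distinct)

for size in (1, 50, 5000):
    score, distinct = run_match(size, games=1000)
    print(f"book size {size:4d}: score {score:.3f} from {distinct} distinct games")
```

With a single starting position, 1000 games contain only two distinct games, so the match score reflects two results repeated 500 times each rather than 1000 independent samples.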

I just ran a match at the same time control you used, 5s + 500ms increment, with Hash set to 50MB on an i7-9700K with 7 games in parallel, and got this result, which again seems more in line with expectations than your gauntlet results.


Score of Leorik-2.2 vs blunder-8.0.0: 571 - 347 - 502  [0.579] 1420
...      Leorik-2.2 playing White: 313 - 153 - 244  [0.613] 710
...      Leorik-2.2 playing Black: 258 - 194 - 258  [0.545] 710
...      White vs Black: 507 - 411 - 502  [0.534] 1420
Elo difference: 55.3 +/- 14.6, LOS: 100.0 %, DrawRatio: 35.4 %
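
The last line of that output can be reproduced from the win/loss/draw counts alone. The post does not name the tool or its formulas, so this is a sketch under assumptions: a logistic Elo model, a normal approximation for the 95% error margin, and the usual error-function approximation for LOS. The names `elo_from_score` and `match_stats` are made up for this example.

```python
import math
from statistics import NormalDist

def elo_from_score(score: float) -> float:
    """Elo difference implied by a score fraction in (0, 1), logistic model."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def match_stats(wins: int, losses: int, draws: int, confidence: float = 0.95):
    """Return (elo, error margin, LOS, draw ratio) for one engine's match record."""
    n = wins + losses + draws
    mu = (wins + 0.5 * draws) / n
    # Variance of a single game result (1, 0.5 or 0 points) around the mean score.
    var = (wins * (1.0 - mu) ** 2
           + draws * (0.5 - mu) ** 2
           + losses * (0.0 - mu) ** 2) / n
    stderr = math.sqrt(var / n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    # Assumes both interval ends stay inside (0, 1), which holds for this data.
    lo, hi = mu - z * stderr, mu + z * stderr
    margin = (elo_from_score(hi) - elo_from_score(lo)) / 2.0
    # Likelihood of superiority: draws carry no information about who is stronger.
    los = 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))
    return elo_from_score(mu), margin, los, draws / n

elo, margin, los, draw_ratio = match_stats(wins=571, losses=347, draws=502)
print(f"Elo difference: {elo:.1f} +/- {margin:.1f}, "
      f"LOS: {100 * los:.1f} %, DrawRatio: {100 * draw_ratio:.1f} %")
```

For this record the sketch gives the same figures as the output above, which is a reasonable sign that the reported margin is a 95% interval of this kind.
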
Last edited by lithander on Mon Jul 18, 2022 11:34 pm, edited 1 time in total.
Mike Sherwin
Posts: 965
Joined: Fri Aug 21, 2020 1:25 am
Location: Planet Earth, Sol system
Full name: Michael J Sherwin

Re: Devlog of Leorik

Post by Mike Sherwin »

I posted the game already. I know that I did. But it is not showing so here it is again. But without comment because I lack the energy and will to do it all again.
lithander
Posts: 915
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: Devlog of Leorik

Post by lithander »

Mike Sherwin wrote: Mon Jul 18, 2022 11:34 pm I posted the game already. I know that I did. But it is not showing so here it is again. But without comment because I lack the energy and will to do it all again.
Wow, you did it again! :shock: That's amazing. I have no idea how you know when to trade a rook for a bishop but apparently it was exactly what was needed in that situation. Pity you lost the extensive commentary... I would have loved to read that. Did you win first try?
Mike Sherwin
Posts: 965
Joined: Fri Aug 21, 2020 1:25 am
Location: Planet Earth, Sol system
Full name: Michael J Sherwin

Re: Devlog of Leorik

Post by Mike Sherwin »

lithander wrote: Mon Jul 18, 2022 11:44 pm
Mike Sherwin wrote: Mon Jul 18, 2022 11:34 pm I posted the game already. I know that I did. But it is not showing so here it is again. But without comment because I lack the energy and will to do it all again.
Wow, you did it again! :shock: That's amazing. I have no idea how you know when to trade a rook for a bishop but apparently it was exactly what was needed in that situation. Pity you lost the extensive commentary... I would have loved to read that. Did you win first try?
I did not win first try this time because I was just not warmed up and able to focus at first. It was the second try. I sacked the pawn and the exchange because I saw I could make the rest of Leorik's pieces quite useless.