True, if Leorik has some of those features I can see it doing fine. And Blunder’s king safety isn’t great either. But for stronger engines I think the difference between little to no king safety and good king safety is quite clear, and good king safety does start to make a big difference. But maybe at our level not so much.
lithander wrote: ↑Sun Jul 17, 2022 11:32 pm
I think that even without an explicit king safety evaluation the engine can keep the king reasonably safe through other means. E.g. the PSQTs reward castling and an intact pawn shield in the midgame. Or the mobility eval deducts a few CP for a too-mobile king. But most importantly, all short-term king-safety issues should be uncovered by the search. Or at least that's what I'm telling myself, because I'm pretty sure I'm not going to touch that cursed topic again anytime soon!
Devlog of Leorik
Moderator: Ras
-
- Posts: 608
- Joined: Sun May 30, 2021 5:03 am
- Location: United States
- Full name: Christian Dean
Re: *New* Version 2.2
-
- Posts: 965
- Joined: Fri Aug 21, 2020 1:25 am
- Location: Planet Earth, Sol system
- Full name: Michael J Sherwin
Re: *New* Version 2.2
I will tomorrow if my headache is better. I joined Lichess today and played against SF levels 1 through 5 at 5'+5". I won all the games but blundered an exchange in game 5 before managing to win. 5'+5" is a bit too fast for me because I have not practiced speed chess for 49 years. So against level 6 SF I wanted to play 15'+10", but after setting that time limit I found myself in a game against a human player. Oh no, I didn't want to play a human yet! I get nervous when I play humans and my level of play drops precipitously. I made 8 inaccuracies and two blunders.
lithander wrote: ↑Sat Jul 16, 2022 12:37 pm
I've just released a "minor" new version that adds a mobility term to the evaluation along with improved time-control logic and a new TT replacement scheme better suited to long matches.
https://github.com/lithander/Leorik/releases/tag/2.2
A gauntlet I played at 40/20 time controls (~500ms per move, so quite fast) looks promising:
Code: Select all
# PLAYER : RATING POINTS PLAYED (%)
1 odonata-0.6.2 : 2737.0 184.5 360 51
2 Leorik 2.0 : 2711.2 1121.5 2148 52
3 Ceibo_0.8 : 2702.0 176.5 358 49
4 dumb-1.9 : 2698.0 177.5 361 49
5 Inanis-1.0.1 : 2690.0 174.0 360 48
6 blunder-8.0.0 : 2690.0 161.0 361 45
7 zevra-2.5 : 2655.0 153.0 348 44
2700 Elo in the CCRL lists would be a great result considering the set of changes I made. (2.1 is listed at 2583 and 2602.) But I'm a bit disappointed about all the things I worked on that *didn't* make it into this release. I have spent a lot of time on trying to get King Safety to work. I also had some Bishop-specific evaluation implemented that gave a bonus for having a bishop pair and otherwise evaluated the placement of pieces with regard to the color of the remaining single bishop. It looked quite promising, but after thorough testing I felt like it was not improving playing strength as much as it should based on the reduction of MSE I saw while tuning. So I feared it was causing problems in some situations while improving the eval on average only... this is a very dangerous thing to do long term, and so I decided to release a simple but solid version 2.2 and shelve the rest.
Well... it wouldn't be fun if things were always easy!
I'd be very happy if you'd try the new version. I'm very curious what your verdict will be and if you can still beat it!
Mike Sherwin wrote: ↑Wed Jun 01, 2022 10:27 pm
Big improvement in playing style! Much more human like. Needs pawn storm code. This was a very interesting game!!
-
- Posts: 3715
- Joined: Thu Jun 07, 2012 11:02 pm
Re: Devlog of Leorik
It slipped a little bit, but this is still an impressive gain!! Maybe the blitz testers will get the few extra Elo.
Code: Select all
CCRL 40/15 main list:
Leorik 2.2 64-bit is #155 with rating of 2695 Elo points (+26 -26),
based on 468 games: 155 wins, 149 losses and 164 draws
Score: 50.6%, Average opponent: −6.6, Draws: 35.0%
Pairwise results:
Opponent Elo Score LOS Perf
- Arminius 2018-12-23 64-bit 2735 22.5-29.5 (+15-22=15) 0.4 -4
- Dumb 1.9 64-bit 2724 23.0-29.0 (+12-18=22) 5.2 -6
- Shield 2.1 64-bit 2717 24.5-27.5 (+13-16=23) 7.3 +6
- Blunder 8.0.0 64-bit 2715 24.0-28.0 (+13-17=22) 24.5 -2
- KnightX 3.2 64-bit 2698 24.0-28.0 (+13-17=22) 43.4 -23
- Daydreamer 1.75 64-bit 2681 24.5-27.5 (+14-17=21) 82.1 -30
- Supernova 2.4 64-bit 2667 30.5-21.5 (+23-14=15) 95.7 +32
- Movei 00.8.438 2646 29.0-23.0 (+22-16=14) 99.9 -8
- Pharaon 3.5.1 2613 35.0-17.0 (+30-12=10) 100.0 +47
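As a quick sanity check on the summary lines above, the score and draw percentages follow directly from the win/loss/draw counts when a draw is counted as half a point. A minimal sketch in Python (the function name `summarize` is only for illustration and is not part of any rating tool):

```python
def summarize(wins: int, losses: int, draws: int) -> tuple[float, float]:
    """Return (score %, draw %) for a set of game results."""
    games = wins + losses + draws
    points = wins + 0.5 * draws  # a draw is worth half a point
    return 100.0 * points / games, 100.0 * draws / games


# Counts taken from the CCRL summary: 155 wins, 149 losses, 164 draws.
score_pct, draw_pct = summarize(wins=155, losses=149, draws=164)
print(f"Score: {score_pct:.1f}%, Draws: {draw_pct:.1f}%")
```

With the counts from the list this gives the same 50.6% score and 35.0% draw rate that the summary reports.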
-
- Posts: 608
- Joined: Sun May 30, 2021 5:03 am
- Location: United States
- Full name: Christian Dean
Re: *New* Version 2.2
First gauntlet I ran overnight between Leorik 2.2, and Blunder 7.6.0, 8.0.0, and 8.4.5, the latest dev version, with a time control of 5+0.5s:
Code: Select all
Rank Name Elo +/- Games Score Draw
0 Leorik 2.2 110 12 2400 65.3% 27.8%
1 Blunder 8.4.5 -76 20 800 39.2% 29.9%
2 Blunder 8.0.0 -115 21 800 34.0% 28.0%
3 Blunder 7.6.0 -139 22 800 31.0% 25.5%
Finished match
I'm going to construct another gauntlet of engines, 4 engines that I think are weaker and 4 stronger than Blunder 8.0.0, and then re-run the same gauntlet against 8.4.5. This should more definitively answer the question.
From my preliminary testing, Blunder appears to be quite a bit weaker in general than other engines rated around 2700 on the CCRL, probably because it can't compete on speed at certain time controls.
-
- Posts: 915
- Joined: Sun Dec 27, 2020 2:40 am
- Location: Bremen, Germany
- Full name: Thomas Jahn
Re: *New* Version 2.2
Good that there's no regression! That's a relief, I'm sure. But I think there's still something strange going on... I mean, 5s + 0.5s increment is not *that* fast on a modern computer. In my own tests at comparable time controls Blunder did better: 360 games at 40/20 time control ended with a score of 45%, and in 224 games at a 40/60 time control Blunder had a score of 47%.
Taking the CCRL rating list into account and assuming a ~2700 rating for Leorik my results show the expected difference between our engines despite a rather low number of matches played. What opening book are you using?
Code: Select all
[tc=40/60 Hash=32]
# PLAYER : RATING POINTS PLAYED (%)
1 odonata-0.6.2 : 2737.0 127.0 223 57
2 Ceibo_0.8 : 2702.0 113.5 223 51
3 dumb-1.9 : 2698.0 109.0 224 49
4 Leorik 2.2 : 2695.6 671.0 1341 50
5 Inanis-1.0.1 : 2690.0 114.0 224 51
6 blunder-8.0.0 : 2690.0 105.5 224 47
7 zevra-2.5 : 2655.0 101.0 223 45
[tc=40/20 Hash=32]
# PLAYER : RATING POINTS PLAYED (%)
1 odonata-0.6.2 : 2737.0 184.5 360 51
2 Leorik 2.2 : 2711.2 1121.5 2148 52
3 Ceibo_0.8 : 2702.0 176.5 358 49
4 dumb-1.9 : 2698.0 177.5 361 49
5 Inanis-1.0.1 : 2690.0 174.0 360 48
6 blunder-8.0.0 : 2690.0 161.0 361 45
7 zevra-2.5 : 2655.0 153.0 348 44
-
- Posts: 608
- Joined: Sun May 30, 2021 5:03 am
- Location: United States
- Full name: Christian Dean
Re: *New* Version 2.2
Right, I'm happy my work hasn't regressed Blunder's strength, but as you said, I still find it incredibly odd how much weaker Blunder is at 5s+0.5s than Leorik, even the dev versions. I suspected Blunder was weaker at shorter time controls, so I expected some gap between it and Leorik 2.2. But to be that much weaker at 5+0.5s is still very strange to me.
lithander wrote: ↑Mon Jul 18, 2022 6:12 pm
Good that there's no regression! That's a relief, I'm sure. But I think there's still something strange going on... I mean, 5s + 0.5s increment is not *that* fast on a modern computer. In my own tests at comparable time controls Blunder did better: 360 games at 40/20 time control ended with a score of 45%, and in 224 games at a 40/60 time control Blunder had a score of 47%.
Even running it against other engines which should be around Blunder's rating shows Blunder 8.0.0 getting crushed:
Code: Select all
Rank Name Elo +/- Games Score Draw
0 Blunder 8.0.0 -46 11 3200 43.4% 21.8%
1 Leorik 2.2 131 31 400 68.0% 23.0%
2 Nebula 2.0 103 32 400 64.4% 16.3%
3 Cheese 1.8 64 bits 101 31 400 64.1% 23.3%
4 dumb 1.9 99 31 400 63.9% 19.8%
5 Zahak 5.0 69 28 400 59.8% 31.5%
6 Velvet v1.1.0 59 31 400 58.4% 19.3%
7 Rodin v8.00 -54 31 400 42.3% 19.5%
8 admete 1.5.0 -132 32 400 31.9% 21.8%
Finished match
# PLAYER : RATING POINTS PLAYED (%)
1 Leorik 2.2 : 2790.1 272.0 400 68.0%
2 Nebula 2.0 : 2761.7 257.5 400 64.4%
3 Cheese 1.8 64 bits : 2759.8 256.5 400 64.1%
4 dumb 1.9 : 2757.9 255.5 400 63.9%
5 Zahak 5.0 : 2727.2 239.0 400 59.8%
6 Velvet v1.1.0 : 2717.3 233.5 400 58.4%
7 Blunder 8.0.0 : 2658.0 1389.5 3200 43.4%
8 Rodin v8.00 : 2603.2 169.0 400 42.3%
9 admete 1.5.0 : 2524.9 127.5 400 31.9%
For reference, I run all my testing on my laptop, which has a Ryzen 7 4700U (8 cores) and 8 GB of RAM.
I actually haven't even thought to check the opening book. I've been using the 2moves_v2 opening book for months now. I'm embarrassed to admit it, but I forgot why I even started using this opening book and have never thought to change it. Which one are you using?
lithander wrote: ↑Mon Jul 18, 2022 6:12 pm
Taking the CCRL rating list into account and assuming a ~2700 rating for Leorik my results show the expected difference between our engines despite a rather low number of matches played. What opening book are you using?
-
- Posts: 915
- Joined: Sun Dec 27, 2020 2:40 am
- Location: Bremen, Germany
- Full name: Thomas Jahn
Re: *New* Version 2.2
I use the varied.bin that came with SCID and you can find it here: https://sourceforge.net/p/scid/code/ci/ ... ree/books/
algerbrex wrote: ↑Mon Jul 18, 2022 7:04 pm
I actually haven't even thought to check the opening book. I've been using the 2moves_v2 opening book for months now: I'm embarrassed to admit it, but I forgot why I even started using this opening book and have never thought to change it. Which one are you using?
I don't know much about opening books, but I was wondering if your book was maybe too small or otherwise not good at representing what the engines would face under tournament/testing conditions. In the extreme case no book is used at all and two equal-strength engines basically replay the same game over and over. If one engine happens to win this game from the starting position then it would win all the time, and it would look like this engine is much stronger. So I was just thinking maybe it has something to do with the openings...
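The point about replayed games can be illustrated with a toy model. The sketch below is purely hypothetical: it plays no chess, and it assumes that two deterministic engines always produce the same result from the same opening, so the "game" is just a fixed outcome derived from an opening id. The names `play_game` and `match_score` are made up for this example.

```python
import random

OUTCOMES = (1.0, 0.5, 0.0)  # win, draw, loss from engine A's point of view


def play_game(opening_id: int) -> float:
    """Toy stand-in for a game between two deterministic engines.

    Assumption: with no randomness in either engine, the result is fully
    determined by the opening, so it is derived from the opening id alone.
    """
    return random.Random(opening_id).choice(OUTCOMES)


def match_score(book: list[int], games: int, seed: int = 0) -> float:
    """Average score for engine A with openings drawn at random from `book`."""
    picker = random.Random(seed)
    return sum(play_game(picker.choice(book)) for _ in range(games)) / games


# A "book" with a single line: the same game is replayed 1000 times.
no_variety = match_score(book=[7], games=1000)
# A book with 1000 different lines: the results average out.
varied = match_score(book=list(range(1000)), games=1000)
print(no_variety, varied)
```

With a single opening the match score can only be 0, 0.5 or 1, no matter how many games are played, while a varied book lets the results average out toward the engines' actual relative strength (50% in this toy setup, where the outcomes are equally likely).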
I just ran a match at the same time controls you used (5s + 500ms increment), with Hash set to 50MB, on an i7-9700K with 7 games in parallel, and got this result, which again seems more in line with expectations than your gauntlet results.
Code: Select all
Score of Leorik-2.2 vs blunder-8.0.0: 571 - 347 - 502 [0.579] 1420
... Leorik-2.2 playing White: 313 - 153 - 244 [0.613] 710
... Leorik-2.2 playing Black: 258 - 194 - 258 [0.545] 710
... White vs Black: 507 - 411 - 502 [0.534] 1420
Elo difference: 55.3 +/- 14.6, LOS: 100.0 %, DrawRatio: 35.4 %
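For readers who want to relate the win/loss/draw counts to the reported Elo difference, here is a rough sketch. It assumes the usual logistic Elo model and a normal approximation for the 95% error margin; whether the match tool computes its figures in exactly this way is an assumption on my part, and the function names are only illustrative.

```python
import math
from statistics import NormalDist


def elo_diff(score: float) -> float:
    """Elo difference implied by a score fraction under the logistic model."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_margin(wins: int, losses: int, draws: int,
                    confidence: float = 0.95) -> tuple[float, float]:
    """Return (Elo difference, +/- margin) from win/loss/draw counts."""
    n = wins + losses + draws
    mu = (wins + 0.5 * draws) / n  # mean score per game
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - mu) ** 2
           + draws * (0.5 - mu) ** 2
           + losses * (0.0 - mu) ** 2) / n
    stderr = math.sqrt(var / n)  # standard error of the mean score
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    low, high = mu - z * stderr, mu + z * stderr
    return elo_diff(mu), (elo_diff(high) - elo_diff(low)) / 2.0


# Counts from the match above: 571 wins, 347 losses, 502 draws for Leorik.
elo, margin = elo_with_margin(wins=571, losses=347, draws=502)
print(f"Elo difference: {elo:.1f} +/- {margin:.1f}")
```

Under these assumptions the result comes out very close to the 55.3 +/- 14.6 shown in the match output.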
Last edited by lithander on Mon Jul 18, 2022 11:34 pm, edited 1 time in total.
-
- Posts: 965
- Joined: Fri Aug 21, 2020 1:25 am
- Location: Planet Earth, Sol system
- Full name: Michael J Sherwin
Re: Devlog of Leorik
I posted the game already. I know that I did. But it is not showing so here it is again. But without comment because I lack the energy and will to do it all again.
-
- Posts: 915
- Joined: Sun Dec 27, 2020 2:40 am
- Location: Bremen, Germany
- Full name: Thomas Jahn
Re: Devlog of Leorik
Wow, you did it again!
Mike Sherwin wrote: ↑Mon Jul 18, 2022 11:34 pm
I posted the game already. I know that I did. But it is not showing so here it is again. But without comment because I lack the energy and will to do it all again.
-
- Posts: 965
- Joined: Fri Aug 21, 2020 1:25 am
- Location: Planet Earth, Sol system
- Full name: Michael J Sherwin
Re: Devlog of Leorik
I did not win first try this time because I was just not warmed up and able to focus at first. It was the second try. I sacked the pawn and the exchange because I saw I could make the rest of Leorik's pieces quite useless.
lithander wrote: ↑Mon Jul 18, 2022 11:44 pm
Wow, you did it again!
Mike Sherwin wrote: ↑Mon Jul 18, 2022 11:34 pm
I posted the game already. I know that I did. But it is not showing so here it is again. But without comment because I lack the energy and will to do it all again.
That's amazing. I have no idea how you know when to trade a rook for a bishop, but apparently it was exactly what was needed in that situation. Pity you lost the extensive commentary... I would have loved to read that. Did you win first try?