One advantage of playing only 100 games is that I can sit here and watch some of them. Leorik has a serious bug in the late endgame.
I saw numerous games like that one. In one game white had a king and a knight against a king and a pawn. And white refused to capture the pawn because it would be a draw. So instead the pawn became a queen and black won.
My new compile (I fixed the problem you mentioned) had started a 5000 game test, and that is when the above game was played. Imho a bug that happens as often as this one does invalidates even a 5000 game test, so I cancelled the test.
Devlog of Leorik
Moderator: Ras
-
- Posts: 965
- Joined: Fri Aug 21, 2020 1:25 am
- Location: Planet Earth, Sol system
- Full name: Michael J Sherwin
-
- Posts: 915
- Joined: Sun Dec 27, 2020 2:40 am
- Location: Bremen, Germany
- Full name: Thomas Jahn
Re: Devlog of Leorik
I guess this is where the normal evaluation breaks down and cannot accurately reflect how dangerous the pawn is. I should work on it, but these problems are not so obvious to me. Where exactly on its road to demise did it blunder, and what was the best move?
Are there late endgame test suites available? Otherwise I probably need to compile one based on Leorik's selfplay games.
-
- Posts: 1784
- Joined: Wed Jul 03, 2019 4:42 pm
- Location: Netherlands
- Full name: Marcel Vanthoor
Re: Devlog of Leorik
lithander wrote: ↑Fri Apr 07, 2023 10:32 am
I guess this is where the normal evaluation breaks down and cannot accurately reflect how dangerous the pawn is. I should work on it, but these problems are not so obvious to me. Where exactly on its road to demise did it blunder, and what was the best move?
Are there late endgame test suites available? Otherwise I probably need to compile one based on Leorik's selfplay games.
The way to solve this is to add a pawn hash table so you don't have to recheck the pawn structure every time, and then add an evaluation term for passed pawns. These are pawns that cannot be blocked or captured by enemy pawns. This evaluation should become higher as the passed pawn moves up the board, so you'll need 6 weights, from rank 2 to 7.
Rustic has the same problem because it doesn't have such code yet: a pawn is a pawn, so it will happily swap its 6th-rank passed pawn for an unassuming isolated pawn of the opponent somewhere else on the board or, indeed, refuse to even capture a passed pawn and lose material, even though a draw is the best the engine can hope for.
Adding this will gain about 120 Elo (at least, it did in MadChess 3 if I remember correctly).
It is one of the first evaluation terms I'm going to add after the PSQTs. This is the reason why TSCP wins so many games against engines that don't have a TT: these engines often cannot see that the passed pawns TSCP makes are extremely dangerous in the long run. The TT resolves this for lower-end games, but in higher-end games, passed pawns become a strategic weapon, and you shouldn't let your opponent get one if you can help it.
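To make the passed-pawn term described above concrete, here is a minimal sketch in Python. It is not code from Rustic, Leorik or MadChess. It assumes bitboards with a1 = bit 0 and h8 = bit 63, and the names `front_span_mask`, `is_passed`, `passed_pawn_score` and the values in `PASSED_BONUS` are made up for illustration (real weights would have to be tuned). The pawn hash table is left out.

```python
FILE_A = 0x0101010101010101
ALL_SQUARES = 0xFFFFFFFFFFFFFFFF

# Hypothetical, untuned bonuses in centipawns, indexed by relative rank.
# Index 1 is a pawn on its 2nd rank, index 6 a pawn on its 7th rank,
# which gives the six weights mentioned above. Indices 0 and 7 are unused.
PASSED_BONUS = (0, 5, 10, 20, 40, 70, 120, 0)


def front_span_mask(square, white):
    """Squares ahead of `square` on its own file and the adjacent files."""
    file, rank = square & 7, square >> 3
    files = FILE_A << file
    if file > 0:
        files |= FILE_A << (file - 1)
    if file < 7:
        files |= FILE_A << (file + 1)
    if white:
        ahead = (ALL_SQUARES << (8 * (rank + 1))) & ALL_SQUARES
    else:
        ahead = (1 << (8 * rank)) - 1
    return files & ahead


def is_passed(square, white, enemy_pawns):
    """A pawn is passed if no enemy pawn can block or capture it on its way."""
    return (front_span_mask(square, white) & enemy_pawns) == 0


def passed_pawn_score(white_pawns, black_pawns):
    """Sum of passed-pawn bonuses from White's point of view."""
    score = 0
    sides = ((True, white_pawns, black_pawns), (False, black_pawns, white_pawns))
    for white, own, enemy in sides:
        pawns = own
        while pawns:
            square = (pawns & -pawns).bit_length() - 1  # lowest set bit
            pawns &= pawns - 1                          # clear that bit
            if is_passed(square, white, enemy):
                rank = square >> 3
                relative_rank = rank if white else 7 - rank
                bonus = PASSED_BONUS[relative_rank]
                score += bonus if white else -bonus
    return score
```

With these made-up weights, a lone black pawn on c2 (bit 10), as in the position from this thread, scores the maximum bonus for Black, so the static eval alone already treats it as far more dangerous than a pawn further back.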
-
- Posts: 915
- Joined: Sun Dec 27, 2020 2:40 am
- Location: Bremen, Germany
- Full name: Thomas Jahn
Re: Devlog of Leorik
This blunder is too obvious to be explained by an evaluation function that is too coarse, and I already have a pawn structure eval, including a pawn hash table.
As far as I can see, move 90 was where Leorik blundered by playing b7b3.
But if I hand that FEN to the engine directly, I get reasonable output:
Code: Select all
Leorik 2.4 Net8 Classic
position fen 8/1R6/2K5/8/8/2k5/2p5/8 w - - 8 90
go depth 20
info depth 1 score cp -694 nodes 49 nps 24500 time 2 pv b7b3
info depth 2 score cp -738 nodes 98 nps 14000 time 7 pv b7b3 c3b3
info depth 3 score cp 29 nodes 545 nps 68125 time 8 pv b7c7 c2c1b c7d7
info depth 4 score cp -22 nodes 830 nps 92222 time 9 pv b7c7 c2c1q c6d7 c3b2
info depth 5 score cp 88 nodes 1821 nps 202333 time 9 pv b7c7 c2c1q c6d7 c3b2 c7c1
info depth 6 score cp 18 nodes 2493 nps 249300 time 10 pv b7c7 c2c1q c6d7 c3b2 c7c1 b2c1
info depth 7 score cp 44 nodes 3585 nps 325909 time 11 pv b7c7 c2c1q c6d7 c3b2 c7c1 b2c1 d7e6
info depth 8 score cp 26 nodes 4444 nps 370333 time 12 pv b7c7 c2c1q c6d7 c3b2 c7c1 b2c1 d7e6 c1d2
info depth 9 score cp 26 nodes 5415 nps 416538 time 13 pv b7c7 c2c1q c6d7 c3b2 c7c1 b2c1 d7e7 c1d2 e7e6
info depth 10 score cp 0 nodes 6500 nps 464285 time 14 pv b7c7 c2c1q c6d7 c3b2 c7c1 b2c1 d7e7 c1d2 e7e6 d2e1
info depth 11 score cp 0 nodes 8844 nps 552750 time 16 pv b7c7 c2c1q c6d7 c3b2 c7c1 b2c1 d7e7 c1d2 e7d6 d2e1 d6e6
info depth 12 score cp 0 nodes 10743 nps 631941 time 17 pv b7c7 c2c1q c6d7 c3b2 c7c1 b2c1 d7e7 c1d2 e7d6 d2e2 d6e6 e2e1
info depth 13 score cp 0 nodes 18418 nps 837181 time 22 pv b7c7 c2c1q c6d7 c3b2 c7c1 b2c1 d7e7 c1d2 e7d6 d2e2 d6e7 e2e1 e7e6
info depth 14 score cp 0 nodes 24858 nps 920666 time 27 pv b7c7 c2c1q c6d7 c3b2 c7c1 b2c1 d7e7 c1d2 e7d6 d2e2 d6e7 e2d2 e7d
Btw... the FEN-tag seems to be broken for me. E.g. [fen]8/1R6/2K5/8/8/2k5/2p5/8 w - - 8 90[/fen]
-
- Posts: 1784
- Joined: Wed Jul 03, 2019 4:42 pm
- Location: Netherlands
- Full name: Marcel Vanthoor
Re: Devlog of Leorik
OK; I didn't know that.
lithander wrote:
As far as I can see move 90 was where Leorik blunders by playing b7b3.
But if I hand that FEN to the engine directly I get a reasonable output:
As b7b3 was also one of the first moves at low depth, could it be that Leorik didn't spend enough time on that move for whatever reason? It seems impossible, because even the current version of Rustic immediately sees the solution (in 5 milliseconds) to draw this position. (It moves the king out of the way so it can check with the rook and then capture the queen after promotion.)
Last edited by mvanthoor on Fri Apr 07, 2023 2:04 pm, edited 2 times in total.
-
- Posts: 253
- Joined: Mon Aug 26, 2019 4:34 pm
- Location: Clearwater, Florida USA
- Full name: JoAnn Peeler
Re: Devlog of Leorik
I had a problem that looked a lot like this, and it turned out to be a bug in my staged move generation. This bug caused the pawn promotion not to be made whenever it was in my killers. Since the move was never made, the search never saw the increase in material from promoting to a queen and happily ignored it. However, if I fed the FEN string directly into my test suite, it wouldn't make the same blunder because the move wasn't yet in my killers. I also uncovered a bug in my IsValidMove method, which I use to test killers before blindly making the move, although I don't recall exactly what it was.
This turned out to be a HUGE find for me and increased my Elo a huge amount once I fixed it, so I doubt this is Leorik's problem.
-
- Posts: 2697
- Joined: Tue Aug 30, 2016 8:19 pm
- Full name: Rasmus Althoff
Re: Devlog of Leorik
That's because the script is loaded from HGM's server, but this has only a self-signed https certificate which the browser ofc doesn't accept. That means, if Talkchess itself is loaded via https, the browser will block this script.
I don't put promotions in the killer slots to begin with because they are not quiet moves.
Rasmus Althoff
https://www.ct800.net
-
- Posts: 253
- Joined: Mon Aug 26, 2019 4:34 pm
- Location: Clearwater, Florida USA
- Full name: JoAnn Peeler
Re: Devlog of Leorik
Yep, that was part of my fix. Previously I was only excluding captures from my killers.
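For readers following along, the idea of keeping both captures and promotions out of the killers could look roughly like the sketch below. It is only an illustration in Python, not my engine's code or anyone else's: `Move`, `KillerTable`, the two-slots-per-ply layout and the square strings are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Move:
    from_sq: str
    to_sq: str
    captured: Optional[str] = None   # captured piece letter, if any
    promotion: Optional[str] = None  # promotion piece letter, if any

    def is_quiet(self) -> bool:
        # Quiet means neither a capture nor a promotion.
        return self.captured is None and self.promotion is None


class KillerTable:
    """Per-ply killer slots that only ever hold quiet moves."""

    def __init__(self, max_ply: int = 64, slots: int = 2):
        self.slots = slots
        self.table = [[] for _ in range(max_ply)]

    def store(self, ply: int, move: Move) -> bool:
        """Remember `move` as a killer at `ply`; reject non-quiet moves."""
        if not move.is_quiet():
            return False
        killers = self.table[ply]
        if move in killers:
            killers.remove(move)
        killers.insert(0, move)    # most recent killer first
        del killers[self.slots:]   # drop the oldest beyond the slot limit
        return True

    def get(self, ply: int) -> list:
        return list(self.table[ply])
```

Because promotions never enter the table, they can only come from the stage that generates captures and promotions, so a staged move generator cannot skip them by mistaking them for an already-played killer.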
-
- Posts: 965
- Joined: Fri Aug 21, 2020 1:25 am
- Location: Planet Earth, Sol system
- Full name: Michael J Sherwin
Re: Devlog of Leorik
It might be a TT bug, storing the wrong score.
-
- Posts: 965
- Joined: Fri Aug 21, 2020 1:25 am
- Location: Planet Earth, Sol system
- Full name: Michael J Sherwin
Re: Devlog of Leorik
It could be that my downloaded copy is corrupt. If it is, that would be a first.