I just finished running a very fun game between Leorik (white) and Blunder (black), since I hadn't actually seen the two engines play each other before. Both engines seemed to play pretty well for most of the game, until move 47.
I was curious why 47. Ne5 was such a bad blunder and did a little analysis; it appears to lead to Zugzwang for white, something I had never personally seen before in any game, engine or human:
[fen]8/8/4k3/p3Pp2/5K2/1P6/8/8 w - - 3 50[/fen]
Taking the position and having Leorik analyze it after the game shows it can clearly spot the Zugzwang after only a couple of plies of searching, so I'm not posting this as some sort of bug report; the blunder was likely due to the bullet time control I chose. I just thought it was an interesting example of null-move pruning probably going wrong at lower depths.
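For context on why null-move pruning and Zugzwang clash: the null move assumes that passing the turn is never better than the best real move, and Zugzwang is exactly the case where that assumption fails. Below is a minimal sketch of the usual guard, written against a made-up IBoard interface rather than Leorik's or Blunder's actual code:

[code]
// Minimal sketch of the usual null-move guard, not Leorik's or Blunder's
// actual code. 'IBoard' is a hypothetical interface standing in for an
// engine's board representation.
interface IBoard
{
    bool InCheck();
    bool SideToMoveHasOnlyKingAndPawns();
}

static class NullMovePruning
{
    // A null move assumes that "passing" is never better than the best real
    // move. Zugzwang is exactly the case where passing would be best, so most
    // engines refuse to try the null move when in check and in king-and-pawn
    // endgames, where Zugzwang is common.
    public static bool NullMoveAllowed(IBoard board, int depth)
    {
        return depth >= 3
            && !board.InCheck()
            && !board.SideToMoveHasOnlyKingAndPawns();
    }
}
[/code]

In the diagrammed position white has only king and pawns left, which is exactly the kind of endgame a guard like this is meant for.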
algerbrex wrote: ↑Tue May 31, 2022 7:37 pm
I was curious why 47. Ne5 was such a bad blunder and did a little analysis; it appears to lead to Zugzwang for white, something I had never personally seen before in any game, engine or human:
Uh that's a serious blunder. With Leorik 1.0 (that does no unsafe prunings) the PV switches from f3e5 to the best move e3d3 at depth 5 already. For Leorik 2.1 it takes until depth 16 for the PV to switch to e3d3. But reaching depth 16 takes only 80ms on my machine... How fast were your time control settings exactly?
I hope there's nothing more serious going on here than just a lack of processing time. I might run a few fast games through an analysis engine to identify blunders like that and see how long it takes for the PV to switch to the correct line. That could lead to a valuable set of positions for comparing different versions of the engine. Time-to-depth measures just raw performance, but time-to-bestmove could be a useful metric for approximating playing strength.
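Here's a rough sketch of what such a time-to-bestmove measurement could look like over UCI. The engine path, position and expected move are placeholders to fill in, not data from the actual game:

[code]
// Rough sketch of a "time-to-bestmove" harness over UCI, not an existing
// tool. It measures how long the engine searches before its PV first
// starts with the move we consider correct.
using System;
using System.Diagnostics;

class TimeToBestMove
{
    static void Main()
    {
        // Placeholders: substitute the engine under test, the position just
        // before the blunder, and the move the analysis engine prefers.
        string enginePath = "./Leorik.exe";
        string fen = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1";
        string expected = "e2e4";

        var psi = new ProcessStartInfo(enginePath)
        {
            RedirectStandardInput = true,
            RedirectStandardOutput = true,
            UseShellExecute = false
        };
        using var engine = Process.Start(psi);
        engine.StandardInput.AutoFlush = true;

        // A real harness would wait for "uciok"/"readyok" before continuing.
        engine.StandardInput.WriteLine("uci");
        engine.StandardInput.WriteLine($"position fen {fen}");

        var sw = Stopwatch.StartNew();
        engine.StandardInput.WriteLine("go movetime 5000");

        long found = -1;
        string line;
        while ((line = engine.StandardOutput.ReadLine()) != null)
        {
            // UCI info lines look like "info depth 16 ... pv e3d3 ..."
            int pv = line.IndexOf(" pv ");
            if (found < 0 && line.StartsWith("info") && pv >= 0)
            {
                string firstMove = line.Substring(pv + 4).Split(' ')[0];
                if (firstMove == expected)
                    found = sw.ElapsedMilliseconds;
            }
            if (line.StartsWith("bestmove"))
                break;
        }
        engine.StandardInput.WriteLine("quit");

        Console.WriteLine(found >= 0
            ? $"PV first switched to {expected} after {found} ms"
            : $"PV never switched to {expected}");
    }
}
[/code]

Collected over a set of blunder positions from fast games, those timings would give exactly the time-to-bestmove numbers described above.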
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
lithander wrote: ↑Wed Jun 01, 2022 5:14 pm
Uh that's a serious blunder. With Leorik 1.0 (that does no unsafe prunings) the PV switches from f3e5 to the best move e3d3 at depth 5 already. For Leorik 2.1 it takes until depth 16 for the PV to switch to e3d3. But reaching depth 16 takes only 80ms on my machine... How fast were your time control settings exactly?
Hmm, for me Leorik realizes its mistake at depth 7:
I do see now that I was using Leorik 2.0.2, so I'm not sure how much of a difference that makes. The time control I used for the game was 40 moves in 2 minutes, which I typically use when pitting Blunder against engines to get a feel for their style.
lithander wrote: ↑Wed Jun 01, 2022 5:14 pm
I hope there's nothing more serious going on here than just a lack of processing time.
I wish I'd paid more attention at the time, so this may just be hindsight bias, but I believe Leorik's blunder occurred towards the end of the 2 minutes, so perhaps it was in a bit of a time crunch? I suppose the place to start might be to look a bit more into Leorik's time management, which otherwise seemed to work well.
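For reference, the usual baseline under a moves-to-go control is simply the remaining clock divided by the moves left until the next refresh, plus a safety cap. A minimal sketch of that idea, not Leorik's actual time manager:

[code]
// Common baseline time allocation for "40 moves in 2 minutes" style
// controls (not Leorik's actual scheme): split the remaining clock evenly
// over the moves left until the next time refresh.
static int MillisecondsForThisMove(int remainingMs, int movesToGo, int incrementMs = 0)
{
    // e.g. 120000 ms / 40 moves = 3000 ms per move on average, so a few
    // moves after a refresh most of the 2 minutes should still be left.
    int budget = remainingMs / Math.Max(movesToGo, 1) + incrementMs;

    // Never budget more than a fraction of what is actually on the clock.
    return Math.Min(budget, remainingMs / 2);
}
[/code]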
lithander wrote: ↑Wed Jun 01, 2022 5:14 pm
I hope there's nothing more serious going on here than just a lack of processing time. I might run a few fast games through an analysis engine to identify blunders like that and see how long it takes for the PV to switch to the correct line. That could lead to a valuable set of positions for comparing different versions of the engine. Time-to-depth measures just raw performance, but time-to-bestmove could be a useful metric for approximating playing strength.
Regardless, that sounds like a good plan of action. Let me know how that goes!
algerbrex wrote: ↑Wed Jun 01, 2022 6:15 pm
I do see now that I was using Leorik 2.0.2, so I'm not sure how much of a difference that makes. The time control I used for the game was 40 moves in 2 minutes, which I typically use when pitting Blunder against engines to get a feel for their style.
[...]
I wish I'd paid more attention at the time, so this may just be hindsight bias, but I believe Leorik's blunder occurred towards the end of the 2 minutes, so perhaps it was in a bit of a time crunch?
2 minutes per 40 moves is 3 seconds per move on average, right? And when the blunder happened at move 47 the clock had just been refreshed 7 moves earlier, so there should have been no time pressure. When you play the move f3e5 the followup is f6e5 f4e5 c5d5 <something> d5e6 until the pawn is lost for good, and the position should then be evaluated at around 100cp in black's favor. It gets only worse from there, and at depth ~30 even the promoted queen should appear on the radar. With quiescence search the pawn loss should be detectable as early as depth 5! And Leorik 1.0 (which does no risky prunings) indeed finds it at depth 5!
Leorik 2.x does null-move and all kinds of prunings, but it is also fast enough to reach depth 30 on such a simple position within just a second. I think there's no excuse and I need to go hunt for a bug... :/
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
lithander wrote: ↑Wed Jun 01, 2022 7:16 pm
2 minutes per 40 moves is 3 seconds per move on average, right? And when the blunder happened at move 47 the clock had just been refreshed 7 moves earlier, so there should have been no time pressure.
Ah, that's silly of me; of course there was no time pressure at move 47.
lithander wrote: ↑Wed Jun 01, 2022 7:16 pm
Leorik 2.x does null-move and all kinds of prunings, but it is also fast enough to reach depth 30 on such a simple position within just a second. I think there's no excuse and I need to go hunt for a bug... :/
I might take a look through your code as well, as this has made me curious too. Given everything you said, I can't think of a good reason why Leorik didn't see the right move in the game.
If it helps at all (probably not), the game was played from the start position as normal, with no opening book used. So maybe it's reproducible to a degree?
The pawn structure evaluation (including a pawn hash table) turned out surprisingly simple yet effective! Or at least it feels like it's working well... I'm always struggling to judge when a feature is done enough that it's time to move on. I wish there was an easier way to assess how close the current implementation comes to the theoretical potential. So (@all) what's your experience with pawn structure eval terms? How much Elo did you gain from adding them to your engines?
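For readers who haven't implemented one: the pawn hash idea boils down to caching the structure terms under a Zobrist key computed from the pawn placement alone, since pawn structure only changes on pawn moves and pawn captures. A generic sketch of the table itself, not Leorik's actual implementation:

[code]
// Generic pawn hash table sketch (not Leorik's code). The expensive
// structure terms (passed, isolated, doubled pawns, ...) are cached under
// a Zobrist key computed from pawn placement only.
struct PawnHashEntry
{
    public ulong PawnKey;   // Zobrist hash over the pawns only
    public int Score;       // cached pawn-structure evaluation
}

class PawnHashTable
{
    private readonly PawnHashEntry[] _entries;

    public PawnHashTable(int sizeInEntries) =>
        _entries = new PawnHashEntry[sizeInEntries];

    public bool TryProbe(ulong pawnKey, out int score)
    {
        ref var e = ref _entries[pawnKey % (ulong)_entries.Length];
        score = e.Score;
        return e.PawnKey == pawnKey;   // hit only if the full key matches
    }

    public void Store(ulong pawnKey, int score)
    {
        ref var e = ref _entries[pawnKey % (ulong)_entries.Length];
        e.PawnKey = pawnKey;
        e.Score = score;
    }
}
[/code]

Because the same pawn structures recur constantly during search, the hit rate is typically very high and the cached terms end up close to free.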
Mike Sherwin wrote: ↑Sun Apr 17, 2022 11:20 pm
This is probably the last version of Leorik that I'll be able to win against.
Leorik 2.1 is only about 50 Elo stronger so maybe you can still win against it?
Big improvement in playing style! Much more human-like. Needs pawn storm code. This was a very interesting game!!
Hey, do you have a system set up that grades your commits to get the relative Elo change yet?
So you know exactly how much stronger or weaker your engine is.
If you are interested, we can set this up for your repo!
The goal would be a normal commit workflow, with your CI pipeline asynchronously reporting the strength-change data for that commit.
dangi12012 wrote: ↑Thu Jun 02, 2022 12:22 am
Hey, do you have a system set up that grades your commits to get the relative Elo change yet?
So you know exactly how much stronger or weaker your engine is.
If you are interested, we can set this up for your repo!
The goal would be a normal commit workflow, with your CI pipeline asynchronously reporting the strength-change data for that commit.
Do you mean something like fishtest or openbench?
I always assumed my engine wouldn't be compliant with openbench's way of building engines from source because it needs the .NET toolchain. And to set up something like that myself, I lack the dedicated hardware that would just wait for these kinds of tasks and supply a result in no time. So at the moment I'm running the tests on my personal computer when I'm not working or gaming.
Verifying or falsifying a small patch with an expected gain of just a few Elo takes a lot of compute, sadly.
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
Mike Sherwin wrote: ↑Wed Jun 01, 2022 10:27 pm
Big improvement in playing style! Much more human-like. Needs pawn storm code. This was a very interesting game!!
Thanks for playing the new version and glad to hear I'm making some progress in the direction of style!
A pawn storm... is that what you did on the kingside? Moving a phalanx of pawns together? Leorik should have countered that better, you mean?
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
lithander wrote: ↑Thu Jun 02, 2022 9:41 am
Do you mean something like fishtest or openbench?
I always assumed my engine wouldn't be compliant with openbench's way of building engines from source because it needs the .NET toolchain. And to set up something like that myself, I lack the dedicated hardware that would just wait for these kinds of tasks and supply a result in no time. So at the moment I'm running the tests on my personal computer when I'm not working or gaming.
Much simpler. YAML files provide an easy way to set that up. You just check in a single file and git will start to execute your continuous integration steps against workers. These workers can be self-hosted, and I even have some spare machines.
Whether it's enough to get a small standard deviation in Elo per run remains to be seen, but running a tournament of the engine against itself in an 8x8 grid with 4x master and 4x commit should help eliminate the noise.
There has to be math already done somewhere that will give a mathematically sound confidence interval for tournament results.
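The standard answer is the normal approximation on the match score: compute the score from wins, draws and losses, convert it to Elo with the logistic formula, and push the score's standard error through the same formula to get the interval. A self-contained sketch with made-up example numbers:

[code]
// Elo difference and 95% confidence interval from match results, using the
// usual normal approximation on the score. Example numbers are made up.
using System;

class EloEstimate
{
    static double ScoreToElo(double score) =>
        -400.0 * Math.Log10(1.0 / score - 1.0);

    static void Main()
    {
        int wins = 1200, losses = 1100, draws = 1700;   // example results
        int games = wins + losses + draws;

        double score = (wins + 0.5 * draws) / games;

        // Per-game variance of the score (win = 1, draw = 0.5, loss = 0).
        double variance = (wins * Math.Pow(1.0 - score, 2)
                         + draws * Math.Pow(0.5 - score, 2)
                         + losses * Math.Pow(0.0 - score, 2)) / games;
        double stdErr = Math.Sqrt(variance / games);

        double elo = ScoreToElo(score);
        double lower = ScoreToElo(score - 1.96 * stdErr);
        double upper = ScoreToElo(score + 1.96 * stdErr);

        Console.WriteLine($"Elo: {elo:F1}, 95% CI: [{lower:F1}, {upper:F1}]");
    }
}
[/code]

With the example numbers above (4000 games, ~42% draws) the 95% interval still spans roughly ±8 Elo, which is why resolving a patch worth only a few Elo takes so many games; frameworks like fishtest run SPRT on top of this so a test can stop as soon as the data is conclusive.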