New engine: Peacekeeper

JVMerlino · Post by **JVMerlino** » Wed Feb 15, 2023 12:22 am

lingfors wrote: ↑Tue Feb 14, 2023 10:44 pm It's okay, I mix up the terms as well. ^^

For an all-node, i.e. fail-low node, you have no clue which move is the best one, all you know is that no move managed to exceed alpha (the lower bound), so what are you even accomplishing by saving any move in the TT?

But in the end, the only thing that matters is whether a code change increases the playing strength or not. And the only way to find that out is to let the new version play against the old version for a few (hundred) games.

I disagree that self-play is the "only way", and in fact I believe it is inferior to playing a gauntlet against multiple other engines of similar strength. I play 100 games against 12 opponents to get an idea if my engine is improved. I've had self-play give very misleading results.

lithander · Post by **lithander** » Wed Feb 15, 2023 11:39 am

JVMerlino wrote: ↑Wed Feb 15, 2023 12:22 am I disagree that self-play is the "only way", and in fact I believe it is inferior to playing a gauntlet against multiple other engines of similar strength. I play 100 games against 12 opponents to get an idea if my engine is improved. I've had self-play give very misleading results.

100 games is only enough to verify a large strength difference within acceptable error margins. Once the low-hanging fruit is picked, your changes and tweaks to your engine will only be worth a few Elo at best, and verifying such small improvements takes thousands of games, sadly.
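To put rough numbers on this, here is a minimal sketch that estimates the 95% error margin of a match result in Elo. It assumes independent games, two engines of equal true strength, the logistic Elo model, and a draw ratio of 30%; the function names and the draw ratio are illustrative assumptions, not taken from any particular tool.

```python
import math


def elo_from_score(score: float) -> float:
    """Convert an expected score in (0, 1) to an Elo difference (logistic model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_error_margin(games: int, draw_ratio: float = 0.3, z: float = 1.96) -> float:
    """Approximate error margin (in Elo) around a 50% score after `games` games.

    Assumptions: independent games, equal engines, fixed draw ratio.
    z = 1.96 corresponds to a two-sided 95% confidence interval.
    """
    # Game results are 1, 0.5 or 0 with mean 0.5, so only decisive games
    # contribute to the variance: (0.5 ** 2) * P(decisive).
    variance = 0.25 * (1.0 - draw_ratio)
    std_err = math.sqrt(variance / games)
    return elo_from_score(0.5 + z * std_err)


if __name__ == "__main__":
    for n in (100, 1200, 10000):
        print(f"{n:6d} games: about +/- {elo_error_margin(n):.1f} Elo")
```

Under these assumptions the margin shrinks only with the square root of the number of games, which is why a change worth a few Elo needs thousands of games to show up.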

If you don't want to worry about how many games you need to play you can use the -sprt option in cutechess:

     -sprt elo0=E0 elo1=E1 alpha=<alpha> beta=<beta>
	     Use a Sequential Probability Ratio Test as a termination
	     criterion for the match.

	     This option should only be used in matches between two players to
	     test if engine P1 is stronger than engine P2.  Hypothesis H1 is
	     that P1 is stronger than P2 by at least E0 ELO points, and H0
	     (the null hypothesis) is that P1 is not stronger than P2 by at
	     least E1 ELO points.  The maximum probabilities for type I and
	     type II errors outside the interval [ E0, E1 ] are <alpha> and
	     <beta>.

	     The match is stopped if either H0 or H1 is accepted or if the
	     maximum number of games set by -rounds and / or -games is
	     reached.
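For intuition about what such a test computes, here is a minimal sketch of an SPRT stopping rule. It uses a common normal approximation of the log-likelihood ratio and the convention that H0 is "Elo difference = elo0" and H1 is "Elo difference = elo1"; it is not cutechess's actual implementation, and all function names here are hypothetical.

```python
import math


def expected_score(elo: float) -> float:
    """Expected score for a given Elo advantage (logistic model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))


def sprt_llr(wins: int, draws: int, losses: int, elo0: float, elo1: float) -> float:
    """Approximate log-likelihood ratio of H1 (elo1) versus H0 (elo0)."""
    n = wins + draws + losses
    if n == 0:
        return 0.0
    mean = (wins + 0.5 * draws) / n
    # Per-game variance of the observed results (win = 1, draw = 0.5, loss = 0).
    var = (wins + 0.25 * draws) / n - mean * mean
    if var <= 0.0:
        return 0.0  # not enough variety in the results yet
    s0, s1 = expected_score(elo0), expected_score(elo1)
    return n * (s1 - s0) * (2.0 * mean - s0 - s1) / (2.0 * var)


def sprt_decision(wins: int, draws: int, losses: int,
                  elo0: float = 0.0, elo1: float = 5.0,
                  alpha: float = 0.05, beta: float = 0.05) -> str:
    """Return 'accept H1', 'accept H0' or 'continue' for the current score."""
    llr = sprt_llr(wins, draws, losses, elo0, elo1)
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    if llr >= upper:
        return "accept H1"
    if llr <= lower:
        return "accept H0"
    return "continue"
```

The match runner would call something like `sprt_decision` after every game and stop as soon as the result is no longer "continue", so clear improvements and clear regressions both finish early.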

But I agree on the gauntlet vs selfplay point. I usually do very fast selfplay games and if the result is promising there I test the new version under a longer timecontrol gauntlet.

Sazgr · Post by **Sazgr** » Wed Feb 15, 2023 6:38 pm

lithander wrote: ↑Wed Feb 15, 2023 11:39 am
JVMerlino wrote: ↑Wed Feb 15, 2023 12:22 am I disagree that self-play is the "only way", and in fact I believe it is inferior to playing a gauntlet against multiple other engines of similar strength. I play 100 games against 12 opponents to get an idea if my engine is improved. I've had self-play give very misleading results.
100 games is only enough to verify a large strength difference within acceptable error margins. Once the low hanging fruits are picked your changes and tweaks to your engine will only be worth a few Elo at best and to verify these small improvements takes thousands of games, sadly.

If you don't want to worry about how many games you need to play you can use the -sprt option in cutechess:
     -sprt elo0=E0 elo1=E1 alpha=<alpha> beta=<beta>
	     Use a Sequential Probability Ratio Test as a termination
	     criterion for the match.

	     This option should only be used in matches between two players to
	     test if engine P1 is stronger than engine P2.  Hypothesis H1 is
	     that P1 is stronger than P2 by at least E0 ELO points, and H0
	     (the null hypothesis) is that P1 is not stronger than P2 by at
	     least E1 ELO points.  The maximum probabilities for type I and
	     type II errors outside the interval [ E0, E1 ] are <alpha> and
	     <beta>.

	     The match is stopped if either H0 or H1 is accepted or if the
	     maximum number of games set by -rounds and / or -games is
	     reached.
But I agree on the gauntlet vs selfplay point. I usually do very fast selfplay games and if the result is promising there I test the new version under a longer timecontrol gauntlet.

I played a gauntlet against engines of similar strength, where it also seems to gain some Elo (not as much, but still some). Yes, I don't know what storing the first move of the sorted move list in the TT entry at all-nodes accomplishes; if I did, it would be a smart change! Maybe the stored TT moves help by not having to generate moves in cut-nodes later on?
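To make the change under discussion concrete, here is a minimal sketch of how a TT entry might be built at the end of a node, with a switch for storing the first ordered move when the node fails low. The entry layout, the names, and the `store_move_on_fail_low` flag are all hypothetical and not taken from Peacekeeper's source.

```python
from dataclasses import dataclass
from typing import Optional

EXACT, LOWER, UPPER = "exact", "lower", "upper"


@dataclass
class TTEntry:
    key: int
    depth: int
    score: int
    bound: str
    move: Optional[str]


def make_entry(key: int, depth: int, best_score: int, best_move: Optional[str],
               first_move: Optional[str], alpha_orig: int, beta: int,
               store_move_on_fail_low: bool) -> TTEntry:
    """Classify the node by its score and pick the move to remember."""
    if best_score >= beta:
        # Cut-node (fail high): the refuting move is a genuine best-move hint.
        return TTEntry(key, depth, best_score, LOWER, best_move)
    if best_score > alpha_orig:
        # PV-node: the score is exact and the best move is known.
        return TTEntry(key, depth, best_score, EXACT, best_move)
    # All-node (fail low): no move exceeded alpha, so there is no known best
    # move. Optionally keep the first move of the sorted list as a cheap hint.
    hint = first_move if store_move_on_fail_low else None
    return TTEntry(key, depth, best_score, UPPER, hint)
```

Whether the fail-low hint helps can only be settled by testing; the sketch just shows where the two variants differ.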

Guenther · Post by **Guenther** » Wed Feb 15, 2023 9:19 pm

lithander wrote: ↑Wed Feb 15, 2023 11:39 am
JVMerlino wrote: ↑Wed Feb 15, 2023 12:22 am I disagree that self-play is the "only way", and in fact I believe it is inferior to playing a gauntlet against multiple other engines of similar strength. I play 100 games against 12 opponents to get an idea if my engine is improved. I've had self-play give very misleading results.
100 games is only enough to verify a large strength difference within acceptable error margins. Once the low hanging fruits are picked your changes and tweaks to your engine will only be worth a few Elo at best and to verify these small improvements takes thousands of games, sadly.

...

Thomas, I think John means 100*12 games.

JVMerlino · Post by **JVMerlino** » Wed Feb 15, 2023 9:59 pm

Guenther wrote: ↑Wed Feb 15, 2023 9:19 pm
lithander wrote: ↑Wed Feb 15, 2023 11:39 am
JVMerlino wrote: ↑Wed Feb 15, 2023 12:22 am I disagree that self-play is the "only way", and in fact I believe it is inferior to playing a gauntlet against multiple other engines of similar strength. I play 100 games against 12 opponents to get an idea if my engine is improved. I've had self-play give very misleading results.
100 games is only enough to verify a large strength difference within acceptable error margins. Once the low hanging fruits are picked your changes and tweaks to your engine will only be worth a few Elo at best and to verify these small improvements takes thousands of games, sadly.

...
Thomas, I think John means 100*12 games.

That is correct.

Sazgr · Post by **Sazgr** » Tue Feb 21, 2023 9:36 pm

Fixed a nasty LMR bug: it turns out I wasn't doing LMR in non-PV nodes. Anyway, with some other bugfixes, it is looking better for its feature set. v1.20 has been released on https://github.com/Sazgr/peacekeeper/releases/tag/v1.20 !

Sazgr · Post by **Sazgr** » Tue Mar 07, 2023 6:09 pm

My chess engine has been progressing slowly as usual. I am currently trying to add the killer heuristic, but it is not gaining any significant Elo (at most 10, which is within the error bars of around 15). The implementation does not seem to contain any bugs, so I was wondering whether the history heuristic already provides the gain that killers would give. I have seen some other authors implement killers before history, and they get an appreciable gain from killers but less from history. So are killers really necessary once history has been implemented? Maybe engine authors with both killers and history can try removing killers to see the Elo loss? Thanks in advance for any comments!

lithander · Post by **lithander** » Tue Mar 07, 2023 6:42 pm

I have both killer and history. History only allows me to sort quiet moves but before I can do that I have to generate them of course. So an additional benefit of killers is that if I get a cut-off from a killer move I don't have to generate any quiet moves at all.
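A minimal sketch of that idea as a lazy, staged move picker: captures first, then killers (checked for pseudo-legality without generating the quiet list), and only then the history-sorted quiets. The callback names and the string move representation are assumptions for illustration, not any engine's real interface.

```python
from typing import Callable, Dict, Iterator, List

Move = str  # hypothetical representation, e.g. "e2e4"


def ordered_moves(gen_captures: Callable[[], List[Move]],
                  gen_quiets: Callable[[], List[Move]],
                  is_pseudo_legal: Callable[[Move], bool],
                  killers: List[Move],
                  history: Dict[Move, int]) -> Iterator[Move]:
    """Lazily yield moves: captures, then killers, then quiets by history."""
    captures = gen_captures()
    yield from captures
    tried = set(captures)

    # Killers are tried before any quiet move is generated. If the search
    # gets a cutoff here and stops iterating, gen_quiets() is never called.
    for killer in killers:
        if killer not in tried and is_pseudo_legal(killer):
            tried.add(killer)
            yield killer

    # Reached only if neither the captures nor the killers caused a cutoff.
    quiets = [m for m in gen_quiets() if m not in tried]
    quiets.sort(key=lambda m: history.get(m, 0), reverse=True)
    yield from quiets
```

The search loop would iterate over `ordered_moves(...)` and `break` on a beta cutoff; because this is a generator, the later stages simply never run in that case.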

j.t. · Post by **j.t.** » Tue Mar 07, 2023 6:42 pm

Sazgr wrote: ↑Tue Mar 07, 2023 6:09 pm My chess engine has been progressing slowly as usual. However, I am trying to implement the killer heuristic into my engine. However, it is not gaining any significant elo (at most 10 which is within the error bars of around 15).

If I remember correctly, killers weren't that important for me either, but they are quite simple to implement, and 10-20 Elo is not nothing.

j.t. · Post by **j.t.** » Tue Mar 07, 2023 9:26 pm

Sazgr wrote: ↑Tue Mar 07, 2023 6:09 pm Maybe engine authors with both killers and history can try removing killers to see the Elo loss? Thanks in advance for any comments!

I tried this, and the killers implemented like this (used after winning captures in move ordering) gave +30 +/-6 and +22 +/-6 on two different test runs, each with slightly different CPUs, engine pools, and time controls.
