lingfors wrote: ↑Tue Feb 14, 2023 10:44 pm
It's okay, I mix up the terms as well. ^^
For an all-node, i.e. a fail-low node, you have no clue which move is the best one; all you know is that no move managed to exceed alpha (the lower bound). So what are you even accomplishing by saving any move in the TT?
But in the end, the only thing that matters is whether a code change increases the playing strength or not. And the only way to find that out is to let the new version play against the old version for a few (hundred) games.

I disagree that self-play is the "only way", and in fact I believe it is inferior to playing a gauntlet against multiple other engines of similar strength. I play 100 games against 12 opponents to get an idea if my engine is improved. I've had self-play give very misleading results.
New engine: Peacekeeper
Moderator: Ras
-
- Posts: 1396
- Joined: Wed Mar 08, 2006 10:15 pm
- Location: San Francisco, California
Re: New engine: Peacekeeper
-
- Posts: 915
- Joined: Sun Dec 27, 2020 2:40 am
- Location: Bremen, Germany
- Full name: Thomas Jahn
Re: New engine: Peacekeeper
JVMerlino wrote: ↑Wed Feb 15, 2023 12:22 am
I disagree that self-play is the "only way", and in fact I believe it is inferior to playing a gauntlet against multiple other engines of similar strength. I play 100 games against 12 opponents to get an idea if my engine is improved. I've had self-play give very misleading results.

100 games is only enough to verify a large strength difference within acceptable error margins. Once the low hanging fruits are picked your changes and tweaks to your engine will only be worth a few Elo at best and to verify these small improvements takes thousands of games, sadly.
If you don't want to worry about how many games you need to play you can use the -sprt option in cutechess:
-sprt elo0=E0 elo1=E1 alpha=<alpha> beta=<beta>
Use a Sequential Probability Ratio Test as a termination
criterion for the match.
This option should only be used in matches between two players to
test if engine P1 is stronger than engine P2. Hypothesis H1 is
that P1 is stronger than P2 by at least E0 ELO points, and H0
(the null hypothesis) is that P1 is not stronger than P2 by at
least E1 ELO points. The maximum probabilities for type I and
type II errors outside the interval [ E0, E1 ] are <alpha> and
<beta>.
The match is stopped if either H0 or H1 is accepted or if the
maximum number of games set by -rounds and / or -games is
reached.
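To make the stopping rule a bit more concrete, here is a minimal sketch of a sequential probability ratio test on win/draw/loss counts. It uses the common normal approximation of the log-likelihood ratio and is not taken from cutechess itself; the function names, the reading of elo0/elo1 as the two hypothesised Elo differences, and the defaults (elo0=0, elo1=5, alpha=beta=0.05) are assumptions chosen for illustration.

```python
import math


def elo_to_score(elo: float) -> float:
    """Expected score for a given Elo advantage (logistic model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))


def sprt_llr(wins: int, draws: int, losses: int, elo0: float, elo1: float) -> float:
    """Approximate log-likelihood ratio of H1 (elo1) versus H0 (elo0)."""
    n = wins + draws + losses
    if n == 0:
        return 0.0
    mean = (wins + 0.5 * draws) / n
    # Per-game variance of the score (win = 1, draw = 0.5, loss = 0).
    var = (wins + 0.25 * draws) / n - mean * mean
    if var <= 0.0:
        return 0.0  # not enough information yet
    s0, s1 = elo_to_score(elo0), elo_to_score(elo1)
    return n * (s1 - s0) * (2.0 * mean - s0 - s1) / (2.0 * var)


def sprt_bounds(alpha: float, beta: float) -> tuple[float, float]:
    """Wald's lower and upper stopping thresholds for the LLR."""
    return math.log(beta / (1.0 - alpha)), math.log((1.0 - beta) / alpha)


def sprt_decision(wins: int, draws: int, losses: int,
                  elo0: float = 0.0, elo1: float = 5.0,
                  alpha: float = 0.05, beta: float = 0.05) -> str:
    """Return 'accept H1', 'accept H0', or 'continue'."""
    lower, upper = sprt_bounds(alpha, beta)
    llr = sprt_llr(wins, draws, losses, elo0, elo1)
    if llr >= upper:
        return "accept H1"
    if llr <= lower:
        return "accept H0"
    return "continue"
```

Calling `sprt_decision` after every finished game and stopping as soon as it no longer returns "continue" gives the behaviour described in the help text: the match ends when one hypothesis is accepted or the game limit is hit.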
-
- Posts: 66
- Joined: Thu Dec 09, 2021 8:26 pm
- Full name: Kyle Zhang
Re: New engine: Peacekeeper
lithander wrote: ↑Wed Feb 15, 2023 11:39 am
JVMerlino wrote: ↑Wed Feb 15, 2023 12:22 am
I disagree that self-play is the "only way", and in fact I believe it is inferior to playing a gauntlet against multiple other engines of similar strength. I play 100 games against 12 opponents to get an idea if my engine is improved. I've had self-play give very misleading results.
100 games is only enough to verify a large strength difference within acceptable error margins. Once the low hanging fruits are picked your changes and tweaks to your engine will only be worth a few Elo at best and to verify these small improvements takes thousands of games, sadly.
If you don't want to worry about how many games you need to play you can use the -sprt option in cutechess:
-sprt elo0=E0 elo1=E1 alpha=<alpha> beta=<beta>
Use a Sequential Probability Ratio Test as a termination criterion for the match. This option should only be used in matches between two players to test if engine P1 is stronger than engine P2. Hypothesis H1 is that P1 is stronger than P2 by at least E0 ELO points, and H0 (the null hypothesis) is that P1 is not stronger than P2 by at least E1 ELO points. The maximum probabilities for type I and type II errors outside the interval [ E0, E1 ] are <alpha> and <beta>. The match is stopped if either H0 or H1 is accepted or if the maximum number of games set by -rounds and / or -games is reached.
But I agree on the gauntlet vs selfplay point. I usually do very fast selfplay games and if the result is promising there I test the new version under a longer timecontrol gauntlet.

I played a gauntlet against similar strength engines, where it also seems to gain some Elo (not as much, but still some). Yes, I don't know what storing the first move in the sorted movelist into the TT entry in all-nodes accomplishes, otherwise it would be a smart change! Maybe the stored TT moves help by not having to generate moves in cut-nodes later on?
Peacekeeper: https://github.com/Sazgr/peacekeeper/
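The all-node question can be made concrete with a small sketch of what ends up in a TT entry. This is only an illustration under assumed names (`TTEntry`, `classify_bound`, `make_entry`, and the `store_first_on_fail_low` switch are all hypothetical, not Peacekeeper's code): at a fail-low node the score is only an upper bound and no move is known to be best, so the entry either carries no move or, as in the change being discussed, the first move of the sorted list.

```python
from dataclasses import dataclass
from typing import Optional

EXACT, LOWER, UPPER = "exact", "lower", "upper"


@dataclass
class TTEntry:
    depth: int
    score: int
    bound: str
    move: Optional[str]


def classify_bound(best_score: int, alpha_orig: int, beta: int) -> str:
    """Classify a search result relative to the original window."""
    if best_score <= alpha_orig:
        return UPPER   # all-node (fail-low): no move exceeded alpha
    if best_score >= beta:
        return LOWER   # cut-node (fail-high): a refutation was found
    return EXACT       # PV node: score lies inside the window


def make_entry(depth: int, best_score: int, alpha_orig: int, beta: int,
               best_move: Optional[str], first_move: Optional[str],
               store_first_on_fail_low: bool) -> TTEntry:
    """Build the entry to store; the fail-low move policy is a switch."""
    bound = classify_bound(best_score, alpha_orig, beta)
    if bound == UPPER:
        # No move is known to be best here; optionally keep the first
        # move of the ordered list as a (weak) ordering hint.
        move = first_move if store_first_on_fail_low else None
    else:
        move = best_move
    return TTEntry(depth, best_score, bound, move)
```

Whether the fail-low hint helps is exactly the kind of thing that has to be measured in games; the sketch only shows where the two variants differ.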
-
- Posts: 4718
- Joined: Wed Oct 01, 2008 6:33 am
- Location: Regensburg, Germany
- Full name: Guenther Simon
Re: New engine: Peacekeeper
lithander wrote: ↑Wed Feb 15, 2023 11:39 am
JVMerlino wrote: ↑Wed Feb 15, 2023 12:22 am
I disagree that self-play is the "only way", and in fact I believe it is inferior to playing a gauntlet against multiple other engines of similar strength. I play 100 games against 12 opponents to get an idea if my engine is improved. I've had self-play give very misleading results.
100 games is only enough to verify a large strength difference within acceptable error margins. Once the low hanging fruits are picked your changes and tweaks to your engine will only be worth a few Elo at best and to verify these small improvements takes thousands of games, sadly.
...

Thomas, I think John means 100*12 games.
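The difference between 100 and 100*12 games can be put into numbers with a rough error-margin estimate. The sketch below assumes a logistic Elo model, a normal approximation for the 95% interval, and an illustrative 35% win / 30% draw / 35% loss split; the function names are made up for this example.

```python
import math


def elo_from_score(score: float) -> float:
    """Elo difference implied by an expected score (logistic model)."""
    eps = 1e-9
    score = min(max(score, eps), 1.0 - eps)  # avoid log of 0 or division by 0
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_margin(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (elo, lower, upper) using a normal approximation."""
    n = wins + draws + losses
    mean = (wins + 0.5 * draws) / n
    # Per-game variance of the score (win = 1, draw = 0.5, loss = 0).
    var = (wins + 0.25 * draws) / n - mean * mean
    stderr = math.sqrt(var / n)
    return (elo_from_score(mean),
            elo_from_score(mean - z * stderr),
            elo_from_score(mean + z * stderr))


if __name__ == "__main__":
    for w, d, l in [(35, 30, 35), (420, 360, 420)]:
        elo, lo, hi = elo_with_margin(w, d, l)
        print(f"{w + d + l:5d} games: {elo:+.1f} Elo, 95% interval [{lo:+.1f}, {hi:+.1f}]")
```

With that assumed split, 100 games give an interval of roughly ±57 Elo, while 1200 games narrow it to roughly ±16 Elo, which is why a few-Elo tweak still needs thousands of games.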
-
- Posts: 1396
- Joined: Wed Mar 08, 2006 10:15 pm
- Location: San Francisco, California
Re: New engine: Peacekeeper
Guenther wrote: ↑Wed Feb 15, 2023 9:19 pm
lithander wrote: ↑Wed Feb 15, 2023 11:39 am
JVMerlino wrote: ↑Wed Feb 15, 2023 12:22 am
I disagree that self-play is the "only way", and in fact I believe it is inferior to playing a gauntlet against multiple other engines of similar strength. I play 100 games against 12 opponents to get an idea if my engine is improved. I've had self-play give very misleading results.
100 games is only enough to verify a large strength difference within acceptable error margins. Once the low hanging fruits are picked your changes and tweaks to your engine will only be worth a few Elo at best and to verify these small improvements takes thousands of games, sadly.
...
Thomas, I think John means 100*12 games.

That is correct.

-
- Posts: 66
- Joined: Thu Dec 09, 2021 8:26 pm
- Full name: Kyle Zhang
Re: New engine: Peacekeeper
Fixed a nasty LMR bug: it turns out I wasn't doing LMR in non-PV nodes. Anyway, with some other bugfixes, it is looking better for its feature set. v1.20 has been released at https://github.com/Sazgr/peacekeeper/releases/tag/v1.20 !


-
- Posts: 66
- Joined: Thu Dec 09, 2021 8:26 pm
- Full name: Kyle Zhang
Killer move problems
My chess engine has been progressing slowly as usual. I am now trying to implement the killer heuristic, but it is not gaining any significant Elo (at most 10, which is within the error bars of around 15). The implementation does not seem to contain any bugs, so I was wondering whether the history heuristic already gives the gain that killers provide. I have seen some other authors implement killers before history, and they get an appreciable gain from killers but less from history. So are killers really necessary after history has been implemented? Maybe engine authors with both killers and history can try removing killers to see the Elo loss? Thanks in advance for any comments!
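For reference, a killer table in its common textbook form might look like the sketch below. It assumes two slots per ply and moves represented as plain strings; `KillerTable` and `MAX_PLY` are hypothetical names, and this is not Peacekeeper's implementation.

```python
MAX_PLY = 128  # assumed search depth limit for this sketch


class KillerTable:
    """Two killer slots per ply, most recent killer first."""

    def __init__(self, max_ply: int = MAX_PLY) -> None:
        self.slots = [[None, None] for _ in range(max_ply)]

    def update(self, ply: int, move: str) -> None:
        """Record a quiet move that caused a beta cutoff at this ply."""
        slot = self.slots[ply]
        if slot[0] != move:
            # Shift the previous killer down so both slots stay distinct.
            slot[1] = slot[0]
            slot[0] = move

    def is_killer(self, ply: int, move: str) -> bool:
        return move in self.slots[ply]

    def clear(self) -> None:
        for slot in self.slots:
            slot[0] = slot[1] = None
```

The table is normally updated only for quiet moves, since captures are already ordered by other means. That overlap with history (both reward quiet moves that caused cutoffs) is one plausible reason the measured gain can be small.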
-
- Posts: 915
- Joined: Sun Dec 27, 2020 2:40 am
- Location: Bremen, Germany
- Full name: Thomas Jahn
Re: New engine: Peacekeeper
I have both killer and history. History only allows me to sort quiet moves but before I can do that I have to generate them of course. So an additional benefit of killers is that if I get a cut-off from a killer move I don't have to generate any quiet moves at all.
Last edited by lithander on Tue Mar 07, 2023 6:43 pm, edited 1 time in total.
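The saving described here can be sketched as a staged move picker that only generates quiet moves when the earlier stages did not produce a cutoff. Everything in this example is an assumption for illustration: the stage order (TT move, captures, killers, history-sorted quiets), the callback names, and moves as plain strings.

```python
from typing import Callable, Dict, Iterator, List, Optional, Sequence


def staged_moves(tt_move: Optional[str],
                 gen_captures: Callable[[], List[str]],
                 killers: Sequence[Optional[str]],
                 is_legal_quiet: Callable[[str], bool],
                 gen_quiets: Callable[[], List[str]],
                 history: Dict[str, int]) -> Iterator[str]:
    """Yield moves stage by stage; later stages run only if needed."""
    seen = set()
    if tt_move is not None:
        seen.add(tt_move)
        yield tt_move
    for move in gen_captures():
        if move not in seen:
            seen.add(move)
            yield move
    # Killers are tried before any quiet move has been generated, so
    # each one must be validated against the current position first.
    for killer in killers:
        if killer is not None and killer not in seen and is_legal_quiet(killer):
            seen.add(killer)
            yield killer
    # Only reached if the caller keeps iterating, i.e. no cutoff so far.
    quiets = [m for m in gen_quiets() if m not in seen]
    quiets.sort(key=lambda m: history.get(m, 0), reverse=True)
    yield from quiets
```

Because this is a generator, a search loop that breaks out on a beta cutoff after a killer never triggers `gen_quiets()` at all.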
-
- Posts: 263
- Joined: Wed Jun 16, 2021 2:08 am
- Location: Berlin
- Full name: Jost Triller
Re: Killer move problems
If I remember correctly, killers weren't that important for me either, but they are quite simple to implement, and 10-20 Elo is not nothing.
-
- Posts: 263
- Joined: Wed Jun 16, 2021 2:08 am
- Location: Berlin
- Full name: Jost Triller
Re: Killer move problems
I tried this, and killers implemented like this (used after winning captures in move ordering) gave (+30 +/-6) and (+22 +/-6) on two different test runs, each with slightly different CPUs, engine pools, and time controls.