Ras
- Posts: 2707
- Joined: Tue Aug 30, 2016 8:19 pm
- Full name: Rasmus Althoff
Re: Something new ?
I think a big problem here is that with learning enabled, the engine under test would change by being tested. That's an issue for reproducibility and for deciding which results to use.
Rasmus Althoff
https://www.ct800.net
-
Modern Times
- Posts: 3759
- Joined: Thu Jun 07, 2012 11:02 pm
Re: Something new ?
Yes, that is right. Rating tools assume that the engine that plays its first game is identical to the one that plays its 100,000th game. If the engine's learning is effective, then the engines that play it later on are at a disadvantage compared to those that played it earlier. So it may distort the ratings; by how much I don't know, maybe it is negligible and gets lost in the statistical error margins.
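As a rough illustration (the numbers below are assumed, not measured): suppose an engine learns its way from 2900 to 3000 Elo during a run against a fixed-strength 2950 opponent, while the rating tool fits one single rating to all of its games. The pooled rating lands between the early and late strength, so the early opponents effectively faced a weaker engine than the final number suggests.

import math

def expected(r_a, r_b):
    # Expected score of A vs B on the usual logistic Elo curve
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))

def rating_from_score(avg_score, r_opp):
    # Rating that would explain an average score against r_opp
    return r_opp - 400.0 * math.log10(1.0 / avg_score - 1.0)

opp = 2950.0                  # assumed fixed-strength opponent
early, late = 2900.0, 3000.0  # assumed strength before/after learning

# Half the games at the early strength, half at the late strength,
# but the rating tool fits one number to the pooled score.
pooled = 0.5 * (expected(early, opp) + expected(late, opp))
print(round(rating_from_score(pooled, opp)))  # 2950, between early and late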
-
chrisw
- Posts: 4673
- Joined: Tue Apr 03, 2012 4:28 pm
- Location: Midi-Pyrénées
- Full name: Christopher Whittington
Re: Something new ?
That’s okay; chess-playing entities, including humans, don’t stay constant. Why is “reproducibility” important, and what are you doing with these “results” anyway?
Btw, if you want “reproducibility”, ask engines to upload their changed-ness to the cloud for distribution back to their superseded siblings.
-
Rebel
- Posts: 7406
- Joined: Thu Aug 18, 2011 12:04 pm
- Full name: Ed Schröder
Re: Something new ?
Regarding reproducibility, there isn't any: several rating lists (I won't mention names) constantly change their opening sets.
90% of coding is debugging, the other 10% is writing bugs.
-
Rebel
- Posts: 7406
- Joined: Thu Aug 18, 2011 12:04 pm
- Full name: Ed Schröder
Re: Something new ?
Books, learning, different styles, knowing the opponent, when done well, mean Elo progress.

chrisw wrote: ↑Fri Oct 31, 2025 10:45 pm
That’s okay; chess-playing entities, including humans, don’t stay constant. Why is “reproducibility” important, and what are you doing with these “results” anyway?
Btw, if you want “reproducibility”, ask engines to upload their changed-ness to the cloud for distribution back to their superseded siblings.

It's really hard to understand why the rating list guys don't want it.
90% of coding is debugging, the other 10% is writing bugs.
-
Ras
- Posts: 2707
- Joined: Tue Aug 30, 2016 8:19 pm
- Full name: Rasmus Althoff
Re: Something new ?
But that's akin to new engine versions, from a config management POV.
That's state change external to the engines. If the engines keep changing themselves all the time, then there really isn't anything under test any more.
Btw, in your test Plentychess vs. Stockfish - did you apply the same learning to both engines, or only Plentychess? If the latter, what happens when you do it for both sides? Does the 91 Elo jump shrink?
Rasmus Althoff
https://www.ct800.net
-
Graham Banks
- Posts: 44818
- Joined: Sun Feb 26, 2006 10:52 am
- Location: Auckland, NZ
Re: Something new ?
Quite correct for CCRL, and probably CEGT as well, but they're all what we like to call "balanced books". When creating a database of high-quality games, it's nice to have variety across the ECO classifications.
That's just the way that we chose to do things. Some enthusiasts are okay with it, others aren't.
I genuinely wish you the best for your new rating list though.
gbanksnz at gmail.com
-
jkominek
- Posts: 77
- Joined: Tue Sep 04, 2018 5:33 am
- Full name: John Kominek
Re: Something new ?
I like this initiative. Rather than thinking of it as just another rating list, though, I'd portray it as an open experiment with an associated rating list to track progress. If it were me, this is how I'd go about it.
o Start with a list of engines. Run a large round-robin tournament with no engine-owned opening books and with learning features turned off. Compute the tournament rating list with BayesElo and/or Ordo.
o Repeat the round-robin gauntlet, extended with those of the above engines that have their own opening book. Preferably the book is provided by the engine author or an opening book team member. Recompute the ratings.
o Continue in small batches, this time with learning features enabled for any engine that supports it. Importantly, for rating purposes switch to incremental rating update formulas such as are used in human tournaments (a minimal sketch follows below). K=25 may be an appropriate update factor. Apply no FIDE-like rating difference clipping, of course.
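For concreteness, a minimal sketch of such an incremental update in Python, assuming the standard logistic Elo formula and the K=25 suggested above (the helper names are illustrative, not taken from BayesElo or Ordo):

def expected_score(r_a, r_b):
    # Expected score of A against B on the logistic Elo curve
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))

def update(r_a, r_b, score_a, k=25.0):
    # score_a is 1.0, 0.5 or 0.0 from A's point of view; both ratings move,
    # so a learning engine's rating tracks its current strength game by game
    delta = k * (score_a - expected_score(r_a, r_b))
    return r_a + delta, r_b - delta

# Example: a learning engine at 3000 beats a 3100 opponent and gains about 16 points.
print(update(3000.0, 3100.0, 1.0))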
One outcome of the experiment would be a comparison of how online learning performs over time against the same engine operating with a hand-crafted opening book. Rote learning traces back to early versions of Crafty (or before). It will be interesting to see which contemporary machine learning techniques can do better, perhaps with opponent-aware contempt as part of the mix.
-
Rebel
- Posts: 7406
- Joined: Thu Aug 18, 2011 12:04 pm
- Full name: Ed Schröder
Re: Something new ?
Only to Plentychess, to show that learning matters.

Ras wrote: ↑Sat Nov 01, 2025 12:30 am
But that's akin to new engine versions, from a config management POV.
That's state change external to the engines. If the engines keep changing themselves all the time, then there really isn't anything under test any more.
Btw, in your test Plentychess vs. Stockfish - did you apply the same learning to both engines, or only Plentychess? If the latter, what happens when you do it for both sides? Does the 91 Elo jump shrink?
90% of coding is debugging, the other 10% is writing bugs.
-
Ras
- Posts: 2707
- Joined: Tue Aug 30, 2016 8:19 pm
- Full name: Rasmus Althoff
Re: Something new ?
I'd be curious to see what happens when both learn. My guess is that the learning impact would shrink, but not quite vanish, as the weaker side has more to gain from papering over difficult positions. Can you please test this?
Rasmus Althoff
https://www.ct800.net