Something new ?

Discussion of anything and everything relating to chess playing software and machines.

Moderator: Ras

Ras
Posts: 2707
Joined: Tue Aug 30, 2016 8:19 pm
Full name: Rasmus Althoff

Re: Something new ?

Post by Ras »

Rebel wrote: Fri Oct 31, 2025 11:58 am
The only reason I have taken this initiative is that the well-known rating lists (such as yours) refuse to let engines play like humans.
I think a big problem here is that with learning enabled, the engine under test would change by being tested. That's an issue with reproducibility and what results to use.
Rasmus Althoff
https://www.ct800.net
Modern Times
Posts: 3759
Joined: Thu Jun 07, 2012 11:02 pm

Re: Something new ?

Post by Modern Times »

Ras wrote: Fri Oct 31, 2025 9:24 pm
Rebel wrote: Fri Oct 31, 2025 11:58 am
The only reason I have taken this initiative is that the well-known rating lists (such as yours) refuse to let engines play like humans.
I think a big problem here is that with learning enabled, the engine under test would change by being tested. That's an issue with reproducibility and what results to use.
Yes, that is right. Rating tools assume that the engine that plays its first game is identical to the one that plays its 100,000th game. If the engine's learning is effective, the engines that play it later on are at a disadvantage compared to those that played it earlier. So it may distort the ratings; by how much I don't know. Maybe it is negligible and gets lost in the statistical error margins.
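As a toy illustration (invented numbers, not measured from any real list; wins and losses only, no draws): suppose the learning engine climbs from 3000 to 3100 Elo during a long run against a fixed 3000-rated pool. The single pooled rating a list would publish lands in between:

import math
import random

def expected(diff):
    # standard Elo expected score for a rating advantage of diff points
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

random.seed(1)
GAMES = 100_000
OPPONENT = 3000.0  # fixed-strength opponent pool (hypothetical)

score = 0.0
for g in range(GAMES):
    strength = 3000.0 + 100.0 * g / GAMES  # learning engine improves linearly
    score += random.random() < expected(strength - OPPONENT)

p = score / GAMES
# invert the expected-score formula to get the single published rating
pooled = OPPONENT + 400.0 * math.log10(p / (1.0 - p))
print(f"pooled rating: {pooled:.0f}")  # about 3050: between 3000 and 3100

In this toy setup an opponent meeting the engine late faces ~3100 strength while the list says ~3050, so its results look roughly 50 Elo worse than expected. Whether a real learning engine improves enough for that to show through the error margins is exactly the open question.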
chrisw
Posts: 4673
Joined: Tue Apr 03, 2012 4:28 pm
Location: Midi-Pyrénées
Full name: Christopher Whittington

Re: Something new ?

Post by chrisw »

Ras wrote: Fri Oct 31, 2025 9:24 pm
Rebel wrote: Fri Oct 31, 2025 11:58 am
The only reason I have taken this initiative is that the well-known rating lists (such as yours) refuse to let engines play like humans.
I think a big problem here is that with learning enabled, the engine under test would change by being tested. That's an issue with reproducibility and what results to use.
That’s okay; chess-playing entities, including humans, don’t stay constant. Why is “reproducibility” important, and what are you doing with these “results” anyway?

Btw, if you want “reproducibility”, ask engines to upload their changed-ness to the cloud for distribution back to their superseded siblings.
Rebel
Posts: 7406
Joined: Thu Aug 18, 2011 12:04 pm
Full name: Ed Schröder

Re: Something new ?

Post by Rebel »

Ras wrote: Fri Oct 31, 2025 9:24 pm
Rebel wrote: Fri Oct 31, 2025 11:58 am
The only reason I have taken this initiative is that the well-known rating lists (such as yours) refuse to let engines play like humans.
I think a big problem here is that with learning enabled, the engine under test would change by being tested. That's an issue with reproducibility and what results to use.
Regarding reproducibility, there isn't any; several rating lists (I won't mention names) constantly change opening sets.
90% of coding is debugging, the other 10% is writing bugs.
Rebel
Posts: 7406
Joined: Thu Aug 18, 2011 12:04 pm
Full name: Ed Schröder

Re: Something new ?

Post by Rebel »

chrisw wrote: Fri Oct 31, 2025 10:45 pm
Ras wrote: Fri Oct 31, 2025 9:24 pm
Rebel wrote: Fri Oct 31, 2025 11:58 am
The only reason I have taken this initiative is that the well-known rating lists (such as yours) refuse to let engines play like humans.
I think a big problem here is that with learning enabled, the engine under test would change by being tested. That's an issue with reproducibility and what results to use.
That’s okay; chess-playing entities, including humans, don’t stay constant. Why is “reproducibility” important, and what are you doing with these “results” anyway?

Btw, if you want “reproducibility”, ask engines to upload their changed-ness to the cloud for distribution back to their superseded siblings.
Books, learning, different styles, knowing the opponent: when done well, these mean Elo progress.

It's really hard to understand why the rating list guys don't want it.
90% of coding is debugging, the other 10% is writing bugs.
Ras
Posts: 2707
Joined: Tue Aug 30, 2016 8:19 pm
Full name: Rasmus Althoff

Re: Something new ?

Post by Ras »

chrisw wrote: Fri Oct 31, 2025 10:45 pm
Btw, if you want “reproducibility”, ask engines to upload their changed-ness to the cloud for distribution back to their superseded siblings.
But that's akin to new engine versions, from a config management POV.
Rebel wrote: Fri Oct 31, 2025 11:44 pm
Regarding reproducibility, there isn't any; several rating lists (I won't mention names) constantly change opening sets.
That's state change external to the engines. If the engines keep changing themselves all the time, then there really isn't anything under test any more.

Btw, in your test Plentychess vs. Stockfish - did you apply the same learning to both engines, or only Plentychess? If the latter, what happens when you do it for both sides? Does the 91 Elo jump shrink?
Rasmus Althoff
https://www.ct800.net
Graham Banks
Posts: 44818
Joined: Sun Feb 26, 2006 10:52 am
Location: Auckland, NZ

Re: Something new ?

Post by Graham Banks »

Rebel wrote: Fri Oct 31, 2025 11:44 pm
.....several rating lists (I won't mention names) constantly change opening sets.
Quite correct for CCRL and probably CEGT as well, but the opening sets are all what we like to call "balanced books". When creating a database of high-quality games, it's nice to have variety across the ECO classifications.
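Purely as an illustration of the ECO-variety point (the lines and the selection rule below are invented, not CCRL's actual tooling): one could draw the same number of openings from each ECO letter class when composing a set.

import random
from collections import defaultdict

# invented pool of opening lines tagged with ECO codes
OPENINGS = [
    ("A04", "1. Nf3 e6"),
    ("B90", "1. e4 c5 2. Nf3 d6 3. d4 cxd4 4. Nxd4 Nf6 5. Nc3 a6"),
    ("C65", "1. e4 e5 2. Nf3 Nc6 3. Bb5 Nf6"),
    ("D37", "1. d4 d5 2. c4 e6 3. Nf3 Nf6 4. Nc3 Be7"),
    ("E60", "1. d4 Nf6 2. c4 g6"),
    # ...many more lines in a real pool
]

def eco_balanced_set(openings, per_class):
    # pick the same number of lines from each ECO letter class A-E
    by_class = defaultdict(list)
    for eco, line in openings:
        by_class[eco[0]].append((eco, line))
    chosen = []
    for letter in "ABCDE":
        pool = by_class[letter]
        chosen.extend(random.sample(pool, min(per_class, len(pool))))
    return chosen

print(eco_balanced_set(OPENINGS, per_class=1))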

That's just the way that we chose to do things. Some enthusiasts are okay with it, others aren't.

I genuinely wish you the best for your new rating list though.
gbanksnz at gmail.com
jkominek
Posts: 77
Joined: Tue Sep 04, 2018 5:33 am
Full name: John Kominek

Re: Something new ?

Post by jkominek »

Rebel wrote: Fri Oct 31, 2025 9:05 am Something new ?
I want to start a rating list where opening books, learning features, and personalities may be freely used, as long as everything is legal. New creativity; I look forward to it.
I like this initiative. Rather than thinking of it as just another rating list, though, I'd portray it as an open experiment that has an associated rating list to track progress. If it were me, this is how I'd go about it.

o Start with a list of engines. Run a large round robin tournament with no engine-owned opening books and with learning features turned off. Compute the tournament rating list with BayesElo and/or Ordo.

o Repeat the round robin, this time extended with those of the above engines that have their own opening book. Preferably each book is provided by the engine author or an opening-book team member. Recompute the ratings.

o Continue in small batches, this time with learning features enabled for any engine that supports it. Importantly, for rating purposes switch to incremental rating update formulas like those used in human tournaments (see the sketch below). K=25 may be an appropriate update factor. Apply no FIDE-like rating-difference clipping, of course.
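In code, the update step would be something like this (a minimal sketch; the function name is illustrative, only the K=25 value and the no-clipping rule come from the proposal above):

def update_elo(r_a, r_b, score_a, k=25.0):
    # score_a: result for engine A (1 = win, 0.5 = draw, 0 = loss)
    # expected score from the standard Elo curve; no FIDE-like
    # rating-difference clipping is applied to the inputs
    expected_a = 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))
    delta = k * (score_a - expected_a)
    return r_a + delta, r_b - delta

# example: a 3300 engine draws a 3400 engine and gains about 3.5 points
print(update_elo(3300.0, 3400.0, 0.5))  # ≈ (3303.5, 3396.5)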

One outcome of the experiment is a comparison of how online learning performs over time against the same engine operating with a hand-crafted opening book. Rote learning traces back to early versions of Crafty (or before). It will be interesting to see which contemporary machine learning techniques can do better, perhaps with opponent-aware contempt as part of the mix.
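For reference, the general shape of rote learning is simple (a sketch of the idea only, not Crafty's actual implementation; all names here are invented):

from collections import defaultdict

class RoteLearner:
    def __init__(self):
        self.adjust = defaultdict(float)  # position key -> evaluation nudge

    def learn(self, position_keys, result, step=10.0):
        # result from the engine's point of view: 1 win, 0.5 draw, 0 loss
        nudge = step * (result - 0.5) * 2.0  # +step on a win, -step on a loss
        for key in position_keys:
            self.adjust[key] += nudge

    def bias(self, key):
        # added to the normal evaluation (or book weight) for this position
        return self.adjust[key]

# after a lost game, penalize the positions of the line that was played
learner = RoteLearner()
learner.learn(["key1", "key2", "key3"], result=0.0)
print(learner.bias("key2"))  # -10.0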
Rebel
Posts: 7406
Joined: Thu Aug 18, 2011 12:04 pm
Full name: Ed Schröder

Re: Something new ?

Post by Rebel »

Ras wrote: Sat Nov 01, 2025 12:30 am
chrisw wrote: Fri Oct 31, 2025 10:45 pm
Btw, if you want “reproducibility”, ask engines to upload their changed-ness to the cloud for distribution back to their superseded siblings.
But that's akin to new engine versions, from a config management POV.
Rebel wrote: Fri Oct 31, 2025 11:44 pm
Regarding reproducibility, there isn't any; several rating lists (I won't mention names) constantly change opening sets.
That's state change external to the engines. If the engines keep changing themselves all the time, then there really isn't anything under test any more.

Btw, in your test Plentychess vs. Stockfish - did you apply the same learning to both engines, or only Plentychess? If the latter, what happens when you do it for both sides? Does the 91 Elo jump shrink?
Only to Plentychess to show that learning matters.
90% of coding is debugging, the other 10% is writing bugs.
Ras
Posts: 2707
Joined: Tue Aug 30, 2016 8:19 pm
Full name: Rasmus Althoff

Re: Something new ?

Post by Ras »

Rebel wrote: Sat Nov 01, 2025 7:49 am
Only to Plentychess to show that learning matters.
I'd be curious to see what happens when both learn. My guess is that the learning impact would shrink, but not quite vanish, as the weaker side has more to gain from papering over difficult positions. Can you please test this?
Rasmus Althoff
https://www.ct800.net