Ras
- Posts: 2707
- Joined: Tue Aug 30, 2016 8:19 pm
- Full name: Rasmus Althoff
Re: Something new ?
I think a big problem here is that with learning enabled, the engine under test would change by being tested. That's an issue for reproducibility and for deciding which results to use.
Rasmus Althoff
https://www.ct800.net
-
Modern Times
- Posts: 3759
- Joined: Thu Jun 07, 2012 11:02 pm
Re: Something new ?
Yes, that is right. Rating tools assume that the engine that plays its first game is identical to the one that plays its 100,000th game. If the engine's learning is effective, then the engines that play it later on are at a disadvantage compared to those that played it earlier. So it may distort the ratings; by how much I don't know, maybe it is negligible and gets lost in the statistical error margins.
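As a rough illustration (the numbers below are assumed, not measured): suppose an engine learns its way from 2900 to 3000 Elo during a run against a fixed-strength 2950 opponent, while the rating tool fits one single rating to all of its games. The pooled rating lands between the early and late strength, so the early opponents effectively faced a weaker engine than the final number suggests.

import math

def expected(r_a, r_b):
    # Expected score of A vs B on the usual logistic Elo curve
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))

def rating_from_score(avg_score, r_opp):
    # Rating that would explain an average score against r_opp
    return r_opp - 400.0 * math.log10(1.0 / avg_score - 1.0)

opp = 2950.0                  # assumed fixed-strength opponent
early, late = 2900.0, 3000.0  # assumed strength before/after learning

# Half the games at the early strength, half at the late strength,
# but the rating tool fits one number to the pooled score.
pooled = 0.5 * (expected(early, opp) + expected(late, opp))
print(round(rating_from_score(pooled, opp)))  # 2950, between early and late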
-
chrisw
- Posts: 4673
- Joined: Tue Apr 03, 2012 4:28 pm
- Location: Midi-Pyrénées
- Full name: Christopher Whittington
Re: Something new ?
That’s okay; chess-playing entities, including humans, don’t stay constant. Why is “reproducibility” important, and what are you doing with these “results” anyway?
Btw, if you want “reproducibility”, ask engines to upload their changed-ness to the cloud for distribution back to their superseded siblings.
-
Rebel
- Posts: 7406
- Joined: Thu Aug 18, 2011 12:04 pm
- Full name: Ed Schröder
Re: Something new ?
Regarding reproducibility, there isn't any: several rating lists (I won't mention names) constantly change their opening sets.
90% of coding is debugging, the other 10% is writing bugs.
-
Rebel
- Posts: 7406
- Joined: Thu Aug 18, 2011 12:04 pm
- Full name: Ed Schröder
Re: Something new ?
Books, learning, different styles, knowing the opponent, when done well, mean Elo progress.

chrisw wrote: ↑Fri Oct 31, 2025 10:45 pm
That’s okay; chess-playing entities, including humans, don’t stay constant. Why is “reproducibility” important, and what are you doing with these “results” anyway?
Btw, if you want “reproducibility”, ask engines to upload their changed-ness to the cloud for distribution back to their superseded siblings.

It's really hard to understand why the rating list guys don't want it.
90% of coding is debugging, the other 10% is writing bugs.
-
Ras
- Posts: 2707
- Joined: Tue Aug 30, 2016 8:19 pm
- Full name: Rasmus Althoff
Re: Something new ?
But that's akin to new engine versions, from a config management POV.
That's state change external to the engines. If the engines keep changing themselves all the time, then there really isn't anything under test any more.
Btw, in your test Plentychess vs. Stockfish - did you apply the same learning to both engines, or only Plentychess? If the latter, what happens when you do it for both sides? Does the 91 Elo jump shrink?
Rasmus Althoff
https://www.ct800.net
-
Graham Banks
- Posts: 44818
- Joined: Sun Feb 26, 2006 10:52 am
- Location: Auckland, NZ
Re: Something new ?
Quite correct for CCRL, and probably CEGT as well, but they're all what we like to call "balanced books". When creating a database of high-quality games, it's nice to have variety across the ECO classifications.
That's just the way that we chose to do things. Some enthusiasts are okay with it, others aren't.
I genuinely wish you the best for your new rating list though.
gbanksnz at gmail.com
-
jkominek
- Posts: 77
- Joined: Tue Sep 04, 2018 5:33 am
- Full name: John Kominek
Re: Something new ?
I like this initiative. Rather than thinking of it as just another rating list, though, I'd portray it as an open experiment with an associated rating list to track progress. If it were me, this is how I'd go about it.
o Start with a list of engines. Run a large round-robin tournament with no engine-owned opening books and with learning features turned off. Compute the tournament rating list with BayesElo and/or Ordo.
o Repeat the round-robin gauntlet, extended with those of the above engines that have their own opening book. Preferably the book is provided by the engine author or an opening book team member. Recompute the ratings.
o Continue in small batches, this time with learning features enabled for any engine that supports it. Importantly, for rating purposes switch to incremental rating update formulas such as are used in human tournaments (a minimal sketch follows below). K=25 may be an appropriate update factor. Apply no FIDE-like rating difference clipping, of course.
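For concreteness, a minimal sketch of such an incremental update in Python, assuming the standard logistic Elo formula and the K=25 suggested above (the helper names are illustrative, not taken from BayesElo or Ordo):

def expected_score(r_a, r_b):
    # Expected score of A against B on the logistic Elo curve
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))

def update(r_a, r_b, score_a, k=25.0):
    # score_a is 1.0, 0.5 or 0.0 from A's point of view; both ratings move,
    # so a learning engine's rating tracks its current strength game by game
    delta = k * (score_a - expected_score(r_a, r_b))
    return r_a + delta, r_b - delta

# Example: a learning engine at 3000 beats a 3100 opponent and gains about 16 points.
print(update(3000.0, 3100.0, 1.0))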
One outcome of the experiment would be a comparison of how online learning performs over time against the same engine operating with a hand-crafted opening book. Rote learning traces back to early versions of Crafty (or before). It will be interesting to see which contemporary machine learning techniques can do better, perhaps with opponent-aware contempt as part of the mix.
-
Rebel
- Posts: 7406
- Joined: Thu Aug 18, 2011 12:04 pm
- Full name: Ed Schröder
Re: Something new ?
Only to Plentychess, to show that learning matters.

Ras wrote: ↑Sat Nov 01, 2025 12:30 am
But that's akin to new engine versions, from a config management POV.
That's state change external to the engines. If the engines keep changing themselves all the time, then there really isn't anything under test any more.
Btw, in your test Plentychess vs. Stockfish - did you apply the same learning to both engines, or only Plentychess? If the latter, what happens when you do it for both sides? Does the 91 Elo jump shrink?
90% of coding is debugging, the other 10% is writing bugs.
-
Ras
- Posts: 2707
- Joined: Tue Aug 30, 2016 8:19 pm
- Full name: Rasmus Althoff
Re: Something new ?
I'd be curious to see what happens when both learn. My guess is that the learning impact would shrink, but not quite vanish, as the weaker side has more to gain from papering over difficult positions. Can you please test this?
Rasmus Althoff
https://www.ct800.net