About rating lists and unbalanced openings

Uri Blass · Post by **Uri Blass** » Fri Aug 15, 2025 6:51 pm

I think it may be interesting to have two ratings for every engine:
1) a rating with the stronger side
2) a rating with the weaker side
It would be interesting to see whether there are cases where engine A is stronger than B at winning good positions (rating 1) but weaker than B at saving bad positions (rating 2).

Of course, we need enough games so that the difference between engines in rating 1 and rating 2 is above statistical noise.
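As a rough illustration of the "above statistical noise" requirement, the sketch below turns win/draw/loss counts into a score with a standard error and checks whether two engines' scores from the same side differ by more than the combined 95% margin. The function names are made up for this example, the normal approximation and the independence of the two samples are assumptions, and the counts in the usage lines are invented rather than taken from any rating list.

```python
import math

def elo_from_score(score: float) -> float:
    """Elo difference implied by a score strictly between 0 and 1 (logistic model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def score_and_stderr(wins: int, draws: int, losses: int) -> tuple[float, float]:
    """Mean score per game and its standard error, from W/D/L counts."""
    n = wins + draws + losses
    mean = (wins + 0.5 * draws) / n
    # Per-game variance of the outcome (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - mean) ** 2
           + draws * (0.5 - mean) ** 2
           + losses * mean ** 2) / n
    return mean, math.sqrt(var / n)

def differs_beyond_noise(a, b, z: float = 1.96) -> bool:
    """True if two (wins, draws, losses) samples differ by more than z combined standard errors."""
    mean_a, se_a = score_and_stderr(*a)
    mean_b, se_b = score_and_stderr(*b)
    return abs(mean_a - mean_b) > z * math.hypot(se_a, se_b)

# Invented counts for two engines, both playing the stronger side ("rating 1").
engine_a_strong_side = (3000, 4000, 500)
engine_b_strong_side = (2000, 4000, 1500)
print(differs_beyond_noise(engine_a_strong_side, engine_b_strong_side))
```

The same check would be run a second time on the weaker-side games ("rating 2"); the case of interest is when both checks pass but point in opposite directions.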

Ajedrecista · Post by **Ajedrecista** » Fri Aug 15, 2025 9:27 pm

Hello Uri:

It could be interesting, of course, but we should first define precisely which openings count as unbalanced and which do not:

Single-core analysis, so that it is deterministic? I would recommend this.

Surpassing a given evaluation threshold by a specific top engine at a minimum depth (which engine and which depth)? Let us call it the referee engine, the umpire engine or the anchor engine.
- The same as before, but with a pool of top engines?
  - Which ones? Surely chosen by Elo and sufficiently different from one another (i.e. neither derivatives nor clones).
  - The same evaluation threshold for every engine? I guess not, but how to define it? Trial and error?
  - The same minimum depth for every engine? I guess not, but how to define it? Trial and error?
- If the referee/umpire/anchor engine(s) is/are updated, should the analysis be re-run or not?
  - This might add or remove some openings over time. Is this acceptable?

How many moves at most to determine an opening? Probably subjective.
- Maybe a material threshold if many pieces are traded early in the game? Trial and error?

Too many questions and too few answers from me, sadly. The clearest thing I see is single-core analysis, to get consistent results across different systems and users.

Regards from Spain.

Ajedrecista.
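One way to make the questions in this post concrete is to write the rule down as code. The sketch below is only one possible answer and is not proposed anywhere in the thread: every referee engine has its own threshold and minimum depth, and an opening counts as unbalanced only if all referees reach their depth, exceed their threshold and agree on the favored side. All class names, engine names and numbers are hypothetical, and the evaluations would have to come from separate single-core analysis runs.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Referee:
    name: str
    threshold_cp: int  # minimum absolute evaluation, in centipawns
    min_depth: int     # minimum search depth the analysis must reach

@dataclass(frozen=True)
class Analysis:
    eval_cp: int  # evaluation from White's point of view, in centipawns
    depth: int    # depth actually reached

def is_unbalanced(analyses: dict[str, Analysis], referees: list[Referee]) -> bool:
    """True only if every referee confirms an advantage for the same side."""
    favored_sides = set()
    for ref in referees:
        analysis = analyses.get(ref.name)
        if analysis is None or analysis.depth < ref.min_depth:
            return False  # missing or too shallow analysis
        if abs(analysis.eval_cp) < ref.threshold_cp:
            return False  # this referee considers the opening too balanced
        favored_sides.add(analysis.eval_cp > 0)
    return len(favored_sides) == 1  # also False for an empty referee pool

# Hypothetical pool and hypothetical analysis results for one opening.
pool = [Referee("EngineX", 100, 30), Referee("EngineY", 80, 25)]
opening = {"EngineX": Analysis(120, 32), "EngineY": Analysis(95, 26)}
print(is_unbalanced(opening, pool))
```

Re-running the function after a referee update shows directly which openings would enter or leave the set.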

pohl4711 · Post by **pohl4711** » Sat Aug 16, 2025 7:55 am

"It may be interesting if there are cases when engine A is stronger than B in winning good positions (rating 1) but weaker than B in saving bad positions (rating 2)"

For me, this is easy to check. Because of my UHO openings, White is always better at the beginning of the game in my UHO-Top15 rating list.

I made a quick tool to separate each engine's white games from its black games, and built two ranking lists from it. Here are the results:

Overall:

     Program                  :  Elo    +    - Games    Score  Av.Op   Draws
   1 Stockfish 250809 a512    : 3859    4    4 15000    67.8%   3725   48.2%
   2 Stockfish 17.1 250330    : 3854    4    4 15000    67.1%   3725   48.8%
   3 Torch 4 a512             : 3813    4    4 15000    61.5%   3728   49.3%
   4 Obsidian 250706 a512     : 3791    4    4 15000    58.4%   3730   49.9%
   5 PlentyChess 6.15 a512    : 3774    4    4 15000    56.0%   3731   49.6%
   6 Integral 250708 a512     : 3754    4    4 15000    53.1%   3732   50.3%
   7 Berserk 250606 a512      : 3746    4    4 15000    51.9%   3733   50.4%
   8 Reckless 250630 bmi2     : 3736    4    4 15000    50.4%   3733   49.1%
   9 Alexandria 8.0 a512      : 3715    4    4 15000    47.4%   3735   50.2%
  10 KomodoDragon 3.3 avx2    : 3695    4    4 15000    44.5%   3736   49.9%
  11 Caissa 1.22 a512         : 3684    4    4 15000    42.8%   3737   50.8%
  12 Viridithas 250712        : 3680    4    4 15000    42.2%   3737   50.1%
  13 Stormphrax 7.0 avx2      : 3677    4    4 15000    41.8%   3737   48.9%
  14 RubiChess 250606 a512    : 3657    4    4 15000    38.9%   3739   49.9%
  15 Horsie 1.1 a512          : 3656    4    4 15000    38.8%   3739   48.7%
  16 Clover 8.2 a512          : 3645    4    4 15000    37.4%   3739   51.4%

Engine percentages in white games only (with UHO openings, the engine playing White is always better at the beginning):

46.40%     Stockfish 250809 a512 
45.95%     Stockfish 17.1 250330 
42.91%     Torch 4 a512 
41.40%     Obsidian 250706 a512 
40.20%     PlentyChess 6.15 a512 
38.68%     Integral 250708 a512 
38.03%     Berserk 250606 a512 
37.57%     Reckless 250630 bmi2 
35.82%     Alexandria 8.0 a512 
34.12%     KomodoDragon 3.3 avx2 
33.28%     Stormphrax 7.0 avx2 
33.27%     Caissa 1.22 a512 
32.90%     Viridithas 250712 
31.85%     Horsie 1.1 a512 
31.23%     RubiChess 250606 a512 
30.46%     Clover 8.2 a512

Engine percentages in black games only (with UHO openings, the engine playing Black is always worse at the beginning):

21.39%     Stockfish 250809 a512 
21.18%     Stockfish 17.1 250330 
18.57%     Torch 4 a512 
17.04%     Obsidian 250706 a512 
15.82%     PlentyChess 6.15 a512 
14.44%     Integral 250708 a512 
13.89%     Berserk 250606 a512 
12.82%     Reckless 250630 bmi2 
11.53%     Alexandria 8.0 a512 
10.34%     KomodoDragon 3.3 avx2 
09.56%     Caissa 1.22 a512 
09.30%     Viridithas 250712 
08.50%     Stormphrax 7.0 avx2 
07.71%     RubiChess 250606 a512 
06.92%     Horsie 1.1 a512 
06.89%     Clover 8.2 a512

Conclusion: The results differ very, very little. The top-10 rankings are identical in all three lists, and the last six engines are very close in strength, so the rank changes that appear when splitting into white and black games (ranks 11-16) are more or less statistical noise.
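The tool itself is not shown in the post, so the following is only a minimal sketch of how such a split could be done from a PGN file, using the standard `White`, `Black` and `Result` tags. In the lists above, each engine's white and black percentages add up to its overall score (for example 46.40% + 21.39% ≈ 67.8%), which suggests that the points per color were divided by the engine's total number of games; the sketch follows that convention as an assumption. The function names are invented for this example.

```python
import re
from collections import defaultdict

TAG = re.compile(r'^\[(\w+)\s+"(.*)"\]$')
# Points for (White, Black) per finished result; other results are ignored.
POINTS = {"1-0": (1.0, 0.0), "0-1": (0.0, 1.0), "1/2-1/2": (0.5, 0.5)}

def iter_headers(pgn_text):
    """Yield one dict of tag pairs per game; movetext is skipped."""
    headers = {}
    for line in pgn_text.splitlines():
        match = TAG.match(line.strip())
        if match:
            headers[match.group(1)] = match.group(2)
        elif headers:
            # The first non-tag line after a tag section closes that game's headers.
            yield headers
            headers = {}
    if headers:
        yield headers

def color_shares(pgn_text):
    """Per engine: points scored as White and as Black, each divided by all of its games."""
    points = defaultdict(lambda: {"white": 0.0, "black": 0.0})
    games = defaultdict(int)
    for h in iter_headers(pgn_text):
        result = POINTS.get(h.get("Result", ""))
        if result is None or "White" not in h or "Black" not in h:
            continue  # unfinished or malformed game
        points[h["White"]]["white"] += result[0]
        points[h["Black"]]["black"] += result[1]
        games[h["White"]] += 1
        games[h["Black"]] += 1
    return {name: {side: pts / games[name] for side, pts in sides.items()}
            for name, sides in points.items()}
```

Sorting the engines once by the `white` value and once by the `black` value then gives the two ranking lists.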

chesskobra · Post by **chesskobra** » Sat Aug 16, 2025 1:01 pm

Is there much difference between these rating lists and CCRL? Is the difference just statistical noise or not?

Uri Blass · Post by **Uri Blass** » Sat Aug 16, 2025 5:20 pm

pohl4711 wrote: ↑Sat Aug 16, 2025 7:55 am Conclusion: The results differ very, very little. Top10 rankings are identical in all 3 lists and the last 6 are very close in strength, so a splitting into white and black games and building rankings here, is more or less statistical noise (rank 11-16)

Thanks.

In your list there is no big difference, but it may be different in TCEC, or maybe that is only statistical noise.

Obsidian has 12 wins in TCEC (even 1 more than Stockfish after 36 rounds of TCEC), but it has 7 losses, which is clearly more than Stockfish, Lc0 and Integral, which have only 2 losses, so I thought that maybe Obsidian would get a relatively better rating with White than with Black.

pohl4711 · Post by **pohl4711** » Sun Aug 17, 2025 5:31 am

I made a clean output and released the results on my experiments-subsite:

https://www.sp-cc.de/experiments.htm

                              All     white    rank    black   rank
   1 Stockfish 250809 a512 : 67.8%    46.40%     =     21.39%    =
   2 Stockfish 17.1 250330 : 67.1%    45.95%     =     21.18%    =
   3 Torch 4 a512          : 61.5%    42.91%     =     18.57%    =
   4 Obsidian 250706 a512  : 58.4%    41.40%     =     17.04%    =
   5 PlentyChess 6.15 a512 : 56.0%    40.20%     =     15.82%    =
   6 Integral 250708 a512  : 53.1%    38.68%     =     14.44%    =
   7 Berserk 250606 a512   : 51.9%    38.03%     =     13.89%    =
   8 Reckless 250630 bmi2  : 50.4%    37.57%     =     12.82%    =
   9 Alexandria 8.0 a512   : 47.4%    35.82%     =     11.53%    =
  10 KomodoDragon 3.3 avx2 : 44.5%    34.12%     =     10.34%    =
  11 Caissa 1.22 a512      : 42.8%    33.27%    +1     09.56%    =
  12 Viridithas 250712     : 42.2%    32.90%    +1     09.30%    =
  13 Stormphrax 7.0 avx2   : 41.8%    33.28%    -2     08.50%    =
  14 RubiChess 250606 a512 : 38.9%    31.23%    +1     07.71%    =
  15 Horsie 1.1 a512       : 38.8%    31.85%    -1     06.92%    =
  16 Clover 8.2 a512       : 37.4%    30.46%     =     06.89%    =

Explanation, taking Horsie 1.1 as an example: rank -1 means that in the white-games-only ranking Horsie 1.1 is at rank 14, not at rank 15 as in the full ranking list. "rank =" means that the engine's rank is identical to its rank in the full ranking list.

Note that ranks 11-15 in the full list are very close in strength (5 engines within a range of only 28 Celo), so it is no surprise that slight changes in the ranking of these 5 engines can occur when looking only at the games played with White. RubiChess and Horsie switching ranks is really no surprise: in the full list, the two engines are only 0.1% apart in score.

Conclusion: It makes no measurable difference whether engines start the game with an advantage or a disadvantage. So using UHO openings does not lead to any distortions compared to balanced openings. As expected.
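The "rank" column for the white games can be reproduced from the numbers in the table. The sketch below is not the author's tool, just a small check under the sign convention explained above (rank in the white-only list minus rank in the full list); the function name is made up, and the data are the full-list order and the white percentages copied from the table.

```python
def rank_shifts(full_order, side_scores):
    """Rank in the side-only list minus rank in the full list, per engine."""
    side_order = sorted(full_order, key=lambda name: -side_scores[name])
    side_rank = {name: i for i, name in enumerate(side_order, start=1)}
    return {name: side_rank[name] - i for i, name in enumerate(full_order, start=1)}

# Engines in full-list order, with their white-game percentages from the table.
white = {
    "Stockfish 250809": 46.40, "Stockfish 17.1": 45.95, "Torch 4": 42.91,
    "Obsidian 250706": 41.40, "PlentyChess 6.15": 40.20, "Integral 250708": 38.68,
    "Berserk 250606": 38.03, "Reckless 250630": 37.57, "Alexandria 8.0": 35.82,
    "KomodoDragon 3.3": 34.12, "Caissa 1.22": 33.27, "Viridithas 250712": 32.90,
    "Stormphrax 7.0": 33.28, "RubiChess 250606": 31.23, "Horsie 1.1": 31.85,
    "Clover 8.2": 30.46,
}
full_order = list(white)  # dicts keep insertion order, i.e. the full-list ranking

for name, shift in rank_shifts(full_order, white).items():
    if shift:
        print(f"{name}: {shift:+d}")
```

Only the five engines at ranks 11-15 produce a non-zero shift, in line with the table.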
