About rating list and unbalanced opening

Uri Blass
Posts: 10892
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Post by Uri Blass »

I think it may be interesting to have 2 ratings for every engine:
1) a rating with the stronger side
2) a rating with the weaker side
It would be interesting to see whether there are cases where engine A is stronger than B at winning good positions (rating 1) but weaker than B at saving bad positions (rating 2).

Of course, we need enough games so that the difference between engines in rating 1 and rating 2 is above statistical noise.
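
To make "above statistical noise" concrete, here is a minimal sketch, assuming the win/draw/loss counts of an engine with one side are known and that the two scores being compared are independent (games from a shared tournament are not strictly independent). The function names and the cutoff of 2 standard errors are illustrative assumptions, not a convention of any rating tool.

```python
import math


def score_and_stderr(wins, draws, losses):
    """Return (score, standard error) for a set of game results."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of a result taking the values 1, 0.5 or 0.
    variance = (wins * 1.0 + draws * 0.25) / n - score ** 2
    return score, math.sqrt(variance / n)


def differs_beyond_noise(a, b, z=2.0):
    """a, b: (score, stderr) pairs. True if the gap exceeds z combined standard errors."""
    (score_a, err_a), (score_b, err_b) = a, b
    return abs(score_a - score_b) > z * math.sqrt(err_a ** 2 + err_b ** 2)
```

For n = 7,500 games the standard error of a score is at most about 0.6 percentage points (the worst case, a 50% score with no draws); draws only make it smaller.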
Ajedrecista
Posts: 2125
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Re: About rating lists and unbalanced openings.

Post by Ajedrecista »

Hello Uri:

It could be interesting, of course, but we should first define very well which openings are unbalanced and which are not:
  • Single-core analysis, so that it is deterministic? I would recommend this.
  • Surpassing a given evaluation threshold from a specific top engine at a minimum depth (which engine and which depth)? Let us call it the referee engine, the umpire engine or the anchor engine.
    • The same as before, but with a pool of top engines?
      • Which ones? Surely chosen by Elo and being sufficiently different from one another (i.e. neither derivatives nor clones).
      • The same evaluation threshold for every engine? I guess not, but how to define it? Trial and error?
      • The same minimum depth for every engine? I guess not, but how to define it? Trial and error?
    • If the referee/umpire/anchor engine(s) is/are updated, re-run the analysis or not?
      • This might add or remove some openings over time. Is this acceptable?
  • How many moves at most to determine an opening? Probably subjective.
    • Maybe a material threshold if many pieces are traded early in the game? Trial and error?
Too many questions and too few answers from me, sadly. The clearest thing I see is single-core analysis, to get consistent results across different systems and users.
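
The threshold idea in the list above can be sketched as follows. This is only an illustration under stated assumptions: the evaluations are supplied by the caller (one centipawn score per referee engine, from White's point of view, at whatever fixed depth was chosen), and the names `is_unbalanced`, `low_cp`, `high_cp` and `min_agree`, as well as the default numbers, are hypothetical placeholders rather than values anyone in this thread has proposed.

```python
def is_unbalanced(evals_cp, low_cp=90, high_cp=130, min_agree=1.0):
    """Classify an opening from a pool of referee-engine evaluations.

    evals_cp  : centipawn evaluations, one per referee engine (White's view)
    low_cp    : placeholder lower bound for "clearly better"
    high_cp   : placeholder upper bound, so already-won positions are rejected
    min_agree : fraction of referee engines that must land inside the window
    """
    if not evals_cp:
        raise ValueError("at least one evaluation is required")
    inside = [low_cp <= cp <= high_cp for cp in evals_cp]
    return sum(inside) / len(inside) >= min_agree
```

With these placeholder defaults, `is_unbalanced([100, 110, 120])` accepts the opening, while `is_unbalanced([100, 50])` rejects it unless `min_agree` is lowered to 0.5. Per-engine thresholds or depths, as asked about above, would need a different signature.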

Regards from Spain.

Ajedrecista.
pohl4711
Posts: 2804
Joined: Sat Sep 03, 2011 7:25 am
Location: Berlin, Germany
Full name: Stefan Pohl

Re: About rating list and unbalanced opening

Post by pohl4711 »

"It may be interesting if there are cases when engine A is stronger than B in winning good positions(rating 1) but weaker than B in saving bad positions(rating2)"

For me, this is easy: because of my UHO openings, White is always better at the beginning of the game in my UHO-Top15 ratinglist.

I made a quick tool to separate each engine's white games from its black games, and made two ranking lists out of it. Here are the results:

Overall:

     Program                     Elo    +    - Games    Score Av.Op.   Draws
   1 Stockfish 250809 a512    : 3859    4    4 15000    67.8%   3725   48.2%
   2 Stockfish 17.1 250330    : 3854    4    4 15000    67.1%   3725   48.8%
   3 Torch 4 a512             : 3813    4    4 15000    61.5%   3728   49.3%
   4 Obsidian 250706 a512     : 3791    4    4 15000    58.4%   3730   49.9%
   5 PlentyChess 6.15 a512    : 3774    4    4 15000    56.0%   3731   49.6%
   6 Integral 250708 a512     : 3754    4    4 15000    53.1%   3732   50.3%
   7 Berserk 250606 a512      : 3746    4    4 15000    51.9%   3733   50.4%
   8 Reckless 250630 bmi2     : 3736    4    4 15000    50.4%   3733   49.1%
   9 Alexandria 8.0 a512      : 3715    4    4 15000    47.4%   3735   50.2%
  10 KomodoDragon 3.3 avx2    : 3695    4    4 15000    44.5%   3736   49.9%
  11 Caissa 1.22 a512         : 3684    4    4 15000    42.8%   3737   50.8%
  12 Viridithas 250712        : 3680    4    4 15000    42.2%   3737   50.1%
  13 Stormphrax 7.0 avx2      : 3677    4    4 15000    41.8%   3737   48.9%
  14 RubiChess 250606 a512    : 3657    4    4 15000    38.9%   3739   49.9%
  15 Horsie 1.1 a512          : 3656    4    4 15000    38.8%   3739   48.7%
  16 Clover 8.2 a512          : 3645    4    4 15000    37.4%   3739   51.4%

Engine percentages, white games only (because of UHO, the engine is always better at the beginning):

46.40%     Stockfish 250809 a512 
45.95%     Stockfish 17.1 250330 
42.91%     Torch 4 a512 
41.40%     Obsidian 250706 a512 
40.20%     PlentyChess 6.15 a512 
38.68%     Integral 250708 a512 
38.03%     Berserk 250606 a512 
37.57%     Reckless 250630 bmi2 
35.82%     Alexandria 8.0 a512 
34.12%     KomodoDragon 3.3 avx2 
33.28%     Stormphrax 7.0 avx2 
33.27%     Caissa 1.22 a512 
32.90%     Viridithas 250712 
31.85%     Horsie 1.1 a512 
31.23%     RubiChess 250606 a512 
30.46%     Clover 8.2 a512 

Engine percentages, black games only (because of UHO, the engine is always worse at the beginning):

21.39%     Stockfish 250809 a512 
21.18%     Stockfish 17.1 250330 
18.57%     Torch 4 a512 
17.04%     Obsidian 250706 a512 
15.82%     PlentyChess 6.15 a512 
14.44%     Integral 250708 a512 
13.89%     Berserk 250606 a512 
12.82%     Reckless 250630 bmi2 
11.53%     Alexandria 8.0 a512 
10.34%     KomodoDragon 3.3 avx2 
09.56%     Caissa 1.22 a512 
09.30%     Viridithas 250712 
08.50%     Stormphrax 7.0 avx2 
07.71%     RubiChess 250606 a512 
06.92%     Horsie 1.1 a512 
06.89%     Clover 8.2 a512

Conclusion: The results differ very little. The top-10 rankings are identical in all 3 lists, and the last 6 engines are very close in strength, so the rank changes that appear when splitting into white and black games (ranks 11-16) are more or less statistical noise.
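
A tool of this kind could be sketched as below. This is not the tool used for the lists above, only a minimal illustration that reads the `White`, `Black` and `Result` tags of a PGN file with the standard library. It assumes every game has at least one line of movetext after its headers, and it assumes the percentages are each engine's points with that color divided by all of its games, which is inferred from the white and black columns summing to the overall score (46.40% + 21.39% ≈ 67.8%). The names `split_by_color` and `color_shares` are hypothetical.

```python
import re
from collections import defaultdict

# Matches a PGN tag pair such as: [White "Stockfish 17.1"]
TAG = re.compile(r'^\[(\w+)\s+"(.*)"\]$')
# (White points, Black points) per result token; unfinished games ("*") are skipped.
POINTS = {"1-0": (1.0, 0.0), "0-1": (0.0, 1.0), "1/2-1/2": (0.5, 0.5)}


def split_by_color(pgn_text):
    """Return {engine: {"white": points, "black": points, "games": count}}."""
    stats = defaultdict(lambda: {"white": 0.0, "black": 0.0, "games": 0})
    tags, in_headers = {}, False

    def record(game_tags):
        result = POINTS.get(game_tags.get("Result"))
        if result is None or "White" not in game_tags or "Black" not in game_tags:
            return
        white, black = game_tags["White"], game_tags["Black"]
        stats[white]["white"] += result[0]
        stats[white]["games"] += 1
        stats[black]["black"] += result[1]
        stats[black]["games"] += 1

    for raw in pgn_text.splitlines():
        line = raw.strip()
        match = TAG.match(line)
        if match:
            if not in_headers:  # first tag after movetext: a new game starts
                record(tags)
                tags, in_headers = {}, True
            tags[match.group(1)] = match.group(2)
        elif line:
            in_headers = False  # a movetext line ends the header section
    record(tags)  # the last game in the file
    return dict(stats)


def color_shares(stats):
    """Points with each color as a percentage of ALL games of the engine."""
    return {
        name: (100 * s["white"] / s["games"], 100 * s["black"] / s["games"])
        for name, s in stats.items()
    }
```

Sorting the engines by the first or the second value returned by `color_shares` then gives the white-only and black-only rankings.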
chesskobra
Posts: 354
Joined: Thu Jul 21, 2022 12:30 am
Full name: Chesskobra

Re: About rating list and unbalanced opening

Post by chesskobra »

Is there much difference between these rating lists and CCRL? Is the difference just statistical noise or not?
Uri Blass
Posts: 10892
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: About rating list and unbalanced opening

Post by Uri Blass »

pohl4711 wrote: Sat Aug 16, 2025 7:55 am "It may be interesting if there are cases when engine A is stronger than B in winning good positions(rating 1) but weaker than B in saving bad positions(rating2)"

Conclusion: The results differ very little. The top-10 rankings are identical in all 3 lists, and the last 6 engines are very close in strength, so the rank changes that appear when splitting into white and black games (ranks 11-16) are more or less statistical noise.
Thanks.

In your list there is no big difference, but it may be different in TCEC, or maybe that is only statistical noise.

Obsidian has 12 wins in TCEC (even 1 more than Stockfish after 36 rounds), but it has 7 losses, which is clearly more than Stockfish, Lc0 and Integral, which have only 2 losses, so I thought Obsidian might get a relatively better rating with White than with Black.
pohl4711
Posts: 2804
Joined: Sat Sep 03, 2011 7:25 am
Location: Berlin, Germany
Full name: Stefan Pohl

Re: About rating list and unbalanced opening

Post by pohl4711 »

I made a clean output and released the results on my experiments-subsite:

https://www.sp-cc.de/experiments.htm

                              All     white    rank    black   rank
   1 Stockfish 250809 a512 : 67.8%    46.40%     =     21.39%    =
   2 Stockfish 17.1 250330 : 67.1%    45.95%     =     21.18%    =
   3 Torch 4 a512          : 61.5%    42.91%     =     18.57%    =
   4 Obsidian 250706 a512  : 58.4%    41.40%     =     17.04%    =
   5 PlentyChess 6.15 a512 : 56.0%    40.20%     =     15.82%    =
   6 Integral 250708 a512  : 53.1%    38.68%     =     14.44%    =
   7 Berserk 250606 a512   : 51.9%    38.03%     =     13.89%    =
   8 Reckless 250630 bmi2  : 50.4%    37.57%     =     12.82%    =
   9 Alexandria 8.0 a512   : 47.4%    35.82%     =     11.53%    =
  10 KomodoDragon 3.3 avx2 : 44.5%    34.12%     =     10.34%    =
  11 Caissa 1.22 a512      : 42.8%    33.27%    +1     09.56%    =
  12 Viridithas 250712     : 42.2%    32.90%    +1     09.30%    =
  13 Stormphrax 7.0 avx2   : 41.8%    33.28%    -2     08.50%    =
  14 RubiChess 250606 a512 : 38.9%    31.23%    +1     07.71%    =
  15 Horsie 1.1 a512       : 38.8%    31.85%    -1     06.92%    =
  16 Clover 8.2 a512       : 37.4%    30.46%     =     06.89%    =

Explanation, using Horsie 1.1 as an example: rank -1 means that in the white-games-only ranking, Horsie 1.1 is on rank 14, not rank 15 as in the full ranking list. "rank =" means that the engine's rank is identical to its rank in the full ranking list.
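
The rank column can be reproduced from the percentages in the table. The following is a small sketch, with the hypothetical helper name `rank_shifts`; the two input lists are the white and black columns above, in full-list order.

```python
def rank_shifts(scores):
    """scores are listed in full-list order; returns (new rank - full-list rank)."""
    # Indices sorted by descending score; ties keep their full-list order.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    new_rank = {idx: pos + 1 for pos, idx in enumerate(order)}
    return [new_rank[i] - (i + 1) for i in range(len(scores))]


white_pct = [46.40, 45.95, 42.91, 41.40, 40.20, 38.68, 38.03, 37.57,
             35.82, 34.12, 33.27, 32.90, 33.28, 31.23, 31.85, 30.46]
black_pct = [21.39, 21.18, 18.57, 17.04, 15.82, 14.44, 13.89, 12.82,
             11.53, 10.34, 9.56, 9.30, 8.50, 7.71, 6.92, 6.89]

white_shifts = rank_shifts(white_pct)
black_shifts = rank_shifts(black_pct)
print(white_shifts[10:15])  # Caissa, Viridithas, Stormphrax, RubiChess, Horsie
```

A positive value means the engine sits lower in the single-color ranking than in the full list, which matches the sign convention of the table.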

Note that ranks 11-15 in the full list are very close in strength (5 engines within a range of only 28 Celo), so it is no surprise that slight changes in the ranking of these 5 engines can happen when looking only at the games played with White. RubiChess and Horsie switching ranks is really no surprise: in the full list, the two engines are only 0.1% apart in score.

Conclusion: It makes no measurable difference whether engines start the game with an advantage or a disadvantage. So using UHO openings does not lead to any distortions compared to balanced openings. As expected.