I think it may be interesting to have two ratings for every engine:
1) rating with the stronger side
2) rating with the weaker side
It would be interesting to see whether there are cases where engine A is stronger than B at winning good positions (rating 1) but weaker than B at saving bad positions (rating 2).
Of course, we need enough games for the difference between engines in rating 1 and rating 2 to be above statistical noise.
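As a rough illustration of the "above statistical noise" condition, here is a minimal sketch that compares two engines' scores from the same side using a normal approximation. The function names and the win/draw/loss counts are hypothetical, and treating the two samples as independent is an assumption, so this is only a first sanity check, not a replacement for a proper rating tool.

```python
import math

def score_and_se(wins, draws, losses):
    """Return (mean score per game, standard error of that mean) from W/D/L counts."""
    n = wins + draws + losses
    mean = (wins + 0.5 * draws) / n
    # Per-game variance of a result that is worth 1, 0.5 or 0 points.
    var = (wins * (1 - mean) ** 2
           + draws * (0.5 - mean) ** 2
           + losses * (0 - mean) ** 2) / n
    return mean, math.sqrt(var / n)

def differs_beyond_noise(a, b, z=1.96):
    """True if two scores differ by more than z combined standard errors.

    Assumption: the two samples are independent (hypothetical simplification).
    """
    mean_a, se_a = score_and_se(*a)
    mean_b, se_b = score_and_se(*b)
    return abs(mean_a - mean_b) > z * math.sqrt(se_a ** 2 + se_b ** 2)

# Hypothetical W/D/L counts for engines A and B when playing the stronger side.
engine_a = (4000, 3200, 300)
engine_b = (3900, 3300, 300)
print(differs_beyond_noise(engine_a, engine_b))
```

The same check would be repeated for the weaker side; only if both comparisons clear the threshold in opposite directions would the "A wins better, B defends better" case be supported.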
About rating list and unballanced opening
Moderator: Ras
-
- Posts: 10892
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
-
- Posts: 2125
- Joined: Wed Jul 13, 2011 9:04 pm
- Location: Madrid, Spain.
Re: About rating lists and unbalanced openings.
Hello Uri:
It could be interesting, of course, but first we should define very clearly which openings count as unbalanced and which do not:
- Single-core analysis, so that it is deterministic? I would recommend this.
- Exceeding a certain evaluation threshold by a specific top engine at a minimum depth (which engine and which depth)? Let us call it the referee engine, the umpire engine or the anchor engine.
- The same as before, but with a pool of top engines?
  - Which ones? Surely chosen by Elo and different enough from each other (i.e. neither derivatives nor clones).
  - The same evaluation threshold for every engine? I guess not, but how to define it? Trial and error?
  - The same minimum depth for every engine? I guess not, but how to define it? Trial and error?
  - If the referee/umpire/anchor engine(s) is/are updated, re-run the analysis or not?
  - This might add or remove some openings over time. Is this acceptable?
- How many moves at most to determine an opening? Probably subjective.
- Maybe a material threshold in case many pieces are traded early in the game? Trial and error?
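To make the referee-pool idea concrete, here is a minimal sketch of one possible decision rule: each referee engine has its own threshold, and an opening counts as unbalanced only if enough referees agree on the same side. The engine names, thresholds, evaluations and the `min_agree` parameter are all hypothetical; how the evaluations are obtained (engine, depth, single core) is left out on purpose.

```python
def is_unbalanced(evals_cp, thresholds_cp, min_agree=None):
    """Decide whether an opening counts as unbalanced.

    evals_cp: engine name -> evaluation in centipawns from White's point of view.
    thresholds_cp: engine name -> minimum absolute evaluation for that engine.
    min_agree: how many referees must agree on the same side (default: all).
    """
    if min_agree is None:
        min_agree = len(evals_cp)
    # Count referees that see White (or Black) ahead by at least their own threshold.
    white_votes = sum(1 for name, cp in evals_cp.items() if cp >= thresholds_cp[name])
    black_votes = sum(1 for name, cp in evals_cp.items() if cp <= -thresholds_cp[name])
    return max(white_votes, black_votes) >= min_agree

# Hypothetical referee pool with per-engine thresholds (centipawns).
thresholds = {"referee_1": 100, "referee_2": 120, "referee_3": 90}
evals = {"referee_1": 115, "referee_2": 130, "referee_3": 95}
print(is_unbalanced(evals, thresholds))  # all three referees exceed their thresholds
```

Re-running this after a referee update would show directly which openings enter or leave the set.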
Regards from Spain.
Ajedrecista.
-
- Posts: 2804
- Joined: Sat Sep 03, 2011 7:25 am
- Location: Berlin, Germany
- Full name: Stefan Pohl
Re: About rating list and unballanced opening
"It may be interesting if there are cases when engine A is stronger than B in winning good positions(rating 1) but weaker than B in saving bad positions(rating2)"
For me, this is easy. Because of my UHO openings, White is always better at the beginning of the game in my UHO-Top15 rating list.
I made a quick tool to separate each engine's white games from its black games, and built two ranking lists from it. Here are the results:
Overall:
 1 Stockfish 250809 a512  : 3859 4 4 15000 67.8% 3725 48.2%
 2 Stockfish 17.1 250330  : 3854 4 4 15000 67.1% 3725 48.8%
 3 Torch 4 a512           : 3813 4 4 15000 61.5% 3728 49.3%
 4 Obsidian 250706 a512   : 3791 4 4 15000 58.4% 3730 49.9%
 5 PlentyChess 6.15 a512  : 3774 4 4 15000 56.0% 3731 49.6%
 6 Integral 250708 a512   : 3754 4 4 15000 53.1% 3732 50.3%
 7 Berserk 250606 a512    : 3746 4 4 15000 51.9% 3733 50.4%
 8 Reckless 250630 bmi2   : 3736 4 4 15000 50.4% 3733 49.1%
 9 Alexandria 8.0 a512    : 3715 4 4 15000 47.4% 3735 50.2%
10 KomodoDragon 3.3 avx2  : 3695 4 4 15000 44.5% 3736 49.9%
11 Caissa 1.22 a512       : 3684 4 4 15000 42.8% 3737 50.8%
12 Viridithas 250712      : 3680 4 4 15000 42.2% 3737 50.1%
13 Stormphrax 7.0 avx2    : 3677 4 4 15000 41.8% 3737 48.9%
14 RubiChess 250606 a512  : 3657 4 4 15000 38.9% 3739 49.9%
15 Horsie 1.1 a512        : 3656 4 4 15000 38.8% 3739 48.7%
16 Clover 8.2 a512        : 3645 4 4 15000 37.4% 3739 51.4%
Engine percentages, white games only (because of UHO, the engine is always better at the beginning):
46.40% Stockfish 250809 a512
45.95% Stockfish 17.1 250330
42.91% Torch 4 a512
41.40% Obsidian 250706 a512
40.20% PlentyChess 6.15 a512
38.68% Integral 250708 a512
38.03% Berserk 250606 a512
37.57% Reckless 250630 bmi2
35.82% Alexandria 8.0 a512
34.12% KomodoDragon 3.3 avx2
33.28% Stormphrax 7.0 avx2
33.27% Caissa 1.22 a512
32.90% Viridithas 250712
31.85% Horsie 1.1 a512
31.23% RubiChess 250606 a512
30.46% Clover 8.2 a512
Engine percentages, black games only (because of UHO, the engine is always worse at the beginning):
21.39% Stockfish 250809 a512
21.18% Stockfish 17.1 250330
18.57% Torch 4 a512
17.04% Obsidian 250706 a512
15.82% PlentyChess 6.15 a512
14.44% Integral 250708 a512
13.89% Berserk 250606 a512
12.82% Reckless 250630 bmi2
11.53% Alexandria 8.0 a512
10.34% KomodoDragon 3.3 avx2
09.56% Caissa 1.22 a512
09.30% Viridithas 250712
08.50% Stormphrax 7.0 avx2
07.71% RubiChess 250606 a512
06.92% Horsie 1.1 a512
06.89% Clover 8.2 a512
Conclusion: The results differ very, very little. The top-10 rankings are identical in all 3 lists, and the last 6 engines are very close in strength, so splitting into white and black games and building rankings from them yields more or less statistical noise (ranks 11-16).
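For readers who want to try a similar split on their own games, here is a minimal sketch, not the tool used for the lists in this post. It assumes a PGN file in which every game carries White, Black and Result tags, and it reports white and black points as a share of all games an engine played. That convention is inferred from the numbers above, where the two percentages appear to add up to the overall score (46.40% + 21.39% is about 67.8%). The function name and the sample games are hypothetical.

```python
import re
from collections import defaultdict

# Matches only the three tags needed; tags such as WhiteElo are ignored.
TAG = re.compile(r'\[(White|Black|Result) "([^"]*)"\]')
POINTS = {"1-0": (1.0, 0.0), "0-1": (0.0, 1.0), "1/2-1/2": (0.5, 0.5)}

def split_scores(pgn_text):
    """Return {engine: (white %, black %, overall %)} as shares of all its games."""
    white_pts = defaultdict(float)
    black_pts = defaultdict(float)
    games = defaultdict(int)
    tags = {}
    for key, value in TAG.findall(pgn_text):
        tags[key] = value
        if len(tags) == 3:  # one complete White/Black/Result triple = one game
            if tags["Result"] in POINTS:  # skip unfinished games ("*")
                w, b = POINTS[tags["Result"]]
                white_pts[tags["White"]] += w
                black_pts[tags["Black"]] += b
                games[tags["White"]] += 1
                games[tags["Black"]] += 1
            tags = {}
    return {
        e: (100 * white_pts[e] / n,
            100 * black_pts[e] / n,
            100 * (white_pts[e] + black_pts[e]) / n)
        for e, n in games.items()
    }

# Two made-up games between hypothetical engines "A" and "B".
sample = '''
[White "A"]
[Black "B"]
[Result "1-0"]

1. e4 e5 1-0

[White "B"]
[Black "A"]
[Result "1/2-1/2"]

1. d4 d5 1/2-1/2
'''
result = split_scores(sample)
for engine, (white, black, overall) in result.items():
    print(f"{engine}: white {white:.2f}%  black {black:.2f}%  overall {overall:.2f}%")
```

Sorting the result by the white or black column then gives the two extra ranking lists.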
-
- Posts: 354
- Joined: Thu Jul 21, 2022 12:30 am
- Full name: Chesskobra
Re: About rating list and unballanced opening
Is there much difference between these rating lists and CCRL? Is the difference just statistical noise or not?
-
- Posts: 10892
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: About rating list and unballanced opening
pohl4711 wrote: ↑Sat Aug 16, 2025 7:55 am
For me, this is easy. Because of my UHO-openings, in my UHO-Top15 ratinglist, white is always better at the beginning of the game.
I made a quick tool, to distinguish white games and black games by the engines, and made two ranking-lists out of it. Here the results:
[...]
Conclusion: The results differ very, very little. Top10 rankings are identical in all 3 lists and the last 6 are very close in strength, so a splitting into white and black games and building rankings here, is more or less statistical noise (rank 11-16)
Thanks.
In your list there is no big difference, but it may be different in TCEC, or maybe it is only statistical noise.
Obsidian has 12 wins in TCEC (even 1 more than Stockfish after 36 rounds of TCEC), but it has 7 losses, which is clearly more than Stockfish, Lc0 and Integral, which have only 2 losses, so I thought Obsidian might get a relatively better rating with White than with Black.
-
- Posts: 2804
- Joined: Sat Sep 03, 2011 7:25 am
- Location: Berlin, Germany
- Full name: Stefan Pohl
Re: About rating list and unballanced opening
I made a clean output and released the results on my experiments subsite:
https://www.sp-cc.de/experiments.htm
                              All    white   rank  black   rank
 1 Stockfish 250809 a512  :  67.8%  46.40%   =    21.39%   =
 2 Stockfish 17.1 250330  :  67.1%  45.95%   =    21.18%   =
 3 Torch 4 a512           :  61.5%  42.91%   =    18.57%   =
 4 Obsidian 250706 a512   :  58.4%  41.40%   =    17.04%   =
 5 PlentyChess 6.15 a512  :  56.0%  40.20%   =    15.82%   =
 6 Integral 250708 a512   :  53.1%  38.68%   =    14.44%   =
 7 Berserk 250606 a512    :  51.9%  38.03%   =    13.89%   =
 8 Reckless 250630 bmi2   :  50.4%  37.57%   =    12.82%   =
 9 Alexandria 8.0 a512    :  47.4%  35.82%   =    11.53%   =
10 KomodoDragon 3.3 avx2  :  44.5%  34.12%   =    10.34%   =
11 Caissa 1.22 a512       :  42.8%  33.27%   +1   09.56%   =
12 Viridithas 250712      :  42.2%  32.90%   +1   09.30%   =
13 Stormphrax 7.0 avx2    :  41.8%  33.28%   -2   08.50%   =
14 RubiChess 250606 a512  :  38.9%  31.23%   +1   07.71%   =
15 Horsie 1.1 a512        :  38.8%  31.85%   -1   06.92%   =
16 Clover 8.2 a512        :  37.4%  30.46%   =    06.89%   =
Explanation: Take Horsie 1.1, for example: "rank -1" means that in the white-games-only ranking, Horsie 1.1 is on rank 14, not rank 15 as in the full ranking list. "rank =" means the engine's rank is identical to its rank in the full ranking list.
Note that ranks 11-15 in the full list are very close in strength (5 engines within a range of only 28 Celo), so it is no surprise that slight changes in the ranking of these 5 engines can happen when looking only at the games played with White. RubiChess and Horsie switching ranks is really no surprise: in the full list, the two engines are only 0.1% apart in score.
Conclusion: It makes no measurable difference whether engines start playing with an advantage or a disadvantage. So using UHO openings does not lead to any distortions compared to balanced openings. As expected.
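The rank column can be reproduced with a few lines. The sketch below is only an illustration with hypothetical function names; it uses the overall and white-only percentages of ranks 11-16 from the table above and computes, for each engine, its rank in the white-only list minus its rank in the full list.

```python
def rank_shifts(overall, subset):
    """Map engine -> (rank in `subset`) minus (rank in `overall`); 0 means unchanged."""
    def ranks(scores):
        ordered = sorted(scores, key=scores.get, reverse=True)
        return {name: i + 1 for i, name in enumerate(ordered)}
    full, part = ranks(overall), ranks(subset)
    return {name: part[name] - full[name] for name in overall}

def fmt(shift):
    """Format a shift like the table: '=', '+1', '-2'."""
    return "=" if shift == 0 else f"{shift:+d}"

# Ranks 11-16 from the table above: overall score % and white-games-only %.
overall = {"Caissa": 42.8, "Viridithas": 42.2, "Stormphrax": 41.8,
           "RubiChess": 38.9, "Horsie": 38.8, "Clover": 37.4}
white = {"Caissa": 33.27, "Viridithas": 32.90, "Stormphrax": 33.28,
         "RubiChess": 31.23, "Horsie": 31.85, "Clover": 30.46}

shifts = rank_shifts(overall, white)
for name, shift in shifts.items():
    print(f"{name:12s} {fmt(shift)}")
```

Because only the six lowest-ranked engines are included, the ranks here are relative to this subset, but the shifts are the same as in the full table since the top 10 do not change places.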