
PGN - https://kirill-kryukov.com/chess/discus ... p?id=55373

PGN - https://kirill-kryukov.com/chess/discus ... p?id=55372
CCRL 40/15 Rating List - Custom engine selection
1791649 games played by 3677 programs, run by 25 testers
Ponder off, General books (up to 12 moves), 3-4-5 piece EGTB
Time control: Equivalent to 40 moves in 15 minutes on an Intel i7-4770k.
Computed on February 16, 2024 with Bayeselo based on 1'791'649 games
Tested by CCRL team, 2005-2024, http://ccrl.chessdom.com/ccrl/4040/
Rank  Engine                      Elo    +    -  Score   AvOp  Games
   1  RubiChess 20240112 64-bit  3559  +13  -13  53.2%  -20.0   1278
      RubiChess 20230918 64-bit  3545  +10  -10  56.4%  -42.5   2691
      RubiChess 20230410 64-bit  3518  +10  -10  52.9%  -18.1   2391
      RubiChess 20221120 64-bit  3506  +11  -11  52.7%  -16.5   1999
      RubiChess 20220813 64-bit  3472  +12  -12  51.4%   -8.8   1534
      RubiChess 20220223 64-bit  3463  +10  -10  49.8%   +2.2   2447
      RubiChess 2021 64-bit      3432  +13  -13  48.3%   +8.7   1298
      RubiChess 2.2 64-bit       3420  +10  -10  46.8%  +19.4   2328
      RubiChess 2.1 64-bit       3390  +13  -13  46.6%  +23.8   1374
      RubiChess 2.0 64-bit       3331  +12  -12  47.5%  +21.6   1652
      RubiChess 1.9 64-bit       3315  +13  -13  46.8%  +24.1   1530
      RubiChess 1.7.3 64-bit     3238  +15  -15  50.7%   -4.5   1040
      RubiChess 1.8 64-bit       3233  +16  -16  48.2%  +14.3    903
      RubiChess 1.7.1 64-bit     3219  +19  -19  50.2%   -0.3    612
      RubiChess 1.6.1 64-bit     3214  +19  -19  51.9%  -13.3    634
      RubiChess 1.6 64-bit       3205  +14  -14  52.6%  -21.1   1271
      RubiChess 1.5 64-bit       3147  +17  -17  48.9%   +3.7    761
      RubiChess 1.4 64-bit       3068  +15  -15  48.3%  +11.1    994
      RubiChess 1.3 64-bit       3010  +17  -17  52.4%  -20.9    827
      RubiChess 1.2.1 64-bit     2929  +16  -16  51.7%  -14.4    909
      RubiChess 1.1 64-bit       2787  +18  -18  50.2%   -3.6    799
      RubiChess 1.0 64-bit       2618  +17  -17  48.4%   +8.4    899
      RubiChess 0.9 64-bit       2539  +19  -19  51.9%  -19.3    718
      RubiChess 0.8 64-bit       2430  +22  -22  51.6%  -14.8    529
      RubiChess 0.7 64-bit       2261  +19  -19  52.2%  -17.7    809
CCRL 40/15 Rating List - Custom engine selection
1791649 games played by 3677 programs, run by 25 testers
Ponder off, General books (up to 12 moves), 3-4-5 piece EGTB
Time control: Equivalent to 40 moves in 15 minutes on an Intel i7-4770k.
Computed on February 16, 2024 with Bayeselo based on 1'791'649 games
Tested by CCRL team, 2005-2024, http://ccrl.chessdom.com/ccrl/4040/
Rank  Engine                           Elo    +    -  Score   AvOp  Games
   1  RubiChess 20240112 64-bit 4CPU  3595  +17  -17  52.7%  -16.2    778
      RubiChess 20230918 64-bit 4CPU  3582  +14  -14  53.2%  -20.3   1195
      RubiChess 20230410 64-bit 4CPU  3560  +13  -13  53.1%  -18.9   1402
      RubiChess 20221120 64-bit 4CPU  3554  +13  -13  55.0%  -31.7   1461
      RubiChess 20220813 64-bit 4CPU  3527  +15  -15  50.2%   -1.0    966
      RubiChess 20220223 64-bit 4CPU  3514  +14  -14  50.8%   -5.9   1346
      RubiChess 2021 64-bit 4CPU      3499  +19  -19  51.7%  -13.0    660
      RubiChess 2.2 64-bit 4CPU       3494  +16  -16  47.9%  +13.6    981
      RubiChess 2.1 64-bit 4CPU       3462  +15  -15  47.3%  +18.7   1031
      RubiChess 2.0 64-bit 4CPU       3410  +16  -16  46.4%  +25.7    922
      RubiChess 1.8 64-bit 4CPU       3347  +14  -14  45.4%  +31.5   1296
      RubiChess 1.7.3 64-bit 4CPU     3341  +18  -18  48.3%  +10.6    736
      RubiChess 1.6 64-bit 4CPU       3279  +19  -19  54.3%  -30.1    682
      RubiChess 1.5 64-bit 4CPU       3207  +25  -25  51.1%   -6.8    368
      RubiChess 1.3 64-bit 4CPU       3103  +20  -20  52.8%  -21.3    642
RubiChess 240112 on CCRL (40/15) (1CPU): draw ratio vs. the Top14 engines according to my UHO-Top15 rating list:

Stockfish 16:      88.5%
Torch 1:           88.5%
Dragon 3.3:        84.6%
Berserk 12:        92.3%
Ethereal 14.25:    98.1%
Caissa 1.16:       94.2% (not the strongest version; Caissa 1.17 is out, but it differs from 1.16 by only 8 Elo)
Obsidian 10:       90.4%
Seer 2.8.0:        98.1%
CSTal 2.0:         96.2%
Clover 6.1:        86.5%
Rebel EAS:         94.2%
RofChade 3.1:      88.5%
Uralochka 3.40a:   82.7%
Revenge 3.0:       88.5%
-------------------------------
Average draw ratio: 90.8%
------------------------------------------------------------------------------
Koivisto 9:        82.7% (not the strongest version; Koivisto 9.2 is out)
Alexandria 5.1.0:  84.6% (not the strongest version; Alexandria 6 is out)
(Both of these outdated versions are measurably weaker and no longer in the Top15.)
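For anyone who wants to re-check the 90.8% figure, here is a minimal Python sketch that averages the fourteen draw ratios listed above. The dictionary name `draw_ratios` is just an illustrative choice, and the average is assumed to be a plain unweighted mean over the Top14 opponents (Koivisto and Alexandria left out, as in the post).

```python
# Draw ratios (in percent) of RubiChess 240112 vs. the Top14 engines,
# copied from the list above. The unweighted mean is an assumption about
# how the posted average was formed.
draw_ratios = {
    "Stockfish 16": 88.5,
    "Torch 1": 88.5,
    "Dragon 3.3": 84.6,
    "Berserk 12": 92.3,
    "Ethereal 14.25": 98.1,
    "Caissa 1.16": 94.2,
    "Obsidian 10": 90.4,
    "Seer 2.8.0": 98.1,
    "CSTal 2.0": 96.2,
    "Clover 6.1": 86.5,
    "Rebel EAS": 94.2,
    "RofChade 3.1": 88.5,
    "Uralochka 3.40a": 82.7,
    "Revenge 3.0": 88.5,
}

# Plain arithmetic mean over the 14 opponents.
average = sum(draw_ratios.values()) / len(draw_ratios)
print(f"Average draw ratio: {average:.1f}%")  # prints "Average draw ratio: 90.8%"
```

If the per-opponent game counts differed, a games-weighted mean could give a slightly different number; the post does not list those counts.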
pohl4711 wrote: ↑Sat Feb 17, 2024 10:41 am
So, finally, engine progress (development) has climbed to a point where testing the Top15 engines with non-biased openings does not make sense anymore (draw ratio above 90%). It is just a question of time (further engine progress) until testing the Top20, Top30 and so on will not work anymore.

Uri Blass wrote: ↑Sat Feb 17, 2024 11:13 am
I disagree that testing Top15 engines with an unbiased book does not make sense.
As long as the draw ratio is less than 100%, testing makes sense.
Edit: even with 100% draws, testing may make sense, because humans can learn from the games that the testing produces.

Making sense, from a tester's point of view, means getting the rankings of the engines outside the error bar. That's all I wanted to say, not that the games themselves cannot be useful for other purposes.
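
To make the "rankings outside the error bar" criterion from this thread concrete, here is a rough Python sketch that turns a win/draw/loss record into an Elo difference with an approximate 95% interval. It is only an illustration under stated assumptions: it uses the plain logistic Elo formula and a normal approximation of the score, not Bayeselo (which the CCRL lists above are computed with), and the function names `score_to_elo` and `elo_with_margin` as well as the sample record are made up for this example.

```python
import math


def score_to_elo(score: float) -> float:
    """Elo difference implied by a score fraction under the logistic model."""
    return 400.0 * math.log10(score / (1.0 - score))


def elo_with_margin(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (elo, low, high) for one match record.

    Assumption: game results are independent, and the mean score is
    approximately normal (z = 1.96 corresponds to a ~95% interval).
    """
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")

    # Per-game variance of the result (win = 1, draw = 0.5, loss = 0).
    variance = (
        wins * (1.0 - score) ** 2
        + draws * (0.5 - score) ** 2
        + losses * (0.0 - score) ** 2
    ) / games
    std_err = math.sqrt(variance / games)

    # Clamp the interval so the logarithm stays defined.
    eps = 1e-9
    low = max(eps, score - z * std_err)
    high = min(1.0 - eps, score + z * std_err)
    return score_to_elo(score), score_to_elo(low), score_to_elo(high)


# Hypothetical 100-game match with 90% draws (numbers invented for illustration).
elo, low, high = elo_with_margin(wins=8, draws=90, losses=2)
print(f"Elo {elo:+.1f}, interval [{low:+.1f}, {high:+.1f}]")
```

If the interval still contains 0, the two engines are not separated "outside the error bar" in the sense used above, and more games (or openings that produce fewer draws) would be needed to rank them.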