My UHO-Top15 Ratinglist is the world's first engine-ratinglist using UHO-openings, and the world's first ratinglist that additionally offers gamepair-statistics.
Ratinglist-testrun of Stockfish 16.1 finished (avx2-binary from the official Stockfish website)
https://www.sp-cc.de
Also take a look at the EAS-Ratinglist, the world's first engine-ratinglist that measures not the strength of engines but the engines' style of play:
https://www.sp-cc.de/eas-ratinglist.htm
(Perhaps you have to clear your browser cache (press Ctrl+Shift+Del) or reload the website)
SPCC: Testrun of Stockfish 16.1 finished
Moderator: Ras
-
- Posts: 2697
- Joined: Sat Sep 03, 2011 7:25 am
- Location: Berlin, Germany
- Full name: Stefan Pohl
Re: SPCC: Testrun of Stockfish 16.1 finished
+12 Elo over Stockfish 16. That is not very much, but a look at the gamepairs makes it far more impressive:
Stockfish 16.1 won more than twice as many gamepairs vs. Stockfish 16 as it lost: 500 (+164 =265 -71)
And Stockfish 16.1 lost only 113 gamepairs overall (out of 7500 gamepairs), whereas Stockfish 16 lost 233 gamepairs overall.
Without the games vs. the other Stockfish version, Stockfish 16.1 lost only 42 gamepairs (out of 7000 gamepairs), Stockfish 16 lost 69 gamepairs.
And Stockfish 16.1 plays measurably more aggressively than Stockfish 16:
https://www.sp-cc.de/eas-ratinglist.htm
Especially when we look at the sacrifices: Stockfish 16 played a sacrifice in 20.79% of its won games, Stockfish 16.1 in 23.73%. This is even more impressive when looking at the absolute numbers:
For example: in a testrun, Stockfish wins, let's say, 500 games. This means Stockfish 16 would play a sacrifice in 104 of these 500 won games (20.79% of 500, rounded). Stockfish 16.1 would play a sacrifice in 119 of these 500 won games (23.73% of 500, rounded).
An increase from 104 to 119 sac-games means +14% here (119 is 114% of 104). This is definitely impressive progress! And in these days of superhumanly strong engines, the playing-style becomes more and more important!
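The sacrifice arithmetic above can be checked with a few lines of Python. This is only a sketch: the helper name `sac_games` is made up for illustration, and the 500 won games are the same hypothetical number used in the example above.

```python
def sac_games(won_games: int, sac_rate_percent: float) -> int:
    """Expected number of won games that contain a sacrifice, rounded."""
    return round(won_games * sac_rate_percent / 100)

# Hypothetical testrun with 500 won games, as in the example above.
sf16 = sac_games(500, 20.79)    # Stockfish 16:   103.95, rounded
sf161 = sac_games(500, 23.73)   # Stockfish 16.1: 118.65, rounded

# Relative increase in sac-games from Stockfish 16 to Stockfish 16.1.
relative_gain = sf161 / sf16 - 1

print(sf16, sf161, f"{relative_gain:+.1%}")
```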
-
- Posts: 3619
- Joined: Wed Mar 08, 2006 8:15 pm
- Full name: Jouni Uski
Re: SPCC: Testrun of Stockfish 16.1 finished
The only change from the previous test was "Update the WDL model". That patch only affects the UCI-reported cp and wdl values. And the result was inside the error bars.
Jouni
-
- Posts: 2046
- Joined: Wed Mar 08, 2006 8:30 pm
-
- Posts: 2697
- Joined: Sat Sep 03, 2011 7:25 am
- Location: Berlin, Germany
- Full name: Stefan Pohl
Re: SPCC: Testrun of Stockfish 16.1 finished
Pretty easy: because my UHO-openings offer White a measurable advantage, 2 engines (in a head-to-head) play the same opening twice: in the first game engine A plays white and engine B plays black, and in the second game A plays black and B plays white.
These 2 games are evaluated as one gamepair. Engine A (or B) needs at least 1.5 points out of 2 to win a gamepair. A 1-1 is a drawn gamepair: either two draws, or 2 wins for the same color (mostly white, of course), which means one win for A (when having white) and one win for B (when having white).
This is done in engine-tournaments, too (TCEC Superfinal, engine-tournaments on Chess.com (CCC)), and in engine-development (see the Stockfish website about SF 16.1: "Stockfish 16.1 shows a notable improvement in performance ... winning over 2 times more game pairs than it loses.").
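As a rough illustration of the 1.5-point rule, here is a minimal Python sketch. The function name `gamepair_result` is hypothetical and not taken from any SPCC tool; the gamepair counts are the Stockfish 16.1 vs. Stockfish 16 numbers reported earlier in this thread.

```python
def gamepair_result(points_a: float) -> str:
    """Classify one gamepair from engine A's points out of 2."""
    if points_a >= 1.5:
        return "win"    # 1.5-0.5 or 2-0
    if points_a <= 0.5:
        return "loss"   # 0.5-1.5 or 0-2
    return "draw"       # 1-1: two draws, or one win each

# Stockfish 16.1 vs. Stockfish 16: 500 gamepairs (+164 =265 -71)
pair_wins, pair_draws, pair_losses = 164, 265, 71
win_loss_ratio = pair_wins / pair_losses  # a bit above 2

print(gamepair_result(1.5), gamepair_result(1.0), round(win_loss_ratio, 2))
```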
-
- Posts: 2046
- Joined: Wed Mar 08, 2006 8:30 pm
Re: SPCC: Testrun of Stockfish 16.1 finished
Great to see that our Forum seems to be reborn !
And a big thank you to all who contributed to that !
OK Stefan, thanks ! But my view is that ALL "reputable" tests and tournaments (TCEC...) are indeed done with preset openings, played in pairs. Or else ?
So that is not new to me.
But what is gamepair statistics ?
Do you simply eliminate the 1-1 result pairs (be it 1-0 1-0 or 1/2-1/2 1/2-1/2) ?
-
- Posts: 2095
- Joined: Wed Jul 13, 2011 9:04 pm
- Location: Madrid, Spain.
Re: SPCC: Testrun of Stockfish 16.1 finished.
Hello Ernest:
UHO 2024 openings can be downloaded from Stefan's site, as well as the so-called GamePairs rescoring tool. I have understood the following from the source code of the tool, but please Stefan correct me if I am wrong:
- My notation for each pair of games of a given opening line is (A vs B) and (B vs A). For example, if I write (1-0 and 1-0), I mean that A won the first game being white and B won the second game being white.
- Each pair of games of a given opening line is 'compressed' or converted (rescored) into one game.
- No pair of games is discarded.
- For a given opening line, when the outcome of a game is repeated [(1-0 and 1-0) or (½-½ and ½-½) or (0-1 and 0-1)], the pair is rescored as a draw for both engines because both engines finished 1-1.
- For a given opening line, when the outcome of a game is not repeated [(1-0 and ½-½) or (1-0 and 0-1) or (½-½ and 1-0) or (½-½ and 0-1) or (0-1 and 1-0) or (0-1 and ½-½)], the pair is rescored as a win for the winning engine and as a loss for the losing engine; 1.5-0.5 and 2-0 (reciprocally 0.5-1.5 and 0-2) are treated exactly the same, as a win or a loss, regardless of the winning/losing margin of the two games.
- Then, get Ordo ratings with the rescored outcomes.
Please imagine the following mini-match consisting of two pairs of games with the following results:
Code: Select all
A-B B-A
1-0 1-0
1-0 ½-½
- The standard match result is 2.5-1.5 in favour of A, then get ratings from there.
- The rescored match result is 1.5-0.5 in favour of A, because the first pair of games was drawn (½ point for each engine) and the second pair of games was won by A (1 point for A) and lost by B (0 points for B). Then get ratings from there.
An interesting exercise would be to compare standard and rescored results, to find possible relations between them.
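The mini-match above can be reproduced with a short Python sketch. It is not the actual GamePairs rescoring tool; the names `WHITE_POINTS`, `pair_points_a` and `rescore_a` are made up here, and the sketch only assumes the (A vs B) and (B vs A) notation described in the list.

```python
from fractions import Fraction

# Points scored by the white player for each result string.
WHITE_POINTS = {
    "1-0": Fraction(1),
    "½-½": Fraction(1, 2),
    "1/2-1/2": Fraction(1, 2),
    "0-1": Fraction(0),
}

def pair_points_a(game_ab: str, game_ba: str) -> Fraction:
    """A's points in one pair: A is white in game_ab and black in game_ba."""
    return WHITE_POINTS[game_ab] + (1 - WHITE_POINTS[game_ba])

def rescore_a(game_ab: str, game_ba: str) -> Fraction:
    """Rescore one pair as a single game from A's point of view."""
    points = pair_points_a(game_ab, game_ba)
    if points > 1:
        return Fraction(1)       # 1.5-0.5 and 2-0 both count as one win
    if points < 1:
        return Fraction(0)       # 0.5-1.5 and 0-2 both count as one loss
    return Fraction(1, 2)        # 1-1 counts as one draw

mini_match = [("1-0", "1-0"), ("1-0", "½-½")]
standard_a = sum(pair_points_a(ab, ba) for ab, ba in mini_match)  # out of 4
rescored_a = sum(rescore_a(ab, ba) for ab, ba in mini_match)      # out of 2

print(float(standard_a), float(rescored_a))
```

Using `Fraction` keeps the half points exact, so the totals can be compared with `==` instead of a floating-point tolerance.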
Regards from Spain.
Ajedrecista.
-
- Posts: 2095
- Joined: Wed Jul 13, 2011 9:04 pm
- Location: Madrid, Spain.
Re: SPCC: Testrun of Stockfish 16.1 finished.
Hello:
Ajedrecista wrote: ↑Fri Mar 01, 2024 10:03 pm
[...]
An interesting exercise would be to compare standard and rescored results, to find possible relations between them.
[...]
I did some math; I hope there are no typos. The difference between normal and rescored scores (and ratings) comes from the pairs of games where only one game is drawn, as expected, but it is now quantified:
Code: Select all
                 Normal   Rescored   Number   (1/n)
 A-B    B-A       A-B       A-B      pairs    Probability
----------------------------------------------------------
 1-0    1-0      1 -1       ½-½       nAB      pAB
 1-0    ½-½      1½- ½      1-0       nAd      pAd
 1-0    0-1      2 -0       1-0       nAA      pAA
 ½-½    1-0       ½-1½      0-1       ndB      pdB
 ½-½    ½-½      1 -1       ½-½       ndd      pdd
 ½-½    0-1      1½- ½      1-0       ndA      pdA
 0-1    1-0      0 -2       0-1       nBB      pBB
 0-1    ½-½       ½-1½      0-1       nBd      pBd
 0-1    0-1      1 -1       ½-½       nBA      pBA
----------------------------------------------------------
                                      n        1
Code: Select all
From A's POV:
Normal score: µA = 1*pAA + 0.75*(pAd + pdA) + 0.5*(pAB + pdd + pBA) + 0.25*(pdB + pBd) + 0*pBB
Rescored score: mA = 1*(pAd + pAA + pdA) + 0.5*(pAB + pdd + pBA) + 0*(pdB + pBB + pBd)
mA - µA = 0.25*(pAd + pdA) - 0.25*(pdB + pBd)
mA - µA = (pAd - pdB + pdA - pBd)/4
------------
From B's POV:
Normal score: µB = 1*pBB + 0.75*(pdB + pBd) + 0.5*(pAB + pdd + pBA) + 0.25*(pAd + pdA) + 0*pAA
Rescored score: mB = 1*(pdB + pBB + pBd) + 0.5*(pAB + pdd + pBA) + 0*(pAd + pAA + pdA)
mB - µB = 0.25*(pdB + pBd) - 0.25*(pAd + pdA)
mB - µB = (- pAd + pdB - pdA + pBd)/4
------------
mB - µB = -(mA - µA)
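The relation mA - µA = (pAd - pdB + pdA - pBd)/4 can be checked numerically with a small Python sketch. The pair counts below are invented purely for the check (they are not test results), and the names `OUTCOMES` and `scores` are hypothetical.

```python
from fractions import Fraction as F

# For each of the nine pair outcomes (keys follow the table's subscripts):
# (A's normal score as a fraction of the 2 points, A's rescored score).
OUTCOMES = {
    "AB": (F(1, 2), F(1, 2)),
    "Ad": (F(3, 4), F(1)),
    "AA": (F(1), F(1)),
    "dB": (F(1, 4), F(0)),
    "dd": (F(1, 2), F(1, 2)),
    "dA": (F(3, 4), F(1)),
    "BB": (F(0), F(0)),
    "Bd": (F(1, 4), F(0)),
    "BA": (F(1, 2), F(1, 2)),
}

def scores(p):
    """Return (normal score µA, rescored score mA) for probabilities p."""
    mu = sum(p[k] * OUTCOMES[k][0] for k in OUTCOMES)
    m = sum(p[k] * OUTCOMES[k][1] for k in OUTCOMES)
    return mu, m

# Invented counts for 100 pairs, only to exercise the formula.
counts = {"AB": 10, "Ad": 25, "AA": 5, "dB": 8, "dd": 30,
          "dA": 12, "BB": 1, "Bd": 4, "BA": 5}
n = sum(counts.values())
p = {k: F(v, n) for k, v in counts.items()}

mu_a, m_a = scores(p)
predicted_diff = (p["Ad"] - p["dB"] + p["dA"] - p["Bd"]) / 4

print(float(mu_a), float(m_a), float(m_a - mu_a), float(predicted_diff))
```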
Regards from Spain.
Ajedrecista.
-
- Posts: 2697
- Joined: Sat Sep 03, 2011 7:25 am
- Location: Berlin, Germany
- Full name: Stefan Pohl
Re: SPCC: Testrun of Stockfish 16.1 finished.
Ajedrecista wrote: ↑Fri Mar 01, 2024 10:03 pm
Hello Ernest:
UHO 2024 openings can be downloaded from Stefan's site, as well as the so-called GamePairs rescoring tool. I have understood the following from the source code of the tool, but please Stefan correct me if I am wrong:
- My notation for each pair of games of a given opening line is (A vs B) and (B vs A). For example, if I write (1-0 and 1-0), I mean that A won the first game being white and B won the second game being white.
- Each pair of games of a given opening line is 'compressed' or converted (rescored) into one game.
- No pair of games is discarded.
- For a given opening line, when the outcome of a game is repeated [(1-0 and 1-0) or (½-½ and ½-½) or (0-1 and 0-1)], the pair is rescored as a draw for both engines because both engines finished 1-1.
- For a given opening line, when the outcome of a game is not repeated [(1-0 and ½-½) or (1-0 and 0-1) or (½-½ and 1-0) or (½-½ and 0-1) or (0-1 and 1-0) or (0-1 and ½-½)], the pair is rescored as a win for the winning engine and as a loss for the losing engine; 1.5-0.5 and 2-0 (reciprocally 0.5-1.5 and 0-2) are treated exactly the same, as a win or a loss, regardless of the winning/losing margin of the two games.
- Then, get Ordo ratings with the rescored outcomes.
Please imagine the following mini-match consisting of two pairs of games with the following results:
Code: Select all
A-B B-A
1-0 1-0
1-0 ½-½
- The standard match result is 2.5-1.5 in favour of A, then get ratings from there.
- The rescored match result is 1.5-0.5 in favour of A, because the first pair of games was drawn (½ point for each engine) and the second pair of games was won by A (1 point for A) and lost by B (0 points for B). Then get ratings from there.
An interesting exercise would be to compare standard and rescored results, to find possible relations between them.
Regards from Spain.
Ajedrecista.
Correct.
"An interesting exercise would be to compare standard and rescored results, to find possible relations between them."
On my main-site, there are both ratinglists: first the normal ratinglist, followed by the gamepair-ratinglist... There you can compare all results. And, of course, you can compare the single results of each engine and each engine head-to-head, too:
https://www.sp-cc.de/files/programs.dat
https://www.sp-cc.de/files/uho_top15_gamepair.txt
-
- Posts: 2095
- Joined: Wed Jul 13, 2011 9:04 pm
- Location: Madrid, Spain.
Re: SPCC: Testrun of Stockfish 16.1 finished.
Hello Stefan:
pohl4711 wrote: ↑Sat Mar 02, 2024 7:19 am
Correct.
"An interesting exercise would be to compare standard and rescored results, to find possible relations between them."
On my main-site, there are both ratinglists: first the normal ratinglist, followed by the gamepair-ratinglist... There you can compare all results. And, of course, you can compare the single results of each engine and each engine head-to-head, too:
https://www.sp-cc.de/files/programs.dat
https://www.sp-cc.de/files/uho_top15_gamepair.txt
Thank you for the confirmation and the links. I have taken the extreme result of SF 16.1 vs. Rebel EAS. My math of yesterday seems correct, because I was able to construct some stats from it; I hope there are no typos.
I rearranged the last data into a kind of symmetric matrix for better visualization. Further rearrangements are possible if we want to group certain results into certain patterns.
Code: Select all
Normal results:
https://www.sp-cc.de/files/programs.dat
1 Stockfish 16.1 240224
Rebel EAS avx2 : 1000 (+557,=440,- 3), 77.7 %
------------
Rescored results:
https://www.sp-cc.de/files/uho_top15_gamepair.txt
1) Stockfish 16.1 240224
vs. : games ( +, =, -), (%) : Diff, SD, CFS (%)
Rebel EAS avx2 : 500 ( 491, 9, 0), 99.1 : +728, 7, 100.0
Code: Select all
S = SF ; d = draw ; R = Rebel
n = 500
mS = 0.991 = (491 + 0.5*9)/500
µS = 0.777 = (557 + 0.5*440)/1000
mS - µS = 0.214 = (pSd - pdR + pdS - pRd)/4 // (Eq. 1)
pSd - pdR + pdS - pRd = 0.856
Rebel did not win any pair:
pdR + pRR + pRd = 0 // Each value is 0 because each one is limited to the closed interval [0, 1].
// 3 values out of 9 are known at this point.
Rebel drew 9 pairs out of 500:
pSR + pdd + pRS = 9/500 = 0.018
Rebel won 3 games out of 1000 games (500 pairs):
pSR + pRS + pdR + pRd + 2*pRR = 3/500 = 0.006 // We know that pdR = pRd = pRR = 0
pSR + pRS = 0.006 // 2 values are correlated at this point.
pdd = 0.018 - 0.006 = 0.012 // 4 values out of 9 are known at this point.
(Eq. 1) knowing that pdR = 0 and pRd = 0
pSd + pdS = 0.856
// 4 values are correlated (2 and 2) at this point.
The sum of all p is 1. We can compute the last value:
pSS = 1 - 0.856 - 0.018 = 0.126
Code: Select all
SUMMARY:
pSR + pRS = 0.006 → Draw (Pair: 1 win for SF and 1 win for Rebel).
pSd + pdS = 0.856 → SF wins (Pair: 1 win for SF and one draw).
pSS = 0.126 → SF wins (Pair: 2 wins for SF).
pdR + pRd = 0 → Rebel wins (Pair: 1 draw and 1 win for Rebel).
pdd = 0.012 → Draw (Pair: 2 draws).
pRR = 0 → Rebel wins (Pair: 2 wins for Rebel).
SUM = 1
Code: Select all
Multiplying by n = 500 to get the pairs:
SF won 2.0-0.0: 63 pairs = n*pSS = 500*0.126
SF won 1.5-0.5: 428 pairs = n*(pSd + pdS) = 500*0.856
SF drew 1.0-1.0: 9 pairs = n*[(pSR + pRS) + pdd] = 500*(0.006 + 0.012)
SF lost 0.5-1.5: 0 pairs = n*(pdR + pRd) = 500*0
SF lost 0.0-2.0: 0 pairs = n*pRR = 500*0
Code: Select all
Verification of games:
SF won 557 games = n*[ 2*pSS + (pSd + pdS) + (pSR + pRS)] = 500*(0.252 + 0.856 + 0.006)
SF drew 440 games = n*[(pSd + pdS) + 2*pdd + (pdR + pRd)] = 500*(0.856 + 0.024 + 0 )
SF lost 3 games = n*[(pSR + pRS) + (pdR + pRd) + 2*pRR ] = 500*(0.006 + 0 + 0 )
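The same reconstruction can be done directly with integer counts instead of probabilities. The following Python sketch only covers the special case used here, where the weaker engine won no pair at all; the function name `pair_breakdown_no_pair_lost` and the dictionary keys are hypothetical.

```python
def pair_breakdown_no_pair_lost(wins, draws, losses,
                                pair_wins, pair_draws, pair_losses):
    """Split the pairs by exact result, assuming no pair was lost.

    wins/draws/losses are single games, pair_* are rescored gamepairs,
    all from the stronger engine's point of view.
    """
    if pair_losses != 0:
        raise ValueError("sketch only handles the case pair_losses == 0")
    # Every lost game must sit in a 1-1 pair next to a won game.
    one_win_each = losses
    two_draws = pair_draws - one_win_each
    # Remaining drawn games belong to 1.5-0.5 pairs (one draw each).
    won_15_05 = draws - 2 * two_draws
    won_20 = pair_wins - won_15_05
    # Consistency check against the number of won games.
    if 2 * won_20 + won_15_05 + one_win_each != wins:
        raise ValueError("game counts and pair counts are inconsistent")
    return {
        "2.0-0.0": won_20,
        "1.5-0.5": won_15_05,
        "1.0-1.0 (one win each)": one_win_each,
        "1.0-1.0 (two draws)": two_draws,
    }

# SF 16.1 vs. Rebel EAS: 1000 games (+557 =440 -3), 500 pairs (+491 =9 -0)
breakdown = pair_breakdown_no_pair_lost(557, 440, 3, 491, 9, 0)
print(breakdown)
```

When the weaker engine does win some pairs, the game and pair totals alone no longer pin down every count, so this shortcut would not apply.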
Regards from Spain.
Ajedrecista.