CCRL flawed testing : SF12 above SF12 8CPU
Moderators: hgm, Rebel, chrisw
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: CCRL flawed testing : SF12 above SF12 8CPU
What was the draw rate? Your openings favor huge draw rates and compress Elo differences. Can you write WDL numbers?
-
- Posts: 550
- Joined: Tue Nov 19, 2019 8:48 pm
- Full name: Alayan Feh
Re: CCRL flawed testing : SF12 above SF12 8CPU
mvanthoor wrote: ↑Thu Oct 08, 2020 7:22 pm
Can't it just be a scaling problem with Stockfish? According to CEGT, Stockfish 11 is only 6 ELO stronger @ 8CPU than Stockfish 11 @ 4CPU:
http://www.cegt.net/40_40%20Rating%20Li ... liste.html
This would probably mean that, if you run a long enough test between Stockfish 11 @ 4CPU and 8CPU, the result would be almost, if not equal.

These elo differences are completely unreliable because of the error bars.
The 1 vs 8 CPU at CCRL is so wrong that it's rather easy to show in a h2h test or a test vs common opponents that it's incorrect.
I saw SF 12 8CPU with zero losses vs the 1 core after 60+ games but I don't know the final results. Anyway, 8CPU had a huge win:loss ratio.
-
- Posts: 1971
- Joined: Wed Jul 13, 2011 9:04 pm
- Location: Madrid, Spain.
Re: CCRL flawed testing: SF12 above SF12 8CPU.
Hello Kai and Alayan:
While we wait for Mark's answer, I did some math. Based on my own post when SF 12 was released, the draw ratio is:
Code: Select all
W = K*L >= L // K >= 1
K*L + D + L = 1
(K + 1)*L = 1 - D
L = (1 - D)/(K + 1)
Elo_diff. = 400*log10{[2*K - (K - 1)*D]/[2 + (K - 1)*D]}
D = 2*[10^(Elo_diff./400) - K]/{(1 - K)*[10^(Elo_diff./400) + 1]}
I computed draw ratios for some K values just to get an idea:
Code: Select all
Elo difference = 72 Elo.
W = K*L
 K      W        D        L
 2   40.86%   38.71%   20.43%
 3   30.65%   59.14%   10.22%
 4   27.24%   65.95%    6.81%
 5   25.54%   69.35%    5.11%
 6   24.52%   71.40%    4.09%
 7   23.84%   72.76%    3.41%
 8   23.35%   73.73%    2.92%
 9   22.99%   74.46%    2.55%
10   22.70%   75.03%    2.27%
11   22.47%   75.48%    2.04%
12   22.29%   75.85%    1.86%
13   22.13%   76.16%    1.70%
14   22.00%   76.43%    1.57%
15   21.89%   76.65%    1.46%
16   21.79%   76.84%    1.36%
17   21.71%   77.01%    1.28%
18   21.63%   77.16%    1.20%
19   21.57%   77.30%    1.14%
20   21.51%   77.42%    1.08%
Of course, the extreme cases are:
Code: Select all
Assuming Elo_diff. >= 0 Elo:
Elo_diff. = 400*log10[score/(1 - score)]
score = 1/[1 + 10^(-Elo_diff./400)] = W + D = 1/2 + D_min.
D_min. = 1/[1 + 10^(-Elo_diff./400)] - 1/2
// Elo_diff. = 72 ==> D_min. ~ 10.22%
W + D_max. = 1 // L = 0
Elo_diff. = 400*log10[(W + D_max./2)/(D_max./2)]
Elo_diff. = 400*log10[(1 - D_max. + D_max./2)/(D_max./2)]
10^(Elo_diff./400) = (1 - D_max./2)/(D_max./2) = 2/D_max. - 1
D_max. = 2/[1 + 10^(Elo_diff./400)]
// Elo_diff. = 72 ==> D_max. ~ 79.57%
So 10.22% < D < 79.57% in this case, with a good chance of being around 75%. If there were 200 games, then WDL figures must be in steps of 0.5% and some values of my table can be discarded, like K = 7. K has a low chance of being an integer, after all. I picked integer values for K just to get a rough idea of the draw ratio.
Regards from Spain.
Ajedrecista.
Last edited by Ajedrecista on Thu Oct 08, 2020 9:16 pm, edited 1 time in total.
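The draw-ratio formula in the post above can be checked numerically. What follows is a minimal Python sketch that assumes only the relations stated there (W = K*L, W + D + L = 1, and the logistic Elo formula); the function names `draw_ratio`, `wdl` and `max_draw_ratio` are illustrative and do not come from the thread.

```python
def draw_ratio(elo_diff: float, k: float) -> float:
    """Draw ratio D for a given Elo difference and win/loss ratio K = W/L (K != 1)."""
    r = 10 ** (elo_diff / 400)  # expected score ratio: score / (1 - score)
    return 2 * (r - k) / ((1 - k) * (r + 1))


def wdl(elo_diff: float, k: float) -> tuple[float, float, float]:
    """Win, draw and loss fractions implied by elo_diff and K."""
    d = draw_ratio(elo_diff, k)
    l = (1 - d) / (k + 1)  # from (K + 1)*L = 1 - D
    w = k * l              # from W = K*L
    return w, d, l


def max_draw_ratio(elo_diff: float) -> float:
    """Upper bound on D, reached when there are no losses (L = 0)."""
    return 2 / (1 + 10 ** (elo_diff / 400))


if __name__ == "__main__":
    for k in (2, 3, 10, 20):
        w, d, l = wdl(72, k)
        print(f"K={k:2d}  W={w:6.2%}  D={d:6.2%}  L={l:6.2%}")
    print(f"D_max = {max_draw_ratio(72):.2%}")
```

Running it with other Elo differences or non-integer K values gives a quick feel for how strongly the draw ratio depends on the win/loss ratio.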
-
- Posts: 2727
- Joined: Wed May 12, 2010 10:00 pm
Re: CCRL flawed testing : SF12 above SF12 8CPU
I will post this when I get home. The book should not matter, as I use the same opening standards as CCRL.
"The worst thing that can happen to a forum is a running wild attacking moderator(HGM) who is not corrected by the community." - Ed Schröder
But my words like silent raindrops fell. And echoed in the wells of silence.
-
- Posts: 3293
- Joined: Wed Mar 08, 2006 8:15 pm
Re: CCRL flawed testing : SF12 above SF12 8CPU
The FastGM page has a test with SF8 where, after 3000 games, 8 threads vs 1 thread was +158 Elo. The SF wiki gives a value of +178 Elo after 1000 games. Both used 60+0.6 games. But we are at a much higher level now.
Jouni
-
- Posts: 2727
- Joined: Wed May 12, 2010 10:00 pm
Re: CCRL flawed testing : SF12 above SF12 8CPU
Remember we are making a big assumption here. SF 12 is not an A/B engine. Just because SF 8 scales this way does not mean SF NNUE scales the same way. It is still possible this is normal.
-
- Posts: 2727
- Joined: Wed May 12, 2010 10:00 pm
Re: CCRL flawed testing : SF12 above SF12 8CPU
mwyoung wrote: ↑Tue Oct 06, 2020 9:11 pm
Thanks,
Laskos wrote: ↑Tue Oct 06, 2020 8:53 pm
mwyoung wrote: ↑Tue Oct 06, 2020 8:35 pm
Now we have some limited data to work with. The data said we have a issue if true. Why? Bad testing, bad hardware configuration, or still is there a issue with SF 12 on 8 cores. Since we have only CCRL data for 8 cores with SF 12.
Laskos wrote: ↑Tue Oct 06, 2020 8:21 pm
I checked on 4 cores with my i7 CPU and the result against 1 core in bullet was in excess of 100 Elo points. It's very unlikely that it regresses to 8 cores. In fact I saw results on SF testing framework on many cores (64?) showing good scaling with cores of SF NNUE, at least at short time controls.
mwyoung wrote: ↑Tue Oct 06, 2020 8:09 pm
Then you are assuming this is true then with STOCKFISH 12. So you have no data! This is why you always fall off the rails.
Laskos wrote: ↑Tue Oct 06, 2020 8:02 pm
What's not clear? 3 doublings in cores mean nowadays at least 2.5 real effective doublings in TC. Each effective doubling in TC in these blitz conditions means at very least 40 Elo points, therefore at very least 80 Elo points 1 core -> 8 cores. In fact more likely 120 - 140 Elo points. That result posted in OP and discrepancy beyond doubt break the Elo model.
mwyoung wrote: ↑Tue Oct 06, 2020 7:43 pm
"The difference should be (much) in excess of 80 Elo points in these conditions."
Laskos wrote: ↑Tue Oct 06, 2020 7:37 pm
The difference should be (much) in excess of 80 Elo points in these conditions, here it is -8 +/- 36 Elo points 2 standard deviations, therefore the mismatch is highly statistically significant.
The explanation is that Leela-like MCTS engines in a pool of AB engines don't obey the Elo model, and this was discussed awhile ago here.
mwyoung wrote: ↑Tue Oct 06, 2020 6:48 pm
"Yes, underperformance of 8CPU SF12 is statistically significant"
Laskos wrote: ↑Tue Oct 06, 2020 6:16 pm
5 out of 6 opponents of SF12 8CPU are Leela-like MCTS engines which compress Elo differences when playing against AB engines (was discussed more than a year ago here). Yes, underperformance of 8CPU SF12 is statistically significant, despite not that large number of games.
Alayan wrote: ↑Tue Oct 06, 2020 6:00 pm
Both got a very different mix of opponents. Both don't have that much games so small sample size doesn't help, but :
Code: Select all
Stockfish 12 64-bit       3666 +22 −22 89.3% −325.7 21.0%
Stockfish 12 64-bit 8CPU  3658 +28 −28 63.4%  −75.1 70.7%
Elo transitivity flat out doesn't work, and we can get absurd results like this if the opponent mix is different enough.
Why is this true?
What should be the Elo difference with testing between SF 12 on 1 core vs SF 12 on 8 cores at this fast TC?
By CCRL's own testing results, SF 12 on 8 cores could have a rating of 3686, and SF 12 on 1 core could have a rating of 3644.
This is not even considering the hardware CCRL is using, and whether it is configured correctly. As in, did they lock the CPU core speed, deal with CPU ramping, and other considerations. This can have a big impact on performance at these TCs.
Why should it be over 80 Elo with SF 12, 1 core vs 8 cores? I have not tested this. What results are you looking at that do not agree with CCRL?
If you are correct, then why is it so off? As I said before...
This is not even considering the hardware CCRL is using, and whether it is configured correctly. As in, did they lock the CPU core speed, deal with CPU ramping, and other considerations. This can have a big impact on performance at these TCs.
So this could be an issue with Stockfish 12 with 8 cores, and CCRL testing could be correct.
We need to rule out an SF 12 issue first, before looking at other reasons like CCRL.
You are free to rule out anything you want.
If CCRL has really bad data here, I would like to be fair and show it with some kind of data.
Stockfish 12 (1 core) vs Stockfish 12 (8 cores) (TC=2m+1s) (200 Rounds)
Live:
Code: Select all
Result:
--------------------------------------------------------------------------------------
#  name                       games  wins  draws  losses  score   los%  elo+/-
1. Stockfish 12 dup 8 cores     200    44    156       0  122.0  100.0    77.7
2. Stockfish 12 dup 1 core      200     0    156      44   78.0    0.0   -77.7
Cross table:
--------------------------------------------------------------------------------------
# name score games 1 2
1. Stockfish 12 dup 8 cores 122.0 200 x 1===1===============1======11======1==1===========1=1=====1===11===1===11==1===1========1=1=1==1=1========1===1=========1==1===========1===11======1====1====1==1====1=====1======1====111==1=11===1===1
2. Stockfish 12 dup 1 core 78.0 200 0===0===============0======00======0==0===========0=0=====0===00===0===00==0===0========0=0=0==0=0========0===0=========0==0===========0===00======0====0====0==0====0=====0======0====000==0=00===0===0 x
Tech:
--------------------------------------------------------------------------------------
Tech (average nodes, depths, time/m per move, others per game), counted for computing moves only, ignored moves with zero nodes:
#  name                       nodes/m      NPS  depth/m  time/m  moves   time
1. Stockfish 12 dup 8 cores    28475K  9905390     35.2     2.9   48.0  137.8
2. Stockfish 12 dup 1 core      3474K  1180298     29.1     2.9   48.1  141.4
   all ---                     15585K  5486571     32.1     2.9   48.0  139.6
Tournament finished! Elapsed: 15:46:49
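The Elo figure in the result table can be reproduced from the win/draw/loss counts alone. The Python sketch below uses the logistic Elo formula quoted earlier in the thread. The approximate 95% interval is an added assumption (a normal approximation on the per-game score variance, with games treated as independent). It is not something the tournament tool reported, and the function names are illustrative.

```python
import math


def elo_from_score(score: float) -> float:
    """Logistic Elo difference implied by a score fraction in (0, 1)."""
    return 400 * math.log10(score / (1 - score))


def match_elo(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (elo, elo_low, elo_high) for a head-to-head match.

    Assumption: the interval uses a normal approximation on the mean
    score and treats games as independent.
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0 - score) ** 2) / n
    margin = z * math.sqrt(var / n)
    return (elo_from_score(score),
            elo_from_score(score - margin),
            elo_from_score(score + margin))


if __name__ == "__main__":
    # Counts taken from the 8 cores vs 1 core result table above.
    elo, low, high = match_elo(wins=44, draws=156, losses=0)
    print(f"Elo difference: {elo:+.1f} (approx. 95% interval {low:+.1f} to {high:+.1f})")
    print(f"Draw ratio: {156 / 200:.0%}")
```

The interval would break down for scores very close to 0 or 1, where `score ± margin` can leave the (0, 1) range. That does not happen with these counts.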
-
- Posts: 476
- Joined: Sun Mar 17, 2019 12:00 pm
- Full name: Henk Drost
Re: CCRL flawed testing : SF12 above SF12 8CPU
-
- Posts: 2727
- Joined: Wed May 12, 2010 10:00 pm
Re: CCRL flawed testing : SF12 above SF12 8CPU
If this is not the fault of SF 12, then I guess CCRL has a lot to explain...
Last edited by mwyoung on Fri Oct 09, 2020 12:22 am, edited 1 time in total.
-
- Posts: 476
- Joined: Sun Mar 17, 2019 12:00 pm
- Full name: Henk Drost
Re: CCRL flawed testing : SF12 above SF12 8CPU
mwyoung wrote: ↑Fri Oct 09, 2020 12:18 am
Then I guess CCRL has a lot to explain...

Compare SF9 and SF10... (4CPU)
For a long time SF9 was actually ahead of SF10 on the CCRL 40/4 list.
As Alayan has already said, when they don't share the same opponent pool, things get fuzzy.
Is SF10 only 2 elo stronger than SF9?
http://ccrl.chessdom.com/ccrl/404/cgi/c ... +opponents