ouachita wrote:
There are a lot of one core tests posted here, so I wanted to post these results to again highlight the point that one core results are not related to multi-core results, in this case, 12 cores:
Code: Select all
1-22-14 SF0901014IP-12 core v SF 080114-1 core 1+1
50 positions, alternating colors
defaults
# of cores is sole setting difference.
1 Stockfish 090114IP 64 SSE4.2   +205   +53/=47/-0    76.50%   76.5/100
2 Stockfish 080114 64 SSE4.2     -205   +0/=47/-53    23.50%   23.5/100
Also, I misspoke by saying 12 cores win >90%. Here, 12 cores scored 76.5, but had 100% of wins.
Food for thought.

Laskos wrote:
150-200 Elo points from 1 to 12 cores at 1'+1'' are to be expected, but I don't agree that 1-core results are unrelated to 12-core results. The MP scaling of top engines is comparable. In your place I would play 10'+10'' games on one core or on several cores SF against Houdini Contempt=0 until a SPRT stop, to dispel some myths (that SF does not scale better, for example). There were many 1-core results, but none of them had LOS of 98% SF against Houdini 4 Contempt=0, and that happens at somewhat larger TC than blitz. I will now wait for a SPRT stop in Cutechess-Cli to show that SF overtook Houdini (if that is the case).

Your assumption that the MP scaling of top engines is comparable is not something that is proved so I think that you cannot decide that 12 cores 1+1 is the same as 1 core 10+10.
Stockfish seems definitely the strongest engine
-
- Posts: 10297
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: Stockfish seems definitely the strongest engine
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: Stockfish seems definitely the strongest engine
Uri Blass wrote: Your assumption that the MP scaling of top engines is comparable is not something that is proved so I think that you cannot decide that 12 cores 1+1 is the same as 1 core 10+10.

It is a reasonable assumption. The MP scaling of Stockfish and Houdini (not Komodo, though) can be measured by time-to-depth tests on, say, 100 positions at a certain approximate time control (it is time dependent). On 1->4 cores, for example, it gave 3.15 for Houdini and 3.05 for Stockfish, only a 3% difference, or ~3 Elo points of MP scaling difference. I doubt that up to 12 cores it will be more than a 10 Elo point difference. By contrast, the scaling with time of Stockfish compared to Houdini (45%->56%) from ultra-fast to longer than blitz is about 70-75 Elo points. So:
1. MP scaling differences are relatively minor.
2. Testing engines on many cores and at large TC is almost impossible for the desired number of games (needed to have a conclusive LOS or a SPRT stop between closely matched engines).
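
For anyone who wants to check the arithmetic, here is a minimal sketch that converts the two score percentages quoted above into an Elo shift and computes the ratio of the two time-to-depth speedups. It assumes the standard logistic Elo formula, which is not necessarily how the figures in the post were obtained; the function name `elo_from_score` is only illustrative.

```python
import math


def elo_from_score(score):
    """Elo difference implied by an expected score in (0, 1), logistic model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


# Stockfish's score against Houdini at the two ends of the time-control
# range quoted in the post (45% at ultra-fast, 56% at longer than blitz).
shift = elo_from_score(0.56) - elo_from_score(0.45)
print(f"scaling with time: about {shift:.0f} Elo")

# Ratio of the 1->4 core time-to-depth speedups quoted in the post.
ratio = 3.15 / 3.05
print(f"speedup ratio Houdini/Stockfish: {ratio:.3f}")
```

Under this formula the time-scaling shift comes out close to the 70-75 Elo estimate, and the speedup ratio is about 3%. Turning that 3% into Elo needs a further assumption about Elo gained per speed doubling, which is left out here.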
-
- Posts: 1971
- Joined: Wed Jul 13, 2011 9:04 pm
- Location: Madrid, Spain.
Re: Stockfish seems definitely the strongest engine.
Hello:
I finally made the changes to find the median of the distribution. I also think that the results are more accurate now. With bayeselo = 60.75, drawelo = 270, alpha = beta = 0.05, bayeselo_0 = 0 and bayeselo_1 = 30, I ran 500000 simulations:
Code: Select all
Shortest simulation: 39 games (simulation 204448).
Longest simulation: 1664 games (simulation 337409).
Average number of games per simulation: 275
Median of the distribution: 248
Type I errors (false positives): 0.00 %
Type II errors (false negatives): 0.01 %
There is 1 simulation with score > 50% that failed SPRT.
There are 0 simulations with score = 50% that failed SPRT.
Code: Select all
From 0 to 999 games: 499712 simulations ( 99.94 %); accumulated: 99.94 %.
From 1000 to 1999 games: 288 simulations ( 0.06 %); accumulated: 100.00 %.
Number of finished simulations: 500000.
The stronger engine failed 57 times out of 500,000 simulations. I manually found the only time that it failed with more wins than losses:
Code: Select all
349585) FAIL after 886 games (+ 155 - 154 = 577).
Passes: 349547 Fails: 38
If you wanted a number for the median under your assumptions, here you have it: 248.
Regards from Spain.
Ajedrecista.
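
The simulator used for the numbers above was not posted, so the following is only a minimal sketch of how such an experiment could be set up. It assumes the BayesElo win/draw/loss model and a plain Wald SPRT on the trinomial log-likelihood ratio; the function names, the seed and the number of runs are all hypothetical choices.

```python
import math
import random
import statistics


def bayeselo_probs(elo, drawelo):
    """Win, draw and loss probabilities under the BayesElo model."""
    win = 1.0 / (1.0 + 10.0 ** ((-elo + drawelo) / 400.0))
    loss = 1.0 / (1.0 + 10.0 ** ((elo + drawelo) / 400.0))
    return win, 1.0 - win - loss, loss


def sprt_run(true_elo, elo0, elo1, drawelo, alpha, beta, rng, max_games=100000):
    """Play simulated games until the SPRT accepts H0 or H1."""
    lower = math.log(beta / (1.0 - alpha))   # accept H0 at or below this
    upper = math.log((1.0 - beta) / alpha)   # accept H1 at or above this
    p_true = bayeselo_probs(true_elo, drawelo)
    p0 = bayeselo_probs(elo0, drawelo)
    p1 = bayeselo_probs(elo1, drawelo)
    # Log-likelihood-ratio increment for a win, a draw and a loss.
    step = [math.log(b / a) for a, b in zip(p0, p1)]
    llr = 0.0
    for games in range(1, max_games + 1):
        outcome = rng.choices((0, 1, 2), weights=p_true)[0]
        llr += step[outcome]
        if llr >= upper:
            return "H1", games
        if llr <= lower:
            return "H0", games
    return "undecided", max_games


if __name__ == "__main__":
    rng = random.Random(2014)  # arbitrary seed, for repeatability only
    runs = [sprt_run(60.75, 0.0, 30.0, 270.0, 0.05, 0.05, rng) for _ in range(2000)]
    lengths = [games for _, games in runs]
    fails = sum(1 for verdict, _ in runs if verdict != "H1")
    print("mean games:", round(statistics.mean(lengths)))
    print("median games:", statistics.median(lengths))
    print("runs where H1 was not accepted:", fails)
```

With far fewer runs than 500000 the tails (shortest, longest, number of fails) will not be reproduced, but the mean and median run lengths should be of the same order as those reported if the model assumptions match.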
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: Stockfish seems definitely the strongest engine.
Ajedrecista wrote:
Hello:
I finally made the changes to find the median of the distribution. I also think that the results are more accurate now. With bayeselo = 60.75, drawelo = 270, alpha = beta = 0.05, bayeselo_0 = 0 and bayeselo_1 = 30, I ran 500000 simulations:
Code: Select all
Shortest simulation: 39 games (simulation 204448).
Longest simulation: 1664 games (simulation 337409).
Average number of games per simulation: 275
Median of the distribution: 248
Type I errors (false positives): 0.00 %
Type II errors (false negatives): 0.01 %
There is 1 simulation with score > 50% that failed SPRT.
There are 0 simulations with score = 50% that failed SPRT.
Code: Select all
From 0 to 999 games: 499712 simulations ( 99.94 %); accumulated: 99.94 %.
From 1000 to 1999 games: 288 simulations ( 0.06 %); accumulated: 100.00 %.
Number of finished simulations: 500000.
The stronger engine failed 57 times out of 500,000 simulations. I manually found the only time that it failed with more wins than losses:
Code: Select all
349585) FAIL after 886 games (+ 155 - 154 = 577).
Passes: 349547 Fails: 38
If you wanted a number for the median under your assumptions, here you have it: 248.
Regards from Spain.
Ajedrecista.

Thanks, Jesus.
Meanwhile I got a SPRT stop in Cutechess-Cli with H1 accepted in 388 games for:
elo0=0
elo1=30
alpha=0.05
beta=0.05
Time control: 10m+10s
Each engine on 1 i7 core
Openings 8moves_v2:
Code: Select all
Score of SF 19.01 vs H4 Contempt 0: 107 - 74 - 207 [0.543] 388
ELO difference: 30
SPRT: H1 was accepted
Finished match
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: Stockfish seems definitely the strongest engine.
Summarizing, I got a SPRT stop for Stockfish 19.01.2014 being stronger than Houdini 4 Contempt 0 under Cutechess-Cli in these conditions:
TC: 10m + 10s
i7 2600k at 3.6 GHz 4 cores
4 parallel matches, each engine on one physical core
Hash: 512MB
Ponder: Off
EGTB: No
Openings: 8moves_v2
Code: Select all
   Program            Score       %     Elo    +    -   Draws
 1 SF 19.01         : 210.5/388  54.3  3015   24   24   53.4 %
 2 H4 Contempt 0    : 177.5/388  45.7  2985   24   24   53.4 %
30 +/- 24 Elo points 2SD advantage for Stockfish 19.01 at 10m + 10s TC. LOS=99.3%
SPRT:
elo0=0
elo1=30
alpha, beta = 0.05
H1 accepted after 388 games:
Code: Select all
Score of SF 19.01 vs H4 Contempt 0: 107 - 74 - 207 [0.543] 388
ELO difference: 30
SPRT: H1 was accepted
Finished match
H1 accepted means that a 30-point advantage for Stockfish was accepted instead of H0 (equal strength).
The games are here:
http://speedy.sh/aw3tG/SPRT-RR.pgn
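
As a cross-check of the summary figures, here is a minimal sketch that recomputes the score, the Elo difference with a 2SD interval, and the LOS from the 107 - 74 - 207 result. It assumes the logistic Elo formula, a normal approximation for the error of the mean score, and the usual erf-based LOS approximation that ignores draws; the rating tool that produced the table may compute these differently. The function name `match_stats` is hypothetical.

```python
import math


def to_elo(score):
    """Elo difference implied by a score fraction, logistic model."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def match_stats(wins, losses, draws):
    """Return (score, elo, elo_low, elo_high, los) for a two-player match."""
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0 points) around the mean.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * score ** 2) / n
    sd_mean = math.sqrt(var / n)
    elo = to_elo(score)
    elo_low = to_elo(score - 2.0 * sd_mean)
    elo_high = to_elo(score + 2.0 * sd_mean)
    # Likelihood of superiority, normal approximation on decisive games only.
    los = 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))
    return score, elo, elo_low, elo_high, los


score, elo, elo_low, elo_high, los = match_stats(107, 74, 207)
print(f"score {score:.3f}, Elo {elo:.1f} [{elo_low:.1f}, {elo_high:.1f}], LOS {los:.1%}")
```

Under these assumptions the result is consistent with the 30 +/- 24 Elo and LOS=99.3% figures quoted in the summary.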
-
- Posts: 2727
- Joined: Wed May 12, 2010 10:00 pm
Re: Stockfish seems definitely the strongest engine.
Laskos wrote:
Summarizing, I got a SPRT stop for Stockfish 19.01.2014 being stronger than Houdini 4 Contempt 0 under Cutechess-Cli in these conditions:
TC: 10m + 10s
i7 2600k at 3.6 GHz 4 cores
4 parallel matches, each engine on one physical core
Hash: 512MB
Ponder: Off
EGTB: No
Openings: 8moves_v2
Code: Select all
   Program            Score       %     Elo    +    -   Draws
 1 SF 19.01         : 210.5/388  54.3  3015   24   24   53.4 %
 2 H4 Contempt 0    : 177.5/388  45.7  2985   24   24   53.4 %
30 +/- 24 Elo points 2SD advantage for Stockfish 19.01 at 10m + 10s TC. LOS=99.3%
SPRT:
elo0=0
elo1=30
alpha, beta = 0.05
H1 accepted after 388 games:
Code: Select all
Score of SF 19.01 vs H4 Contempt 0: 107 - 74 - 207 [0.543] 388
ELO difference: 30
SPRT: H1 was accepted
Finished match
H1 accepted means that a 30-point advantage for Stockfish was accepted instead of H0 (equal strength).
The games are here:
http://speedy.sh/aw3tG/SPRT-RR.pgn

Yes, Stockfish is the strongest engine. I have no doubt. Thank you for your results.
"The worst thing that can happen to a forum is a running wild attacking moderator(HGM) who is not corrected by the community." - Ed Schröder
But my words like silent raindrops fell. And echoed in the wells of silence.
-
- Posts: 2041
- Joined: Wed Mar 08, 2006 8:30 pm
Re: Stockfish seems definitely the strongest engine.
Laskos wrote: Summarizing, I got a SPRT stop for Stockfish 19.01.2014 ...

...and Stockfish is even better, since it seems that Stockfish 19.01.2014 is an unfortunate 10-Elo regression !
(see http://ls-ratinglist.beepworld.de )
-
- Posts: 5566
- Joined: Tue Feb 28, 2012 11:56 pm
Re: Stockfish seems definitely the strongest engine.
Laskos wrote: Summarizing, I got a SPRT stop for Stockfish 19.01.2014 ...
ernest wrote: ...and Stockfish is even better, since it seems that Stockfish 19.01.2014 is an unfortunate 10-Elo regression !
(see http://ls-ratinglist.beepworld.de )

Apparently because of a bug in the Makefile that resulted in the compilation of a 32-bit binary instead of 64-bit. It shouldn't be too difficult for Kai to tell whether the binary he tested is 32-bit or 64-bit.
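
One possible way to tell, sketched here under the assumption that the binary is an ordinary ELF (Linux) or PE (Windows) x86 executable, is to read the file header. The helper name `binary_bitness` is hypothetical, and this only inspects the file format, not the instruction set extensions the build uses.

```python
import struct


def binary_bitness(data):
    """Return 32, 64 or None from the leading bytes of an ELF or PE file."""
    # ELF: magic, then EI_CLASS at offset 4 (1 = 32-bit, 2 = 64-bit).
    if data[:4] == b"\x7fELF" and len(data) > 4:
        return {1: 32, 2: 64}.get(data[4])
    # PE: "MZ" stub, offset of the PE header stored at 0x3C,
    # then "PE\0\0" followed by the 16-bit Machine field.
    if data[:2] == b"MZ" and len(data) >= 0x40:
        (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
        if data[pe_offset:pe_offset + 4] == b"PE\x00\x00" and len(data) >= pe_offset + 6:
            (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
            return {0x014C: 32, 0x8664: 64}.get(machine)
    return None
```

Typical use would be to read the first few kilobytes of the engine executable in binary mode and pass them to the function; a result of 32 would point to the generic 32-bit build.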
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: Stockfish seems definitely the strongest engine.
Laskos wrote: Summarizing, I got a SPRT stop for Stockfish 19.01.2014 ...
ernest wrote: ...and Stockfish is even better, since it seems that Stockfish 19.01.2014 is an unfortunate 10-Elo regression !
(see http://ls-ratinglist.beepworld.de )
syzygy wrote: Apparently because of a bug in the Makefile that resulted in the compilation of a 32-bit binary instead of 64-bit. It shouldn't be too difficult for Kai to tell whether the binary he tested is 32-bit or 64-bit.

Thanks for the info, there were 2 binaries released on 19th, mine is 64-bit. It can be easily seen from my initial post with NPS, 32-bit is ~30% slower. So, I used an uncorrupted SF.
-
- Posts: 5566
- Joined: Tue Feb 28, 2012 11:56 pm
Re: Stockfish seems definitely the strongest engine.
Laskos wrote: Summarizing, I got a SPRT stop for Stockfish 19.01.2014 ...
ernest wrote: ...and Stockfish is even better, since it seems that Stockfish 19.01.2014 is an unfortunate 10-Elo regression !
(see http://ls-ratinglist.beepworld.de )
syzygy wrote: Apparently because of a bug in the Makefile that resulted in the compilation of a 32-bit binary instead of 64-bit. It shouldn't be too difficult for Kai to tell whether the binary he tested is 32-bit or 64-bit.
Laskos wrote: Thanks for the info, there were 2 binaries released on 19th, mine is 64-bit. It can be easily seen from my initial post with NPS, 32-bit is ~30% slower. So, I used an uncorrupted SF.

Maybe I was too quick in concluding that the Makefile bug was behind the regression. Whether there was a regression or not seems to be a topic of discussion now on the FishCooking list. I can only assume the people there are aware of:
Author: Joona Kiiski
Date: Sat Jan 25 11:29:32 2014 +0100
Timestamp: 1390645772
Do not set default value for architeture in Makefile
Fixes a regression that ARCH parameter was not properly validated.
Invalid value would default to generic 32-bit build.
No functional change.
I guess if the ARCH parameter was set by hand (or script), the bug was not triggered. So the question is what was the source of the binary used by beepworld.de.