I haven't followed this thread, but if I understand you correctly, this is a selection of Fizbo's performances against 40 engines, right? There is nothing wrong with the result. Each mini-match (25 games) has a 2 SD of about 110 Elo points. Assuming the performances in these mini-matches are normally distributed, if I split the Gaussian into equal left-hand and right-hand halves, the distance between the means of the two halves is about 1.6 SD, or roughly 90 to 100 Elo points. You get 123 or so, but things like this happen with 40 engines as opponents, especially if you picked Fizbo for illustration because its distortion is a bit exaggerated. Nothing is wrong, and you shouldn't separate the performances like this: all you see is magnified noise.

Maybe it would be instructive to do the same with only 2 engines: split a 1000-game match between them into 40 mini-matches of 25 games each, then compare the average performance in the top 20 mini-matches with that in the bottom 20. The difference should be close to 100 Elo points, similar to what you see here. If you use mini-matches of 6 games each, 167 of them or any number that is not too small, you will see a difference of about 200 Elo points between the top and bottom halves.

Frank Quisinsky wrote:
Hi there,
at the moment Fizbo 1.6 x64 is still running vs. 59 opponents.
Results after round 25 vs. 20 strongest / weakest opponents!
Error = 22.4 or 24.0
Code:
14 Fizbo 1.6 x64 strong : 2908.9 499 56.5 44.9 24.0 2860.2 11.5 20.0
28 Fizbo 1.6 x64 weak   : 2785.9 499 53.3 45.3 22.4 2763.9 11.1 20.0
OK
24.0 x 2 = 48 Elo
Result = 123 Elo difference
123 - 48 = 75 Elo beyond the ErrorBar after 500 games vs. 20 opponents. This is not new to me ... the average is 55 Elo with other opponents (20 opponents with 50 games).
I made some calculations and found out that with 50 games per pairing and 26 opponents the ErrorBar is correct. With more opponents the ErrorBar is smaller, and with fewer opponents the ErrorBar is bigger.
Different opponents, different results.
And with a rating list with fewer opponents ... reading tea leaves is more interesting.
That is the weak point of our rating calculation programs, because one factor is missing: the number of opponents. We look only at the number of games, and this is absolutely wrong.
Here are the Fizbo results:
Code:
14) Fizbo 1.6 x64 stark 2908.9 : 499 (+170,=224,-105), 56.5 %

vs.                           : games (  +,  =,  -),  (%) :   Diff,   SD, CFS (%)
Komodo 9.3 x64                :    25 (  0,  7, 18), 14.0 : -276.6, 15.0,   0.0
Stockfish 7 KP BMI2 x64       :    25 (  0,  9, 16), 18.0 : -267.3, 14.4,   0.0
Houdini 4 STD B x64           :    25 (  2, 11, 12), 30.0 : -189.9, 14.3,   0.0
Fire 4 x64                    :    25 (  0, 12, 13), 24.0 : -145.7, 14.0,   0.0
Equinox 3.30 x64              :    25 (  2, 14,  9), 36.0 :  -96.9, 14.1,   0.0
Nirvanachess 2.2 POP x64      :    25 (  4, 13,  8), 42.0 :  -34.0, 13.6,   0.6
Texel 1.05 x64                :    24 (  4, 16,  4), 50.0 :   +4.1, 13.9,  61.6
Naum 4.6 x64                  :    25 (  4, 17,  4), 50.0 :  +18.9, 13.4,  92.1
Hakkapeliitta 3.0 x64         :    25 ( 10, 11,  4), 62.0 :  +72.9, 13.5, 100.0
Shredder 12 x64               :    25 (  7, 17,  1), 62.0 : +110.6, 13.4, 100.0
Junior 13.3.00 x64            :    25 (  9, 12,  4), 60.0 : +111.9, 13.5, 100.0
DiscoCheck 5.2.1 x64          :    25 (  8, 16,  1), 64.0 : +131.7, 13.3, 100.0
Booot 5.2.0 x64               :    25 ( 16,  7,  2), 78.0 : +136.8, 13.6, 100.0
Deuterium 14.3.34.130 POP x64 :    25 (  9, 15,  1), 66.0 : +148.3, 13.3, 100.0
Doch 1.3.4 JA x64             :    25 ( 14, 10,  1), 76.0 : +162.9, 13.7, 100.0
MinkoChess 1.3 JA POP x64     :    25 ( 17,  7,  1), 82.0 : +184.9, 13.3, 100.0
Murka 3 x64                   :    25 ( 14,  9,  2), 74.0 : +201.2, 13.6, 100.0
Nemo 1.01 Beta POP x64        :    25 ( 13, 11,  1), 74.0 : +201.2, 13.7, 100.0
Scorpio 2.77 JA POP x64       :    25 ( 18,  5,  2), 82.0 : +233.5, 14.1, 100.0
The Baron 3.29 x64            :    25 ( 19,  5,  1), 86.0 : +264.3, 13.8, 100.0
We can keep doing what we have been doing ... for years.
Code:
28) Fizbo 1.6 x64 schwach 2785.9 : 499 (+153,=226,-120), 53.3 %

vs.                           : games (  +,  =,  -),  (%) :   Diff,   SD, CFS (%)
GullChess 3.0 BMI2 x64        :    25 (  0,  7, 18), 14.0 : -264.3, 13.5,   0.0
Critter 1.6a x64              :    25 (  1, 12, 12), 28.0 : -209.3, 13.5,   0.0
iCE 3.0 v658 POP x64          :    25 (  1, 15,  9), 34.0 : -145.4, 13.0,   0.0
Sting SF 6 x64                :    25 (  2,  7, 16), 22.0 : -139.3, 12.7,   0.0
Cheng 4.39 x64                :    25 (  7, 12,  6), 52.0 :  -16.8, 12.5,   8.9
Quazar 0.4 x64                :    25 (  6, 11,  8), 46.0 :  +13.7, 12.6,  86.2
Alfil 15.04 C# Beta 24 x64    :    25 (  5, 13,  7), 46.0 :  +29.7, 12.5,  99.1
Spark 1.0 x64                 :    25 (  7, 13,  5), 54.0 :  +36.1, 12.5,  99.8
Crafty 25.0 DC x64            :    25 (  8, 13,  4), 58.0 :  +49.1, 13.1, 100.0
TogaII 280513 Intel w32       :    25 ( 10,  9,  6), 58.0 :  +51.8, 12.7, 100.0
Atlas 3.80 x64                :    25 ( 10,  9,  6), 58.0 :  +54.1, 12.9, 100.0
Gaviota 1.0 AVX x64           :    25 (  8, 14,  3), 60.0 :  +59.6, 12.4, 100.0
Dirty 03NOV2015 POP x64       :    24 (  8, 12,  4), 58.3 :  +63.6, 12.8, 100.0
Bobcat 7.1 x64                :    25 ( 10, 12,  3), 64.0 :  +72.3, 13.0, 100.0
EXchess 7.71b x64             :    25 ( 10, 12,  3), 64.0 :  +74.3, 13.2, 100.0
GNU Chess5 5.60 x64           :    25 ( 11, 11,  3), 66.0 : +108.2, 12.7, 100.0
Glaurung 2.2 JA x64           :    25 ( 13,  9,  3), 70.0 : +126.1, 12.7, 100.0
Rhetoric 1.4.3 POP x64        :    25 ( 12, 11,  2), 70.0 : +135.0, 12.9, 100.0
BugChess2 1.9 POP x64         :    25 ( 10, 15,  0), 70.0 : +167.4, 13.1, 100.0
Frenzee 3.5.19 x64            :    25 ( 14,  9,  2), 74.0 : +176.4, 13.0, 100.0
The most interesting result is the one we like.
Harsh, but a fact.
Of course the right rating is in the middle ... around 2840 Elo. I wrote more about it, in German, in the CSS Forum.
Best
Frank
PS: Logically ... with 19 opponents more than 123 Elo ... with more than 20 opponents less than 123 Elo.
Maybe I didn't follow your question, though.
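For anyone who wants to try the two-engine experiment described above without playing a real match, here is a rough Monte Carlo sketch in Python. It is only an illustration under stated assumptions, not output from any rating program: the two engines are taken to be exactly equal in strength, the draw rate is assumed to be 0.45 (roughly the 224 draws in 499 games in the quoted results), performances are converted with the usual logistic Elo formula, and 0% or 100% mini-match scores are clamped so they stay finite. All function names are made up for this sketch.

```python
import math
import random
import statistics


def elo_from_score(score):
    """Logistic Elo difference implied by a score fraction in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def minimatch_score(rng, games, draw_rate):
    """Score fraction of one mini-match between two equally strong engines."""
    win_prob = (1.0 - draw_rate) / 2.0  # assumption: equal strength
    points = 0.0
    for _ in range(games):
        r = rng.random()
        if r < win_prob:
            points += 1.0
        elif r < win_prob + draw_rate:
            points += 0.5
    # Assumption: clamp extreme scores so the Elo value stays finite.
    edge = 0.25 / games
    return min(max(points / games, edge), 1.0 - edge)


def top_bottom_gap(rng, minimatches, games, draw_rate=0.45):
    """Mean performance of the top half minus that of the bottom half."""
    perfs = sorted(
        elo_from_score(minimatch_score(rng, games, draw_rate))
        for _ in range(minimatches)
    )
    half = minimatches // 2  # with an odd count the middle one is dropped
    return statistics.mean(perfs[-half:]) - statistics.mean(perfs[:half])


def average_gap(minimatches, games, trials=200, seed=1):
    """Average the top/bottom gap over many simulated matches."""
    rng = random.Random(seed)
    return statistics.mean(
        top_bottom_gap(rng, minimatches, games) for _ in range(trials)
    )


if __name__ == "__main__":
    # For an ideal Gaussian the gap between the half-means is 2*sqrt(2/pi) SD.
    print("Gaussian gap in SD units:", round(2.0 * math.sqrt(2.0 / math.pi), 3))
    print("40 mini-matches x 25 games:", round(average_gap(40, 25), 1), "Elo")
    print("167 mini-matches x 6 games:", round(average_gap(167, 6), 1), "Elo")
```

The exact figures depend on the assumed draw rate and on the clamping, but the gap for 6-game mini-matches should come out at roughly twice the gap for 25-game mini-matches, even though both engines are identical and the only thing being sorted is noise.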