Summary of CM11 Ratings so far
------------------------------
2860 CM11 Default *4CPU*
2815 CM11 Default *2CPU*
2794 CM11 Gil-Galad
2787 CM11 Aiglos II
2786 CM11 NightFire
2782 CM11 Glorfindel
2780 CM11 Galadriel II
2779 CM11 Balrog
2779 CM11 Archangel
2778 CM11 Sauron
2778 CM11 Toxic
2775 CM11 Melkor
2775 CM11 Gwaihir
2774 CM11 Attakinski
2772 CM11 Razorback
2770 CM11 Tomahawk Sel 14
2769 CM11 Glorfindel Sel 14
2767 CM11 Gilgamesh
2767 CM11 Default
2767 CM11 Tomahawk
2764 CM11 Razorback II
2763 CM11 Default Sel 21
2761 CM11 Silver Fern
2755 CM11 Thorondor X
2754 CM11 Orcrist
2749 CM11 Aristocrat
2740 CM11 Default Sel 16
2694 CM11 ChesGek+
Settings, logos and more details regarding ratings can be found here:
http://kirr.homeunix.org/chess/discussi ... f=7&t=3054
CM11 settings testing - Orcrist added - Granite next up
gbanksnz at gmail.com
Re: CM11 settings testing - Orcrist added - Granite next up
Hi Graham,
I wonder what the error bars are for these results? Is Gil-Galad significantly better than Default? It is just that if you run enough personalities, some will appear higher by chance.
best
J
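The "some will appear higher by chance" worry can be put into rough numbers with a small simulation. This is only a sketch under stated assumptions: every setting is given exactly the same true strength, each plays 600 games, the 30% draw rate is a guess, and 28 is simply the number of entries in the summary list above. The function names are made up for the illustration.

```python
import math
import random

def elo_from_score(p):
    """Convert a score fraction (0 < p < 1) to an Elo difference (logistic model)."""
    return -400.0 * math.log10(1.0 / p - 1.0)

def best_of_equals(n_settings=28, n_games=600, draw_rate=0.3, trials=100, seed=1):
    """Average apparent Elo lead of the luckiest of n_settings identical settings.

    draw_rate is an assumption, not a figure taken from the thread.
    """
    rng = random.Random(seed)
    win = (1.0 - draw_rate) / 2.0  # equal strength: wins and losses equally likely
    leads = []
    for _ in range(trials):
        best = 0.0
        for _ in range(n_settings):
            points = 0.0
            for _ in range(n_games):
                r = rng.random()
                if r < win:
                    points += 1.0
                elif r < win + draw_rate:
                    points += 0.5
            best = max(best, points / n_games)
        # Every setting truly scores 50%, so any lead here is pure luck.
        leads.append(elo_from_score(best))
    return sum(leads) / trials

if __name__ == "__main__":
    print(round(best_of_equals(), 1))
```

The printed value is how far above its true rating the top setting would sit on average if nothing but luck separated the field, which can be compared with the 27-point gap between Gil-Galad and Default.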
Re: CM11 settings testing - Orcrist added - Granite next up
Joseph,
Each CM personality plays 600 games (50 games against each of the same 12 opponents), which is a reasonable number, although more would be better. The error bars are +/- 23 Elo.
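For a feel of where a margin of that size comes from, here is a back-of-the-envelope check. It is a normal approximation around a 50% score, not the method BayesElo itself uses, and the 30% draw rate is an assumption rather than a figure from the test.

```python
import math

def elo_margin(n_games, draw_rate=0.3, z=1.96):
    """Approximate +/- Elo margin around a 50% score (normal approximation)."""
    # Per-game score variance at equal strength: wins and losses share
    # whatever probability the draws leave over.
    var = 0.25 * (1.0 - draw_rate)
    se_score = math.sqrt(var / n_games)
    # Slope of the logistic Elo curve at a 50% score, in Elo per unit of score.
    slope = 400.0 / (math.log(10) * 0.25)
    return z * se_score * slope

if __name__ == "__main__":
    for d in (0.0, 0.3, 0.5):
        print(d, round(elo_margin(600, draw_rate=d), 1))
```

Under this approximation, 600 games give a 95% margin of roughly 20 to 28 Elo depending on the assumed draw rate, so the quoted +/- 23 is in the expected range. Halving the margin would take about four times as many games.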
The BayesElo "likelihood of superiority" calculation tells a good story:
Percentage likelihood of being stronger than default:
Gil-Galad - 96.1%
Aiglos II - 92.4%
NightFire - 91.4%
Glorfindel - 85.2%
Balrog - 81.4%
Galadriel II - 78.9%
Archangel - 77.6%
Toxic - 77.7%
Sauron - 77.0%
The rest are <75% probability.
I think what Graham wants out of this as much as anything is just an indication of which group of settings is strongest and warrants further testing on the main CCRL lists. In terms of the personalities relative to each other, sometimes it is just too close to call - for example, the probability that Gil-Galad is stronger than Aiglos II is just 63.1%.
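A crude way to sanity-check percentages like these from the published ratings alone is sketched below. It treats the two ratings as independent normal estimates, each with the +/- 23 margin quoted above. That is an assumption: BayesElo works from the game results and accounts for correlation between ratings, so its figures will differ somewhat from this approximation. The function name `los` is just a label for the example.

```python
import math

def los(elo_a, elo_b, margin_a=23.0, margin_b=23.0, z=1.96):
    """Approximate probability that A is truly stronger than B.

    Assumes independent, normally distributed rating errors whose
    95% margins are margin_a and margin_b (a simplification).
    """
    sigma_a = margin_a / z
    sigma_b = margin_b / z
    sigma_diff = math.hypot(sigma_a, sigma_b)  # std. dev. of the rating difference
    # Standard normal CDF evaluated at (difference / sigma_diff).
    return 0.5 * (1.0 + math.erf((elo_a - elo_b) / (sigma_diff * math.sqrt(2.0))))

if __name__ == "__main__":
    print(round(100 * los(2794, 2767), 1))  # Gil-Galad vs Default
    print(round(100 * los(2794, 2787), 1))  # Gil-Galad vs Aiglos II
```

The approximation lands in the same region as the quoted figures (mid-90s for Gil-Galad over Default, mid-60s for Gil-Galad over Aiglos II) without matching them exactly, which is what one would expect given the simplifications.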
Re: CM11 settings testing - Orcrist added - Granite next up
ozziejoe wrote:
hi graham,
I wonder what the error bars are for these results? Is Gil-Galad significantly better than default? It is just that if you run enough personalities, some will appear higher by chance.
best
J
I'm looking at three different CM rating lists - Ray's, Cock de Gorter's and Luis Barutti's - to try and find some common ground.
Settings that perform well in all three lists are likely to be good ones.
We're talking 600 games per setting in Ray's list plus many hundreds of extra games from the others.
Regards, Graham.
gbanksnz at gmail.com