IPON ratings calculation

Albert Silver · Post by **Albert Silver** » Thu Dec 29, 2011 3:54 am

I can only assume it is I who lack the proper understanding of how the ratings are calculated, but watching the IPON results of Critter 1.4, I began to wonder why its performance was 2978 after 2106 games. I took the 22 performances, added them up, and then divided them by 22 and came up with 3000.59, so why is the total performance 2978?

Critter 1.4

Critter 1.4 SSE42 - Houdini 2.0 STD (3022)		48.5	-	47.5		50.52%		Perf=3025
Critter 1.4 SSE42 - Komodo 4 SSE42 (2979)		55.0	-	42.0		56.70%		Perf=3025
Critter 1.4 SSE42 - Deep Rybka 4.1 SSE42 (2956)		50.5	-	45.5		52.60%		Perf=2974
Critter 1.4 SSE42 - Stockfish 2.1.1 JA (2941)		56.5	-	39.5		58.85%		Perf=3003
Critter 1.4 SSE42 - Naum 4.2 (2827)		68.5	-	27.5		71.35%		Perf=2985
Critter 1.4 SSE42 - Deep Shredder 12 (2800)		75.0	-	21.0		78.13%		Perf=3021
Critter 1.4 SSE42 - Gull 1.2 (2796)		75.5	-	21.5		77.84%		Perf=3014
Critter 1.4 SSE42 - Deep Sjeng c't 2010 32b (2788)		70.0	-	26.0		72.92%		Perf=2960
Critter 1.4 SSE42 - Spike 1.4 32b (2784)		68.5	-	27.5		71.35%		Perf=2942
Critter 1.4 SSE42 - Protector 1.4.0 (2760)		78.5	-	16.5		82.63%		Perf=3030
Critter 1.4 SSE42 - Hannibal 1.1 (2757)		72.0	-	23.0		75.79%		Perf=2955
Critter 1.4 SSE42 - spark-1.0 SSE42 (2757)		77.0	-	18.0		81.05%		Perf=3009
Critter 1.4 SSE42 - HIARCS 13.2 MP 32b (2751)		79.0	-	17.0		82.29%		Perf=3017
Critter 1.4 SSE42 - Deep Junior 12.5 (2732)		80.0	-	16.0		83.33%		Perf=3011
Critter 1.4 SSE42 - Zappa Mexico II (2717)		80.0	-	16.0		83.33%		Perf=2996
Critter 1.4 SSE42 - Deep Onno 1-2-70 (2685)		84.5	-	10.5		88.95%		Perf=3047
Critter 1.4 SSE42 - Strelka 2.0 B (2673)		86.5	-	9.5		90.10%		Perf=3056
Critter 1.4 SSE42 - Umko 1.2 SSE42 (2664)		82.0	-	12.0		87.23%		Perf=2997
Critter 1.4 SSE42 - Loop 2007 (2621)		82.5	-	13.5		85.94%		Perf=2935
Critter 1.4 SSE42 - Jonny 4.00 32b (2614)		87.5	-	8.5		91.15%		Perf=3019
Critter 1.4 SSE42 - Tornado 4.80 (2609)		86.0	-	9.0		90.53%		Perf=3001
Critter 1.4 SSE42 - Crafty 23.3 JA (2599)		86.0	-	9.0		90.53%		Perf=2991
1629.5	-	476.5		77.37%		Perf=2978



2106 out of 2200 games played

IGarcia · Post by **IGarcia** » Thu Dec 29, 2011 4:45 am

Scoring 77.3% means about 214 Elo above the opponents (this comes from the Elo tables or formula, which is not linear).
The average opponent rating is (rounded) 2765 Elo; adding the two gives 2765 + 214 = 2979 Elo.
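
The arithmetic above can be sketched in a few lines of Python. This assumes the standard logistic Elo formula; the thread does not say which formula or table IPON actually uses, and the names `elo_diff` and `performance` are made up for illustration. The second half uses two rows from Albert's table to show why averaging per-opponent performances is not the same as converting the pooled score once.

```python
import math

def elo_diff(score_fraction):
    """Elo difference implied by a score fraction (logistic model, assumed)."""
    return 400.0 * math.log10(score_fraction / (1.0 - score_fraction))

def performance(opponent_rating, points_for, points_against):
    """Performance rating: opponent rating plus the implied Elo difference."""
    score = points_for / (points_for + points_against)
    return opponent_rating + elo_diff(score)

# Overall result from the table: 1629.5 - 476.5, average opponent about 2765
# (the rounded figure quoted in the post, not recomputed here).
total_diff = elo_diff(1629.5 / (1629.5 + 476.5))
total_perf = 2765 + total_diff
print(f"overall: +{total_diff:.0f} Elo, performance about {total_perf:.0f}")

# Two rows from the table: Houdini 2.0 STD (3022) and Crafty 23.3 JA (2599).
# Each tuple is (opponent rating, points scored, points conceded).
rows = [(3022, 48.5, 47.5), (2599, 86.0, 9.0)]

# Albert's method: average the per-opponent performances.
mean_of_perfs = sum(performance(*row) for row in rows) / len(rows)

# Pooled method: add up the points first, then convert once,
# using the game-weighted average opponent rating.
games = [won + lost for _, won, lost in rows]
pooled_opponent = sum(r * g for (r, _, _), g in zip(rows, games)) / sum(games)
pooled_perf = performance(pooled_opponent,
                          sum(won for _, won, _ in rows),
                          sum(lost for _, _, lost in rows))
print(f"mean of performances: {mean_of_perfs:.0f}, pooled: {pooled_perf:.0f}")
```

For these two rows the mean of the individual performances comes out well above the pooled figure, which is the same direction as the gap Albert noticed across all 22 rows.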

There is a discussion about this in a topic called "Komodo 4 at IPON results" in the tournament forum.

melajara · Post by **melajara** » Thu Dec 29, 2011 5:18 am

The Critter 1.4 match seems to be stalled now. Last time I checked, the score was 2978 after 2162 games, but as I'm writing this, no interim results are displayed any more.

From what I have observed in several unfolding IPON matches, for whatever reason the provisional score seems to drop after a few hundred games.
It would be ironic if Critter 1.4's score exactly matched Komodo 4's when the playing profiles of the two programs are very different (from this match, Critter being stronger against the strongest opponents but somewhat inconsistent against weaker ones).

Anyway, we are clearly in the diminishing return phase from the latest version for both programs.
At current rate of progress, we'll need Komodo 7 and Critter 1.6 to bypass Houdini 2/1.5

This demonstrates the engineering ability of Mr Houdart or the luck he had in tuning the parameters of Houdini 1.5

lkaufman · Post by **lkaufman** » Thu Dec 29, 2011 5:26 am

melajara wrote: The Critter 1.4 match seems to be stalled now. Last time I checked, the score was 2978 after 2162 games, but as I'm writing this, no interim results are displayed any more.

From what I have observed in several unfolding IPON matches, for whatever reason the provisional score seems to drop after a few hundred games.
It would be ironic if Critter 1.4's score exactly matched Komodo 4's when the playing profiles of the two programs are very different (from this match, Critter being stronger against the strongest opponents but somewhat inconsistent against weaker ones).

Anyway, we are clearly in the diminishing return phase from the latest version for both programs.
At current rate of progress, we'll need Komodo 7 and Critter 1.6 to bypass Houdini 2/1.5

This demonstrates the engineering ability of Mr Houdart or the luck he had in tuning the parameters of Houdini 1.5

I don't think so. Although IPON only shows a 14 elo gain for K4 over K3, CCRL so far shows 28 elo, so probably my original estimate of 20 was spot on. I guess we'll need two more versions to catch Houdini at blitz based on this, but the next version should do it at 40/40 minutes.

Frank Quisinsky · Post by **Frank Quisinsky** » Thu Dec 29, 2011 7:22 am

Hi Larry,

It is not realistic to test that.

40 in 40 without resigning = 160 minutes per game (average game length without resigning = 86 moves, with resigning = 67 moves). Good statistics are only possible without resigning.

For SWCR, at 40 minutes per game, I would need 2 years with four machines to produce a rating list with many opponents. All of these opponents must be available in a current version.

A 40 in 40 rating list needs 2 years if a user has 48 cores and pays 320 EUR for electricity each month.

The biggest problem is keeping such a list current. Most programmers release too many versions, and most users expect them all to be tested, but for most people working on a list it is not possible to test them all.

In other words, nobody can test it.

Only 40 in 40 games against a small group of engines are possible. But this small group of engines must also have good ratings, and all must be current, or the test is not interesting enough.

What the community can do is compare:
40 in 3 with 40 in 5 with 40 in 10, for example.
That way we can see, from games alone, whether an engine is better with more time.

Furthermore, for a good rating, more opponents are more important than many games. This cannot be seen with Elo calculation programs, but it is easy to see with database simulations.

Best
Frank

For example:
The still-running SWCR Komodo 4 test with 1,560 games costs 15 EUR in electricity; w32 and x64 together = 30 EUR. So you can calculate how expensive it is to test so many versions of one engine. I think in SWCR it is around 250 EUR in two years for Komodo alone. Each test with around 1,500 games at 40 in 40 costs 60 EUR in electricity.

A good rating list at 40 in 40 needs around two years, and you have to pay:

320 x 24 months = 7,680 EUR for electricity
6,000 EUR for hardware
1,000 EUR for hardware wear
A lot of time: you have to write many e-mails to programmers with bug reports, and you need an understanding wife, because nobody will give you money back but everybody wants the results.

And of course the programmers want money too. So the people working on such a list have to pay for the programs as well, but this is a very small problem when I think of all the other costs.
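
Adding up the figures above gives a rough total. This is only a sketch using the numbers given in the post; time and the cost of the programs themselves are left out because no amounts are stated for them, and the variable names are made up for illustration.

```python
# Figures taken from the post (EUR); nothing else is assumed.
monthly_electricity_eur = 320
months = 24
hardware_eur = 6000
hardware_wear_eur = 1000

# Electricity over the two-year run.
electricity_eur = monthly_electricity_eur * months

# Total of the three stated cost items.
total_eur = electricity_eur + hardware_eur + hardware_wear_eur

print(f"electricity: {electricity_eur} EUR")
print(f"total:       {total_eur} EUR")
```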

geots · Post by **geots** » Thu Dec 29, 2011 7:35 am

lkaufman wrote:
melajara wrote: The Critter 1.4 match seems to be stalled now. Last time I checked, the score was 2978 after 2162 games, but as I'm writing this, no interim results are displayed any more.

From what I have observed in several unfolding IPON matches, for whatever reason the provisional score seems to drop after a few hundred games.
It would be ironic if Critter 1.4's score exactly matched Komodo 4's when the playing profiles of the two programs are very different (from this match, Critter being stronger against the strongest opponents but somewhat inconsistent against weaker ones).

Anyway, we are clearly in the diminishing return phase from the latest version for both programs.
At current rate of progress, we'll need Komodo 7 and Critter 1.6 to bypass Houdini 2/1.5

This demonstrates the engineering ability of Mr Houdart or the luck he had in tuning the parameters of Houdini 1.5
I don't think so. Although IPON only shows a 14 elo gain for K4 over K3, CCRL so far shows 28 elo, so probably my original estimate of 20 was spot on. I guess we'll need two more versions to catch Houdini at blitz based on this, but the next version should do it at 40/40 minutes.

Right, Larry, but which Houdini version are you referring to when you say "catch Houdini"? You seem to be assuming that, during the period you refer to, Robert will be sitting on his hands instead of coming out with a possibly much stronger and faster release. Just a thought.

Best,

george

Frank Quisinsky · Post by **Frank Quisinsky** » Thu Dec 29, 2011 7:44 am

And again ...
I don't think any engine will gain an advantage of more than 30 Elo if you compare results at 40 in 10 with 40 in 120. This is easy to see in the SWCR Champions-League with 40 moves in 150 minutes, or with CEGT 40 in 120.

I think it's right that Komodo will get a small advantage if the program keeps its positional playing strength. If not, and Komodo becomes more tactical, it is possible that Komodo loses this small advantage over the others. This is clear if we study the results of all the available rating lists.

In other words:
A nice fairy tale in computer chess over so many years.

Have a look at IPON and SWCR. In SWCR I played with double the time. Compare the results and you can see ... only Junior is clearly stronger with more time. But here I am sure this advantage of Junior's will be lost with more time than 40 in 10, because Junior is bad in endgames and no further advantage from Junior's strong middlegame is possible.

So many fairy tales in computer chess: about opening books, about ponder, about the time control and all the other things, program sources ... own ideas, copied ideas and so on.

Best
Frank

Jouni · Post by **Jouni** » Thu Dec 29, 2011 9:05 am

If an engine has a positive score against all others, isn't it automatically the best and strongest, with no need to calculate anything?!

Jouni

MM · Post by MM » Thu Dec 29, 2011 9:17 am

Jouni wrote: If an engine has a positive score against all others, isn't it automatically the best and strongest, with no need to calculate anything?!

Jouni

I tend to agree with you.

But I would say that scoring 98% against a 2500 engine (just an example) is not like scoring 58% against a very strong engine.

Honestly, I wouldn't give much weight to results against engines with more than 200 Elo difference.
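
One way to see why lopsided results deserve less weight is to look at how much a single game point moves the implied rating. The sketch below assumes the standard logistic Elo formula and a 96-game match, as in most rows of the IPON table; the function names are made up for illustration.

```python
import math

def elo_diff(score_fraction):
    """Elo difference implied by a score fraction (logistic model, assumed)."""
    return 400.0 * math.log10(score_fraction / (1.0 - score_fraction))

def swing_per_point(points, games=96):
    """Change in implied Elo difference if one more point had been scored."""
    return elo_diff((points + 1) / games) - elo_diff(points / games)

# The two example percentages from the post.
print(f"98% score -> +{elo_diff(0.98):.0f} Elo")
print(f"58% score -> +{elo_diff(0.58):.0f} Elo")

# How much one extra point shifts the result at different score levels.
for points in (48.0, 56.0, 86.0, 93.0):
    print(f"{points:.0f}/96: one more point is worth "
          f"{swing_per_point(points):.1f} Elo")
```

Under these assumptions, one point near a 50% score is worth only a few Elo, while the same point in a match that is already above 90% is worth several times as much, so performances against much weaker opponents are noisier.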

Regards

Uri Blass · Post by **Uri Blass** » Thu Dec 29, 2011 9:20 am

Frank Quisinsky wrote: And again ...
I don't think any engine will gain an advantage of more than 30 Elo if you compare results at 40 in 10 with 40 in 120. This is easy to see in the SWCR Champions-League with 40 moves in 150 minutes, or with CEGT 40 in 120.

I think it's right that Komodo will get a small advantage if the program keeps its positional playing strength. If not, and Komodo becomes more tactical, it is possible that Komodo loses this small advantage over the others. This is clear if we study the results of all the available rating lists.

In other words:
A nice fairy tale in computer chess over so many years.

Have a look at IPON and SWCR. In SWCR I played with double the time. Compare the results and you can see ... only Junior is clearly stronger with more time. But here I am sure this advantage of Junior's will be lost with more time than 40 in 10, because Junior is bad in endgames and no further advantage from Junior's strong middlegame is possible.

So many fairy tales in computer chess: about opening books, about ponder, about the time control and all the other things, program sources ... own ideas, copied ideas and so on.

Best
Frank

I do not see how you can be sure that Junior is going to lose the advantage with more time.

Only testing can answer that, and if a program is bad in endgames, it is possible that more time helps it play the endgame better (not better than other programs, but well enough to avoid some of the mistakes it makes at fast time controls).
