Uri,
it is logical given the information we have.
1. With more time there are more draws.
2. In decisive games (1:0 / 0:1), the decisive advantage appears on average at move 54 (late middlegame, early endgame). That is, on average a game tips toward 1:0 or 0:1 at move 54, as long as the average difference between all participants / opponents in a tourney or rating list is not higher than 135 Elo (135 Elo is the situation we have with our TOP-40).
3. An engine with as many endgame problems as Junior cannot go higher in the ratings with more time. With more time, the others will find better moves in the phase of the game where Junior is strong.
This is not a proof, but all in all I think it should be clear, to about 95%, that with more time the Junior rating cannot rise by more than around 30 Elo (compared to the others). I rather think that Junior loses its advantage as the time gets longer and longer ...
Example:
40 in 5 ... no advantage
40 in 10 ... 30 Elo advantage over the others
40 in 40 ... 20 Elo advantage over the others
and perhaps with 40 in 120 ... no advantage over the others
Best
Frank
IPON ratings calculation
Moderators: hgm, Rebel, chrisw
-
- Posts: 6823
- Joined: Wed Nov 18, 2009 7:16 pm
- Location: Gutweiler, Germany
- Full name: Frank Quisinsky
-
- Posts: 10427
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: Not realistic!
Frank Quisinsky wrote:
The others will see in the game phase Junior is strong with more time better moves.

You say that with more time the others will find better moves in the phase of the game where Junior is strong.
The question is why not also think that, with more time, Junior is going to find better moves in the phase where it is weak.
-
- Posts: 2277
- Joined: Mon Sep 29, 2008 1:50 am
Re: IPON ratings calculation
Albert Silver wrote:
I can only assume it is I who lack the proper understanding of how the ratings are calculated, but watching the IPON results of Critter 1.4, I began to wonder why its performance was 2978 after 2106 games. I took the 22 performances, added them up, and then divided them by 22 and came up with 3000.59, so why is the total performance 2978?

The calculation method of BayesElo is explained here:
http://remi.coulom.free.fr/Bayesian-Elo/#theory
The Elo ratings are the result of a maximum likelihood calculation seeded with a prior (afaics this can only be theoretically justified in a Bayesian setting).
The actual algorithm is derived from this paper:
http://www.stat.psu.edu/~dhunter/papers/bt.pdf
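To make the idea of a joint maximum likelihood fit more concrete, here is a minimal sketch of the MM iteration for a plain Bradley-Terry model, the kind of algorithm the linked paper describes. It is not BayesElo itself: it assumes decisive games only (no draws, no colour advantage, no prior), and the function name and the sample results are made up for illustration. The point is only that all ratings are fitted together against all results, rather than averaged from per-opponent performances.

```python
import math
from collections import defaultdict

def bradley_terry_mm(wins, iterations=500):
    """Fit Bradley-Terry strengths with the MM update and return Elo-like ratings.

    wins[(a, b)] is the number of games a won against b (hypothetical input
    format; draws are not modelled in this sketch).
    """
    players = sorted({p for pair in wins for p in pair})
    total_wins = defaultdict(float)
    games = defaultdict(float)  # games[(a, b)] = games played between a and b
    for (winner, loser), n in wins.items():
        total_wins[winner] += n
        games[(winner, loser)] += n
        games[(loser, winner)] += n

    gamma = {p: 1.0 for p in players}  # strength parameters, start equal
    for _ in range(iterations):
        updated = {}
        for i in players:
            # MM update: wins of i divided by sum over opponents of N_ij / (g_i + g_j)
            denom = sum(
                games[(i, j)] / (gamma[i] + gamma[j])
                for j in players if j != i
            )
            updated[i] = total_wins[i] / denom
        # Normalise so the geometric mean of the strengths is 1 (mean rating 0).
        log_mean = sum(math.log(g) for g in updated.values()) / len(updated)
        gamma = {p: g / math.exp(log_mean) for p, g in updated.items()}

    # Elo scale: P(i beats j) = 1 / (1 + 10 ** ((R_j - R_i) / 400))
    return {p: 400 * math.log10(g) for p, g in gamma.items()}

# Made-up results: A-B 7:3, B-C 6:4, A-C 8:2
wins = {("A", "B"): 7, ("B", "A"): 3,
        ("B", "C"): 6, ("C", "B"): 4,
        ("A", "C"): 8, ("C", "A"): 2}
ratings = bradley_terry_mm(wins)
for name, elo in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {elo:+.1f}")
```

The sketch needs every player to have at least one win; otherwise the strength collapses to zero and the logarithm fails.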
-
- Posts: 6823
- Joined: Wed Nov 18, 2009 7:16 pm
- Location: Gutweiler, Germany
- Full name: Frank Quisinsky
Re: Not realistic!
Hi Uri,
with longer time controls the average number of moves and the share of draws will go up. The advantage Junior has in the early middlegame cannot compensate for its weaknesses in endgames if more endgames have to be played with more time.
In other words:
More time = longer games = more endgames.
It is possible that Junior will be stronger in the early middlegame with more time, but you have to weigh that against the fact that with more time Junior will also get more problems in endgames.
The advantage Junior has fizzles out.
The same goes for Spark, because both have the same weaknesses and strengths. Spark is more aggressive than Junior and produces more such games than Junior, but broadly speaking the strengths and weaknesses are the same.
Best
Frank
-
- Posts: 5981
- Joined: Sun Jan 10, 2010 6:15 am
- Location: Maryland USA
Re: Not realistic!
CCRL does 40/40 ratings, and usually a new engine will have a well-established rating on their list within a week or two. For you it is probably not realistic because you are just one person, but for a group like CCRL it appears to be no problem.
-
- Posts: 5981
- Joined: Sun Jan 10, 2010 6:15 am
- Location: Maryland USA
Re: IPON ratings calculation
melajara wrote:
Critter 1.4 match seems to be stalled now. Last time I checked, score was 2978 after 2162 games but as I'm writing this, there is no more ad interim results displayed.
From what I observed from several IPON unfolding matches, for whatever reason the provisional score seems to drop after a few hundred games.
It would be ironic that Critter 1.4 score exactly matches Komodo 4 when the profile of play of both programs is very different (from this match, Critter being stronger with the strongest opponents but somewhat inconsistent with weaker ones).
Anyway, we are clearly in the diminishing return phase from the latest version for both programs.
At current rate of progress, we'll need Komodo 7 and Critter 1.6 to bypass Houdini 2/1.5
This demonstrates the engineering ability of Mr Houdart or the luck he had in tuning the parameters of Houdini 1.5

lkaufman wrote:
I don't think so. Although IPON only shows a 14 elo gain for K4 over K3, CCRL so far shows 28 elo, so probably my original estimate of 20 was spot on. I guess we'll need two more versions to catch Houdini at blitz based on this, but the next version should do it at 40/40 minutes.

geots wrote:
Right Larry, but which Houdini version are you referring to when you say "catch Houdini". You seem to be in this period of time you refer to assuming Robert will be sitting on his hands doing nothing to come out with possibly a much stronger and faster release. Just a thought.
Best,
george

I refer to either 1.5 or 2.0, as the lists show no net difference between the two; in fact the slower tests give 1.5 an edge. If he couldn't improve Houdini in a year I don't expect miracles, but perhaps he'll find some improvements in 2012.
-
- Posts: 5981
- Joined: Sun Jan 10, 2010 6:15 am
- Location: Maryland USA
Re: IPON ratings calculation
Michel wrote:
The elo's are the result of a maximum likelihood calculation seeded with a prior (afaics this can only be theoretically justified in a Bayesian setting).

I think the "prior" may be the problem; it appears to have way too much weight. If an engine performs 3000 against every opponent in over 2000 games, it should get a rating very close to 3000, maybe 2999. But apparently the prior gets way too much weight, because I believe such an engine on the IPON list would get only around 2975.
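One way to get a feeling for how much weight a prior could carry is a small back-of-the-envelope sketch. It assumes, purely for illustration, that the prior behaves like a number of virtual draws against an opponent of average rating, and that all real games are played against a single pool with that same average rating. This is not a statement about how BayesElo or the IPON list actually weight the prior; the function name and all numbers below are hypothetical.

```python
import math

def performance(score, games, opp_rating, virtual_draws=0.0):
    """Logistic Elo performance against one opponent pool.

    virtual_draws adds that many drawn games against the same pool
    (an assumed, simplified stand-in for a prior).
    """
    p = (score + 0.5 * virtual_draws) / (games + virtual_draws)
    return opp_rating + 400 * math.log10(p / (1 - p))

# Hypothetical engine: 1520 points from 2000 games against a 2800 pool.
raw = performance(1520, 2000, 2800)
shrunk = {d: performance(1520, 2000, 2800, virtual_draws=d)
          for d in (2, 20, 200)}

print(f"no prior: {raw:.1f}")
for d, r in shrunk.items():
    print(f"{d:>3} virtual draws: {r:.1f}")
```

Under these assumptions the pull toward the pool average grows with the number of virtual draws and shrinks as the number of real games grows, so trying a few values shows what weight would be needed to move a rating by a given amount.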
-
- Posts: 6823
- Joined: Wed Nov 18, 2009 7:16 pm
- Location: Gutweiler, Germany
- Full name: Frank Quisinsky
Re: Not realistic!
Hi Larry,
I would say I have 40 in 60 (conditions: Pentium 4 2.0 GHz, without ponder, with resign).
CCRL's 40 in 40, without ponder and without resign factors, = around 40 in 16 if you compare it with the 40 in 10 SWCR conditions.
CCRL is playing without ponder,
40 in 40 on older AMD hardware.
SWCR is playing with ponder,
40 in 10 on faster Intel Q9550 hardware.
Ponder = around 40 Elo more; see the Crafty 23.3 x64 results in the SWCR rating list.
CCRL does indeed have the longest conditions, at around 40 in 16 (compared to 40 in 10 SWCR). CEGT, with 40 in 20 without ponder and on slower hardware than SWCR, is around the same as SWCR. IPON, with ponder, is at around 40 in 4 if I compare it with SWCR.
The longest conditions come from CCRL!
CEGT and SWCR are around the same.
But the difference between CCRL and SWCR / CEGT is not big enough to show anything.
Best
Frank
The difference between CCRL and SWCR conditions will not give Komodo a big jump in Elo. In CEGT Komodo 4 is around +14 stronger than Komodo 3. SWCR shows the same result (clearly) after around 250 games against a lot of opponents, and in IPON it is 12 Elo. So you can be sure that the CCRL rating for Komodo is too high. CCRL does not have that many participants, I think; many opponents are much more important than many games.
In other words:
CEGT +14 so far
SWCR +14 so far
IPON +12 so far
Your tester Clemens wrote today in the CSS forum that he found +15 in his Komodo testing.
And CCRL has +28, but this cannot be right when I look at the other 4 results. I think the reason is that CCRL does not use as many strong opponents as the others do, or does not have many different opponents at the moment.
Best
Frank
My example to current electric other things are 40 in 40 with actual hardware. That is what we would need in order to see whether or not Komodo is stronger than the others with longer time controls. Comparing CCRL with SWCR or CEGT, you can't see it.
-
- Posts: 5981
- Joined: Sun Jan 10, 2010 6:15 am
- Location: Maryland USA
Re: Not realistic!
Frank Quisinsky wrote:
Hi Larry,
I have 40 in 60 if I would say (conditions are Pentium 4 2.0 GHz, without ponder, with resign).
40 in 40 CCRL without ponder and without resign factors = around 40 in 16 if you compare it with 40 in 10 SWCR conditions.
CCRL is playing without ponder
40 in 40 on older AMD hardware
SWCR is playing with ponder
40 in 40 on faster Intel Q9550 hardware
Indeed CCRL have the highest conditions with around 40 in 16 (comparing to 40 in 10 SWCR), CEGT with 40 in 20 without ponder and slower hardware as SWCR is around the same as SWCR. IPON have with ponder around 40 in 4 if I compare it with SWCR.
Highest conditions comes from CCRL, CEGT and SWCR around the same.
Best
Frank
CCRL to SWCR will give Komodo not a big jumping in ELO. In CEGT Komodo 4 is around + 14 stronger as Komodo 3. Same results after around 250 games in SWCR (clear) and in IPON it is 12 ELO. So you can be sure that the CCRL rating for Komodo is to high. CCRL have not so many participant I think, much more important as many games are many opponents.
With other words:
CEGT + 14 so far
SWCR + 14 so far
IPON + 12 so far
Your tester Clemens wrote today in CSS forum that in testing Komodo he find out + 15
And CCRL have +28 but this could not be right if I am looking on the other results. I think the reason is, that CCRL don't used so many strong opponents the others are using or don't have at the moment many different opponents.
Best
Frank
My example to current electric other things are 40 in 40 with actual hardware. This one we need if to see is Komodo stronger or not at the others with longer time controls. With CCRL in comparing SWCR or CEGT you can't see it.

I don't consider resign on/off or ponder on/off to be important. Ponder on has unpredictable effects, and I think it raises the quality only slightly, not nearly enough to justify cutting your sample in half. We would never test that way. If your machine were used for CCRL 40/40 test, what setting would you use? In other words, what is the hardware-only adjustment between CCRL and your list, disregarding ponder and resign?
Best,
Larry
-
- Posts: 6823
- Joined: Wed Nov 18, 2009 7:16 pm
- Location: Gutweiler, Germany
- Full name: Frank Quisinsky
Re: Not realistic!
Hi Larry,
ponder is a time factor and very important.
Without ponder you can produce more games; with ponder and around 30% ponder hits, the performance goes up. I think games without ponder are only half games.
157 Crafty 23.3 JA x64             2593 18 18 1160 33% 2721 34%
176 Crafty 23.3 JA x64, no ponder  2546 20 20 1000 26% 2729 30%
= 47 Elo.
With double the time you will get 60-65 Elo more.
40 in 40 CCRL without ponder = around 40 in 25 with ponder.
Speed difference between the Q9550 and the AMD3800 = a lot.
I have been calculating this for two years ...
40 in 10 SWCR should be around 40 in 16 CCRL if CCRL used the same hardware and the ponder setting I use.
Resign on or off matters only for statistics.
Move average without resign = 86 moves
Move average with resign = 67 moves
Better statistics are possible with resign = off.
Again, my opinion on ponder is completely different from yours.
Ponder is a very, very important speed factor.
Best
Frank
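The arithmetic behind "40 in 40 without ponder = around 40 in 25 with ponder" can be sketched as follows. The sketch assumes that Elo grows linearly with the number of time doublings (an extrapolation from the 60-65 Elo per doubling quoted above, not something the post states) and takes the ponder gain from the two Crafty 23.3 ratings. The helper name is hypothetical.

```python
def time_factor(elo_gain, elo_per_doubling):
    """Time multiplier that would be worth elo_gain, assuming a constant
    Elo gain per doubling of thinking time (an assumption of this sketch)."""
    return 2 ** (elo_gain / elo_per_doubling)

# Ponder gain taken from the two Crafty 23.3 JA x64 entries: 2593 vs. 2546.
ponder_elo = 2593 - 2546

# Minutes per 40 moves WITH ponder that would match 40 in 40 WITHOUT ponder,
# for both ends of the quoted 60-65 Elo per doubling range.
equivalents = {
    elo_per_doubling: 40 / time_factor(ponder_elo, elo_per_doubling)
    for elo_per_doubling in (60, 65)
}

for elo_per_doubling, minutes in equivalents.items():
    print(f"{elo_per_doubling} Elo per doubling -> 40 in {minutes:.1f}")
```

Under these assumptions the result lands slightly below 25 minutes, in the same region as the "around 40 in 25" figure; the hardware difference between the two lists is a separate factor that this sketch does not cover.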