CCRL update (14th July 2007)

Uri Blass
Posts: 10314
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: CCRL update (14th July 2007)

Post by Uri Blass »

Norm Pollock wrote:
hgm wrote:
Spock wrote:The list of killed engines was introduced about a month ago :)

The total number of games killed is currently about 2,200

We try to ensure all engines on the list get at least 200 games. If a new engine version comes out quickly, and the old version only has a small number of games, then either we commit to getting it up to 200 games as well as the new version, or "kill" the old one.... As you say, the list can quickly get out of control if we don't take steps to tidy it up
I can understand why you don't want the list to grow to unwieldy proportions, including all kinds of obsolete engine versions with poorly known ratings.

But I seriously question the statistical wisdom of removing their games from the database. These games do contain information that is still useful for narrowing down the ratings of other engines that have played them, that BayesElo would extract.

Example:
Say I have engines A and B and I play them two games each against the engines C1, C2, ... C200. Say A and B both score 50% in these gauntlets.

A and B then each have 400 games, and there is good evidence that they are equally strong. Statistically this is about as good as if they had played 200 games against each other, but without the systematic error that would result from playing against the same opponent too often. All the engines C1, ... C200 would have only played 4 games, though, and their ratings are hardly known at all.

But 'killing' these C engines would leave the relative strength of A and B totally undefined. It would be equivalent in terms of accuracy loss to removing 200 games between the two of them, without need or reason.

An extreme example, perhaps, to make it very obvious. But the effect will always be there, no matter how small the fraction of games thrown away is, compared to the total. These games still contain about 25% as much information as games between 'alive' engines.
I think you answered it yourself when you said:
"All the engines C1, ... C200 would have only played 4 games, though, and their ratings are hardly known at all."

Their tentative elo ratings will be based upon the standard initial elo value that all engines start from, which is data input by the user, and 4 games. Possibly very inaccurate elo ratings. These ratings will then influence A and B's elo rating, and then have a ripple effect until all engines in the cluster are affected.

I would not have confidence in such ratings. A chain is as weak as the weakest link, and in this case, having 200 weak elo ratings (weak in terms of reliability) is like having 200 weak links. Not good.
I still agree with H.G. Muller.
The point is that the best estimate for a rating should not ignore data from engines that played against more than one opponent and got different results against them.

Even if engine X has played only 2 games, beating Y and losing to Z, ignoring those games is unfair to Y and Z, because the results help to determine the rating difference between Y and Z.

I think that the weight of the games of engine X should be smaller than the weight of games of engines that played more games. Maybe the games are counterproductive if you use the default way to calculate ratings, but there should be some productive way to calculate ratings that does not totally ignore them.
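
To put a rough number on the X, Y, Z case, here is a minimal Python sketch. It assumes a plain logistic (Bradley-Terry) model without draws and engines of roughly equal strength, so that each game carries p(1-p) = 1/4 units of Fisher information on a rating difference. Those assumptions, and the helper names `link_information` and `series`, are illustrative only; they are not taken from the thread or from BayesElo. Along the chain Y-X-Z the variances add, so the information combines like resistors in series.

```python
def link_information(n_games, p=0.5):
    """Fisher information on a rating difference from n_games between two
    engines, on the natural-log logistic scale (assumption: no draws)."""
    return n_games * p * (1.0 - p)


def series(*infos):
    """Information along a chain of links: variances (1/info) add."""
    return 1.0 / sum(1.0 / info for info in infos)


# X played one game against Y and one game against Z.
via_x = series(link_information(1), link_information(1))
# One hypothetical direct game between Y and Z, for comparison.
direct = link_information(1)

print(via_x / direct)  # 0.5
```

Under these assumptions the two games through X together tell you about half as much about Y versus Z as one direct game would: less than a direct game, but clearly not nothing.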

Uri
hgm
Posts: 27818
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: CCRL update (14th July 2007)

Post by hgm »

Norm Pollock wrote:I think you answered it yourself when you said:
"All the engines C1, ... C200 would have only played 4 games, though, and their ratings are hardly known at all."

Their tentative elo ratings will be based upon the standard initial elo value that all engines start from, which is data input by the user, and 4 games.
With proper statistical analysis (e.g. BayesElo) they don't!
Possibly very inaccurate elo ratings. These ratings will then influence A and B's elo rating, and then have a ripple effect until all engines in the cluster are affected.

I would not have confidence in such ratings. A chain is as weak as the weakest link, and in this case, having 200 weak elo ratings (weak in terms of reliability) is like having 200 weak links. Not good.
Thorough statistical analysis shows that your lack of confidence is totally misplaced. What you fail to take into account is that the topology of the pairing network is not that of a chain. The weak links are all in parallel, and in such a case their strengths add. If I staple your shirt to the door with 1000 staples, there is no way you are going to pull free, despite the fact that the force it would take to pull out the weakest staple might amount to nothing (because it happened to coincide with the keyhole).

Like I said, the effect of having these 800 games in the database is exactly the same as having 200 games between A and B. (With proper Elo calculation, of course. If the procedure for extracting ratings from the data sucks, the priority should be repairing that, as throwing away data won't be of any avail in such a case.) If you had 200 games between A and B, would you throw any of them away? If not, why would you do it here, where it has exactly the same negative impact on the accuracy of the ratings of A and B (rippling through the cluster)? It just makes no sense.
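
The 800-versus-200 claim can be checked with a small Python sketch. This is not BayesElo: it assumes a plain Bradley-Terry model without draws or priors, with all engines equally strong, so each game contributes 1/4 to the Fisher information matrix (a weighted graph Laplacian of the pairing network). The function name `diff_information` is hypothetical. The asymptotic variance of the estimated rating difference between two engines is then the "effective resistance" between them, found by fixing one engine's rating at zero and solving a linear system.

```python
from collections import defaultdict


def diff_information(games, a, b):
    """Fisher information on (r_a - r_b), assuming equal-strength engines
    and no draws. games maps (engine_i, engine_j) -> number of games."""
    w = defaultdict(dict)  # w[i][j] = information carried by link i-j
    for (i, j), count in games.items():
        w[i][j] = w[i].get(j, 0.0) + count / 4.0
        w[j][i] = w[j].get(i, 0.0) + count / 4.0

    # Collect the engines connected to b; if a is not among them,
    # the rating difference is undefined (zero information).
    seen, stack = {b}, [b]
    while stack:
        u = stack.pop()
        for v in w[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    if a not in seen:
        return 0.0

    # Laplacian with b's row and column removed (r_b fixed at 0),
    # augmented with the right-hand side e_a.
    nodes = [u for u in seen if u != b]
    idx = {u: k for k, u in enumerate(nodes)}
    m = len(nodes)
    mat = [[0.0] * (m + 1) for _ in range(m)]
    for u in nodes:
        for v, g in w[u].items():
            mat[idx[u]][idx[u]] += g
            if v != b:
                mat[idx[u]][idx[v]] -= g
    mat[idx[a]][m] = 1.0

    # Gaussian elimination with partial pivoting, then back substitution.
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(mat[r][c]))
        mat[c], mat[piv] = mat[piv], mat[c]
        for r in range(c + 1, m):
            f = mat[r][c] / mat[c][c]
            if f:
                for k in range(c, m + 1):
                    mat[r][k] -= f * mat[c][k]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        s = mat[r][m] - sum(mat[r][k] * x[k] for k in range(r + 1, m))
        x[r] = s / mat[r][r]

    return 1.0 / x[idx[a]]  # information = 1 / variance of (r_a - r_b)


# A and B each play 2 games against C1..C200 (800 games in total).
gauntlet = {}
for n in range(1, 201):
    gauntlet[("A", f"C{n}")] = 2
    gauntlet[("B", f"C{n}")] = 2

info_gauntlet = diff_information(gauntlet, "A", "B")
info_direct = diff_information({("A", "B"): 200}, "A", "B")
print(round(info_gauntlet, 6), round(info_direct, 6))
```

Under these assumptions both numbers come out the same, which is the parallel-staples argument in numerical form. If the C engines are removed from `gauntlet`, no path connects A and B any more and the function returns 0.0.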

And even a chain with many weak links is stronger than a broken chain...
Graham Banks
Posts: 41473
Joined: Sun Feb 26, 2006 10:52 am
Location: Auckland, NZ

Re: CCRL update (14th July 2007)

Post by Graham Banks »

Mike S. wrote:There is a chance that Toga 1.3X4 is a little bit better than 1.2.1a, if we take a look at the CEGT blitz ratings.
Hi Mike,

Shaun is still testing various Toga II 1.3 settings before a final release.

Regards, Graham.
Shaun
Posts: 322
Joined: Wed Mar 08, 2006 9:55 pm
Location: Brighton - UK

Re: CCRL update (14th July 2007)

Post by Shaun »

It will be this week :wink:

Shaun
ernest
Posts: 2041
Joined: Wed Mar 08, 2006 8:30 pm

Re: CCRL update (14th July 2007)

Post by ernest »

Mike S. wrote:There is a chance that Toga 1.3X4 is a little bit better than 1.2.1a.
Hi Mike,
What is your opinion on the bitbases bug in Toga 1.3X4?
Does it make the bitbases useless, or is the bug a rare occurrence?