Colossus 2007a - early impression

Discussion of computer chess matches and engine tournaments.

Moderator: Ras

Kirill Kryukov
Posts: 518
Joined: Sun Mar 19, 2006 4:12 am
Full name: Kirill Kryukov

Colossus 2007a - early impression

Post by Kirill Kryukov »

I did a quick test of Colossus 2007a under CCRL 40/4 conditions. The rating is 2686 Elo points after 224 games. This makes it the #16 engine in the CCRL 40/4 Free Single-CPU list (my version, which includes only stable public releases with default settings).

It's a bit early to draw conclusions; the ranking may change after more games (which are running right now). Still, I hope this improvement holds. :-)

Those few games were enough to get 97.3% LOS (Likelihood of Superiority) over the previous version, 2006f, which is rated 2644 (a 42-point difference).
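
For readers who want to see where an LOS figure can come from, here is a minimal sketch of one commonly used approximation that needs only the numbers of wins and losses (draws carry no information about which side is stronger in this model). This is an assumption for illustration: the 97.3% above comes from the rating list itself and may have been computed differently, and the function name `los` and the 60/40 counts are hypothetical.

```python
import math

def los(wins: int, losses: int) -> float:
    """Approximate likelihood of superiority from decisive games only.

    Uses the normal approximation LOS = Phi((W - L) / sqrt(W + L)).
    Draws are ignored on purpose.
    """
    decisive = wins + losses
    if decisive == 0:
        return 0.5  # no decisive games, no evidence either way
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * decisive)))

# Hypothetical counts, not the actual Colossus results:
print(f"{los(60, 40):.3f}")  # 60 wins, 40 losses -> 0.977
```

The point of the sketch is that a fairly small surplus of wins over losses is already enough to push LOS above 97%, which is why a couple of hundred games can give such a figure.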

All results of Colossus 2007a to date.

Comparison of 3 Colossus versions we tested

What makes me happier personally is that the new version does not crash when accessing tablebases on my Vista machine, as 2006f did. :-)

Best,
Kirill
Uri Blass
Posts: 10783
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Colossus 2007a - early impression

Post by Uri Blass »

Kirill Kryukov wrote:I did a quick test of Colossus 2007a under CCRL 40/4 conditions. The rating is 2686 Elo points after 224 games. This makes it the #16 engine in the CCRL 40/4 Free Single-CPU list (my version, which includes only stable public releases with default settings).

It's a bit early to draw conclusions; the ranking may change after more games (which are running right now). Still, I hope this improvement holds. :-)

Those few games were enough to get 97.3% LOS (Likelihood of Superiority) over the previous version, 2006f, which is rated 2644 (a 42-point difference).

All results of Colossus 2007a to date.

Comparison of 3 Colossus versions we tested

What makes me happier personally is that the new version does not crash when accessing tablebases on my Vista machine, as 2006f did. :-)

Best,
Kirill
Note that Movei personality 10 10 10, which is free and can be used by everyone, is not in the list (only in the complete list), in spite of the fact that the tests suggest it is better than the default (there are not enough games to know, but other tests that I did also support it).

Movei 0.08.403(10 10 10) 2646 +27 −27 46.6% +22.0 25.2% 476
50.9%
Movei 0.08.403 2635 +20 −20 52.7% −22.6 30.1% 860
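
A rough way to check how much two such rows really say about each other is to turn the ratings and error margins into a probability that the first entry is the stronger one. The sketch below assumes the +/- columns are approximately 95% intervals (1.96 standard deviations) and that the two ratings are independent; neither is stated in the list, and the function name is made up for illustration.

```python
import math

Z_95 = 1.96  # assumption: the +/- margins are roughly 95% intervals

def prob_stronger(rating_a, margin_a, rating_b, margin_b):
    """Probability that A is truly stronger than B (normal approximation)."""
    sigma_a = margin_a / Z_95
    sigma_b = margin_b / Z_95
    # Standard deviation of the rating difference, assuming independence.
    sigma_diff = math.hypot(sigma_a, sigma_b)
    z = (rating_a - rating_b) / sigma_diff
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Ratings and margins taken from the two rows above.
p = prob_stronger(2646, 27, 2635, 20)
print(f"P(10 10 10 stronger than default) ~ {p:.2f}")
```

Under those assumptions the 11-point gap is well inside the combined error margin, which is in line with "not enough games to know".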

Uri
Kirill Kryukov
Posts: 518
Joined: Sun Mar 19, 2006 4:12 am
Full name: Kirill Kryukov

Re: Colossus 2007a - early impression

Post by Kirill Kryukov »

Uri Blass wrote:Note that Movei personality 10 10 10, which is free and can be used by everyone, is not in the list (only in the complete list), in spite of the fact that the tests suggest it is better than the default (there are not enough games to know, but other tests that I did also support it).

Movei 0.08.403(10 10 10) 2646 +27 −27 46.6% +22.0 25.2% 476
50.9%
Movei 0.08.403 2635 +20 −20 52.7% −22.6 30.1% 860

Uri
Sorry, Uri, I don't test settings. At least at the moment, I can't spend CPU time on settings when so many nice engines are still almost untested by us.

I realize that it is difficult for an author to know the best setting when he releases an engine. That's why authors add many configurable options to their engines, hoping that someone will test them and discover a killer setting. (This happened with Chessmaster, where the default setting is known to be weaker than many custom settings.)

If an engine has not been updated for a long time (say, 1 year), it may be OK to try a setting; otherwise testing settings is just unfair to other engines and authors. To me, testing settings equals joining the author's testing team and contributing to the development of the engine itself. While that is surely a nice thing to do, I can't spare CPU time for it at the moment.

What if tomorrow someone comes with results showing that Movei personality 8 8 8 is stronger still? Do I have to re-test it again? There are too many engines in existence to test settings. Personally, I don't even test new versions if they are released too frequently.

Another point: suppose there are 10 settings, say 1 10 10, 2 10 10, 3 10 10, etc. And suppose they are all actually of exactly the same strength (suppose the setting has no actual effect due to a bug, for example). What happens when we test them all? They don't all get the same rating. The ratings will be slightly different due to statistical error. And then the highest-rated setting will represent that engine on the rating list. This means that even simply entering 10 copies of the same engine creates an unfair advantage for that engine. Maybe 10 10 10 is stronger than the default, maybe it is not. But even if it is equal, its mere presence on the list creates an advantage for Movei.
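
The effect described above is easy to reproduce with a small simulation. This is only a sketch under simplifying assumptions that are mine, not part of the CCRL setup: every copy has a true score of exactly 50% against its opponents, there are no draws, each copy plays 500 games, and the score is converted to a rating difference with the standard logistic Elo formula. All function names are hypothetical.

```python
import math
import random

def elo_from_score(score: float) -> float:
    """Rating difference implied by a score fraction (logistic Elo model)."""
    score = min(max(score, 1e-6), 1.0 - 1e-6)  # avoid log of 0 or division by 0
    return 400.0 * math.log10(score / (1.0 - score))

def measured_elo(games: int, rng: random.Random) -> float:
    """Measured rating of an engine whose true score is exactly 50% (no draws)."""
    wins = sum(1 for _ in range(games) if rng.random() < 0.5)
    return elo_from_score(wins / games)

def simulate(settings: int = 10, games: int = 500, trials: int = 200, seed: int = 1):
    """Average measured rating of one copy vs. the best of `settings` identical copies."""
    rng = random.Random(seed)
    single_total = 0.0
    best_total = 0.0
    for _ in range(trials):
        ratings = [measured_elo(games, rng) for _ in range(settings)]
        single_total += ratings[0]   # what one entry would show on the list
        best_total += max(ratings)   # what the highest-rated setting would show
    return single_total / trials, best_total / trials

mean_single, mean_best = simulate()
print(f"one copy:        {mean_single:+.1f} Elo on average")
print(f"best of 10 copies: {mean_best:+.1f} Elo on average")
```

With these numbers a single copy averages close to 0 Elo, while the best of the 10 identical copies comes out noticeably above 0, purely from statistical noise.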

This is just my personal view; we also have members who test a lot of settings, as you can see in the list.

I think Movei 00.8.403 is old enough now, so if you release any new version (with whatever default setting you think is best), I will be happy to do some serious testing for it. :-)

Best,
Kirill
Uri Blass
Posts: 10783
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Colossus 2007a - early impression

Post by Uri Blass »

Kirill Kryukov wrote:
Uri Blass wrote:Note that Movei personality 10 10 10, which is free and can be used by everyone, is not in the list (only in the complete list), in spite of the fact that the tests suggest it is better than the default (there are not enough games to know, but other tests that I did also support it).

Movei 0.08.403(10 10 10) 2646 +27 −27 46.6% +22.0 25.2% 476
50.9%
Movei 0.08.403 2635 +20 −20 52.7% −22.6 30.1% 860

Uri
Sorry, Uri, I don't test settings. At least at the moment, I can't spend CPU time on settings when so many nice engines are still almost untested by us.

I realize that it is difficult for an author to know the best setting when he releases an engine. That's why authors add many configurable options to their engines, hoping that someone will test them and discover a killer setting. (This happened with Chessmaster, where the default setting is known to be weaker than many custom settings.)

If an engine has not been updated for a long time (say, 1 year), it may be OK to try a setting; otherwise testing settings is just unfair to other engines and authors. To me, testing settings equals joining the author's testing team and contributing to the development of the engine itself. While that is surely a nice thing to do, I can't spare CPU time for it at the moment.

What if tomorrow someone comes with results showing that Movei personality 8 8 8 is stronger still? Do I have to re-test it again? There are too many engines in existence to test settings. Personally, I don't even test new versions if they are released too frequently.

Another point: suppose there are 10 settings, say 1 10 10, 2 10 10, 3 10 10, etc. And suppose they are all actually of exactly the same strength (suppose the setting has no actual effect due to a bug, for example). What happens when we test them all? They don't all get the same rating. The ratings will be slightly different due to statistical error. And then the highest-rated setting will represent that engine on the rating list. This means that even simply entering 10 copies of the same engine creates an unfair advantage for that engine. Maybe 10 10 10 is stronger than the default, maybe it is not. But even if it is equal, its mere presence on the list creates an advantage for Movei.

This is just my personal view; we also have members who test a lot of settings, as you can see in the list.

I think Movei 00.8.403 is old enough now, so if you release any new version (with whatever default setting you think is best), I will be happy to do some serious testing for it. :-)

Best,
Kirill
My point is the following:
Suppose that I release a new version of Movei and it is 100 Elo better than the default but only 70 Elo better than some personality XXX.

Is it fair to say that I made a 100 Elo improvement?
I think that it is not fair, because the programming improvement that I made over the previous version is only 70 Elo.
The other 30 Elo simply came from a better personality.

Second comment:
I think that newer versions of Movei will be either private or commercial.
I may send them to the CEGT or CCRL testers to test before release, if they are interested, but I will probably never release a newer version of Movei as a free program.

There are commercial programs weaker than Movei (chess alex), and I do not see a single reason why new versions of Movei need to be free.

I guess that with correct use of hash Movei could be better than Ruffian 2.1.0, but it is only a guess.

Movei is probably the worst program at using hash tables.

Uri
Uri Blass
Posts: 10783
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Colossus 2007a - early impression

Post by Uri Blass »

I can add that the real testing that I do with Movei is with a newer version, not with 403.

I simply think that, when measuring the improvement of a new version relative to 403, it is unfair to compare with the default version when I know of a better one.

Uri
Kirill Kryukov
Posts: 518
Joined: Sun Mar 19, 2006 4:12 am
Full name: Kirill Kryukov

Re: Colossus 2007a - early impression

Post by Kirill Kryukov »

Uri Blass wrote:My point is the following:
Suppose that I release a new version of Movei and it is 100 Elo better than the default but only 70 Elo better than some personality XXX.

Is it fair to say that I made a 100 Elo improvement?
I think that it is not fair, because the programming improvement that I made over the previous version is only 70 Elo.
The other 30 Elo simply came from a better personality.
I think that engine development consists of both coding and tuning. It's not obvious to me that tuning is the minor part of the work. The fact that you were able to do the coding, but were unable to find the 10 10 10 personality at the time of release, suggests that tuning is actually the more difficult part.

So I think yes, it is fair to say that you made a 100 Elo point improvement. You did some more coding and, in parallel, you found (with the help of testers) a better setting. Both contribute to the development, I think.

The fact that Movei 00.8.403 can also be used with 10 10 10 should be considered separately. While you made an improvement to the next version (by coding and tuning), the community made an improvement to 00.8.403 by tuning and testing. So you made a 100 Elo improvement compared to your previous release, but only 70 (in your example) compared to the best discovered setting of the previous release. Complex, but this is just how it is.

BTW, if someone does not want community tuning to get in the way of obtaining a 100 Elo increase with the next version, he simply does not put any configurable options in the engine. :-)
Uri Blass wrote:Second comment:
I think that newer versions of Movei will be either private or commercial.
I may send them to the CEGT or CCRL testers to test before release, if they are interested, but I will probably never release a newer version of Movei as a free program.
Oh! OK, good luck on the commercial route then! It means 00.8.403 is the last free version. In that case I will of course test the best known setting. It will just have to wait a little.
Uri Blass wrote:There are commercial programs weaker than Movei (chess alex), and I do not see a single reason why new versions of Movei need to be free.

I guess that with correct use of hash Movei could be better than Ruffian 2.1.0, but it is only a guess.

Movei is probably the worst program at using hash tables.

Uri
I think someone should release a commercial Micro-Max or something... Then everyone looking for an excuse to go commercial will have one, and only those truly enjoying it as a hobby and for communication will stay free.

It is interesting how many engines in the CCRL Free lists no longer get updates because of going commercial: Rybka 1.0 Beta 64-bit, Naum 2.0, Zappa 1.1 64-bit, List 5.12, Ruffian 1.0.5, Smarthink 0.17a, Fritz 6 Light (although for a different reason here). Now Movei is going to join them.

It is interesting to see how far an engine can go on pure enthusiasm.

Best wishes,
Kirill
Tony Thomas

Re: Colossus 2007a - early impression

Post by Tony Thomas »

Kirill Kryukov wrote:It is interesting to see how far an engine can go on pure enthusiasm.

Best wishes,
Kirill
Keep watching Glaurung, I think that is the only engine that will never go commercial.
Uri Blass
Posts: 10783
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Colossus 2007a - early impression

Post by Uri Blass »

Tony Thomas wrote:
Kirill Kryukov wrote:It is interesting to see how far an engine can go on pure enthusiasm.

Best wishes,
Kirill
Keep watching Glaurung, I think that is the only engine that will never go commercial.
Only engine?

I do not expect Crafty to be commercial and I am not sure Glaurung will never go commercial.

If Glaurung becomes the strongest free engine then it is clearly possible that somebody may sell illegal clones of it.

It already happened with Fruit, when it was discovered that a commercial program was a clone of Fruit and Leo took it out of WBEC.

Uri
Tord Romstad
Posts: 1808
Joined: Wed Mar 08, 2006 9:19 pm
Location: Oslo, Norway

Re: Colossus 2007a - early impression

Post by Tord Romstad »

Uri Blass wrote:I do not expect Crafty to be commercial
Neither do I. There are also lots of other programs which will never be commercial, of course.
and I am not sure Glaurung will never go commercial.
I will not release a commercial Glaurung, but I have neither the right nor the desire to prevent other people from doing so.
If Glaurung becomes the strongest free engine then it is clearly possible that somebody may sell illegal clones of it.
A more interesting possibility is that somebody might some day start selling legal clones of Glaurung. Even today, there is nothing which stops you from selling Glaurung, apart from the fact that only very stupid or uninformed people would buy it when they could get the same thing for free.

More realistically, it would be possible to create a Chessmaster-like mass-market chess program with Glaurung as the chess engine. This would be perfectly legal, and if done well, it should be possible to sell a lot of copies (as proved by Chessmaster).

Tord
tiger
Posts: 819
Joined: Sat Mar 11, 2006 3:15 am
Location: Guadeloupe (french caribbean island)

Re: Colossus 2007a - early impression

Post by tiger »

Uri Blass wrote:
Kirill Kryukov wrote:I did a quick test of Colossus 2007a under CCRL 40/4 conditions. The rating is 2686 Elo points after 224 games. This makes it the #16 engine in the CCRL 40/4 Free Single-CPU list (my version, which includes only stable public releases with default settings).

It's a bit early to draw conclusions; the ranking may change after more games (which are running right now). Still, I hope this improvement holds. :-)

Those few games were enough to get 97.3% LOS (Likelihood of Superiority) over the previous version, 2006f, which is rated 2644 (a 42-point difference).

All results of Colossus 2007a to date.

Comparison of 3 Colossus versions we tested

What makes me happier personally is that the new version does not crash when accessing tablebases on my Vista machine, as 2006f did. :-)

Best,
Kirill
Note that Movei personality 10 10 10, which is free and can be used by everyone, is not in the list (only in the complete list), in spite of the fact that the tests suggest it is better than the default (there are not enough games to know, but other tests that I did also support it).

Movei 0.08.403(10 10 10) 2646 +27 −27 46.6% +22.0 25.2% 476
50.9%
Movei 0.08.403 2635 +20 −20 52.7% −22.6 30.1% 860

Uri

Uri, I seem to remember that this "10 10 10" stuff is somewhat related to "progress". Can you explain the concept? I have been playing with what I believe is a similar concept and I wanted to know about yours.


// Christophe