CEGT - rating lists February 17th 2008

Werner
Posts: 2993
Joined: Wed Mar 08, 2006 10:09 pm
Location: Germany
Full name: Werner Schüle

Post by Werner »

Hi all :-),

our updated rating lists are online and can be found under the attached links.

40 / 120:
We have an interim result from our 2nd CEGT Quad Marathon Championship 40/400 in our forum:
CEGT Quad Extreme 40/400 repeated 2008

1   Rybka 2.3.2a X64 4CPU     1½½0½½½½½½½11½½½1½1½½10½11½110  18.5/30
2   Zappa Mexico II X64 4CPU  0½½1½½½½½½½00½½½0½0½½01½00½001  11.5/30

And of course we have also started testing Naum 3 for our 40/120 quad list. Interim results are in our forum. So far only 60 games have been played:
Rybka 2.3.2a 64 4CPU - Naum 3 x64 4CPU 12 - 8
Naum 3 x64 4CPU - Zappa Mexico X64 4CPU 12.5 - 7.5
Deep Shredder 11 x64 4CPU - Naum 3 x64 4CPU 11.5 - 8.5

We had similar results for our 40/120 list.
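
For readers who want to turn match scores like the ones above into an approximate rating gap, here is a minimal sketch. It assumes the standard logistic Elo formula; the function names are made up for this illustration, and this is not necessarily how the CEGT lists themselves are calculated.

```python
import math


def score_from_results(results: str) -> float:
    """Add up a result string made of '1' (win), '½' (draw) and '0' (loss)."""
    points = {"1": 1.0, "½": 0.5, "0": 0.0}
    return sum(points[ch] for ch in results)


def elo_difference(score: float, games: int) -> float:
    """Elo gap implied by a score, assuming the logistic Elo model."""
    p = score / games
    if not 0.0 < p < 1.0:
        raise ValueError("score fraction must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / p - 1.0)


# Rybka's result string from the 40/400 interim table in this post.
rybka_results = "1½½0½½½½½½½11½½½1½1½½10½11½110"
rybka_score = score_from_results(rybka_results)
rybka_games = len(rybka_results)

if __name__ == "__main__":
    gap = elo_difference(rybka_score, rybka_games)
    print(f"{rybka_score}/{rybka_games} -> about {gap:+.0f} Elo for this match")
```

With only 30 or 20 games per match, such a figure is just a point estimate with a wide margin of error.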

40 / 20:
This week we added more than 1800 games to our list. See more in our list "Games of the week". In total our 40/20 list is now based on 222,400 games! Most of the new games were played with the new Naum 3 engine, of course :)

New engines:
The main entry this week is Naum 3 x64 2CPU. There has been a lot of discussion on the net. After 1039 games we have a fantastic rating of 2960, nearly 80 more than for 2.2! This puts it in 2nd place, just in front of Zappa II at the moment, and in 5th place in our list of best MP versions!
Congrats to Alex!! We also have a very good first result for Naum 3 on 4 CPUs, and we will see next week whether it holds up.

We will not forget to mention 2 other new entries in our list:
Frenzee Feb08 x64 and Rotor 0.2. Both engines need many more games for a reliable rating.

Updated engines:
We have updated results for E.T.Chess 13/01/2008 and Romichess P3K, which after 246 games is now very close to the strongest Romichess engine in our list: NG5. I will run some more games this week to get the engine over 300 games.

40 / 4:
Our blitz list was not updated this week. The main activity, of course, has been with Naum 3. We already posted the interim results:

Naum 3 x64 2CPU +69 (910 games)
Naum 3 w32 1CPU +67 (860 games)
See more here:
http://husvankempen.de/nunn/phpBB2/viewtopic.php?t=927

A big "Thank you" to all testers as usual! :)

40/20: http://www.husvankempen.de/nunn/rating.htm
Blitz: http://www.husvankempen.de/nunn/blitz.htm
40/120: http://www.husvankempen.de/nunn/rating120.htm
Tester: http://www.husvankempen.de/nunn/testers/testers.htm
Games of the week: http://www.husvankempen.de/nunn/40_40%2 ... on/gow.JPG
Elo-comparison: http://www.husvankempen.de/nunn/Replay/ ... arison.htm

Werner
CEGT Team
Heinz Van Kempen

Re: CEGT - rating lists February 17th 2008

Post by Heinz Van Kempen »

Hi Werner :) ,

nice report and an almost unbelievable amount of games in CEGT 40/20 and Blitz. Thanks to all testers.

So in the 40/120 list with multiple CPUs Naum is now number two, if we count only one version per engine and compare only 2 CPU versions with each other, not 2 CPU against 4 CPU, where the Naum tests have now also started.

With an improvement of 76 points, and taking the error bars into account, this is very close to Alex's prediction, and the usual quick sharp shooters after few games were wrong again. 76 points in such a short time at the level already reached is just fantastic.

In CEGT 40/120 good results for Naum 4 CPU continue and matches against Rybka and Zappa are much better than with Naum 2.2. See also here:

http://husvankempen.de/nunn/phpBB2/viewtopic.php?t=935

It looks like Naum 3 will soon replace Zappa Mexico II in the marathon matches 40/400. The match against Zappa should be finished in a bit more than one week.
geots
Posts: 4790
Joined: Sat Mar 11, 2006 12:42 am

Re: CEGT - rating lists February 17th 2008

Post by geots »

"The usual quick sharpshooters" Ahem... I don't suppose you would care to elaborate and be a little more specific?
Spock

Re: CEGT - rating lists February 17th 2008

Post by Spock »

Heinz Van Kempen wrote: With an improvement of 76 points and regarding the error bars this is very close to Alex prediction and the usual quick sharp shooters after few games were wrong again.
Well, that is a matter of interpretation. Whilst I agree completely about not jumping to conclusions about early results with few games, 76 is certainly not close to 100, and I would say the early doubters were more right than wrong. 76 is as close to 52 as it is to 100!!

Nevertheless, I'm delighted with this engine whether it be +50 or +100 ELO or anywhere in between. Under certain test conditions it may well reach +100 ELO, perhaps in your 40/120 on quad. On the other hand, as my FRC testing with 1,000 games under very tightly controlled and systematic conditions showed, as "little" as +60 ELO is also possible. So there will be a range, dependent on the number of CPUs, time control etc.
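
To put "error bars" into numbers, here is a rough sketch of how an approximate 95% interval for an Elo gap could be computed from win/draw/loss counts. It assumes the logistic Elo model and a normal approximation for the score fraction. The counts in the example are made up for illustration and are not CEGT data or my FRC results, and the function names are hypothetical.

```python
import math


def elo_from_score(p: float) -> float:
    """Elo gap for an expected score fraction p (logistic Elo model)."""
    return -400.0 * math.log10(1.0 / p - 1.0)


def elo_interval(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (low, mid, high) Elo gap using a normal approximation."""
    n = wins + draws + losses
    p = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - p) ** 2
           + draws * (0.5 - p) ** 2
           + losses * (0.0 - p) ** 2) / n
    se = math.sqrt(var / n)  # standard error of the score fraction
    low, high = p - z * se, p + z * se
    if low <= 0.0 or high >= 1.0:
        raise ValueError("too few games for this approximation")
    return elo_from_score(low), elo_from_score(p), elo_from_score(high)


if __name__ == "__main__":
    # Made-up counts: same kind of score, very different sample sizes.
    for w, d, l in [(12, 10, 8), (400, 350, 250)]:
        low, mid, high = elo_interval(w, d, l)
        print(f"{w + d + l:5d} games: {mid:+.0f} Elo ({low:+.0f} to {high:+.0f})")
```

The interval shrinks only with the square root of the number of games, which is why a few dozen games cannot separate +50 from +100.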
Heinz Van Kempen

Re: CEGT - rating lists February 17th 2008

Post by Heinz Van Kempen »

Hi Ray and George :) ,

Ray, you are correct: especially where Naum is concerned, everything depends on time control and the number of cores. I base this on your rating comparison with 1000 games and more for Naum 2.2, where Naum is the engine that gains most from more time:

http://www.husvankempen.de/nunn/Replay/ ... arison.htm

I do not expect Naum 3 to differ from Naum 2.2 here, as it seems to be more solid and defends better against attacks, but who knows.

The bad thing is that I need a bit more than 3 weeks to get enough games for the quad list. Where I would really expect a lot from Naum 3 is in our 40/400 repeated marathons.

No matter what the improvement turns out to be once we see all ratings at all time controls, people would have been happy with what we have now if Alex had not made his announcement. On the other hand, in the past authors or companies have announced huge improvements and in the end we got +10 ELO or nothing at all. It is just that since Rybka came out, expectations have been much too high, and maybe testers are too eager to see something that would be the toughest challenge for Rybka.

George, by the usual quick sharp shooters I mean, quite generally, what usually happens after each top release: people come and claim a lot of things only one day later, based either on Blitz or on a few games at a somewhat longer time control. In some forums these are always the same people, just watching some Blitz games on the playchess server. Knowing how weird statistics can be, chess fans should be fairer. Engine authors have a difficult enough task now keeping pace with Rybka anyway. In no way was I thinking of you personally. It is appreciated that you post results soon, and the wording of your postings shows that you are aware of the limited significance of a single match with few games.
geots

Re: CEGT - rating lists February 17th 2008

Post by geots »

Well said, Heinz. And you are correct - the jury is still out on this version of Naum. I haven't a clue what Alex based his "approx. 100 elo gain" on. I just have no way of knowing, but I know enough about the man to know he is nobody's fool. And he is plenty smart enough to know that giving that evaluation in an advertisement could really come back to haunt him if he were extremely far off. So in the end I really don't expect it will be an overall "15 or 20 elo increase". Much more than that. How much? My guess when the dust settles is somewhere around 65 elo, which I consider one hell of a success. I have seen programs that cost more, where a new version gave us no more than 15 to 18 elo. Granted, such cases were not advertised as 100 elo gains, but you know too that we were led by wording to believe great things of certain programs, and were let down badly. I really don't think so in this case. Possibly in hindsight I might not have thrown around the phrase "100 elo" - but I'm still proud to own the program and have no regrets. I'm with you - let's give this deal a little more time instead of rushing to judgment. After all, my 30 games mean nothing more than just a start. I have been trying to tell people that from the beginning. In the end, do I think it's going to be stronger than Rybka? Of course not, IMO. But a damn fine engine at any rate.

Best Regards,

George


PS: It is nice to see that we can discuss matters like this together in a civil manner, as we are all working toward the same goal anyway. Discussing deals like this with you makes computer chess more enjoyable.
geots

Re: CEGT - rating lists February 17th 2008

Post by geots »

One other thing, Heinz. So often we see a program tie for or win a highly advertised tournament or match. Then later, after said program is sold, your group and our group test it, and it doesn't do nearly as well - to the point that it is often described as a "letdown". The first excuse that is always brought up is hardware difference - hardware difference. While that matters, IMO it is at best only the 3rd reason in terms of importance. Much more important is the short length of these matches and tournaments, coupled with the fact that when your group and ours test said engine, the book-cookers and opening experts who prepare certain openings for certain engines in these deals are a non-factor in our tests, i.e., there is no human intervention. Their presence and the short length of the matches and tournaments are the main reasons we see so many engines sold that then fall short of expectations. My solution: take the human factor completely out of these deals and just let the programs have at each other, keeping in mind the shortness of these things also.

Best,

George