CEGT - rating lists March 16th 2008

Discussion of computer chess matches and engine tournaments.

Moderator: Ras

Werner
Posts: 2993
Joined: Wed Mar 08, 2006 10:09 pm
Location: Germany
Full name: Werner Schüle

CEGT - rating lists March 16th 2008

Post by Werner »

Hi all :-),

our updated rating lists are now online and can be found at the links below.

40 / 120:
Our 40/120 Quad will be updated again next week. Interim results as usual in our forum:
http://husvankempen.de/nunn/phpBB2/viewforum.php?f=9

Here you will find more results from our CEGT Quad Marathon Championship 40/400. Interim standings from the 3rd match:

Code:

CEGT Quad Extreme 40/400 repeated  2008 
1   Rybka 2.3.2a X64 4CPU  1½½½½½½1½½½½½1½½½½½½½½½1½½½½½1 17.5/30 
2   Naum 3 X64 4CPU        0½½½½½½0½½½½½0½½½½½½½½½0½½½½½0 12.5/30
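As a side note, a match score like the 17.5/30 above maps to an Elo difference through the standard logistic rating model; a minimal sketch (the helper name `elo_diff` is mine):

```python
import math

def elo_diff(score, games):
    """Elo difference implied by a match score, via the logistic model."""
    s = score / games  # fractional score, must be strictly between 0 and 1
    return -400 * math.log10(1 / s - 1)

# Rybka's 17.5/30 against Naum corresponds to roughly +58 Elo.
print(round(elo_diff(17.5, 30)))
```

With only 30 games, the error margin on such an estimate is of course large.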

You will also find first results with Fruit 2.4 Beta A 4CPU. More results with this private version from Ryan are in the other lists!

40 / 20:
This week we added nearly 2,600 games to our list; see more in our "Games of the week" list. In total, our 40/20 list is now based on 230,704 games!

New engines:
New in our list are Fruit 2.4 Beta, (Passive)Thinker 5.1c and Homer 2.01pre3.

Fruit 2.4 Beta A x64 2CPU reached 2908 Elo after 702 games. That is 5th place among the 2CPU engines, behind Deep Shredder 11. The engine is now 64-bit and SMP as well! From the start position, the single-CPU 64-bit engine searched 1.14 times as many kN/s as the 32-bit version. The effectiveness of SMP was not so easy to measure; a report comes next Sunday. Here are the different Fruit versions in our list:

Code:

no Program Elo + - Games
19 Fruit 2.3.4n w32 2CPU 2955 67 67 71  
30 Fruit 2.4 Beta A w32 2CPU 2919 51 51 100 
35 Fruit 2.4 Beta A x64 2CPU 2908 21 21 702 
49 Fruit 2.4 Beta B x64 2CPU 2878 85 85 50 
52 Fruit 2.4 Beta A w32 1CPU 2875 60 60 63 
67 Fruit 2.4 Beta B w32 2CPU 2846 70 70 50 
71 Fruit 2.3.3f Beta 2839 14 14 1546
Another big surprise was the result of (Passive)Thinker 5.1c 64-bit: with 2775 Elo after 401 games, this engine is 66 points ahead of the (Active) version!

We have started testing Homer 2.01pre3, the version which plays very strongly in Leo´s tournament, but its starting rating here is not so good. I also tried some matches under Winboard, but the results were no better than those I had with the same starting positions under Arena!

Updated engines:
We have updated results for some engines that we first tested last week:

Code:

3 Zappa Mexico II x64 4CPU 3022 +19 -19 743 games (+11)
45 Naum 3.0 w32 1CPU 2886 +21 -21 663 games (-1)
70 Zappa Mexico II w32 1CPU 2842 +22 -22 576 games (-6)
125 SmarThink 1.10 Moscow 2764 +20 -20 754 games (-16)
Our best MP versions list now looks like this:

Code:

1 Rybka 2.3.2a x64 4CPU 3084 
2 Rybka 2.3.2a x64 2CPU WM-2007 3052 
3 Zappa Mexico II x64 4CPU 3022  
4 Zappa Mexico x64 4CPU 3001  
5 Naum 3.0 x64 4CPU 3001 
6 Deep Shredder 11 x64 4CPU 2963  
7 Naum 3.0 x64 2CPU 2962  
8 Zappa Mexico II x64 2CPU 2958  
9 Deep Shredder 11 x64 2CPU 2939  
10 Deep Fritz 10.1 4CPU 2931  
11 Hiarcs 11.2 4CPU 2925  
12 Toga II 1.4 Beta5c 4CPU 2911  
13 Fruit 2.4 Beta A x64 2CPU 2908 

40 / 4:
Our blitz list was updated yesterday. We played a lot of blitz games with the different new Fruit versions to decide which will be included in the 40/20 and 40/120 lists. Here is a list of the results:

Code:

no Program Elo + - Games 
21 Fruit 2.4 Beta A x64 4CPU 2959 18 18 950 
32 Fruit 2.4 Beta B x64 4CPU 2941 26 26 450 
39 Fruit 2.4 Beta B x64 2CPU 2922 33 33 300 
44 Fruit 2.4 Beta A x64 2CPU 2918 20 20 720 
71 Fruit 2.3.3f Test Beta 2857 10 10 3100 
79 Fruit 2.4 Beta A w32 1CPU 2851 23 23 550 
93 Fruit 051103 2827 13 13 1696 
96 Fruit 061115b 2824 17 17 1020 
97 Fruit 2.4 Beta B w32 1CPU 2823 32 32 300

Other highlights are the results of SmarThink 1.10 Moscow and Fritz 11.1, which is 20 points ahead of the regular version!

A big "Thank you" to all testers as usual! :)

40/20: http://www.husvankempen.de/nunn/rating.htm
Blitz: http://www.husvankempen.de/nunn/blitz.htm
40/120: http://www.husvankempen.de/nunn/rating120.htm
Tester: http://www.husvankempen.de/nunn/testers/testers.htm
Games of the week: http://www.husvankempen.de/nunn/40_40%2 ... on/gow.JPG
Elo-comparison: http://www.husvankempen.de/nunn/Replay/ ... arison.htm

Werner
CEGT Team
Werner

Re: SMP effectivity from Fruit

Post by Werner »

Hi,
I took 10 positions from the WM-Test and got the following results:
1->2CPU: 142% and
2->4CPU: 146% average
Werner
Tony Thomas

Re: SMP effectivity from Fruit

Post by Tony Thomas »

That's not really the best scaling as far as I am aware, but then again it's a beta. Also, I am glad I wasn't so wrong about Thinker passive...
F. Bluemers
Posts: 880
Joined: Thu Mar 09, 2006 11:21 pm
Location: Nederland

Re: CEGT - rating lists March 16th 2008

Post by F. Bluemers »

We have started testing Homer 2.01pre3, the version which plays very strongly in Leo´s tournament, but its starting rating here is not so good. I also tried some matches under Winboard, but the results were no better than those I had with the same starting positions under Arena!
I'm sure Daniel will love this.
As far as I remember, Homer was tested mainly in UCI mode and with learning disabled.

Best
Fonzy
User avatar
Daniel Mehrmann
Posts: 858
Joined: Wed Mar 08, 2006 9:24 pm
Location: Germany
Full name: Daniel Mehrmann

Re: CEGT - rating lists March 16th 2008

Post by Daniel Mehrmann »

F. Bluemers wrote:
We have started testing Homer 2.01pre3, the version which plays very strongly in Leo´s tournament, but its starting rating here is not so good. I also tried some matches under Winboard, but the results were no better than those I had with the same starting positions under Arena!
I'm sure Daniel will love this.
As far as I remember, Homer was tested mainly in UCI mode and with learning disabled.

Best
Fonzy
Thanks Fonzy !

Yeah, Homer should be used with UCI and the best settings are:

(The homer.ini and UCI settings should match if possible. For example, learning should be disabled in homer.ini and also disabled in the UCI option.)

homer.ini:

Code:

Resign = off 
*
* Resign score (-1000 up to -400)
*
Resign score = -600 
*
* Position learning on/off
*
Learning = off
*
* PlayStyle = YourStyleFile.ini
*
PlayStyle = default.sty
default.sty:

Code:

*
* Default Style File 
* Skeleton for other style files
*
* Naming of the Style
*
Name = default
*
* KingSafty [80-120]%
*
KingSafty = 100
*
* Mobility [80-120]%
*
Mobility = 100
*
* PawnStructure [80-120]%
*
PawnStructure = 100
*
* Search = [Intelligent|Positional|Tactical]
*
Search = Intelligent
*
* PlayStyle = [Intelligent|Gambit|Normal]
*
PlayStyle = Intelligent
Best,
Daniel
Tony Thomas

Re: CEGT - rating lists March 16th 2008

Post by Tony Thomas »

Any chance of a public release? Homer, like the previous Zappa (prior to Zappa Mexico II), struggles in blitz. It could be because you did not care about being good at fast time controls, or because Homer's algorithms are ultra-optimized for tournament time controls. Anthony was able to get his engine on par with the other major opponents after he spent a few days optimizing; I hope you will do the same. :wink:
Werner

Re: CEGT - rating lists March 16th 2008

Post by Werner »

Hi Daniel,
I am using Homer in UCI mode and, I am glad to see, with the same settings you prefer:
Learning is off (with all engines I test) - learning on would make no sense when using a set of opening positions.
Normally I use the same book for both engines, but for my first games with the new Homer I used a set of opening positions to see the differences between Winboard and Arena more clearly. The next games are now under Arena and with the same opening book.
I am not sure, but I think Leo uses a UCI-to-WB converter when he is testing a UCI engine, and he uses engine books and ponder=on. So the conditions in his tournaments are very different.
Next Sunday we will have more games and smaller error margins.
Werner
Tony Thomas

Re: CEGT - rating lists March 16th 2008

Post by Tony Thomas »

Homer also supports WB, so Leo doesn't have to use an adapter. Also, Leo plays with ponder on, which is a decisive factor.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: SMP effectivity from Fruit

Post by bob »

Werner wrote:Hi,
I took 10 positions from WM-Test and had following results:
1->2CPU: 142% and
2->4CPU: 146% average
What does this mean?

For example, if it takes 2 minutes to solve N positions, and it takes 6 minutes to solve the other N, then a 2-minute search with 2x the processors is going to have no effect on the results.

The right way to measure SMP performance is to take a large set of positions, and search with 1, 2, 4, ..., N processors, and measure the time required to reach a specific depth for each position.

A _much_ less accurate way is to play a one cpu program against a bunch of opponents, then keep everything the same but use two cpus for that program and see if it performs better. It is much less accurate because the variance/randomness in a basic game of chess is extremely high. And when you factor in the randomness produced by a parallel search, it goes even higher. You need thousands of games (at a bare minimum) to get a reasonable estimate on improvement. Most can't pull that off.
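The time-to-depth method described above can be sketched as follows; the timings are made-up placeholders, not measurements from Fruit:

```python
def speedups(times_1cpu, times_ncpu):
    """Per-position speedup: time to a fixed depth with 1 CPU
    divided by time to the same depth with N CPUs."""
    return [t1 / tn for t1, tn in zip(times_1cpu, times_ncpu)]

def mean(xs):
    return sum(xs) / len(xs)

# Hypothetical seconds to reach the same fixed depth on each position:
t1 = [120.0, 95.0, 210.0, 60.0]  # 1 CPU
t4 = [40.0, 31.0, 72.0, 21.0]    # 4 CPUs

per_position = speedups(t1, t4)
print(f"average speedup: {mean(per_position):.2f}x")
```

Averaging per-position speedups keeps a few easy positions from dominating the result, which is what happens if you just divide total time by total time.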
Werner

Re: SMP effectivity from Fruit

Post by Werner »

bob wrote:The right way to measure SMP performance is to take a large set of positions, and search with 1, 2, 4, ..., N processors, and measure the time required to reach a specific depth for each position.
Hi Bob,
thanks for the answer. I took 10 positions for that test - so you think this gives too great an error margin? How large must the set be for an error margin of around 10%?

regards
Werner
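On the size question: assuming the per-position speedups are independent samples, the standard error of their mean shrinks as σ/√n, so halving the error margin takes four times as many positions. A rough sketch with invented numbers (not Fruit data; the target value is an arbitrary assumption):

```python
import math
import statistics

# Hypothetical per-position 1->2 CPU speedups, invented for illustration:
samples = [1.35, 1.48, 1.29, 1.55, 1.41, 1.38, 1.52, 1.33, 1.46, 1.40]

sd = statistics.stdev(samples)          # sample standard deviation
stderr = sd / math.sqrt(len(samples))   # standard error of the mean

# Positions needed to push the standard error below a chosen target:
target = 0.02                           # in speedup units (assumption)
n_needed = math.ceil((sd / target) ** 2)

print(f"std-err with {len(samples)} positions: {stderr:.3f}")
print(f"positions needed for std-err <= {target}: {n_needed}")
```

The real scatter between positions can be much larger than these invented numbers, so the practical answer depends on measuring σ first from a pilot set.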