CM11 settings testing - Glorfindel now included

Graham Banks
Posts: 45304
Joined: Sun Feb 26, 2006 10:52 am
Location: Auckland, NZ

CM11 settings testing - Glorfindel now included

Post by Graham Banks »

The methodology will be:

• A core database of 3,300 games: 12 engines, each pair having played 50 games against each other.

• CM11 default plays all 12 engines and is loaded into the database, and its rating is calculated. It is then removed.

• Personality “A” plays all 12 engines and is loaded into the database, and its rating is calculated. It is then removed.

• Personality “B” plays all 12 engines and is loaded into the database, and its rating is calculated. It is then removed.

• Personality “C” plays all 12 engines and is loaded into the database, and its rating is calculated. It is then removed.

And so on for each further personality.

This seems to be a pretty scientific method of going about the testing. The 12 opponents are well rated themselves, and each personality plays this identical set of opponents. EloStat will be used because it is easy to run through the Arena interface. The time control is of necessity blitz: the CCRL testing conditions of 40 moves in 4 minutes, repeating.
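The game counts in the methodology can be checked with a short sketch. The function names here are illustrative only and are not part of Arena or EloStat; the sketch just assumes a full round robin among the 12 core engines and a 50-game match against each of them for every personality.

```python
from itertools import combinations

GAMES_PER_PAIRING = 50   # from the methodology above
NUM_CORE_ENGINES = 12    # from the methodology above


def core_database_games(num_engines, games_per_pairing):
    """Games in a round robin where every pair of engines plays a fixed-length match."""
    pairings = len(list(combinations(range(num_engines), 2)))
    return pairings * games_per_pairing


def gauntlet_games(num_opponents, games_per_pairing):
    """Games one personality plays when it meets every core engine once."""
    return num_opponents * games_per_pairing


core_games = core_database_games(NUM_CORE_ENGINES, GAMES_PER_PAIRING)
personality_games = gauntlet_games(NUM_CORE_ENGINES, GAMES_PER_PAIRING)

# Each rating run sees the core games plus one personality's games only,
# because the previous personality is removed before the next is loaded.
games_per_rating_run = core_games + personality_games

print(core_games, personality_games, games_per_rating_run)
```

With 12 engines there are 66 pairings, which gives the 3,300 core games, and each personality adds 600 games of its own for the duration of its rating run.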


Summary of CM11 Ratings so far
------------------------------

2782 CM11 Glorfindel
2778 CM11 Sauron
2767 CM11 Default
2767 CM11 Tomahawk
2763 CM11 Default Sel 21
2761 CM11 Silver Fern


Performances
------------


CM 11 Glorfindel          2782  600 (+183,=201,-216), 47.2 %

Chess Tiger 2007.1            :  50 (+ 13,= 21,- 16), 47.0 %
Delfi 5.2                     :  50 (+ 27,= 13,- 10), 67.0 %
Glaurung 2.0.1 64-bit         :  50 (+ 11,= 15,- 24), 37.0 %
Hiarcs 11.1                   :  50 (+  8,= 14,- 28), 30.0 %
Loop 13.6 32-bit              :  50 (+ 11,= 11,- 28), 33.0 %
Movei 00.8.438                :  50 (+ 15,= 22,- 13), 52.0 %
Scorpio 1.91                  :  50 (+ 19,= 16,- 15), 54.0 %
Slow Chess Blitz WV2.1        :  50 (+ 23,= 17,- 10), 63.0 %
Spike 1.2 Turin               :  50 (+ 11,= 24,- 15), 46.0 %
WildCat 7                     :  50 (+ 26,= 15,-  9), 67.0 %
Ktulu 8.0                     :  50 (+ 12,= 11,- 27), 35.0 %
Naum 2.2 32-bit               :  50 (+  7,= 22,- 21), 36.0 %



CM 11 Sauron              2778  600 (+183,=193,-224), 46.6 %

Chess Tiger 2007.1            :  50 (+ 10,= 16,- 24), 36.0 %
Delfi 5.2                     :  50 (+ 26,= 15,-  9), 67.0 %
Glaurung 2.0.1 64-bit         :  50 (+  9,= 18,- 23), 36.0 %
Hiarcs 11.1                   :  50 (+  5,= 14,- 31), 24.0 %
Loop 13.6 32-bit              :  50 (+  8,= 18,- 24), 34.0 %
Movei 00.8.438                :  50 (+ 25,= 15,- 10), 65.0 %
Scorpio 1.91                  :  50 (+ 23,= 16,- 11), 62.0 %
Slow Chess Blitz WV2.1        :  50 (+ 19,= 20,- 11), 58.0 %
Spike 1.2 Turin               :  50 (+ 12,= 11,- 27), 35.0 %
WildCat 7                     :  50 (+ 21,= 18,- 11), 60.0 %
Ktulu 8.0                     :  50 (+ 15,= 15,- 20), 45.0 %
Naum 2.2 32-bit               :  50 (+ 10,= 17,- 23), 37.0 %



CM 11 Default             2767  600 (+170,=198,-232), 44.8 %

Chess Tiger 2007.1            :  50 (+  7,= 21,- 22), 35.0 %
Delfi 5.2                     :  50 (+ 22,= 12,- 16), 56.0 %
Glaurung 2.0.1 64-bit         :  50 (+ 10,= 13,- 27), 33.0 %
Hiarcs 11.1                   :  50 (+  9,= 12,- 29), 30.0 %
Loop 13.6 32-bit              :  50 (+ 12,= 11,- 27), 35.0 %
Movei 00.8.438                :  50 (+ 22,= 14,- 14), 58.0 %
Scorpio 1.91                  :  50 (+ 19,= 20,- 11), 58.0 %
Slow Chess Blitz WV2.1        :  50 (+ 21,= 21,-  8), 63.0 %
Spike 1.2 Turin               :  50 (+  7,= 20,- 23), 34.0 %
WildCat 7                     :  50 (+ 18,= 20,- 12), 56.0 %
Ktulu 8.0                     :  50 (+ 12,= 17,- 21), 41.0 %
Naum 2.2 32-bit               :  50 (+ 11,= 17,- 22), 39.0 %



CM 11 Tomahawk            2767  600 (+177,=184,-239), 44.8 %

Chess Tiger 2007.1            :  50 (+ 10,= 23,- 17), 43.0 %
Delfi 5.2                     :  50 (+ 20,= 15,- 15), 55.0 %
Glaurung 2.0.1 64-bit         :  50 (+ 11,= 17,- 22), 39.0 %
Hiarcs 11.1                   :  50 (+  7,= 13,- 30), 27.0 %
Loop 13.6 32-bit              :  50 (+  8,= 11,- 31), 27.0 %
Movei 00.8.438                :  50 (+ 21,= 14,- 15), 56.0 %
Scorpio 1.91                  :  50 (+ 20,= 11,- 19), 51.0 %
Slow Chess Blitz WV2.1        :  50 (+ 24,= 16,- 10), 64.0 %
Spike 1.2 Turin               :  50 (+ 15,= 16,- 19), 46.0 %
WildCat 7                     :  50 (+ 21,= 16,- 13), 58.0 %
Ktulu 8.0                     :  50 (+ 14,= 16,- 20), 44.0 %
Naum 2.2 32-bit               :  50 (+  6,= 16,- 28), 28.0 %



CM11 Default Sel 21       2763  600 (+152,=228,-220), 44.3 %

Chess Tiger 2007.1            :  50 (+ 14,= 16,- 20), 44.0 %
Delfi 5.2                     :  50 (+ 19,= 18,- 13), 56.0 %
Glaurung 2.0.1 64-bit         :  50 (+ 13,= 16,- 21), 42.0 %
Hiarcs 11.1                   :  50 (+  3,= 18,- 29), 24.0 %
Loop 13.6 32-bit              :  50 (+  5,= 17,- 28), 27.0 %
Movei 00.8.438                :  50 (+ 18,= 19,- 13), 55.0 %
Scorpio 1.91                  :  50 (+ 14,= 24,- 12), 52.0 %
Slow Chess Blitz WV2.1        :  50 (+ 18,= 24,-  8), 60.0 %
Spike 1.2 Turin               :  50 (+ 13,= 19,- 18), 45.0 %
WildCat 7                     :  50 (+ 22,= 15,- 13), 59.0 %
Ktulu 8.0                     :  50 (+  8,= 18,- 24), 34.0 %
Naum 2.2 32-bit               :  50 (+  5,= 24,- 21), 34.0 %



CM 11 Silver Fern         2761  600 (+169,=189,-242), 43.9 %

Chess Tiger 2007.1            :  50 (+ 12,= 15,- 23), 39.0 %
Delfi 5.2                     :  50 (+ 18,= 24,-  8), 60.0 %
Glaurung 2.0.1 64-bit         :  50 (+ 10,= 16,- 24), 36.0 %
Hiarcs 11.1                   :  50 (+  6,= 17,- 27), 29.0 %
Loop 13.6 32-bit              :  50 (+  8,= 16,- 26), 32.0 %
Movei 00.8.438                :  50 (+ 16,= 14,- 20), 46.0 %
Scorpio 1.91                  :  50 (+ 22,= 12,- 16), 56.0 %
Slow Chess Blitz WV2.1        :  50 (+ 27,= 13,- 10), 67.0 %
Spike 1.2 Turin               :  50 (+ 11,= 15,- 24), 37.0 %
WildCat 7                     :  50 (+ 15,= 20,- 15), 50.0 %
Ktulu 8.0                     :  50 (+ 12,= 11,- 27), 35.0 %
Naum 2.2 32-bit               :  50 (+ 12,= 16,- 22), 40.0 %
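
The percentages in the tables above follow from the win/draw/loss counts, with a draw counted as half a point. The sketch below recomputes two of the overall scores and converts them to a rating difference using the standard logistic Elo expectation formula. That formula is an assumption made for illustration; it is not claimed to be what EloStat computes internally, so the numbers need not match the ratings listed above.

```python
import math


def score_fraction(wins, draws, losses):
    """Fraction of available points scored, counting each draw as half a point."""
    games = wins + draws + losses
    return (wins + 0.5 * draws) / games


def elo_difference(score):
    """Rating difference implied by a score under the logistic Elo model.

    Assumption: standard 400-point logistic curve; only valid for 0 < score < 1.
    """
    return 400.0 * math.log10(score / (1.0 - score))


# Overall results taken from the tables above.
glorfindel = score_fraction(183, 201, 216)
silver_fern = score_fraction(169, 189, 242)

for name, score in (("CM 11 Glorfindel", glorfindel), ("CM 11 Silver Fern", silver_fern)):
    print(f"{name}: {100 * score:.2f} % of points, "
          f"{elo_difference(score):+.1f} Elo relative to the opposition")
```

A negative difference simply reflects that both personalities scored below 50 % against this set of opponents.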
gbanksnz at gmail.com
Ovyron
Posts: 4562
Joined: Tue Jul 03, 2007 4:30 am

Re: CM11 settings testing - Glorfindel now included

Post by Ovyron »

Graham Banks wrote: This seems to be a pretty scientific method of going about the testing.
I think that a scientific method doesn't only need playing against the same opponents, but also playing the same openings.
Dr.Wael Deeb
Posts: 9773
Joined: Wed Mar 08, 2006 8:44 pm
Location: Amman,Jordan

Re: CM11 settings testing - Glorfindel now included

Post by Dr.Wael Deeb »

Ovyron wrote:
Graham Banks wrote: This seems to be a pretty scientific method of going about the testing.
I think that a scientific method doesn't only need playing against the same opponents, but also playing the same openings.
There is no need to play the same openings when there is enough variety of opponents. It's testing against the different playing styles that counts here.
Tony Thomas

Re: CM11 settings testing - Glorfindel now included

Post by Tony Thomas »

I do not understand the reasoning behind removing the games before each new personality is tested. True, keeping them would lower the rating of any engine that does badly against Chessmaster, but aren't we looking for a higher score?
Spock

Re: CM11 settings testing - Glorfindel now included

Post by Spock »

Tony Thomas wrote: I do not understand the reasoning behind removing the games before each new personality is tested. True, keeping them would lower the rating of any engine that does badly against Chessmaster, but aren't we looking for a higher score?
Simply because it takes away the risk of distortion in the ratings from having so many versions of the same engine in a single ratings database.
Tony Thomas

Re: CM11 settings testing - Glorfindel now included

Post by Tony Thomas »

Spock wrote:
Tony Thomas wrote: I do not understand the reasoning behind removing the games before each new personality is tested. True, keeping them would lower the rating of any engine that does badly against Chessmaster, but aren't we looking for a higher score?
Simply because it takes away the risk of distortion in the ratings from having so many versions of the same engine in a single ratings database.
So you won't be using these games for the 40/4 list? I did say in my post that the distortion would occur, but you are basically looking for the personality that gets the highest score in 600 games. I guess you must be using some kind of tool to remove the games, because it can be a pain to open the PGN in Word and remove them manually.
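
Removing one engine's games from a PGN file does not have to be done by hand. The following is a minimal sketch in plain Python, not the tool actually used in this testing. It assumes every game starts with an `[Event ...]` tag line and that the engine name in the `White` and `Black` tags matches exactly; `split_games`, `players` and `remove_player` are hypothetical helper names.

```python
import re

# Matches one PGN tag pair line such as: [White "CM11 Sauron"]
TAG_RE = re.compile(r'^\[(\w+)\s+"(.*)"\]\s*$')


def split_games(pgn_text):
    """Split PGN text into games, assuming each game begins with an [Event ...] line."""
    games, current = [], []
    for line in pgn_text.splitlines():
        if line.startswith("[Event ") and current:
            games.append(current)
            current = []
        current.append(line)
    games.append(current)
    texts = ["\n".join(game).strip() for game in games]
    return [text for text in texts if text]


def players(game_text):
    """Return the (White, Black) names read from a game's tag pairs."""
    tags = {}
    for line in game_text.splitlines():
        match = TAG_RE.match(line)
        if match:
            tags[match.group(1)] = match.group(2)
    return tags.get("White"), tags.get("Black")


def remove_player(pgn_text, name):
    """Return PGN text without any game in which `name` played either colour."""
    kept = [game for game in split_games(pgn_text) if name not in players(game)]
    return "\n\n".join(kept) + ("\n" if kept else "")


# Tiny made-up sample, for illustration only (not real games from this test).
sample = '''[Event "Test"]
[White "CM11 Sauron"]
[Black "Hiarcs 11.1"]
[Result "0-1"]

1. e4 e5 0-1

[Event "Test"]
[White "Hiarcs 11.1"]
[Black "Loop 13.6 32-bit"]
[Result "1/2-1/2"]

1. d4 d5 1/2-1/2
'''

cleaned = remove_player(sample, "CM11 Sauron")
print(cleaned)
```

The same function could be applied to the combined database after each rating run, so that only the core games remain before the next personality is loaded.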
Spock

Re: CM11 settings testing - Glorfindel now included

Post by Spock »

Tony Thomas wrote:
So you won't be using these games for the 40/4 list? I did say in my post that the distortion would occur, but you are basically looking for the personality that gets the highest score in 600 games. I guess you must be using some kind of tool to remove the games, because it can be a pain to open the PGN in Word and remove them manually.
A small number will be included in the CCRL 40/4 list; Sauron and Sel 21, for example, already have been. To include them all would not be a good idea. I'll submit at most another two at the end of the process, once it becomes clear which are the strongest. We don't want to distort or clutter that list.