New rating list @ CEGT

Wolfgang
Posts: 900
Joined: Sat May 13, 2006 1:08 am

New rating list @ CEGT

Post by Wolfgang »

Hi all,

We have established a new rating list with Ponder=ON and the following conditions:

Time control:          5 minutes per game + 3 seconds per move
Games per match:       50
Hardware:              mainly Intel i5-2400 @ 3.1 GHz and AMD-X4 @ 3.4 GHz
                       (rarely Intel i7-3770 @ 3.7 GHz)
Cores:                 1
Hash size:             256 MB
Tablebases:            3+4 men Nalimov, Gaviota etc.
Software:              x64 and SSE if available
GUI:                   Shredder Classic, Arena, CB-Fritz
Learning:              no
Link: http://www.husvankempen.de/nunn/rating5plus3pbon.htm

There will be two sub-lists named “All Versions” and “Pure List”, the second one containing only the best version of each engine. “All Versions” is self-explanatory, I think… :-).

Why this list? Why (roughly) IPON conditions?
So, the main reason was that our existing Ponder-ON list with 40 moves / 20 minutes (about 80 minutes per game!) became too time-consuming with the growing number of engines. One test run took about 2 months, and we have to keep in mind that we have three other lists to care for. This was not practicable any more!

At the same time Ingo gave up his fantastic IPON and so we decided to take “his” time control for our new list. Some other conditions (Hardware, Hash, Tablebases, Cores etc.) are nearly or completely identical, but there are important differences too. We play 50 games per match (IPON: 150) and we use different testsuites taken from various sources. Maybe the most important difference is that our games are provided for download as usual.

The current list contains 12 engines that have played a round robin, i.e. 550 games per engine, 3,300 games in total. The test with Hannibal 1.4 x64 is nearly finished, and the run with Black Mamba 1.4 x64 has just started.
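The game counts follow from plain round-robin arithmetic. A minimal sketch in Python, using the 12 engines and 50 games per match stated in this post (the function name is only illustrative):

```python
def round_robin_counts(engines: int, games_per_match: int) -> tuple[int, int]:
    """Return (games per engine, total games) for a full round robin."""
    # Every engine meets each of the other engines in one match.
    games_per_engine = (engines - 1) * games_per_match
    # Each pairing is counted once: n * (n - 1) / 2 matches overall.
    pairings = engines * (engines - 1) // 2
    total_games = pairings * games_per_match
    return games_per_engine, total_games


per_engine, total = round_robin_counts(engines=12, games_per_match=50)
print(per_engine, total)  # 550 3300
```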

Have fun! :D
Best
Wolfgang
CEGT-Team
www.cegt.net
www.cegt.forumieren.com
ThatsIt
Posts: 992
Joined: Thu Mar 09, 2006 2:11 pm

Re: New rating list @ CEGT

Post by ThatsIt »

A remark:
from now on you will find some more stats after every testrun here:
http://www.husvankempen.de/nunn/5Plus3R ... /stats.htm

Best wishes,
G.S.
(CEGT team)
IWB
Posts: 1539
Joined: Thu Mar 09, 2006 2:02 pm

Re: New rating list @ CEGT

Post by IWB »

Hi,

I am happy that the IPON inspired you all to do something similar, especially because some of my initial criticisms of the existing rating lists, which led me to publish my official list, have been taken into consideration.
Wolfgang wrote: ... We play 50 games per match (IPON: 150) and we use different testsuites taken from various sources. Maybe the most important difference is that our games are provided for download as usual.
...
I have some experience with this, and my first thought was that 25 positions would not be enough for a proper rating list, but then you write "taken from various sources". Does that mean you ...

1. have a fixed 25 position set taken from different sources or
2. use different sets with 25 positions in different matches.

In case 1, I have the feeling that this is not representative enough for "chess" (I had some doubts with 75 positions, but the comparison with the CEGT 40/20 showed good similarities), and in case 2 it is the problem of case 1 AND the fact that you play different games with each engine. My advice is to increase the number of positions as much as possible in your time frame.

Then I am curious to see when we will have the first engine where your fixed 25 positions are compiled in (and maybe the fact is hidden by taking time but playing brilliant moves ... I already suspect a certain corner of compilers to go that way! Prejudices, I know, but experience beats philanthropy). In that case you will have a very good engine which is really different in your "book" lists. Then you have to decide if it is due to cheating or the time control or the increment or ... Of course you can avoid engines coming from that corner. Up till now you did a good job doing so, in contrast to other lists!

Anyhow, good luck, I will surely follow that project.

Bye
Ingo
Wolfgang
Posts: 900
Joined: Sat May 13, 2006 1:08 am

Re: New rating list @ CEGT

Post by Wolfgang »

IWB wrote:Hi,

1. have a fixed 25 position set taken from different sources or
2. use different sets with 25 positions in different matches.
Hi Ingo,

Point 2 is correct. There is a pool of thousands of positions from which sets of 25 openings are chosen.
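To picture case 2, here is a rough sketch in Python of drawing a fresh set of 25 openings from a large pool for each match. It is not CEGT's actual tooling: the placeholder pool, the function name and the assumption that every opening is played once with each colour (which would give the 50 games per match) are illustrative only.

```python
import random


def draw_match_schedule(pool, engine_a, engine_b, openings_per_match=25, rng=None):
    """Pick a random opening set and pair every opening with both colours."""
    rng = rng or random.Random()
    # Sample without replacement, so no opening repeats within one match.
    openings = rng.sample(pool, openings_per_match)
    schedule = []
    for opening in openings:
        schedule.append((opening, engine_a, engine_b))  # engine_a plays White
        schedule.append((opening, engine_b, engine_a))  # colours reversed
    return schedule


# Placeholder pool standing in for "thousands of positions".
pool = [f"opening-{i:04d}" for i in range(3000)]
schedule = draw_match_schedule(pool, "Engine A", "Engine B", rng=random.Random(1))
print(len(schedule))  # 50
```

Because each match draws its own set, two engines generally do not face the same openings, which is exactly the trade-off discussed in this thread.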
... in case 2 it is the problem of case 1 AND the fact that you play different games with each engine. My advice is to increase the number of positions as much as possible in your time frame.
I can live with that, as we have had good experiences with our other rating lists: 40/4, 40/20 and 40/120 are played the same way. No books are used, only various testsuites. I don't think that we will increase the number of games with our current hardware power, which is divided among four lists. With only one list we would surely play 100 or even 200 games per match.
Then I am curious to see when we will have the first engine where your fixed 25 positions are compiled in
I think this point is obsolete due to my answer above, as we don't have ONE fixed set.

...
Anyhow, good luck, I will surely follow that project.

Bye
Ingo
Thanks! :)
Wolfgang
IWB
Posts: 1539
Joined: Thu Mar 09, 2006 2:02 pm

Re: New rating list @ CEGT

Post by IWB »

Wolfgang wrote:
Then I am curious to see when we will have the first engine where your fixed 25 positions are compiled in
I think this point is obsolete due to my answer above, as we don't have ONE fixed set.
Yes, it is. Unfortunately, the "same games for every engine" rule is then not fulfilled.

But I am more relaxed about this today than I was years ago. I assume it is good enough for a rating list but might make "beta-testing" more difficult.
But this was my intention with the IPON and it might not be yours!!!

Again: Good luck, I am curious to see the results!

Bye
Ingo
ThatsIt
Posts: 992
Joined: Thu Mar 09, 2006 2:11 pm

Re: New rating list @ CEGT

Post by ThatsIt »

Hi Ingo !

Some years ago you increased the number of start-positions for the IPON.
After that you stated that you had seen no notable changes.
We use well balanced position-sets, and therefore it's no problem that
these sets (chosen from a huge pool) will vary very often.

Best wishes,
G.S.
(CEGT team)
IWB
Posts: 1539
Joined: Thu Mar 09, 2006 2:02 pm

Re: New rating list @ CEGT

Post by IWB »

ThatsIt wrote:Hi Ingo !

Some years ago you increased the number of start-positions for the IPON.
After that you stated that you had seen no notable changes.
We use well balanced position-sets, and therefore it's no problem that
these sets (chosen from a huge pool) will vary very often.

Best wishes,
G.S.
(CEGT team)
That's true, but I moved from 50 to 75 positions - that is a difference.

But it doesn't matter, as you use a big set and only take 25 per match. In reality your set of positions is bigger than 25, and I would expect problems only if JUST 25 positions were used ...

Bye
Ingo
Modern Times
Posts: 3557
Joined: Thu Jun 07, 2012 11:02 pm

Re: New rating list @ CEGT

Post by Modern Times »

Wolfgang wrote: So, the main reason was that our existing Ponder-ON list with 40 moves / 20 minutes (about 80 minutes per game!) became too time-consuming with the growing number of engines. One test run took about 2 months, and we have to keep in mind that we have three other lists to care for. This was not practicable any more!
So is that list being discontinued ?
Wolfgang
Posts: 900
Joined: Sat May 13, 2006 1:08 am

Re: New rating list @ CEGT

Post by Wolfgang »

Modern Times wrote:...

So is that list being discontinued ?
Yes, at least for now. MAYBE (and this is really a BIG "Maybe") I'll (or we will) later find the time and CPU power and motivation and, and, and... to continue, but surely not under these conditions, where practically a big round-robin tourney is played (except matches between engines/versions from the same author). This is definitely not possible any more with this rather long time control.

What I could imagine is integrating new engines but only letting them play against opponents close to their projected strength (+/- 150 Elo or so), as we normally do for our other rating lists. Maybe ~500 games for the entry and more from time to time. But even this would take around three weeks if played on one dual-core computer!
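As a small illustration of that kind of opponent selection, here is a sketch in Python that keeps only engines within a rating window of a newcomer's projected strength. The engine names and ratings are made up, and the helper is hypothetical; only the +/- 150 window comes from the paragraph above.

```python
def pick_opponents(projected_rating, rated_engines, window=150):
    """Return the engines whose rating lies within +/- window of the projection."""
    return [
        name
        for name, rating in rated_engines.items()
        if abs(rating - projected_rating) <= window
    ]


# Made-up ratings, for illustration only.
rated = {"Engine A": 3100, "Engine B": 2980, "Engine C": 2900, "Engine D": 2700}
print(pick_opponents(3000, rated))  # ['Engine A', 'Engine B', 'Engine C']
```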

I'd rate the probability for a continuation at 20% maximum and its priority is far behind any other CEGT project we have!

The only reason for me to think about it at all is that I don't like throwing away nearly 20,000 long-time-control games.
lkaufman
Posts: 5966
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: New rating list @ CEGT

Post by lkaufman »

This is good news! I think the time control is a reasonable one, and in particular the switch from 40/x to increment is a very desirable one, and the ratio of 100 to 1 for base time to increment is close to ideal in my opinion. It is a huge waste of resources to play out drawn endings for 200 moves at the same rate as the first forty moves. I hope you gradually migrate to increment-only time controls; or perhaps you will focus on this new list and the others will become somewhat neglected, and if so, that's understandable. Personally I think the use of Ponder ON is a waste of resources, especially since for most users this is not relevant, but I understand you want to make this list different from your other ones.
One question: I'm not familiar enough with AMD hardware to know this: are the i5 machines and the AMD machines you mention fairly close in speed for typical chess engines? I think it is important that they not be too far apart, or if they are, one of them should use an adjusted time control to make them fairly even. It is very clear to me that different engines benefit unequally from more time, so if you mix time controls that are effectively quite different, this would make the results rather random or subject to operator preferences. For example, if an operator liked Houdini he would test it on the slowest hardware to make the time control especially fast, at which Houdini excels.
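One way such an adjustment could look, sketched in Python: scale both the base time and the increment by the speed ratio between a reference machine and the machine in use. The speed figures below are invented, and how speed would be measured (for example with a nodes-per-second benchmark) is an assumption, not something CEGT has described.

```python
def adjusted_time_control(base_seconds, increment_seconds, reference_speed, machine_speed):
    """Scale a base+increment time control so a slower machine gets more time."""
    # A machine at 80% of the reference speed gets 1 / 0.8 = 1.25x the time.
    factor = reference_speed / machine_speed
    return base_seconds * factor, increment_seconds * factor


# 5 minutes + 3 seconds on the reference machine; speeds are hypothetical
# benchmark results in nodes per second.
base, inc = adjusted_time_control(300, 3, reference_speed=1_000_000, machine_speed=800_000)
print(base, inc)    # 375.0 3.75
print(base / inc)   # 100.0
```

Scaling both parts by the same factor keeps the 100 to 1 ratio of base time to increment unchanged.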