Tentative evaluation of Capablanca engines

hgm
Posts: 28391
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Tentative evaluation of Capablanca engines

Post by hgm »

In my first successful attempt to evaluate the playing strength of the available 10x8-capable WinBoard engines (i.e. the first attempt that was not corrupted by hanging processes loading the CPU at the expense of the ongoing games), I conducted a round-robin at 10+0 time control. Each pairing played 10 games from 5 different opening positions, so that each engine played 70 games. (For details, see http://home.hccnet.nl/h.g.muller/TT5.html.)

The results were:


                              Joker   TJ  Smirf TSCPG f-Max  ArcB  BigL Chanc

 1. Joker80 1.1.14e           ##### 10110 11010 11101 11111 11111 11111 11111
    (H.G.Muller)              ##### 11=00 11111 11111 10111 11111 11111 11111   88%  61.5 (2185.0, 1773.5)

 2. TJchess10x8 0.109         01001 ##### 10111 01100 1=11= 11111 11111 11111
    (Tony Hecker)             00=11 ##### 01000 11111 1=001 11011 11111 11111   74%  52.0 (2280.0, 1420.5)

 3. Smirf 1.71d               00101 01000 ##### =1110 11110 11111 11111 11111
    (Reinhard Scharnagl)      00000 10111 ##### 10000 10101 11011 11111 11111   68%  47.5 (2325.0, 1204.5)

 4. TSCP Gothic               00010 10011 =0001 ##### =11=1 11111 11111 11111
    (M.Langeveld/T.Kerrigan)  00000 00000 01111 ##### 0=001 00111 11111 11011   60%  42.0 (2380.0, 1024.5)

 5. Fairy-Max 4.8 t           00000 0=00= 00001 =00=0 ##### 11111 1=111 11110
    (H.G.Muller)              01000 0=110 01010 1=110 ##### 11111 11111 11111   58%  40.5 (2395.0, 924.8)

 6. ArcBishop80 1.00          00000 00000 00000 00000 00000 ##### 11101 10=01
    (Matthias Gemuh)          00000 00100 00100 11000 00000 ##### 10001 =1011   23%  16.0 (2640.0, 306.5)

 7. BigLion80 2.23x           00000 00000 00000 00000 0=000 00010 ##### 10100
    (Matthias Gemuh)          00000 00000 00000 00000 00000 01110 ##### 11101   15%  10.5 (2695.0, 144.3)

 8. Chancellor 1.00d          00000 00000 00000 00000 00001 01=10 01011 #####
    (Matthias Gemuh)          00000 00000 00000 00100 00000 =0100 00010 #####   14%  10.0 (2700.0, 188.5)
Using BayesElo with prior=0 and offset = 1950 then gives:


Rank Name               Elo    +    - games score oppo. draws
   1 Joker80 1.1.14e   2402  119   96    70   88%  1885    1%
   2 TJchess10x8 0.109 2224   91   86    70   74%  1911    6%
   3 Smirf 1.71d       2155   89   87    70   68%  1921    1%
   4 TSCP Gothic       2073   84   85    70   60%  1932    6%
   5 Fairy-Max 4.8 t   2050   82   83    70   58%  1936   10%
   6 ArcBishop80 1.00  1636   96  103    70   23%  1995    3%
   7 BigLion80 2.23    1541  103  117    70   15%  2008    1%
   8 Chancellor 1.00d  1519  103  117    70   14%  2012    3%
The games can also be downloaded via the link given above.
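
For a rough sanity check on the table above, the score percentage and the average opponent rating can be turned into a performance rating with the plain logistic Elo formula. This is only a sketch: the function names are made up for illustration, and it is not what BayesElo computes (BayesElo fits all pairings jointly with its own draw model), so the numbers should not be expected to match the Elo column.

```python
import math


def elo_diff_from_score(score: float) -> float:
    """Elo difference implied by a score fraction under the plain logistic model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))


def expected_score(elo_diff: float) -> float:
    """Inverse of elo_diff_from_score: expected score for a given Elo advantage."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


# (name, score fraction, average opponent rating) copied from the BayesElo table.
rows = [
    ("Joker80 1.1.14e", 0.88, 1885),
    ("Fairy-Max 4.8 t", 0.58, 1936),
    ("Chancellor 1.00d", 0.14, 2012),
]

for name, score, oppo in rows:
    # Naive performance rating: opponent average plus the implied difference.
    naive = oppo + elo_diff_from_score(score)
    print(f"{name:18s} naive performance ~ {naive:.0f}")
```

With a rating spread this wide, averaging the opponents before applying the formula distorts the extremes, which is one reason to prefer a joint fit like BayesElo's.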
Spock

Re: Tentative evaluation of Capablanca engines

Post by Spock »

What is stunning here is the almost non-existent draw rate !!

Re: Tentative evaluation of Capablanca engines

Post by hgm »

True, draws are much rarer in 10x8 Chess than in normal Chess. They are more common than this tourney suggests, though: part of the low draw rate is due to the very wide spread in Elo of the participants, making about half the games a straightforward slaughter.

Sudden-death time controls are also a factor adverse to draws: drawn games tend to drag on forever, until the engine with the slightly inferior time management is left with so much less time than the opponent that it starts to blunder the draw away.
Last edited by hgm on Sun Feb 24, 2008 8:30 pm, edited 1 time in total.
Spock

Re: Tentative evaluation of Capablanca engines

Post by Spock »

OK - the draw rate on our FRC list is lower than normal (22%) for the same reason

Re: Tentative evaluation of Capablanca engines

Post by hgm »

The only good indication of the normal draw rate I have is from the following large set of self-play games of Joker80: two completely identical versions against each other, from opening positions with various pieces deleted to create an unequal-material situation. The deleted material was chosen so as to create an approximately fair battle. (Sometimes I also exchanged one piece for another, e.g. Q->A for one side and A->Q for the other.)

The typical number of draws in a 432-game match is ~64, i.e. ~15%. (These were all 40/1' games; I did not measure how the draw rate varied with time control.)


RR-Q     (174+ 194- 64=) 47.7%
RR-CP    (131+ 227- 74=) 38.9%
RR-AP    (166+ 199- 67=) 46.2%
RR-C     (188+ 170- 74=) 52.1%
RR-A     (197+ 162- 73=) 54.1%
QQ-CC    (131+ 55-  30=) 67.6%
QQ-AA    (117+ 60-  39=) 63.2%
QQ-CCP   (112+ 72-  32=) 59.3%
QQ-AAP   (112+ 78-  26=) 57.9%
CC-AA    (102+ 89-  25=) 53.0%
Q-CP     (164+ 191- 77=) 46.9%
Q-AP     (191+ 186- 55=) 50.6%
Q-C      (215+ 161- 56=) 56.3%
Q-A      (219+ 138- 75=) 59.4%
C-A      (187+ 182- 63=) 50.6%
A-RN     (261+ 122- 49=) 66.1%
C-RN     (273+ 101- 58=) 69.9%
A-RNP    (247+ 121- 64=) 64.6%
C-RNP    (242+ 144- 46=) 61.3%
NN-RP    (262+ 127- 43=) 65.6%
NN-RPP   (221+ 141- 70=) 59.3%
P-.      (233+ 132- 67=) 61.7% (g-pawn)
P-.      (218+ 160- 54=) 56.7% (c-pawn)
PP-.     (235+ 144- 53=) 60.5% (g+c)
PP-.     (253+ 129- 50=) 64.4% (g+b)
R-BP     (187+ 187- 58=) 50.0%
R-BPP    (170+ 209- 53=) 45.5%
R-NP     (226+ 148- 58=) 59.0%
R-NPP    (218+ 153- 61=) 57.5% (g+c)
R-NPP    (195+ 172- 65=) 52.7% (g+b)
BB-BB'   (208+ 164- 60=) 55.1% (pair vs anti-pair)
B-N      (395+ 314- 154=) 54.6%
B-NP     (331+ 384- 149=) 46.9%
BB-BN    (459+ 278- 127=) 60.5%
BB-BNP   (371+ 346- 147=) 51.4%
BB-NN    (262+ 109- 61=) 67.7%
BB-NNP   (221+ 139- 72=) 59.5%
BB-NNPP  (197+ 168- 67=) 53.4%
RR-BBPP  (201+ 175- 56=) 53.0%
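
The score percentage and draw rate of any line in the list above can be recomputed from its win/loss/draw counts. A minimal sketch, assuming each line has the `(W+ L- D=)` layout shown; the regular expression and the `summarize` helper are illustrative names, not part of any tool used for the matches:

```python
import re

# Matches the "(wins+ losses- draws=)" part of one result line.
RESULT = re.compile(r"\(\s*(\d+)\+\s*(\d+)-\s*(\d+)=\)")


def summarize(line: str) -> tuple[int, float, float]:
    """Return (games, score fraction, draw fraction) for one result line."""
    match = RESULT.search(line)
    if match is None:
        raise ValueError(f"no (W+ L- D=) group found in: {line!r}")
    wins, losses, draws = (int(g) for g in match.groups())
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games  # a draw counts as half a point
    return games, score, draws / games


# Two sample lines copied from the list above.
for line in ["RR-Q     (174+ 194- 64=) 47.7%", "QQ-CC    (131+ 55-  30=) 67.6%"]:
    games, score, draw_rate = summarize(line)
    label = line.split()[0]
    print(f"{label:8s} games={games} score={score:.1%} draws={draw_rate:.1%}")
```

Running this over all lines would give the per-match draw rates, from which the ~15% typical figure can be checked.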