CCRL 40/40 lists updated (8th June 2013)


Graham Banks
Posts: 45237
Joined: Sun Feb 26, 2006 10:52 am
Location: Auckland, NZ

CCRL 40/40 lists updated (8th June 2013)

Post by Graham Banks »

The latest CCRL Rating Lists and Statistics are available for viewing from the following links:
http://computerchess.org.uk/ccrl/4040/ (40/40)
http://www.computerchess.org.uk/ccrl/404/ (40/4)
http://www.computerchess.org.uk/ccrl/404FRC/ (FRC 40/4)

Please note that the three lists are updated independently of each other. The 40/40 and 40/4 lists are each updated once every two weeks, alternating with each other. The FRC list is updated when a new engine or engine version is being tested or has been tested.

The links to the various rating lists can be found just beneath the default Best Versions list. Specific 32-bit rating lists are labeled as such to the right of the default list in each category. The default lists contain the 64-bit engines.


Our 40 moves in 40 minutes (repeating) and 40 moves in 4 minutes (repeating) time controls are both adjusted to an AMD64 X2 4600+ (2.4 GHz).
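A time control "adjusted to" a reference machine means that testers on faster or slower hardware change the clock so that an engine gets roughly the amount of computation it would get on the AMD64 X2 4600+. This post does not say how the CCRL measures relative speed, so the sketch below simply assumes that thinking time is scaled in inverse proportion to a benchmark speed ratio; `adjusted_minutes` and the benchmark figures are hypothetical.

```python
def adjusted_minutes(nominal_minutes: float, reference_speed: float,
                     machine_speed: float) -> float:
    """Scale a nominal time control to a tester's machine.

    Assumption (not a documented CCRL formula): the allotted time is scaled
    inversely with benchmark speed, so a machine 2.5x faster than the
    reference gets 1/2.5 of the nominal wall-clock time.
    """
    if reference_speed <= 0 or machine_speed <= 0:
        raise ValueError("benchmark speeds must be positive")
    return nominal_minutes * reference_speed / machine_speed


# Hypothetical benchmark results (e.g. nodes per second in some fixed test).
REFERENCE_NPS = 1_000_000  # stand-in for the AMD64 X2 4600+
TESTER_NPS = 2_500_000     # a tester's machine, 2.5x faster

print(adjusted_minutes(40, REFERENCE_NPS, TESTER_NPS))  # 40/40 -> 16.0 minutes per 40 moves
print(adjusted_minutes(4, REFERENCE_NPS, TESTER_NPS))   # 40/4  -> 1.6 minutes per 40 moves
```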

Be aware that in the early stages of testing, an engine's rating can fluctuate considerably.
It is strongly advised to look at the many other rating lists available in order to get a more accurate overall picture of an engine's rating relative to others.
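Early fluctuation is mostly sampling noise: the uncertainty of a rating estimate shrinks roughly with the square root of the number of games played. The sketch below uses a rough delta-method approximation, which is not necessarily how the CCRL computes its error bars; the function name and the 30% win / 40% draw / 30% loss mix are hypothetical.

```python
import math


def elo_margin_95(wins: int, draws: int, losses: int) -> float:
    """Approximate 95% error margin (in Elo) of a performance rating.

    Delta-method sketch: the standard error of the mean score is converted
    to Elo through the slope of the logistic Elo curve at that score.
    """
    n = wins + draws + losses
    mu = (wins + 0.5 * draws) / n
    if not 0.0 < mu < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    # Per-game variance of the score (win = 1, draw = 0.5, loss = 0).
    var = (wins * (1.0 - mu) ** 2 + draws * (0.5 - mu) ** 2 + losses * mu ** 2) / n
    se_score = math.sqrt(var / n)
    elo_per_score = 400.0 / (math.log(10) * mu * (1.0 - mu))
    return 1.96 * se_score * elo_per_score


# Quadrupling the number of games halves the margin.
for games in (100, 400, 1600):
    w = l = int(games * 0.3)
    d = games - w - l
    print(games, round(elo_margin_95(w, d, l), 1))
```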

The LOS (likelihood of superiority) figures to the right of each rating list give the probability, as a percentage, that each engine is superior to the engine directly below it.
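For two engines that have played each other, a commonly used normal approximation computes LOS from wins and losses alone, since draws carry no information about which engine is stronger. The CCRL figures are derived from the whole rating pool, so this head-to-head sketch only illustrates the idea; the function name and the 30-20 score are hypothetical.

```python
import math


def los(wins: int, losses: int) -> float:
    """Likelihood of superiority from a head-to-head result (draws ignored)."""
    if wins + losses == 0:
        return 0.5  # no decisive games: no evidence either way
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))


print(f"{los(30, 20):.1%}")  # 92.1%
```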

All games are available for download by engine or by ECO code, and the complete games database is always available.
The current Elo ratings are saved in all game databases for engines that have played 200 games or more.

Clicking on an engine name shows details of the opponents it has played, plus homepage links where applicable.

Custom lists of engines can be selected for comparison.

An openings report page lists the number of games played per ECO code, with the draw percentage and White's win percentage. Clicking on a column heading sorts the list by that column.
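The openings report amounts to a group-by over the games database. A minimal sketch, assuming each game is reduced to an (ECO code, PGN result) pair; `openings_report` and the sample games are hypothetical, not the CCRL's implementation.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def openings_report(games: Iterable[Tuple[str, str]]) -> List[Dict]:
    """Games per ECO code with draw % and White win %."""
    tally = defaultdict(lambda: {"games": 0, "white_wins": 0, "draws": 0})
    for eco, result in games:
        row = tally[eco]
        row["games"] += 1
        if result == "1-0":
            row["white_wins"] += 1
        elif result == "1/2-1/2":
            row["draws"] += 1
    report = []
    for eco, row in tally.items():
        n = row["games"]
        report.append({
            "eco": eco,
            "games": n,
            "draw_pct": 100.0 * row["draws"] / n,
            "white_win_pct": 100.0 * row["white_wins"] / n,
        })
    return report


sample = [("A29", "1-0"), ("A29", "1/2-1/2"), ("B90", "0-1"), ("A29", "1/2-1/2")]
# Sorting by a column, as when clicking its heading: here, most games first.
for row in sorted(openings_report(sample), key=lambda r: r["games"], reverse=True):
    print(row)
```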


Games submitted for this update
Komodo CCT 64-bit -> 521
Stockfish 3 64-bit -> 308
Sting SF 3 64-bit -> 210
Booot 5.2.0 64-bit -> 155
Stockfish 3 64-bit 4CPU -> 146
Toga II 3.0 -> 115
Critter 1.6a 64-bit -> 105
BlackMamba 1.4 64-bit -> 101
ProDeo 1.85 -> 99
Houdini 3 64-bit -> 96
Djinn 0.987 64-bit -> 95
Rybka 4.1 64-bit -> 92
Nebula 2.0 64-bit -> 88
Glass 2.0 64-bit -> 84
Rotor 0.8 -> 82
Godel 2.3.7 64-bit -> 81
Arasan 15.6 64-bit -> 77
Deep Fritz 13 4CPU -> 75
Hannibal 1.3 64-bit -> 71
Gull R375 64-bit -> 70
RedQueen 1.1.3 64-bit -> 61
DeepSaros 3.3b 64-bit -> 60
Protector 1.5.0 64-bit -> 59
Bouquet 1.6 64-bit -> 59
Hiarcs 14 -> 54
Tornado 4.88 64-bit -> 51
GNU Chess 5.50 64-bit -> 49
Dirty 20Apr2013 64-bit -> 49
Chiron 1.5 64-bit -> 48
Gaviota 0.83 64-bit -> 46
Dirty 24Apr2011 64-bit -> 46
Cheng3 1.07 64-bit -> 45
RobboLito 0.21Q 64-bit -> 44
Tucano 2.0 64-bit -> 44
Nemo 1.0.1 64-bit -> 43
Scorpio 2.7.6 64-bit -> 41
Vitruvius 1.11C 64-bit -> 40
Quazar 0.3b 32-bit -> 40
Houdini 1.5a 64-bit -> 40
CM10th Xperience -> 40
Strelka 5.5 64-bit -> 39
IvanHoe 9.46h 64-bit -> 38
Cheng3 1.07 32-bit -> 36
Bouquet 1.5 64-bit -> 36
Scorpio 2.0 -> 36
MinkoChess 1.3 64-bit -> 35
Bouquet 1.6 64-bit 4CPU -> 35
Quazar 0.4 64-bit -> 35
Texel 1.01 64-bit -> 35
RobboLito 0.21Q 64-bit 4CPU -> 35
Frenzee 3.5.19 64-bit -> 35
Sting SF 3 64-bit 4CPU -> 35
Octochess r4984 64-bit -> 35
Gull II b2 64-bit -> 34
Alfil 13.1 64-bit -> 33
NanoSzachy 4.0 64-bit -> 33
Rodin 6.0 -> 33
Chiron 1.1a 64-bit -> 33
Atlas 3.25 64-bit -> 32
Naum 4.2 64-bit 4CPU -> 30
IvanHoe 9.46h 64-bit 4CPU -> 30
Naum 4.2 64-bit -> 30
Vitruvius 1.11C 64-bit 4CPU -> 30
Naum 1.91 64-bit -> 29
Critter 1.6a 64-bit 4CPU -> 29
Hiarcs 14 4CPU -> 29
Rybka 4.1 64-bit 4CPU -> 29
Houdini 3 64-bit 4CPU -> 29
GreKo 10.0 64-bit -> 28
Rodent 1.0 64-bit -> 28
DanaSah 5.00 -> 28
CM10th Lazarus -> 28
Protector 1.5.0 64-bit 4CPU -> 25
Komodo 3 64-bit -> 25
Hannibal 1.3 64-bit 4CPU -> 25
Rybka 2.3.2a 64-bit -> 24
Spike 1.4 Leiden -> 24
Chiron 1.5 64-bit 4CPU -> 24
Gaviota 0.86 64-bit -> 23
Spike 1.4 Leiden 4CPU -> 20
EXchess 6.71b 64-bit -> 20
Deep Junior 13.3 64-bit 4CPU -> 19
EXchess 7.03b 64-bit -> 17
DiscoCheck 4.2 64-bit -> 17
Crafty 23.5 64-bit -> 17
Deuterium 12.01.30.1070 64-bit -> 17
Deep Junior 13.3 64-bit 1CPU -> 14
Dirty 25Aug2011 64-bit -> 8
Stockfish 1.5.1 32-bit -> 8
Rybka 1.1 64-bit -> 8
Naum 3 32-bit -> 8
LoopList 6.00 -> 8
Glaurung 2.1 64-bit -> 6
ProDeo 1.81 -> 6
Chessmaster 11 Conqueror -> 6
Rybka 1.2 32-bit -> 6
Shredder 12 64-bit OA On -> 6
Tornado 2.1.41a -> 6
Zappa 1.1 64-bit -> 6
Zappa Mexico II 64-bit 4CPU -> 5
Spark 1.0 64-bit 4CPU -> 5
Vajolet 2.03 -> 5
Onno 1.2.70 64-bit 4CPU -> 5
Philou 3.7.1 64-bit -> 5
Rybka 2.3.2a 64-bit 4CPU -> 5
Pawny 1.0 64-bit -> 5
Houdini 1.5a 64-bit 4CPU -> 5
Gull R375 64-bit 4CPU -> 5
Chess Tiger 2007 -> 4
Bobcat 3.25 64-bit -> 4
Rodent 0.16 64-bit -> 4
Pawny 0.3 64-bit -> 4
Francesca MAD 0.19 -> 4
Cheese 1.4 64-bit -> 4
Scorpio 1.91 -> 4
Toga II 3.0 4CPU -> 4
GreKo 9.7 64-bit -> 4
Sting SF 2 64-bit -> 4
Stockfish 2.3.1 64-bit -> 4
NirvanaChess 1.1 64-bit -> 2
RomiChess P3L 64-bit -> 2
Jazz 721 64-bit -> 2
Gibbon 2.57a 64-bit -> 2
Komodo 5 64-bit -> 2
Tigran 2.3 64-bit -> 2
Rhetoric Lite -> 2
Sjakk 2.0 64-bit -> 2
Ifrit m1.8 64-bit -> 2
EXchess 6.50b 64-bit -> 2
Capivara LK 0.09a02a 64-bit -> 2
Betsabe II 1.30 64-bit -> 2
Pepito 1.59 64-bit -> 1
TJchess 1.1 64-bit -> 1
ECE 12.01 -> 1
Bagatur 1.3a 64-bit -> 1
EveAnn 1.71a -> 1
Myrddin 0.86 64-bit -> 1
gbanksnz at gmail.com
lkaufman
Posts: 6284
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: CCRL 40/40 lists updated (8th June 2013)

Post by lkaufman »

So, we just have to "find" 22 Elo points to catch Houdini 3 at 40/40. I wonder how long that will take us...
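To put a 22-point gap in perspective: under the logistic Elo model commonly used by rating lists, a rating difference converts to an expected score, and 22 points is worth only about a 53% score. The function names below are illustrative.

```python
import math


def expected_score(elo_diff: float) -> float:
    """Expected score of the higher-rated side under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def elo_diff_from_score(score: float) -> float:
    """Inverse of expected_score: the rating difference implied by a score fraction."""
    return -400.0 * math.log10(1.0 / score - 1.0)


print(f"{expected_score(22):.3f}")                         # 0.532
print(round(elo_diff_from_score(expected_score(22)), 6))  # 22.0
```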
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: CCRL 40/40 lists updated (8th June 2013)

Post by Laskos »

lkaufman wrote:So, we just have to "find" 22 Elo points to catch Houdini 3 at 40/40. I wonder how long that will take us...
22 +/- 21 at 95% confidence, and I would go for the upper limit, given the performance of Houdini 3 Tactical, which is known to be weaker than regular Houdini 3 by some 25 points.

Or, to rephrase your statement: you have to "find" 22 Elo points to catch Houdini 3 Tactical at 40/40.
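A margin of about ±21 is what you get by combining two independent ±15 error bars: for a difference of two independent estimates, the margins add in quadrature. Houdini 3's margin is given as 15 later in this thread; Komodo CCT's margin is not stated here, so the 15 used for it below is an assumption, and ratings from one pool are not fully independent in practice. `prob_ahead` is a hypothetical helper giving the normal-approximation chance that the true difference is positive.

```python
import math


def difference_margin(margin_a: float, margin_b: float) -> float:
    """95% margin of (rating_a - rating_b), assuming independent errors."""
    return math.hypot(margin_a, margin_b)


def prob_ahead(diff: float, margin_95: float) -> float:
    """Probability that the true difference is positive (normal approximation)."""
    sigma = margin_95 / 1.96
    return 0.5 * (1.0 + math.erf(diff / (sigma * math.sqrt(2.0))))


margin = difference_margin(15, 15)  # 15 for H3; 15 for Komodo CCT is assumed
print(round(margin, 1))             # 21.2
print(round(prob_ahead(22, margin), 3))
```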
mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

Re: CCRL 40/40 lists updated (8th June 2013)

Post by mwyoung »

lkaufman wrote:So, we just have to "find" 22 Elo points to catch Houdini 3 at 40/40. I wonder how long that will take us...
If I were you and Don, I would not be so concerned with Houdini.
I have been testing a development version of Stockfish. It has achieved a rating equal to Houdini 3 Pro at long time controls, with an error bar of +/- 20 Elo.

Yesterday I started testing a new development version of Stockfish at 40/40 that could be even stronger. I will have an update on this version in a week or two. Why it should concern you: Stockfish is free and making progress.
"The worst thing that can happen to a forum is a running wild attacking moderator(HGM) who is not corrected by the community." - Ed Schröder
But my words like silent raindrops fell. And echoed in the wells of silence.
lkaufman
Posts: 6284
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: CCRL 40/40 lists updated (8th June 2013)

Post by lkaufman »

Houdini Tactical has only about 30% as many games as Houdini 3, so it is far more likely that Houdini Tactical is overrated than that Houdini 3 is significantly underrated. Also, the margin of error for H3 is shown as 15. Most likely H3 is underrated by a couple of Elo on that list, so let's say we need 25 Elo to be favored to lead the list.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: CCRL 40/40 lists updated (8th June 2013)

Post by Laskos »

lkaufman wrote:Houdini Tactical has only about 30% as many games as Houdini 3, so it is far more likely that Houdini Tactical is overrated than that Houdini 3 is significantly underrated. Also, the margin of error for H3 is shown as 15. Most likely H3 is underrated by a couple of Elo on that list, so let's say we need 25 Elo to be favored to lead the list.
Averaging over H3 and H3 T (which is 25-30 points weaker) suggests that H3 is underrated by some 8 Elo points. So the difference between H3 and Komodo CCT is more likely 30 +/- 20 Elo points at 95% confidence.
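Both this averaging and the objection to it can be expressed as a weighted average of two estimates of Houdini 3's true rating: its own listed rating, and Houdini 3 Tactical's listed rating plus an assumed strength gap. The sketch weights each estimate by its number of games, as a rough proxy for inverse variance. The 1000/300 game counts only reproduce the roughly 30% ratio mentioned in the thread, the zero listed gap between the two versions is hypothetical, and the helper names are illustrative.

```python
def pooled_estimate(estimates):
    """Weighted mean of (value, weight) pairs."""
    total_weight = sum(weight for _, weight in estimates)
    return sum(value * weight for value, weight in estimates) / total_weight


def h3_underrating(listed_gap: float, assumed_offset: float,
                   h3_games: int, h3t_games: int) -> float:
    """How far H3's listed rating is estimated to lie below its true strength.

    listed_gap:     H3T's listed rating minus H3's listed rating (hypothetical)
    assumed_offset: how much stronger H3 is assumed to be than H3T
    Weights are game counts, a rough stand-in for inverse variance.
    """
    direct = 0.0                                # H3's own listing, used as the reference
    via_tactical = listed_gap + assumed_offset  # H3T's listing plus the assumed offset
    return pooled_estimate([(direct, h3_games), (via_tactical, h3t_games)])


for offset in (10.0, 27.5):  # a 10-point guess vs. the middle of 25-30
    print(offset, round(h3_underrating(0.0, offset, 1000, 300), 1))
```

The result scales directly with the assumed offset, which is why the disagreement below comes down to how large the gap between the two versions really is at 40/40.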
lkaufman
Posts: 6284
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: CCRL 40/40 lists updated (8th June 2013)

Post by lkaufman »

Laskos wrote:
lkaufman wrote:Houdini Tactical has only about 30% as many games as Houdini 3, so it is far more likely that Houdini Tactical is overrated than that Houdini 3 is significantly underrated. Also, the margin of error for H3 is shown as 15. Most likely H3 is underrated by a couple of Elo on that list, so let's say we need 25 Elo to be favored to lead the list.
Averaging over H3 and H3 T (25-30 points weaker) gives that H3 is underrated by some 8 Elo points. So the difference between H3 and Komodo CCT is more likely 30+/-20 Elo points 95% confidence.
But your claim that H3 T is 25-30 points weaker is presumably based on the blitz lists. At higher levels all rating differences contract, and there may well be very little difference between the versions at 40/40, since some programs or versions benefit more than others from extra time. The best evidence of the difference is actual testing at this level. I admit it is unlikely that there is no difference at all, but 10 Elo might be a good guess, in which case H3 would be about 3 Elo underrated.
lkaufman
Posts: 6284
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: CCRL 40/40 lists updated (8th June 2013)

Post by lkaufman »

Actually I would not mind if our main rival in the future was Stockfish rather than Houdini. Stockfish is a truly original program, and everything about it is open and above-board. If Stockfish does come up with a truly original idea that is worth a ton of Elo, well at least we will know what it is and can see if it works in Komodo. Right now we have a comfortable lead (at least on one core), and I think we can keep pace, but who knows? Even if Stockfish does end up on a par with Komodo at the top, we do have one major selling point. The evaluation in Stockfish has little resemblance to that of human masters, unlike all the other top programs. So for people wanting to use the engine to review their games or to prepare openings, they will still have plenty of reason to buy a top engine.
mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

Re: CCRL 40/40 lists updated (8th June 2013)

Post by mwyoung »

lkaufman wrote:Actually I would not mind if our main rival in the future was Stockfish rather than Houdini. Stockfish is a truly original program, and everything about it is open and above-board. If Stockfish does come up with a truly original idea that is worth a ton of Elo, well at least we will know what it is and can see if it works in Komodo. Right now we have a comfortable lead (at least on one core), and I think we can keep pace, but who knows? Even if Stockfish does end up on a par with Komodo at the top, we do have one major selling point. The evaluation in Stockfish has little resemblance to that of human masters, unlike all the other top programs. So for people wanting to use the engine to review their games or to prepare openings, they will still have plenty of reason to buy a top engine.
Unlike some, I don't have a problem with Houdini 3. That would change only if someone with standing came forward and alleged wrongdoing by Robert Houdart, and that has not happened. Setting that aside:

I don't know exactly what they are doing with Stockfish, but it seems Stockfish's evals have been toned down a bit. I am testing the newest Stockfish against Stockfish 3, so we will see how the two versions' evals compare. The evals of the newer Stockfish development version seem more in line with the other program I have been testing against than those of the older Stockfish versions, though they still seem high at some points in the game.

That may not be quite the right way to put it. It may be more accurate to say you don't get the wild evaluation swings you did with older Stockfish versions. I don't know whether that is due to code cleanup or something else. Here is my first test game with the latest Stockfish version, so you can view the evaluation profile; it is easier to see than to explain. Yes, the evals are higher for Stockfish, but the two evaluation curves do mirror each other, which seems better than it used to be.

[Event "Stockfish test"]
[Site ""]
[Date "2013.06.10"]
[Round "1.1"]
[White "Stockfish 090613 64 SSE4.2"]
[Black "Deep Rybka 4.1 SSE42 x64"]
[Result "1-0"]
[ECO "A29"]
[Annotator "0.34;-0.01"]
[PlyCount "157"]
[EventDate "2013.06.10"]
[EventType "simul"]
[Source "Young"]
[TimeControl "40/2400:40/2400:40/2400"]

{Intel(R) Core(TM) i7 CPU Q 840 @ 1.87GHz 1861 MHz W=29.4 plies; 4,
699kN/s; Droidfish.ctg B=19.6 plies; 249kN/s; 3,283 TBAs; Droidfish.ctg} 1. c4
{B 0} e5 {B 0} 2. g3 {B 0} Nf6 {B 0} 3. Bg2 {B 0} d5 {B 0} 4. cxd5 {B 0} Nxd5 {
B 0} 5. Nc3 {B 0} Nb6 {B 0} 6. Nf3 {B 0} Nc6 {B 0} 7. O-O {B 0} Be7 {B 0} 8. d3
{B 0} O-O {B 0} 9. Be3 {B 0} Be6 {B 0} 10. a3 {B 0} a5 {B 0} 11. Na4 {B 0} Nd5
{-0.01/19 81} 12. Bc5 {B 0} Bd6 {B 0} 13. Rc1 {B 0} h6 {B 0} 14. d4 {B 0} e4 {
0.23/18 27} 15. Ne5 {0.34/31 78} f5 {0.14/19 49} 16. Nxc6 {0.20/30 153} bxc6 {
0.28/19 11} 17. e3 {0.22/32 190 (Bxd6)} Ne7 {0.07/20 161} 18. f3 {0.22/31 201
(Re1)} Rb8 {0.00/20 84 (exf3)} 19. Qd2 {0.26/30 94 (Bxd6)} exf3 {0.00/19 105}
20. Bxf3 {0.22/31 90} Bb3 {0.00/21 79} 21. Bxd6 {0.22/30 90} cxd6 {0.00/20 22}
22. Nc3 {0.20/31 89} Qb6 {0.00/20 90} 23. Rf2 {0.20/29 88} Rf7 {0.00/18 90
(Rbe8)} 24. Qd3 {0.24/27 88 (Bg2)} Rff8 {0.00/20 108 (Be6)} 25. Bg2 {0.20/30
55 (Ne2)} Rf6 {0.00/19 161 (Rbe8)} 26. Bh3 {0.28/28 111 (Ne2)} Rbf8 {0.00/19
143} 27. Re1 {0.20/29 366 (Ne2)} Kh8 {0.00/19 227 (R6f7)} 28. e4 {0.18/26 63
(Ref1)} f4 {-0.08/19 54 (fxe4)} 29. Ref1 {0.00/29 65} fxg3 {-0.07/19 73 (Ng6)}
30. hxg3 {0.00/31 44} Ng8 {0.00/19 66 (Rxf2)} 31. Bf5 {0.00/29 44} Rd8 {0.00/
19 51 (Ne7)} 32. Ne2 {0.30/25 45 (Nb1)} Ne7 {0.08/19 85} 33. g4 {0.36/25 175
(Bg4)} Kg8 {0.00/18 127 (c5)} 34. Ng3 {0.62/25 34 (Nf4)} g6 {0.00/17 54} 35.
Rh2 {0.00/26 71 (Nh5)} Rdf8 {0.00/18 74 (gxf5)} 36. Rff2 {0.70/25 29 (Rxh6)}
gxf5 {0.06/17 27} 37. gxf5 {0.62/26 23 (exf5)} Kf7 {0.29/16 92} 38. Nh5 {0.70/
27 33} Ke8 {0.30/17 28} 39. Nxf6+ {0.66/27 32} Rxf6 {0.41/18 6} 40. Rhg2 {0.94/
29 47} Bf7 {0.61/18 113} 41. Rg7 {1.11/29 97} Kf8 {0.64/18 56} 42. Rg3 {0.80/
29 326} Ke8 {0.64/18 84 (a4)} 43. Rgf3 {1.13/28 76 (b3)} Bh5 {0.87/17 108 (Qb5)
} 44. Rf4 {1.81/30 221} Qb8 {1.49/17 227} 45. Qh3 {2.16/30 454 (Qc3)} Bd1 {1.
02/17 20} 46. Qh4 {1.91/31 83} Kf7 {1.17/17 42} 47. Rd2 {2.00/31 59} Bb3 {1.17/
17 31} 48. Rg4 {2.18/31 252} Qb6 {1.28/19 59 (Qb5)} 49. Kh2 {2.40/30 45 (Qg3)}
a4 {1.32/18 69 (Bc4)} 50. Rg3 {2.68/28 34 (Rg1)} Qa7 {1.32/18 41} 51. Rf3 {2.
72/30 28 (Rg1)} Qa5 {1.75/19 83 (Qc7)} 52. Rg2 {2.76/29 34} Qb5 {1.80/19 52
(Bc4)} 53. Rfg3 {2.68/30 32} Nxf5 {1.80/19 6} 54. Qh5+ {2.84/30 28} Ke7 {1.80/
19 20} 55. exf5 {2.84/30 11} Bf7 {1.81/20 58 (Qxf5)} 56. Qe2+ {3.19/28 25} Kd7
{2.19/21 54 (Kf8)} 57. Qxb5 {3.17/33 29} cxb5 {2.19/21 5} 58. Rf2 {3.45/34 36
(Rf3)} Kc6 {2.23/23 58} 59. Rc3+ {3.83/30 29 (Rf4)} Kd7 {2.78/22 118} 60. Kg3 {
4.00/33 26} Ba2 {2.78/22 87} 61. Kg4 {4.24/33 129} Bb1 {2.78/21 59 (Bg8)} 62.
Rh3 {4.60/32 21 (Rf4)} Kc6 {3.04/22 50} 63. Rf1 {4.68/32 25} Be4 {3.04/23 12}
64. Rh5 {4.78/30 15} Bd5 {3.04/23 36 (Bd3)} 65. Re1 {4.90/32 40} Bc4 {3.04/22
135} 66. Kf4 {5.03/32 16} Bd5 {3.74/22 62 (Bf7)} 67. Re8 {6.18/30 22 (Rh3)} Ba2
{4.78/19 47 (Bc4)} 68. Rh8 {6.44/31 20} Bb1 {4.79/20 15} 69. R8xh6 {6.60/32 19}
Rf8 {4.79/21 55} 70. f6 {6.76/31 18} Bd3 {4.79/20 111 (Ba2)} 71. Rg5 {6.82/30
14 (Rh3)} Bc4 {5.70/20 61} 72. Rg7 {7.05/30 17} Bd5 {5.90/20 103 (Kd5)} 73. Kf5
{7.59/26 12} Bc4 {6.21/21 112 (Ba2)} 74. Ra7 {7.79/29 16 (Kg6)} Kb6 {6.39/20
65 (Bd3+)} 75. Re7 {8.08/30 16} Kc6 {6.40/22 40 (Rf7)} 76. Kg6 {8.44/23 6 (Kg5)
} b4 {6.77/20 72} 77. f7 {8.92/27 17 (axb4)} Bxf7+ {6.84/19 63} 78. Rxf7 {9.21/
27 16} Rg8+ {7.10/17 28} 79. Kf5 {9.57/29 17} 1-0
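To view the evaluation profile outside a GUI, the engine comments can be pulled out of the PGN. In this game the comments appear to have the form {score/depth time}, with scores from White's point of view for both engines (Black's scores rise as White's advantage grows), depth presumably in plies and time presumably in seconds; those interpretations and the `eval_profile` helper are assumptions. The export wraps lines inside comments (e.g. "{1." followed by "02/17 20}" on the next line), so the pattern tolerates whitespace inside the numbers. Side assignment simply alternates per comment, which only works when every move carries exactly one comment and the leading hardware comment ({Intel(R) Core(TM) ...}) has been stripped first.

```python
import re
from typing import List, Tuple

COMMENT_RE = re.compile(r"\{([^}]*)\}")
# "<score>/<depth> <time>", tolerating wrapped lines such as "1.\n02/17 20".
EVAL_RE = re.compile(r"^\s*(-?\d+)\.\s*(\d+)\s*/\s*(\d+)\s+(\d+)")


def eval_profile(movetext: str, first_side: str = "white") -> List[Tuple[str, float, int, int]]:
    """Return (side, score, depth, seconds) for each engine-annotated move."""
    sides = ("white", "black") if first_side == "white" else ("black", "white")
    profile = []
    for i, comment in enumerate(COMMENT_RE.finditer(movetext)):
        side = sides[i % 2]  # assumes exactly one comment per move
        m = EVAL_RE.match(comment.group(1))
        if m is None:        # book move ("B 0") or free-text comment
            continue
        score = float(f"{m.group(1)}.{m.group(2)}")
        profile.append((side, score, int(m.group(3)), int(m.group(4))))
    return profile


# Moves 44-46 from the game above, including one comment split across lines.
excerpt = ("44. Rf4 {1.81/30 221} Qb8 {1.49/17 227} 45. Qh3 {2.16/30 454 (Qc3)} "
           "Bd1 {1.\n02/17 20} 46. Qh4 {1.91/31 83}")
for row in eval_profile(excerpt):
    print(row)
```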
lkaufman
Posts: 6284
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: CCRL 40/40 lists updated (8th June 2013)

Post by lkaufman »

My issue with Stockfish evals is not that they are too high; of course they are, but it's easy enough to take 2/3 of every eval to get a reasonable score. The problem is that after doing this adjustment, the evals don't correlate well with GM evals. Of course you could say this about any engine, but for Stockfish the disparity is far greater and far more obvious. The reason is that Stockfish evals were generated by a purely automated procedure, whereas all the other top engines use some eval that was originally based on one that I was involved with, either when working on Rybka 3 or for Komodo. Of course by now Houdini, Critter etc. have made many changes, but they still feel to me like modifications of my Rybka 3 eval, which although tuned by testing was developed based on the idea of simulating GM thinking.
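Why the 2/3 rescaling cannot fix the correlation problem: Pearson correlation is unchanged when one series is multiplied by a positive constant, so scaling every eval fixes its size but not how well it tracks GM judgement. The engine and GM numbers below are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


engine_evals = [0.45, 1.20, -0.30, 2.10, 0.90]  # hypothetical, in pawns
gm_evals = [0.20, 0.50, 0.10, 1.00, 0.70]       # hypothetical GM assessments
scaled = [e * 2 / 3 for e in engine_evals]      # the "take 2/3" correction

print(round(pearson(engine_evals, gm_evals), 4))
print(round(pearson(scaled, gm_evals), 4))      # same value: scaling does not help
```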
If Stockfish has changed its basic eval philosophy, please let me know.