Blunder rates of top engines in the opening

Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Post by Laskos »

I needed a very short but varied opening book, so I took the 2-mover book 2moves_v1.pgn. The adjudication is as follows: draw after 15 moves, win when both engines show 400cp or more. Under these conditions, practically all the wins we see happen right in the opening phase of the games. Losing from a fairly balanced opening practically means that the engine committed a blunder or a series of blunders. These are rare, so I needed to play many games. The time control is 0.25s/move, 1 core per engine. All engines at Contempt=0.
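The adjudication rule can be sketched as a small function. This is a minimal Python sketch, not the GUI's actual implementation; it assumes each engine's latest score is available after every full move, in centipawns from White's point of view, and the name `adjudicate` and its parameters are hypothetical.

```python
from typing import Optional, Sequence, Tuple

def adjudicate(evals: Sequence[Tuple[Optional[int], Optional[int]]],
               draw_after_moves: int = 15,
               win_cp: int = 400) -> Optional[str]:
    """Hypothetical sketch of the adjudication rule described above.

    evals[i] holds the latest scores reported by the White engine and the
    Black engine after full move i + 1, in centipawns from White's point of
    view (assumed convention). None means the engine reported no score.
    Returns "1-0", "0-1", "1/2-1/2", or None if no rule has fired yet.
    """
    for move_no, (white_cp, black_cp) in enumerate(evals, start=1):
        if white_cp is not None and black_cp is not None:
            # A win needs BOTH engines to show at least win_cp for the same side.
            if white_cp >= win_cp and black_cp >= win_cp:
                return "1-0"
            if white_cp <= -win_cp and black_cp <= -win_cp:
                return "0-1"
        # Otherwise the game is adjudicated a draw once the move limit is hit.
        if move_no >= draw_after_moves:
            return "1/2-1/2"
    return None
```

For the midgame test later in the thread, the same sketch would be called with `draw_after_moves=10`. Note that in this sketch a missing score can never trigger a win, so an engine that does not report its eval can only draw by adjudication.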

Ranking according to Wins - Losses in the openings.

1/ Round-Robin of 12 top engines, 880 games per engine. Fibo 1.9 was excluded, as it behaves erratically under this time control.

Engine                      Wins-Losses
Komodo 11.2.2 64-bit        14-3
Houdini 6.02 Pro x64-pext   9-0
Brainfish 021017 64 BMI2    7-0
Deep Shredder 13 x64        6-1
Booot 6.2_x64               2-1
Andscacs 0.92               0-0
Gull 3 x64                  1-3
Hannibal 1.7 x64            0-2
Hakkapeliitta TCEC          2-5
Fritz 15                    2-5
Texel 1.07                  3-14
Fire 6.1 x64 popcnt         1-13

Note: I discovered that with Andscacs 0.92, Win/Loss adjudication doesn't work; the game is only decided if the engine actually gets mated, which practically never happens within 15 moves, hence its 0-0 score.
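Two quick checks on a table like this can be sketched in Python: sorting by Wins - Losses, and confirming that total wins equal total losses, which must hold in a round-robin because every adjudicated win for one engine is a loss for another. The dictionary just transcribes the table above; the helper names are illustrative, not from any tool used in the test.

```python
# Wins and losses from the opening round-robin above (adjudicated games only).
OPENING = {
    "Komodo 11.2.2 64-bit": (14, 3),
    "Houdini 6.02 Pro x64-pext": (9, 0),
    "Brainfish 021017 64 BMI2": (7, 0),
    "Deep Shredder 13 x64": (6, 1),
    "Booot 6.2_x64": (2, 1),
    "Andscacs 0.92": (0, 0),
    "Gull 3 x64": (1, 3),
    "Hannibal 1.7 x64": (0, 2),
    "Hakkapeliitta TCEC": (2, 5),
    "Fritz 15": (2, 5),
    "Texel 1.07": (3, 14),
    "Fire 6.1 x64 popcnt": (1, 13),
}

def rank_by_net(results):
    """Sort engines by Wins - Losses, best first (ties keep table order)."""
    return sorted(results.items(), key=lambda kv: kv[1][0] - kv[1][1], reverse=True)

def totals_balance(results):
    """True if total wins equal total losses, as a round-robin requires."""
    wins = sum(w for w, _ in results.values())
    losses = sum(l for _, l in results.values())
    return wins == losses

if __name__ == "__main__":
    for name, (w, l) in rank_by_net(OPENING):
        print(f"{name:28s}{w - l:+d}")
    print("totals balance:", totals_balance(OPENING))
```

Here both columns sum to 47, and the same balance check also holds for the other round-robin tables in this thread.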

2/ Round-Robin of the top 3 engines from the previous ranking, 4000 games per engine:

Engine                      Wins-Losses
Houdini 6.02 Pro x64-pext   16-3
Brainfish 021017 64 BMI2    14-6
Komodo 11.2.2 64-bit        3-24

So, even though Komodo beats the others right in the opening by the biggest margin in the larger pool of top engines, among the top 3 it lags far behind and commits many more blunders.


3/ Final of the top 2, 6000 games

Engine                      Wins-Losses
Houdini 6.02 Pro x64-pext   10-5
Brainfish 021017 64 BMI2    5-10
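The game counts in the three stages above can be cross-checked with a little arithmetic, assuming each round-robin gives every pairing the same number of games: with n engines each playing g games, every engine meets n - 1 opponents, so each pairing gets g / (n - 1) games, and the event has n * g / 2 games in total (each game counts for both engines that played it). The helper name is illustrative.

```python
from typing import Tuple

def round_robin_counts(n_engines: int, games_per_engine: int) -> Tuple[int, int]:
    """Return (games per pairing, total games), assuming equal-sized pairings."""
    opponents = n_engines - 1
    per_pairing, rest = divmod(games_per_engine, opponents)
    if rest:
        raise ValueError("games per engine not divisible among opponents")
    # Each game is counted once for each of the two engines that played it.
    total_games = n_engines * games_per_engine // 2
    return per_pairing, total_games

print(round_robin_counts(12, 880))   # (80, 5280): 12-engine round-robin
print(round_robin_counts(3, 4000))   # (2000, 6000): top-3 round-robin
print(round_robin_counts(2, 6000))   # (6000, 6000): final of the top 2
```

So the 12-engine stage amounts to 80 games per pairing, and the top-3 stage to 2000 per pairing, under the equal-pairing assumption.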
Laskos

Blunder rates of top engines in the midgame

Post by Laskos »

Now I took balanced midgame positions and, with draw adjudication after 10 moves and win adjudication at 400cp, got the following:


Ranking according to Wins - Losses in balanced midgames.

1/ Round-Robin of 11 top engines, 1000 games per engine. Fibo 1.9 was excluded, as it behaves erratically under this time control. Andscacs 0.92 was also excluded, as it doesn't always report its eval to the GUI, which breaks adjudication.

Engine                      Wins-Losses
Houdini 6.02 Pro x64-pext   27-1
Brainfish 021017 64 BMI2    18-1
Komodo 11.2.2 64-bit        19-8
Deep Shredder 13 x64        10-6
Booot 6.2_x64               12-10
Fritz 15                    12-13
Gull 3 x64                  5-12
Hakkapeliitta TCEC          4-14
Fire 6.1 x64 popcnt         10-21
Texel 1.07                  7-22
Hannibal 1.7 x64            5-21


2/ Round-Robin of the top 3 engines from the previous ranking, 4000 games per engine:

Engine                      Wins-Losses
Houdini 6.02 Pro x64-pext   32-23
Brainfish 021017 64 BMI2    29-27
Komodo 11.2.2 64-bit        26-37


3/ Final of the top 2, 6000 games

Engine                      Wins-Losses
Houdini 6.02 Pro x64-pext   40-23
Brainfish 021017 64 BMI2    23-40

It seems to me that, since not many long-term eval features can become important within 10 moves, Houdini outsearches Stockfish, and both Houdini and Stockfish outsearch Komodo. Also, as all three are close in strength at LTC, Komodo must compensate for this with a better eval, accumulating smaller gains per move over longer stretches of the game.
cdani
Posts: 2204
Joined: Sat Jan 18, 2014 10:24 am
Location: Andorra

Re: Blunder rates of top engines in the midgame

Post by cdani »

I will try to fix this Andscacs bug. This is the first I've heard of it. Thanks!
Uri Blass
Posts: 10282
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Blunder rates of top engines in the opening

Post by Uri Blass »

I do not like the definition of blunder.

1) It is based on the internal evaluation of the engine.

400 cp does not mean the same for 2 different engines, and engines with smaller evaluations may get smaller scores in this test, and as a result relatively fewer blunders and results closer to 0-0.

2) I believe that reaching a score of +200 cp, which is supposed to be 2 pawns, already suggests a serious blunder.
JJJ
Posts: 1346
Joined: Sat Apr 19, 2014 1:47 pm

Post by JJJ »

And how many blunders do they make at 0.5 sec per move? Then 1 sec per move? Then 3 sec per move? Probably close to none at some point.

It would be interesting to know the rate at 1 sec per move, because some analyses are made at 1 sec per move.
Laskos

Post by Laskos »

Uri Blass wrote: I do not like the definition of blunder.

1) It is based on the internal evaluation of the engine.

400 cp does not mean the same for 2 different engines, and engines with smaller evaluations may get smaller scores in this test, and as a result relatively fewer blunders and results closer to 0-0.
Yes, engines with smaller evaluations will get results closer to 0-0 (both engines have to agree on 400cp or above). It is an effect one has to consider when comparing multiple engines, as in the large RR. But the effect is mild: I estimate it at no more than a factor of 1.3 on (Wins - Losses) for the engines tested here. One can also use Wins/Losses, which is hardly affected by the differing evaluation scales of the engines.

In head-to-head matches, however, the sign of Wins - Losses is not affected.
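The point about evaluation scale can be illustrated with a toy model. Assume, purely for illustration and not as a measurement, that a smaller evaluation scale makes the 400cp threshold fire less often in both directions, so that an engine's wins and losses are multiplied by the same retention factor r between 0 and 1 (closest to the head-to-head case, where the same two engines gate every decisive game). Then Wins - Losses shrinks by r, while Wins/Losses and the sign of Wins - Losses stay unchanged. Exact fractions avoid floating-point noise; the function name is hypothetical.

```python
from fractions import Fraction
from typing import Optional, Tuple

def scaled_metrics(wins: int, losses: int,
                   retention: Fraction) -> Tuple[Fraction, Optional[Fraction]]:
    """Toy model: scale wins and losses by the same factor (hypothetical).

    Returns (Wins - Losses, Wins / Losses); the ratio is None when there
    are no losses.
    """
    w = wins * retention
    l = losses * retention
    net = w - l
    ratio = w / l if l else None
    return net, ratio

# Komodo's 14-3 from the opening round-robin, used only as example numbers:
# unscaled, and with r = 1/1.3 (the upper bound of the effect estimated above).
print(scaled_metrics(14, 3, Fraction(1)))       # (Fraction(11, 1), Fraction(14, 3))
print(scaled_metrics(14, 3, Fraction(10, 13)))  # (Fraction(110, 13), Fraction(14, 3))
```

In this model, Wins - Losses drops from 11 to about 8.5, while Wins/Losses stays at 14/3. In a multi-engine round-robin the factor would differ from opponent to opponent, which is why the raw Wins - Losses ranking there is the more exposed metric.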
Uri Blass wrote: 2) I believe that reaching a score of +200 cp, which is supposed to be 2 pawns, already suggests a serious blunder.
I just used "blunder" in the thread title for brevity. It might be that even within 10-15 moves one engine outplays the other by 30cp each move, or it might happen that one engine loses the game in a single move.
Last edited by Laskos on Tue Oct 17, 2017 12:03 pm, edited 1 time in total.
Laskos

Post by Laskos »

JJJ wrote: And how many blunders do they make at 0.5 sec per move? Then 1 sec per move? Then 3 sec per move? Probably close to none at some point.

It would be interesting to know the rate at 1 sec per move, because some analyses are made at 1 sec per move.
Sure, you are right. But the results here can still be useful, as short time control matches can help infer what happens at LTC, even if the level of play differs by hundreds of Elo points.