In response to Uri's thread about positional understanding


Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

In response to Uri's thread about positional understanding

Post by Laskos »

In this thread:
http://talkchess.com/forum/viewtopic.php?t=66530
Uri showed some Stockfish results in the opening/early middlegame, but only for one position.

In the past, I tried to build a positional test suite for openings, with no tactical shots. I used human and engine databases (CB MegaBase 2017, CB Live Book, the online Chess Tempo database), the Noomen.ctg opening book, and analysis from several engines.
My latest version, Openings200beta07.epd, is listed here:
http://talkchess.com/forum/viewtopic.ph ... t&start=49

Although Dann Corbit showed that dozens of the positions are not solved even by corroborated engines at long time control, I am fairly satisfied with this positional opening suite of 200 positions. I estimate that, of the 200, about 15 have wrong solutions and about 15 are too hard for any current engine at any time control.

I tested in Polyglot with these settings:

Code: Select all

-min-depth 3 -max-depth 99 -min-time 0.1 -max-time 5 -depth-delta 2
So the maximum allowed time is 5 s per position. The engines run on 4 threads with 512 MB hash, on a modern 3.8 GHz i7.
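
Roughly, epd-test counts a position as solved once the engine's best move has matched the EPD's "bm" move for depth-delta consecutive iterations, within the depth and time bounds above. Here is a minimal Python sketch of that acceptance rule as I understand it (an assumption on my part, not PolyGlot's actual code):

Code: Select all

def solved(iterations, bm, min_depth=3, max_depth=99,
           min_time=0.1, max_time=5.0, depth_delta=2):
    # iterations: (depth, elapsed_seconds, best_move) tuples from the
    # engine's iterative deepening, in order
    streak = 0
    for depth, elapsed, best in iterations:
        if depth > max_depth or elapsed > max_time:
            break  # bounds exceeded, position counts as failed
        streak = streak + 1 if best == bm else 0
        # accept only past the minimum depth/time, with a stable best move
        if depth >= min_depth and elapsed >= min_time and streak >= depth_delta:
            return True
    return False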
I used a 1000-position suite (these 200 openings repeated 5 times) to dampen statistical noise. By jackknifing, I get that one standard deviation of the difference in results is about 9-10 points out of 1000.
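
For the curious, the jackknife estimate itself takes only a few lines. A minimal sketch, assuming per-position 0/1 results for two engines over the same 1000 positions (the function name is mine):

Code: Select all

import math

def jackknife_sd(a, b):
    # a, b: lists of per-position results, 1 = solved, 0 = failed
    n = len(a)
    d = [x - y for x, y in zip(a, b)]   # per-position score differences
    total = sum(d)
    loo = [(total - di) / (n - 1) for di in d]   # leave-one-out means
    m = sum(loo) / n
    var_mean = (n - 1) / n * sum((x - m) ** 2 for x in loo)
    return n * math.sqrt(var_mean)   # scale SD of the mean to points out of n

Note that the two engines' results are correlated (both solve the same easy positions and fail the same hard ones), which is what brings the SD down to 9-10 points instead of the roughly 20 that an independence assumption would give at these solve rates.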

The results:

Code: Select all

Komodo 11.2.2
score=666/1000 [averages on correct positions: depth=13.2 time=0.71 nodes=3853146]

Houdini 6.03
score=656/1000 [averages on correct positions: depth=14.3 time=0.87 nodes=6708492]

Stockfish 9
score=641/1000 [averages on correct positions: depth=14.0 time=0.74 nodes=4793389]

Andscacs 0.93
score=598/1000 [averages on correct positions: depth=12.2 time=0.69 nodes=3202933]

Shredder 13
score=573/1000 [averages on correct positions: depth=14.3 time=0.79 nodes=4678511]
Although the battle for first place is tight, Komodo 11.2.2 almost certainly performs better here than Stockfish 9. Also, Andscacs 0.93 performs significantly better than Shredder 13.
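
To put a number on "almost certainly": the Komodo vs Stockfish 9 gap is 25 points against a standard deviation of about 9-10, i.e. roughly 2.6 sigma. A quick back-of-the-envelope check under a normal approximation (9.5 is just the midpoint of my 9-10 estimate):

Code: Select all

import math

diff = 666 - 641   # Komodo 11.2.2 vs Stockfish 9, in points
sd = 9.5           # jackknifed SD of the difference
z = diff / sd
p = 0.5 * (1 - math.erf(z / math.sqrt(2)))   # one-sided p-value
print(f"z = {z:.2f}, one-sided p = {p:.4f}")   # z = 2.63, p = 0.0043

The Andscacs vs Shredder gap is also 25 points, hence similarly significant.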
Werewolf
Posts: 1796
Joined: Thu Sep 18, 2008 10:24 pm

Re: In response to Uri's thread about positional understanding

Post by Werewolf »

What was the score of Stockfish 8?

I'd love to see results from others, such as Giraffe, HIARCS, Junior, etc., but SF8 would be very interesting.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: In response to Uri's thread about positional understanding

Post by Laskos »

Werewolf wrote:What was the score of Stockfish 8?

I'd love to see results from others, such as Giraffe, HIARCS, Junior, etc., but SF8 would be very interesting.
Here it is with Stockfish 8 included. I might test some other engines, if they comply with Polyglot's commands.

Code: Select all

Komodo 11.2.2 
score=666/1000 [averages on correct positions: depth=13.2 time=0.71 nodes=3853146] 

Houdini 6.03 
score=656/1000 [averages on correct positions: depth=14.3 time=0.87 nodes=6708492] 

Stockfish 9 
score=641/1000 [averages on correct positions: depth=14.0 time=0.74 nodes=4793389] 

Stockfish 8
score=628/1000 [averages on correct positions: depth=13.9 time=0.80 nodes=4802227]

Andscacs 0.93 
score=598/1000 [averages on correct positions: depth=12.2 time=0.69 nodes=3202933] 

Shredder 13 
score=573/1000 [averages on correct positions: depth=14.3 time=0.79 nodes=4678511]
Dicaste
Posts: 142
Joined: Mon Apr 16, 2012 7:23 pm
Location: Istanbul, TURKEY

Re: In response to Uri's thread about positional understanding

Post by Dicaste »

RomiChess would be cool too.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: In response to Uri's thread about positional understanding

Post by Laskos »

Dicaste wrote:RomiChess would be cool too.
I got what seem to be the latest RomiChess and Giraffe. I am not sure they follow the Polyglot commands to the letter, and they run on only one thread, so their results are deflated relative to the other engines using 4 threads. I also included the latest Texel.

Code: Select all

Komodo 11.2.2 
score=666/1000 [averages on correct positions: depth=13.2 time=0.71 nodes=3853146] 

Houdini 6.03 
score=656/1000 [averages on correct positions: depth=14.3 time=0.87 nodes=6708492] 

Stockfish 9 
score=641/1000 [averages on correct positions: depth=14.0 time=0.74 nodes=4793389] 

Stockfish 8 
score=628/1000 [averages on correct positions: depth=13.9 time=0.80 nodes=4802227] 

Andscacs 0.93 
score=598/1000 [averages on correct positions: depth=12.2 time=0.69 nodes=3202933] 

Shredder 13 
score=573/1000 [averages on correct positions: depth=14.3 time=0.79 nodes=4678511]

Texel 1.08a8
score=489/1000 [averages on correct positions: depth=10.3 time=0.53 nodes=3053861]

Giraffe
score=410/1000 [averages on correct positions: depth=10.0 time=0.68 nodes=167994]

RomiChessP3n default
score=392/1000 [averages on correct positions: depth=11.7 time=0.88 nodes=4934412]
matejst
Posts: 364
Joined: Mon May 14, 2007 8:20 pm
Full name: Boban Stanojević

Re: In response to Uri's thread about positional understanding

Post by matejst »

Thanks, Kai. Very interesting testing, as always.

Could you say which version of Giraffe you used?

Also, there are a few engines that I believe play a good positional brand of chess; if you could test engines like Wasp and iCE, I would be very grateful.

Finally, did you test the engines' understanding of endgames? I have noticed a trend of removing endgame knowledge lately, and I would be very interested in your findings.
zenpawn
Posts: 349
Joined: Sat Aug 06, 2016 8:31 pm
Location: United States

Re: In response to Uri's thread about positional understanding

Post by zenpawn »

Did you end up taking any of Dann's alternate (better?) best moves into account? (The other thread ends after his findings.) If so, do you have an updated suite? Thanks.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: In response to Uri's thread about positional understanding

Post by Laskos »

zenpawn wrote:Did you end up taking any of Dann's alternate (better?) best moves into account? (The other thread ends after his findings.) If so, do you have an updated suite? Thanks.
I believe I have a beta08 suite somewhere that takes some inspiration from Dann's analysis, but I didn't like it. I believe 69 is far too high a number of wrong proposed solutions: each of the top 3 engines solves 160+ of the 200 positions at some 5 min/move (not all the same positions for each engine). Generally, I do not trust computer analysis too much in openings; for the suite I mostly used large databases of human and computer games, which gave a reasonable statistic of outcomes. That seems more reliable in this case, so I have kept beta07 as my reference for now.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: In response to Uri's thread about positional understanding

Post by Laskos »

matejst wrote:Thanks, Kai. Very interesting testing, as always.

Could you say which version of Giraffe you used?

Also, there are a few engines that I believe play a good positional brand of chess; if you could test engines like Wasp and iCE, I would be very grateful.

Finally, did you test the engines' understanding of endgames? I have noticed a trend of removing endgame knowledge lately, and I would be very interested in your findings.
Giraffe_161023_x64. Is this the latest one? I don't know of a newer one.

I will try to add some other results.

I had some endgame results in the past, but I believe I lost them. Some tests would be easy to perform, as I have the 6-men Syzygy tablebases on an SSD and many easy and hard suites of 5- and 6-men positions.
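
The scoring for such endgame suites would be straightforward. A minimal sketch using the python-chess Syzygy bindings, assuming the tables sit in ./syzygy and that a move counts as correct when it preserves the tablebase win/draw/loss value (both assumptions are mine, not how the past tests were actually run):

Code: Select all

import chess
import chess.syzygy

def move_is_optimal(fen, move_uci, tb_path="./syzygy"):
    # score a move as correct if it preserves the WDL value (5-/6-men)
    board = chess.Board(fen)
    with chess.syzygy.open_tablebase(tb_path) as tb:
        before = tb.probe_wdl(board)    # from the mover's point of view
        board.push(chess.Move.from_uci(move_uci))
        after = -tb.probe_wdl(board)    # flip back to the mover's view
    return after == before

A stricter variant could demand DTZ-optimal moves via probe_dtz instead.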
carldaman
Posts: 2283
Joined: Sat Jun 02, 2012 2:13 am

Re: In response to Uri's thread about positional understanding

Post by carldaman »

Laskos wrote:
matejst wrote:Thanks, Kai. Very interesting testing, as always.

Could you say which version of Giraffe you used?

Also, there are a few engines that I believe play a good positional brand of chess; if you could test engines like Wasp and iCE, I would be very grateful.

Finally, did you test the engines' understanding of endgames? I have noticed a trend of removing endgame knowledge lately, and I would be very interested in your findings.
Giraffe_161023_x64. Is this the latest one? I don't know of a newer one.

I will try to add some other results.

I had some endgame results in the past, but I believe I lost them. Some tests would be easy to perform, as I have the 6-men Syzygy tablebases on an SSD and many easy and hard suites of 5- and 6-men positions.
Giraffe_161023_x64, although the latest release, is known to be weak/buggy. The strongest Giraffe is the one from 20150908.

CL