Bishop Versus Knight EPD test suite

jdart · Post by **jdart** » Wed Aug 12, 2009 5:46 pm

When you say "reliable" what do you mean? Is the "best" move significantly higher in score than other moves, and if so what score difference do you consider significant?

--Jon

plattyaj · Post by **plattyaj** » Wed Aug 12, 2009 6:22 pm

Schola 1.1.0 gets 59/100 @ 10 seconds (32-bit version on my "slow" 2 GHz T7300 laptop). Given its very basic evaluation, that's probably a reasonable benchmark for a "starting" value for this test though.

Schola treats a Knight and a Bishop the same for the material score but does hold a bonus for retaining the bishop pair. It would be interesting to see if the ones it got right were biased towards the bishop pair or not.

Thanks Dann & Swami!

Andy.

Dann Corbit · Post by **Dann Corbit** » Wed Aug 12, 2009 7:59 pm

jdart wrote:When you say "reliable" what do you mean? Is the "best" move significantly higher in score than other moves, and if so what score difference do you consider significant?

--Jon

We usually consider 200 or so positions that initially seem plausible. For instance, the move may have been played by Rybka at slow time control and Rybka won the game. So we analyze all of the candidate positions for at least one hour each using Rybka on a 4 CPU machine. We also analyze with several other top engines. Then we analyze the move with Multi-PV set to 4. Often, we get alternative moves that are nearly as good or better and so we reject the position. Also, we frequently see busts when we extend the time to one hour. The ultimate best move has to match our initial best move, because these were carefully selected by Swaminathan for the theme. I store all of the positions in a SQL*Server database. All of the alternative evaluations are also stored.

So I have a stored procedure that examines the scores and the depths for all of the candidate moves. This procedure then creates a ratio for the quality of the alternatives. This is what you see in the comment c0 in the EPD records. In this particular test set, the best alternative for any record is only 80% as good as the best move. This may seem a minuscule difference, but we are talking about positional test suites, and such distinctive moves are actually quite difficult to produce.
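The stored procedure itself is not shown in this thread, so the following is only a minimal Python sketch of the kind of ratio described above. It assumes each candidate move has a single positive centipawn score from the side to move's point of view, and it ignores search depth, which the real procedure also examines. The function names, the 0.80 threshold parameter, and the sample numbers in the usage note are hypothetical.

```python
def best_alternative_ratio(best_score, alternative_scores):
    """Return the strongest alternative's score as a fraction of the key move's score.

    Assumption (not from the thread): scores are centipawns from the side to
    move's point of view, and the key move's score is positive.
    """
    if best_score <= 0:
        raise ValueError("this simple ratio needs a positive score for the key move")
    if not alternative_scores:
        return 0.0  # no alternatives at all: the key move is trivially distinctive
    return max(alternative_scores) / best_score


def is_distinctive(best_score, alternative_scores, threshold=0.80):
    """True if no alternative is better than `threshold` times the key move."""
    return best_alternative_ratio(best_score, alternative_scores) <= threshold
```

For example, a key move scored at 50 centipawns with alternatives at 40 and 10 gives a ratio of 0.8 and would be kept under this rule, while an alternative at 45 would cause the position to be rejected.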

In all, I guess that each test set requires about 1000 CPU hours to finalize.

So don't expect a new one tomorrow.

El Gringo · Post by **El Gringo** » Wed Aug 12, 2009 8:07 pm

Hi,

Let's see how the big guys perform:

Q6600 @ 3 GHz, 10 sec/move

Rybka 3 x64 2CPU: 93/100
Zappa Mexico II x64 2CPU: 85/100
Naum 4 x64 2CPU: 82/100
Deep Shredder 11 x64 2CPU: 79/100

Best
Johan

jdart · Post by **jdart** » Wed Aug 12, 2009 11:44 pm

I appreciate all the effort you've put into this but I really don't place much stock in positional tests. Any good chess program has over 100 and many have quite a lot more positional factors that go into a score. If you have a relatively small difference between best and next best move then it's easy to get shifts off the best move by changing one or a few weighting terms. It doesn't mean you'll play worse overall.

Of course some of these tests might lead to a stronger advantage - for example a winning endgame. That's more clear cut and less sensitive to small changes in the eval function.

Jan Brouwer · Post by **Jan Brouwer** » Wed Aug 12, 2009 11:49 pm

After hunting around for some time through the menus of Arena, I found "Automatic Analysis...":

Rotor 0.5: 69/100 (10 s/move - Celeron-M 1.3 GHz)

This seems only to count the first move, not the alternative moves.
Is there a possibility with Arena to count the alternative solutions as well?
And add the move scores (..=10, etc.) in the epd file?

Dann Corbit · Post by **Dann Corbit** » Thu Aug 13, 2009 12:28 am

jdart wrote:I appreciate all the effort you've put into this but I really don't place much stock in positional tests. Any good chess program has over 100 and many have quite a lot more positional factors that go into a score. If you have a relatively small difference between best and next best move then it's easy to get shifts off the best move by changing one or a few weighting terms. It doesn't mean you'll play worse overall.

Of course some of these tests might lead to a stronger advantage - for example a winning endgame. That's more clear cut and less sensitive to small changes in the eval function.

I do not know how much value they will provide for a chess engine.
The tests may be useful for engines, for humans, for neither or for both.

I guess that they will be most useful for people who have an evaluation that does not take these factors into account at all. In such cases, adding an evaluation term that considers the factors will make the engine better at solving these test suites. Whether or not it will make it stronger in game play is another matter.

They are also interesting for humans. I suspect that when they are all completed, they could be folded into one giant test set (so that the user does not know what sort of solution they are searching for -- which is a titanic hint) and then we can solve them by hand the same way that we solve other test sets.

Dann Corbit · Post by **Dann Corbit** » Thu Aug 13, 2009 12:30 am

Jan Brouwer wrote:After hunting around for some time through the menus of Arena, I found "Automatic Analysis...":

Rotor 0.5: 69/100 (10 s/move - Celeron-M 1.3 GHz)

This seems only to count the first move, not the alternative moves.
Is there a possibility with Arena to count the alternative solutions as well?
And add the move scores (..=10, etc.) in the epd file?

At some point, I plan to modify Bruce Moreland's EPD analyzer program to handle it automatically. The program GradualTest can use the EPD files if you reformat them a bit with a SED script or what have you.

Someone did that for us with some of the earlier tests.
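Until a tool handles this automatically, partial credit for alternative moves can be counted with a short script. The sketch below is in Python and assumes the c0 comment has the form `c0 "move=points, move=points";`, which is inferred from the "(..=10, etc.)" description in this thread rather than from a format specification. The sample EPD record is made up for illustration and is not taken from the test suite; the function names are hypothetical.

```python
import re

# Assumed layout of the comment: c0 "Nd7=10, Bc5=8, Bc6=2";
C0_PATTERN = re.compile(r'c0\s+"([^"]*)"')


def parse_c0_points(epd_line):
    """Return {move: points} from the c0 comment, or {} if there is none."""
    match = C0_PATTERN.search(epd_line)
    if match is None:
        return {}
    points = {}
    for item in match.group(1).split(","):
        # Split on the LAST '=' so promotions such as "e8=Q=10" still parse.
        move, sep, value = item.strip().rpartition("=")
        if sep and value.strip().isdigit():
            points[move.strip()] = int(value.strip())
    return points


def score_move(epd_line, engine_move):
    """Points earned by the engine's move; 0 if the move is not listed."""
    return parse_c0_points(epd_line).get(engine_move, 0)


def score_suite(epd_lines, engine_moves):
    """Total points over a suite, given one engine move per EPD record."""
    return sum(score_move(line, move) for line, move in zip(epd_lines, engine_moves))


# Hypothetical record, for illustration only.
SAMPLE_EPD = (
    'rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - bm e4; '
    'id "example.001"; c0 "e4=10, d4=8, Nf3=6";'
)
```

With the sample record, an engine that plays d4 earns 8 points instead of 0, which is the difference between this kind of scoring and a plain "matching moves" count. The move strings must be in the same notation as the c0 comment (SAN here) for the lookup to succeed.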

michiguel · Post by **michiguel** » Thu Aug 13, 2009 4:39 am

michiguel wrote:
michiguel wrote:
Dann Corbit wrote:Material imbalance is certainly one of the important new ideas in chess programming.

Sometimes a knight is better and sometimes a bishop, though they have approximately the same point value in general.

When should you trade a knight for a bishop (and vice versa)?
What about the aftermath of the trade (e.g. with a pawn recapture, has the altered pawn structure changed the value of the trade)?

There is a new test suite in the "Strategic Test Suite" arsenal, the fifth of the series by Swaminathan. These positions have had the stuffings pounded out of them by powerful chess engines to ensure that the answers provided are reliable. Besides the key move, there is also a list of hints in the ce that shows the relative value of alternate move choices.

Want to give it a whirl? Take a look here, and help yourself! :
http://sites.google.com/site/strategict ... -vs-knight

Gaviota, "Flock version" 0.68.1
AMD 2x 2.4 GHz (2 cores used), 16 MB hash tables.

10 seconds/test

average nps: 572564
solved: 65
wrong : 35
ratio : 65.00%

Miguel

1 minute/position

average nps: 602136
solved: 74
wrong : 26
ratio : 74.00%

Miguel

7 min/position

81 solved.

Sounds like some positions are solved quickly, ~20 may require some time, and the rest you either get or you don't get at all. I just put it on my never-ending to-do list to study those positions and see if I can spot a pattern. It may be interesting.

Miguel

Edsel Apostol · Post by **Edsel Apostol** » Thu Aug 13, 2009 5:18 am

The latest experimental Twisted Logic version just scored 65/100 at 5 sec/position on a single core of a Q8200 with 128 MB of hash. For comparison, Stockfish 1.4 single core scored 77/100.

Note that I have used Arena and it may only consider the best move and not the alternative ones.

Note for Swami, this is my testimonial:

The test suites are very interesting. My opinion is that these tests are good for tuning one's engine. I noticed that there is a positive correlation between a higher score and engine strength.

Here are more test results from Twisted Logic and Stockfish:

--------------------------------------------------------------------------------

Analysis from D:\chess\tests\arena201\Tournaments\Epd\STS_Undermining.epd
Analyzing engine: T20090812
8/12/2009 9:38:56 PM Level: 5 Seconds
76 of 100 matching moves

--------------------------------------------------------------------------------

Analysis from D:\chess\tests\arena201\Tournaments\Epd\STS_OpenFilesDiags.epd
Analyzing engine: T20090812
8/12/2009 9:46:18 PM Level: 5 Seconds
69 of 100 matching moves

--------------------------------------------------------------------------------

Analysis from D:\chess\tests\arena201\Tournaments\Epd\STS_KnightOutposts.epd
Analyzing engine: T20090812
8/12/2009 9:53:26 PM Level: 5 Seconds
73 of 100 matching moves

--------------------------------------------------------------------------------

Analysis from D:\chess\tests\arena201\Tournaments\Epd\STS_SquareVacancy.epd
Analyzing engine: T20090812
8/12/2009 10:00:41 PM Level: 5 Seconds
68 of 100 matching moves

--------------------------------------------------------------------------------

Analysis from D:\chess\tests\arena201\Tournaments\Epd\STS_KnightvsBishop.epd
Analyzing engine: T20090812
8/12/2009 10:08:00 PM Level: 5 Seconds
65 of 100 matching moves

--------------------------------------------------------------------------------

Analysis from D:\chess\tests\arena201\Tournaments\Epd\STS_Undermining.epd
Analyzing engine: Stockfish_14_ja
8/12/2009 10:23:05 PM Level: 5 Seconds
83 of 100 matching moves

--------------------------------------------------------------------------------

Analysis from D:\chess\tests\arena201\Tournaments\Epd\STS_OpenFilesDiags.epd
Analyzing engine: Stockfish_14_ja
8/12/2009 10:28:38 PM Level: 5 Seconds
77 of 100 matching moves

--------------------------------------------------------------------------------

Analysis from D:\chess\tests\arena201\Tournaments\Epd\STS_KnightOutposts.epd
Analyzing engine: Stockfish_14_ja
8/12/2009 10:34:15 PM Level: 5 Seconds
76 of 100 matching moves

--------------------------------------------------------------------------------

Analysis from D:\chess\tests\arena201\Tournaments\Epd\STS_SquareVacancy.epd
Analyzing engine: Stockfish_14_ja
8/12/2009 10:39:56 PM Level: 5 Seconds
81 of 100 matching moves

--------------------------------------------------------------------------------

Analysis from D:\chess\tests\arena201\Tournaments\Epd\STS_KnightvsBishop.epd
Analyzing engine: Stockfish_14_ja
8/12/2009 10:45:50 PM Level: 5 Seconds
77 of 100 matching moves

I have set "abort analysis" to true and the minimum ply depth to 9, so the engine must show the best move for at least three plies before the search is aborted.
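For anyone who wants to tabulate Arena reports like the ones in this post, here is a minimal Python sketch. It assumes the report layout seen here ("Analysis from ...", "Analyzing engine: ...", "N of M matching moves") and decodes HTML entities such as `&#58;` in case the text was copied from a web page. The function name is hypothetical, and the sample text reuses two of the results from this post.

```python
import html
import re

SUITE_RE = re.compile(r"Analysis from .*[\\/]([^\\/]+)\.epd")
ENGINE_RE = re.compile(r"Analyzing engine:\s*(\S+)")
RESULT_RE = re.compile(r"(\d+) of (\d+) matching moves")


def summarize_arena_log(text):
    """Return {engine: {suite: (solved, total)}} from an Arena analysis report."""
    results = {}
    suite = engine = None
    for raw_line in text.splitlines():
        line = html.unescape(raw_line).strip()  # turns "&#58;" back into ":"
        suite_match = SUITE_RE.search(line)
        engine_match = ENGINE_RE.search(line)
        result_match = RESULT_RE.search(line)
        if suite_match:
            suite = suite_match.group(1)
        elif engine_match:
            engine = engine_match.group(1)
        elif result_match and suite and engine:
            solved, total = int(result_match.group(1)), int(result_match.group(2))
            results.setdefault(engine, {})[suite] = (solved, total)
    return results


SAMPLE_LOG = r"""
Analysis from D&#58;\chess\tests\arena201\Tournaments\Epd\STS_KnightvsBishop.epd
Analyzing engine&#58; T20090812
8/12/2009 10&#58;08&#58;00 PM Level&#58; 5 Seconds
65 of 100 matching moves

Analysis from D:\chess\tests\arena201\Tournaments\Epd\STS_KnightvsBishop.epd
Analyzing engine: Stockfish_14_ja
8/12/2009 10:45:50 PM Level: 5 Seconds
77 of 100 matching moves
"""

if __name__ == "__main__":
    for engine_name, suites in summarize_arena_log(SAMPLE_LOG).items():
        for suite_name, (solved, total) in suites.items():
            print(f"{engine_name:16} {suite_name:20} {solved}/{total}")
```

The nested dictionary makes it easy to compare engines suite by suite, or to sum the solved counts per engine across all five suites.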
