Q for Dann and Swami

opraus · Sat Oct 23, 2010 10:50 am

Thank you again, Dann.

Xpdnt is scoring somewhere around 77% overall on STS at 10 s/pos.
It has particular trouble with the pawn advances.

Antonio Torrecillas · Sat Oct 23, 2010 12:00 pm

I don't tune with this test suite, but I use it to spot bugs and regressions.

Last time I checked...
at 0.05 seconds -> 5015/6690
at 0.1 seconds -> 5204/6690
at 1 second -> 5320/6690

There is still room to improve!

As a reference level for comparison: Simplex 096 is rated 2372 Elo on CCRL.
Let me know how yours performs.

Mark · Sat Oct 23, 2010 12:48 pm

Antonio Torrecillas wrote:
An opposite approach may be better for this purpose. Rather than challenging the engine with difficult test positions, use a simple test suite that tests key elements of the engine.

I once constructed a test suite as follows:
I walked the file twic663.pgn to collect every position reached.
From this set I discarded each position where Fruit does not give the same answer at depth=3 and depth=6.
Then I ran the test at 10 seconds with Fruit, Crafty, Rybka 1.0, Spike 1.2 and other engines I do not remember now.
The final set of 6690 positions where these engines agree was very useful for detecting errors in quiescence search, misevaluation, etc., at least for my weak engine.

This approach, with a shorter time control, should do the job for you.

If anybody is interested, you can find twic663.epd at http://sites.google.com/site/barajandotrebejos/

Regards, Antonio.

Mark wrote:
Thanks for the link. Is there any easy way to convert the format of the bm to algebraic notation? (like Kc4 instead of Kd3-c4)

Dann Corbit wrote:
I made a translation of it. Here it is:
http://cap.connx.com/chess-engines/new- ... 3a.epd.bz2

Thanks very much, Dann! I can run this test suite automatically in my program now.

Dann Corbit · Sat Oct 23, 2010 2:07 pm

I have over 98% of the test suite analyzed. Here is a version with all the positions that I have analyzed in it:
http://cap.connx.com/chess-engines/new- ... na.epd.bz2

There are 6609 out of 6690 positions analyzed. If your program picks a different best move, and it is not the suggested best move in the original, and yet it is the best move in the analyzed set, it should be considered as a good choice, since the analysis is usually good enough to form a decent opinion.
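
The acceptance rule above could be sketched roughly as follows. This is only a minimal illustration: the function name `is_good_choice` and the idea of passing the two best-move lists as plain Python lists are assumptions, not part of either EPD file.

```python
def is_good_choice(engine_move, original_bm, analyzed_bm):
    """Accept the engine's move if it matches a best move from either source.

    original_bm: best moves from the original suite (list of SAN strings)
    analyzed_bm: best moves from the analyzed set (list of SAN strings)
    Both parameter names are hypothetical, chosen for this sketch.
    """
    accepted = set(original_bm) | set(analyzed_bm)
    return engine_move in accepted


# The engine disagrees with the original bm but agrees with the analysis.
print(is_good_choice("d4", ["g3"], ["d4"]))
```

A stricter runner could report the two cases separately, so that moves accepted only through the analyzed set can be reviewed by hand.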

Dann Corbit · Sat Oct 23, 2010 2:13 pm

There are also 286 positions in that set that have enhanced pm information, such as the following:

[d]rnbqk2r/ppppppbp/5np1/8/2P5/2N2N2/PP1PPPPP/R1BQKB1R w KQkq - ce 56; acd 29; pv d4 d5 cxd5 Nxd5 Qb3 Nxc3 bxc3 O-O e3 Nc6 Bd3 Na5 Qc2 Bg4 Ba3 Re8 Nd2 e5 O-O b6 Rae1 Be6 f4 exd4 exd4 Bd5 f5 Qh4 Be4 Bxe4 Rxe4 Rxe4 Qxe4 Qxe4 Nxe4; pm g3 {1398} e4 {703} d4 {54} e3 {17} d3 {9} b3 {2} h3 {1} ; bm d4; id "ATG_Reg_18-126";

For this position, in a collection of high-quality games, 1398 players/engines chose g3 (so it is probably an excellent choice), 703 chose e4, and 54 chose d4. Although few participants chose d4, it is probably also a pretty good move, because it has been verified with a 29-ply search by a very strong chess engine.
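
For anyone who wants to read these fields programmatically, here is a rough Python sketch of splitting such a record into its operations and turning the pm counts into a dictionary. The function names `parse_epd` and `move_counts` are made up for this example, and the parser assumes that no quoted string contains a semicolon, which holds for the record above but not for EPD in general.

```python
import re

# The record from this post, with the long pv operation left out for brevity.
SAMPLE = ('[d]rnbqk2r/ppppppbp/5np1/8/2P5/2N2N2/PP1PPPPP/R1BQKB1R w KQkq - '
          'ce 56; acd 29; pm g3 {1398} e4 {703} d4 {54} e3 {17} d3 {9} '
          'b3 {2} h3 {1} ; bm d4; id "ATG_Reg_18-126";')


def parse_epd(line):
    """Return (position, operations) for one EPD record."""
    line = line.strip()
    if line.startswith("[d]"):          # forum diagram tag, not part of EPD
        line = line[3:]
    fields = line.split(None, 4)        # board, side, castling, en passant, rest
    position = " ".join(fields[:4])
    ops = {}
    rest = fields[4] if len(fields) > 4 else ""
    for op in rest.split(";"):          # assumes no ';' inside quoted strings
        op = op.strip()
        if op:
            opcode, _, operand = op.partition(" ")
            ops[opcode] = operand.strip()
    return position, ops


def move_counts(pm_operand):
    """Turn 'g3 {1398} e4 {703}' into {'g3': 1398, 'e4': 703}."""
    pairs = re.findall(r"(\S+)\s*\{(\d+)\}", pm_operand)
    return {move: int(count) for move, count in pairs}


position, ops = parse_epd(SAMPLE)
counts = move_counts(ops["pm"])
print(ops["bm"], counts[ops["bm"]], sum(counts.values()))
```

With the counts in a dictionary it is easy to flag positions like this one, where the verified bm is not the most frequently played move.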

Mark · Sun Oct 24, 2010 3:38 pm

Still trying to figure things out as far as testing goes, but I think the test suite can be shortened quite a bit.

Some background on my engine, which is still a work in progress. I'm not a programmer by any stretch of the imagination, so progress is slow. It's a basic recursive alpha-beta with null move and no hash tables (I haven't figured them out yet); quiescence currently just extends one ply if the last move was a capture or promotion, and the only eval so far is material plus piece-square tables. I just finished computing Perft 9 = 2439530234167, so I'm pretty happy about that.

With this very minimal setup, at 1 second/move I get 3768/6690. With material-only eval, I get 3312/6690. I'm thinking that you could just remove the 3300 or so positions that are easily solved with material-only eval. Would there be any reason to keep these positions in the test suite?
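
To make the idea concrete, a shortened suite could be produced along these lines. This is just a sketch: it assumes every record carries an `id "..."` operation, and the names `epd_id` and `drop_easy`, as well as the ids in the usage lines, are invented for illustration.

```python
import re

ID_RE = re.compile(r'\bid\s+"([^"]*)"')


def epd_id(line):
    """Return the id string of an EPD record, or None if it has no id."""
    match = ID_RE.search(line)
    return match.group(1) if match else None


def drop_easy(epd_lines, easy_ids):
    """Keep only the records whose id is not listed in easy_ids."""
    easy = set(easy_ids)
    return [line for line in epd_lines if epd_id(line) not in easy]


# Hypothetical two-record suite; "pos-1" was solved by material-only eval.
suite = [
    'rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - bm e4; id "pos-1";',
    '8/8/8/4k3/8/8/4P3/4K3 w - - bm Kd2; id "pos-2";',
]
print(len(drop_easy(suite, ["pos-1"])))
```

Records without an id are kept, since there is no way to tell whether they were in the solved list.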

Antonio Torrecillas · Sun Oct 24, 2010 4:50 pm

This test suite was built in September 2007. The main idea was to have a group of properly labelled positions where, in the worst case, I would not have to debug the search beyond ply 3-4. It is probably true that it needs a refresh, cleanup and improvement. (It seems that Dann is already working on this.)

It is true that some positions can be solved with material alone; I would even say that some can be solved without any evaluation. If you order moves by SEE, you can resolve basic cases of recaptures/quiescence.

My engine uses a game phase from 0 to 32, and I split the test by phase so that I could see how a change affects the results in each stage. For this type of work, redundancy is useful.

Try to think of this as a stock/database of _easy_to_debug_ positions, correctly labelled (I hope). You can always filter by some criterion to match your needs.
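
As a rough illustration of splitting results by phase, something like the following Python would do. Note the assumption: the post does not say how the 0-32 phase is computed, so the sketch uses the number of pieces on the board as a stand-in. `piece_count` and `score_by_phase` are hypothetical names, and any other phase function can be passed in.

```python
from collections import defaultdict


def piece_count(fen):
    """Assumed phase measure: number of pieces on the board (at most 32)."""
    board = fen.split()[0]
    return sum(ch.isalpha() for ch in board)


def score_by_phase(results, phase_of=piece_count):
    """results: iterable of (fen, solved) pairs.

    Returns {phase: (solved_count, total_count)}, ordered by phase.
    """
    table = defaultdict(lambda: [0, 0])
    for fen, solved in results:
        entry = table[phase_of(fen)]
        entry[0] += bool(solved)
        entry[1] += 1
    return {phase: tuple(entry) for phase, entry in sorted(table.items())}


# Hypothetical results: two opening positions and one pawn ending.
results = [
    ("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1", True),
    ("rnbqk2r/ppppppbp/5np1/8/2P5/2N2N2/PP1PPPPP/R1BQKB1R w KQkq - 0 1", False),
    ("8/8/8/4k3/8/8/4P3/4K3 w - - 0 1", True),
]
for phase, (solved, total) in score_by_phase(results).items():
    print(phase, solved, total)
```

Comparing these per-phase tables before and after a change shows which stage of the game the change helped or hurt.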

Mark · Sun Oct 24, 2010 8:19 pm

Yes, thanks. It is a useful set of positions. Especially since I need a rather easy set!

swami · Wed Oct 27, 2010 2:06 am

opraus wrote:
Dann Corbit wrote: So which inferior answer do we choose?
The better ones that Rybka, Zappa, and Stockfish come up with, rather than the really stupid one that Xpdnt comes up with.

Thank you (and Swami), BTW for STS. They are very helpful indeed.

Sorry for the late reply, David. I didn't see this thread until later, as I was busy the past few weeks. I see that Dann has already answered this.

Always good to hear that you enjoy these test positions. Hope it helps in tuning and optimizing.

Keep up the good work!

Best wishes,
Swami
