Thank you again, Dann.
Xpdnt is scoring somewhere around 77% overall on STS @ 10s/pos.
It has particular trouble with the pawn advances.
Q for Dann and Swami
-
- Posts: 166
- Joined: Wed Mar 08, 2006 9:49 pm
- Location: S. New Jersey, USA
-
- Posts: 90
- Joined: Sun Nov 02, 2008 4:43 pm
- Location: Barcelona
Re: Q for Dann and Swami
I don't tune with this test suite, but I use it to spot bugs and regressions.
Last time I checked...
at 0.05 seconds -> 5015/6690
at 0.1 seconds -> 5204/6690
at 1 second -> 5320/6690
There is still room to improve!
As a reference level for comparison: Simplex 096 is at Elo 2372 in CCRL.
Let me know how yours performs.
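For comparing against suites of a different size, the counts above can be turned into percentages. This is only a small arithmetic sketch; the function name `solved_percent` is made up for illustration.

```python
# Scores reported above: (seconds per position, positions solved) out of 6690.
TOTAL = 6690
results = [(0.05, 5015), (0.1, 5204), (1.0, 5320)]

def solved_percent(solved, total=TOTAL):
    """Return the solved share as a percentage, rounded to one decimal."""
    return round(100.0 * solved / total, 1)

for seconds, solved in results:
    print(f"{seconds} s -> {solved}/{TOTAL} = {solved_percent(solved)}%")
```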
-
- Posts: 216
- Joined: Thu Mar 09, 2006 9:54 pm
Re: Q for Dann and Swami
Antonio Torrecillas wrote:
An opposite approach may be better for this purpose.
Rather than challenging the engine with difficult test positions,
use a simple test suite, which will test key elements of the engine.
I once constructed a test suite as follows:
I walked the file twic663.pgn to get all positions reached.
From this set I discarded each position where Fruit does not
give the same answer for depth=3 and depth=6.
Then I ran the test at 10" with Fruit, Crafty, Rybka 1.0,
Spike 1.2 and other engines I do not remember now.
The final set of 6690 positions where these engines agree
was very useful to detect errors in quiesce, misevaluation etc.,
at least for my weak engine.
This approach, but with a shorter time control, will do your business.
If somebody is interested, you can find twic663.epd at http://sites.google.com/site/barajandotrebejos/
regards, Antonio.

Mark wrote:
Thanks for the link. Is there any easy way to convert the format of the bm to algebraic notation? (like Kc4 instead of Kd3-c4)

Dann Corbit wrote:
I made a translation of it. Here it is:
http://cap.connx.com/chess-engines/new- ... 3a.epd.bz2

Thanks very much, Dann! I can run this test suite automatically in my program now.
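The construction Antonio describes in the quoted post can be sketched roughly as below. This is not his actual tooling: `reference_move` and the entries of `engine_picks` are hypothetical callbacks standing in for "Fruit at a fixed depth" and "some engine at 10 seconds", and "the engines agree" is read here as "every engine picks the same move as the stable reference answer", which is an assumption.

```python
def build_easy_suite(fens, reference_move, engine_picks):
    """
    fens:            iterable of FEN strings harvested from the games.
    reference_move:  hypothetical callback (fen, depth) -> move.
    engine_picks:    list of hypothetical callbacks fen -> move,
                     one per engine, each run at a fixed time per position.
    Returns a list of (fen, best_move) pairs that passed both filters.
    """
    suite = []
    for fen in fens:
        shallow = reference_move(fen, 3)
        if shallow != reference_move(fen, 6):
            continue  # answer changes between depth 3 and depth 6: discard
        if all(pick(fen) == shallow for pick in engine_picks):
            suite.append((fen, shallow))  # every engine agrees: keep
    return suite
```

Because the searches are passed in as plain callables, the filter itself can be checked with canned answers before any real engine is wired up.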
-
- Posts: 12540
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: Q for Dann and Swami
I have over 98% of the test suite analyzed. Here is a version with all the positions that I have analyzed in it:
http://cap.connx.com/chess-engines/new- ... na.epd.bz2
There are 6609 out of 6690 positions analyzed. If your program picks a move that differs from the suggested best move in the original but matches the best move in the analyzed set, it should be considered a good choice, since the analysis is usually good enough to form a decent opinion.
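A minimal sketch of that acceptance rule, assuming both files carry the answer in the standard EPD `bm` opcode and that the two lines for a position have already been paired up. The helper names `best_moves` and `counts_as_correct` are made up for illustration.

```python
def best_moves(epd_line):
    """Return the set of moves listed after the 'bm' opcode of one EPD line."""
    for field in epd_line.split(";"):
        tokens = field.split()
        if "bm" in tokens:
            return set(tokens[tokens.index("bm") + 1:])
    return set()

def counts_as_correct(engine_move, original_line, analyzed_line):
    """Accept the move if either the original or the analyzed set lists it."""
    accepted = best_moves(original_line) | best_moves(analyzed_line)
    return engine_move in accepted
```

The move strings are compared literally, so the engine's output has to use the same notation as the files.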
-
- Posts: 12540
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: Q for Dann and Swami
There are also 286 positions in that set that have enhanced pm information, such as the following:
[d]rnbqk2r/ppppppbp/5np1/8/2P5/2N2N2/PP1PPPPP/R1BQKB1R w KQkq - ce 56; acd 29; pv d4 d5 cxd5 Nxd5 Qb3 Nxc3 bxc3 O-O e3 Nc6 Bd3 Na5 Qc2 Bg4 Ba3 Re8 Nd2 e5 O-O b6 Rae1 Be6 f4 exd4 exd4 Bd5 f5 Qh4 Be4 Bxe4 Rxe4 Rxe4 Qxe4 Qxe4 Nxe4; pm g3 {1398} e4 {703} d4 {54} e3 {17} d3 {9} b3 {2} h3 {1} ; bm d4; id "ATG_Reg_18-126";
For this position, from a collection of high quality games, 1398 players/engines took move choice g3 (so it is probably an excellent choice) and 703 took move choice e4, while 54 took d4. Despite the low frequency of participants choosing d4, it is also probably a pretty good move because it has been verified with a 29 ply search by a very strong chess engine.
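The pm field can be read mechanically. The sketch below assumes the layout seen in the example (`pm move {count} move {count} ... ;`); the function name `parse_pm` is made up, and the sample line is the position above with its pv omitted for brevity.

```python
import re

def parse_pm(epd_line):
    """Return {move: count} from a 'pm move {count} ...' opcode, or {} if absent."""
    match = re.search(r"(?:^|[;\s])pm\s+([^;]*)", epd_line)
    if not match:
        return {}
    pairs = re.findall(r"(\S+)\s*\{(\d+)\}", match.group(1))
    return {move: int(count) for move, count in pairs}

line = ('rnbqk2r/ppppppbp/5np1/8/2P5/2N2N2/PP1PPPPP/R1BQKB1R w KQkq - ce 56; acd 29; '
        'pm g3 {1398} e4 {703} d4 {54} e3 {17} d3 {9} b3 {2} h3 {1} ; '
        'bm d4; id "ATG_Reg_18-126";')

counts = parse_pm(line)
total = sum(counts.values())  # 2184 games in this example
for move, count in counts.items():
    print(f"{move}: {count} ({100.0 * count / total:.1f}%)")
```

With the shares in hand, a tester could, for instance, give partial credit to frequently played moves that are not the bm; that policy is not part of the original suite.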
-
- Posts: 216
- Joined: Thu Mar 09, 2006 9:54 pm
Re: Q for Dann and Swami
Still trying to figure things out as far as testing goes, but I think the test suite can be shortened quite a bit.
Some background regarding my engine which is still a work in progress. I'm not a programmer by any stretch of the imagination, so progress is slow. It's a basic recursive alpha-beta with null move, no hash tables (haven't figured them out yet), quies currently just extends one ply if last move was a capture or promotion, and the only eval so far is material and piece square tables. Just finished computing Perft 9 = 2439530234167, so I'm pretty happy about that.
With this very minimal setup, at 1 second/move I get 3768/6690. And with just material eval, I get 3312/6690. I'm thinking that you could just remove the 3300 or so positions that are easily solved with just material only eval. Would there be any reason to keep these positions in the test suite?
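The pruning idea can be sketched as a simple set difference. This assumes the suite is held as a list of EPD lines and that the material-only run recorded which of those lines it solved; `prune_solved` is a made-up name.

```python
def prune_solved(suite, solved_with_material_only):
    """Return the EPD lines that the material-only run did NOT solve, in order."""
    solved = set(solved_with_material_only)
    return [epd for epd in suite if epd not in solved]

# With the numbers above, pruning would leave this many positions:
remaining = 6690 - 3312
print(remaining)
```

Note that a position solved by a material-only search at one time control may not be solved at another, so the pruned set depends on the settings of the run that produced it.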
-
- Posts: 90
- Joined: Sun Nov 02, 2008 4:43 pm
- Location: Barcelona
Re: Q for Dann and Swami
This test suite was built in September 2007. The main idea was to have a group of properly labelled positions where, in the worst case, I would not have to debug the search beyond ply 3-4. It is probably true that it needs a refresh, clean-up and improvement. (It seems that Dann is already working on this.)
It is true that some can be solved with material only; I would even say that some can be solved without evaluation. If you order moves by SEE, you can resolve basic cases of recaptures/quiesce.
My engine uses a phasing 0-32, and what I did was to separate the test for each phase, so that I could control how a change affects the results in each stage. For this type of operation, redundancy is useful.
Try to think of this as a stock/database of _easy_to_debug_ positions correctly labelled (I hope). You can always filter by a criterion to match your needs.
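Splitting results by phase might look like the sketch below. The post does not say how the 0-32 phase is computed, so `phase_from_fen` uses a stand-in that is purely an assumption: the number of pieces still on the board, read from the FEN.

```python
from collections import defaultdict

def phase_from_fen(fen):
    """Stand-in phase (assumption): count of pieces in the FEN board field."""
    board = fen.split()[0]
    return sum(1 for ch in board if ch.isalpha())

def solved_by_phase(results):
    """results: iterable of (fen, solved) pairs -> {phase: (solved, total)}."""
    tally = defaultdict(lambda: [0, 0])
    for fen, solved in results:
        bucket = tally[phase_from_fen(fen)]
        bucket[0] += int(solved)
        bucket[1] += 1
    return {phase: tuple(counts) for phase, counts in sorted(tally.items())}
```

Comparing the per-phase tallies before and after a change shows which stage of the game the change helped or hurt, which is where the redundancy in the suite pays off.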
-
- Posts: 216
- Joined: Thu Mar 09, 2006 9:54 pm
Re: Q for Dann and Swami
Yes, thanks. It is a useful set of positions. Especially since I need a rather easy set!
-
- Posts: 6640
- Joined: Thu Mar 09, 2006 4:21 am
Re: Q for Dann and Swami
Sorry for the late reply, David. I didn't see this thread until later, as I was busy the past few weeks. I see that Dann has already answered this.

Dann Corbit wrote:
So which inferior answer do we choose?

opraus wrote:
The better ones that Rybka, Zappa, and Stockfish come up with, rather than the really stupid one that Xpdnt comes up with.
Thank you (and Swami), BTW, for STS. They are very helpful indeed.

Always good to hear that you enjoy these test positions. Hope it helps in tuning and optimizing.
Keep up the good work!
Best wishes,
Swami