STS results and errata

jesper_nielsen

STS results and errata

Post by jesper_nielsen »

Hi Again!

Here are some more STS results from Pupsi2 0.07 on a 1.86 GHz Intel Pentium(R) M laptop.

Fast = 10 seconds per position.
Medium = 1 minute per position.
Slow = 7 minutes per position.

Solved
       Fast Medium Slow
1.0    57    67    75
2.0    68    79    78
3.0    54    62    71
4.0    64    71    --
5.0    56    61    --

Points
       Fast Medium Slow
1.0    656   724   809
2.0    745   846   863
3.0    664   736   802
4.0    745   806   ---
5.0    629   687   ---
Note: The number solved is the number of positions where a 10-point move was selected.

Weird that the 2.0 solved count actually goes down with more time, but the points still go up! I wonder how that should be interpreted. :) Too bad I already proclaimed 2.0 as Pupsi's favorite test set! :)
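One way to read the 2.0 numbers: since every solved position is worth exactly 10 points, whatever is left of the total must come from partial-credit moves. A minimal sketch of that arithmetic, using only the figures from the tables above (the names `FULL_CREDIT` and `partial_credit` are made up for this illustration):

```python
# Solved counts and point totals for STS 2.0, copied from the tables above.
results = {
    "Medium": {"solved": 79, "points": 846},
    "Slow": {"solved": 78, "points": 863},
}

FULL_CREDIT = 10  # points awarded for picking a top-scored move


def partial_credit(solved, points):
    """Points that came from moves scoring less than 10."""
    return points - FULL_CREDIT * solved


for label, r in results.items():
    print(label, partial_credit(r["solved"], r["points"]))
# Medium 56
# Slow 83
```

So at the slow setting one fewer position gets full credit, but the second-best choices earn 27 more points in total, which is why the total still rises.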

STS2.0 where the points awarded do not match the "bm" tag:

2b2n2/2q2p1k/1r6/2pPP1QB/2p1P3/2P4P/1P4P1/2B3K1 b - - bm Ra6; c0 "Rb6g6=10, Kh7h8=3, Qc7b7=5, Rb6a6=4"; id "Open Files & Diagnonals.017";
2r3k1/p1pb2pp/3p1p1r/1p6/4P2q/1PQB2R1/PBP1N1P1/6K1 w - - bm Bc1; c0 "Qa5=10, Bc1=10"; id "Open Files & Diagnonals.026"; 
3r3k/p1q2ppp/5b2/3Pp3/2p4P/P4BQ1/1PP3P1/1K5R w - - bm Be4; c0 "Rh1f1=10, Bf3e4=10, Rh1e1=4, h4h5=1"; id "Open Files & Diagnonals.042"; 
q4rk1/5ppp/p1r4b/2p1p3/Q3P2P/1R4P1/5PK1/3R1B2 w - - bm Bc4; c0 "Rd7=10, Bc4=7, Rf3=3"; id "Open Files & Diagnonals.081"; 
r6k/p3Npb1/q5pp/1p2p3/1B6/PPP3P1/3R1P1P/3R2K1 b - - bm Qe6; c0 "Qf6=10, Qb7=8, Qe6=10"; id "Open Files & Diagnonals.098"; 
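
A check like this can be automated. The sketch below is only an illustration, not the tool used to build STS: it assumes the record layout shown above (a `bm` opcode plus a `c0` comment of `move=points` pairs) and uses a rough heuristic, `to_short`, to drop the origin square from moves such as `Rb6g6`. That heuristic does not handle captures or disambiguation properly, so treat any hit as a candidate for manual review. All function names are made up for the example.

```python
import re


def parse_epd(line):
    """Return (set of bm moves, dict of c0 move -> points) for one record."""
    bm = re.search(r'\bbm\s+([^;]+);', line)
    c0 = re.search(r'c0\s+"([^"]*)"', line)
    best = set(bm.group(1).split()) if bm else set()
    scores = {}
    if c0:
        for item in c0.group(1).split(","):
            move, _, pts = item.strip().partition("=")
            scores[move] = int(pts)
    return best, scores


def to_short(move):
    """Heuristic: turn 'Rb6g6' into 'Rg6' and 'h4h5' into 'h5'."""
    m = re.fullmatch(r'([KQRBN]?)[a-h][1-8]([a-h][1-8])', move)
    return m.group(1) + m.group(2) if m else move


def bm_matches_points(line):
    """True if the bm moves are exactly the moves awarded 10 points."""
    best, scores = parse_epd(line)
    top = {to_short(mv) for mv, pts in scores.items() if pts == 10}
    return best == top


records = [
    '2b2n2/2q2p1k/1r6/2pPP1QB/2p1P3/2P4P/1P4P1/2B3K1 b - - bm Ra6; '
    'c0 "Rb6g6=10, Kh7h8=3, Qc7b7=5, Rb6a6=4"; id "Open Files & Diagnonals.017";',
    '2r3k1/p1pb2pp/3p1p1r/1p6/4P2q/1PQB2R1/PBP1N1P1/6K1 w - - bm Bc1; '
    'c0 "Qa5=10, Bc1=10"; id "Open Files & Diagnonals.026";',
]

for rec in records:
    print(bm_matches_points(rec))
# False
# False
```

Both records are flagged: in .017 the only 10-point move is Rg6 while `bm` says Ra6, and in .026 the `bm` tag lists only one of the two 10-point moves.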
Kind regards,
Jesper
swami
Posts: 6662
Joined: Thu Mar 09, 2006 4:21 am

Re: STS results and errata

Post by swami »

It's good to know that Pupsi is really good at open files and diagonals; I reckon that's the toughest of all the test suites for most engines.

But STS 5.0 Bishop vs Knight happens to be the easiest of all the test suites for many top engines. Pupsi does very well on the toughest but is mediocre on the easiest :)

STS 4.0 Square vacancy is the 2nd toughest of all the test suites for many engines, but according to Pupsi it's the 2nd easiest... Also interesting is that STS 3.0 Knight outposts is the 2nd easiest for many engines, but according to Pupsi it's the 2nd toughest :)


My observation about Pupsi's results:

I think changing the values for Bishop and Knight might help a great deal, or implementing code that deals with knight handling, which is where I think the weakness lies. It's essential knowledge, and Pupsi seems to be weak at handling the knight.

Pupsi has an exceptional understanding of opening up files and diagonals for the rook, queen and bishop, mobility, posting queens on vacant squares, etc.


Thanks for the bug reports on STS 2.0. Did you happen to notice any bugs in STS 3.0, 4.0 and 5.0? The points distribution on the thousand-point scale is also interesting.

Regards,
Swami
jesper_nielsen

Re: STS results and errata

Post by jesper_nielsen »

Thanks for your observations!

I will definitely look into Pupsi's knight evaluation. I desperately need a positive result soon! :) I have tried many, many things, but none that gave any kind of improvement.

I have not noticed any "best move vs points" bugs in the 3.0 set. And I will go through the 4.0 and 5.0 when the slow results are done.

Kind regards,
Jesper
jesper_nielsen

Re: STS results and errata

Post by jesper_nielsen »

STS 4.0 results are done.

Pupsi2 0.07 on a 1.86 GHz Intel Pentium(R) M laptop.

Fast = 10 seconds per position.
Medium = 1 minute per position.
Slow = 7 minutes per position.

Solved
       Fast Medium Slow
1.0    57    67    75
2.0    68    79    78
3.0    54    62    71
4.0    64    71    73
5.0    56    61    --

Points
       Fast Medium Slow
1.0    656   724   809
2.0    745   846   863
3.0    664   736   802
4.0    745   806   829
5.0    629   687   ---



Note: The number solved is the number of positions where a 10-point move was selected.

I could not find any point vs best move problems in STS 4.0.

Kind regards,
Jesper
swami

Re: STS results and errata

Post by swami »

jesper_nielsen wrote:STS 4.0 results are done.

Pupsi2 0.07 on a 1.86 GHz Intel Pentium(R) M laptop.

Fast = 10 seconds per position.
Medium = 1 minute per position.
Slow = 7 minutes per position.


Solved
Fast Medium Slow
1.0: 57 67 75
2.0: 68 79 78
3.0: 54 62 71
4.0: 64 71 73
5.0: 56 61 --
Improvement in solved count (Slow minus Fast):

1.0 = 18
2.0 = 10
3.0 = 17
4.0 = 9
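
These differences follow directly from the posted tables. A small sketch that recomputes them, together with the corresponding gain in points, from the numbers quoted in this thread (the variable names are just for illustration):

```python
# (Fast, Medium, Slow) values for STS 1.0 to 4.0, copied from the posted tables.
solved = {
    "1.0": (57, 67, 75),
    "2.0": (68, 79, 78),
    "3.0": (54, 62, 71),
    "4.0": (64, 71, 73),
}
points = {
    "1.0": (656, 724, 809),
    "2.0": (745, 846, 863),
    "3.0": (664, 736, 802),
    "4.0": (745, 806, 829),
}


def slow_minus_fast(table):
    """Gain from the 10-second setting to the 7-minute setting, per suite."""
    return {suite: slow - fast for suite, (fast, _medium, slow) in table.items()}


print(slow_minus_fast(solved))  # {'1.0': 18, '2.0': 10, '3.0': 17, '4.0': 9}
print(slow_minus_fast(points))  # {'1.0': 153, '2.0': 118, '3.0': 138, '4.0': 84}
```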

Pupsi needs to find the solutions faster on v1.0 and v3.0; many of those positions are only solved at the longer time controls.

The v4.0 score range seems consistent, but it could be improved at longer time controls. Pupsi has shown a good understanding of open files and diagonals, on par with Spike, Rybka, Bright, Glaurung, etc.
I could not find any point vs best move problems in STS 4.0.
Thanks for trying to find bugs in the tests. Your inputs are much appreciated.
swami

Re: STS results and errata

Post by swami »

jesper_nielsen wrote:Thanks for your observations!

I will definitely look into Pupsi's knight evaluation. I desperately need a positive result soon! :) I have tried many, many things, but none that gave any kind of improvement.

I have not noticed any "best move vs points" bugs in the 3.0 set. And I will go through the 4.0 and 5.0 when the slow results are done.

Kind regards,
Jesper
Pupsi will improve a great deal if it manages to handle the knight well. The results clearly show that its weakness lies in knight management. I hope you find ways to improve the code for placing knights on good squares, repositioning the knight when the time is right (meaning the position is closed rather than open), making justified trades of knight for bishop, etc.

Best Regards.
Swami
jesper_nielsen

Re: STS results and errata

Post by jesper_nielsen »

swami wrote:
Pupsi will improve a great deal if it manages to handle the knight well. The results clearly show that its weakness lies in knight management. I hope you find ways to improve the code for placing knights on good squares, repositioning the knight when the time is right (meaning the position is closed rather than open), making justified trades of knight for bishop, etc.

Best Regards.
Swami
I am going to run some tests adjusting the knight value and perhaps the bishop pair bonus this weekend.

I have tried adding outpost bonus for knights, but that actually seemed to hurt performance.
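
For readers unfamiliar with the term, here is a minimal sketch of one common definition of a knight outpost. It is not Pupsi's code, and the exact rule (ranks 4 to 6, defended by an own pawn, no enemy pawn able to attack the square) is an assumption chosen for illustration; engines differ on the details. Squares are (file, rank) pairs with both values 0-based, and the helper `sq` is made up for the example.

```python
def sq(name):
    """Convert a square name like 'd5' to a 0-based (file, rank) pair."""
    return ord(name[0]) - ord("a"), int(name[1]) - 1


def is_white_knight_outpost(square, white_pawns, black_pawns):
    """Check the assumed outpost rule for a white knight on `square`."""
    file, rank = square
    if not 3 <= rank <= 5:  # only ranks 4 to 6 count
        return False
    # Defended: a white pawn one rank behind on an adjacent file.
    defended = any((file + df, rank - 1) in white_pawns for df in (-1, 1))
    # Attackable: a black pawn on an adjacent file that is on a higher rank,
    # so it attacks the square now or could advance to do so.
    attackable = any(
        pf in (file - 1, file + 1) and pr > rank for pf, pr in black_pawns
    )
    return defended and not attackable


# Knight on d5 supported by a pawn on e4, black pawns far away on a7 and g7.
print(is_white_knight_outpost(sq("d5"), {sq("e4")}, {sq("a7"), sq("g7")}))  # True
# Same setup, but a black pawn on c7 can play ...c6 and chase the knight.
print(is_white_knight_outpost(sq("d5"), {sq("e4")}, {sq("c7")}))  # False
```

An evaluation term would typically add a small bonus when this returns True; the size of that bonus is exactly the kind of thing that has to be tuned by testing.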

But the knight needs looking into!

Keep up the good work!

Kind regards,
Jesper
swami

Re: STS results and errata

Post by swami »

jesper_nielsen wrote:I am going to run some tests adjusting the knight value and perhaps the bishop pair bonus this weekend.
While you're at it, you may find this resource useful:

http://home.comcast.net/~danheisman/Art ... alance.htm

It may give you some ideas regarding the material values.

Also, repositioning the knight in closed positions is actually tougher to tackle than outposts. The bishop-knight trade-off is easy to improve, I think. :)

Good luck and have fun experimenting with the changes this weekend!

Best Regards,
Swami
jesper_nielsen

Re: STS results and errata

Post by jesper_nielsen »

STS 5.0 results are done.

Pupsi2 0.07 on a 1.86 GHz Intel Pentium(R) M laptop.

Fast = 10 seconds per position.
Medium = 1 minute per position.
Slow = 7 minutes per position.

Solved
       Fast Medium Slow
1.0    57    67    75
2.0    68    79    78
3.0    54    62    71
4.0    64    71    73
5.0    56    61    66

Points
       Fast Medium Slow
1.0    656   724   809
2.0    745   846   863
3.0    664   736   802
4.0    745   806   829
5.0    629   687   737

Note: The number solved is the number of positions where a 10-point move was selected.

I could not find any point vs best move problems in STS 5.0 either.

I already have the Larry Kaufman material evaluation values. So I think it must be something else messing up the knight evaluation?!


Kind regards,
Jesper
Dann Corbit
Posts: 12791
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: STS results and errata

Post by Dann Corbit »

jesper_nielsen wrote:STS2.0 where the points awarded do not match the "bm" tag:

2b2n2/2q2p1k/1r6/2pPP1QB/2p1P3/2P4P/1P4P1/2B3K1 b - - bm Ra6; c0 "Rb6g6=10, Kh7h8=3, Qc7b7=5, Rb6a6=4"; id "Open Files & Diagnonals.017";
2r3k1/p1pb2pp/3p1p1r/1p6/4P2q/1PQB2R1/PBP1N1P1/6K1 w - - bm Bc1; c0 "Qa5=10, Bc1=10"; id "Open Files & Diagnonals.026"; 
3r3k/p1q2ppp/5b2/3Pp3/2p4P/P4BQ1/1PP3P1/1K5R w - - bm Be4; c0 "Rh1f1=10, Bf3e4=10, Rh1e1=4, h4h5=1"; id "Open Files & Diagnonals.042"; 
q4rk1/5ppp/p1r4b/2p1p3/Q3P2P/1R4P1/5PK1/3R1B2 w - - bm Bc4; c0 "Rd7=10, Bc4=7, Rf3=3"; id "Open Files & Diagnonals.081"; 
r6k/p3Npb1/q5pp/1p2p3/1B6/PPP3P1/3R1P1P/3R2K1 b - - bm Qe6; c0 "Qf6=10, Qb7=8, Qe6=10"; id "Open Files & Diagnonals.098"; 
Kind regards,
Jesper
There are clearly problems with those records. Thanks for your investigations. I have been slowly removing the manual steps from my QA process, but there are still things that can go wrong even now.

The first record is just plain wrong and should be removed from the set.
The other three records have alternate solutions and should probably be replaced as well.

2b2n2/2q2p1k/1r6/2pPP1QB/2p1P3/2P4P/1P4P1/2B3K1 b - - bm Rg8c8; c0 "Rg8c8=10, Kh8=3, Ng6=5, Qb7=5, Ra6=4, Rg6=4"; id "Open Files & Diagnonals.017";
2r3k1/p1pb2pp/3p1p1r/1p6/4P2q/1PQB2R1/PBP1N1P1/6K1 w - - bm Bc1 Qa5; c0 "Qa5=10, Bc1=10"; id "Open Files & Diagnonals.026";
3r3k/p1q2ppp/5b2/3Pp3/2p4P/P4BQ1/1PP3P1/1K5R w - - bm Be4 Rf1; c0 "Rf1=10, Be4=10, Re1=4, Rf1=5, h5=1"; id "Open Files & Diagnonals.042";
q4rk1/5ppp/p1r4b/2p1p3/Q3P2P/1R4P1/5PK1/3R1B2 w - - bm Bc4 Rd7; c0 "Rd7=10, Bc4=10, Rf3=7"; id "Open Files & Diagnonals.081";
r6k/p3Npb1/q5pp/1p2p3/1B6/PPP3P1/3R1P1P/3R2K1 b - - bm Qe6 Qf6; c0 "Qf6=10, Qb7=8, Qe6=10"; id "Open Files & Diagnonals.098";