Tony's positional test suite

Steve Maughan
Posts: 1221
Joined: Wed Mar 08, 2006 8:28 pm
Location: Florida, USA

Re: Tony's positional test suite

Post by Steve Maughan »

Hi Dann,

Interesting! These relative positional scores remind me of the "Chess Magazine's" puzzles.

Back in 1989(!), I created a system to evaluate chess engines using this type of score. The interesting part of the approach was that it tried to evaluate how a chess engine's strength changed at different time controls. It was published by Eric Hallsworth as part of his Selective Search magazine. You can read about it here:

http://www.chesscomputeruk.com/Evaluati ... rams_1.pdf

It would be interesting to use these positions and create an automated evaluation system using something like PyChess.

- Steve
http://www.chessprogramming.net - Maverick Chess Engine
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Sample regression

Post by Ferdy »

Code: Select all

A. Processor
Brand          : Intel(R) Celeron(R) CPU B800 @ 1.50GHz
Arch           : X86_64
Count          : 2

B. Engine settings
Threads        : 1
Hash (mb)      : 128
Time(s)/pos    : 30.0

C. Test set
Filename       : tony-dcc-caleb.epd
NumPos         : 16

D. Results
Engine                   : Rating   Best  Score  SRate  Elap(s)
Stockfish 8 64           :   3334     10     86   0.82      451
Fire 5 x64               :   3132      8     82   0.78      451
Komodo 9.02 64-bit       :   3200      8     75   0.71      450
Bobcat v8.0              :   2816      8     70   0.67      428
Texel 1.06               :   2947      7     69   0.66      451
Hannibal 1.7 x64         :   2981      8     67   0.64      451
Cheng 4.39               :   2785      6     67   0.64      451
Deuterium v2017.1.35.431 :   2760      6     63   0.60      451
Arasan 20.2              :   2880      5     62   0.59      450
Rhetoric 1.4.3 x64       :   2631      6     61   0.58      429
Ethereal 8.19            :   2506      7     59   0.56      451
spark-1.0                :   2778      5     58   0.55      450
Gaviota v1.0             :   2716      4     55   0.52      450
Alaric 707               :   2479      3     54   0.51      453
Arminius 2014-01-18      :   2346      4     53   0.50      450
Cheese 1.9 64 bits       :   2558      4     52   0.50      450
Maverick 1.5 x64         :   2380      3     43   0.41      451
Linear regression.
Estimated Rating = (2443 x ScoreRate) + 1306
ScoreRate = totalScore/maxScore
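
For anyone who wants to redo the fit, here is a minimal sketch in plain Python. It assumes ordinary least squares on the rounded SRate column exactly as printed in the table; the tool actually used for the fit is not stated, and `fit_line` is just an illustrative name. The coefficients should land close to the formula above.

```python
# (ScoreRate, Rating) pairs copied from the results table above.
data = [
    (0.82, 3334), (0.78, 3132), (0.71, 3200), (0.67, 2816),
    (0.66, 2947), (0.64, 2981), (0.64, 2785), (0.60, 2760),
    (0.59, 2880), (0.58, 2631), (0.56, 2506), (0.55, 2778),
    (0.52, 2716), (0.51, 2479), (0.50, 2346), (0.50, 2558),
    (0.41, 2380),
]

def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(data)
print(f"Estimated Rating = ({slope:.0f} x ScoreRate) + {intercept:.0f}")
```

With only 17 engines and 16 positions the line is a rough guide, not a calibrated rating scale.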

Dann Corbit
Posts: 12537
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Sample regression

Post by Dann Corbit »

Thank you for running such a fun experiment.

I really think this is a new kind of result.

Typically, there is a very poor regression between engine strength and EPD test suites.

I remember back in the day, when Shredder topped the Elo charts, it scored 285/300 on WAC which was very average.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: Sample regression

Post by Ferdy »

Dann Corbit wrote:Thank you for running such a fun experiment.

I really think this is a new kind of result.

Typically, there is a very poor regression between engine strength and EPD test suites.

I remember back in the day, when Shredder topped the Elo charts, it scored 285/300 on WAC which was very average.
EPD suites with multiple solutions are very different from suites with a single solution when used to compare engine strengths.

One way to improve WAC is to supply it with a 2nd solution :) If the 2nd solution is also winning, give it 8 or 9 points, for example, with the mate solution worth 10. If the 2nd solution is merely equal, then perhaps it gets only 1 or 0 points.

Identifying which positions should carry more weight than the others takes more time. One idea is to give more weight to positions whose best move takes longer to find; this is also Steve's idea in the PDF, if I interpreted it correctly.
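
To make the scoring idea concrete, here is a small sketch of how a multi-solution suite could be scored. The suite, the moves and the `weights` argument are all made up for illustration; the per-position weighting is only one possible reading of the "more weight to harder positions" idea, not an existing implementation.

```python
def score_rate(positions, engine_moves, weights=None):
    """Return (totalScore, maxScore, ScoreRate) for one engine.

    positions:    list of dicts mapping a move to its points
    engine_moves: the move the engine chose in each position
    weights:      optional per-position multipliers (hypothetical extension)
    """
    weights = weights or [1] * len(positions)
    total = sum(w * pos.get(mv, 0)
                for pos, mv, w in zip(positions, engine_moves, weights))
    best = sum(w * max(pos.values()) for pos, w in zip(positions, weights))
    return total, best, total / best

# Hypothetical two-position suite: mate = 10, second winning move = 8 or 9,
# a merely equal move = 1. A move that is not listed scores 0.
suite = [{"Qxh7+": 10, "Rf3": 8, "Kh1": 1}, {"Nf6+": 10, "Qg4": 9}]

print(score_rate(suite, ["Rf3", "Qg4"]))                  # unweighted
print(score_rate(suite, ["Rf3", "Qg4"], weights=[2, 1]))  # first position counts double
```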
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: Tony's positional test suite

Post by Ferdy »

Rebel wrote:Here is some more human analysis, snippet:

Code: Select all

[Event ""]
[Site "C3E2=10 G2G4=06 F1D3=05 D2D6=02 D1E1=0"]
[Date "1994.05.05"]
[Round "1"]
[White "beat10  (01)"]
[Black "Ply : 7"]
[Result "*"]
[BlackElo ""]
[WhiteElo ""]
[FEN "r3r1k1/1p3nqp/2pp4/p4p2/Pn3P1Q/2N4P/1PPR2P1/3R1BK1 w - - 0 1"]

{ C3E2=10 G2G4=06 F1D3=05 D2D6=02 D1E1=02 H4H5=01 G1H2=01 F1E2=01  } *

About 700 of them; I typed them all in myself from paper. No internet in those days.

http://www.top-5000.nl/misc.htm
I downloaded the file rebel.pgn and tried to convert it to tony format. Here are the errors I encountered on move legality.
This is impressive considering that you did this by hand :) for more than 700 positions in a test suite with multiple good moves.

[d]r1bqrbk1/2n4p/3p1pp1/pppP3n/4P3/P1N2NPP/1P1B1PB1/R2QR1K1 w - - 0 1
game: 16
comment: B2B4=10 G3G4=08 A4A5=05 F3H4=02 F3H2=02 D1C2=02
uciMove: a4a5, score: 5
probably illegal move: a4a5

[d]2rqkb1r/3n1p1p/p3p1pn/1p1pP1N1/5P2/2N5/PPP3PP/R1BQ1R1K w kq - 0 1
game: 45
comment: F4F5=10 C3E2=06 G2G4=06 A2A4=06 D1D3=05 D2E3=03
uciMove: d2e3, score: 3
probably illegal move: d2e3

[d]2r1rbk1/1b1n1pp1/p6p/1p1nPB2/2q5/P4NNP/1B1Q1PP1/R3R1K1 b - - 0 1
game: 76
comment: D7C5=10 C8D8=07 E8E6=04 E8D8=04 C4C7=02 C8C7=02 F6H5=02
uciMove: f6h5, score: 2
probably illegal move: f6h5

[d]3q1rk1/pp1bpp1p/3p1npQ/8/3NP1P1/2r2P2/PPP5/2KR3R w - - 0 1
game: 533
comment: G2G4=12
uciMove: g2g4, score: 12
probably illegal move: g2g4

[d]8/8/p1BN1k2/3P4/1p1K1P1p/r7/8/8 b - - 0 1
game: 638
comment: A2A1=10 B4B3=08 H4H3=06 A3F3=05
uciMove: a2a1, score: 10
probably illegal move: a2a1

[d]r1b1kb1r/pp3pp1/4p2p/4q3/3N2PP/4n3/PPPQ1PB1/2KR3R w kq - 0 1
game: 673
comment: F2E3=10 G1E1=08 F2F4=07 D4C6=05 D1E1=03 D2E3=03
uciMove: g1e1, score: 8
probably illegal move: g1e1

[d]r4rk1/pp4bp/2pq2p1/3p1P1n/PP1P4/3B1P2/4NP1P/1R1Q1RK1 w - - 0 1
game: 676
comment: G1H1=10 D1D2=08 B4B5=07 D1C1=06 C1C2=04 F5G6=02
uciMove: c1c2, score: 4
probably illegal move: c1c2

[d]8/8/p1BN1k2/3P4/1p1K1P1p/r7/8/8 b - - 0 1
game: 705
comment: A2A1=10 B4B3=08 H4H3=06 A3F3=05
uciMove: a2a1, score: 10
probably illegal move: a2a1
duplicate of game 638
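
A rough sketch of how moves like these can be flagged without a full move generator. It only tests a necessary condition (the source square must hold a piece of the side to move), so it is not a real legality check; that needs a proper move generator, for example from a chess library. The function names are illustrative and this is not the converter used above. In all the cases listed above the source square is empty, so this cheap test is enough to catch them.

```python
def parse_board(fen):
    """Map occupied squares (e.g. 'e4') to piece letters, plus side to move."""
    placement, side = fen.split()[:2]
    board = {}
    for row_idx, row in enumerate(placement.split("/")):
        file_idx = 0
        for ch in row:
            if ch.isdigit():
                file_idx += int(ch)  # run of empty squares
            else:
                board["abcdefgh"[file_idx] + str(8 - row_idx)] = ch
                file_idx += 1
    return board, side

def parse_comment(comment):
    """Turn 'B2B4=10 G3G4=08' into [('b2b4', 10), ('g3g4', 8)]."""
    pairs = []
    for token in comment.split():
        move, points = token.split("=")
        pairs.append((move.lower(), int(points)))
    return pairs

def suspicious_moves(fen, comment):
    """Moves whose source square is empty or holds an enemy piece."""
    board, side = parse_board(fen)
    flagged = []
    for move, _ in parse_comment(comment):
        piece = board.get(move[:2])
        own = piece is not None and piece.isupper() == (side == "w")
        if not own:
            flagged.append(move)
    return flagged

# Game 16 from the list above.
fen = "r1bqrbk1/2n4p/3p1pp1/pppP3n/4P3/P1N2NPP/1P1B1PB1/R2QR1K1 w - - 0 1"
print(suspicious_moves(fen, "B2B4=10 G3G4=08 A4A5=05 F3H4=02 F3H2=02 D1C2=02"))
```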
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: Tony's positional test suite

Post by Ferdy »

These are the dupes in rebel.pgn.

Code: Select all

dupes: 8/4bkpp/p4p2/r1pR4/2P2P1P/4BK2/6P1/8 b - - 0 1
dupes: 8/4k1pp/p4p2/7R/2r2P1P/5K2/6P1/8 b - - 0 1
dupes: 8/4k1p1/p4p1p/R7/2r2P1P/5K2/6P1/8 b - - 0 1
dupes: 8/4k1p1/p1r2p1p/R7/5PKP/8/6P1/8 b - - 0 1
dupes: 8/4k1p1/pr3p1p/R4P2/6KP/8/6P1/8 b - - 0 1
dupes: 8/5k2/pr3p1K/5R2/7P/8/6P1/8 b - - 0 1
dupes: 1r6/5k2/p4p2/5R1K/7P/8/6P1/8 b - - 0 1
dupes: 8/5k2/5p2/p7/4K2P/8/6P1/8 b - - 0 1
dupes: 8/p4pkp/1p4p1/3R4/4pK2/4P1P1/n4P1P/8 b - - 0 1
dupes: 8/p4pkp/1p4p1/1R6/3K4/4P1P1/5n1P/8 b - - 0 1
dupes: 8/p5kp/1p4p1/5p2/3K4/4P1P1/1R3n1P/8 b - - 0 1
dupes: 8/p5kp/1p3np1/5p2/3K4/4P1PP/2R5/8 b - - 0 1
dupes: 8/p6p/1p3kp1/5P2/3Kn3/4P2P/2R5/8 b - - 0 1
dupes: 8/p1R4p/1p4p1/5k2/3Kn3/4P2P/8/8 b - - 0 1
dupes: 8/R6p/1p4p1/5kn1/3K4/4P2P/8/8 b - - 0 1
dupes: 8/8/1p4p1/5knp/3K4/R3P2P/8/8 b - - 0 1
dupes: 8/8/1p4p1/5k1p/8/R2KPn1P/8/8 b - - 0 1
dupes: 8/8/1p4p1/5k1p/8/R3P2P/3K4/6n1 b - - 0 1
dupes: 8/8/1p6/5kpp/8/1R2P2P/3K4/6n1 b - - 0 1
dupes: 8/8/2R3k1/p4p2/r6P/6P1/2P3K1/8 b - - 0 1
dupes: 8/5k2/8/p1R2p2/4r2P/5KP1/2P5/8 b - - 0 1
dupes: 8/8/R4k2/5p2/2r4P/5KP1/2P5/8 b - - 0 1
dupes: 8/8/8/4kp2/2r4P/5KP1/1RP5/8 b - - 0 1
dupes: 1R6/8/2r2k2/5p2/7P/6PK/2P5/8 b - - 0 1
dupes: 8/8/1R3k2/5p2/7P/6PK/2r5/8 b - - 0 1
dupes: 8/6k1/1R6/5p1P/8/6PK/2r5/8 b - - 0 1
dupes: r2q1rk1/ppp2pbp/3n4/4p3/3nP3/2NBB2P/PPP2QP1/R4RK1 w - - 0 1
dupes: r2q1r1k/ppp2pbp/3n4/4p3/3nP3/2NBB1QP/PPP3P1/R4RK1 w - - 0 1
dupes: r2q1r1k/pp3pbp/2pn4/4p3/3nP1Q1/2NBB2P/PPP3P1/R4RK1 w - - 0 1
dupes: r3qr1k/pp3pbp/2pn4/4p2Q/3nP3/2NBB2P/PPP3P1/R4RK1 w - - 0 1
dupes: 1r4k1/pB3p1p/4b1p1/8/2P5/1PR5/r4PPP/5RK1 w - - 0 1
dupes: 1R6/4p2p/5k2/1p3P2/p2p1K2/P1n4P/5P2/8 w - - 0 1
dupes: 1r6/6pp/4pn2/2k5/1r1pP3/N4P2/1PKR2PP/7R w - - 0 1
dupes: 6k1/3b1ppp/p7/P1b5/3NpP2/1p2B1P1/1P4KP/8 b - - 0 1
dupes: 5k2/n7/5P2/2N2KP1/8/7p/8/8 b - - 0 1
dupes: 3R1b2/1p3pkp/p3p1pn/2P1P3/1P6/1b5P/6P1/R5K1 w - - 0 1
dupes: 6k1/r2np2p/4N1p1/2pPp3/6P1/1P6/P6P/R4K2 w - - 0 1
dupes: 8/3bk3/2r2p2/2P1p1pp/PK6/1PB3R1/7P/8 b - - 0 1
dupes: 8/2R3pp/5k2/5p2/4nP1P/2p5/6KP/8 w - - 0 1
dupes: 2R2nk1/pp3ppp/4p3/4P3/3p3P/1P3N2/q4PP1/3R2K1 b - - 0 1
dupes: 6k1/pb1r1pp1/1p3n1p/1P2p3/4P3/P3BP2/1n2N1PP/1BR3K1 b - - 0 1
dupes: r2bBk2/pp3p2/6p1/1P2p3/P3P3/2N1B1P1/2n2PK1/R7 w - - 0 1
dupes: 8/2p2p2/p5p1/2p1P3/4kPKp/7P/PP4P1/8 w - - 0 1
dupes: 8/8/p3k3/1pB3p1/5p1p/P1N5/1P1K2bP/8 w - - 0 1
dupes: 8/8/p1r1p1R1/2p2rp1/2K3k1/1PP1R1P1/P7/8 b - - 0 1
dupes: 8/8/4k3/5p2/p1N5/3K2P1/1P5P/3n4 b - - 0 1
dupes: 8/8/p1BN1k2/3P4/1p1K1P1p/r7/8/8 b - - 0 1
dupes: 3r4/3Pkpp1/8/p2R1Pp1/3p4/P5P1/5K1P/8 w - - 0 1
dupes: 8/6pp/3kpp2/p2p4/P7/1K6/4B1PP/8 b - - 0 1
dupes: 8/8/4p3/2Kpk3/5p1R/5P2/4n2P/8 w - - 0 1
dupes: 4B1k1/p1r5/4bP2/8/4R3/1np1B3/5K1P/8 b - - 0 1
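
For reference, a minimal sketch of one way to find such dupes. It assumes two positions count as the same when the first four FEN fields match (placement, side to move, castling, en passant), ignoring the move counters; that rule and the function names are assumptions, not necessarily what was used for the list above.

```python
from collections import Counter

def position_key(fen):
    """Placement, side to move, castling and en passant; counters ignored."""
    return " ".join(fen.split()[:4])

def find_dupes(fens):
    """Return each position key that occurs more than once, in first-seen order."""
    counts = Counter(position_key(f) for f in fens)
    return [key for key, n in counts.items() if n > 1]

fens = [
    "8/8/p1BN1k2/3P4/1p1K1P1p/r7/8/8 b - - 0 1",                       # game 638
    "r4rk1/pp4bp/2pq2p1/3p1P1n/PP1P4/3B1P2/4NP1P/1R1Q1RK1 w - - 0 1",  # game 676
    "8/8/p1BN1k2/3P4/1p1K1P1p/r7/8/8 b - - 0 1",                       # game 705
]
for key in find_dupes(fens):
    print("dupes:", key)
```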
Nordlandia
Posts: 2821
Joined: Fri Sep 25, 2015 9:38 pm
Location: Sortland, Norway

Re: Tony's positional test suite

Post by Nordlandia »

Code: Select all

3R1b2/1p3pkp/p3p1pn/2P1P3/1P6/1b5P/6P1/R5K1 w - - 0 1
Positional draw?

[pgn][Event "?"]
[Site "?"]
[Date "????.??.??"]
[Round "?"]
[White "New game"]
[Black "?"]
[Result "*"]
[SetUp "1"]
[FEN "3R1b2/1p3pkp/p3p1pn/2P1P3/1P6/1b5P/6P1/R5K1 w - - 0 1"]
[PlyCount "11"]

1. Rxa6 bxa6 2. c6 Nf5 3. c7 Ne7 4. Re8 Bd5 5. Rxe7 Bxe7 6. c8=Q *[/pgn]
Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: Tony's positional test suite

Post by Rebel »

Ferdy wrote:
Rebel wrote:Here is some more human analysis, snippet:

Code: Select all

[Event ""]
[Site "C3E2=10 G2G4=06 F1D3=05 D2D6=02 D1E1=0"]
[Date "1994.05.05"]
[Round "1"]
[White "beat10  (01)"]
[Black "Ply : 7"]
[Result "*"]
[BlackElo ""]
[WhiteElo ""]
[FEN "r3r1k1/1p3nqp/2pp4/p4p2/Pn3P1Q/2N4P/1PPR2P1/3R1BK1 w - - 0 1"]

{ C3E2=10 G2G4=06 F1D3=05 D2D6=02 D1E1=02 H4H5=01 G1H2=01 F1E2=01  } *

About 700 of them; I typed them all in myself from paper. No internet in those days.

http://www.top-5000.nl/misc.htm
I downloaded the file rebel.pgn and tried to convert it to tony format. Here are the errors I encountered on move legality.
Work to do. How nice, mistakes of about 30 years ago are backfiring :)
Ferdy wrote: This is impressive considering that you did this by hand :) for more than 700 positions in a test suite with multiple good moves.

No PC in those days, only the Apple IIe with 32 KB RAM, running at 1 MHz and doing 100-150 NPS, plus 2 floppy drives, each with a capacity of (if I remember right) 360 KB.

No PGN or EPD, let alone match facilities. What to do? So positions chosen by good players, each with multiple good moves, were a gift. The positions came from chess magazines; Steve already mentioned the Beat the Masters series.

Regarding the errors, I must have either typed the wrong position setup or made a mistake typing the moves. Boring work.
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: Tony's positional test suite

Post by Ferdy »

This is now fully converted. Duplicates are also removed. Illegal moves are discarded and not replaced; if a position has only one move and that move is illegal, the EPD is removed.

Code: Select all

r3r1k1/1p3nqp/2pp4/p4p2/Pn3P1Q/2N4P/1PPR2P1/3R1BK1 w - - bm Ne2; c0 "positional scores are: Ne2=10, g4=6, Bd3=5, Rxd6=2, Re1=2, Qh5=1, Kh2=1, Be2=1"; id "rebel.pos.01";
4rrk1/pp1b2pp/5n2/3p1N2/8/2QB1qP1/PP3P1P/4RRK1 w - - bm Rxe8; c0 "positional scores are: Rxe8=10, Ne7+=7, Re3=6, Nd4=4"; id "rebel.pos.02";
r6r/p6p/1pnpkn2/q1p2p1p/2P5/2P1P3/P4PP1/1RBQKB1R w K - bm Rb3; c0 "positional scores are: Rb3=10, Qc2=7, Rxh5=7, Be2=7, Bd3=2, g4=2, e4=2, Rb5=1"; id "rebel.pos.03";
Download rebel.epd
https://drive.google.com/file/d/0BwAOsu ... sp=sharing
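
If you want to read the file from a script, here is a small sketch of a parser for lines in the layout shown above. It assumes every line carries `bm`, a `c0` comment starting with "positional scores are:", and an `id`, as in the three samples; `parse_epd` is an illustrative helper, not part of any existing tool.

```python
import re

def parse_epd(line):
    """Split one rebel.epd-style line into FEN fields, best move, scores and id."""
    fen4 = " ".join(line.split()[:4])
    bm = re.search(r"\bbm ([^;]+);", line).group(1)
    c0 = re.search(r'c0 "positional scores are: ([^"]*)"', line).group(1)
    pos_id = re.search(r'id "([^"]*)"', line).group(1)
    scores = {}
    for item in c0.split(","):
        # rsplit keeps promotion moves such as "e8=Q" intact
        move, points = item.strip().rsplit("=", 1)
        scores[move] = int(points)
    return fen4, bm, scores, pos_id

line = ('4rrk1/pp1b2pp/5n2/3p1N2/8/2QB1qP1/PP3P1P/4RRK1 w - - bm Rxe8; '
        'c0 "positional scores are: Rxe8=10, Ne7+=7, Re3=6, Nd4=4"; '
        'id "rebel.pos.02";')
fen4, bm, scores, pos_id = parse_epd(line)
print(pos_id, bm, scores)
```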

Sample run at 1s/pos

Code: Select all

A. Processor
Brand          : Intel(R) Celeron(R) CPU B800 @ 1.50GHz
Arch           : X86_64
Count          : 2

B. Engine settings
Threads        : 1
Hash (mb)      : 128
Time(s)/pos    : 1.0

C. Test set
Filename       : rebel.epd
NumPos         : 657

D. Results
Engine                   : Rating   Best  Score  SRate  Elap(s)

Stockfish 8 64           :   3334    345   3193   0.64      674
Deuterium v2017.1.35.431 :   2760    278   2650   0.53      673
pedrox
Posts: 1056
Joined: Fri Mar 10, 2006 6:07 am
Location: Basque Country (Spain)

Re: Sample regression

Post by pedrox »

Ferdy wrote:

Code: Select all

A. Processor
Brand          : Intel(R) Celeron(R) CPU B800 @ 1.50GHz
Arch           : X86_64
Count          : 2

B. Engine settings
Threads        : 1
Hash (mb)      : 128
Time(s)/pos    : 30.0

C. Test set
Filename       : tony-dcc-caleb.epd
NumPos         : 16

D. Results
Engine                   : Rating   Best  Score  SRate  Elap(s)
Stockfish 8 64           :   3334     10     86   0.82      451
Fire 5 x64               :   3132      8     82   0.78      451
Komodo 9.02 64-bit       :   3200      8     75   0.71      450
Bobcat v8.0              :   2816      8     70   0.67      428
Texel 1.06               :   2947      7     69   0.66      451
Hannibal 1.7 x64         :   2981      8     67   0.64      451
Cheng 4.39               :   2785      6     67   0.64      451
Deuterium v2017.1.35.431 :   2760      6     63   0.60      451
Arasan 20.2              :   2880      5     62   0.59      450
Rhetoric 1.4.3 x64       :   2631      6     61   0.58      429
Ethereal 8.19            :   2506      7     59   0.56      451
spark-1.0                :   2778      5     58   0.55      450
Gaviota v1.0             :   2716      4     55   0.52      450
Alaric 707               :   2479      3     54   0.51      453
Arminius 2014-01-18      :   2346      4     53   0.50      450
Cheese 1.9 64 bits       :   2558      4     52   0.50      450
Maverick 1.5 x64         :   2380      3     43   0.41      451
Linear regression.
Estimated Rating = (2443 x ScoreRate) + 1306
ScoreRate = totalScore/maxScore

For maxScore it seems that you used 104; however, adding up the scores in the EPD file, I think I get 114.
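
One way to check which maxScore the table itself implies is to test candidate values against the Score and SRate columns. The sketch below assumes SRate is Score/maxScore rounded to two decimals, which is a guess about how the table was printed; the candidate range 90 to 130 is arbitrary. On the rows quoted above the only value that reproduces every SRate is 105, so neither 104 nor 114 fits under that assumption.

```python
# (Score, SRate) pairs from the quoted results table.
rows = [
    (86, 0.82), (82, 0.78), (75, 0.71), (70, 0.67), (69, 0.66),
    (67, 0.64), (67, 0.64), (63, 0.60), (62, 0.59), (61, 0.58),
    (59, 0.56), (58, 0.55), (55, 0.52), (54, 0.51), (53, 0.50),
    (52, 0.50), (43, 0.41),
]

def consistent_max_scores(rows, candidates):
    """Candidates m for which round(score / m, 2) reproduces every SRate."""
    return [
        m for m in candidates
        if all(abs(round(score / m, 2) - srate) < 1e-9 for score, srate in rows)
    ]

print(consistent_max_scores(rows, range(90, 131)))
```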