Tony's positional test suite

zullil
Posts: 6442
Joined: Tue Jan 09, 2007 12:31 am
Location: PA USA
Full name: Louis Zulli

Re: Tony's positional test suite

Post by zullil »

first25plus5 wrote:Robin Smith’s book “Modern Chess Analysis” (Gambit books, 2004) points out that ‘ruler flat’ evaluations (or evaluations that tend to ‘settle’ at roughly a constant value) indicate fortress draws.
This evaluation behavior is examined further, with later engines, in the paper “Detecting Fortresses in Chess” (Guid & Bratko, 2012).
For example, if an evaluation eventually stabilizes at approximately +2.24 and stays there for some time, this behavior strongly indicates a fortress draw, despite the high evaluation for White.
From the article:
6 CONCLUSIONS

We introduce a novel idea for detecting fortresses in the game of chess. We demonstrate that a heuristic-search-based program is able to detect fortresses on the basis of backed-up values obtained at different levels of search. If a particular position is a fortress, the program is not able to show any progress towards a win and thus the backed-up values cease to change significantly from a certain search depth on.
Calling this idea "novel" in 2012 seems dubious, at best. :cry: Probably should not comment further...
BeyondCritics
Posts: 410
Joined: Sat May 05, 2012 2:48 pm
Full name: Oliver Roese

Re: Tony's positional test suite

Post by BeyondCritics »

Thank you for that.
I glanced over the test suite with analysis and diagrams on the web (http://privat.bahnhof.se/wb432434/pos.htm). These are all open positions, except for #14. That means that in the remaining 15 positions Stockfish should be irrefutable by humans. I checked that conjecture, and indeed in 8(!) out of 15 cases the commentators got it wrong or backwards. How many points would you give for that??
I personally enjoyed this rebuttal the most:

[d]1rN1r1k1/1pq2pp1/2p1nn1p/p2p1B2/3P4/4P2P/PPQ1NPP1/2R2RK1 b - - 0 1

1...Rbxc8 2.Nf4 (allegedly the refutation) Nxf4! 3.Bxc8 Nxg2!

In #14 the alleged best move 1.Nb1, played by Kasparov, is neutralized outright by 1...b5, and Black is fine.
[d]r3r1k1/ppqbbpp1/2pp1nnp/3Pp3/2P1P3/5N1P/PPBN1PP1/R1BQR1K1 w - - 0 1


In #16, after 34.Qxc5 (Stockfish), resigning is an option.
[d]2r2k2/5p2/2Bp1b1r/2qPp1pp/PpN1P3/1P2Q3/5PPP/4R1K1 w - - 0 1

Interestingly, with the help of Stockfish you might save even this position against a strong human master: after 34.Rc1(?) Qxe3 35.Nxe3(?!) Bd8 36.Rc4(?!) Ba5 37.Nc2(?!) g4 38.Nxb4(??) there follows 38...Rb8 39.Bb5 Bxb4 40.Rxb4 f5! and White is only minimally better (Stockfish).

Never trust your test suite.
Evert
Posts: 2929
Joined: Sat Jan 22, 2011 12:42 am
Location: NL

Re: Tony's positional test suite

Post by Evert »

zullil wrote: Calling this idea "novel" in 2012 seems dubious, at best. :cry: Probably should not comment further...
Yes... it's one of those things that make me wonder how it got past the referee. As it is, the paper points out some obvious points and proceeds to offer no real idea for how to handle fortress detection.
Saying that the engines "detect" the fortress by having a flat eval seems rather generous; I'd call not returning a draw score a sign of not detecting the fortress.
Still, the paper has a list of interesting fortress positions that I might use if/when I go back to tinkering with fortress detection.
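
For reference, the "ruler flat" signal discussed above can be sketched in a few lines of Python. This is only an illustration, not the method from the paper: the function name, the window of six iterations, and the 10-centipawn tolerance are all assumptions, and the score lists are made-up data. As noted above, a flat non-zero score only flags a candidate fortress; it is not the same as the engine returning a draw score.

```python
from typing import Sequence


def looks_ruler_flat(evals_cp: Sequence[int],
                     window: int = 6,
                     tolerance_cp: int = 10) -> bool:
    """Return True if the last `window` backed-up scores (in centipawns,
    one per search depth) stay within `tolerance_cp` of each other.

    `window` and `tolerance_cp` are arbitrary illustrative defaults.
    """
    if len(evals_cp) < window:
        return False  # not enough iterations to judge
    tail = evals_cp[-window:]
    return max(tail) - min(tail) <= tolerance_cp


# Hypothetical scores by increasing search depth (not real engine output).
fortress_like = [150, 190, 215, 221, 224, 224, 223, 225, 224, 224]
progressing = [150, 190, 230, 280, 340, 410, 500, 620, 760, 900]

print(looks_ruler_flat(fortress_like))  # True
print(looks_ruler_flat(progressing))    # False
```

A real detector would also need to consider the size of the score: a flat score near 0.00 is just a normal draw, while a flat score around +2.24 is the suspicious case.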
Ferdy
Posts: 4846
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: Tony's positional test suite

Post by Ferdy »

MEA - Multiple move EPD Analyzer beta interface can be found here.

https://mea.bitballoon.com/
Dann Corbit
Posts: 12777
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Tony's positional test suite

Post by Dann Corbit »

Ferdy wrote:MEA - Multiple move EPD Analyzer beta interface can be found here.

https://mea.bitballoon.com/
Thanks
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
Rebel
Posts: 7302
Joined: Thu Aug 18, 2011 12:04 pm
Full name: Ed Schröder

Re: Tony's positional test suite

Post by Rebel »

Ferdy wrote:MEA - Multiple move EPD Analyzer beta interface can be found here.

https://mea.bitballoon.com/
Will try.
first25plus5
Posts: 11
Joined: Sat Jul 22, 2017 2:50 am
Location: New Zealand

Re: Tony's positional test suite

Post by first25plus5 »

If anyone has the time to re-calibrate / re-test all sixteen positions with today's engines that would be appreciated.
OldMan
Posts: 1
Joined: Thu Oct 10, 2024 6:43 pm
Full name: William Bryant

Re: Tony's positional test suite

Post by OldMan »

Beginner question:
I assume you add up the scores of the moves found in each position to get an engine rating.
At what time control?
I understand you searched for 24 hours to get the results, but are there any specific time controls for the test itself?

I am returning to chess programming after a long absence.
My old program currently runs on a G4 Mac laptop from 2004 while I work on rewriting it.
I have forgotten a lot and am essentially starting over.

Wm.
Dann Corbit
Posts: 12777
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Tony's positional test suite

Post by Dann Corbit »

The multi-answer type problems give a different score based upon the assumed correctness of the given move.
As programs get stronger and hardware gets more powerful, the old answers (which would have been more or less correct at the time, given hardware and software limitations) get improved upon. Something that used to be a purely strategic concept, like undermining, king safety, or rooks on the 8th, because no material win was seen on the horizon, becomes tactical because the engines can now see deeply enough to find a tactical shot. It is also possible that other positional factors become more important when combined with the search.

A test set like Tony's positional suite has a number of different possible answers, each scoring differently depending upon what your engine chooses.
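
Totalling such a suite might look like the Python sketch below. The position ids, moves, and point values are hypothetical placeholders, not the real contents of Tony's suite, and the dictionary layout is just an assumption for illustration.

```python
def suite_score(points_by_position, engine_moves):
    """Sum the points awarded for the move the engine chose in each position.

    points_by_position: {position_id: {move: points}}
    engine_moves:       {position_id: move chosen by the engine}
    A move that is not listed for a position scores 0.
    """
    total = 0
    for pos_id, move in engine_moves.items():
        total += points_by_position.get(pos_id, {}).get(move, 0)
    return total


# Hypothetical data, for illustration only.
points = {
    "pos01": {"Nf3": 10, "d4": 7, "c4": 3},
    "pos02": {"Rd1": 10, "h3": 5},
}
chosen = {"pos01": "d4", "pos02": "Rd1"}

print(suite_score(points, chosen))  # 17
```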

To update such a test suite, one might take several strong programs and run them on high-end hardware for four hours per position with multi-pv set to 10, so that the engine is forced to think about many alternatives. Suppose that the best move as found by the engines has a score of 429. We divide this best score by 429 and multiply by ten, giving a score of ten.
Suppose that for the next best solution we have a score of 313. We divide 313 by 429 and multiply by 10, giving a score of seven (7.3 before rounding), and so on through all moves until the calculated score is less than one. Then on to the next problem.
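
The rescaling just described can be sketched in Python as follows. The function name and move labels are made up, rounding to the nearest integer is an assumption (the description above only says 313 becomes seven), and returning nothing when the best score is not positive is also an assumption, since the scheme does not cover that case.

```python
def scale_points(multipv_scores, max_points=10):
    """Convert raw multi-PV scores into suite points.

    Each score is divided by the best score and multiplied by `max_points`.
    Moves whose calculated value is below one are dropped.
    """
    best = max(multipv_scores.values())
    if best <= 0:
        return {}  # assumption: no sensible scaling without a positive best score
    points = {}
    for move, score in multipv_scores.items():
        value = max_points * score / best
        if value >= 1:
            points[move] = round(value)
    return points


# Scores 429 and 313 are from the example above; the rest are hypothetical.
scores = {"moveA": 429, "moveB": 313, "moveC": 120, "moveD": 40}
print(scale_points(scores))  # {'moveA': 10, 'moveB': 7, 'moveC': 3}
```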