Tony's positional test suite

zullil · Post by **zullil** » Tue Aug 01, 2017 12:12 pm

first25plus5 wrote: Something pointed out in Robin Smith's book "Modern Chess Analysis" (Gambit Books, 2004) is 'ruler-flat' evaluations, which indicate fortress draws (or the tendency of the evaluation to 'settle' at roughly one value).
This behavior is examined further, with later engines, in the paper "Detecting Fortresses in Chess" (Guid & Bratko, 2012).
For example, if an evaluation eventually stabilizes at approximately, say, +2.24 and stays there for some time, this strongly indicates a fortress draw despite the high evaluation for White.

From the article:

6 CONCLUSIONS

We introduce a novel idea for detecting fortresses in the game of chess. We demonstrate that a heuristic-search-based program is able to detect fortresses on the basis of backed-up values obtained at different levels of search. If a particular position is a fortress, the program is not able to show any progress towards a win and thus the backed-up values cease to change significantly from a certain search depth on.
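The "flat backed-up values" idea in the quoted conclusion can be sketched roughly as follows. This is a minimal Python sketch, assuming you already have the engine's score (in centipawns) at each completed iterative-deepening depth. The function name `looks_like_fortress` and the `window`, `tolerance` and `min_advantage` parameters are hypothetical, and their default values are illustrative guesses, not numbers taken from the paper. A flat score is only a hint of a fortress, not proof of one.

```python
from typing import Sequence


def looks_like_fortress(
    scores_by_depth: Sequence[int],
    window: int = 8,
    tolerance: int = 15,
    min_advantage: int = 100,
) -> bool:
    """Flag a possible fortress when the backed-up score has stopped
    changing over the last `window` depths while one side still appears
    to have a clear advantage.

    scores_by_depth[i] is the score (centipawns, always from the same
    side's point of view) returned at search depth i + 1.
    All thresholds are illustrative assumptions.
    """
    if window < 2 or len(scores_by_depth) < window:
        return False  # not enough completed depths to judge flatness
    tail = scores_by_depth[-window:]
    is_flat = max(tail) - min(tail) <= tolerance
    is_big_advantage = abs(tail[-1]) >= min_advantage
    return is_flat and is_big_advantage


if __name__ == "__main__":
    settled = [40, 95, 150, 190, 218, 224, 225, 223, 224, 224, 224, 225, 224]
    climbing = [40, 95, 150, 190, 230, 280, 330, 390, 450, 520]
    print(looks_like_fortress(settled))   # True
    print(looks_like_fortress(climbing))  # False
```

The `min_advantage` check encodes the "high evaluation but no progress" pattern from the example above; a position that is flat at 0.00 is simply a draw the engine already recognizes and is not flagged.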

Calling this idea "novel" in 2012 seems dubious, at best.

Probably should not comment further...

BeyondCritics · Post by **BeyondCritics** » Tue Aug 01, 2017 11:17 pm

Thank you for that.
I glanced over the test suite with analysis and diagrams on the web (http://privat.bahnhof.se/wb432434/pos.htm). These are all open positions, except for #14, which means that in the remaining 15 positions Stockfish should be irrefutable by humans. I checked that conjecture, and indeed in 8(!) out of 15 cases the commentators got it wrong or backwards. How many points would you give for that??
I personally enjoyed this rebuttal the most:

[d]1rN1r1k1/1pq2pp1/2p1nn1p/p2p1B2/3P4/4P2P/PPQ1NPP1/2R2RK1 b - - 0 1

1...Rbxc8 2.Nf4 (allegedly the refutation) Nxf4! 3.Bxc8 Nxg2!

In #14 the alleged best move 1.Nb1, played by Kasparov, is neutralized outright by 1...b5, and Black is fine.
[d]r3r1k1/ppqbbpp1/2pp1nnp/3Pp3/2P1P3/5N1P/PPBN1PP1/R1BQR1K1 w - - 0 1

In #16, after 34.Qxc5 (Stockfish's choice), Black might as well resign.
[d]2r2k2/5p2/2Bp1b1r/2qPp1pp/PpN1P3/1P2Q3/5PPP/4R1K1 w - - 0 1

Interestingly, with the help of Stockfish you might save even this position against a strong human master: after 34.Rc1(?) Qxe3 35.Nxe3(?!) Bd8 36.Rc4(?!) Ba5 37.Nc2(?!) g4 38.Nxb4(??) there follows 38...Rb8 39.Bb5 Bxb4 40.Rxb4 f5!, and White is only minimally better (according to Stockfish).

Never trust your test suite.

Evert · Post by **Evert** » Thu Aug 03, 2017 10:28 am

zullil wrote: Calling this idea "novel" in 2012 seems dubious, at best. Probably should not comment further...

Yes... it's one of those things that make me wonder how it got past the referees. As it is, the paper points out some obvious things and then offers no real idea of how to handle fortress detection.
Saying that the engines "detect" the fortress by having a flat eval seems rather generous; I'd call not returning a draw score a sign of not detecting the fortress.
Still, the paper has a list of interesting fortress positions that I might use if/when I go back to tinkering with fortress detection.

Ferdy · Post by **Ferdy** » Sun Aug 13, 2017 7:56 pm

MEA - Multiple move EPD Analyzer beta interface can be found here.

https://mea.bitballoon.com/

Dann Corbit · Post by **Dann Corbit** » Mon Aug 14, 2017 7:27 am

Ferdy wrote: MEA - Multiple move EPD Analyzer beta interface can be found here.

https://mea.bitballoon.com/

Thanks

Rebel · Post by **Rebel** » Mon Aug 14, 2017 8:22 am

Ferdy wrote: MEA - Multiple move EPD Analyzer beta interface can be found here.

https://mea.bitballoon.com/

Will try.

first25plus5 · Post by **first25plus5** » Sat Jul 20, 2024 1:55 pm

If anyone has the time to recalibrate and retest all sixteen positions with today's engines, that would be appreciated.

OldMan · Post by **OldMan** » Mon Oct 14, 2024 1:36 am

Beginner question:
I assume you add up the points scored on each position to get an engine rating.
At what time control?
I understand you searched for 24 hours to get the results, but is there a specific time control for running the test?

I am returning to chess programming after a long absence.
My old program currently runs on a G4 Mac laptop from 2004 while I work on rewriting it.
I have forgotten a lot and am essentially starting over.

Wm.

Dann Corbit · Post by **Dann Corbit** » Mon Oct 14, 2024 6:11 pm

Multi-answer problems award a different score depending on the assumed correctness of the move chosen.
As programs get stronger and hardware gets more powerful, the old answers (which were more or less correct at the time, given the hardware and software limitations of the day) get improved upon. Something that used to be a purely strategic concept, such as undermining, king safety or rooks on the 8th rank, because no material win could be seen within the horizon, becomes tactical once engines can search deeply enough to find a tactical shot. It is also possible that other positional factors become more important when combined with the search.

A test set like Tony's positional suite has a number of different possible answers, each scoring differently depending upon what your engine chooses.
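To make the multi-answer scoring concrete, here is a minimal Python sketch assuming, as OldMan guessed, that the points earned on each position are simply summed into a total. The key layout, position ids, moves and point values are all made up for illustration; they are not the actual contents of Tony's suite, nor the EPD format used by any particular tool.

```python
from typing import Dict

# Hypothetical answer key: position id -> {move in SAN: points}.
AnswerKey = Dict[str, Dict[str, int]]


def score_answer(key: AnswerKey, position_id: str, engine_move: str) -> int:
    """Points for one position; a move not listed in the key scores 0."""
    return key.get(position_id, {}).get(engine_move, 0)


def score_run(key: AnswerKey, engine_moves: Dict[str, str]) -> int:
    """Total points over every position the engine was tested on."""
    return sum(score_answer(key, pos, mv) for pos, mv in engine_moves.items())


if __name__ == "__main__":
    key = {
        "pos1": {"Nd5": 10, "Bb5": 6, "h3": 3},
        "pos2": {"Rfd1": 10, "Qc2": 8},
    }
    run = {"pos1": "Bb5", "pos2": "Rfd1"}
    print(score_run(key, run))  # 16
```

Partial credit is the point of the design: an engine that picks a reasonable second-best move still earns something, instead of the all-or-nothing scoring of single-answer suites.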

To update such a test suite, one might take several strong programs and run them on high-end hardware for four hours per position with MultiPV set to 10, so that the engine is forced to consider many alternatives. Suppose that the best move found by the engines has a score of 429. We divide this best score by itself (429) and multiply by ten, giving a score of ten.
Suppose that the next-best move has a score of 313. We divide 313 by 429 and multiply by 10, giving a score of about seven, and so on through all the moves until the calculated score drops below one. Then on to the next problem.
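The calibration arithmetic described above can be sketched as follows. This is a minimal Python sketch, assuming the scores from the engines have already been merged into one list of (move, score) pairs; how to combine several programs is not specified in the post. Rounding half up is also an assumption, since the post only says that 313/429 gives "seven". The function name and the moves in the example are hypothetical.

```python
from typing import List, Tuple


def points_from_multipv(
    lines: List[Tuple[str, int]], scale: int = 10
) -> List[Tuple[str, int]]:
    """Turn MultiPV (move, score) pairs into test-suite points.

    Each move gets scale * score / best_score points; we stop at the first
    move whose raw value falls below 1.
    """
    if not lines:
        return []
    ordered = sorted(lines, key=lambda ms: ms[1], reverse=True)
    best = ordered[0][1]
    if best <= 0:
        # The ratio only makes sense when the best move has a positive score.
        raise ValueError("best score must be positive for ratio scoring")
    result = []
    for move, score in ordered:
        raw = scale * score / best
        if raw < 1:
            break
        result.append((move, int(raw + 0.5)))  # round half up (assumption)
    return result


if __name__ == "__main__":
    lines = [("Nd5", 429), ("Bb5", 313), ("h3", 250), ("a4", 120), ("g4", 30)]
    print(points_from_multipv(lines))
    # [('Nd5', 10), ('Bb5', 7), ('h3', 6), ('a4', 3)]
```

Here 30/429 × 10 ≈ 0.7 falls below one, so `g4` gets no credit. Note that the scheme breaks down when the best score is zero or negative (a drawn or losing position), which is why the sketch refuses that case rather than dividing by it.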
