I have two questions.
Precondition:
Assume the test suite would contain all test positions and studies
that were ever composed.
1) What are the top engines for solving such a test suite?
My general impression is that Sting and Crystal are best
when it comes to positions which are (too) hard for Stockfish.
In extreme cases Chest is also interesting.
Otherwise Stockfish is a very good choice.
2) What would be your set of choice of the 3 engines to solve as many positions as possible?
We assume that all 3 run at the same time on different but equally strong machines
and we want to see the solution as quickly as possible.
In this case, we accept a position as solved if at least one engine shows the solution
(this is known as the 'Maximum Coverage Problem').
Thank you.
Top engines or top set of engines for solving test suites
Moderator: Ras
-
- Posts: 47
- Joined: Sat Aug 15, 2020 8:08 am
- Full name: Frank Karger
-
- Posts: 1952
- Joined: Tue Apr 19, 2016 6:08 am
- Location: U.S.A
- Full name: Andrew Grant
Re: Top engines or top set of engines for solving test suites
Stockfish is the best engine for solving any general suite.
The impression that it's not -- and that some SF variant is better -- is the result of a subtle bias against Stockfish.
I.e., someone is putting together a set of test positions. If Stockfish fails to solve one, they think, "Oh, that is interesting. Let's add that."
-
- Posts: 47
- Joined: Sat Aug 15, 2020 8:08 am
- Full name: Frank Karger
Re: Top engines or top set of engines for solving test suites
AndrewGrant wrote: ↑Tue Apr 22, 2025 11:27 am
Stockfish is the best engine for solving any general suite.
The impression that it's not -- and that some SF variant is better -- is the result of a subtle bias against Stockfish.
I.e., someone is putting together a set of test positions. If Stockfish fails to solve one, they think, "Oh, that is interesting. Let's add that."
Makes sense.
Although there could also be another subtle difference:
SF is optimized to be the strongest engine in practical play.
But test suites, even if they are not designed to be 'anti-Stockfish', could
still differ from practical play.
-
- Posts: 666
- Joined: Sun Aug 04, 2013 1:19 pm
Re: Top engines or top set of engines for solving test suites
fkarger wrote: ↑Tue Apr 22, 2025 10:52 am
I have two questions.
Precondition:
Assume the test suite would contain all test positions and studies
that were ever composed.
1) What are the top engines for solving such a test suite?
My general impression is that Sting and Crystal are best
when it comes to positions which are (too) hard for Stockfish.
In extreme cases Chest is also interesting.
Otherwise Stockfish is a very good choice.
2) What would be your set of choice of the 3 engines to solve as many positions as possible?
We assume that all 3 run at the same time on different but equally strong machines
and we want to see the solution as quickly as possible.
In this case, we accept a position as solved if at least one engine shows the solution
(this is known as the 'Maximum Coverage Problem').
Thank you.
1)
Stockfish
LC0
Torch (If you can't get it, try whichever "position solver engine" you like. It doesn't matter if its Elo is much lower.)
All three are super strong and all three are very different if you compare them to each other.
2)
The most difficult: Top Chess Engines Testsuite 2024 v2
https://www.mediafire.com/file/cypaz2t0 ... 2.pgn/file
If this test suite isn't difficult enough, take it and put more studies inside.
To solve as many positions as possible... for what?
It should be clear that Stockfish and ... will solve 1500 Elo problems.
Take 500,000 studies and delete the positions which are solved instantly and which are solved too often.
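A minimal sketch of that filtering step, assuming solve times per position and engine have already been collected. The data layout, the function name `keep_hard_positions`, and both thresholds are hypothetical choices for illustration; "solved instantly" is read here as "some engine solved it within `instant_seconds`", and "solved too often" as "more than `max_solver_fraction` of the engines solved it".

```python
from typing import Dict, List, Optional

# results[position_id][engine_name] = seconds to solution, or None if unsolved.
Results = Dict[str, Dict[str, Optional[float]]]


def keep_hard_positions(
    results: Results,
    instant_seconds: float = 1.0,      # assumed cutoff for "solved instantly"
    max_solver_fraction: float = 0.5,  # assumed cutoff for "solved too often"
) -> List[str]:
    """Return the ids of positions that are neither trivial nor widely solved."""
    kept: List[str] = []
    for position_id, times in results.items():
        solve_times = [t for t in times.values() if t is not None]
        solved_instantly = any(t <= instant_seconds for t in solve_times)
        solved_too_often = len(solve_times) > max_solver_fraction * len(times)
        if not solved_instantly and not solved_too_often:
            kept.append(position_id)
    return kept


if __name__ == "__main__":
    # Made-up timings for three placeholder engines.
    demo: Results = {
        "study_1": {"engine_a": 0.2, "engine_b": None, "engine_c": None},
        "study_2": {"engine_a": 30.0, "engine_b": None, "engine_c": None},
        "study_3": {"engine_a": 5.0, "engine_b": 6.0, "engine_c": None},
    }
    print(keep_hard_positions(demo))
```

Both thresholds depend on hardware and on the size of the engine pool, so they would need tuning for any real collection of studies.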
-
- Posts: 47
- Joined: Sat Aug 15, 2020 8:08 am
- Full name: Frank Karger
Re: Top engines or top set of engines for solving test suites
Hai wrote: ↑Tue Apr 22, 2025 11:58 am
fkarger wrote: ↑Tue Apr 22, 2025 10:52 am
2) What would be your set of choice of the 3 engines to solve as many positions as possible?
We assume that all 3 run at the same time on different but equally strong machines
and we want to see the solution as quickly as possible.
In this case, we accept a position as solved if at least one engine shows the solution
(this is known as the 'Maximum Coverage Problem').
Thank you.
To solve as many positions as possible... for what?
It should be clear that Stockfish and ... will solve 1500 Elo problems.
Take 500,000 studies and delete the positions which are solved instantly and which are solved too often.
The second question is about the best team of solvers if you had to choose a team of 3 solvers.
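To make the team question concrete, here is a rough sketch of picking a team once it is known which positions each engine solves. It assumes those per-engine solved sets already exist; the engine labels and position ids are placeholders, not real results, and solve time is ignored. `greedy_team` is the standard greedy heuristic for maximum coverage (repeatedly add the engine that solves the most still-unsolved positions), and `exact_team` simply tries every combination, which is feasible for a team of 3 from a small pool.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Solved = Dict[str, Set[str]]  # engine name -> ids of positions it solved


def greedy_team(solved: Solved, team_size: int = 3) -> List[str]:
    """Greedy heuristic: always add the engine with the largest new coverage."""
    covered: Set[str] = set()
    remaining = dict(solved)
    team: List[str] = []
    while remaining and len(team) < team_size:
        # Sorting first makes ties resolve alphabetically, so runs are repeatable.
        best = max(sorted(remaining), key=lambda name: len(remaining[name] - covered))
        if not remaining[best] - covered:
            break  # no engine adds anything new
        covered |= remaining.pop(best)
        team.append(best)
    return team


def exact_team(solved: Solved, team_size: int = 3) -> Tuple[str, ...]:
    """Exhaustive search over all teams; fine for small engine pools."""
    size = min(team_size, len(solved))
    return max(
        combinations(sorted(solved), size),
        key=lambda team: len(set().union(*(solved[name] for name in team))),
    )


if __name__ == "__main__":
    # Placeholder data, not measurements.
    demo: Solved = {
        "engine_a": {"p1", "p2", "p3", "p4"},
        "engine_b": {"p3", "p4", "p5"},
        "engine_c": {"p5", "p6"},
        "engine_d": {"p1", "p2"},
    }
    print(greedy_team(demo, team_size=2))
    print(exact_team(demo, team_size=2))
```

The greedy choice is not guaranteed to be optimal in general, which is why the exhaustive variant is worth running when the pool is small enough.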
-
- Posts: 3361
- Joined: Sat Feb 16, 2008 7:38 am
- Full name: Peter Martan
Re: Top engines or top set of engines for solving test suites
And they should be different. What sense would there be in positional testing anyway if I just wanted, and got, the same results as in game playing?
The "problem" many testers and programmers have with positional testing, as it is done most of the time, is just this difference between the results from game playing and from positional tests. To me these are features, not bugs.

What you have to deal with (but I'm sure you know this): there isn't one test suite, just as there isn't one single chess position, that answers all the questions you can have about the "playing strength" of engines (or of humans). Not even the very basic starting position of classical chess really means much more than other positions of interest do. You see that best today if you try to get statistically meaningful results from eng-eng games played only from the starting position (without books or given opening test positions): not even with very short TC and weak hardware do you get outside the error bars with a reasonable number of games, whether for two single engines or more, their versions and settings. This kind of eng-eng testing has been drawn-dead for quite a while already, too.
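To illustrate the error-bar point, here is a small sketch using the usual logistic Elo model and a normal approximation for the score. The function names and the win/draw/loss counts in the example are assumptions for illustration, not figures from any real match.

```python
import math
from typing import Tuple


def elo_from_score(score: float) -> float:
    """Elo difference implied by an average score under the logistic model."""
    score = min(max(score, 1e-6), 1.0 - 1e-6)  # avoid division by zero at 0 or 1
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_error_bar(
    wins: int, draws: int, losses: int, z: float = 1.96
) -> Tuple[float, float, float]:
    """Return (elo, lower, upper) with an approximate 95% interval for z=1.96."""
    games = wins + draws + losses
    if games == 0:
        raise ValueError("need at least one game")
    score = (wins + 0.5 * draws) / games
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    variance = (
        wins * (1.0 - score) ** 2
        + draws * (0.5 - score) ** 2
        + losses * (0.0 - score) ** 2
    ) / games
    margin = z * math.sqrt(variance / games)
    return (
        elo_from_score(score),
        elo_from_score(score - margin),
        elo_from_score(score + margin),
    )


if __name__ == "__main__":
    # Hypothetical drawish match: 1000 games, 80% draws.
    elo, low, high = elo_with_error_bar(wins=100, draws=800, losses=100)
    print(f"{elo:+.1f} Elo, interval [{low:+.1f}, {high:+.1f}]")
```

With these made-up counts the interval is roughly ±10 Elo around zero, so a real difference of a few Elo between two engines would not be distinguishable at that number of games.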
So what makes the really big differences is which suite of test positions you use, which test tool, and which hardware-TC for what kind of engine pool.
The positions can come from the opening (to let engines play them out against each other; of course you can use opening positions for positional testing too, MEA is one way to do that and I like to use it as well, and just to mention it, I also have a suite of 1001 UHO positions in MEA syntax) or from the midgame and endgame, and they can be chosen especially as anti-engine puzzles or taken from eng-eng games. NICE is an example of the latter; it's a pity the latest version of Ed's is still buggy
viewtopic.php?p=978298#p978298
and evaluated with too little hardware-time
viewtopic.php?p=975854#p975854
Those 10" are used single-threaded with MultiPV=4 (see the postings below the linked one, and the second recent thread about NICE). That's too little hardware-time for me to get halfway reliable evals of positions whose candidate moves are close to each other in their WDL chances. The biggest MEA suite I use has 10124 positions, which I let SF evaluate for 1 minute/pos. with 30 threads of a 16x3.5GHz CPU and MultiPV=4.
As for the test tool, besides MEA I still like EloStatTS from Frank Schubert very much.
Maybe an even bigger difference is whether you let all the positions you're interested in be played out eng-eng, head to head, one by one, or just trust other ways of adjudicating and evaluating certain kinds of positions without thousands of games played out for each and every one of the positions, engines, their versions, nets, parameter settings, patches...

Last edited by peter on Tue Apr 22, 2025 1:14 pm, edited 1 time in total.
Peter.
-
- Posts: 47
- Joined: Sat Aug 15, 2020 8:08 am
- Full name: Frank Karger
Re: Top engines or top set of engines for solving test suites
peter wrote: ↑Tue Apr 22, 2025 12:53 pm
And they should be different. What sense would there be in positional testing anyway if I just wanted, and got, the same results as in game playing?
The "problem" many testers and programmers have with positional testing, as it is done most of the time, is just this difference between the results from game playing and from positional tests. To me it's a feature, not a bug.
What you have to deal with (but I'm sure you know this): there isn't one test suite, just as there isn't one single chess position, that answers all the questions you can have about the "playing strength" of engines (or of humans). Not even the very basic starting position of classical chess really means much more than other positions of interest do. You see that best today if you try to get statistically meaningful results from eng-eng games played only from the starting position (without books or given opening test positions): not even with very short TC and weak hardware do you get outside the error bars with a reasonable number of games, whether for two single engines or more, their versions and settings. This kind of eng-eng testing has been drawn-dead for quite a while already, too.
So what makes the really big differences is which suite of test positions you use, which test tool, and which hardware-TC for what kind of engine pool.
The positions can come from the opening (to let engines play them out against each other; of course you can use opening positions for positional testing too, MEA is one way to do that and I like to use it as well, and just to mention it, I also have a suite of 1001 UHO positions in MEA syntax) or from the midgame and endgame, and they can be chosen especially as anti-engine puzzles or taken from eng-eng games. NICE is an example of the latter; it's a pity the latest version of Ed's is still buggy
viewtopic.php?p=978298#p978298
and evaluated with too little hardware-time
viewtopic.php?p=975854#p975854
Those 10" are used single-threaded with MultiPV=4 (see the postings below the linked one, and the second recent thread about NICE). That's too little hardware-time for me to get halfway reliable evals of positions whose candidate moves are close to each other in their WDL chances. The biggest MEA suite I use has 10124 positions, which I let SF evaluate for 1 minute/pos. with 30 threads of a 16x3.5GHz and MultiPV=4.
As for the test tool, besides MEA I still like EloStatTS from Frank Schubert very much.
Maybe an even bigger difference is whether you let all the positions you're interested in be played out eng-eng, head to head, one by one, or trust other ways of adjudicating certain kinds of positions without letting thousands of games be played out from each and every single position.
Thank you for your insights, Peter!
I agree that it is difficult to determine the playing strength of engines by using test positions.
That is probably the reason why they use billions of them in machine learning.
Another interesting question could be: what is the smallest number of test positions
needed to estimate the playing strength of an engine precisely?
This could have practical relevance in machine learning or engine optimization.
At the moment this is not too important to me.
Currently I find it more interesting to see the engines having problems solving
some of the positions and then to understand why.
-
- Posts: 3611
- Joined: Wed Mar 08, 2006 8:15 pm
- Full name: Jouni Uski
Re: Top engines or top set of engines for solving test suites
In my test suites, ShashChess High Tal is the best solver, specifically version 35.1. Later versions are much weaker.
Jouni
-
- Posts: 47
- Joined: Sat Aug 15, 2020 8:08 am
- Full name: Frank Karger
Re: Top engines or top set of engines for solving test suites
Thank you Jouni!
I will try that version.
Is this https://github.com/amchess/ShashChess/releases/tag/35.1
the correct version (I don't see High Tal there)?
-
- Posts: 3611
- Joined: Wed Mar 08, 2006 8:15 pm
- Full name: Jouni Uski
Re: Top engines or top set of engines for solving test suites
High Tal is a UCI parameter in the engine.
Jouni