Are the WAC300 positions still a valuable test?

JoAnnP38 · Post by **JoAnnP38** » Mon May 01, 2023 12:12 am

I know some people are still using the positions from Winning at Chess to test the accuracy of their engines, and they are getting pretty close to a 100% success rate on those positions. I just ran the test with the latest version of Pedantic and it found the correct move in 296 out of 300 positions (even though the book discusses 301 positions). However, how much effort are people using these tests putting in to make sure their engine is 100% accurate? Also, is it a conflicting goal of achieving 100% accuracy and strongest Elo? Maybe I'm feeling lazy, but I'm thinking that a 98.7% success rate is good enough and I have bigger fish to fry to strengthen Pedantic. Are there any other public domain test suites I can use that are better than these? I've tried using the STS tests but I just don't have enough information about them to make sure I'm using them correctly.

smatovic · Post by **smatovic** » Mon May 01, 2023 7:47 am

JoAnnP38 wrote: ↑Mon May 01, 2023 12:12 am I've tried using the STS tests but I just don't have enough information about them to make sure I'm using them correctly.

I use STS1-15 LAN by Ferdy. Each position has a "bm" (best move), and the alternative moves are rated with a score, so you can count either the best-move hits or the total score. Recent versions are evaluated by SF15 with 60s, so in principle you test against SF search+eval:

https://github.com/fsmosca/STS-Rating
https://github.com/fsmosca/STS-Rating/b ... LAN_v6.epd

https://talkchess.com/forum3/viewtopic.php?f=2&t=80876
https://talkchess.com/forum3/viewtopic. ... 10#p945710
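To illustrate the two ways of counting mentioned above (best-move hits versus points), here is a minimal Python sketch. It assumes each EPD record carries a `bm` operation plus a `c0` comment of the form `"move=points, move=points, ..."`; that layout, the helper names, and the sample record in the usage note are assumptions for illustration, so check them against the actual file before relying on this.

```python
import re

def parse_epd(line):
    """Split one EPD record into (position, {opcode: operand})."""
    fields = line.strip().split(None, 4)
    position = " ".join(fields[:4])
    rest = fields[4] if len(fields) > 4 else ""
    ops = {}
    # An operation is "opcode operand;" where the operand may be quoted.
    for m in re.finditer(r'(\w+)\s+("[^"]*"|[^;]*);', rest):
        ops[m.group(1)] = m.group(2).strip().strip('"')
    return position, ops

def move_points(ops):
    """Read the assumed c0 layout 'f5=10, Be5+=8' into {move: points}."""
    table = {}
    for item in ops.get("c0", "").split(","):
        if "=" in item:
            # rsplit keeps promotions such as 'e8=Q=10' intact.
            move, pts = item.rsplit("=", 1)
            table[move.strip()] = int(pts)
    return table

def score_move(line, engine_move):
    """Return (is_best_move_hit, points) for the move the engine played."""
    _, ops = parse_epd(line)
    hit = engine_move in ops.get("bm", "").split()
    return hit, move_points(ops).get(engine_move, 0)
```

For example, with an illustrative record such as `1kr5/3n4/q3p2p/p2n2p1/PppB1P2/5BP1/1P2Q2P/3R2K1 w - - bm f5; c0 "f5=10, Be5+=8, Bf2=3, Bg4=2";`, an engine answering `Be5+` would count as a miss under best-move scoring but still earn 8 points under score-based counting. Note that the engine's move has to be in the same notation (SAN or LAN) as the file being scored.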

--
Srdja

JVMerlino · Post by **JVMerlino** » Mon May 01, 2023 8:29 pm

JoAnnP38 wrote: ↑Mon May 01, 2023 12:12 am I know some people are still using the positions from Winning at Chess to test the accuracy of their engines, and they are getting pretty close to a 100% success rate on those positions. I just ran the test with the latest version of Pedantic and it found the correct move in 296 out of 300 positions (even though the book discusses 301 positions). However, how much effort are people using these tests putting in to make sure their engine is 100% accurate? Also, is it a conflicting goal of achieving 100% accuracy and strongest Elo? Maybe I'm feeling lazy, but I'm thinking that a 98.7% success rate is good enough and I have bigger fish to fry to strengthen Pedantic. Are there any other public domain test suites I can use that are better than these? I've tried using the STS tests but I just don't have enough information about them to make sure I'm using them correctly.

It's definitely "good enough", depending on how much time you gave your engine to solve each position. Also note that WAC 230 is not considered a good test position, as the solution is not clear (both Rb4 and Kb5 appear to be valid). My mediocre engine gets 297 correct within five seconds on one core. More than good enough for me.

Testing with WAC is really just a way to make sure that your engine can quickly solve relatively easy tactical positions. You can find lots of test sets here:
https://www.chessprogramming.org/Test-Positions

But mostly, it's all about lots of games. I only use test sets as a nice change of pace (STS is one of them, along with the Eigenmann Endgame Test), and to make sure I haven't broken anything.

jm

syzygy · Post by **syzygy** » Wed May 03, 2023 9:42 pm

JoAnnP38 wrote: ↑Mon May 01, 2023 12:12 am Also, is it a conflicting goal of achieving 100% accuracy and strongest Elo?

In practical terms, yes.

Running test suites is not the best way to track engine progress. Playing many games is.

Some decades ago it wasn't very feasible to play huge numbers of games at a time control that still allows for meaningful play, so people ran test suites. (Also, there wasn't necessarily the awareness that playing many bullet games was the way to go. I think this was Rajlich's main innovation.)

Of course test suites can be a good way to identify major engine weaknesses or bugs in the early phase of development.

lithander · Post by **lithander** » Thu May 04, 2023 12:33 pm

A decent chess engine (which I hope I'm developing) should be able to solve most (>97%) of the positions in less than a second. I use the WAC set mainly to make sure that Leorik doesn't develop new blind spots. For example, I have had aggressive pruning ideas that would improve Elo in self-play but hurt the performance on the WAC set massively.

You could say I use the WAC for regression testing in my development process.
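As a rough illustration of that regression-testing idea, the sketch below compares per-position results from two runs and lists positions that were solved before but are missed now. The result format (a dict from position id to solved/unsolved) and the function names are assumptions made for this example, not taken from any particular engine or tool.

```python
def compare_runs(baseline, current):
    """Compare two test-suite runs.

    Both arguments map a position id (e.g. "WAC.002") to True if the
    engine found the expected move within the time limit.
    Returns (regressions, gains) as sorted lists of position ids.
    """
    regressions = sorted(
        pid for pid, ok in baseline.items()
        if ok and not current.get(pid, False)
    )
    gains = sorted(
        pid for pid, ok in current.items()
        if ok and not baseline.get(pid, False)
    )
    return regressions, gains

def solve_rate(results):
    """Percentage of positions solved in one run."""
    return 100.0 * sum(results.values()) / len(results)
```

Looking at the list of regressed positions rather than only the total matters here: a run can keep the same overall count while trading one solved position for another, which a plain percentage would hide.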

Jouni · Post by **Jouni** » Thu May 04, 2023 8:36 pm

I think the 200-position Arasan suite is quite good: https://www.arasanchess.org/arasan21.epd. With a 15s limit (4 threads), an engine should score about 150. Only some SF clones score 190 or more.

JVMerlino · Post by **JVMerlino** » Fri May 05, 2023 7:07 pm

Jouni wrote: ↑Thu May 04, 2023 8:36 pm I think the 200-position Arasan suite is quite good: https://www.arasanchess.org/arasan21.epd. With a 15s limit (4 threads), an engine should score about 150. Only some SF clones score 190 or more.

It had been a while since I ran the Arasan suite with Myrddin (~2600 elo CCRL), since it had always done so poorly on it. So I gave it another go, and now I'm either VERY embarrassed, or your expectation of scoring 150 at 15s with 4 threads is for a 3000+ elo engine. Because Myrddin only got TEN correct.

To clarify, this was the 2020 version of the test suite, as I didn't know that the one you linked was different from the one I already had. It has about 10 positions that are different from the 2021 suite, but I doubt it would have made much of a difference.

Thoughts?

JoAnnP38 · Post by **JoAnnP38** » Fri May 05, 2023 8:03 pm

JVMerlino wrote: ↑Fri May 05, 2023 7:07 pm
Jouni wrote: ↑Thu May 04, 2023 8:36 pm I think the 200-position Arasan suite is quite good: https://www.arasanchess.org/arasan21.epd. With a 15s limit (4 threads), an engine should score about 150. Only some SF clones score 190 or more.
It had been a while since I ran the Arasan suite with Myrddin (~2600 elo CCRL), since it had always done so poorly on it. So I gave it another go, and now I'm either VERY embarrassed, or your expectation of scoring 150 at 15s with 4 threads is for a 3000+ elo engine. Because Myrddin only got TEN correct.

To clarify, this was the 2020 version of the test suite, as I didn't know that the one you linked was different from the one I already had. It has about 10 positions that are different from the 2021 suite, but I doubt it would have made much of a difference.

Thoughts?

LMAO, I was embarrassed to respond, but Pedantic was only able to get 18 out of 200 at 90s/move! Of course, it's only a single-threaded engine currently rated somewhere between 2400 and 2550 (still waiting for confirmation from CCRL), but still, that was humbling.

JoAnnP38 · Post by **JoAnnP38** » Fri May 05, 2023 8:22 pm

lithander wrote: ↑Thu May 04, 2023 12:33 pm A decent chess engine (which I hope I'm developing) should be able to solve most (>97%) of the positions in less than a second. I use the WAC set mainly to make sure that Leorik doesn't develop new blind spots. For example, I have had aggressive pruning ideas that would improve Elo in self-play but hurt the performance on the WAC set massively.

You could say I use the WAC for regression testing in my development process.

I wish I had been running the tests a little more religiously after every update. Somewhere along the line I am now one position less accurate!

JVMerlino · Post by **JVMerlino** » Fri May 05, 2023 11:59 pm

JoAnnP38 wrote: ↑Fri May 05, 2023 8:22 pm
lithander wrote: ↑Thu May 04, 2023 12:33 pm A decent chess engine (which I hope I'm developing) should be able to solve most (>97%) of the positions in less than a second. I use the WAC set mainly to make sure that Leorik doesn't develop new blind spots. For example, I have had aggressive pruning ideas that would improve Elo in self-play but hurt the performance on the WAC set massively.

You could say I use the WAC for regression testing in my development process.
I wish I had been running the tests a little more religiously after every update. Somewhere along the line I am now one position less accurate!

Not a big deal at all if the engine is stronger. I'll run Myrddin at 90s with one thread overnight and we'll see who is more embarrassed.

jm
