I've read about people using test positions to find strategic weaknesses in their engine, and I think it's about time for me to try this out. I've seen some people here have some success with "IQ Test". I have some questions, though, on how you actually address these strategic weaknesses and how much you can rely on test scores to improve your engine.
My main question is: can you use a set of test positions to tune your search parameters the same way you can for evaluation? I assume that if you take a bunch of positions that only have one clear best move, and then give your engine limited time to search them all, you get an error that you can minimize by changing things like the depths at which certain heuristics are applied.
The main problem I see is that the idea is so obvious that either there must be something glaringly stupid about it, or it is something everybody already does and I am simply unaware of it. Can somebody enlighten me on this?
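To make the idea concrete, here is a rough sketch of the kind of tuning loop I have in mind (not something I've actually run): score a parameter vector by how many one-best-move positions the engine solves within a fixed time limit, and hill-climb on that score. It assumes python-chess and an engine that exposes its search parameters as UCI options; the engine path and the option names ("LMRDepth", "NullMoveReduction") are made up for illustration.

```python
import chess
import chess.engine

ENGINE_PATH = "./my_engine"   # placeholder path to a UCI engine
TIME_PER_POSITION = 0.05      # seconds per position

# (FEN, best move in UCI notation); WAC.001 as a tiny placeholder suite
TEST_SUITE = [
    ("2rr3k/pp3pp1/1nnqbN1p/3pN3/2pP4/2P3Q1/PPB4P/R4RK1 w - - 0 1", "g3g6"),
]

def solved_count(params: dict) -> int:
    """Count how many suite positions the engine solves at the time limit
    when configured with the given (hypothetical) UCI options."""
    solved = 0
    with chess.engine.SimpleEngine.popen_uci(ENGINE_PATH) as engine:
        engine.configure(params)
        for fen, best in TEST_SUITE:
            result = engine.play(chess.Board(fen),
                                 chess.engine.Limit(time=TIME_PER_POSITION))
            if result.move == chess.Move.from_uci(best):
                solved += 1
    return solved

def hill_climb(start: dict, steps: dict, iterations: int = 20) -> dict:
    """Crude coordinate-wise hill climb: nudge each parameter up and down
    and keep any change that solves more positions."""
    best_params, best_score = dict(start), solved_count(start)
    for _ in range(iterations):
        for name, step in steps.items():
            for delta in (+step, -step):
                trial = dict(best_params)
                trial[name] += delta
                score = solved_count(trial)
                if score > best_score:
                    best_params, best_score = trial, score
    return best_params

if __name__ == "__main__":
    print(hill_climb({"LMRDepth": 3, "NullMoveReduction": 2},
                     {"LMRDepth": 1, "NullMoveReduction": 1}))
```

Is maximizing the solved count (or minimizing the miss count) over such a suite actually a sensible objective for search parameters, or does it just overfit to the suite?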
Tuning search parameters
Moderator: Ras
-
- Posts: 58
- Joined: Wed Mar 18, 2020 10:00 pm
- Full name: Jonathan McDermid
Tuning search parameters
Clovis GitHub
-
- Posts: 60
- Joined: Sat Dec 11, 2021 5:03 am
- Full name: expositor
Re: Tuning search parameters
I've never tried this, but I remember the Mantissa dev doing something similar. Here's what I recall, though I may be wrong:
He found several hundred thousand¹ positions that Mantissa blundered in self-play games – positions which had an only move² that Mantissa missed – and then tuned to maximize the number that she could play correctly. It didn't have any positive effect on general playing strength; his takeaway was either that the technique simply didn't work³ or that it would require much more data.
¹ Maybe it was only tens of thousands? But I think it was more than that.
² Determined by using several other engines, which had to unanimously agree.
³ Although unusual, perhaps that wouldn't be terribly surprising; puzzle strength, for example, seems to correlate more weakly with general playing strength than one might expect.
Actually, I suppose we should just ask him. @jtwright did I get all that (somewhat) right?
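To make footnote ² a little more concrete, the filtering step might look something like the sketch below (this is my guess at the shape of it, not Mantissa's actual tooling): keep a blundered position only if several reference engines unanimously agree on the same best move. It uses python-chess; the referee paths and the search depth are placeholders.

```python
import chess
import chess.engine

REFEREE_PATHS = ["./engine_a", "./engine_b", "./engine_c"]  # placeholders
SEARCH_LIMIT = chess.engine.Limit(depth=18)                 # arbitrary depth

def unanimous_best_move(fen: str):
    """Return the agreed-upon best move if every referee picks the same
    move in the position, otherwise None."""
    choices = []
    for path in REFEREE_PATHS:
        with chess.engine.SimpleEngine.popen_uci(path) as engine:
            choices.append(engine.play(chess.Board(fen), SEARCH_LIMIT).move)
    return choices[0] if len(set(choices)) == 1 else None

def build_suite(blunder_fens):
    """From positions the engine blundered in self-play, keep only those
    with a unanimously agreed 'only move', as (fen, move) pairs."""
    suite = []
    for fen in blunder_fens:
        move = unanimous_best_move(fen)
        if move is not None:
            suite.append((fen, move.uci()))
    return suite
```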
-
- Posts: 16
- Joined: Fri Dec 27, 2019 8:47 pm
- Full name: Jacek Dermont
Re: Tuning search parameters
There is a (quite oldish by today's standards) paper on how they used genetic algorithms to evolve the evaluation function and tune search parameters: https://arxiv.org/abs/1711.08337
They used several thousand tactical positions (a low number by today's standards), and the fitness was the number of nodes needed to reach the desired move. At the time, they had some moderate success.
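If I remember the setup right, that fitness measure was roughly along the lines of the sketch below (my reconstruction, not code from the paper): find the smallest node budget at which the engine settles on the desired move, and sum that over the suite, with lower being better. The engine path and the node cap are placeholders.

```python
import chess
import chess.engine

ENGINE_PATH = "./my_engine"   # placeholder path to a UCI engine
NODE_CAP = 10_000_000         # give up beyond this many nodes

def nodes_to_solve(fen: str, best_uci: str) -> int:
    """Smallest node budget (from a doubling sequence) at which the engine's
    chosen move equals the desired move; NODE_CAP if it never gets there."""
    board = chess.Board(fen)
    target = chess.Move.from_uci(best_uci)
    with chess.engine.SimpleEngine.popen_uci(ENGINE_PATH) as engine:
        budget = 1_000
        while budget <= NODE_CAP:
            if engine.play(board, chess.engine.Limit(nodes=budget)).move == target:
                return budget
            budget *= 2
    return NODE_CAP

def fitness(suite) -> int:
    """Lower is better: total nodes needed over the whole test suite."""
    return sum(nodes_to_solve(fen, best) for fen, best in suite)
```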

-
- Posts: 5695
- Joined: Tue Feb 28, 2012 11:56 pm
Re: Tuning search parameters
jmcd wrote: ↑Wed Dec 07, 2022 5:42 am
I've read about people using test positions to find strategic weaknesses in their engine, and I think it's about time for me to try this out. I've seen some people here have some success with "IQ Test". I have some questions, though, on how you actually address these strategic weaknesses and how much you can rely on test scores to improve your engine.
My main question is: can you use a set of test positions to tune your search parameters the same way you can for evaluation? I assume that if you take a bunch of positions that only have one clear best move, and then give your engine limited time to search them all, you get an error that you can minimize by changing things like the depths at which certain heuristics are applied.
The main problem I see is that the idea is so obvious that either there must be something glaringly stupid about it, or it is something everybody already does and I am simply unaware of it. Can somebody enlighten me on this?
I don't see a fundamental difference between tuning evaluation parameters and tuning search parameters by running an engine on a set of test positions.
However, tuning an engine on test positions will only optimise the engine for that specific set of positions, not for playing real games. The best way to tune an engine is to let it play many (fast) games, typically against its unmodified self.
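As a minimal sketch of what that self-play approach looks like (real setups normally use a match manager such as cutechess-cli plus a statistical stopping rule like SPRT, not a hand-rolled loop): play fast games between the modified and unmodified engine, alternating colours, and compare scores. It assumes python-chess; the engine path and the option name are made up for illustration.

```python
import chess
import chess.engine

ENGINE_PATH = "./my_engine"           # placeholder path to a UCI engine
CANDIDATE_OPTIONS = {"LMRDepth": 4}   # made-up option for the modified version
MOVE_TIME = 0.05                      # seconds per move

def play_game(white, black) -> float:
    """Play one fast game; return the score from white's point of view."""
    board = chess.Board()
    while not board.is_game_over(claim_draw=True):
        engine = white if board.turn == chess.WHITE else black
        board.push(engine.play(board, chess.engine.Limit(time=MOVE_TIME)).move)
    return {"1-0": 1.0, "0-1": 0.0}.get(board.result(claim_draw=True), 0.5)

def run_match(games: int = 100) -> float:
    """Score of the candidate against the unmodified baseline, alternating colours."""
    with chess.engine.SimpleEngine.popen_uci(ENGINE_PATH) as baseline, \
         chess.engine.SimpleEngine.popen_uci(ENGINE_PATH) as candidate:
        candidate.configure(CANDIDATE_OPTIONS)
        score = 0.0
        for g in range(games):
            if g % 2 == 0:
                score += play_game(candidate, baseline)
            else:
                score += 1.0 - play_game(baseline, candidate)
    return score / games
```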