Testing the Search Function

aaronmell · Post by **aaronmell** » Sun Oct 26, 2014 8:44 pm

Any advice on how to test a search implementation for correctness?

I've written a few tests to catch some of the edge cases like 3 move repetitions and 50 move rule. I am not quite sure though, how to test the general overall search function for correctness, especially as the complexity increases by adding transposition tables, quiescence search, etc.

I've thought about adding tests that test a number of endgame positions x moves to mate as an additional validation function, which I think would provide some benefit, just not sure how much though.

Stan Arts · Post by **Stan Arts** » Sun Oct 26, 2014 9:57 pm

A lot of testpositions with special case themes and to which you know the solution and in how many ply. Usually you can simply construct such a position on the fly else you'll come across a lot of interesting positions in threads posted on here in the past.

But really simply observing the program play is a sure way to run into and catch almost any bug eventually.
All is fine till some special case where you notice some strange behaviour, look into the logs, stare at your code, stare at the walls, lose a few hours of your life then feel secure you've fixed everything till you notice..

A chessprogram is so complex that there's always bugs and oversights no matter how meticulous and smart you are.
Have to spend time with the program for it to mature but I suppose that's also the point.

PK · Post by PK » Sun Oct 26, 2014 10:40 pm

Most programs have incorrect search. TSCP has correct one, but its strength is about 1700 Elo. Techniques that can bring a program from 1700 to 3200 Elo progressively introduce more and more incorrectness. The real trick is to use these techniques in such a way that all their errors are hidden.

Having said that, there are things that a search has to do correctly. Beside things that You mentioned (repetitions, draw by 50 move rule), many search bugs can be detected by looking at simple endgames. If a program doesn't checkmate with K+R vs K, it's obviously broken. If it cannot see a simple win like 8/1k3ppp/8/5PPP/K7/8/8/8 w - - 0 1 it has a problem. If it gets these two right, harder stuff can be found in endgame manuals. In general endgames are good for exposing search errors.

bob · Post by **bob** » Sun Oct 26, 2014 11:51 pm

PK wrote:Most programs have incorrect search. TSCP has correct one, but its strength is about 1700 Elo. Techniques that can bring a program from 1700 to 3200 Elo progressively introduce more and more incorrectness. The real trick is to use these techniques in such a way that all their errors are hidden.

Having said that, there are things that a search has to do correctly. Beside things that You mentioned (repetitions, draw by 50 move rule), many search bugs can be detected by looking at simple endgames. If a program doesn't checkmate with K+R vs K, it's obviously broken. If it cannot see a simple win like 8/1k3ppp/8/5PPP/K7/8/8/8 w - - 0 1 it has a problem. If it gets these two right, harder stuff can be found in endgame manuals. In general endgames are good for exposing search errors.

Your examples are not very good. whether you can mate with K+R vs K is not always a search issue. And playing any type of game where evaluation is involved hides search issues.

Best way to debug is to run a few positions to a fairly minimal depth (say 6 plies) and dump the search tree. Then check what is searched in what order (hash move, which captures, killers, etc. See what was reduced or extended or pruned. Etc.

Uri Blass · Post by **Uri Blass** » Mon Oct 27, 2014 8:29 am

bob wrote:
PK wrote:Most programs have incorrect search. TSCP has correct one, but its strength is about 1700 Elo. Techniques that can bring a program from 1700 to 3200 Elo progressively introduce more and more incorrectness. The real trick is to use these techniques in such a way that all their errors are hidden.

Having said that, there are things that a search has to do correctly. Beside things that You mentioned (repetitions, draw by 50 move rule), many search bugs can be detected by looking at simple endgames. If a program doesn't checkmate with K+R vs K, it's obviously broken. If it cannot see a simple win like 8/1k3ppp/8/5PPP/K7/8/8/8 w - - 0 1 it has a problem. If it gets these two right, harder stuff can be found in endgame manuals. In general endgames are good for exposing search errors.
Your examples are not very good. whether you can mate with K+R vs K is not always a search issue. And playing any type of game where evaluation is involved hides search issues.

Best way to debug is to run a few positions to a fairly minimal depth (say 6 plies) and dump the search tree. Then check what is searched in what order (hash move, which captures, killers, etc. See what was reduced or extended or pruned. Etc.

[D]8/8/8/8/8/5k2/6R1/7K w - - 0 1

I think that even with no evaluation today programs should search deep enough to see the shortest mate in KR vs K so if your program does not see the shortest mate in less than a minute or see forced mate but cannot win then it clearly has a problem in the search that you need to fix.

Stockfish(3 cores) without tablebases see mate in 16 in few seconds.
Senpai(3 cores) needs more time but still see mate in 16 after 25 seconds without using a fast computer.

Uri Blass · Post by **Uri Blass** » Mon Oct 27, 2014 9:12 am

PK wrote:Most programs have incorrect search. TSCP has correct one, but its strength is about 1700 Elo. Techniques that can bring a program from 1700 to 3200 Elo progressively introduce more and more incorrectness. The real trick is to use these techniques in such a way that all their errors are hidden.

Having said that, there are things that a search has to do correctly. Beside things that You mentioned (repetitions, draw by 50 move rule), many search bugs can be detected by looking at simple endgames. If a program doesn't checkmate with K+R vs K, it's obviously broken. If it cannot see a simple win like 8/1k3ppp/8/5PPP/K7/8/8/8 w - - 0 1 it has a problem. If it gets these two right, harder stuff can be found in endgame manuals. In general endgames are good for exposing search errors.

I do not think that tscp has a correct search.

If I define a correct search as search that can give the right move in every position if you search for enough time then it certainly does not have a correct search because based on my memory there is a limit for the number of plies that tscp can search forward and not a big limit like 20000 when it is obvious that there is no mate in 10000 because there is going to be a draw by the 50 move rule earlier.

I also do not think that pruning techniques like LMR or null move pruning with verification search mean that the search is not correct search.

You can say that stockfish and other top programs do not have a correct search for different reasons(for example hash collisions can lead to a wrong move even if the probability is only 1 to 10^10)

PK · Post by PK » Mon Oct 27, 2014 10:46 am

KR vs K allowed me to detect an obvious, bug in my first program. It kept showing mate scores without delivering checkmate (because of some mate scores incosistency). Fine no. 70 is a good test of hash table. Probably it should be possible to build a set of progressively more difficult endgame positions for testing search between these two. To minimize the influence of eval, wins should be crushing (checkmate, overwhelming material gain).

As for "correctness" - Uri is right that reductions are always correct in game-theoretic sense (given enough time, engine still should find a win). Speculative prunings (including null move) are not. Prunings turned into reductions (like verified null move) should be more correct, assuming perfect implementation of hash table, but there is ample scope for a "recursive error", becoming smaller and smaller but never eradiated completely.

The thing is, null move is verified only when there is sufficient depth remaining. Because of that, at some depth verification is done by the search that uses null move without verification, and therefore is incorrect in game-theoretical sense.

hgm · Post by **hgm** » Mon Oct 27, 2014 11:22 am

KPK (Ke1, Pe2, Ke8) has also always been very useful for me.

Stan Arts · Post by **Stan Arts** » Mon Oct 27, 2014 11:34 am

Uri Blass wrote: [D]8/8/8/8/8/5k2/6R1/7K w - - 0 1

Nemeton needs 5 seconds i5 1 core. No tricks just hash because this position is all hashtable.

But what's interesting to notice and reason to post this is how shorter and shorter mates get found here. Rarely see that happen quite so graphic.

Code: Select all

20  552    364     8301955    Rg8 Ke4 Kg2 Kd4 Kf3 Kd5 Rd8+ Ke5 Ke3 Kf5 Rd5+ Ke6 Ke4 Kf6 Rd6+ Kf7 Ke5 Ke7 Rd3 Ke8 Rd5 Ke7 Rd4 Kf7 Rd8 Ke7 Rd5 Kf7 Rd7+ Kf8 Rd4 Ke7 Rd2 Kf7 Rd8 Ke7 Rd5 Kf7 Rd7+ Kf8 Rd4 Ke7 Rd2 Kf7 Rd8 
21  552    447     10236107   Rg8 Ke4 Kg2 Kd4 Kf3 Kd5 Rd8+ Ke5 Ke3 Kf5 Rd5+ Ke6 Ke4 Kf6 Rd6+ Kf7 Ke5 Ke7 Rd3 Kf7 Rd8 Ke7 Rd5 Kf7 Rd7+ Kf8 Rd4 Kf7 Rd8 Ke7 Rd5 Kf7 Rd7+ Kf8 Rd4 Kf7 Rd8 
22  29860  521     11900629   Rg8 <++> 
22  29944  584     13233180   Rg8 Ke3 Rd8 Ke4 Kg2 Ke3 Kg3 Ke4 Kg4 Ke5 Kf3 Kf5 Rd5+ Kf6 Ke4 Ke6 Rd4 Kf6 Rd6+ Kf7 Ke5 Ke7 Rd3 Kf7 Kf5 Ke7 Rd5 Kf8 Kf6 Ke8 Rd6 Kd8 M28#
22  29946  695     15603510   Rb2 Kf4 Rb4+ Kg5 Kg2 Kf5 Kf3 Ke5 Rb5+ Ke6 Ke4 Ke7 Re5+ Kd6 Kd4 Kc6 Re6+ Kc7 Kd5 Kd7 Rf6 Ke7 Rf4 Kd7 Rg4 Ke7 Rf4 Kd7 Rg4 Ke7 Rf4 M27#
22  29952  706     15815131   Ra2 Ke3 Kg2 Ke4 Kf2 Kd4 Ra5 Ke4 Ke2 Kd4 Kf3 Kd3 Ra4 Kc3 Ke3 Kb3 Re4 Kc3 Rh4 Kb3 Kd4 Ka2 Re4 Kb1 Re7 Ka1 Rd7 Kb2 Rd5 Kc2 Rc5+ M24#
23  29956  751     16730150   Ra2 Ke3 Kg2 Ke4 Re2+ Kd5 Kf3 Kd4 Re4+ Kc5 Kf4 Kd5 Kf5 Kd6 Rd4+ Kc5 Ke5 Kc6 Rg4 Kd7 Rg5 Kc7 Ke6 Kc6 Re5 M22#
23  29958  810     17924198   Rg8 Ke3 Kg2 Ke4 Re8+ Kd3 Kf3 Kd4 Rd8+ Kc4 Ke4 Kc5 Rd7 Kc6 Rd4 Kc5 Ke5 Kc6 Rg4 Kd7 Rg5 Kc7 Ke6 Kc6 Re5 M21#
24  29960  930     20354585   Rg8 Ke3 Kg2 Ke4 Re8+ Kd3 Kf3 Kd4 Rd8+ Kc4 Kf4 Kc5 Ke4 Kc4 Rc8+ Kb4 Kd3 Kb5 Kd4 Kb6 Rc5 Ka7 Ke5 Ka6 Kd6 Kb6 Rd5 Kb7 Rb5+ Ka6 Kc6 Ka7 Kc7 Ka6 Re5 Ka7 Ra5+ M20#
24  29962  968     21128487   Ra2 Kf4 Kg2 Ke4 Re2+ Kd5 Kf3 Kd4 Rd2+ Kc4 Ke4 Kc5 Rd1 Kc4 Rc1+ Kb4 Kd4 Kb5 Rc8 Kb6 Rc5 Ka7 Ke5 Ka6 Kd6 Kb6 Rd5 Kb7 Rb5+ Ka6 Kc6 Ka7 Kc7 Ka6 Re5 Ka7 Ra5+ M19#
25  29964  1078    23364115   Ra2 Kf4 Kg2 Ke4 Ra4+ Kd5 Kf3 Ke5 Ra5+ Kd6 Ke4 Kc6 Ke5 Kb6 Rd5 Kc7 Ke6 Kc6 Rg5 Kc7 Rc5+ Kb7 Kd6 Kb6 Rd5 Kb7 Rb5+ Ka6 Kc6 Ka7 Kc7 Ka6 Re5 Ka7 Ra5+ M18#
25  29966  1126    24329129   Rg8 Ke3 Rg4 Kf2 Kh2 Kf3 Kh3 Ke2 Rg3 Kd2 Kg2 Ke2 Kg1 Kd1 Kf1 Kd2 Kf2 Kd1 Ke3 Kc1 Rg4 Kb1 Re4 Kc2 Kd4 Kd2 Re3 Kc2 Re2+ Kb3 Rh2 Kb4 Rh5 Kb3 Rh2 Kb4 Rh5 Kb3 Rh2 M17#
26  29966  1280    27619554   Rg8 Ke4 Kg2 Kd4 Kf3 Kd5 Rd8+ Kc6 Ke4 Kc5 Rd5+ Kc4 Rg5 Kc3 Re5 Kc4 Ke3 Kc3 Rc5+ Kb4 Kd4 Kb3 Rc8 Kb2 Kd3 Kb3 Rc4 Ka2 Rc5 Kb3 Rf5 Kb4 Re5 Kb3 Rb5+ Ka4 Kc4 Ka3 Kc3 Ka2 Rb8 Ka1 Kc2 Ka2 Ra8+ M17#
27  29966  1473    31744568   Rg8 Ke3 Rg4 Kf3 Ra4 Kg3 Kg1 Kf3 Kf1 Ke3 Ke1 Kd3 Kf2 Kc3 Ke2 Kb3 Rh4 Kc3 Rg4 Kc2 Rc4+ Kb3 Kd3 Ka2 Rc5 Kb3 Rb5+ Ka4 Kc4 Ka3 Kc3 Ka2 Rb8 Ka1 Kc2 Ka2 Ra8+ M17#
28  29968  1631    35111732   Rg8 Ke3 Rg4 Kf3 Ra4 Kg3 Kg1 Kf3 Kf1 Ke3 Ke1 Kd3 Kf2 Kc3 Ke2 Kb3 Rh4 Kc3 Rg4 Kc2 Rg3 Kb1 Re3 Kb2 Kd3 Kb3 Re4 Ka2 Re5 M16#
29  29968  1794    38662501   Rg8 Ke3 Rg4 Kf3 Ra4 Kg3 Kg1 Kf3 Kf1 Ke3 Ke1 Kd3 Kf2 Kc3 Ke2 Kb3 Rh4 Kc3 Rf4 Kc2 Rd4 Kc3 Ra4 Kb3 Rh4 Kc3 Rf4 Kc2 Rd4 Kc3 Ra4 M16#
30  29968  2027    43664705   Rg8 Ke3 Rg4 Kf3 Ra4 Kg3 Kg1 Kf3 Kf1 Ke3 Ke1 Kd3 Kf2 Kc3 Ke2 Kb3 Rh4 Kc3 Rg4 Kc2 Rg3 Kb1 Re3 Kc1 Rc3+ Kb1 Ke3 Ka1 Kd2 Kb2 Re3 Ka1 Rb3 Ka2 Kc2 Ka1 Ra3+ M16#
31  29968  2243    48381985   Rg8 Ke3 Rg4 Kf3 Ra4 Kg3 Kg1 Kf3 Kf1 Ke3 Ke1 Kd3 Kf2 Kc3 Ke2 Kb3 Rh4 Kc3 Rg4 Kc2 Rg3 Kb1 Kd2 Kb2 Re3 Ka1 Rb3 Ka2 Kc2 Ka1 Ra3+ M16#
32  29968  2468    53298699   Rg8 Ke3 Rg4 Kf3 Ra4 Kg3 Kg1 Kf3 Kf1 Ke3 Ke1 Kd3 Kf2 Kc3 Ke2 Kb3 Rh4 Kc3 Rg4 Kc2 Rg3 Kc1 Rc3+ Kb1 Kd2 Kb2 Re3 Ka1 Rb3 Ka2 Kc2 Ka1 Ra3+ M16#
33  29968  2701    58322494   Rg8 Ke3 Rg4 Kf3 Ra4 Kg3 Kg1 Kf3 Kf1 Ke3 Ke1 Kd3 Kf2 Kc3 Ke2 Kb3 Rh4 Kc3 Rg4 Kc2 Rg3 Kc1 Rc3+ Kb1 Kd2 Kb2 Re3 Ka1 Rb3 Ka2 Kc2 Ka1 Ra3+ M16#

Respectively mate in 28, 27, 24, 22, 21, 20, 19, 18, 17, 17, 17 and finally 16 is found which I suppose is the correct solution. Robert Hyatt gets a point because I must admit my hash pv's are pretty crappy here.

Henk · Post by **Henk** » Mon Oct 27, 2014 11:48 am

hgm wrote:KPK (Ke1, Pe2, Ke8) has also always been very useful for me.

So far I've only tested 1 f4 * 2 g4. But black may as well play 1 .. Nf6.
1 .. d5 is less bad for Queen diagonal to h4 is still not blocked

Testing the Search Function

Testing the Search Function

Re: Testing the Search Function

Re: Testing the Search Function

Re: Testing the Search Function

Re: Testing the Search Function

Re: Testing the Search Function

Re: Testing the Search Function

Re: Testing the Search Function

Re: Testing the Search Function

Re: Testing the Search Function