Any advice on how to test a search implementation for correctness?
I've written a few tests to catch some of the edge cases like 3 move repetitions and 50 move rule. I am not quite sure though, how to test the general overall search function for correctness, especially as the complexity increases by adding transposition tables, quiescence search, etc.
I've thought about adding tests that test a number of endgame positions x moves to mate as an additional validation function, which I think would provide some benefit, just not sure how much though.
Testing the Search Function
Moderators: hgm, Rebel, chrisw
-
- Posts: 179
- Joined: Fri Feb 14, 2014 10:53 pm
- Location: the Netherlands
Re: Testing the Search Function
A lot of testpositions with special case themes and to which you know the solution and in how many ply. Usually you can simply construct such a position on the fly else you'll come across a lot of interesting positions in threads posted on here in the past.
But really simply observing the program play is a sure way to run into and catch almost any bug eventually.
All is fine till some special case where you notice some strange behaviour, look into the logs, stare at your code, stare at the walls, lose a few hours of your life then feel secure you've fixed everything till you notice..
A chessprogram is so complex that there's always bugs and oversights no matter how meticulous and smart you are.
Have to spend time with the program for it to mature but I suppose that's also the point.
But really simply observing the program play is a sure way to run into and catch almost any bug eventually.
All is fine till some special case where you notice some strange behaviour, look into the logs, stare at your code, stare at the walls, lose a few hours of your life then feel secure you've fixed everything till you notice..
A chessprogram is so complex that there's always bugs and oversights no matter how meticulous and smart you are.
Have to spend time with the program for it to mature but I suppose that's also the point.
-
- Posts: 893
- Joined: Mon Jan 15, 2007 11:23 am
- Location: Warsza
Re: Testing the Search Function
Most programs have incorrect search. TSCP has correct one, but its strength is about 1700 Elo. Techniques that can bring a program from 1700 to 3200 Elo progressively introduce more and more incorrectness. The real trick is to use these techniques in such a way that all their errors are hidden.
Having said that, there are things that a search has to do correctly. Beside things that You mentioned (repetitions, draw by 50 move rule), many search bugs can be detected by looking at simple endgames. If a program doesn't checkmate with K+R vs K, it's obviously broken. If it cannot see a simple win like 8/1k3ppp/8/5PPP/K7/8/8/8 w - - 0 1 it has a problem. If it gets these two right, harder stuff can be found in endgame manuals. In general endgames are good for exposing search errors.
Having said that, there are things that a search has to do correctly. Beside things that You mentioned (repetitions, draw by 50 move rule), many search bugs can be detected by looking at simple endgames. If a program doesn't checkmate with K+R vs K, it's obviously broken. If it cannot see a simple win like 8/1k3ppp/8/5PPP/K7/8/8/8 w - - 0 1 it has a problem. If it gets these two right, harder stuff can be found in endgame manuals. In general endgames are good for exposing search errors.
Pawel Koziol
http://www.pkoziol.cal24.pl/rodent/rodent.htm
http://www.pkoziol.cal24.pl/rodent/rodent.htm
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Testing the Search Function
Your examples are not very good. whether you can mate with K+R vs K is not always a search issue. And playing any type of game where evaluation is involved hides search issues.PK wrote:Most programs have incorrect search. TSCP has correct one, but its strength is about 1700 Elo. Techniques that can bring a program from 1700 to 3200 Elo progressively introduce more and more incorrectness. The real trick is to use these techniques in such a way that all their errors are hidden.
Having said that, there are things that a search has to do correctly. Beside things that You mentioned (repetitions, draw by 50 move rule), many search bugs can be detected by looking at simple endgames. If a program doesn't checkmate with K+R vs K, it's obviously broken. If it cannot see a simple win like 8/1k3ppp/8/5PPP/K7/8/8/8 w - - 0 1 it has a problem. If it gets these two right, harder stuff can be found in endgame manuals. In general endgames are good for exposing search errors.
Best way to debug is to run a few positions to a fairly minimal depth (say 6 plies) and dump the search tree. Then check what is searched in what order (hash move, which captures, killers, etc. See what was reduced or extended or pruned. Etc.
-
- Posts: 10282
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: Testing the Search Function
[D]8/8/8/8/8/5k2/6R1/7K w - - 0 1bob wrote:Your examples are not very good. whether you can mate with K+R vs K is not always a search issue. And playing any type of game where evaluation is involved hides search issues.PK wrote:Most programs have incorrect search. TSCP has correct one, but its strength is about 1700 Elo. Techniques that can bring a program from 1700 to 3200 Elo progressively introduce more and more incorrectness. The real trick is to use these techniques in such a way that all their errors are hidden.
Having said that, there are things that a search has to do correctly. Beside things that You mentioned (repetitions, draw by 50 move rule), many search bugs can be detected by looking at simple endgames. If a program doesn't checkmate with K+R vs K, it's obviously broken. If it cannot see a simple win like 8/1k3ppp/8/5PPP/K7/8/8/8 w - - 0 1 it has a problem. If it gets these two right, harder stuff can be found in endgame manuals. In general endgames are good for exposing search errors.
Best way to debug is to run a few positions to a fairly minimal depth (say 6 plies) and dump the search tree. Then check what is searched in what order (hash move, which captures, killers, etc. See what was reduced or extended or pruned. Etc.
I think that even with no evaluation today programs should search deep enough to see the shortest mate in KR vs K so if your program does not see the shortest mate in less than a minute or see forced mate but cannot win then it clearly has a problem in the search that you need to fix.
Stockfish(3 cores) without tablebases see mate in 16 in few seconds.
Senpai(3 cores) needs more time but still see mate in 16 after 25 seconds without using a fast computer.
-
- Posts: 10282
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: Testing the Search Function
I do not think that tscp has a correct search.PK wrote:Most programs have incorrect search. TSCP has correct one, but its strength is about 1700 Elo. Techniques that can bring a program from 1700 to 3200 Elo progressively introduce more and more incorrectness. The real trick is to use these techniques in such a way that all their errors are hidden.
Having said that, there are things that a search has to do correctly. Beside things that You mentioned (repetitions, draw by 50 move rule), many search bugs can be detected by looking at simple endgames. If a program doesn't checkmate with K+R vs K, it's obviously broken. If it cannot see a simple win like 8/1k3ppp/8/5PPP/K7/8/8/8 w - - 0 1 it has a problem. If it gets these two right, harder stuff can be found in endgame manuals. In general endgames are good for exposing search errors.
If I define a correct search as search that can give the right move in every position if you search for enough time then it certainly does not have a correct search because based on my memory there is a limit for the number of plies that tscp can search forward and not a big limit like 20000 when it is obvious that there is no mate in 10000 because there is going to be a draw by the 50 move rule earlier.
I also do not think that pruning techniques like LMR or null move pruning with verification search mean that the search is not correct search.
You can say that stockfish and other top programs do not have a correct search for different reasons(for example hash collisions can lead to a wrong move even if the probability is only 1 to 10^10)
-
- Posts: 893
- Joined: Mon Jan 15, 2007 11:23 am
- Location: Warsza
Re: Testing the Search Function
KR vs K allowed me to detect an obvious, bug in my first program. It kept showing mate scores without delivering checkmate (because of some mate scores incosistency). Fine no. 70 is a good test of hash table. Probably it should be possible to build a set of progressively more difficult endgame positions for testing search between these two. To minimize the influence of eval, wins should be crushing (checkmate, overwhelming material gain).
As for "correctness" - Uri is right that reductions are always correct in game-theoretic sense (given enough time, engine still should find a win). Speculative prunings (including null move) are not. Prunings turned into reductions (like verified null move) should be more correct, assuming perfect implementation of hash table, but there is ample scope for a "recursive error", becoming smaller and smaller but never eradiated completely.
The thing is, null move is verified only when there is sufficient depth remaining. Because of that, at some depth verification is done by the search that uses null move without verification, and therefore is incorrect in game-theoretical sense.
As for "correctness" - Uri is right that reductions are always correct in game-theoretic sense (given enough time, engine still should find a win). Speculative prunings (including null move) are not. Prunings turned into reductions (like verified null move) should be more correct, assuming perfect implementation of hash table, but there is ample scope for a "recursive error", becoming smaller and smaller but never eradiated completely.
The thing is, null move is verified only when there is sufficient depth remaining. Because of that, at some depth verification is done by the search that uses null move without verification, and therefore is incorrect in game-theoretical sense.
Pawel Koziol
http://www.pkoziol.cal24.pl/rodent/rodent.htm
http://www.pkoziol.cal24.pl/rodent/rodent.htm
-
- Posts: 27793
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Testing the Search Function
KPK (Ke1, Pe2, Ke8) has also always been very useful for me.
-
- Posts: 179
- Joined: Fri Feb 14, 2014 10:53 pm
- Location: the Netherlands
Re: Testing the Search Function
Nemeton needs 5 seconds i5 1 core. No tricks just hash because this position is all hashtable.Uri Blass wrote: [D]8/8/8/8/8/5k2/6R1/7K w - - 0 1
But what's interesting to notice and reason to post this is how shorter and shorter mates get found here. Rarely see that happen quite so graphic.
Code: Select all
20 552 364 8301955 Rg8 Ke4 Kg2 Kd4 Kf3 Kd5 Rd8+ Ke5 Ke3 Kf5 Rd5+ Ke6 Ke4 Kf6 Rd6+ Kf7 Ke5 Ke7 Rd3 Ke8 Rd5 Ke7 Rd4 Kf7 Rd8 Ke7 Rd5 Kf7 Rd7+ Kf8 Rd4 Ke7 Rd2 Kf7 Rd8 Ke7 Rd5 Kf7 Rd7+ Kf8 Rd4 Ke7 Rd2 Kf7 Rd8
21 552 447 10236107 Rg8 Ke4 Kg2 Kd4 Kf3 Kd5 Rd8+ Ke5 Ke3 Kf5 Rd5+ Ke6 Ke4 Kf6 Rd6+ Kf7 Ke5 Ke7 Rd3 Kf7 Rd8 Ke7 Rd5 Kf7 Rd7+ Kf8 Rd4 Kf7 Rd8 Ke7 Rd5 Kf7 Rd7+ Kf8 Rd4 Kf7 Rd8
22 29860 521 11900629 Rg8 <++>
22 29944 584 13233180 Rg8 Ke3 Rd8 Ke4 Kg2 Ke3 Kg3 Ke4 Kg4 Ke5 Kf3 Kf5 Rd5+ Kf6 Ke4 Ke6 Rd4 Kf6 Rd6+ Kf7 Ke5 Ke7 Rd3 Kf7 Kf5 Ke7 Rd5 Kf8 Kf6 Ke8 Rd6 Kd8 M28#
22 29946 695 15603510 Rb2 Kf4 Rb4+ Kg5 Kg2 Kf5 Kf3 Ke5 Rb5+ Ke6 Ke4 Ke7 Re5+ Kd6 Kd4 Kc6 Re6+ Kc7 Kd5 Kd7 Rf6 Ke7 Rf4 Kd7 Rg4 Ke7 Rf4 Kd7 Rg4 Ke7 Rf4 M27#
22 29952 706 15815131 Ra2 Ke3 Kg2 Ke4 Kf2 Kd4 Ra5 Ke4 Ke2 Kd4 Kf3 Kd3 Ra4 Kc3 Ke3 Kb3 Re4 Kc3 Rh4 Kb3 Kd4 Ka2 Re4 Kb1 Re7 Ka1 Rd7 Kb2 Rd5 Kc2 Rc5+ M24#
23 29956 751 16730150 Ra2 Ke3 Kg2 Ke4 Re2+ Kd5 Kf3 Kd4 Re4+ Kc5 Kf4 Kd5 Kf5 Kd6 Rd4+ Kc5 Ke5 Kc6 Rg4 Kd7 Rg5 Kc7 Ke6 Kc6 Re5 M22#
23 29958 810 17924198 Rg8 Ke3 Kg2 Ke4 Re8+ Kd3 Kf3 Kd4 Rd8+ Kc4 Ke4 Kc5 Rd7 Kc6 Rd4 Kc5 Ke5 Kc6 Rg4 Kd7 Rg5 Kc7 Ke6 Kc6 Re5 M21#
24 29960 930 20354585 Rg8 Ke3 Kg2 Ke4 Re8+ Kd3 Kf3 Kd4 Rd8+ Kc4 Kf4 Kc5 Ke4 Kc4 Rc8+ Kb4 Kd3 Kb5 Kd4 Kb6 Rc5 Ka7 Ke5 Ka6 Kd6 Kb6 Rd5 Kb7 Rb5+ Ka6 Kc6 Ka7 Kc7 Ka6 Re5 Ka7 Ra5+ M20#
24 29962 968 21128487 Ra2 Kf4 Kg2 Ke4 Re2+ Kd5 Kf3 Kd4 Rd2+ Kc4 Ke4 Kc5 Rd1 Kc4 Rc1+ Kb4 Kd4 Kb5 Rc8 Kb6 Rc5 Ka7 Ke5 Ka6 Kd6 Kb6 Rd5 Kb7 Rb5+ Ka6 Kc6 Ka7 Kc7 Ka6 Re5 Ka7 Ra5+ M19#
25 29964 1078 23364115 Ra2 Kf4 Kg2 Ke4 Ra4+ Kd5 Kf3 Ke5 Ra5+ Kd6 Ke4 Kc6 Ke5 Kb6 Rd5 Kc7 Ke6 Kc6 Rg5 Kc7 Rc5+ Kb7 Kd6 Kb6 Rd5 Kb7 Rb5+ Ka6 Kc6 Ka7 Kc7 Ka6 Re5 Ka7 Ra5+ M18#
25 29966 1126 24329129 Rg8 Ke3 Rg4 Kf2 Kh2 Kf3 Kh3 Ke2 Rg3 Kd2 Kg2 Ke2 Kg1 Kd1 Kf1 Kd2 Kf2 Kd1 Ke3 Kc1 Rg4 Kb1 Re4 Kc2 Kd4 Kd2 Re3 Kc2 Re2+ Kb3 Rh2 Kb4 Rh5 Kb3 Rh2 Kb4 Rh5 Kb3 Rh2 M17#
26 29966 1280 27619554 Rg8 Ke4 Kg2 Kd4 Kf3 Kd5 Rd8+ Kc6 Ke4 Kc5 Rd5+ Kc4 Rg5 Kc3 Re5 Kc4 Ke3 Kc3 Rc5+ Kb4 Kd4 Kb3 Rc8 Kb2 Kd3 Kb3 Rc4 Ka2 Rc5 Kb3 Rf5 Kb4 Re5 Kb3 Rb5+ Ka4 Kc4 Ka3 Kc3 Ka2 Rb8 Ka1 Kc2 Ka2 Ra8+ M17#
27 29966 1473 31744568 Rg8 Ke3 Rg4 Kf3 Ra4 Kg3 Kg1 Kf3 Kf1 Ke3 Ke1 Kd3 Kf2 Kc3 Ke2 Kb3 Rh4 Kc3 Rg4 Kc2 Rc4+ Kb3 Kd3 Ka2 Rc5 Kb3 Rb5+ Ka4 Kc4 Ka3 Kc3 Ka2 Rb8 Ka1 Kc2 Ka2 Ra8+ M17#
28 29968 1631 35111732 Rg8 Ke3 Rg4 Kf3 Ra4 Kg3 Kg1 Kf3 Kf1 Ke3 Ke1 Kd3 Kf2 Kc3 Ke2 Kb3 Rh4 Kc3 Rg4 Kc2 Rg3 Kb1 Re3 Kb2 Kd3 Kb3 Re4 Ka2 Re5 M16#
29 29968 1794 38662501 Rg8 Ke3 Rg4 Kf3 Ra4 Kg3 Kg1 Kf3 Kf1 Ke3 Ke1 Kd3 Kf2 Kc3 Ke2 Kb3 Rh4 Kc3 Rf4 Kc2 Rd4 Kc3 Ra4 Kb3 Rh4 Kc3 Rf4 Kc2 Rd4 Kc3 Ra4 M16#
30 29968 2027 43664705 Rg8 Ke3 Rg4 Kf3 Ra4 Kg3 Kg1 Kf3 Kf1 Ke3 Ke1 Kd3 Kf2 Kc3 Ke2 Kb3 Rh4 Kc3 Rg4 Kc2 Rg3 Kb1 Re3 Kc1 Rc3+ Kb1 Ke3 Ka1 Kd2 Kb2 Re3 Ka1 Rb3 Ka2 Kc2 Ka1 Ra3+ M16#
31 29968 2243 48381985 Rg8 Ke3 Rg4 Kf3 Ra4 Kg3 Kg1 Kf3 Kf1 Ke3 Ke1 Kd3 Kf2 Kc3 Ke2 Kb3 Rh4 Kc3 Rg4 Kc2 Rg3 Kb1 Kd2 Kb2 Re3 Ka1 Rb3 Ka2 Kc2 Ka1 Ra3+ M16#
32 29968 2468 53298699 Rg8 Ke3 Rg4 Kf3 Ra4 Kg3 Kg1 Kf3 Kf1 Ke3 Ke1 Kd3 Kf2 Kc3 Ke2 Kb3 Rh4 Kc3 Rg4 Kc2 Rg3 Kc1 Rc3+ Kb1 Kd2 Kb2 Re3 Ka1 Rb3 Ka2 Kc2 Ka1 Ra3+ M16#
33 29968 2701 58322494 Rg8 Ke3 Rg4 Kf3 Ra4 Kg3 Kg1 Kf3 Kf1 Ke3 Ke1 Kd3 Kf2 Kc3 Ke2 Kb3 Rh4 Kc3 Rg4 Kc2 Rg3 Kc1 Rc3+ Kb1 Kd2 Kb2 Re3 Ka1 Rb3 Ka2 Kc2 Ka1 Ra3+ M16#
Respectively mate in 28, 27, 24, 22, 21, 20, 19, 18, 17, 17, 17 and finally 16 is found which I suppose is the correct solution. Robert Hyatt gets a point because I must admit my hash pv's are pretty crappy here.
-
- Posts: 7216
- Joined: Mon May 27, 2013 10:31 am
Re: Testing the Search Function
So far I've only tested 1 f4 * 2 g4. But black may as well play 1 .. Nf6.hgm wrote:KPK (Ke1, Pe2, Ke8) has also always been very useful for me.
1 .. d5 is less bad for Queen diagonal to h4 is still not blocked