Having only a very weak evaluation so far, I am about to verify and optimize my quiescence search, move sorting, and killer heuristic. I have some test positions for which I evaluate all possible moves with an unpruned quiescence search, counting every call of the quiescence routine.
Are there any comparable node counts, e.g. for the following position:
How many times you call quiescence depends on your scoring routine (it changes how often you get a cutoff), among other things. So counting these nodes is not useful.
To verify that your move ordering is reasonable, you can instrument your code so that it measures how often the first move is successful (causes a cutoff or is returned as the best move at the end of qsearch), how often it is the 2nd move, etc.
You should be getting a high percentage for the first few moves. The first four moves will probably sum to 90% or higher.
Note, though, that in the quiescence search the ordering methods are usually different from those in the main search, and there are usually far fewer moves to search. So you will get different numbers there, but the first few moves should still be successful most of the time.
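The instrumentation described above could be sketched roughly as follows. This is only a toy illustration in Python, not code from any particular engine: `OrderingStats` and `pick_with_stats` are hypothetical names, and the move loop takes a list of already-known child scores where a real search would make a recursive call.

```python
from collections import Counter


class OrderingStats:
    """Histogram of which move index (0 = first move tried) was successful."""

    def __init__(self):
        self.hits = Counter()

    def record(self, move_index):
        self.hits[move_index] += 1

    def cumulative(self, first_n):
        """Fraction of successful nodes decided by one of the first_n moves."""
        total = sum(self.hits.values())
        if total == 0:
            return 0.0
        return sum(self.hits[i] for i in range(first_n)) / total


def pick_with_stats(child_scores, alpha, beta, stats):
    """Toy stand-in for the move loop of one node.

    child_scores: scores of the moves, from the side to move's point of view,
    in the order the move sorter would try them (hypothetical input).
    """
    best_index = None
    for i, score in enumerate(child_scores):
        if score > alpha:
            alpha = score
            best_index = i
            if alpha >= beta:
                break  # cutoff: the remaining moves are never tried
    if best_index is not None:
        stats.record(best_index)  # a move caused a cutoff or became best
    return alpha


stats = OrderingStats()
pick_with_stats([50, 10], alpha=0, beta=40, stats=stats)       # first move cuts off
pick_with_stats([-10, 30, 5], alpha=0, beta=100, stats=stats)  # second move is best
pick_with_stats([-5, -6], alpha=0, beta=100, stats=stats)      # nothing beats alpha
print(stats.cumulative(1), stats.cumulative(4))
```

Nodes where no move raises alpha (for example, where standing pat is best) are not counted here; whether to count them is a design choice. Keeping separate histograms for the main search and the qsearch makes the two sets of numbers easier to interpret.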
While I can see that it may be interesting to compare with a top engine, it's not really as simple as a node count, as the philosophy of a qsearch may vary significantly from engine to engine. For example, does the qsearch:
- try bad (SEE < 0) captures?
- try futile captures (capture value expected to be insufficient to avoid next player standing pat and failing high)?
- generate full evasions while king in check instead of standing pat?
- allow safe (SEE >= 0) noncapture checks for the first N ply of quiescence?
- allow unsafe noncapture checks for the first N ply of quiescence?
The philosophy behind answering no to any of those questions is that you would rather keep the quiescence search as small as possible and deal with the less common issues those moves might raise in a later iteration, where the node is considered more fully. By answering yes, you are trying to catch some of the remaining outstanding tactics in the current iteration.
Furthermore, pathological cases like the example you tested are not really that important (beyond the fact that they do terminate) compared with positions that are typical within the search. Those differ from positions that are typical within a game, but not so much that they resemble your example.
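One way to make these choices explicit is to turn each question into a switch and route every candidate move through a single filter. The sketch below is a hypothetical illustration in Python: `QSearchOptions`, `Move`, `should_search`, and the margin value are all assumptions made for the example. The futility test is the usual delta-style comparison, used here as a stand-in for "capture value expected to be insufficient".

```python
from dataclasses import dataclass


@dataclass
class QSearchOptions:
    # One switch per question; the defaults give the smallest qsearch.
    bad_captures: bool = False       # try captures with SEE < 0?
    futile_captures: bool = False    # try captures that cannot reach alpha?
    evasions_in_check: bool = False  # search all evasions when in check?
    safe_check_plies: int = 0        # N for noncapture checks with SEE >= 0
    unsafe_check_plies: int = 0      # N for noncapture checks with SEE < 0
    futility_margin: int = 200       # centipawns; arbitrary illustrative value


@dataclass
class Move:
    # Hypothetical move record; a real engine computes these lazily.
    is_capture: bool
    gives_check: bool
    see: int                 # static exchange evaluation of the move
    captured_value: int = 0  # value of the captured piece, if any


def should_search(move, qply, in_check, stand_pat, alpha, opts):
    """Decide whether the qsearch tries this move at quiescence ply qply."""
    if in_check and opts.evasions_in_check:
        return True  # every evasion is searched, no standing pat
    if move.is_capture:
        if move.see < 0 and not opts.bad_captures:
            return False
        gain = stand_pat + move.captured_value + opts.futility_margin
        if gain <= alpha and not opts.futile_captures:
            return False
        return True
    if move.gives_check:
        if move.see >= 0:
            return qply < opts.safe_check_plies
        return qply < opts.unsafe_check_plies
    return False  # quiet, non-checking moves are never searched here


losing_capture = Move(is_capture=True, gives_check=False, see=-100, captured_value=300)
print(should_search(losing_capture, 0, False, 0, 0, QSearchOptions()))
print(should_search(losing_capture, 0, False, 0, 0, QSearchOptions(bad_captures=True)))
```

With every switch off, only non-losing, non-futile captures are searched; turning the switches on one at a time makes it possible to measure what each class of moves costs in nodes.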
Well, I see that modifying and testing the quiescence search will not make much sense before a usable evaluation function is implemented. After adding a simple king safety component, I already got: