Statistical Interpretation

D Sceviour
Posts: 570
Joined: Mon Jul 20, 2015 5:06 pm

Re: Statistical Interpretation

Post by D Sceviour »

Robert Pope wrote:So if you are doing fixed depth testing, it will always show null-move as a loser and you will never incorporate it.
Hello Robert,
I do not follow. That cannot happen unless maximum depth is treated as equal to the fixed depth. Only then would cutoffs corrupt the search.
Robert Pope wrote:You go a full ply deeper in the same amount of time and uncover new tactics that will be missed if you can only get to 10.
How does this differ from a normal search?
Robert Pope
Posts: 558
Joined: Sat Mar 25, 2006 8:27 pm

Re: Statistical Interpretation

Post by Robert Pope »

D Sceviour wrote:
Robert Pope wrote:So if you are doing fixed depth testing, it will always show null-move as a loser and you will never incorporate it.
Hello Robert,
I do not follow. That cannot happen unless maximum depth is treated as equal to the fixed depth. Only then would cutoffs corrupt the search.
I don't follow your response, so perhaps we are talking past each other. Let me post a different example that might be easier to talk about.

Let's say you want to test the following change to your model: instead of potentially searching every move at each depth, you automatically produce a cutoff after the first 5 moves tried. As a result, a typical d10 search requires only 5,000 nodes instead of 25,000,000 (branching factor reduced from 5.5 to 2.3).
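
Something along these lines, just to make the example concrete. This is only a rough sketch of a plain alpha-beta move loop; the types and helper functions are invented for illustration and not taken from any particular engine.

Code:

#include <vector>

// Stand-ins for whatever the engine already provides.
struct Position;
using Move = int;
std::vector<Move> generate_moves(Position& pos);
int  evaluate(const Position& pos);
void make_move(Position& pos, Move m);
void unmake_move(Position& pos, Move m);

const int MAX_TRIED = 5;   // the hypothetical patch: never try more than 5 moves per node

int search(Position& pos, int depth, int alpha, int beta)
{
    if (depth == 0)
        return evaluate(pos);

    int tried = 0;
    for (Move m : generate_moves(pos)) {
        if (++tried > MAX_TRIED)
            break;                          // hard cutoff after the first 5 moves tried
        make_move(pos, m);
        int score = -search(pos, depth - 1, -beta, -alpha);
        unmake_move(pos, m);
        if (score >= beta)
            return beta;                    // ordinary fail-hard beta cutoff
        if (score > alpha)
            alpha = score;
    }
    return alpha;
}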

If you do fixed depth testing, there is no theoretical way that this patch can be considered an improvement. At any fixed depth, the two versions will either return the exact same move, or the patched version will return an inferior move because it pruned a refutation or a needed continuation. So fixed depth testing would always reject this patch. Do you agree with this assessment?

If your program has random move sorting, then the best move is guaranteed not to be played (or even considered!) at least 10% of the time, and the patch will perform poorly under real tournament conditions.

But what if your program already uses a super awesome move sorting algorithm you developed that 99.99% of the time manages to put the best move in the top 3? In that case, the patch will never prune a move it shouldn't, but searches are sped up by a factor of 5,000. The patch would perform much, much better in actual games and should definitely be added. And a new patch that only considered the first 3 moves would perform even better.

But you never get there - you rejected the patch because it performs worse at any given fixed depth.
-----
Or consider a patch that runs the eval 5 times at each leaf, to make sure cosmic rays didn't flip a bit in the result. By fixed depth testing (and even fixed node testing), this is a neutral patch. But it obviously slows your program by almost a factor of 5, and is a bad patch.
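
In code, that patch would amount to nothing more than something like the toy sketch below (evaluate() and Position stand in for the engine's own); the node count is untouched, only the clock suffers.

Code:

// Toy sketch of the "eval 5 times per leaf" patch: the extra calls are
// invisible to fixed-depth and fixed-node testing, they only burn time.
int paranoid_evaluate(const Position& pos)
{
    int v = evaluate(pos);
    for (int i = 1; i < 5; ++i) {
        int check = evaluate(pos);   // re-run the eval four more times
        if (check != v)              // a flipped bit would show up as a mismatch
            v = check;
    }
    return v;
}
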
D Sceviour wrote:
Robert Pope wrote:You go a full ply deeper in the same amount of time and uncover new tactics that will be missed if you can only get to 10.
How does this differ from a normal search?
That's sort of the point - fixed node testing is much closer to a normal search than a fixed depth search is, so you don't hit the same type of limitations, but it is still fully reproducible and separated from timing issues.
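
For concreteness, fixed node testing here just means the search stops on a node budget instead of a clock, roughly as in this minimal sketch (counter and limit names are invented):

Code:

#include <cstdint>

uint64_t nodes_searched = 0;
uint64_t node_limit     = 1000000;   // same budget every run

// Called once per node from search(); when it returns true the search
// unwinds and plays the best move found so far. The result depends only
// on the node count, not on machine speed or system load.
bool node_budget_exhausted()
{
    return ++nodes_searched >= node_limit;
}
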
D Sceviour
Posts: 570
Joined: Mon Jul 20, 2015 5:06 pm

Re: Statistical Interpretation

Post by D Sceviour »

Robert Pope wrote:If you do fixed depth testing, there is no theoretical way that this patch can be considered an improvement. At any fixed depth, the two versions will either return the exact same move, or the patched version will return an inferior move because it pruned a refutation or a needed continuation. So fixed depth testing would always reject this patch. Do you agree with this assessment?
Yes. If the search is functioning perfectly in theory, then the PV should remain the same for a fixed depth. Only the number of nodes should be reduced, and both engines should produce equal results. In the above tests, I do suspect the history move is pruning a needed continuation, and that is what makes it overall useless (as also posted earlier). Therefore the fixed depth search has still revealed useful results, hopefully faster and more accurately than a timed test. One has to be very careful about how to interpret the statistics from a test, which is the purpose of this topic.

The exercise has not been futile. I will now look more closely at the move ordering algorithm and at the overall effect of inserting a history move.
Karlo Bala
Posts: 373
Joined: Wed Mar 22, 2006 10:17 am
Location: Novi Sad, Serbia
Full name: Karlo Balla

Re: Statistical Interpretation

Post by Karlo Bala »

D Sceviour wrote:
Robert Pope wrote:If you do fixed depth testing, there is no theoretical way that this patch can be considered an improvement. At any fixed depth, the two versions will either return the exact same move, or the patched version will return an inferior move because it pruned a refutation or a needed continuation. So fixed depth testing would always reject this patch. Do you agree with this assessment?
Yes. If the search is functioning perfectly in theory, then the PV should remain the same for a fixed depth. Only the number of nodes should be reduced, and both engines should produce equal results. In the above tests, I do suspect the history move is pruning a needed continuation, and that is what makes it overall useless (as also posted earlier). Therefore the fixed depth search has still revealed useful results, hopefully faster and more accurately than a timed test. One has to be very careful about how to interpret the statistics from a test, which is the purpose of this topic.

The exercise has not been futile. I will now look more closely at the move ordering algorithm and at the overall effect of inserting a history move.
In a fixed depth search, every pruning/reduction is a loser and every extension is a winner (it is (almost) always better to search more nodes). In real games, it is quite often the opposite. Fixed depth testing is still acceptable for fast eval testing.
Best Regards,
Karlo Balla Jr.
D Sceviour
Posts: 570
Joined: Mon Jul 20, 2015 5:06 pm

Re: Statistical Interpretation

Post by D Sceviour »

Karlo Bala wrote:In a fixed depth search, every pruning/reduction is a loser, every extension is a winner (it is (almost) always better to search more nodes). In real games, quite often it is the opposite. It is acceptable for fast eval testing.
Yes, that is true, but what is being tested here is an extra killer/history move for a simple Shannon beta cutoff. Most of the cutoff work is performed by the hash best move or the first killer. Very few nodes are reduced by the addition of the history move. Internal debug statistics show that the history move does produce a few hits, but at great expense to the necessary width of the search. The problem has been bothering me for some time: how is it that Stockfish and Crafty get away with the extra history cutoff without damaging the strength of the engine? I could build Stockfish and Crafty versions and try to figure it out myself, but maybe someone has a simpler explanation.
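
For reference, the kind of ordering being described might look roughly like the sketch below: hash move first, then the killers, then the quiet move with the best history score, then the rest. This is only an illustration, not how Stockfish or Crafty actually do it, and all names are invented.

Code:

#include <algorithm>
#include <vector>

// Stand-ins for the engine's own types and helpers.
using Move = int;
int from_sq(Move m);
int to_sq(Move m);

int history[64][64];   // bumped whenever a quiet move causes a beta cutoff

// Hash best move first, then the killers, then by history score.
int order_score(Move m, Move hash_move, Move killer1, Move killer2)
{
    if (m == hash_move) return 1000000;    // does most of the cutoff work
    if (m == killer1)   return 900000;
    if (m == killer2)   return 800000;
    return history[from_sq(m)][to_sq(m)];  // the extra "history move" in question
}

void order_moves(std::vector<Move>& moves,
                 Move hash_move, Move killer1, Move killer2)
{
    std::stable_sort(moves.begin(), moves.end(), [&](Move a, Move b) {
        return order_score(a, hash_move, killer1, killer2)
             > order_score(b, hash_move, killer1, killer2);
    });
}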