Minimax, noisy evaluations, PUCT

grahamj · Post by **grahamj** » Wed Mar 20, 2019 3:58 pm

Minimax is known to have a bias which does not diminish with depth. The paper
Bias and pathology in minimax search, 2005, A. Sadikov, I. Bratko, I. Kononenko
uses KRK endings to investigate this.

Here's a simplified situation which I was able to analyse mathematically. Suppose the branching factor is 2, the true value of all nodes is 0, and that the evaluation function gives noisy estimates of this, independent random variables which are symmetric about 0, with cdf F. The minimax evaluation at the root tends towards F^-1((sqrt(5)-1)/2) ~= F^-1(.618) as the depth tends to infinity. It's funny to see the golden ratio appearing. If the noise follows a standard normal, the median root evaluations look like this.

Code: Select all

Ply    root eval
0        0       median of standard normal
1        0.545   median of max(child evals)
2        0.103   
3        0.46 
4        0.216 
5        0.369 
6        0.245 
7        0.345 
8        0.264 
9        0.33 
10        0.277 
11        0.32 
12        0.285 
13        0.313 
14        0.29 
15        0.309 
16        0.294 
17        0.306 
18        0.296 
19        0.304 
20        0.297 
HUGE      0.3003214

In a real game, the bias will occur when there is more than one nearly optimal move. This is not necessarily a bad thing. Positions with several nearly optimal moves are arguably superior to ones with only one. I found this from 2007 viewtopic.php?f=7&t=13433&p=114597&hili ... ax#p114597, where H.G.Muller suggests adding extra noise to obtain the effect, apparently unaware of the bias that already exists.

For fixed-depth minimax, the bias does not affect the ordering of evaluations. I guess there is some interaction with quiescent search and the width of aspiration windows. But I guess the bias is not a big issue for traditional engines.

On the other hand, if the search is to variable depths, as in PUCT, the even-odd oscillations seem problematic. I wonder if this part of why DeepMind found mean() better than max() for PUCT.

hgm · Post by **hgm** » Thu Mar 21, 2019 7:37 am

Note that the oscillation is the opposite from what seems reasonable: it puts the side to move in the leaves at a disadvantantage. In reality one would expect that side to have the advantage. A side-to-move bonus in the static eval could cure this.

Minimax, noisy evaluations, PUCT

Minimax, noisy evaluations, PUCT

Re: Minimax, noisy evaluations, PUCT