Perft(15) estimates thread
Moderators: hgm, Dann Corbit, Harvey Williamson
Perft(15) estimates thread
This is the perft(15) estimates thread.
Mine: 2e+21
Re: Perft(15) estimates thread
I fully expect to come back to this forum in a handful of years and discover that you guys have actually calculated perft(15).
It will probably have taken 8 months using a bunch of then-state-of-the-art monster machines with 512GB of RAM, 256 beefy cores, a petabyte offline storage array, ...

 Posts: 4105
 Joined: Tue Mar 14, 2006 10:34 am
 Location: Ethiopia
 Contact:
MC methods
I have a perft estimator runnable on a cluster now. This perft estimation stuff has proven so useful as a playground for UCT search that I am planning to write a summary of Monte Carlo perft estimation methods, based on the discussion we had a couple of years ago. Maybe something good enough for the chess wiki, or just a formal document.

* The sampling distribution is lognormal, which follows from the fact that the perft estimate is a product of branching factors.
* Non-uniform sampling of moves is better than uniform sampling. This fact becomes clear when the simulations are started from a larger depth, say 3. Each of the 20 root moves will naturally be simulated a different number of times by a depth-first searcher, and by proportioning by the subtree size (number of leaf nodes, to be exact) when using a best-first searcher. For the latter, the proportioning of simulations is not so clear. I have about 5 formulas that I tested. I forgot the specific motive for each formula, but in general they proportion by some function of the tree size (assumed proportional to variance). Which one is the best is still unsettled. See below for more.
* Combining depth-first and best-first estimators proved to be better. Also, adding a selective layer helped.
* Improvements for the Monte Carlo part include:
Code: Select all
Dynamic tree (different depth) + K full depth + L selective depth + Monte Carlo (exactly 1 move)
 - Proportioning by heuristics (captures etc.)
 - 'Antithetic' variables for variance reduction
 - Hashing
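To make the first bullet concrete, here is a minimal sketch of the basic estimator in Python: walk a uniformly random path to the target depth and multiply the branching factors seen along the way. The `moves` function is a hypothetical toy move generator (not chess) invented here so the example is self-contained; the averaging logic is the part that matters.

```python
import random

def moves(state, depth):
    # Hypothetical toy move generator (NOT chess): the branching factor
    # varies with the state so the estimator has nonzero variance.
    return list(range((sum(state) + len(state)) % 3 + 1))

def perft(state, depth):
    # Exact leaf count by full enumeration (the quantity being estimated).
    if depth == 0:
        return 1
    return sum(perft(state + (m,), depth - 1) for m in moves(state, depth))

def mc_estimate(depth, samples, rng):
    # One sample: follow a uniformly random path, multiplying the
    # branching factors seen along the way; average over many samples.
    total = 0.0
    for _ in range(samples):
        state, est = (0,), 1
        for d in range(depth, 0, -1):
            ms = moves(state, d)
            est *= len(ms)
            state = state + (rng.choice(ms),)
        total += est
    return total / samples
```

Each path of probability 1/(b_1 b_2 ... b_d) contributes exactly b_1 b_2 ... b_d, so the estimator is unbiased; the per-sample values are products of branching factors, which is where the lognormal shape of the sampling distribution comes from.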

I have some doubts about proportioning. Let me start from Michel's formula.
Michel wrote: If one wants to estimate x_1+...+x_n, and one has unbiased estimators X_i for x_i with variance sigma_i^2, then, if I calculated correctly, one should choose X_i with probability p_i = mu*sqrt(x_i^2 + sigma_i^2), where mu should be adjusted such that the sum of the p_i is 1.
I do not remember why p_i depended on the mean value x_i, but thinking about it now from scratch, I get p_i = mu*sqrt(sigma_i^2). The problem is one of variance reduction. Say the original variances are V_i = sigma_i^2. Now, given N total simulations with N_i for each term, how should we proportion them to arrive at the minimum variance? With N_i simulations the variance is reduced to V_i/N_i, so:
Code: Select all
Objective: Minimize(V1/N1 + V2/N2 + ... + Vn/Nn)
Constraint: N = N1+N2+...+Nn
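For reference, here is a sketch of the two optimizations being discussed, reconstructed as standard Lagrange-multiplier arguments (my reconstruction, not taken verbatim from Michel's posts). The first gives the N_i proportional to sigma_i allocation; the second shows why the mean enters when a single child is sampled and the estimate is divided by the selection probability, as in the pure Monte Carlo part.

```latex
% Allocating N simulations over n terms with kept statistics:
% minimize \sum_i V_i/N_i subject to \sum_i N_i = N.
\mathcal{L} = \sum_i \frac{V_i}{N_i} + \lambda\Big(\sum_i N_i - N\Big),
\qquad
\frac{\partial \mathcal{L}}{\partial N_i} = -\frac{V_i}{N_i^2} + \lambda = 0
\;\Rightarrow\; N_i = \sqrt{V_i/\lambda} \propto \sigma_i .

% Single-draw case (no stored means): pick index I with probability p_i
% and return the unbiased estimator T = X_I / p_I.  Then
\operatorname{E}[T^2]
 = \sum_i p_i \,\frac{x_i^2+\sigma_i^2}{p_i^2}
 = \sum_i \frac{x_i^2+\sigma_i^2}{p_i},
% and minimizing this subject to \sum_i p_i = 1 gives
p_i \propto \sqrt{x_i^2+\sigma_i^2},
% i.e. Michel's  p_i = \mu\sqrt{x_i^2+\sigma_i^2}.
```

The second moment x_i^2 + sigma_i^2 (not just the variance) appears precisely because the single sampled value is divided by its selection probability, which suggests the mean term is indeed only needed in the Monte Carlo part where no means are stored.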
Indeed, the proof becomes simple with Lagrange multipliers. Anyway, is the formula with the mean used only in the Monte Carlo part, where we don't save the means? I think for in-tree updates it may not be necessary. Of course, the variance may be proportional to the tree size, so the effect the mean has in Michel's formula is still present indirectly.
Daniel
Re: Perft(15) estimates thread
wgarvin wrote: I fully expect to come back to this forum in a handful of years and discover that you guys have actually calculated perft(15). It will probably have taken 8 months using a bunch of then-state-of-the-art monster machines with 512GB of RAM, 256 beefy cores, a petabyte offline storage array, ...
What a waste of time and electricity. Think about the environment.
Perft is only a unit test. Nothing interesting in itself.
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.

Re: Perft(15) estimates thread
I need this paper from the ICGA (the international chewing gum association):
"Computing Deep Perft and Divide Numbers for Checkers", Art Bik
@Steven, do you have some kind of article on your work that I can cite?
Thanks
"Computing Deep Perft and Divide Numbers for Checkers", Art Bik
@Steven, Do you have a some kind of article on your work that I can cite?
Thanks

Re: MC methods
Alright, I have started writing something now, with an introduction and some description of the hocus-pocus methods. This is basically a summary of what I can find on the net, and hopefully more interesting stuff follows as we go on. You can follow the progress at this public Dropbox link. I will work through it at a convenient pace while doing some tests in parallel. Let me know if you have comments. I know my writing skills suck, so go easy on that.

Re: MC methods
Wow, it is hard to digest 2-year-old, 58-page material, but I finally spotted where Michel mentioned the proof by Lagrange multipliers, on page 55!
Michel wrote: For two variables, I can explicitly derive that N_i should be proportional to sqrt(V_i). For more than three variables, I had problems deriving the same thing, but some clever mathematician showed me how to do it using a Lagrange multiplier.
Still trying to find out why the mean is included in the formula sqrt(mean^2 + sigma^2), but I suspect it applies only to the MC part, where statistics are not kept. I wouldn't know, because I never used it.
@Michel, if you see this, please do the proofs for both so that I can include them verbatim.

Re: MC methods
I think I have hit on something golden. It turns out that Michel's best proportioning formula and my best selection algorithm are the same!! At first look they appear different, but I can show that they lead to the same proportioning scheme. The proof is not very formal yet, but I will find a formal mathematical proof of equivalence later. I need to dig deeper into the convergence of infinite series to prove it...

Let's say we are given N simulations to spend on k moves at the root. The variances of the subtree sizes under each move are \sigma_i^2. The question Michel tried to answer is: what should be the final proportion delivered to each move, so that the variance of the perft estimate is minimum? As Michel has already done, and as I have now confirmed, it is possible to prove using Lagrange multipliers that proportioning by standard deviation is the best.
Code: Select all
Objective: Minimize(V1/N1 + V2/N2 + ... + Vn/Nn)
Constraint: N = N1 + N2 + ... + Nn
Code: Select all
Answer: Proportion by \sigma_i
The question that I tried to answer is: which move should be picked for simulation, given the variances and the completed number of simulations for each move? Note that my approach is a selection algorithm similar to the UCB formula. As I stated 2 years ago, pick the move which gives the highest variance reduction. When you think about it, this seems a completely different algorithm than Michel's. Here is how I arrived at my formula, as posted in that old thread. At the top of the page, you can see that Michel is trying to convince me to use standard deviation proportioning, and towards the bottom of the page I am trying to convince him mine is the best. It will be really funny if this equivalence I claim now is indeed true.
The derivation is simple. Pick the move which gives the biggest reduction in variance:
Code: Select all
Original variance = sigma^2 / N
After one game    = sigma^2 / (N + 1)
Reduction = sigma^2 / N - sigma^2 / (N + 1)
          = sigma^2 * (1/N - 1/(N + 1))
          = sigma^2 * (1 / (N * (N + 1)))
          = (sigma^2 / N) / (N + 1)
          = Original variance / (N + 1)
Code: Select all
pick \argmax_i \frac{\sigma_i^2}{N_i(N_i+1)}
The advantage of this method is that any time you stop the simulations, they will have followed more or less the standard deviation proportioning. Michel's proportioning formula, by definition, cannot be used for selection, as it only predicts the final proportion, not how it will be achieved.

My 'proof' for now is to just simulate my selection rule and see what kind of proportioning it leads to. It turns out it proportions by the square root of the variance, i.e. the standard deviation. Here is a matlab code to test the idea:
Code: Select all
sigma = [32 16 8 4 2 1];
n = [1 1 1 1 1 1];
for i = 1:10000   % run enough iterations for the proportions to settle
    maxr = 0;
    maxj = 1;
    for j = 1:6
        t = (sigma(j)^2) / (n(j) * (n(j) + 1));
        if (t > maxr)
            maxr = t;
            maxj = j;
        end
    end
    n(maxj) = n(maxj) + 1;
    n
end
Here is how n varies:
Code: Select all
2 1 1 1 1 1
3 1 1 1 1 1
3 2 1 1 1 1
4 2 1 1 1 1
5 2 1 1 1 1
5 3 1 1 1 1
6 3 1 1 1 1
6 3 2 1 1 1
.....
13 6 3 2 1 1
13 7 3 2 1 1
14 7 3 2 1 1
...
54 27 13 7 3 2
...
5082 2541 1271 635 318 159
Clearly the proportions follow the population standard deviations \sigma_i (doubling in this case). That is my proof for _now_, but I need to formalize it, probably with some math guy, though I will challenge myself to do it without any help.
Daniel
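As a quick cross-check, here is the same greedy simulation sketched in Python (a direct port of the matlab snippet above, with ties broken toward the lowest index, matching matlab's strict `>` test): the final counts settle into the 32:16:8:4:2:1 ratio of the sigmas.

```python
def greedy_allocation(sigma, iters):
    # Start every move with one simulation, then repeatedly give the next
    # simulation to the move with the largest variance reduction,
    # sigma_i^2 / (n_i * (n_i + 1)).
    n = [1] * len(sigma)
    for _ in range(iters):
        j = max(range(len(sigma)),
                key=lambda i: sigma[i] ** 2 / (n[i] * (n[i] + 1)))
        n[j] += 1
    return n

counts = greedy_allocation([32, 16, 8, 4, 2, 1], 10000)
print(counts)  # successive counts roughly double, following sigma
```

The ratio assertion (rather than exact counts) is the robust check: whatever the tie-breaking, consecutive counts converge to the factor-of-two ratio of the standard deviations.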

Re: MC methods
Ok, the proof by infinite series seems to be really hard, since there is a max term in it. Also, after a certain number of simulations the proportions hover around the best proportion, so I am not sure it was correct to think of it as an infinite series. The problem is:
Code: Select all
pick the move with: \argmax_i \frac{\sigma_i^2}{N_i(N_i+1)}
update its count:   n(i) = n(i) + 1
Prove that the final proportions of n(i) are: n(1):n(2):...:n(k) = sigma(1):sigma(2):...:sigma(k)
Proving the equivalence the straightforward way may be impossible, so I will try to prove it by other means. Here goes:
Reduction of variance is a monotonically decreasing function, i.e. making one more simulation will always decrease the variance. The same is true for the total perft, which adds up the means and variances. Now, if I take the best strategy at each iteration (greedy selection of the largest reduction), it should lead me to the optimum proportion. For a non-monotonic function this is not true, so it is important that the reduction is monotonic. Monotonic functions have one unique optimal point, therefore the proportion we get by following this procedure should be the optimum. Then we go back to Michel's proof that proportioning by standard deviation gives the least variance, so it must be the optimum point for my selection method too, since the function (variance reduction) is monotonic and hence has one optimal point.
That is my proof. Please let me know if you see a problem in the logic.
Daniel
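A small numeric sanity check of this argument (my addition, not from the thread): compare the total variance sum V_i/N_i achieved by the greedy rule against the ideal continuous sigma-proportional allocation N_i = N*sigma_i/sum(sigma), which is the Lagrange-multiplier lower bound. If the claimed equivalence holds, the two should be nearly identical.

```python
def greedy_variance(sigma, iters):
    # Allocate simulations greedily by largest variance reduction,
    # then report the resulting total variance sum(sigma_i^2 / n_i).
    n = [1] * len(sigma)
    for _ in range(iters):
        j = max(range(len(sigma)),
                key=lambda i: sigma[i] ** 2 / (n[i] * (n[i] + 1)))
        n[j] += 1
    return sum(s * s / c for s, c in zip(sigma, n))

def optimal_variance(sigma, total):
    # Continuous sigma-proportional allocation gives the lower bound
    # V = (sum sigma_i)^2 / N.
    return sum(sigma) ** 2 / total

sigma = [32, 16, 8, 4, 2, 1]
g = greedy_variance(sigma, 10000)
o = optimal_variance(sigma, 10006)  # 6 initial simulations + 10000 picks
print(g / o)  # should be very close to 1
```

The greedy total can never beat the continuous lower bound, so the interesting direction is how little it exceeds it; for these sigmas it is within a fraction of a percent.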
Re: Perft(15) estimates thread
Daniel Shawul wrote: @Steven, do you have some kind of article on your work that I can cite?
Not really. I started doing move-path enumeration (my name for perft) more than thirty years ago as part of program verification. My early work and later additions by myself and others appear at http://oeis.org/A048987 (which also lists a truly ancient email address of mine).